Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 11 • Date Nov. 2010

  • Table of contents

    Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Enhancing the Performance of Symmetric-Key Cryptography via Instruction Set Extensions

    Page(s): 1505 - 1518

    In this paper, instruction set extensions for a reduced instruction set computer processor are presented to improve the software performance of the data encryption standard (DES), the triple DES, the international data encryption algorithm (IDEA), and the advanced encryption standard (AES) algorithms. The most computationally intensive operations of each algorithm are off-loaded to a set of newly defined instructions. The additional hardware required to support these instructions is integrated into the processor's data path. For each of the targeted algorithms, comparisons are presented between traditional software implementations and new implementations that take advantage of the extended instruction set architecture. Results show that the utilization of the proposed instructions significantly reduces program code size, and improves encryption and decryption throughput. Moreover, the additional hardware resources required to support the instruction set extensions increase the total area of the processor by less than 65%. Finally, it will be shown that the throughputs for triple DES, IDEA, and AES are approximately the same when accelerated via instruction set extensions. This allows for seamless and transparent algorithm agility as one algorithm may be easily replaced by another algorithm with minimal performance degradation.

  • Architectural Enhancement and System Software Support for Program Code Integrity Monitoring in Application-Specific Instruction-Set Processors

    Page(s): 1519 - 1532

    Program code in a computer system can be altered either by malicious security attacks or by various faults in microprocessors. At the instruction level, all code modifications are manifested as bit flips. In this paper, we present a generalized methodology for monitoring code integrity at run-time in application-specific instruction-set processors. We embed monitoring microoperations in machine instructions, so the processor is augmented with a hardware monitor automatically. The monitor observes the processor's execution trace at run-time, checks whether it aligns with the expected program behavior, and signals any mismatches. Since the monitor works at a level below the instructions, the monitoring mechanism cannot be bypassed by software or compromised by malicious users. We discuss the abilities and limitations of such a monitoring mechanism for detecting both soft errors and code injection attacks. We propose two different schemes for managing the monitor, one managed by the operating system (OS) and one controlled by the application, and design the constituent components within the monitoring architecture. Experimental results show that with an effective hash function implementation, our microarchitectural support can detect program code integrity compromises with high probability, small area overhead, and little performance degradation.
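
    As a rough illustration of the trace-checking idea in this abstract (not the authors' design), the sketch below hashes the instruction words of one basic block and compares the result with a precomputed value. The function names, the 32-bit word width, the sample instruction words, and the use of CRC-32 as the hash are all assumptions made for this example.

```python
import zlib
from typing import Iterable


def block_hash(instruction_words: Iterable[int]) -> int:
    """Fold 32-bit instruction words into a running CRC-32 (assumed hash)."""
    digest = 0
    for word in instruction_words:
        digest = zlib.crc32(word.to_bytes(4, "little"), digest)
    return digest


def block_is_intact(executed_words: Iterable[int], expected_hash: int) -> bool:
    """Return True when the executed trace hashes to the expected value."""
    return block_hash(executed_words) == expected_hash


# Hypothetical basic block of four instruction words.
original = [0x8C880000, 0x21080001, 0xAC880000, 0x03E00008]
expected = block_hash(original)

# A single bit flip in the second instruction, as a fault or attack might cause.
tampered = list(original)
tampered[1] ^= 1 << 7

print(block_is_intact(original, expected))
print(block_is_intact(tampered, expected))
```

    The unmodified block passes the check, while the single-bit change is flagged because a CRC always changes under a one-bit error. CRC-32 is not resistant to deliberate collisions; it only stands in here for whatever hash the hardware monitor would actually use.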

  • Selection of a Fault Model for Fault Diagnosis Based on Unique Responses

    Page(s): 1533 - 1543

    In this paper, we describe a preprocessing step to fault diagnosis of an observed response obtained from a faulty chip. In this step, a fault model for diagnosing the observed response is selected. This step allows fault diagnosis to be performed based on a single fault model after identifying the most appropriate one. We describe a specific implementation of this preprocessing step based on what is referred to as the unique output response of a fault model. As an example, we apply it to the diagnosis of multiple stuck-at faults, selecting between single and double stuck-at faults as the fault model for diagnosis. Experimental results demonstrate improvements compared to diagnosis based on single stuck-at faults, and compared to diagnosis based on both single and double stuck-at faults. We also discuss the use of a subset of double stuck-at faults for diagnosis, and the application of the proposed preprocessing step with other fault models.

  • A MIMO Decoder Accelerator for Next Generation Wireless Communications

    Page(s): 1544 - 1555

    In this paper, we present a multi-input-multi-output (MIMO) decoder accelerator architecture that offers versatility and reprogrammability while maintaining a very high performance-cost metric. The accelerator is meant to address the MIMO decoding bottlenecks associated with the convergence of multiple high-speed wireless standards onto a single device. It is scalable in the number of antennas, bandwidth, modulation format, and most importantly, present and emerging decoder algorithms. It features a Harvard-like architecture with complex vector operands and a deeply pipelined fixed-point complex arithmetic processing unit. When implemented on a Xilinx Virtex-4 LX200FF1513 field-programmable gate array (FPGA), the design occupied 43% of overall FPGA resources. The accelerator shows an advantage of up to three orders of magnitude (1000 times) in power-delay product for typical MIMO decoding operations relative to a general purpose DSP. When compared to dedicated application-specific IC (ASIC) implementations of MMSE MIMO decoders, the accelerator showed a degradation of 340%-17%, depending on the actual ASIC being considered. In order to optimize the design for both speed and area, specific challenges had to be overcome. These include: definition of the processing units and their interconnection; proper dynamic scaling of the signal; and memory partitioning and parallelism.

  • Variational Capacitance Extraction and Modeling Based on Orthogonal Polynomial Method

    Page(s): 1556 - 1566

    In this paper, we propose a novel statistical capacitance extraction method for interconnect conductors considering process variations. The new method is called statCap, where orthogonal polynomials are used to represent the statistical processes in a deterministic way. We first show how the variational potential coefficient matrix is represented in a first-order form using Taylor expansion and orthogonal decomposition. Then, an augmented potential coefficient matrix, which consists of the coefficients of the polynomials, is derived. After this, the corresponding augmented system is solved to obtain the variational capacitance values in the orthogonal polynomial form. Finally, we present a method to extend statCap to the second-order form to give more accurate results without loss of efficiency compared to the linear models. We show the derivation of the analytic second-order orthogonal polynomials for the variational capacitance integral equations. Experimental results show that statCap is two orders of magnitude faster than the recently proposed statistical capacitance extraction method based on the spectral stochastic collocation approach and many orders of magnitude faster than the Monte Carlo method for several practical conductor structures.

  • A Novel Variation-Tolerant Keeper Architecture for High-Performance Low-Power Wide Fan-In Dynamic OR Gates

    Page(s): 1567 - 1577

    Dynamic gates have been an excellent choice in the design of high-performance modules in modern microprocessors. The only limitation of dynamic gates is their relatively low noise margin compared to that of standard CMOS gates. Traditionally, this issue has been resolved by employing a pMOS keeper circuit that compensates for the leakage current of the pull-down nMOS network. In the earlier technology nodes, the keeper circuit could improve the reliability of the dynamic gates with a minor performance penalty. However, aggressive scaling trends of CMOS technology along with increasing levels of process variations have reduced the effectiveness of the traditional keeper approach. This is because, to maintain an acceptable noise margin level in deep sub-100 nm technologies, large pMOS keepers must be employed, which generates substantial contention between the keeper and the pull-down network, and hence results in severe loss of performance and high power consumption. This problem is more severe in wide fan-in dynamic gates due to the large number of leaky nMOS devices connected to the dynamic node. In this paper, a novel variation-tolerant keeper architecture is proposed, which is capable of significantly reducing contention and improving performance and power consumption. Using circuit simulations, the overall improved characteristics of the proposed keeper are demonstrated in comparison to those of the traditional as well as several state-of-the-art keepers. The proposed keeper exhibits the lowest delay deviation under different levels of process variations. Also, it is shown that for an eight-input OR gate, in the presence of 15% Vth fluctuations, the proposed architecture can lead to 20%, 15%, and more than 40% reduction in power consumption, mean delay, and standard deviation of delay, respectively, when compared to the traditional keeper circuit.

  • On Incremental Component Implementation Selection in System Synthesis

    Page(s): 1578 - 1589

    Incremental design methods can substantially improve products' time-to-market through efficient handling of engineering change orders (ECO). In this paper, we present a methodology for incrementally solving the component implementation selection problem (CISP) in the face of local or non-local perturbations. CISP, which refers to the judicious selection of component implementations under a system timing constraint, is a generic problem that implicitly or explicitly appears in many stages of the CAD flow. For a commonly-used formulation of CISP, we discuss necessary and sufficient conditions for optimality of the solution. Based on the optimality conditions, we develop an algorithm that maintains both validity and optimality of a solution under incremental changes. We evaluated our approach by incrementally updating the threshold voltage assignment solution for a netlist going through engineering changes. On average, our method ran 283 times faster than the full solver, while delivering the same results.

  • Yield-Driven Near-Threshold SRAM Design

    Page(s): 1590 - 1598

    Voltage scaling is desirable in static RAM (SRAM) to reduce energy consumption. However, commercial SRAM is susceptible to functional failures when VDD is scaled down. Although several published SRAM designs scale VDD to 200-300 mV, these designs do not sufficiently consider SRAM robustness, limiting them to small arrays because of yield constraints, and may not correctly target the minimum energy operation point. We examine the effects on area and energy for the differential 6T and 8T bit cells as VDD is scaled down, and the bit cells are either sized and doped, or assisted appropriately to maintain the same yield as with full VDD. SRAM robustness is calculated using importance sampling, resulting in a seven-order run-time improvement over Monte Carlo sampling. Scaling 6T and 8T SRAM VDD down to 500 mV and scaling 8T SRAM to 300 mV results in a 50% and 83% dynamic energy reduction, respectively, with no reduction in robustness and low area overhead, but increased leakage per bit. Using this information, we calculate the supply voltage for a minimum total energy operation (VMIN) based on activity factor and find that it is significantly higher for SRAM than for logic.
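
    To show why importance sampling helps with rare failure events of the kind this abstract mentions, here is a minimal sketch on a toy problem: estimating the tail probability P(Z > t) of a standard normal variable. It is not the paper's SRAM model. The function names, the threshold of 5, the sample count, and the choice of a mean-shifted normal proposal are assumptions made for illustration.

```python
import math
import random


def tail_prob_exact(t: float) -> float:
    """Exact P(Z > t) for a standard normal Z, used as a reference."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_prob_monte_carlo(t: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo: fraction of N(0, 1) samples that exceed t."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > t)
    return hits / n


def tail_prob_importance(t: float, n: int, rng: random.Random) -> float:
    """Importance sampling with proposal N(t, 1) centred on the rare region.

    Each sample x that lands in the tail is weighted by the density ratio
    p(x) / q(x) = exp(-t * x + t * t / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x > t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


threshold = 5.0
samples = 20000
exact = tail_prob_exact(threshold)
estimate = tail_prob_importance(threshold, samples, random.Random(0))
naive = tail_prob_monte_carlo(threshold, samples, random.Random(0))
print(exact, estimate, naive)
```

    With a tail probability on the order of 1e-7, plain Monte Carlo with 20000 samples rarely sees a single failure, whereas about half of the shifted samples fall in the tail and contribute a weighted term. That concentration of samples in the failure region is the source of the run-time advantage.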

  • Don't-Care Gating (DCG) TCAM Design Used in Network Routing Table

    Page(s): 1599 - 1607

    This paper presents a low-power ternary content addressable memory (TCAM) design, in which we propose the don't-care gating (DCG) scheme that aims to reduce the TCAM power dissipated in the search-line (SL) switching activity. By exploiting the vertically continuous “don't-care” feature, the DCG scheme can effectively reduce the average SL power consumption per switch. In addition, we also develop the refined search enable (RSE) technique to eliminate the unnecessary SL switching activity in the quiet pattern. By reducing both the SL switching activity and the average switching power, the proposed design can minimize the TCAM SL power consumption. For a 128 × 32 TCAM, the best configuration we examined shows that when the gating granularity is 16, with a 1.3% search performance improvement, the DCG scheme combined with the RSE technique can achieve a 72% to 79% SL energy reduction.
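
    For readers unfamiliar with ternary matching, the sketch below models in software what a TCAM lookup computes for a routing table. It does not model the DCG or RSE circuits. The names, the (value, care mask) encoding, and the sample prefixes are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask). A 1 bit in care_mask means the key must
# match value at that position; a 0 bit is a "don't care".
Entry = Tuple[int, int]


def ternary_match(key: int, value: int, care_mask: int) -> bool:
    """True when key equals value on every bit that care_mask selects."""
    return ((key ^ value) & care_mask) == 0


def tcam_lookup(table: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first matching entry, or None.

    Hardware compares all rows in parallel and a priority encoder picks
    the first hit; this loop reproduces only the result.
    """
    for index, (value, care_mask) in enumerate(table):
        if ternary_match(key, value, care_mask):
            return index
    return None


# Hypothetical 32-bit routing table, longest prefix first.
table: List[Entry] = [
    (0xC0A80100, 0xFFFFFF00),  # 192.168.1.0/24
    (0xC0A80000, 0xFFFF0000),  # 192.168.0.0/16
    (0x00000000, 0x00000000),  # default route, every bit is don't care
]

print(tcam_lookup(table, 0xC0A80105))
print(tcam_lookup(table, 0xC0A8FF01))
print(tcam_lookup(table, 0x0A000001))
```

    In a prefix table like this one, the don't-care bits of each row form a contiguous low-order run, and rows sorted by prefix length keep those runs aligned from row to row. That is plausibly the regularity the abstract refers to as vertically continuous don't-cares, though the abstract itself does not spell it out.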

  • Temperature-Insensitive Dual-Vth Synthesis for Nanometer CMOS Technologies Under Inverse Temperature Dependence

    Page(s): 1608 - 1620

    With the scaling of CMOS technologies, the gap between nominal supply voltage and threshold voltage has decreased significantly. This trend is further amplified in low-power nanometer libraries, which feature cells with identical size and functionality, but different threshold voltages. As a consequence, different cells may have different delay behaviors as the temperature varies within a circuit. For instance, cells with low-threshold devices may experience an increase in delay when temperature increases, whereas cells using high-threshold devices may experience the opposite behavior. The latter effect, also known as inverse temperature dependence (ITD), poses new challenges to circuit designers. Besides making timing analysis more difficult, ITD has important and unforeseeable consequences for power-aware logic synthesis. This paper describes the impact that ITD may have on the design of nanometer circuits. We also provide a threshold voltage assignment algorithm for dual threshold voltage synthesis, which guarantees temperature-insensitive operation of the circuits, together with a significant reduction of both leakage and total power consumption. Experiments performed on a set of standard benchmarks show timing compliance at any operating temperature, and an average leakage reduction around 28% compared to circuits synthesized with a standard synthesis flow that does not take ITD into account. We also apply our proposed synthesis algorithm to a realistic case study consisting of a 32-bit, IEEE-754 floating point unit.

  • Variable-Latency Adder (VL-Adder) Designs for Low Power and NBTI Tolerance

    Page(s): 1621 - 1624

    In this paper, we propose a new adder design called the variable-latency adder (VL-adder). This technique allows the adder to work at a lower supply voltage than that required by a conventional adder while maintaining the same throughput. The VL-adder design can be further modified to overcome the effects of negative bias temperature instability (NBTI) on circuit delay. By applying the VL-adder concept to a 64-bit carry-select adder design, an energy saving of more than 40% is obtained while a similar throughput is maintained.

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): C4
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University in Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu