IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Issue 7 • July 2011

  • Table of contents

    Publication Year: 2011, Page(s): C1 - C4
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Publication Year: 2011, Page(s): C2
  • Signal Acquisition of High-Speed Periodic Signals Using Incoherent Sub-Sampling and Back-End Signal Reconstruction Algorithms

    Publication Year: 2011, Page(s): 1125 - 1135
    Cited by: Papers (8)

    This paper presents a high-speed periodic signal acquisition technique using incoherent sub-sampling and back-end signal reconstruction algorithms. The signal reconstruction algorithms employ frequency-domain analysis for frequency estimation and for suppression of jitter-induced sampling noise. By switching the sampling rate of a digitizer, the analog frequency value of the sampled signal can be recovered. The proposed signal reconstruction uses incoherent sub-sampling to reduce hardware complexity. The results of simulation and hardware experiments indicate that the proposed signal reconstruction algorithms are able to reconstruct multi-tone high-speed periodic signals in the discrete time domain. The new signal acquisition technique simplifies signal acquisition hardware for testing and characterization of high-speed analog and digital signals.
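
    The core idea, recovering the true analog frequency of a periodic tone from aliased captures taken at two different sub-Nyquist sampling rates, can be illustrated with a short NumPy sketch. This is a simplified, single-tone illustration of the general principle rather than the authors' algorithm; the sampling rates, record length, and tone frequency below are arbitrary values assumed only for the example.

        import numpy as np

        def aliased_freq(f, fs):
            # frequency at which a tone of true frequency f appears after sampling at fs
            r = f % fs
            return min(r, fs - r)

        def peak_freq(x, fs):
            # coarse FFT peak search on a windowed record (ignoring DC)
            spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
            spec[0] = 0.0
            return int(np.argmax(spec)) * fs / len(x)

        def estimate_tone(x1, fs1, x2, fs2, f_max):
            # find the analog frequency (below a known bound f_max) whose aliases
            # are consistent with the peaks observed at both sub-Nyquist rates
            fa1, fa2 = peak_freq(x1, fs1), peak_freq(x2, fs2)
            best, best_err = None, float("inf")
            for n in range(int(f_max // fs1) + 2):
                for cand in (n * fs1 + fa1, n * fs1 - fa1):
                    if 0 < cand <= f_max:
                        err = abs(aliased_freq(cand, fs2) - fa2)
                        if err < best_err:
                            best, best_err = cand, err
            return best

        # example: a 2.413 GHz tone digitized at only ~100 MS/s
        fs1, fs2, f_true, N = 97e6, 101e6, 2.413e9, 4096
        x1 = np.sin(2 * np.pi * f_true * np.arange(N) / fs1)
        x2 = np.sin(2 * np.pi * f_true * np.arange(N) / fs2)
        print(estimate_tone(x1, fs1, x2, fs2, f_max=3e9))   # ~2.413e9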

  • Systematic Design of RSA Processors Based on High-Radix Montgomery Multipliers

    Publication Year: 2011, Page(s): 1136 - 1146
    Cited by: Papers (11)

    This paper presents a systematic design approach for providing optimized Rivest-Shamir-Adleman (RSA) processors based on high-radix Montgomery multipliers that satisfy various user requirements, such as circuit area, operating time, and resistance against side-channel attacks. To capture the tradeoff between performance and resistance, we apply four types of exponentiation algorithms: two variants of the binary method, with and without the Chinese Remainder Theorem (CRT). We also introduce three multiplier-based datapath architectures using different intermediate data forms: 1) single form, 2) semi carry-save form, and 3) carry-save form, and combine them with a wide variety of arithmetic components. Their radices are parameterized from 2^8 to 2^128. A total of 242 datapaths for 1024-bit RSA processors were obtained for each radix. The potential of the proposed approach is demonstrated through an experimental synthesis of all possible processors with a 90-nm CMOS standard cell library. As a result, designs ranging from the smallest, at 861 gates with 118.47 ms/RSA, to the fastest, at 0.67 ms/RSA with 153,862 gates, were obtained. In addition, the use of the CRT technique reduced the RSA operation time of the fastest design to 0.24 ms. Even when the exponentiation algorithm resistant to typical side-channel attacks is employed, the fastest design can perform the RSA operation in less than 1.0 ms.
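
    For readers unfamiliar with the arithmetic these datapaths implement, the following is a minimal software sketch of word-serial (radix-2^w) Montgomery multiplication and binary exponentiation. It mirrors only the mathematics, not the paper's carry-save datapaths, CRT variants, or side-channel countermeasures; the toy RSA parameters at the end are the usual textbook example and are assumed here only for the demonstration (pow(n, -1, m) requires Python 3.8+).

        def mont_mul(a, b, n, w=16):
            # word-serial Montgomery multiplication: returns a*b*R^-1 mod n,
            # where R = 2^(w*s) and s is the number of w-bit words covering n (n odd)
            s = -(-n.bit_length() // w)
            base = 1 << w
            n0_prime = (-pow(n, -1, base)) % base      # -n^-1 mod 2^w
            t = 0
            for i in range(s):
                a_i = (a >> (w * i)) & (base - 1)
                t += a_i * b
                m = ((t & (base - 1)) * n0_prime) & (base - 1)
                t = (t + m * n) >> w                   # exact division by 2^w
            return t - n if t >= n else t

        def mont_exp(msg, e, n, w=16):
            # left-to-right binary exponentiation in the Montgomery domain
            # (no CRT, no side-channel hardening): returns msg^e mod n
            s = -(-n.bit_length() // w)
            R = 1 << (w * s)
            x = (msg * R) % n                          # operand in Montgomery form
            acc = R % n                                # Montgomery form of 1
            for bit in bin(e)[2:]:
                acc = mont_mul(acc, acc, n, w)         # square
                if bit == "1":
                    acc = mont_mul(acc, x, n, w)       # multiply
            return mont_mul(acc, 1, n, w)              # leave the Montgomery domain

        # toy RSA check (textbook parameters, far too small for real use)
        p, q, e = 61, 53, 17
        n, d = p * q, 2753                             # d = e^-1 mod (p-1)(q-1)
        c = mont_exp(65, e, n)
        assert mont_exp(c, d, n) == 65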

  • Delay-Based Dual-Rail Precharge Logic

    Publication Year: 2011, Page(s): 1147 - 1153
    Cited by: Papers (6)

    This paper investigates the design of a dual-rail precharge logic family whose power consumption is insensitive to unbalanced load conditions, thus allowing a semi-custom design flow (automatic place and route) to be adopted without any constraint on the routing of the complementary wires. The proposed logic is based on a novel encoding concept where the information is represented in the time domain rather than in the spatial domain as in standard dual-rail logic. In this work, a logic family that exploits the proposed concept has been implemented. Implementation details and simulation results are reported which show a power consumption that is independent of the sequence of processed data and of the routing capacitances. An improvement of up to 50× in energy-consumption balancing and an area reduction of up to 60% with respect to the state of the art have been obtained.

  • Prediction and Comparison of High-Performance On-Chip Global Interconnection

    Publication Year: 2011, Page(s): 1154 - 1166
    Cited by: Papers (8)

    As process technology scales, numerous interconnect schemes have been proposed to mitigate the performance degradation caused by the scaling of on-chip global wires. In this paper, we review current on-chip global interconnect structures and develop simple models to analyze their architecture-level performance. We propose a general framework to design and optimize a new category of global interconnect based on on-chip transmission line (T-line) technology. We perform a group of experiments using six different global interconnection structures to quantify their differences in terms of latency, energy per bit, throughput, area, and signal integrity over several technology nodes. Our results show that T-line structures have the potential to outperform conventional repeated RC wires at future technology nodes, achieving higher performance while using less power and improving the reliability of wire communication. Our results also show that on-chip equalization helps improve throughput, signal integrity, and power efficiency.

  • Adaptive Power Control Technique on Power-Gated Circuitries

    Publication Year: 2011, Page(s): 1167 - 1180
    Cited by: Papers (3)

    An adaptive power control (APC) system for power-gated circuits is proposed. The core technique is a switching-state determination mechanism that serves as an alternative to critical-path replicas. It is intrinsically tolerant of process, voltage, and temperature (PVT) variations because it directly monitors the behavior of the VDDV node. The APC system includes a multi-mode power gating network, a voltage sensor, a variable-threshold comparator, a slack detection block, and a bank of bidirectional shift registers. By dynamically configuring the size of the power gating devices, an average of 56.5% of the unused slack resulting from worst-case margins or input-pattern changes can be further utilized. A 32-64 bit multiply-accumulate (MAC) unit, fabricated in UMC 90-nm standard CMOS process technology, serves as a test vehicle. The measurement results of the test chips exhibit an average of 12.39% net power reduction. A 7.96× leakage reduction is achieved by power gating the MAC unit. For the 32-bit multiplier of the MAC, the area and power overheads of the proposed APC system are 5% and 1.08%, respectively. Most of the overhead is contributed by the power gating devices and their control-signal buffers.

  • IR-Drop Aware Clustering Technique for Robust Power Grid in FPGAs

    Publication Year: 2011, Page(s): 1181 - 1191

    IR-drop management in the power supply network of a chip is one of the critical design challenges in nanometer VLSI circuits. Techniques developed for application-specific integrated circuits cannot be directly applied to IR-drop management in field-programmable gate arrays (FPGAs) because of the programmable nature of FPGAs. This paper proposes a novel clustering technique for improving the supply voltage profile in the power grid of FPGAs. The proposed clustering technique not only improves the minimum voltage at any node in the circuit, but also reduces the variance in supply voltage across the nodes in the power grid. Results indicate that a reduction of up to 36% in IR-drop and 27% in spatial Vdd variation can be achieved using the proposed clustering technique.
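
    As background, the two quantities the clustering technique targets (the worst node voltage and the spread of supply voltages across the grid) can be computed for a toy resistive power grid by plain nodal analysis. The sketch below is an illustrative model with assumed segment resistance, pad locations, and load currents; it is not the FPGA-specific clustering algorithm of the paper.

        import numpy as np

        def grid_ir_drop(rows, cols, r_seg, vdd, pads, loads):
            # solve a uniform resistive power grid for static IR drop (nodal analysis)
            # pads: set of (row, col) nodes tied to VDD; loads: {(row, col): amps drawn}
            n, g = rows * cols, 1.0 / r_seg
            idx = lambda r, c: r * cols + c
            G, I = np.zeros((n, n)), np.zeros(n)
            for r in range(rows):
                for c in range(cols):
                    for rr, cc in ((r, c + 1), (r + 1, c)):        # right / down neighbours
                        if rr < rows and cc < cols:
                            a, b = idx(r, c), idx(rr, cc)
                            G[a, a] += g; G[b, b] += g
                            G[a, b] -= g; G[b, a] -= g
            for (r, c), amps in loads.items():
                I[idx(r, c)] -= amps                               # current drawn out of the node
            for r, c in pads:                                      # Dirichlet rows: V = VDD
                k = idx(r, c)
                G[k, :], G[k, k], I[k] = 0.0, 1.0, vdd
            return np.linalg.solve(G, I).reshape(rows, cols)

        v = grid_ir_drop(20, 20, r_seg=0.5, vdd=1.0,
                         pads={(0, 0), (0, 19), (19, 0), (19, 19)},
                         loads={(10, 10): 0.02, (12, 8): 0.015})
        print("worst IR drop: %.1f mV" % (1e3 * (1.0 - v.min())))
        print("spatial Vdd spread (std): %.2f mV" % (1e3 * v.std()))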

  • Impacts of NBTI/PBTI and Contact Resistance on Power-Gated SRAM With High-κ Metal-Gate Devices

    Publication Year: 2011, Page(s): 1192 - 1204
    Cited by: Papers (7)

    The threshold voltage (VTH) drifts induced by negative bias temperature instability (NBTI) and positive bias temperature instability (PBTI) weaken PFETs and high-k metal-gate NFETs, respectively. These long-term VTH drifts degrade SRAM cell stability, margin, and performance, and may lead to functional failure over the device lifetime. Meanwhile, the contact resistance of CMOS devices increases sharply with technology scaling, especially in SRAM cells with minimum-size and/or sub-ground-rule devices. The contact resistance, together with NBTI/PBTI, cumulatively worsens SRAM stability and leads to severe SRAM performance degradation. Furthermore, most state-of-the-art SRAMs are designed with power-gating structures to reduce leakage currents in Standby or Sleep mode. The power switches can suffer NBTI or PBTI degradation and have large contact resistances. This paper presents a comprehensive analysis of the impacts of NBTI and PBTI on power-gated SRAM arrays with high-k metal-gate devices, and of their combined effects with contact resistance on SRAM cell stability, margin, and performance. NBTI/PBTI-tolerant sense-amplifier structures are also discussed.

  • Electrical Model of Microcontrollers for the Prediction of Electromagnetic Emissions

    Publication Year: 2011, Page(s): 1205 - 1217
    Cited by: Papers (7)

    This work presents a new methodology to derive the equivalent circuit of the parasitic paths that propagate switching noise in mixed-signal integrated circuits and that usually lead to unintended crosstalk and electromagnetic emission issues. The methodology is based on small-signal analyses performed at the level of the individual analog and digital microcontroller building blocks, on electromagnetic simulations carried out to describe the power supply network at the chip and package levels, and on analysis of the chip layout and process technology data to model the parasitic coupling paths through the semiconductor substrate. The microcontroller electrical model generated with this methodology consists of a low-complexity, linear netlist that provides insight into the relationships between design parameters and noise propagation paths in the early design phases, through simulations in a SPICE-like environment. The proposed approach does not rely on experimental test results. In this paper, the electrical model of an 8-bit microcontroller is derived; its validity is proven by scattering-parameter and conducted-emission measurements.

  • A High Precision Fast Locking Arbitrary Duty Cycle Clock Synchronization Circuit

    Publication Year: 2011, Page(s): 1218 - 1228
    Cited by: Papers (4)

    This study proposes a high-precision, fast-locking, arbitrary-duty-cycle clock synchronization (HPCS) circuit. The HPCS is capable of synchronizing the external clock and the internal clock in three clock cycles. By using three innovative techniques, the proposed HPCS can also reduce the clock skew between the external clock and the internal clock in a chip. First, by modifying the mirror control circuit, the HPCS operates correctly with a clock signal of arbitrary duty cycle (25%-75%). Second, the HPCS works precisely and is immune to output-load changes because the measurement delay line is moved beyond the output driver. Finally, the HPCS can enhance the resolution between the external clock and the internal clock with a fine-tuning structure. After phase locking, the maximum static phase error is less than 20 ps. The proposed chip is fabricated in a TSMC 130-nm CMOS process and has an operating frequency range from 300 to 600 MHz. At 600 MHz, the power consumption and rms jitter are 2.4 mW and 3.06 ps, respectively. The active area of this chip is 0.3 × 0.13 mm².

  • Reduced-Complexity Decoder Architecture for Non-Binary LDPC Codes

    Publication Year: 2011, Page(s): 1229 - 1238
    Cited by: Papers (31)

    Non-binary low-density parity-check (NB-LDPC) codes can achieve better error-correcting performance than binary LDPC codes when the code length is moderate, but at the cost of higher decoding complexity. The high complexity is mainly caused by the complicated computations in the check node processing and the large memory requirement. In this paper, a novel check node processing scheme and corresponding VLSI architectures are proposed for the Min-max NB-LDPC decoding algorithm. The proposed scheme first sorts out a limited number of the most reliable variable-to-check (v-to-c) messages; the check-to-variable (c-to-v) messages to all connected variable nodes are then derived independently from the sorted messages without noticeable performance loss. Compared to the previous iterative forward-backward check node processing, the proposed scheme not only significantly reduces the computational complexity, but also eliminates the memory required for storing the intermediate messages generated by the forward and backward processes. Inspired by this novel c-to-v message computation method, we propose to store the most reliable v-to-c messages as "compressed" c-to-v messages. The c-to-v messages are recovered from the compressed format when needed. Accordingly, the memory requirement of the overall decoder can be substantially reduced. Compared to the previous Min-max decoder architecture, the proposed design for a (837, 726) code over GF(2^5) can achieve the same throughput with only 46% of the area.
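
    To make the check-node operation concrete, here is a brute-force reference implementation of the Min-max check-node update over GF(2^p), assuming unit parity-check coefficients so that field addition reduces to XOR. It enumerates every configuration, which is exactly the exponential cost the paper's sorted-message scheme avoids; it is a functional reference for small toy cases, not the proposed architecture.

        import itertools

        def minmax_check_node(msgs, q):
            # msgs[j][x]: reliability of symbol x on edge j (smaller = more reliable)
            # returns out[k][z]: c-to-v message to edge k for symbol z, i.e. the min,
            # over all configurations of the other edges XOR-ing to z, of the maximum
            # incoming reliability (the Min-max check-node rule)
            d_c = len(msgs)
            out = [[float("inf")] * q for _ in range(d_c)]
            for k in range(d_c):
                others = [j for j in range(d_c) if j != k]
                for config in itertools.product(range(q), repeat=len(others)):
                    z = 0
                    for x in config:
                        z ^= x                      # addition in GF(2^p) with unit weights
                    cost = max(msgs[j][x] for j, x in zip(others, config))
                    if cost < out[k][z]:
                        out[k][z] = cost
            return out

        # toy example: a degree-3 check node over GF(4)
        msgs = [[0.0, 1.2, 3.0, 2.5],
                [0.4, 0.0, 2.1, 1.7],
                [0.0, 2.2, 0.3, 1.9]]
        for row in minmax_check_node(msgs, q=4):
            print(["%.1f" % v for v in row])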

  • An Efficient Adaptive High Speed Manipulation Architecture for Fast Variable Padding Frequency Domain Motion Estimation

    Publication Year: 2011, Page(s): 1239 - 1248

    Motion estimation (ME) consumes up to 70% of the entire video encoder's computations and is therefore the most time-consuming part of encoding. Discrete cosine transform (DCT)-based phase correlation, together with dynamic padding (DP), are recently evolved frequency-domain ME (FDME) techniques that promise to reduce the computational complexity of the ME process efficiently. DP uses dynamic padding thresholds to select the proper search-area size according to a pre-estimated set of motion vectors (MVs). The main drawbacks of using conventional DP in the frequency domain are twofold. First, the dynamic thresholds need to be estimated in the pixel (IDCT) domain, which increases complexity. Second, the mismatched transformed search area is formed from different successive transformed blocks, which would lead to inaccurate ME if the search area is not manipulated. In this paper, an efficient low-complexity algorithm and a high-speed architecture are proposed to implement an adaptive manipulation unit engine (MUE). The MUE, the main module of the FDME system, adaptively decides the padding size and constructs a matched transformed search area from the successive transformed blocks. Additionally, the dynamic thresholds used are estimated efficiently in the frequency domain (FD). The MUE architecture is presented with two different design implementations that trade off the VLSI design parameters. Implementation and simulation results project that the proposed MUE, when integrated in a whole FDME system, can perform ME for 4CIF video at 60 fps when running at 172 MHz.
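
    As a point of reference for frequency-domain ME, the sketch below estimates a block's motion vector by classical FFT-based phase correlation: the peak of the inverse transform of the normalized cross-power spectrum marks the translation. The paper's DCT-based correlation and the MUE padding logic follow the same spirit but differ in the transform and in how the search area is assembled; the block size and test shift here are arbitrary.

        import numpy as np

        def phase_correlation_mv(ref_block, cur_block):
            # returns (dy, dx) such that cur_block is approximately ref_block
            # shifted by (dy, dx); both blocks must have the same shape
            F_ref = np.fft.fft2(ref_block)
            F_cur = np.fft.fft2(cur_block)
            cross = F_cur * np.conj(F_ref)
            cross /= np.abs(cross) + 1e-12              # normalized cross-power spectrum
            corr = np.fft.ifft2(cross).real
            dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
            dy, dx = int(dy), int(dx)
            h, w = corr.shape
            if dy > h // 2: dy -= h                     # map to signed shifts
            if dx > w // 2: dx -= w
            return dy, dx

        ref = np.random.rand(32, 32)
        cur = np.roll(ref, shift=(3, -5), axis=(0, 1))  # synthetic motion of (3, -5)
        print(phase_correlation_mv(ref, cur))           # -> (3, -5)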

  • Buffer Controller-Based Multiple Processing Element Utilization for Dataflow Synthesis

    Publication Year: 2011, Page(s): 1249 - 1262

    This paper presents an effective design methodology that maps a complex system represented as a dataflow graph onto a reconfigurable target architecture with multi-core processors and programmable logic. To synchronize data transfers between two processing blocks mapped to different processors (or between a block mapped to a processor and one realized as hardware), we propose a mapping methodology that exploits buffer-based dataflow, a new representation technique for realizing data-centric applications on reconfigurable platforms. From the buffer-based dataflow and the estimated execution times of functional blocks and data transfers, the proposed methodology creates a mapped partition and generates the template code that runs on the processors of the target platform. We also use a processor initiation scheme to prevent incorrect operation when actual execution takes longer than estimated. Our proposed mapping methodology and the generated template code are evaluated with a SystemC model and Xilinx ISE.

  • A Hardware Implementation of a Run-Time Scheduler for Reconfigurable Systems

    Publication Year: 2011, Page(s): 1263 - 1276
    Cited by: Papers (8)

    New-generation embedded systems demand high performance, efficiency, and flexibility. Reconfigurable hardware can provide all these features. However, the costly reconfiguration process and the lack of management support have prevented broader use of these resources. To solve these issues, we have developed a scheduler that deals with task graphs at run time, steering their execution on the reconfigurable resources while carrying out both prefetch and replacement techniques that cooperate to hide most of the reconfiguration delays. In our scheduling environment, task graphs are analyzed at design time to extract useful information. This information is used at run time to obtain near-optimal schedules, escaping from local-optimum decisions, while carrying out only simple computations. Moreover, we have developed a hardware implementation of the scheduler that applies all the optimization techniques while introducing a delay of only a few clock cycles. In our experiments, the scheduler clearly outperforms conventional run-time schedulers based on as-soon-as-possible techniques. In addition, our replacement policy, specially designed for reconfigurable systems, achieves almost optimal results regarding both reuse and performance.

  • Discretized Network Flow Techniques for Timing and Wire-Length Driven Incremental Placement With White-Space Satisfaction

    Publication Year: 2011, Page(s): 1277 - 1290
    Cited by: Papers (1)

    We present a novel incremental placement methodology called FlowPlace for significantly reducing critical-path delays of placed standard-cell circuits without appreciable increase in wire length (WL). FlowPlace includes: 1) a timing-driven (TD) analytical global placer TAN that uses accurate pre-route delay functions and minimizes a combination of linear and quadratic objective functions; 2) a discretized network-flow-based detailed placer DFP that has new and effective techniques for performing TD/WL-driven incremental placement while satisfying row-width (white space) constraints; 3) new and accurate unrouted-net delay models that are suitable for an analytical placer; and 4) an effective probability-based WL-cost function in detailed placement for reducing WL deterioration while performing TD incremental placement. We ran FlowPlace on three sets of benchmarks with up to 210K cells. Starting from WL-optimized placements done by Dragon 2.23, and using purely timing-driven incremental placement, we obtain up to 33.4% and an average of 17.3% improvement in circuit delays at an average of 9.0% WL increase. When incorporating both timing and WL costs in the objective functions of global and detailed placement, the average WL increase reduces to 5.8%, a 35% relative reduction, while the average delay improvement is 15.7%, only 9% worse in relative terms. The run time of our incremental placement method is only about 10% of the run time of Dragon 2.23. Furthermore, starting from an already timing-optimized placement done by TD-Dragon, we still obtain up to 10% and an average of 6.5% delay improvement with a 6.1% WL deterioration; the run time is about 6% of TD-Dragon's.
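
    To give a feel for the network-flow view of detailed placement, the toy sketch below casts row assignment with white-space (row capacity) limits as a min-cost flow, using vertical displacement as the edge cost. It uses the networkx min_cost_flow solver and invented cell/row coordinates purely for illustration; it omits timing costs, and plain min-cost flow may fractionally split a cell across rows, which is precisely the issue the paper's discretized formulation addresses.

        import networkx as nx

        def assign_cells_to_rows(cells, rows):
            # cells: {name: (width, y)}, rows: {name: (free width, y)}
            # min-cost flow: source -> cell -> row -> sink, cost = vertical displacement
            G = nx.DiGraph()
            total = sum(w for w, _ in cells.values())
            G.add_node("s", demand=-total)
            G.add_node("t", demand=total)
            for cell, (w, y) in cells.items():
                G.add_edge("s", cell, capacity=w, weight=0)
                for row, (cap, ry) in rows.items():
                    G.add_edge(cell, row, capacity=w, weight=abs(int(y - ry)))
            for row, (cap, _) in rows.items():
                G.add_edge(row, "t", capacity=cap, weight=0)
            flow = nx.min_cost_flow(G)
            # report, for each cell, the row that received most of its width
            return {cell: max(rows, key=lambda r: flow[cell].get(r, 0)) for cell in cells}

        cells = {"a": (4, 10), "b": (3, 12), "c": (3, 30)}     # (width, y-coordinate)
        rows = {"r0": (8, 10), "r1": (8, 30)}                  # (free width, y-coordinate)
        print(assign_cells_to_rows(cells, rows))               # {'a': 'r0', 'b': 'r0', 'c': 'r1'}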

  • A Provably High-Probability White-Space Satisfaction Algorithm With Good Performance for Standard-Cell Detailed Placement

    Publication Year: 2011, Page(s): 1291 - 1304
    Cited by: Papers (2)

    In this paper, we propose an effective white-space (i.e., row length) constraint satisfaction technique embedded in a network-flow-based detailed placer for standard-cell designs that is suitable for both incremental and full detailed placement. The highlight of our method is a provably high probability of obtaining a legal placement even under tight white-space (WS) constraints. This high success probability stems from the flexibility of allowing well-controlled temporary WS-constraint violations in the detailed placement process. The flexibility also helps improve the solution quality of the detailed placer, measured by the deterioration of the optimization metric from the global placement solution. We tested our WS constraint-satisfaction method, controlled temporary violations (CTV), on two sets of benchmarks for both incremental and full placement applications, and for timing as well as wire length (WL) optimization problems. We obtained legal solutions for all circuits in reasonable times under a 3% WS constraint. For example, for the 210 k-cell circuit td-ibm18: 1) for the timing-driven incremental placement application, we obtain the final placement in 900 s with a 35.2% delay reduction compared to an initial WL-optimized placement done by Dragon 2.23, and 2) for the full timing-driven placement problem, we obtain the final placement in less than 2.5 h with a timing improvement of 29.8% compared to the state-of-the-art WL-driven detailed placer of NTUplace3-LE. We also tested two internal methods, one that disallows any temporary WS violation, and another that moves cells from WS-violated rows to non-full rows in a heuristic manner. The first method cannot legalize all benchmarks, and CTV is relatively 41%-86% better in the delay and WL metrics than the two internal methods.

  • A Sub-1 V, 26 μW, Low-Output-Impedance CMOS Bandgap Reference With a Low Dropout or Source Follower Mode

    Publication Year: 2011, Page(s): 1305 - 1309
    Cited by: Papers (7)

    We present a low-power bandgap reference (BGR), functional from sub-1 V to 5 V supply voltages with either a low dropout (LDO) regulator or source follower (SF) output stage, denoted as the LDO or SF mode, in a 0.5-μm standard digital CMOS process with Vtn ≈ 0.6 V and |Vtp| ≈ 0.7 V at 27°C. Both modes operate at sub-1 V under zero load with a power consumption of around 26 μW. At a 1 V (1.1 V) supply, the LDO (SF) mode provides an output current up to 1.1 mA (0.35 mA), a load regulation of ±8.5 mV/mA (±33 mV/mA) with an approximately 10 μs transient, a line regulation of ±4.2 mV/V (±50 μV/V), and a temperature-compensated reference voltage of 0.228 V (0.235 V) with a temperature coefficient around 34 ppm/°C from -20°C to 120°C. At a 1.5 V supply, the LDO (SF) mode can further drive up to 9.6 mA (3.2 mA) before the reference voltage falls to 90% of its nominal value. Such a low-supply-voltage, high-current-drive BGR in a standard digital CMOS process is highly useful in portable and switching applications.
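
    The first-order temperature compensation behind any bandgap reference can be checked with a few lines of arithmetic: a CTAT base-emitter voltage is summed with a scaled PTAT term so that the temperature slopes cancel. The numbers below (V_BE, its slope, the emitter-area ratio) are generic textbook assumptions, not values from this design; sub-1 V references such as this one then scale the compensated sum down with a resistor ratio in a current-mode topology.

        import math

        K_OVER_Q = 8.617e-5          # V/K, Boltzmann constant over electron charge
        DVBE_DT  = -2.0e-3           # V/K, typical CTAT slope of V_BE (assumed)
        V_BE     = 0.60              # V at 300 K (assumed)
        N        = 8                 # emitter-area ratio of the PTAT pair (assumed)

        # choose the PTAT gain m so that dVref/dT = DVBE_DT + m*ln(N)*K_OVER_Q = 0
        m = -DVBE_DT / (K_OVER_Q * math.log(N))
        T = 300.0
        V_ref = V_BE + m * K_OVER_Q * T * math.log(N)
        print(f"PTAT gain m = {m:.2f}, first-order V_ref = {V_ref:.3f} V")   # ~1.2 V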

  • A 140 Mb/s to 1.96 Gb/s Referenceless Transceiver With 7.2 μs Frequency Acquisition Time

    Publication Year: 2011, Page(s): 1310 - 1315
    Cited by: Papers (2)

    This paper presents the design of a wide-range transceiver that requires no external reference clock. A self-biased, multi-band PLL with a self-initialization technique is used to achieve the wide operating range of 140 Mb/s to 1.96 Gb/s and the fast frequency acquisition time of 7.2 μs. A linear phase detector with no dead-zone problem is proposed for phase adjustment with low-jitter performance. The RMS jitter of the recovered clock is 11.4 ps at 70 MHz operation. The overall transceiver consumes 388 mW at a 2.5 V supply and occupies 3.41 mm² in a 0.25-μm 1P5M CMOS technology.

  • Design and Performance Evaluation of Radiation Hardened Latches for Nanoscale CMOS

    Publication Year: 2011, Page(s): 1315 - 1319
    Cited by: Papers (11)

    Deep sub-micrometer/nano CMOS circuits are more sensitive to externally induced radiation phenomena that are likely to cause so-called soft errors. Therefore, tolerance to soft errors is a strict requirement in nanoscale circuit design. Since traditional error-tolerance methods result in significant cost penalties in terms of power, area, and performance, the development of low-cost hardened designs for storage cells (such as latches and memories) is of increasing importance. This paper proposes three new hardened CMOS latch designs at a 32 nm feature size; these circuits are Schmitt-trigger (ST) based, with the third additionally utilizing a cascode configuration in the feedback loop. The Cascode ST latch has 112% higher critical charge than the conventional reference latch with only a 10% area increase. A novel design metric (QPAR) for latches is introduced to assess overall design effectiveness across area, performance, power, and soft-error tolerance. By this metric, the proposed Cascode ST latch achieves up to a 36% improvement in QPAR compared with existing hardened designs. Monte Carlo analysis has confirmed the robustness of the proposed hardened latches to process, voltage, and temperature (PVT) variations.

  • Aggressive Runtime Leakage Control Through Adaptive Light-Weight Vth Hopping With Temperature and Process Variation

    Publication Year: 2011, Page(s): 1319 - 1323

    The increasing leakage power consumption and stringent thermal constraints necessitate more aggressive leakage-control techniques. Power gating and body biasing are widely used for standby leakage control. The large energy overhead of their mode transitions is the major obstacle to more aggressive leakage control. Temperature and process variation (TV/PV) further magnify the overhead problem, leading to the so-called "corner-case leakage control" problem. Light-weight Vth hopping (LW-VH) is a candidate technique for tackling the energy-overhead problem. This paper demonstrates the application of LW-VH to microarchitectural- and RTL-level idleness exploitation, with adaptive control techniques for TV/PV compensation. Adaptive LW-VH shows a 30% average saving in total CPU leakage at the microarchitectural level, and 4% to 15% leakage saving at the RTL level. By combining all the techniques proposed in this paper, a three-tier aggressive leakage control system is introduced to fully exploit idleness at all levels.

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Publication Year: 2011, Page(s): 1324
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Publication Year: 2011, Page(s): C3

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu