
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Issue 3 • March 2010


  • Table of contents

    Publication Year: 2010 , Page(s): C1 - C4
    PDF (44 KB)
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Publication Year: 2010 , Page(s): C2
    PDF (40 KB)
    Freely Available from IEEE
  • Editorial: New Associate Editor Appointments

    Publication Year: 2010 , Page(s): 345 - 346
    PDF (496 KB)
    Freely Available from IEEE
  • Dual Supply Voltages and Dual Clock Frequencies for Lower Clock Power and Suppressed Temperature-Gradient-Induced Clock Skew

    Publication Year: 2010 , Page(s): 347 - 355
    Cited by:  Papers (9)  |  Patents (1)
    PDF (856 KB) | HTML

    Two new clocking methodologies based on supply voltage and frequency scaling are proposed in this paper for lowering the power consumption and the temperature-fluctuation-induced skew without degrading the clock frequency. With the first methodology, the clock signal is distributed globally at a scaled supply voltage with a single clock frequency. With the second methodology, dual supply voltages and dual signal frequencies are employed, providing enhanced power savings. The optimum supply voltage that minimizes clock skew is 44% lower than the nominal supply voltage in a 0.18-μm TSMC CMOS technology. Novel multi-threshold-voltage level converters and frequency multipliers are employed at the leaves of the clock trees in order to maintain synchronous system performance. The temperature-fluctuation-induced skew and the power consumption are reduced by up to 80% and 76%, respectively, with the proposed dual-supply-voltage and dual-frequency clock distribution networks as compared to a standard clock tree operating at the nominal supply voltage with a single clock frequency.
  • Reducing SRAM Power Using Fine-Grained Wordline Pulsewidth Control

    Publication Year: 2010 , Page(s): 356 - 364
    Cited by:  Papers (3)
    PDF (628 KB) | HTML

    Embedded SRAM dominates modern SoCs, and there is strong demand for SRAM with lower power consumption that still achieves high performance and high density. However, the large increase in process variations in advanced CMOS technologies is considered one of the biggest challenges for SRAM designers. In the presence of large process variations, SRAMs are expected to consume more power to ensure correct read operations and meet yield targets. In this paper, we propose a new architecture that significantly reduces the array switching power for SRAM. The proposed architecture combines built-in self-test and digitally controlled delay elements to reduce the wordline pulsewidth while ensuring correct read operations, hence reducing the switching power. Monte Carlo simulations using a 1-Mb SRAM macro in an industrial 45-nm technology are used to verify the power saving of the proposed architecture. For a 48-Mb memory density, a 27% reduction in array switching power can be achieved for a read access yield target of 95%. In addition, the proposed system provides larger power savings as process variations increase, which makes it an attractive solution for 45-nm-and-below technologies.
  • On the Latency and Energy of Checkpointed Superscalar Register Alias Tables

    Publication Year: 2010 , Page(s): 365 - 377
    Cited by:  Papers (6)
    PDF (1697 KB) | HTML

    This paper investigates how the latency and energy of register alias tables (RATs) vary as a function of the number of global checkpoints (GCs), processor issue width, and window size. It improves upon previous RAT checkpointing work that ignored the actual latency and energy tradeoffs and focused solely on evaluating performance in terms of instructions per cycle (IPC). This work utilizes measurements from full-custom checkpointed RAT implementations developed in a commercial 130-nm fabrication technology. Using physical- and architectural-level evaluations together, this paper demonstrates the tradeoffs among the aggressiveness of RAT checkpointing, performance, and energy. It also shows that, as expected, focusing on IPC alone incorrectly predicts performance. The results of this study justify checkpointing techniques that use very few GCs (e.g., four). Additionally, based on the full-custom implementations of the checkpointed RATs, this paper presents analytical latency and energy models. These models can be useful in the early stages of architectural exploration, where actual physical implementations are unavailable or hard to develop. For a variety of RAT organizations, our model estimations are within 6.4% and 11.6% of circuit simulation results for latency and energy, respectively. This range of accuracy is acceptable for architectural-level studies.
  • Crosstalk-Induced Delay, Noise, and Interconnect Planarization Implications of Fill Metal in Nanoscale Process Technology

    Publication Year: 2010 , Page(s): 378 - 391
    Cited by:  Papers (6)
    PDF (2774 KB) | HTML

    In this paper, we investigate the crosstalk-induced delay, noise, and chemical mechanical polishing (CMP)-induced thickness-variation implications of dummy fill generated using rule-based wire track fill techniques and CMP-aware model-based methods for designs implemented in 65-nm process technology. The results indicate that fill generated using rule-based and CMP-aware model-based methods can have a significant impact on parasitic capacitance, interconnect planarization, and individual path delay variation. Crosstalk-induced delay and noise are significantly reduced in the grounded-fill cases, and designs with floating fill also experience a reduction in average crosstalk-induced delay and noise, in contrast to the predictions of previous studies on small-scale interconnect structures. When crosstalk effects are included in the analysis, the observed delay behavior differs significantly from the delay modeled without crosstalk effects. Consequently, crosstalk-induced delay and noise must be considered alongside parasitic capacitance and interconnect planarization when developing future fill generation methods.
  • DFT and Minimum Leakage Pattern Generation for Static Power Reduction During Test and Burn-In

    Publication Year: 2010 , Page(s): 392 - 400
    PDF (672 KB) | HTML

    This paper presents a design-for-testability and minimum leakage pattern generation technique to reduce static power during test and burn-in for nanometer technologies. The technique transforms the minimum leakage pattern generation problem into a pseudo-Boolean optimization (PBO) problem. Nonlinear objective functions of leakage power are approximated by linear ones so that the problem can be solved efficiently by an existing PBO solver. A partitioning-based algorithm is applied for control point insertion as well as CPU time reduction. Experimental results on the IEEE ISCAS'89 benchmark circuits using Taiwan Semiconductor Manufacturing Company 90-nm technology show that, for large circuits, the static power reduction improves from 8.3% (without partitioning) to 17.47% (with 64 partitions). In addition, the overall CPU time drops from 3600 s (without partitioning) to 83 s (with 64 partitions). This technique reduces static power without changing the manufacturing process or library cells.
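As a toy illustration of the minimum leakage pattern generation idea in the abstract above, the sketch below exhaustively searches the input space of a two-gate NAND netlist for the vector with the lowest state-dependent leakage. The netlist and the per-state leakage values are invented for illustration; a real flow characterizes them from the cell library and hands the problem to a PBO solver rather than enumerating.

```python
from itertools import product

# Hypothetical state-dependent leakage (nA) of a 2-input NAND gate;
# real numbers come from library characterization, not from the paper.
NAND_LEAKAGE = {(0, 0): 10, (0, 1): 25, (1, 0): 20, (1, 1): 45}

def nand(a, b):
    return 1 - (a & b)

def total_leakage(vec):
    """Leakage of a toy 2-gate netlist: g1 = NAND(a, b); g2 = NAND(g1, c)."""
    a, b, c = vec
    g1 = nand(a, b)
    return NAND_LEAKAGE[(a, b)] + NAND_LEAKAGE[(g1, c)]

# Exhaustive enumeration stands in for the PBO solver on this tiny example.
best = min(product((0, 1), repeat=3), key=total_leakage)
print(best, total_leakage(best))  # (0, 0, 0) 30
```

On real circuit sizes the input space is far too large to enumerate, which is why the paper's linear approximation of the leakage objective and an off-the-shelf PBO solver matter.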
  • Path Selection for Transition Path Delay Faults

    Publication Year: 2010 , Page(s): 401 - 409
    Cited by:  Papers (4)
    PDF (460 KB) | HTML

    We propose a path selection criterion to improve the coverage of small delay defects. Under this criterion, every line in the circuit is covered by one of the longest testable paths or subpaths that go through it. Earlier criteria that considered only complete paths (from inputs to outputs) did not use longest testable subpaths, which may be longer than the longest complete testable paths. Earlier criteria that considered subpaths considered only subpaths of longest paths. We apply the proposed criterion to a delay fault model called the transition path delay fault model, which was introduced to capture both small and large delay defects. We present experimental results to demonstrate that the consideration of subpaths improves circuit coverage relative to the case where only complete paths are allowed.
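To make the per-line coverage idea in the abstract above concrete, here is a minimal sketch that computes, for every line of a small combinational DAG, the length of the longest structural path through it (longest arrival delay plus longest departure delay). The netlist and delays are invented, and structural length stands in for the paper's testability analysis.

```python
# Invented netlist: edges map each line to (successor, delay) pairs;
# a topological order of the lines is assumed to be given.
edges = {'a': [('c', 2)], 'b': [('c', 1), ('d', 3)],
         'c': [('e', 2)], 'd': [('e', 2)], 'e': []}
topo = ['a', 'b', 'c', 'd', 'e']

# Longest delay from any primary input to each line.
arrive = {n: 0 for n in topo}
for n in topo:
    for m, d in edges[n]:
        arrive[m] = max(arrive[m], arrive[n] + d)

# Longest delay from each line to any primary output.
depart = {n: 0 for n in topo}
for n in reversed(topo):
    for m, d in edges[n]:
        depart[n] = max(depart[n], d + depart[m])

# Longest path through each line. Note that the longest path through 'c'
# (length 4) is shorter than the circuit's longest complete path (length 5
# via b-d-e): covering 'c' requires selecting a path other than the global
# longest one, which is the motivation for per-line selection.
through = {n: arrive[n] + depart[n] for n in topo}
print(through)  # {'a': 4, 'b': 5, 'c': 4, 'd': 5, 'e': 5}
```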
  • Design of Asynchronous Circuits for High Soft Error Tolerance in Deep Submicrometer CMOS Circuits

    Publication Year: 2010 , Page(s): 410 - 422
    Cited by:  Papers (11)
    PDF (1179 KB) | HTML

    As devices scale down, combinational logic becomes increasingly susceptible to soft errors. Conventional soft-error tolerant methods for combinational logic do not provide sufficiently high tolerance at a reasonably small performance penalty. This paper investigates the feasibility of designing quasi-delay-insensitive (QDI) asynchronous circuits for high soft error tolerance. We analyze the behavior of null convention logic (NCL) circuits in the presence of particle strikes, and propose an asynchronous pipeline for soft-error correction along with a novel technique that improves the robustness of threshold gates, the basic components of NCL, against particle strikes by using a Schmitt trigger circuit and resizing the feedback transistor. Experimental results show that the proposed threshold gates do not generate soft errors under the strike of a particle within a certain energy range if a proper transistor size is applied. The penalties, such as delay and power consumption, are also presented.
  • FPGA Design for Timing Yield Under Process Variations

    Publication Year: 2010 , Page(s): 423 - 435
    Cited by:  Papers (5)
    PDF (569 KB) | HTML

    Yield loss due to timing failures results in diminished returns for field-programmable gate arrays (FPGAs), and is aggravated under the increased process variations of scaled technologies. The critical delay of a circuit is uncertain under process variations because the delay of each logic element is no longer deterministic. Traditionally, FPGAs have managed process variations through speed binning, which works well for inter-die variations but not for intra-die variations, resulting in reduced timing yield. FPGAs present a unique challenge because of their programmability and unknown end-user applications. In this paper, a novel architecture and computer-aided design co-design technique is proposed to improve timing yield. Experimental results indicate that the proposed design technique can achieve timing yield improvements of up to 68%.
  • Exploration of Heterogeneous FPGAs for Mapping Linear Projection Designs

    Publication Year: 2010 , Page(s): 436 - 449
    Cited by:  Papers (2)
    PDF (863 KB) | HTML

    Many applications require a reduction of the amount of the original data or a representation of the original data by a small set of variables. Among many techniques, linear projection is often chosen due to its computational attractiveness and good performance. For applications that require real-time performance and flexibility to accommodate new data, linear projection is implemented in field-programmable gate arrays (FPGAs) due to their fine-grain parallelism and reconfigurability. Currently, the optimization of such a design is treated as a problem separate from the basis calculation, leading to suboptimal solutions. In this paper, we propose a novel approach that couples the calculation of the linear projection basis, the area optimization problem, and the heterogeneity exploration of modern FPGAs. The power of the proposed framework lies in its flexibility to insert information regarding the implementation requirements of the linear basis by assigning a proper prior distribution to the basis matrix. Results from real-life examples on modern FPGA devices demonstrate the effectiveness of our approach, where up to 48% reduction in the required area is achieved compared to the current approach, without any loss in the accuracy or throughput of the design.
  • Interstratum Connection Design Considerations for Cost-Effective 3-D System Integration

    Publication Year: 2010 , Page(s): 450 - 460
    Cited by:  Papers (4)
    PDF (1647 KB) | HTML

    Emerging 3-D multistrata system integration offers the capability for high-density interstratum interconnects with short lengths and low parasitics. However, 3-D integration is only one way to accomplish system integration, and it must compete against established options such as system-on-a-chip (SoC) and system-in-a-package. We discuss multiple tradeoffs that need to be carefully considered when choosing 3-D integration over other integration schemes. The first step toward enabling 3-D design is characterizing the new interstratum connection elements, microconnects and through-Si vias, in a bonded 3-D technology. We have used both analytical and simulation-based approaches to analyze the parasitic characteristics of interstratum connections between bonded 3-D strata, and have compared the interstratum power and performance with SoC global interconnects, taking into account the impact of technology scaling. The specific elements in an interstratum connection and their electrical properties depend strongly on the choice of 3-D integration architecture, such as face-to-face, back-to-face, or the presence of a redistribution layer for bonding. We present an adaptive interstratum IO circuit technique to drive various types of interstratum connections and thus enable 3-D die reuse across multiple 3-D chips. The 3-D die/intellectual-property reuse concept with the adaptive interstratum IO design can be applied to design 3-D-ready dice to amortize the additional 3-D costs associated with strata design, test, and the bonding process.
  • Design of a CMOS Broadband Transimpedance Amplifier With Active Feedback

    Publication Year: 2010 , Page(s): 461 - 472
    Cited by:  Papers (18)
    PDF (863 KB) | HTML

    In this paper, a novel current-mode transimpedance amplifier (TIA) exploiting a common-gate input stage with common-source active feedback has been realized in CHRT 0.18-μm 1.8-V RFCMOS technology. The proposed active feedback TIA input stage achieves a low input impedance similar to that of the well-known regulated cascode (RGC) topology. The proposed TIA also employs series inductive peaking and capacitive degeneration techniques to enhance the bandwidth and the gain. The measured transimpedance gain is 54.6 dBΩ with a -3 dB bandwidth of about 7 GHz for a total input parasitic capacitance of 0.3 pF. The measured average input-referred noise current spectral density is about 17.5 pA/√Hz up to 7 GHz. The measured group delay is within 65 ± 10 ps over the bandwidth of interest. The chip consumes 18.6 mW of DC power from a single 1.8-V supply. The mathematical analysis of the proposed TIA is presented together with a detailed noise analysis based on the van der Ziel MOSFET noise model. The effect of induced gate noise in a broadband TIA is included.
  • A Low-Cost VLSI Implementation for Efficient Removal of Impulse Noise

    Publication Year: 2010 , Page(s): 473 - 481
    Cited by:  Papers (8)
    PDF (3266 KB) | HTML

    Image and video signals can be corrupted by impulse noise during signal acquisition and transmission. In this paper, an efficient VLSI implementation for removing impulse noise is presented. Our extensive experimental results show that the proposed technique preserves edge features and achieves excellent performance in terms of quantitative evaluation and visual quality. The design requires only low computational complexity and two line memory buffers, so its hardware cost is quite low. Compared with previous VLSI implementations, our design achieves better image quality with less hardware cost. Synthesis results show that the proposed design yields a processing rate of about 167 Msamples/s using TSMC 0.18-μm technology.
  • Design of High-Throughput Fully Parallel LDPC Decoders Based on Wire Partitioning

    Publication Year: 2010 , Page(s): 482 - 489
    Cited by:  Papers (15)  |  Patents (1)
    PDF (915 KB) | HTML

    We present a method to design high-throughput fully parallel low-density parity-check (LDPC) decoders. With our method, a decoder's longest wires are divided into several short wires with pipeline registers. Log-likelihood ratio messages transmitted along these pipelined paths are thus sent over multiple clock cycles, and the decoder's critical path delay can be reduced while maintaining comparable bit error rate performance. The number of registers inserted into each path is estimated from wiring information extracted from the initial placement and routing of a conventional LDPC decoder, so only the necessary registers are inserted. Also, by inserting an even number of registers into the longer wires, two different codewords can be decoded simultaneously, which improves the throughput at a small penalty in area. We present our design flow as well as post-layout simulation results for several versions of a length-1024, (3,6)-regular LDPC code. Using our technique, we achieve a maximum uncoded throughput of 13.21 Gb/s with an energy consumption of 0.098 nJ per uncoded bit at Eb/N0 = 5 dB. This represents a 28% increase in throughput, a 30% decrease in energy per bit, and a 1.6% increase in core area with respect to a conventional parallel LDPC decoder in a 90-nm CMOS technology.
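The register-insertion arithmetic implied by the abstract above can be sketched as follows. The rule of rounding up to an even register count (so two codewords can occupy alternate stages) follows the abstract; the uniform-delay model and the example numbers are assumptions, since the real flow uses wiring data extracted from placement and routing.

```python
import math

def pipeline_registers(wire_delay_ns, clock_period_ns, interleave=True):
    """Registers needed so no wire segment exceeds one clock period.

    With interleave=True the count is rounded up to an even number,
    letting two codewords occupy alternate pipeline stages.
    """
    regs = max(0, math.ceil(wire_delay_ns / clock_period_ns) - 1)
    if interleave and regs % 2 == 1:
        regs += 1
    return regs

print(pipeline_registers(3.2, 1.0))  # 3 registers needed, rounded up to 4
```

A short wire that already fits in one period gets no registers, so only the genuinely long wires pay the latency cost.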
  • Compressive Acquisition CMOS Image Sensor: From the Algorithm to Hardware Implementation

    Publication Year: 2010 , Page(s): 490 - 500
    Cited by:  Papers (9)
    PDF (4417 KB) | HTML

    In this paper, a new design paradigm referred to as compressive acquisition CMOS image sensors is introduced. The idea is to compress the data within each pixel prior to storage, thereby reducing the size of the memory required for a digital pixel sensor. The proposed compression algorithm uses a block-based differential coding scheme in which differential values are captured and quantized online. A time-domain encoding scheme is used in our CMOS image sensor, in which the brightest pixel within each block fires first and is selected as the reference pixel. The differential values between subsequent pixels and the reference within each block are calculated and quantized using a reduced number of bits, as their dynamic range is compressed. The scheme limits error accumulation because full precision is used at the start of each block, while also reducing the memory requirement and hence enabling significant silicon area savings. A mathematical model is derived to analyze the performance of the algorithm. Experimental results on a field-programmable gate-array (FPGA) platform illustrate that the proposed algorithm enables more than 50% memory saving at a peak signal-to-noise ratio of 30 dB with 1.5 bits per pixel.
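A software sketch of the block-based differential coding described above, assuming 8-bit pixels, a 4-bit differential word, and the brightest pixel as the full-precision reference. The bit widths and the clamping policy are illustrative assumptions, not the sensor's exact design.

```python
def compress_block(block, diff_bits=4):
    """Store the brightest pixel at full precision and every pixel as a
    clamped difference from it (the reference's own difference is 0)."""
    ref = max(block)                      # brightest pixel fires first
    max_diff = (1 << diff_bits) - 1
    diffs = [min(ref - p, max_diff) for p in block]
    return ref, diffs

def decompress_block(ref, diffs):
    return [ref - d for d in diffs]

block = [200, 198, 196, 190]
ref, diffs = compress_block(block)
assert decompress_block(ref, diffs) == block  # exact within the clamp range

# Toy memory figure: one 8-bit reference plus 4-bit differences for the
# remaining pixels, versus 8 bits for every pixel in a conventional
# digital pixel sensor.
bits = 8 + 4 * (len(block) - 1)
print(bits, 8 * len(block))  # 20 32
```

When a difference exceeds the clamp range the reconstruction is lossy, which is the dynamic-range-compression tradeoff the abstract's PSNR figure quantifies.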
  • Dynamic and Leakage Energy Minimization With Soft Real-Time Loop Scheduling and Voltage Assignment

    Publication Year: 2010 , Page(s): 501 - 504
    Cited by:  Papers (12)
    PDF (272 KB) | HTML

    As technology feature sizes shrink, the share of leakage in the total power consumption of digital systems continues to grow. Traditional dynamic voltage scaling (DVS) fails to accurately address the impact of scaling on system power consumption as leakage power increases exponentially. The combination of DVS and adaptive body biasing (ABB) is an effective technique to jointly optimize dynamic and leakage energy dissipation. In this paper, we propose an optimal soft real-time loop scheduling and voltage assignment algorithm that minimizes both dynamic and leakage energy via DVS and ABB. Voltage transition overhead is considered in our approach. We conduct simulations on a set of digital signal processor benchmarks based on the power model of a 70-nm technology. The simulation results show that our approach achieves significant energy savings compared to the integer linear programming approach.
  • Supply Switching With Ground Collapse for Low-Leakage Register Files in 65-nm CMOS

    Publication Year: 2010 , Page(s): 505 - 509
    Cited by:  Papers (4)
    PDF (925 KB) | HTML

    Power-gating has been widely used to reduce subthreshold leakage current. However, the leakage saving from power-gating diminishes with technology scaling due to the gate leakage of data-retention circuit elements. Furthermore, power-gating incurs a substantial increase in area and wirelength. A circuit technique called supply switching with ground collapse (SSGC) has recently been proposed to overcome this limitation of power-gating. The technique is successfully applied to the register file of an ARM9 microprocessor in a 1.2-V, 65-nm CMOS process, and the measured result is reported for the first time. The leakage current is reduced by a factor of 960 on average over 83 dies at 25 °C, and by a factor of 150 at 85 °C. Compared to a register file implemented with conventional power-gating, leakage current is cut by a factor of 2.2, demonstrating that SSGC can be a substitute for power-gating in nanometer CMOS.
  • A 5-bit 3.2-GS/s Flash ADC With a Digital Offset Calibration Scheme

    Publication Year: 2010 , Page(s): 509 - 513
    Cited by:  Papers (4)
    PDF (418 KB) | HTML

    In high-speed Flash analog-to-digital converters (ADCs), preamplifiers are often placed in front of the comparators to reduce metastability errors and enhance comparison speed. The accuracy of a Flash ADC is mainly limited by the random offsets of the preamplifiers and comparators. This paper presents a 5-b Flash ADC with a digital random offset calibration scheme. For calibration, programmable resistive devices are used as the loads of the second-stage preamplifiers. By adjusting the calibration resistors, the input-referred offset voltage of each comparator is reduced to less than 1/2 LSB. Fabricated in a 0.13-μm CMOS process, the ADC consumes 120 mW from a 1.2-V supply and occupies a 0.18-mm² active area. After calibration, the peak differential nonlinearity (DNL) and integral nonlinearity (INL) are 0.24 and 0.39 LSB, respectively. At 3.2-GS/s operation, the effective number of bits is 4.54 b, and the effective resolution bandwidth is 600 MHz. This ADC achieves figures of merit of 3.07 and 4.30 pJ/conversion-step at 2 and 3.2 GS/s, respectively.
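The calibration loop described above can be illustrated behaviorally. The step size, code range, and LSB value below are invented numbers, and a simple linear search stands in for whatever search the chip's calibration logic actually uses; only the stopping criterion (residual offset below LSB/2) follows the abstract.

```python
def calibrate(offset_mv, lsb_mv=37.5, step_mv=5.0, max_steps=32):
    """Walk a calibration code until the residual comparator offset falls
    below LSB/2, mimicking adjustment of the programmable load resistors."""
    code = 0
    for _ in range(max_steps):
        residual = offset_mv - code * step_mv
        if abs(residual) < lsb_mv / 2:
            break
        code += 1 if residual > 0 else -1
    return code, offset_mv - code * step_mv

code, residual = calibrate(22.0)
print(code, residual)  # 1 17.0
```

The loop converges as long as the step size is smaller than one LSB, since each move then overshoots by less than the stopping threshold.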
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Publication Year: 2010 , Page(s): 514
    PDF (28 KB)
    Freely Available from IEEE
  • Access over 1 million articles - The IEEE Digital Library [advertisement]

    Publication Year: 2010 , Page(s): 515
    PDF (370 KB)
    Freely Available from IEEE
  • IEEE Foundation [advertisement]

    Publication Year: 2010 , Page(s): 516
    PDF (320 KB)
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Publication Year: 2010 , Page(s): C3
    PDF (27 KB)
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Full Aims & Scope

Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu