Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 12 • Date Dec. 2010

  • Table of contents

    Publication Year: 2010 , Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Publication Year: 2010 , Page(s): C2
    Freely Available from IEEE
  • Properties of Digital Switching Currents in Fully CMOS Combinational Logic

    Publication Year: 2010 , Page(s): 1625 - 1638
    Cited by:  Papers (3)

    In this paper, we present a model to derive statistical properties of digital noise due to logic transitions of gates in a fully CMOS combinational circuit. Switching activity of logic gates in a digital system is a deterministic process, depending on both circuit parameters and input signals. However, the huge number of logic blocks in a complex IC makes digital switching a cognitively stochastic process. For a combinational logic network, we can model digital switching currents as stationary shot noise processes, deriving both their amplitude distributions and their power spectral densities. From the spectra of digital currents, we can also calculate the spectral components and the rms value of disturbances injected into the on-chip power supply lines. The stochastic model for switching currents has been validated by comparing theoretical results with circuit simulations.

  • An Effective Gated Clock Tree Design Based on Activity and Register Aware Placement

    Publication Year: 2010 , Page(s): 1639 - 1648
    Cited by:  Papers (4)

    Clock gating is one of the most effective techniques to reduce clock tree power. Although it has been studied extensively, most previous works are restricted to either the register transfer level (RTL) or the clock tree synthesis stage. Clock gating design at RTL is coarse and pays no attention to physical information; therefore, it often results in large wirelength overhead. Conversely, if clock gating is considered only at clock tree synthesis, the optimization space is severely limited because the registers are already fixed. To fully use the logical and physical information between registers, we propose a new flow for low-power gated clock tree design in this work. It mainly includes three parts: gated clock tree aware register placement, gated clock tree construction, and incremental placement. Compared with previous works on clock gating, our algorithm reduces the clock tree power with far less gating logic; therefore, the overhead to the placement is also reduced.

  • TABS: Temperature-Aware Layout-Driven Behavioral Synthesis

    Publication Year: 2010 , Page(s): 1649 - 1659

    With rising power densities in modern VLSI circuits, thermal effects are becoming important in the design of ICs. Elevated chip temperatures have an adverse impact on performance, reliability, power consumption, and cooling costs. To ensure adequate thermal management, all phases of the design flow must account for thermal effects on their design decisions. We present a two-stage simulated annealing-based high-level synthesis technique that combines power minimization with temperature-aware scheduling, binding, and floorplanning. In our technique, the first stage of the simulated annealing algorithm creates a low-power solution, which is then iteratively improved by the second stage to minimize estimated on-chip peak temperature using accurate module-level temperature estimation. We show that minimizing average power alone does not guarantee minimal peak temperatures. However, our approach consistently finds solutions that have lower on-chip peak temperatures and uniform on-chip temperature distributions, compared to a traditional low-power synthesis methodology that minimizes average power. Experiments show that our method reduces peak temperatures on average by 12% and up to 16%, compared to a traditional low-power synthesis algorithm that minimizes average power. These improvements in chip-level temperature distributions are achieved with a modest increase in chip area of under 15% on average.

  • SRAM Leakage Reduction by Row/Column Redundancy Under Random Within-Die Delay Variation

    Publication Year: 2010 , Page(s): 1660 - 1671
    Cited by:  Papers (3)

    The share of leakage in the total power consumption of static RAM (SRAM) memories is increasing with technology scaling. Reverse body biasing increases the threshold voltage (Vth), which exponentially reduces subthreshold leakage, but it also increases SRAM access delay. Whereas traditionally all cells of an SRAM block had almost the same delay, within-die variations are increasingly widening the delay distribution of cells even within a single SRAM block, and hence, most of these cells are substantially faster than the delay set for the entire block. Consequently, after the reverse body biasing and the resulting delay rise, only a small number of cells violate the original delay of the SRAM block; we propose to replace them with a sufficient number of spare rows/columns of SRAM. Our experiments show that the leakage can be reduced by up to 40% in a 90-nm predictive technology by adding fewer than ten spare columns to an 8-kB SRAM array for a negligible penalty in delay, dynamic power, and area in the presence of 3% uncorrelated random delay variation.

  • A Low Overhead High Test Compression Technique Using Pattern Clustering With n-Detection Test Support

    Publication Year: 2010 , Page(s): 1672 - 1685
    Cited by:  Papers (1)

    This paper presents a test data compression scheme that can be used to further improve the compression achieved by linear-feedback shift register (LFSR) reseeding. The proposed compression technique can be implemented with very low hardware overhead. The test data to be stored in the automatic test equipment (ATE) memory are much smaller than those for previously published schemes, and the number of test patterns that need to be generated is smaller than for other weighted random pattern testing schemes. The proposed technique can be extended to generate test patterns that achieve high n-detection fault coverage. This technique compresses a regular 1-detection test cube set instead of an n-detection test cube set, which is typically n times larger. Hence, the volume of compressed test data for n-detection test is comparable to that for 1-detection test. Experimental results on a large industry design show that over 1600X compression is achievable by the proposed scheme with a test sequence length comparable to that of highly compacted deterministic patterns. Experimental results on n-detection test show that test patterns generated by the proposed decompressor can achieve very high 5-detection stuck-at fault coverage and high compression for large benchmark circuits.

  • Reconfigurable ECO Cells for Timing Closure and IR Drop Minimization

    Publication Year: 2010 , Page(s): 1686 - 1695
    Cited by:  Papers (4)

    Unused spare cells occur inevitably in the traditional engineering change order (ECO) design flow. This results in inefficient area usage, more leakage, and a greater IR drop impact. To tackle these problems, this paper proposes a reconfigurable cell that serves the dual purposes of decoupling capacitor and spare cell. Before ECO is applied, these cells are preplaced as decoupling capacitors. When ECO is applied, these cells are configured as functional cells. To demonstrate the efficiency of our configurable cell, we propose an algorithm for timing closure and IR drop minimization. Compared with the traditional ECO flow, our method shows 15% reduction in maximum IR drop and 9% reduction in leakage before applying ECO, and 7% reduction in maximum IR drop after applying ECO, with 10% area of spare cells. In addition, we show that fewer unsolved timing-violation paths remain after applying our ECO timing optimization flow, owing to less IR drop and free selection of ECO gate type.

  • Enhancing the Area Efficiency of FPGAs With Hard Circuits Using Shadow Clusters

    Publication Year: 2010 , Page(s): 1696 - 1709
    Cited by:  Papers (1)

    There is a dramatic logic density gap between field-programmable gate arrays (FPGAs) and application-specific integrated circuits, and this gap is the main reason FPGAs are not cost-effective in high-volume applications. Modern FPGAs narrow this gap by including “hard” circuits such as memories and multipliers, which are very efficient when they are used. However, if these hard circuits are not used, they are wasted (including the very expensive programmable routing that surrounds the logic), and have a negative impact on logic density. In this paper, we present an architectural concept, called shadow clusters, which seeks to mitigate this loss. A shadow cluster is a standard FPGA logic “cluster” (typically consisting of a group of lookup tables and flip-flops) that is placed “behind” every hard circuit, and can programmably, through simple, small multiplexers, replace the hard circuit in the event it is not needed. A shadow cluster is effective because the largest area cost, by far, in an FPGA is for the programmable routing that connects the logic. The shadow cluster area cost is small, and yet it enables more consistent employment of the programmable routing across applications with varying demand for hard circuits. We introduce new terminology to describe the economics of hard circuits on FPGAs, and provide a scientific way to measure the area effectiveness. We measure the area efficiency of FPGAs with and without shadow clusters, and show that a modern commercial architecture (with a fixed ratio of multipliers to soft logic) would gain 4.7% in area efficiency by employing shadow clusters. Indeed, no architecture we studied under “reasonable” conditions showed a loss of area efficiency. Furthermore, we show that the most area-efficient architecture that employs the shadow cluster concept is 12.5% better than the most area-efficient architecture without shadow clusters.

  • Design Paradigm for Robust Spin-Torque Transfer Magnetic RAM (STT MRAM) From Circuit/Architecture Perspective

    Publication Year: 2010 , Page(s): 1710 - 1723
    Cited by:  Papers (19)  |  Patents (1)

    Spin-torque transfer magnetic RAM (STT MRAM) is a promising candidate for future embedded applications. It combines the desirable attributes of current memory technologies such as SRAM, DRAM, and flash memories (fast access time, low cost, high density, and non-volatility). It also solves the critical drawbacks of conventional MRAM technology: poor scalability and high write current. However, variations in process parameters can cause a large number of cells to fail, severely affecting the yield of the memory array. In this paper, we analyzed and modeled the failure probabilities of STT MRAM cells due to parameter variations. Based on the model, we performed a thorough analysis of the impact of design parameters on parametric failures due to process variations. To achieve high memory yield without incurring expensive technology modification, we developed an efficient design paradigm from the circuit and/or architecture perspective to improve the robustness and integration density. The proposed technique effectively relaxes or completely decouples the conflicting design requirements for read stability, writability, and cell area. It can be used at an early stage of the design cycle for yield enhancement.

  • Design Margin Exploration of Spin-Transfer Torque RAM (STT-RAM) in Scaled Technologies

    Publication Year: 2010 , Page(s): 1724 - 1734
    Cited by:  Papers (14)

    We propose a magnetic and electric level spin-transfer torque random access memory (STT-RAM) cell model to simulate the write operation of an STT-RAM. The model of a magnetic tunneling junction (MTJ) is modified to take into account the electrical response of the MOS transistor that is connected to the MTJ. A dynamic design flow is also proposed to minimize any unnecessary design margin in an STT-RAM cell design by leveraging the new STT-RAM cell model. The design of an STT-RAM cell with a one-transistor-one-MTJ (1T1J) structure shows that our technique can reduce the STT-RAM cell area by more than 22%, compared with a conventional STT-RAM cell model at a TSMC 90-nm technology node. The performance and the reliability of the memory cell were unaffected. By using our model, we analyzed the scalability of STT-RAM technology down to a 22-nm bulk CMOS technology node. The tradeoffs among the MTJ switching current, the thermal stability of the MTJ, and the MOS transistor driving strength are discussed. Some magnetic- and circuit-level solutions to achieve a 9F² STT-RAM cell area at the 22-nm technology node are also discussed.

  • A Sensitivity Analysis of Power Signal Methods for Detecting Hardware Trojans Under Real Process and Environmental Conditions

    Publication Year: 2010 , Page(s): 1735 - 1744
    Cited by:  Papers (18)

    Trust in reference to integrated circuits addresses the concern that the design and/or fabrication of the integrated circuit (IC) may be purposely altered by an adversary. The insertion of a hardware Trojan involves a deliberate and malicious change to an IC that adds or removes functionality or reduces its reliability. Trojans are designed to disable and/or destroy the IC at some future time or they may serve to leak confidential information covertly to the adversary. Trojans can be cleverly hidden by the adversary to make it extremely difficult for chip validation processes, such as manufacturing test, to accidentally discover them. This paper investigates the sensitivity of a power supply transient signal analysis method for detecting Trojans. In particular, we focus on determining the smallest detectable Trojan, i.e., the least number of gates a Trojan may have and still be detected, using a set of process simulation models that characterize a TSMC 0.18 μm process. We also evaluate the sensitivity of our Trojan detection method in the presence of measurement noise and background switching activity.

  • A 64-Mb Chain FeRAM With Quad BL Architecture and 200 MB/s Burst Mode

    Publication Year: 2010 , Page(s): 1745 - 1752
    Cited by:  Papers (5)

    A 64-Mb chain ferroelectric RAM (chain FeRAM) is fabricated using 130-nm 3-metal CMOS technology. A newly developed quad bitline architecture, which combines folded bitline configuration with shield bitline scheme, eliminates bitline-bitline (BL-BL) coupling noise. The quad bitline architecture also reduces the number of sense amplifiers and activated bitlines, resulting in the reduction of die size by 6.5% and cell array power consumption by 28%. Fast read/write of 60-ns cycle time as well as reliability improvement are realized by two high-speed error checking and correcting (ECC) techniques: 1) fast pre-parity calculation ECC sequence and 2) all-“0”-write-before-data-write scheme. Moreover, among nonvolatile memories reported so far, the 64-Mb chain FeRAM has achieved the highest read/write bandwidth of 200 MB/s with ECC. The chip size is 87.5 mm² with average cell size of 0.7191 μm².

  • 692-nW Advanced Encryption Standard (AES) on a 0.13-μm CMOS

    Publication Year: 2010 , Page(s): 1753 - 1757
    Cited by:  Papers (4)

    This paper presents a very low power/area design for the advanced encryption standard (AES) based on an 8-bit data path. The average measured core power on a 0.13-μm CMOS using a 100-kHz clock and a core voltage of 0.75 V is 692 nW. The core area is 21 000 μm² and the latency is 356 cycles. This design further challenges the low-resource end of the design space and is the first reported submicrowatt design for the AES; it has significant power-latency-area performance improvements over the previous state-of-the-art application-specific IC (ASIC) implementations.

  • A Chip-Area Efficient Voltage Regulator for VLSI Systems

    Publication Year: 2010 , Page(s): 1757 - 1762
    Cited by:  Papers (1)

    This paper presents an error amplifier structure to improve the load regulation of low-voltage low-dropout regulators. The proposed error amplifier has an ultrawide swing to extend the high-gain region so that the size of the power transistor can be reduced. Experimental results show that the power transistor size required to achieve similar load regulation performance is reduced by 25%. Moreover, the extra power consumption and the increase in silicon area are not significant.

  • Diagnosis of MRAM Write Disturbance Fault

    Publication Year: 2010 , Page(s): 1762 - 1766
    Cited by:  Papers (1)

    In this paper, we propose a new test method to detect write disturbance fault (WDF) for magnetic RAM (MRAM). Furthermore, an adaptive diagnosis algorithm (ADA) is also introduced to identify and diagnose the WDF for MRAM. The proposed test method can evaluate process stability and uniformity. We also develop a built-in self-test (BIST) circuit that supports the proposed WDF diagnosis test method. A 1-Mb toggle MRAM prototype chip with the proposed BIST circuit has been designed and fabricated using a special 0.15-μm CMOS technology. The BIST circuit overhead is only about 0.05% with respect to the 1-Mb MRAM. The test time is reduced by about 30% as compared with the test method without using the decision write mechanism. The chip measurement results show the efficiency of our proposed method.

  • A Low Error and High Performance Multiplexer-Based Truncated Multiplier

    Publication Year: 2010 , Page(s): 1767 - 1771
    Cited by:  Papers (8)

    This paper proposes a novel adaptive pseudo-carry compensation truncation (PCT) scheme, which is derived for the multiplexer-based array multiplier. The proposed method yields low average error among existing truncation methods. The new PCT-based truncated array multiplier outperforms other existing truncated array multipliers by as much as 25% in terms of silicon area and delay, and consumes about 40% less dynamic power than the full-width multiplier for 32-bit operation. The proposed truncation scheme is applied to an image compression algorithm. Due to its low truncation error, the mean square errors (MSE) of various reconstructed images are found to be comparable to those obtained with full-precision multiplication.

  • Search for T-VLSI Editor-in-Chief

    Publication Year: 2010 , Page(s): 1772
    Freely Available from IEEE
  • 2010 Index IEEE Transactions on Very Large Scale Integration (VLSI) Systems Vol. 18

    Publication Year: 2010 , Page(s): 1773 - 1796
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Publication Year: 2010 , Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu