Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 1 • Date Jan. 2011

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Power-Efficient Explicit-Pulsed Dual-Edge Triggered Sense-Amplifier Flip-Flops

    Page(s): 1 - 9

    A novel explicit-pulsed dual-edge triggered sense-amplifier flip-flop (DET-SAFF) for low-power and high-performance applications is presented in this paper. By incorporating the dual-edge triggering mechanism in the new fast latch and employing conditional precharging, the DET-SAFF achieves low power consumption with small delay. To further reduce the power consumption at low switching activities, a clock-gated sense-amplifier flip-flop (CG-SAFF) is employed. Extensive post-layout simulations show that the proposed DET-SAFF exhibits both low-power and high-speed properties, with reductions in delay and power of up to 43.3% and 33.5%, respectively, relative to the prior art. When the switching activity is less than 0.5, the proposed CG-SAFF demonstrates its superiority in terms of power reduction. At zero input switching activity, the CG-SAFF can save up to 86% of the power. Lastly, a modification to the proposed circuit has led to a DET-SAFF with an improved common-mode rejection ratio (CMRR).

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Quasi-Static Voltage Scaling for Energy Minimization With Time Constraints

    Page(s): 10 - 23

    Supply voltage scaling and adaptive body biasing (ABB) are important techniques that help to reduce the energy dissipation of embedded systems. This is achieved by dynamically adjusting the voltage and performance settings according to the application needs. In order to take full advantage of slack that arises from variations in the execution time, it is important to recalculate the voltage (performance) settings during runtime, i.e., online. However, optimal voltage scaling algorithms are computationally expensive, and thus, if used online, significantly hamper the possible energy savings. To overcome the online complexity, we propose a quasi-static voltage scaling (QSVS) scheme with a constant online time complexity O(1). This makes it possible to increase the exploitable slack and to avoid the energy dissipated by online recalculation of the voltage settings.
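The O(1) online step described in the QSVS abstract can be pictured as a table lookup. The sketch below is only an illustration under stated assumptions: the abstract does not say how the settings are stored, so the lookup table, the `Setting` type, the `build_table` and `lookup` helpers, and all voltage and frequency values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Setting:
    vdd: float        # supply voltage in volts (illustrative value, not from the paper)
    freq_mhz: float   # clock frequency assumed reachable at that voltage


def build_table(deadline_us: int, cycles: int, levels: List[Setting],
                step_us: int) -> List[Setting]:
    """Offline phase: precompute one setting per quantized task start time.

    `levels` must be sorted by ascending voltage, so the first level that
    meets the deadline is also the lowest-voltage one in this toy model.
    """
    table = []
    for start in range(0, deadline_us, step_us):
        remaining = deadline_us - start
        choice = levels[-1]  # fall back to the fastest level
        for level in levels:
            if cycles / level.freq_mhz <= remaining:  # cycles / MHz = microseconds
                choice = level
                break
        table.append(choice)
    return table


def lookup(table: List[Setting], start_us: int, step_us: int) -> Setting:
    """Online phase: one index computation, regardless of table size."""
    # Round the start time up so the chosen entry never assumes more slack
    # than is actually left.
    idx = -(-start_us // step_us)
    return table[min(idx, len(table) - 1)]


# Hypothetical operating points and task parameters.
LEVELS = [Setting(0.8, 100.0), Setting(1.0, 200.0), Setting(1.2, 300.0)]
TABLE = build_table(deadline_us=100, cycles=6000, levels=LEVELS, step_us=10)
```

A task that starts early gets the lowest voltage (`lookup(TABLE, 0, 10)`), while one that starts late is pushed to a faster, higher-voltage level (`lookup(TABLE, 75, 10)`); all the expensive search happens in `build_table`, before runtime.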
  • SRAM Write-Ability Improvement With Transient Negative Bit-Line Voltage

    Page(s): 24 - 32

    Increasing variations in device parameters significantly degrade the write-ability of SRAM cells in deep sub-100 nm CMOS technology. In this paper, a transient negative bit-line voltage technique is presented to improve the write-ability of SRAM cells. Capacitive coupling is used to generate a transient negative voltage at the low-going bit-line during the Write operation without using any on-chip or off-chip negative voltage source. Statistical simulations in a 45-nm PD/SOI technology show a 10³× reduction in the Write-failure probability with the proposed method.
  • Efficient Pattern Matching Algorithm for Memory Architecture

    Page(s): 33 - 41

    A network intrusion detection system inspects packet contents against thousands of predefined malicious or suspicious patterns. Because traditional software-only pattern-matching approaches can no longer meet the high throughput of today's networks, many hardware approaches have been proposed to accelerate pattern matching. Among hardware approaches, memory-based architecture has attracted a lot of attention because of its easy reconfigurability and scalability. In order to accommodate the increasing number of attack patterns and meet the throughput requirement of networks, a successful network intrusion detection system must have a memory-efficient pattern-matching algorithm and hardware design. In this paper, we propose a memory-efficient pattern-matching algorithm which can significantly reduce the memory requirement. For Snort rule sets, the new algorithm achieves a 21% memory reduction compared with the traditional Aho-Corasick algorithm. In addition, we can gain a 24% memory reduction by integrating our approach into the bit-split algorithm, which is the state-of-the-art memory-based approach.
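For orientation, the traditional Aho-Corasick algorithm that the abstract uses as its baseline can be sketched as follows. This is a plain textbook software version, not the paper's memory-efficient variant or its hardware mapping; the function names `build_automaton` and `search` are chosen here for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link per state
    out = [[]]    # patterns recognized on entering each state

    # Phase 1: insert every pattern into a trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Phase 2: breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())  # depth-1 states fail back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every match, in scan order."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


AUTOMATON = build_automaton(["he", "she", "his", "hers"])
HITS = search("ushers", AUTOMATON)
```

The number of states, `len(AUTOMATON[0])`, grows with the total length of the pattern set, which gives a rough sense of why the memory footprint of the state machine becomes the limiting factor for large rule sets.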
  • A New Compact SD2 Positive Integer Triangular Array Division Circuit

    Page(s): 42 - 51

    Division is the highest latency arithmetic operation in present digital architectures and high-performance computing systems; as such, it drives the demand for efficient hardware division units. Accordingly, this paper proposes a novel architecture for a nonrestoring divider based on the radix-2 signed-digit (SD2) representation. This notation has been chosen to achieve fast computation, as proposed by Avizienis (IEEE Transactions on Electronic Computers, vol. EC-10, no. 3, pp. 389-400, Sep. 1961), but the architecture presented in this paper, due to its structure and the definition of the cell implementing its architecture, saves area as well. The proposed divider architecture is able to achieve a delay of order , similar to the solution presented by Takagi (IEICE Transactions on Fundamentals of Electronics, Communications, and Computer Sciences, E89-A, no. 10, pp. 2874-2881, 2006), which is considered the state of the art, instead of other solutions that give growth. This is in line with the fact that, even if our carry chains have less impact on the circuit, the basic cell is larger than the one proposed by Takagi. Our cells are larger than those proposed in the literature when each is considered as a single circuit, but considering the overall structure there is a saving of some 40% in the number of gates and of 55% in power when compared with the state of the art.
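As background for the nonrestoring scheme named in the abstract, here is a minimal software sketch of nonrestoring division on ordinary binary integers. It does not model the SD2 digit set or the triangular array; the function name and the `n_bits` parameter are assumptions made for illustration.

```python
def nonrestoring_divide(dividend, divisor, n_bits):
    """Divide two non-negative integers with the nonrestoring recurrence.

    Unlike restoring division, a negative partial remainder is not fixed up
    immediately; the next step adds the divisor instead of subtracting it.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")

    remainder = 0
    quotient = 0
    for i in range(n_bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = 2 * remainder + bit - divisor
        else:
            remainder = 2 * remainder + bit + divisor
        # The quotient bit is 1 exactly when the new partial remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)

    if remainder < 0:
        remainder += divisor  # single correction step at the end
    return quotient, remainder
```

Each iteration performs exactly one addition or subtraction, which is what makes the scheme attractive for array implementations: every row of cells does the same amount of work regardless of the sign of the partial remainder.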
  • High-Accuracy Fixed-Width Modified Booth Multipliers for Lossy Applications

    Page(s): 52 - 60

    The fixed-width multiplier is attractive for many multimedia and digital signal processing systems in which it is desirable to maintain a fixed format and a small loss of accuracy in the output data is acceptable. This paper presents the design of high-accuracy fixed-width modified Booth multipliers. To reduce the truncation error, we first slightly modify the partial product matrix of Booth multiplication and then derive an effective error compensation function that makes the error distribution more symmetric about, and more concentrated around, zero error, giving the fixed-width modified Booth multiplier very small mean and mean-square errors. In addition, a simple compensation circuit mainly composed of a simplified sorting network is also proposed. Compared to previous circuits, the proposed error compensation circuit can achieve a tiny mean error and a significant reduction in mean-square error (e.g., at least 12.3% reduction for the 16-bit fixed-width multiplier) while maintaining approximately the same hardware overhead. Furthermore, experimental results on two real-life applications also demonstrate that the proposed fixed-width multipliers can improve the average peak signal-to-noise ratio of output images by at least 2.0 dB and 1.1 dB, respectively.
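To make the terms in this abstract concrete, the following sketch shows standard radix-4 modified Booth recoding and the error left by naively keeping only the upper half of a product. It is a reference model only: the paper's modified partial product matrix and compensation function are not reproduced, and the helper names are hypothetical.

```python
def booth_radix4_digits(y, n_bits):
    """Recode a signed n_bits two's-complement value into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    y == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits")
    u = y & ((1 << n_bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # bit to the right of the current pair (0 below the LSB)
    for i in range(0, n_bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, n_bits):
    """Full-precision product as a sum of Booth partial products."""
    digits = booth_radix4_digits(y, n_bits)
    return sum(d * x * 4 ** i for i, d in enumerate(digits))


def truncation_error(x, y, n_bits):
    """Error from keeping only the upper n_bits of the 2*n_bits product."""
    product = booth_multiply(x, y, n_bits)
    kept = (product >> n_bits) << n_bits  # floor, as in two's complement
    return product - kept
```

Without compensation the error returned by `truncation_error` is always in the range 0 to 2^n_bits - 1, so it is biased in one direction; a compensation function of the kind the abstract describes aims to recentre that distribution around zero.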
  • A 470-μW 5-GHz Digitally Controlled Injection-Locked Multi-Modulus Frequency Divider With an In-Phase Dual-Input Injection Scheme

    Page(s): 61 - 70

    This paper presents a digitally controlled injection-locked multimodulus frequency divider (ILMFD) based on a ring oscillator using inverter chains for a small area and low power consumption. In the proposed ILMFD, division ratios of 2, 3, 4, 5 and 6 are achieved by using a programmable delay line that changes the self-oscillation frequency of the ring oscillator. The locking range of the proposed ILMFD is improved by employing a dual-input injection scheme, which, unlike previous multi-input injection schemes, does not require distinct phase inputs. A prototype chip implemented in a 0.13-μm CMOS process has an area of 35×33 μm² and operates at 5 GHz while consuming 470 μW from a 1.2 V supply, of which 350 μW is dissipated in the core of the ILMFD. The proposed divider is the first reported multimodulus ILFD with digitally controlled division ratios and an in-phase dual-input injection scheme.
  • Exploring Area and Delay Tradeoffs in FPGAs With Architecture and Automated Transistor Design

    Page(s): 71 - 84

    Field-programmable gate arrays (FPGAs) are used in a variety of markets that have differing cost, performance and power consumption requirements. While it would be ideal to serve all these markets with a single FPGA family, the diversity in the needs of these markets means that generally more than one family is appropriate. Consequently, FPGA vendors have moved to provide a diverse set of families that sit at different points in the area-speed-power design space. This paper aims to understand the circuit and architectural design attributes of FPGAs that enable tradeoffs between area and speed, and to determine the magnitude of the possible tradeoffs. This will be useful for architects seeking to determine the number of device families in a suite of offerings, as well as the changes to make between families. We explore a broad range of architectures and circuit designs and develop a transistor sizing tool that automatically optimizes each design. In this paper, we describe this tool and demonstrate that it achieves results that are comparable to past work but with vastly less effort. We then use the designs produced by the tool to explore the range of tradeoffs possible. We find that through architecture and transistor sizing changes it is possible to usefully vary the area of an FPGA by a factor of 2.0 and the performance of an FPGA by a factor of 2.1. We also observe that the range of area and delay tradeoffs possible by varying only the transistor sizing of a single architecture is larger than the ranges observed in past architectural experiments. In addition to transistor size, we note that LUT size is one of the most useful parameters for trading off area and delay.
  • A Lightweight High-Performance Fault Detection Scheme for the Advanced Encryption Standard Using Composite Fields

    Page(s): 85 - 91

    Faults that occur accidentally or maliciously in hardware implementations of the Advanced Encryption Standard (AES) may cause erroneous encrypted/decrypted output. The use of appropriate fault detection schemes for the AES makes it robust to internal defects and fault attacks. In this paper, we present a lightweight concurrent fault detection scheme for the AES. In the proposed approach, the composite field S-box and inverse S-box are divided into blocks and the predicted parities of these blocks are obtained. Through exhaustive searches among all available composite fields, we have found the optimum solutions for the least overhead parity-based fault detection structures. Moreover, through our error injection simulations for one S-box (respectively inverse S-box), we show that a total error coverage of almost 100% for 16 S-boxes (respectively inverse S-boxes) can be achieved. Finally, it is shown that both the application-specific integrated circuit and field-programmable gate-array implementations of the fault detection structures using the obtained optimum composite fields have better hardware and time complexities compared to their counterparts.
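The general idea of parity-based concurrent fault detection can be illustrated on the AES S-box as a whole. The sketch below is an assumption-laden simplification: it uses a single parity bit over the full S-box output and a precomputed table in place of parity prediction logic, whereas the abstract describes per-block predicted parities inside a composite field implementation. All function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0 as in AES."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, k):
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox(a):
    """AES S-box: field inversion followed by the affine transformation."""
    x = gf_inv(a)
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


SBOX = [sbox(a) for a in range(256)]
# Stand-in for parity prediction logic: expected output parity per input.
PREDICTED_PARITY = [parity(s) for s in SBOX]


def fault_detected(a, observed_output):
    """Flag a fault when the observed output parity disagrees with the prediction."""
    return parity(observed_output) != PREDICTED_PARITY[a]
```

A single parity bit catches every fault that flips an odd number of output bits but misses even-weight errors; splitting the S-box into several blocks with separate parities, as the abstract describes, is one way to raise coverage.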
  • Accurate Timing and Noise Analysis of Combinational and Sequential Logic Cells Using Current Source Modeling

    Page(s): 92 - 103

    A current source model (CSM) for CMOS logic cells is presented, which can be used for accurate noise and delay analysis in CMOS VLSI circuits. CS modeling is broadly considered the method of choice for modern static timing and noise analysis tools. Unfortunately, the existing CSMs are only applicable to combinational logic cells. In addition to the multistage logic nature of sequential cells, the main difficulty in developing a CSM for these cells is the presence of feedback loops. This paper begins by presenting a highly accurate CSM for combinational logic cells, followed by models for common sequential cells, including latches and master-slave flip-flops. The proposed model addresses these problems by characterizing the cell with suitable nonlinear CSs and capacitive components. Given input and clock voltage waveforms of arbitrary shapes, our new model can accurately compute the output voltage waveform of the sequential cell. Experimental results demonstrate close-to-SPICE waveforms with three orders of magnitude speedup.
  • Location Cache Design and Performance Analysis for Chip Multiprocessors

    Page(s): 104 - 117

    Recent research at Intel suggests that chips with hundreds of processor cores are possible in the not-so-distant future. As the number of cores grows, so does the size of the cache systems required to allow them to operate efficiently. Caches have grown to consume a significant percentage of the power utilized by a processor. In this research, we extend the concept of the location cache to support chip multiprocessor (CMP) systems in combination with low-power L2 caches based upon the gated-ground technique. The combination of these two techniques allows for reductions in both dynamic and leakage power consumption. In this paper, we present an analysis of the power savings provided by utilizing location caches in a CMP system. The performance of the cache system is evaluated by extending the capability of CACTI and Simics using the SPLASH-2 and ALPBench benchmark suites. These simulation results demonstrate that the utilization of location caches in CMP systems is capable of saving a significant amount of power over equivalent CMP systems that lack location caches.
  • A Novel Asynchronous Pixel for an Energy Harvesting CMOS Image Sensor

    Page(s): 118 - 129

    This paper proposes a novel energy harvesting technique based on an asynchronous pixel structure and an efficient energy generation scheme, referred to as avalanche energy generation (AEG). The key idea behind using an asynchronous type of pixel is to lower the power consumption by enabling only active pixels to be read out, after which they enter a power generation mode. In this mode, the on-pixel photodetector itself is used to harvest light energy from the environment and make it available to active pixels. A very interesting feature of our proposed approach is that during a frame capture, critical energy is mainly required for start-up activity. Once a group of pixels has been read out, the available energy rises and more array activity contributes to the generation of more energy, hence creating an avalanche effect. In contrast to earlier designs of energy harvesting image sensors, our scheme uses the photodetector itself for power generation. This results in better utilization of the photosensitive area and, more importantly, an improved energy generation scheme. Detailed power analysis and extensive simulation results are provided in this paper, which validate the proposed concept. Three test structures have been fabricated in an AMIS 1-poly, 5-metal CMOS 0.35-μm n-well process. The power generation process and event generation have been successfully verified experimentally.
  • CΔIDDQ: Improving Current-Based Testing and Diagnosis Through Modified Test Pattern Generation

    Page(s): 130 - 141

    This paper presents a novel approach to extending the life of current-based test techniques for the detection and diagnosis of bridging defects. Called CΔIDDQ (Complementary ΔIDDQ), this approach combines a modified test pattern generation with a simple post-processing of IDDQ measurements (namely additions and subtractions) such that the resulting measurement combination equals zero. Consequently, CΔIDDQ eliminates the main current variance sources: wafer-to-wafer, IC-to-IC and vector-to-vector variations; the only remaining source is the measurement variance. The modified test pattern generation is based on the innovative concept of transient-fault test pattern decomposition and the use of layout information to target realistic bridging defect sites. Verification based on logic simulation confirms that the combination of the resulting subset of fault-free IDDQ measurements is equal to 0. Using this promising new technique, bridging defect detection capability can be improved by orders of magnitude. Simulation results also show that this improved detection capability may be necessary even for low-power devices.
  • Fixed-State Tests for Delay Faults in Scan Designs

    Page(s): 142 - 146

    One of the methods to reduce the power dissipation during scan shifting is based on holding the state inputs to the combinational logic of a circuit constant for the duration of a scan operation. We note that this method also allows a new type of two-pattern scan-based tests to be applied. We refer to these tests as fixed-state tests. These tests have several properties that make them effective as complements to skewed-load and broadside tests, and that also allow them to be computed efficiently. We discuss these properties in the context of transition faults. We describe procedures for selecting the constant vector for the state inputs during a scan operation, and for generating fixed-state tests. We present experimental results to demonstrate the transition fault coverage improvements possible with these tests.
  • Fast Computation of Discharge Current Upper Bounds for Clustered Power Gating

    Page(s): 146 - 151

    The capability of accurately estimating an upper bound of the maximum current drawn by a digital macroblock from the ground or power supply line constitutes a major asset of automatic power-gating flows. In fact, the maximum current information is essential to properly size the sleep transistor in such a way that speed degradation and signal integrity violations are avoided. Loose upper bounds can be determined with a reasonable computational cost, but they lead to oversized sleep transistors. On the other hand, exact computation of the maximum drawn current is an NP-hard problem, even when conservative simplifying assumptions are made on gate-level current profiles. In this paper, we present a scalable algorithm for tightening upper bound computation, with a controlled and tunable computational cost. The algorithm exploits state-of-the-art commercial timing analysis engines, and it is tightly integrated into an industrial power-gating flow for leakage power reduction. The results we have obtained on large circuits demonstrate the scalability and effectiveness of our estimation approach.
  • Multi-Threshold Voltage FinFET Sequential Circuits

    Page(s): 151 - 156

    New multi-threshold-voltage (multi-Vth) brute-force FinFET sequential circuits with independent-gate bias, work-function engineering, and gate-drain/source overlap engineering techniques are presented in this paper. The total active mode power consumption, the clock power, and the average leakage power of the multi-Vth sequential circuits are reduced by up to 55%, 29%, and 53%, respectively, while maintaining similar speed and data stability as compared to the circuits in a single-threshold-voltage (single-Vth) tied-gate 32 nm FinFET technology. Furthermore, the area is reduced by up to 21% with the new sequential circuits as compared to the circuits with single-Vth tied-gate FinFETs.
  • Channel Estimator and Aliasing Canceller for Equalizing and Decoding Non-Cyclic Prefixed Single-Carrier Block Transmission via MIMO-OFDM Modem

    Page(s): 156 - 160

    Without a cyclic prefix (CP), most single-carrier (SC) transmissions cannot adopt a frequency-domain equalizer (FDE) directly. This work utilizes a frequency-domain channel estimator (FD-CE) and a decision-feedback aliasing canceller (DF-AC) to produce a single-FFT SC-FDE. In this way, non-CP single-carrier block transmission (SCBT) can be decoded using the sphere decoder of MIMO-OFDM modems to support multimode operation and backward compatibility at an acceptable complexity in IEEE 802.11 very high throughput (VHT). An N-point FFT is sufficient to measure channel frequency responses (CFR) from L-sample preambles (L ≤ N/2). Then, M-bit block codes (M ≤ L) are decodable in the frequency domain with the DF-AC's help. Simulations and measurements imply that this work can ensure adequate performance even without a CP to guard against the distortions of multipath propagation.
  • Reconfigurable SRAM Architecture With Spatial Voltage Scaling for Low Power Mobile Multimedia Applications

    Page(s): 161 - 165

    This paper presents a dynamically reconfigurable SRAM array for low-power mobile multimedia applications. The proposed structure uses a lower voltage for cells storing low-order bits and a nominal voltage for cells storing higher-order bits. The architecture allows the number of bits in the low-voltage mode to be reconfigured at run time to change the error characteristics of the array. Simulations in predictive 70 nm nodes show that the proposed array can obtain 45% savings in memory power with a marginal (~10%) reduction in image quality.
  • A Low-Jitter ADPLL via a Suppressive Digital Filter and an Interpolation-Based Locking Scheme

    Page(s): 165 - 170

    In this brief, we present a low-jitter and wide-range all-digital phase-locked loop (ADPLL). This ADPLL achieves low output clock jitter through a number of schemes. First, the phase is locked quickly through a predictive phase-locking scheme. Then, the jitter is further reduced by a suppressive digital loop filter. Finally, an interpolation-based locking scheme is utilized to enhance the resolution of the digitally controlled oscillator (DCO) so as to further reduce the phase error and jitter. Simulation results show that the jitter performance is very close to that of the free-running DCO. Measurement results show that the peak-to-peak and RMS jitter are 56 and 7.28 ps, respectively, when the output clock of the ADPLL is running at 600 MHz.
  • IEEE copyright form

    Page(s): 171 - 172
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu