Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 3 • Date March 2011

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Editorial

    Page(s): 349 - 368
    Freely Available from IEEE
  • Energy and Performance Models for Synchronous and Asynchronous Communication

    Page(s): 369 - 382

    Communication costs, which have the potential to throttle design performance as scaling continues, are mathematically modeled and compared for various pipeline methodologies. First-order models are created for common pipeline protocols, including clocked flopped, clocked time-borrowing latch, asynchronous two-phase, four-phase, delay-insensitive, single-track, and source synchronous. The models are parameterized for throughput, energy, and bandwidth. The models share common parameters for different pipeline protocols and implementations to enable a fair apples-to-apples comparison. The accuracy of the models is demonstrated for complete implementations of a subset of the protocols by comparing model results, computed with simulated 65-nm process parameter values, against SPICE simulations of full pipeline implementations. One can determine when asynchronous communication is superior at the physical level to synchronous communication in terms of energy for a given bandwidth by applying actual or expected values of the parameters to various design targets. Comparisons between protocols at fixed targets also allow designers to understand tradeoffs between implementations that have varying process, timing, and design requirements.

  • Automating Design of Voltage Interpolation to Address Process Variations

    Page(s): 383 - 396

    Post-fabrication tuning provides a promising design approach to mitigate the performance and power overheads of process variation in advanced fabrication technologies. This paper explores design considerations and VLSI-CAD support for a recently proposed post-fabrication tuning knob called voltage interpolation. Successful implementation of this technique requires examination of the design tradeoffs between circuit tuning range and static power overheads within the synthesis flow of the design process, in addition to the implications of place and route. Results from the exploration of the scheme for a 64-core chip-multiprocessor machine using industrial-grade design blocks show that the scheme can be used to mitigate overhead arising from random and correlated within-die process variations. A design using voltage interpolation can match the nominal delay target with a 16% power cost, or for the same power budget, incur only a 13% delay overhead after variations.

  • Information Theoretic Modeling and Analysis for Global Interconnects With Process Variations

    Page(s): 397 - 410

    As CMOS semiconductor technology enters the nanometer regime, interconnect processes must be compatible with device roadmaps and meet manufacturing targets at the specified wafer size. The resulting ubiquitous process variations cause errors in data delivery through interconnects. This paper proposes an information-theory-based design method to accommodate process variations. Unlike the traditional delay-based design metric, the current approach uses achievable rate to relate interconnect designs directly to communication applications. More specifically, data communication over a typical interconnect, a bus, subject to process variations (an “uncertain” bus), is defined as a communication problem under uncertainty. A data rate, called the achievable rate, is computed for such a bus; it represents a lower bound on the maximal data rate attainable over the bus. When the data rate applied over the bus is smaller than the achievable rate, reliable communication can be guaranteed regardless of process variations, i.e., a bit error rate arbitrarily close to zero is achievable. A single communication strategy to combat the process variations is proposed whose code rate is equal to the computed achievable rate. The simulations show that variations in the interconnect resistivity could have the most harmful effect in terms of achievable rate reduction. The simulations also illustrate the importance of taking into account correlations among bus parasitic parameters when measuring the influence of process variations on the achievable rates.

  • Decoding-Aware Compression of FPGA Bitstreams

    Page(s): 411 - 419

    Bitstream compression is important in reconfigurable system design since it reduces the bitstream size and the memory requirement. It also improves the communication bandwidth and thereby decreases the reconfiguration time. Existing research in this field has explored two directions: efficient compression with slow decompression or fast decompression at the cost of compression efficiency. This paper proposes a novel decode-aware compression technique to improve both compression and decompression efficiencies. The three major contributions of this paper are: 1) smart placement of compressed bitstreams that can significantly decrease the overhead of the decompression engine; 2) selection of profitable parameters for bitstream compression; and 3) efficient combination of bitmask-based compression and run-length encoding of repetitive patterns. Our proposed technique outperforms the existing compression approaches by 15%, while our decompression hardware for variable-length coding is capable of operating at the speed closest to the best known field-programmable gate array-based decoder for fixed-length coding.
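
    The run-length-encoding idea mentioned among the contributions can be illustrated with a minimal Python sketch. It is not the paper's encoder: the function names `rle_encode` and `rle_decode` and the string-of-bits input are assumptions made for illustration, and neither the bitmask-based stage nor the decode-aware placement is modeled.

```python
def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    runs: list[tuple[str, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            # Same symbol as the current run: extend its count.
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            # New symbol: start a new run of length 1.
            runs.append((b, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in runs)
```

    Long runs of identical symbols each collapse to a single pair, so the benefit depends on how repetitive the input is; a stream with no repetition grows rather than shrinks under this scheme.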

  • Matrix Codes for Reliable and Cost Efficient Memory Chips

    Page(s): 420 - 428

    This paper presents a method to protect memories against multiple bit upsets and to improve manufacturing yield. The proposed method, called a Matrix code, combines Hamming and Parity codes to improve the reliability and yield of memory chips in the presence of high defects and multiple bit upsets. The method is evaluated using fault injection experiments. The results are compared to well-known techniques such as Reed-Muller and Hamming codes. The proposed technique performs better than the Hamming codes and achieves comparable performance with Reed-Muller codes, with very favorable implementation gains such as a 25% reduction in area and power consumption. It also increases reliability by more than 50% in some cases. Further, the yield benefits provided by the proposed method, measured by the yield improvements per cost metric, are up to 300% better than those provided by Reed-Muller codes.

  • High Performance and Area Efficient Flexible DSP Datapath Synthesis

    Page(s): 429 - 442

    This paper presents a new methodology for the synthesis of high performance flexible datapaths, targeting computationally intensive digital signal processing kernels of embedded applications. The proposed methodology is based on a novel coarse-grained reconfigurable/flexible architectural template, which enables the combined exploitation of the horizontal and vertical parallelism along with the operation chaining opportunities found in the application's behavioral description. Efficient synthesis techniques exploiting these architectural optimization concepts from a higher level of abstraction are presented and analyzed. Extensive experimentation showed average latency and area reductions up to 33.9% and 53.9%, respectively, and higher hardware area utilization, compared to previously published high performance coarse-grained reconfigurable datapaths.

  • FA-STAC: An Algorithmic Framework for Fast and Accurate Coupling Aware Static Timing Analysis

    Page(s): 443 - 456

    This paper presents an algorithmic framework for fast and accurate static timing analysis considering coupling. With technology scaling to smaller dimensions, the impact of coupling-induced delay variations can no longer be ignored. Timing analysis considering coupling is iterative, and can have considerably larger run-times than a single-pass approach. We propose two different classes of coupling delay models, a heuristic-based coupling model and a current source-based coupling model, and present techniques to increase the convergence rate of timing analysis when such coupling models are employed. Our proposed coupling models show promising accuracy improvements compared to SPICE. Experimental results on ISCAS85 benchmarks validate the effectiveness of our efficient iteration scheme. Our iteration algorithm obtained speedups of up to 62.1% using a heuristic coupling model and 2.7× using a current-based coupling model in comparison to traditional approaches.

  • Reliability Analysis and Optimization of Power-Gated ICs

    Page(s): 457 - 468

    Power gating is an efficient technique for reducing the leakage power of electronic devices by disconnecting the power supply from blocks idle for long periods of time. Disconnecting gated blocks causes changes in the current densities of the grid branches and vias. For some gating configurations, dc current densities may increase in some grid locations to the extent that they violate electromigration (EM) constraints. In this paper, we analyze the EM and IR (current-resistance) voltage drop effects in gated global power grids. Based on our analyses, we develop a global grid sizing algorithm to satisfy the reliability constraints on grid branches and vias for all feasible gating configurations. Our experimental results indicate that a grid initially sized for all blocks connected to it may be modified to fulfill EM and IR constraints for multiple gating schedules with only a small area increase.

  • Row-Based Power-Gating: A Novel Sleep Transistor Insertion Methodology for Leakage Power Optimization in Nanometer CMOS Circuits

    Page(s): 469 - 482

    Leakage power has become a serious concern in nanometer CMOS technologies, and power-gating has been shown to offer a viable solution to the problem with a small penalty in performance. This paper focuses on leakage power reduction through automatic insertion of sleep transistors for power-gating. In particular, we propose a novel, layout-aware methodology that facilitates sleep transistor insertion and virtual-ground routing on row-based layouts. We also introduce a clustering algorithm that is able to handle timing and area constraints simultaneously, and we extend it to the case of multi-Vt sleep transistors to increase leakage savings. The results we have obtained on a set of benchmark circuits show that the leakage savings we can achieve are, by far, superior to those obtained using existing power-gating solutions and with much tighter timing and area constraints.

  • Design of Last-Level On-Chip Cache Using Spin-Torque Transfer RAM (STT RAM)

    Page(s): 483 - 493

    Because of its high storage density with superior scalability, low integration cost, and reasonably high access speed, spin-torque transfer random access memory (STT RAM) appears to have promising potential to replace SRAM as last-level on-chip cache (e.g., L2 or L3 cache) for microprocessors. Due to the unique operational characteristics of its storage device, the magnetic tunneling junction (MTJ), STT RAM is inherently subject to a write latency versus read latency tradeoff that is determined by the memory cell size. This paper first quantitatively studies how different memory cell sizing may impact the overall computing system performance, and shows that different computing workloads may have conflicting expectations on memory cell sizing. Leveraging MTJ device switching characteristics, we further propose an STT RAM architecture design method that can make STT RAM cache with relatively small memory cell size perform well over a wide spectrum of computing benchmarks. This has been demonstrated using CACTI-based memory modeling and computing system performance simulations using SimpleScalar. Moreover, we show that this design method can also reduce STT RAM cache energy consumption by up to 30% over a variety of benchmarks.

  • Comprehensive Analysis and Control of Design Parameters for Power Gated Circuits

    Page(s): 494 - 498

    Power gating in circuits is one of the effective technologies to allow low leakage and high performance operations. The key design considerations in the power mode transitions of power gating technology are minimizing the wakeup delay (for achieving high performance), the peak current (for reducing power supply/ground noise), and the total size of sleep transistors (for reducing area/design complexity). This work aims to analyze and establish the relations between the three important design parameters: 1) the maximum current flowing from/to power/ground; 2) the wakeup (sleep to active mode transition) delay; and 3) the total size of sleep transistors. With an understanding of the relations between the parameters, we propose a solution to the problem of finding logic clusters and their wakeup schedule that minimize the wakeup delay while satisfying the peak current constraint during wakeup and the performance loss constraint in normal operation. Specifically, we solve the problem by formulating it as repeated (incremental) applications of finding a maximum clique in a graph. Experiments using ISCAS benchmarks show that our proposed technique is able to explore the search space, finding solutions with 71% and 30% reduced sizes of sleep transistors and 39% and 54% reduced wakeup delay, compared to the results of the previous work.

  • A VLSI Architecture and the FPGA Prototype for MPEG-2 Audio/Video Decoding

    Page(s): 499 - 503

    This paper details our experience of developing an MPEG-2 audio/video decoder which operates at main level/main profile, 720 × 480 4:2:0 at 29.97 frames per second, with audio at 16 bits, 48 000 samples per second. The design has been developed with a focus on energy-efficiency.

  • A Digitally Testable Σ-Δ Modulator Using the Decorrelating Design-for-Digital-Testability

    Page(s): 503 - 507

    This paper demonstrates a digitally testable second-order Σ-Δ modulator. The modulator under test (MUT) employs the decorrelating design-for-digital-testability (D3T) scheme to provide two operation modes: the normal mode and the digital test mode. In the digital test mode, the input switched-capacitor network of the D3T modulator is reconfigured as two sub-digital-to-charge converters (sub-DCCs). Each of the sub-DCCs accepts a Σ-Δ modulated bit-stream as its test stimulus. By repetitively feeding the DCCs the same Σ-Δ modulated bit-stream but with different delays, the DCCs work together with the integrator to generate the analog stimulus in the digital test mode. The analog stimulus is analogous to the result of filtering the bit-stream with a two-nonzero-term FIR decorrelating term. Consequently, the D3T MUT suffers less from the undesired shaped noise of the digital stimuli, and achieves better digital test accuracy. Measurement results show that the digital tests present a peak signal-to-noise-and-distortion ratio (SNDR) of 80.1 dB at an oversampling ratio of 128. The SNDR results of the digital tests differ from their conventional analog counterparts by no more than 2 dB except for the -3.2 dBFS test. The analog hardware overhead of the D3T MUT consists of only 13 switches.

  • Runtime Resonance Noise Reduction with Current Prediction Enabled Frequency Actuator

    Page(s): 508 - 512

    A power delivery network (PDN) is a distributed resistance-inductance-capacitance (RLC) network with its dominant resonance frequency in the low-to-middle frequency range. Though the working frequencies of high-performance chips are in general much higher than this resonance frequency, the chip runtime loading frequency is not. When a chip executes a chunk of instructions repeatedly, the induced current load may have harmonic components close to this resonance frequency, causing excessive power integrity degradation. Existing PDN design solutions, however, mainly target high-frequency noise and are not effective at suppressing such resonance noise. In this work, we propose a novel approach to proactively suppress this type of noise. A method based on the high-dimension generalized Markov process is developed to predict current load variation. Based on such prediction, a clock frequency actuator design is proposed to proactively select an optimal clock frequency to suppress the resonance. To the best of our knowledge, this is the first in-depth study on proactively reducing instruction-loop-induced PDN resonance noise at runtime.

  • High-Speed Algorithms and Architectures for Range Reduction Computation

    Page(s): 512 - 516

    Range reduction is a crucial step for accuracy in the evaluation of trigonometric functions. This paper presents and compares a set of algorithms for additive range reduction computation and their corresponding application-specific integrated circuit implementations (ensuring an accuracy of one unit in the last place). A word-serial architecture implementation has been used as a reference for clearer comparisons. In addition, a new table-based pipelined architecture for range reduction is proposed.
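
    Additive range reduction rewrites an argument x as x = k·C + y, with k an integer and y confined to a small interval around zero, so that the function only needs to be evaluated on y. A minimal Python sketch follows, assuming C = π/2 and ordinary double-precision arithmetic. The helper names are hypothetical, the code does not reproduce any of the paper's algorithms or architectures, and this naive version does not reach one-unit-in-the-last-place accuracy for large arguments.

```python
import math


def additive_range_reduce(x, c=math.pi / 2):
    """Return (k, y) with x ~= k*c + y and |y| <= c/2 (up to rounding)."""
    k = round(x / c)   # nearest integer multiple of c
    y = x - k * c      # reduced argument; accuracy degrades when |x| is large
    return k, y


def sin_via_reduction(x):
    """Evaluate sin(x) from the reduced argument and the quadrant k mod 4."""
    k, y = additive_range_reduce(x)
    quadrant = k % 4   # Python's % is non-negative for a positive modulus
    if quadrant == 0:
        return math.sin(y)
    if quadrant == 1:
        return math.cos(y)
    if quadrant == 2:
        return -math.sin(y)
    return -math.cos(y)
```

    The subtraction x - k*c is where accuracy is lost in this sketch, because the stored constant only approximates π/2 and the error is multiplied by k.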

  • Effective Hybrid Test Program Development for Software-Based Self-Testing of Pipeline Processor Cores

    Page(s): 516 - 520

    This paper presents an effective hybrid test program for the software-based self-testing (SBST) of pipeline processor cores. The test program combines a deterministically developed program which explores different levels of processor core information and a block-based random program which consists of a combination of in-order instructions, random-order instructions, return instructions, as well as instruction sequences used to trigger exception/interrupt requests. Due to the complementary nature of this hybrid test program, it can achieve processor fault coverage that is comparable to the performance of the conventional scan chain method. The test response observation methods and their impacts on fault coverage are also investigated. We present the concept of micro observation versus macro observation and show that the most effective method of using SBST is through a multiple input signature register connected to the processor local bus, while conventional methods that observe only the program results in the memory lead to significantly less processor fault coverage.

  • On-Chip Interconnect Analysis of Performance and Energy Metrics Under Different Design Goals

    Page(s): 520 - 524

    As semiconductor process technology scales down, interconnect planning presents ever-greater challenges to designers. In this paper, we analyze, evaluate, and compare various metrics with optimized wire configurations in the contexts of different design criteria: delay minimization, delay-power minimization, and delay²-power minimization. We show how various design criteria influence the configuration, performance, and power consumption of repeated wires.

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu