
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 10 • Date Oct. 2010

  • Table of contents

    Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Fully CMOS-Compatible On-Chip Optical Clock Distribution and Recovery

    Page(s): 1385 - 1398

    Clock distribution in the multi-gigahertz range is becoming increasingly difficult because of more stringent skew and jitter requirements on the one hand and deteriorating supply voltage integrity and process variation on the other. The global clock network, especially in nanometer CMOS designs with ever-increasing die sizes, has become a prominent performance limiter. On-chip optical interconnects are a potential alternative to traditional interconnect technology for achieving clock distribution beyond 10 GHz while maintaining the required skew and jitter budgets. A practical on-chip optical clocking system must be CMOS compatible in order to be cost effective for system-level integration and easy to manufacture. This paper presents the design of a fully CMOS-compatible optical clock distribution and recovery system in a 3.3 V, 0.35-μm CMOS process. Experimental results from the test chip prove the feasibility of providing an optical-electrical interface in devices and circuits in a fully CMOS-compatible manufacturing environment. Although the test chips were designed in a mature CMOS process technology and the measured performance is low, they demonstrate the feasibility of on-chip optoelectronic integration in a fully CMOS-compatible process. On-chip optical clock distribution is one of the natural applications of fully CMOS-compatible on-chip optical interconnect technology.

  • Fast Analysis of a Large-Scale Inductive Interconnect by Block-Structure-Preserved Macromodeling

    Page(s): 1399 - 1411

    To efficiently analyze large-scale interconnect-dominant circuits with inductive couplings (mutual inductances), this paper introduces a new state matrix, called VNA, that stamps inverse-inductance elements by replacing the inductive-branch current with flux. The state matrix under VNA is diagonally dominant, sparse, and passive. To further exploit the sparsity and hierarchy at the block level, a new matrix-stretching method is introduced to reorder coupled fluxes into a decoupled state matrix with a bordered block diagonal (BBD) structure. A corresponding block-structure-preserved model-order reduction, called BVOR, is developed to preserve the sparsity and hierarchy of the BBD matrix at the block level. This makes it possible to efficiently build and simulate the macromodel within a SPICE-like circuit simulator. Experiments show that the method achieves up to 7× faster model building time, up to 33× faster simulation time, and as much as 67× smaller waveform error compared to SAPOR [a second-order reduction based on nodal analysis (NA)] and PACT (a first-order 2×2 structured reduction based on modified NA).

  • Improving Multi-Level NAND Flash Memory Storage Reliability Using Concatenated BCH-TCM Coding

    Page(s): 1412 - 1420

    By storing more than one bit in each memory cell, multi-level per cell (MLC) NAND flash memories are dominating the global flash memory market because of their storage density advantage. However, continuous technology scaling steadily worsens the raw storage reliability of MLC NAND flash memories. This paper presents a memory fault tolerance design solution geared to MLC NAND flash memories. The basic idea is to concatenate trellis coded modulation (TCM) with an outer BCH code, which can greatly improve the error correction performance compared with the current design practice that uses BCH codes only. The key is that TCM can leverage the multi-level storage characteristic to reduce the memory bit error rate and hence relieve the burden on the outer BCH code, at no cost in extra redundant memory cells. The superior performance of such concatenated BCH-TCM coding systems for MLC NAND flash memories is demonstrated through computer simulations. A modified TCM demodulation approach is further proposed to improve the tolerance to static memory cell defects. We also address the associated practical implementation issues when using either a single-page or a multi-page programming strategy, and demonstrate the silicon implementation efficiency through application-specific integrated circuit design at the 65 nm node.

  • Stochastic Networked Computation

    Page(s): 1421 - 1432

    This paper presents the stochastic networked computation (SNC) paradigm for designing robust and energy-efficient systems-on-a-chip in nanoscale process technologies, in which robust computation is treated as a statistical estimation problem. The benefits of SNC are demonstrated by employing it to design an energy-efficient and robust pseudonoise-code acquisition system for the wireless CDMA2000 standard (http://www.3gpp2.org). Simulations in IBM's 130-nm CMOS process show that, compared with a conventional architecture at a typical false-alarm rate of 5%, the SNC-based architecture enhances the average probability of detection (PDet) in the presence of process variations by two to three orders of magnitude, reduces power by 31%-39%, and reduces the variation in PDet by one to two orders of magnitude. SNC performance in the presence of voltage overscaling and across technology nodes (90, 65, 45, and 32 nm) is also studied.

  • A Fault-Tolerant Interconnect Mechanism for NMR Nanoarchitectures

    Page(s): 1433 - 1446

    Redundancy techniques, such as N-tuple modular redundancy (NMR), have been widely used to correct faulty behavior of components and achieve high reliability. Almost all redundancy-based strategies rely on majority voting. The voter, therefore, becomes a critical unit for the correct operation of any NMR system. In this paper, we propose a voterless fault-tolerant strategy to implement a robust NMR system design. We show that, using a novel fault-tolerant communication mechanism, namely logic code division multiple access, we can transfer data with extremely low error rates among N modules and completely eliminate the need for a centralized voter unit. Such a highly reliable strategy is vital for future nanosystems, in which a high defect rate is expected. Experimental results are also reported to verify the concept, clarify the design procedure, and measure the system's reliability.
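
    For context, the conventional majority voter that this abstract treats as the critical unit can be sketched as follows. This is only an illustrative baseline, not the paper's logic code division multiple access mechanism; the function names `majority_vote` and `bitwise_majority3` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of the N modules."""
    if not outputs:
        raise ValueError("need at least one module output")
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority means more than half of the modules agree.
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority among module outputs")
    return value


def bitwise_majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority, the usual voter equation for N = 3."""
    return (a & b) | (b & c) | (a & c)
```

    A single fault in this voter corrupts the result regardless of how many modules are correct, which is the weakness a voterless scheme is meant to avoid.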

  • An Area-Efficient and Low-Power Multirate Decoder for Quasi-Cyclic Low-Density Parity-Check Codes

    Page(s): 1447 - 1460

    Quasi-cyclic low-density parity-check (QC-LDPC) codes are widely applied in digital broadcast and communication systems. However, the decoders remain difficult to put into practice because of their large area and high power consumption, especially in wireless mobile devices. This paper presents an improved all-purpose multirate iterative decoder architecture for QC-LDPC codes, which can substantially reduce area and power. The architecture implements the normalized min-sum algorithm, rearranges the original two-phase message-passing flow, and adopts an efficient quantization method for the second minimum absolute values, an optimized storing scheme for the position indexes and signs, and an elaborate clock gating technique for substantive memories and registers. It is also configurable for any regular or irregular QC-LDPC code, and can be easily tuned to different code rates and code word lengths. The chip is fabricated in an SMIC 0.18-μm six-metal-layer standard CMOS technology. It attains a throughput of 104.5 Mb/s and dissipates an average power of 486 mW at 125 MHz with 15 decoding iterations. The core area is only 9.76 mm². The chip has been applied in the China digital terrestrial/television multimedia broadcasting system.
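
    As a rough illustration of why the first minimum, the second minimum, the position index, and the signs are the quantities worth storing, the textbook normalized min-sum check-node update can be sketched as below. This is a floating-point sketch, not the paper's quantized hardware datapath; the function name `check_node_update` and the normalization factor `alpha = 0.75` are assumptions chosen for illustration.

```python
from typing import List


def check_node_update(llrs: List[float], alpha: float = 0.75) -> List[float]:
    """Normalized min-sum update for one check node.

    llrs holds the incoming variable-to-check messages; the return value
    holds the outgoing check-to-variable message for each edge.
    """
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two incoming messages")
    mags = [abs(x) for x in llrs]
    # Smallest magnitude, its position, and the second-smallest magnitude.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)
    # Product of all incoming signs (zero is treated as positive).
    total_sign = 1
    for x in llrs:
        if x < 0:
            total_sign = -total_sign
    out = []
    for i, x in enumerate(llrs):
        # Each edge excludes its own input: the edge holding the minimum
        # uses the second minimum, every other edge uses the minimum.
        mag = min2 if i == min1_idx else min1
        own_sign = -1 if x < 0 else 1
        out.append(alpha * total_sign * own_sign * mag)
    return out
```

    Because every outgoing magnitude is either the minimum or the second minimum, a decoder only needs those two values, the index of the minimum, and one sign bit per edge to reproduce all check-to-variable messages.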

  • Process-Variation Resilient and Voltage-Scalable DCT Architecture for Robust Low-Power Computing

    Page(s): 1461 - 1470

    In this paper, we present a novel discrete cosine transform (DCT) architecture that allows aggressive voltage scaling for low power dissipation, even under process parameter variations, with minimal overhead compared to existing techniques. Under a scaled supply voltage and/or variations in process parameters, any delay errors arise only from the long paths, which are designed to contribute less to output quality. The proposed architecture allows a graceful degradation in the peak SNR (PSNR) under aggressive voltage scaling as well as extreme process variations. Results show that even under large process variations (±3σ around mean threshold voltage) and aggressive supply voltage scaling (at 0.88 V, while the nominal voltage is 1.2 V for a 90-nm technology), there is a gradual degradation of image quality with considerable power savings (71% at PSNR of 23.4 dB) for the proposed architecture, when compared to existing implementations in a 90-nm process technology.
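
    The PSNR figure quoted above follows the standard definition, 10·log10(peak² / MSE). A minimal sketch of that calculation is given below, assuming 8-bit samples (peak value 255) supplied as flat sequences; the function name `psnr` is hypothetical and the abstract does not specify the paper's exact evaluation setup.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

    Lower PSNR means larger reconstruction error, so graceful degradation corresponds to PSNR falling gradually rather than collapsing as the supply voltage is scaled down.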

  • Design Space Exploration for Efficient Resource Utilization in Coarse-Grained Reconfigurable Architecture

    Page(s): 1471 - 1482

    Coarse-grained reconfigurable architectures (CGRAs) aim to achieve both high performance and flexibility. In addition, power consumption matters if a reconfigurable architecture is to serve as a competitive processing core in embedded systems. However, existing reconfigurable architectures require too much area and power. In this paper, we propose a new design space exploration flow that optimizes a CGRA to reduce area and power while enhancing performance for the digital signal processing (DSP) application domain. It reduces the array size through efficient arrangement of array components and customization of their interconnection, exploiting input patterns belonging to the DSP application domain. The design flow is based on pipelining and sharing of area/delay-critical resources in the processing element array. Experimental results show that, for DSP applications, the proposed approach reduces area by up to 36.75%, average execution time by 36.78%, and average power by 31.85% when compared with the existing CGRA architecture.

  • Correlation-Based Rectangular Encoding

    Page(s): 1483 - 1492

    In this paper, a technique is presented for improving the compression achieved with any linear decompressor by adding a small nonlinear decoder that exploits bit-wise and pattern-wise correlations present in test vectors. The proposed nonlinear decoder has a regular and compact structure, and allows continuous-flow decompression. It has a very important feature, which is that its design does not depend on the test data. This simplifies the design flow and allows the decoder to be reused when testing multiple cores on a chip. Experimental results show that combining a linear decompressor with the small nonlinear decoder proposed here significantly improves the overall compression.

  • Variable-Latency Floating-Point Multipliers for Low-Power Applications

    Page(s): 1493 - 1497

    This paper proposes a variable-latency floating-point multiplier architecture that is compliant with IEEE 754-1985 and suitable for low-power applications. The architecture splits the significand multiplier into upper and lower parts, and predicts the carry bit, sticky bit, and significand product from the upper part. When the prediction is correct, the computation of the lower part is disabled and the rounding operation is significantly simplified, so that the floating-point multiplication consumes less power and completes early while maintaining the correct IEEE rounding and product. Experimental results show that the proposed multiplier can save respectable power and energy when compared to the fast multiplier, at the expense of slight area and acceptable delay overheads.

  • Design and Implementation of a Sort-Free K-Best Sphere Decoder

    Page(s): 1497 - 1501

    This paper describes the design and very-large-scale integration (VLSI) architecture for a 4 × 4 breadth-first K-best multiple-input-multiple-output (MIMO) decoder using a 64 quadrature-amplitude modulation (QAM) scheme. A novel sort-free approach to path extension, together with quantized metrics, results in a high-throughput VLSI architecture with lower power and area consumption compared to state-of-the-art published systems. Functionality is confirmed via a field-programmable gate array (FPGA) implementation on a Xilinx Virtex II Pro FPGA. Comparisons of simulation and measurements are given, and FPGA utilization figures are provided. Finally, VLSI architectural tradeoffs are explored for a synthesized application-specific IC (ASIC) implementation in a 65-nm CMOS technology.

  • Search for T-VLSI Editor-in-Chief

    Page(s): 1502
    Freely Available from IEEE
  • IEEE Foundation [advertisement]

    Page(s): 1503
    Freely Available from IEEE
  • Scitopia.org [advertisement]

    Page(s): 1504
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): C4
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu