Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 5 • Date May 2013

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Design of Ternary Logic Combinational Circuits Based on Quantum Dot Gate FETs

    Page(s): 793 - 806

    In this paper, we discuss logic circuit designs using the circuit model of three-state quantum dot gate field effect transistors (QDGFETs). QDGFETs produce one intermediate state between the two normal stable ON and OFF states due to a change in the threshold voltage over this range. We have developed a simplified circuit model that accounts for this intermediate state. Interesting logic can be implemented using QDGFETs. In this paper, we discuss the designs of various two-input three-state QDGFET gates, including NAND- and NOR-like operations, and their application in different combinational circuits such as decoders, multipliers, and adders. The increased number of states in three-state QDGFETs increases the bit-handling capability of the device, allowing more bits to be handled at a time with fewer circuit elements.
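    The abstract above mentions two-input, three-state NAND- and NOR-like gates but does not give their truth tables. As a rough illustration only, the sketch below assumes the common ternary convention in which inversion is `2 - a`, NAND is the inverted minimum, and NOR is the inverted maximum. The level encoding and the function names (`t_not`, `t_nand`, `t_nor`) are hypothetical and may differ from the gates defined in the paper.

```python
from itertools import product

# Assumed encoding: 0 = OFF, 1 = intermediate state, 2 = ON.
LEVELS = (0, 1, 2)


def t_not(a):
    """Ternary inversion under the assumed convention."""
    return 2 - a


def t_nand(a, b):
    """NAND-like gate: inverted minimum of the two inputs."""
    return t_not(min(a, b))


def t_nor(a, b):
    """NOR-like gate: inverted maximum of the two inputs."""
    return t_not(max(a, b))


def truth_table(gate):
    """Map every pair of input levels to the gate output."""
    return {(a, b): gate(a, b) for a, b in product(LEVELS, repeat=2)}


if __name__ == "__main__":
    for name, gate in (("NAND-like", t_nand), ("NOR-like", t_nor)):
        print(name)
        for a in LEVELS:
            print("  ", [gate(a, b) for b in LEVELS])
```

    A two-input ternary gate has nine input combinations rather than four, which is the sense in which each signal carries more information than a binary one.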
  • Parametric DFM Solution for Analog Circuits: Electrical-Driven Hotspot Detection, Analysis, and Correction Flow

    Page(s): 807 - 820

    As VLSI technology pushes into advanced nodes, designers and foundries have exposed a hitherto insignificant set of yield problems. To combat yield failures, the semiconductor industry has deployed new tools and methodologies commonly referred to as design for manufacturing (DFM). Most of the early DFM efforts concentrated on catastrophic failures, or physical DFM problems. Recently, there has been an increased emphasis on parametric yield issues, referred to as electrical-DFM (e-DFM). In this paper, we present a complete e-DFM solution that detects, analyzes, and fixes electrical hotspots (e-hotspots) within an analog circuit design that are caused by different process variations. Novel algorithms are proposed to implement the engines used to develop this solution. The solution is examined on a 130-nm parametrically-failing level shifter circuit, and verified with silicon wafer measurements that confirm the existence of parametric yield issues in the design. Additional experiments are applied on a 65-nm industrial operational amplifier and voltage-controlled oscillator (VCO). E-hotspot devices with a 27.7% variation in dc current are identified. After fixing the e-hotspots, the dc current variation in these devices is dramatically reduced to 7%, which meets the designer acceptance criteria, while preserving the original VCO specifications.
  • Unified Capture Scheme for Small Delay Defect Detection and Aging Prediction

    Page(s): 821 - 833

    Small delay defect (SDD) and aging-induced circuit failure are both prominent reliability concerns for nanoscale integrated circuits. Faster-than-at-speed testing is effective for SDD detection in manufacturing testing, which is always implemented by designing a suite of test signal generation circuits on the chip. Meanwhile, the integration of online aging sensors is becoming attractive for monitoring aging-induced delay degradation at runtime. These design requirements, if implemented in separate ways, will increase the complexity of a reliable design and consume more die area. In this paper, a unified capture scheme is proposed to generate programmable clock signals for the detection of both SDDs and circuit aging. Our motivation arises from the observation that SDD detection and online aging prediction both need to capture circuit response ahead of the functional clock. The proposed aging-resistant design method enables the offline test circuit to be reused in online operations. Reversed short channel effect is also exploited to make the underlying circuit resilient to process variations. The proposed scheme is validated by intensive HSPICE simulations. Experimental results demonstrate the effectiveness in terms of low area, power, and performance overheads.
  • Novel MIMO Detection Algorithm for High-Order Constellations in the Complex Domain

    Page(s): 834 - 847

    A novel detection algorithm with an efficient VLSI architecture featuring efficient operation over infinite complex lattices is proposed. The proposed design results in the highest throughput, the lowest latency, and the lowest energy compared to the complex-domain VLSI implementations to date. The main innovations are a novel complex-domain means of expanding/visiting the intermediate nodes of the search tree on demand, rather than exhaustively, as well as a new distributed sorting scheme to keep track of the best candidates at each search phase. Its support of unbounded infinite lattice decoding distinguishes the present method from previous K-Best strategies and also allows its complexity to scale sublinearly with the modulation order. Since the expansion and sorting cores are data-driven, the architecture is well suited for a pipelined parallel VLSI implementation. The proposed algorithm is used to fabricate a 4×4, 64-QAM complex multiple-input-multiple-output detector in a 0.13-μm CMOS technology, achieving a clock rate of 417 MHz with the core area of 340 kgates. The chip test results prove that the fabricated design can sustain a throughput of 1 Gb/s with energy efficiency of 110 pJ/bit, the best numbers reported to date.
  • High-Throughput 0.13-μm CMOS Lattice Reduction Core Supporting 880 Mb/s Detection

    Page(s): 848 - 861

    This paper presents the first silicon-proven implementation of a lattice reduction (LR) algorithm, which achieves maximum likelihood diversity. The implementation is based on a novel hardware-optimized variant of the Lenstra, Lenstra, and Lovász (LLL) algorithm, which significantly reduces its complexity by replacing all the computationally intensive LLL operations (multiplication, division, and square root) with low-complexity additions and comparisons. The proposed VLSI design utilizes a pipelined architecture that produces an LR-reduced matrix set every 40 cycles, which is a 60% reduction compared to current state-of-the-art LR field-programmable gate array implementations. The 0.13-μm CMOS LR core presented in this paper achieves a clock rate of 352 MHz, and thus is capable of sustaining a throughput of 880 Mb/s for 64-QAM multiple-input-multiple-output detection with superior performance while dissipating 59.4 mW at 1.32 V supply.
  • Study of Through-Silicon-Via Impact on the 3-D Stacked IC Layout

    Page(s): 862 - 874

    The technology of through-silicon vias (TSVs) enables fine-grained integration of multiple dies into a single 3-D stack. TSVs occupy significant silicon area due to their sheer size, which has a great effect on the quality of 3-D integrated chips (ICs). Whereas well-managed TSVs alleviate routing congestion and reduce wirelength, excessive or ill-managed TSVs increase the die area and wirelength. In this paper, we investigate the impact of the TSV on the quality of 3-D IC layouts. Two design schemes, namely TSV co-placement (irregular TSV placement) and TSV site (regular TSV placement), and accompanying algorithms to find and optimize locations of gates and TSVs are proposed for the design of 3-D ICs. Two TSV assignment algorithms are also proposed to enable the regular TSV placement. Simulation results show that the wirelength of 3-D ICs is shorter than that of 2-D ICs by up to 25%.
  • Design of Hardware Function Evaluators Using Low-Overhead Nonuniform Segmentation With Address Remapping

    Page(s): 875 - 886

    In piecewise function evaluation with polynomial approximation, nonuniform segmentation can effectively reduce the size of lookup tables for some arithmetic functions compared to uniform segmentation approaches, at the cost of an extra segment address (index) encoder that results in area and delay overhead. Also, it is observed that nonuniform segmentation reflects a design tradeoff between the ROM size and the area cost of the subsequent arithmetic computation hardware. In this paper, we propose a new nonuniform segmentation method that searches for the optimal segmentation scheme with the goal of minimized ROM, total area, or delay. For some high-variation arithmetic functions, the proposed segmentation method achieves significant area reduction compared to the uniform segmentation method. We also demonstrate the design tradeoff among uniform and nonuniform segmentation, and degree-one and degree-two polynomial approximations, with respect to precision ranging from 12 to 32 bits for the elementary function of reciprocal.
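    To make the idea of piecewise polynomial evaluation concrete, the following sketch builds a lookup table for the reciprocal on [1, 2) using uniform segmentation and a degree-one (chord) approximation per segment. It is a floating-point software baseline under stated assumptions, not the paper's nonuniform method or its hardware design; the function names (`build_table`, `evaluate`, `max_error`) and the choice of chord coefficients are illustrative.

```python
def build_table(num_segments, lo=1.0, hi=2.0):
    """Per-segment coefficients (c0, c1) so that 1/x ~ c0 + c1 * (x - x_start).

    Uniform segmentation: every segment has the same width.
    The chord through the segment endpoints is used as the line.
    """
    width = (hi - lo) / num_segments
    table = []
    for i in range(num_segments):
        x0 = lo + i * width
        x1 = x0 + width
        c0 = 1.0 / x0
        c1 = (1.0 / x1 - 1.0 / x0) / width  # slope of the chord
        table.append((c0, c1))
    return table


def evaluate(table, x, lo=1.0, hi=2.0):
    """Look up the segment for x and evaluate its degree-one polynomial."""
    n = len(table)
    width = (hi - lo) / n
    idx = min(int((x - lo) / width), n - 1)  # segment address (index)
    c0, c1 = table[idx]
    return c0 + c1 * (x - (lo + idx * width))


def max_error(table, samples=10000, lo=1.0, hi=2.0):
    """Largest absolute error against 1/x over evenly spaced sample points."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / samples
        worst = max(worst, abs(evaluate(table, x, lo, hi) - 1.0 / x))
    return worst


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, "segments, max abs error:", max_error(build_table(n)))
```

    With uniform segments the error is largest near x = 1, where the reciprocal curves most. That is the motivation for nonuniform segmentation: narrower segments where the function varies quickly and wider ones elsewhere, at the cost of a more complex index encoder.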
  • Statistical Functional Yield Estimation and Enhancement of CNFET-Based VLSI Circuits

    Page(s): 887 - 900

    Carbon nanotube field effect transistors (CNFETs) show great promise as extensions to silicon CMOS. However, imperfections, which are mainly related to the carbon nanotube (CNT) growth process, result in metallic and nonuniform CNTs, leading to significant functional yield reduction. This paper presents a comprehensive technique for statistical functional yield estimation and enhancement of CNFET-based VLSI circuits. Based on experimental data extracted from aligned CNTs, we propose a compact statistical model to estimate the failure probability of a CNFET. Using the proposed failure model, we show that enhancing the CNT synthesis process alone cannot achieve acceptable functional yield for upcoming CNFET-based VLSI circuits. We propose a technique which is based on replacing each transistor by series-parallel transistor structures to reduce the failure probability of CNFETs in the presence of metallic and nonuniform CNTs. The technique is adapted to use single directional independence, which is inherent in aligned CNTs, to enhance the functional yield as validated by theoretical analysis and simulation results. Tradeoffs between failure probability reduction and design overheads such as area and current drive are explored. As demonstrated by extensive simulation results, the proposed technique achieves 80% functional yield in CNFET technology at the cost of 7.5× area and 34% current drive overheads if the CNT density and the fraction of semiconducting CNTs are improved to 200 CNTs per μm and 99.99%, respectively.
  • Theoretical Modeling of Elliptic Curve Scalar Multiplier on LUT-Based FPGAs for Area and Speed

    Page(s): 901 - 909

    This paper uses a theoretical model to approximate the delay of different characteristic-two primitives used in an elliptic curve scalar multiplier architecture (ECSMA) implemented on k-input lookup table (LUT)-based field-programmable gate arrays. Approximations are used to determine the delay of the critical paths in the ECSMA. This is then used to theoretically estimate the optimal number of pipeline stages and the ideal placement of each stage in the ECSMA. This paper illustrates suitable scheduling for performing point addition and doubling in a pipelined data path of the ECSMA. Finally, detailed analyses, supported with experimental results, are provided to design the fastest scalar multiplier over generic curves. Experimental results for GF(2^163) show that, when the ECSMA is suitably pipelined, the scalar multiplication can be performed in only 9.5 μs on a Xilinx Virtex V. Notably, the design has an area which is significantly smaller than other reported high-speed designs, which is due to the better LUT utilization of the underlying field primitives.
  • Architecture for Real-Time Nonparametric Probability Density Function Estimation

    Page(s): 910 - 920

    Adaptive systems are increasing in importance across a range of application domains. They rely on the ability to respond to environmental conditions, and hence real-time monitoring of statistics is a key enabler for such systems. Probability density function (PDF) estimation has been applied in numerous domains; computational limitations, however, have meant that proxies are often used. Parametric estimators attempt to approximate PDFs based on fitting data to an expected underlying distribution, but this is not always ideal. The density function can be estimated by rescaling a histogram of sampled data, but this requires many samples for a smooth curve. Kernel-based density estimation can provide a smoother curve from fewer data samples. We present a general architecture for nonparametric PDF estimation, using both histogram-based and kernel-based methods, which is designed for integration into streaming applications on field-programmable gate arrays (FPGAs). The architecture employs heterogeneous resources available on modern FPGAs within a highly parallelized and pipelined design, and is able to perform real-time computation on sampled data at speeds of over 250 million samples per second, while extracting a variety of statistical properties.
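    The two nonparametric estimators named in the abstract can be sketched in a few lines of software. The example below is a plain-Python illustration, not the FPGA architecture: `histogram_pdf` rescales bin counts so the estimate integrates to one, and `kde_pdf` sums a Gaussian kernel centred on each sample. The Gaussian kernel, the fixed bandwidth, and the function names are assumptions made for illustration.

```python
import math


def histogram_pdf(samples, lo, hi, bins):
    """Density estimate from a rescaled histogram over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            # Clamp guards against floating-point rounding at the top edge.
            counts[min(int((s - lo) / width), bins - 1)] += 1
    n = len(samples)
    # Dividing by n * width turns counts into a density.
    return [c / (n * width) for c in counts]


def kde_pdf(samples, points, bandwidth):
    """Gaussian kernel density estimate evaluated at each query point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in points
    ]


if __name__ == "__main__":
    data = [0.1, 0.2, 0.2, 0.7]
    print("histogram:", histogram_pdf(data, 0.0, 1.0, 4))
    grid = [k / 10 for k in range(11)]
    print("kde:", [round(v, 3) for v in kde_pdf(data, grid, 0.1)])
```

    The histogram estimate is piecewise constant, so it needs many samples to look smooth, whereas the kernel estimate is smooth by construction but costs one kernel evaluation per sample per query point.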
  • 1.2-mW Online Learning Mixed-Mode Intelligent Inference Engine for Low-Power Real-Time Object Recognition Processor

    Page(s): 921 - 933

    Object recognition is computationally intensive and it is challenging to meet 30-f/s real-time processing demands under sub-watt low-power constraints of mobile platforms even for a heterogeneous many-core architecture. In this paper, an intelligent inference engine (IIE) is proposed as a hardware controller for a many-core processor to satisfy the requirements of low-power real-time object recognition. The IIE exploits learning and inference capabilities of the neurofuzzy system by adopting the versatile adaptive neurofuzzy inference system (VANFIS) with the proposed hardware-oriented learning algorithm. Using the programmable VANFIS, the IIE can configure its hardware topology adaptively for different target classifications. Its architecture contains analog/digital mixed-mode neurofuzzy circuits for updating online parameters to increase attention efficiency of the object recognition process. It is implemented in a 0.13-μm CMOS process and achieves 1.2-mW power consumption with 94% average classification accuracy within 1-μs operation delay. The 0.765-mm² IIE achieves 76% attention efficiency and reduces power and processing delay of the 50-mm² image processor by up to 37% and 28%, respectively, when 96% recognition accuracy is achieved.
  • Current-Comparison-Based Domino: New Low-Leakage High-Speed Domino Circuit for Wide Fan-In Gates

    Page(s): 934 - 943

    In this paper, a new domino circuit is proposed, which has a lower leakage and higher noise immunity without dramatic speed degradation for wide fan-in gates. The technique which is utilized in this paper is based on comparison of mirrored current of the pull-up network with its worst case leakage current. The proposed circuit technique decreases the parasitic capacitance on the dynamic node, yielding a smaller keeper for wide fan-in gates to implement fast and robust circuits. Thus, the contention current and consequently power consumption and delay are reduced. The leakage current is also decreased by exploiting the footer transistor in diode configuration, which results in increased noise immunity. Simulation results of wide fan-in gates designed using a 16-nm high-performance predictive technology model demonstrate 51% power reduction and at least 2.41× noise-immunity improvement at the same delay compared to the standard domino circuits for 64-bit OR gates.
  • Symbolic Moment Computation for Statistical Analysis of Large Interconnect Networks

    Page(s): 944 - 957

    The shrinking technology feature size and dense large-scale integration make process variation a challenging issue directly confronting the latest design automation tools. Process variation causes severe variation in interconnect networks, including very large-scale integrated interconnect structures, such as clock trees, clock mesh, power-ground networks, and other wiring structures in 3-D integrated circuits. The traditional moment computation techniques are only partly useful for analyzing such variational problems; however, their computational efficiency cannot meet the quickly rising needs, such as statistical analysis. This paper presents a novel symbolic moment calculator (SMC) for variational interconnect analysis. The moment calculator is constructed in a regular data structure that incorporates binary decision diagrams for data storage and computation. Given an interconnect circuit, such a computation diagram has to be constructed only once and can be repeatedly invoked for computation of moments with varying parameter values. Also, the SMC is friendly to interconnect synthesis in that it can be incrementally modified according to the modifications made to the circuit structure. Applications of the SMC for fast moment computation, sensitivity analysis, and statistical timing analysis are addressed. Significant efficiency is demonstrated compared with other existing methods.
  • Uncorrelated Power Supply Noise and Ground Bounce Consideration for Test Pattern Generation

    Page(s): 958 - 970

    Power supply noise and ground bounce can cause considerable path delay variations. Capturing the worst case power supply noise at a gate level is not a sufficient indicator for measuring the worst case path delay. Furthermore, path delay variations depend on multiple parameters such as input stimuli, cell placement, switching frequency, and available decoupling capacitors. All these variables obscure the relationship between supply noise and path delay and make the selection of stimuli for worst case path delay a difficult task during test pattern generation. In this paper, we utilize power supply noise and ground bounce distribution along with physical design data to generate test patterns for capturing worst case path delay. We propose accurate closed-form mathematical models for capturing the effect of power supply noise and ground bounce on path delay. These models are based on modified nodal analysis formulation of power and ground networks, where current waveforms are obtained from levelized simulation and cell library characterization. The proposed test pattern generation flow is a simulated-annealing-based iterative process, which utilizes mathematical models for capturing the impact of supply noise on path delay for a given input pattern. We perform experiments on ITC'99 benchmarks and show that path delay variation can be considerable if test patterns are not properly selected.
  • C-Based Complex Event Processing on Reconfigurable Hardware

    Page(s): 971 - 974

    This brief presents an efficient complex event-processing framework, designed to process a large number of sequential events on field-programmable gate arrays (FPGAs). Unlike conventional structured query language based approaches, our approach features logic automation constructed with a new C-based event language that supports regular expressions on the basis of C functions, so that a wide variety of event-processing applications can be efficiently mapped to FPGAs. Evaluations on an FPGA-based network interface card show that we can achieve 12.3 times better event-processing performance than CPU software in a financial trading application.
  • Reduced-Complexity LCC Reed–Solomon Decoder Based on Unified Syndrome Computation

    Page(s): 974 - 978

    Reed-Solomon (RS) codes are widely used in digital communication and storage systems. Algebraic soft-decision decoding (ASD) of RS codes can obtain significant coding gain over hard-decision decoding (HDD). Compared with other ASD algorithms, the low-complexity Chase (LCC) decoding algorithm has lower computational complexity with similar or higher coding gain. Besides using a complicated interpolation algorithm, LCC decoding can also be implemented based on the HDD. However, the previous syndrome computation for 2^η test vectors and the key equation solver (KES) in the HDD require long latency and considerable hardware. In this brief, a unified syndrome computation algorithm and the corresponding architecture are proposed. Cooperating with the KES in the reduced inversion-free Berlekamp-Massey algorithm, the reduced-complexity LCC RS decoder is sped up by 57% and its area is reduced to 62% compared with the original design for η = 3.
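    The count of 2^η test vectors in the abstract follows from a Chase-type construction. The sketch below assumes the usual form of that construction, which the abstract itself does not spell out: at each of the η least reliable symbol positions the decoder tries either the hard-decision symbol or the second most likely symbol, and every other position keeps its hard decision. The function name `lcc_test_vectors` and the sample symbol values are hypothetical.

```python
from itertools import product


def lcc_test_vectors(hard, second, unreliable):
    """Enumerate Chase-type test vectors.

    hard:       hard-decision symbol for every position
    second:     second most likely symbol for every position
    unreliable: indices of the eta least reliable positions
    Returns 2**eta vectors, one per combination of choices.
    """
    vectors = []
    for choice in product((False, True), repeat=len(unreliable)):
        vec = list(hard)
        for pos, use_second in zip(unreliable, choice):
            if use_second:
                vec[pos] = second[pos]
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    hard = [5, 1, 7, 3, 0, 2]
    second = [4, 6, 2, 9, 8, 1]
    for vec in lcc_test_vectors(hard, second, unreliable=[1, 3, 4]):
        print(vec)
```

    For η = 3 this yields eight vectors that differ only in three positions, which suggests why computing their syndromes jointly rather than one vector at a time can save work.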
  • Subthreshold Dual Mode Logic

    Page(s): 979 - 983

    In this brief, we introduce a novel low-power dual mode logic (DML) family, designed to operate in the subthreshold region. The proposed logic family can be switched between static and dynamic modes of operation according to system requirements. In static mode, the DML gates feature very low-power dissipation with moderate performance, while in dynamic mode they achieve higher performance, albeit with increased power dissipation. This is achieved with a simple and intuitive design concept. SPICE and Monte Carlo simulations compare performance, power dissipation, and robustness of the proposed DML gates to their CMOS and domino counterparts in the 80-nm process. Measurements of an 80-nm test chip are presented in order to prove the proposed concept.
  • Power Network Optimization Based on Link Breaking Methodology

    Page(s): 983 - 987

    A link breaking methodology is introduced to reduce voltage degradation within mesh structured power distribution networks. The resulting power distribution network combines a single power distribution network to lower the network impedance, and multiple networks to reduce noise coupling among the circuits. Since the sensitivity to supply voltage variations within a power distribution network can vary among various circuits, the proposed methodology reduces the voltage drop at the more sensitive circuits, while penalizing the less sensitive circuits. Each circuit can behave as an aggressor as well as a victim. The methodology utilizes two matrices describing the aggressiveness and sensitivity of a circuit. The proposed methodology is evaluated for multiple case studies, demonstrating a reduction in the voltage drop in the sensitive circuits. Based on these case studies, the voltage is improved by 5% at those nodes with the highest sensitivity. The voltage prior to application of the link breaking methodology is 96% of the ideal power supply voltage. Lowering the noise on the power network enhances the maximum operating frequency by 16% by utilizing the proposed link breaking methodology. The link breaking methodology has also been compared with a multiple voltage domain methodology, achieving 7% improvement in operating frequency.
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems information for authors

    Page(s): 988
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu