IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Issue 12, December 2011

  • Table of contents

    Page(s): C1 - C4
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
  • CS-CMOS: A Low-Noise Logic Family for Mixed Signal SoCs

    Page(s): 2141 - 2148

    Managing switching noise in mixed-signal systems fabricated on a single chip is becoming increasingly challenging and requires substantial overheads in both area and power. Existing logic families that minimize switching-noise generation, such as current-steering logic (CSL) and current-balanced logic (CBL), require considerably more power than traditional CMOS implementations. We present a new logic family called current-steering CMOS (CS-CMOS), obtained by a simple modification that keeps the core CMOS structure intact to preserve its most attractive features. This family not only reduces the switching noise by a factor of ten but also delivers five times higher speed than CSL and CBL for the same power consumption. Experimental results comparing 15-stage ring oscillators configured in the CSL and CS-CMOS families and fabricated in a 0.18 μm process show energy-delay products of 6.5 fJ·ns and 1.52 fJ·ns, respectively. The usefulness of this new logic family is further demonstrated by synthesizing a cell library of CS-CMOS gates and using it to simulate benchmark circuits, a decimation filter, and a frequency divider.

  • Efficient Modulo 2^n+1 Multipliers

    Page(s): 2149 - 2157

    Area-time efficient modulo (2^n+1) multipliers are proposed. In the new modulo multipliers, the result and one operand use the weighted representation, while the other operand uses the diminished-1 representation. By using radix-4 Booth recoding, the new multipliers reduce the number of partial products to n/2 for n even and (n+1)/2 for n odd, except for one correction term. Although one correction term is used, the extra circuitry it requires is very simple. The architecture of the new multipliers consists of an inverted end-around-carry carry-save adder tree and one diminished-1 adder. The new multipliers receive full inputs and avoid (n+1)-bit circuits. Analytical and experimental results indicate that the new multipliers offer higher operation speed and more compact area than the existing efficient solutions.
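
    The arithmetic the abstract relies on can be checked in software. The sketch below is a behavioral model under stated assumptions, not the authors' architecture: it Booth-recodes the operand that arrives in diminished-1 form (zero flagged separately, a nonzero value v stored as v - 1), sums the partial products exactly, and reduces the result using 2^n ≡ -1 (mod 2^n+1). The function names are hypothetical, and the paper's correction term and inverted end-around-carry adder tree are not modeled.

```python
def booth_radix4_digits(y: int, width: int) -> list[int]:
    """Radix-4 Booth digits (LSB first) of an unsigned `width`-bit integer y.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] with y[-1] = 0, so every digit
    lies in {-2, -1, 0, 1, 2} and sum(d_i * 4**i) == y.
    """
    bits = [(y >> i) & 1 for i in range(width)] + [0, 0]  # zero-extend: y is unsigned
    digits = []
    prev = 0  # y[-1]
    for i in range(0, width + 1, 2):
        b0, b1 = bits[i], bits[i + 1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def mod_2n_plus_1(x: int, n: int) -> int:
    """Reduce 0 <= x <= 2**(2n) modulo 2**n + 1."""
    lo = x & ((1 << n) - 1)  # low n bits
    hi = x >> n              # high part, at most 2**n for the products used here
    # x = hi * 2**n + lo and 2**n ≡ -1, so x ≡ lo - hi (mod 2**n + 1)
    return (lo - hi) % ((1 << n) + 1)


def from_diminished_1(d: int, is_zero: bool) -> int:
    """Diminished-1 decoding: zero is flagged separately, nonzero v is stored as v - 1."""
    return 0 if is_zero else d + 1


def modmul_weighted_by_dim1(a: int, d: int, b_is_zero: bool, n: int) -> int:
    """(a * b) mod (2**n + 1) with a weighted (0..2**n) and b in diminished-1 form."""
    b = from_diminished_1(d, b_is_zero)      # 0..2**n, so up to n+1 bits
    digits = booth_radix4_digits(b, n + 1)   # one partial product per Booth digit
    acc = sum(dg * (a << (2 * k)) for k, dg in enumerate(digits))  # exact a*b
    return mod_2n_plus_1(acc, n)
```

    Because a carry-out of weight 2^n is congruent to -1, it has to be re-injected at the least significant position with inverted sense rather than simply added back, which is the idea behind the inverted end-around-carry adders the abstract mentions.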

  • Transition-Code Based Linearity Test Method for Pipelined ADCs With Digital Error Correction

    Page(s): 2158 - 2169

    A transition-code based method is proposed to reduce the linearity testing time of pipelined analog-to-digital converters (ADCs). By employing specific architecture-dependent rules, only a few specific transition codes need to be measured to accomplish an accurate linearity test of a pipelined ADC. In addition, a simple digital design-for-test (DfT) circuit is proposed to help correctly detect the transition codes corresponding to each pipelined stage. With the help of the DfT circuit, the proposed method can be applied to pipelined ADCs with digital error correction (DEC). Experimental results from a fabricated chip show that the proposed method achieves high test accuracy for a 12-bit, 1.5-bit/stage pipelined ADC with different nonlinearities while measuring only 9.3% of the samples required by the conventional histogram-based method.
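
    Whichever subset of transition codes is measured, the linearity figures are ultimately computed from transition levels. A minimal sketch, assuming all transition levels are already known (for example, reconstructed by the paper's architecture-dependent rules, which are not modeled here), using the usual endpoint-fit definitions of DNL and INL in LSBs; the function name is hypothetical:

```python
def dnl_inl(transitions: list[float]) -> tuple[list[float], list[float]]:
    """DNL and INL (in LSB) from transition levels, using an endpoint fit.

    transitions[k] is the input level at which the output code changes from
    k to k+1; an N-bit ADC has 2**N - 1 such levels.
    dnl[k] describes code k+1, whose width is transitions[k+1] - transitions[k].
    """
    m = len(transitions)
    lsb = (transitions[-1] - transitions[0]) / (m - 1)  # average code width
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1 for k in range(m - 1)]
    # INL: deviation of each transition from the straight line through the endpoints
    inl = [(transitions[k] - transitions[0]) / lsb - k for k in range(m)]
    return dnl, inl
```

    Moving a single transition level by a quarter LSB widens one code and narrows its neighbor by the same amount, so the DNL entries for those two codes become +0.25 and -0.25 while the INL at that transition becomes 0.25.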

  • A Novel Test Flow for One-Time-Programming Applications of NROM Technology

    Page(s): 2170 - 2183

    NROM technology is an emerging non-volatile memory technology that provides high data density at low fabrication cost. In this paper, we propose a novel test flow for one-time-programming (OTP) applications using NROM bit cells. Unlike the conventional test flow, the proposed flow applies repair analysis in the package test instead of the wafer test, and hence creates an opportunity to reuse bit cells originally identified as defective to represent values in the OTP application. Thus, the proposed test flow can reduce the number of bit cells to be repaired and further improve the yield. We also propose an efficient and effective estimation scheme to predict the probability that a part will be successfully repaired before it is packaged. This estimate can be used to decide whether a part should be packaged, so that the total profit of the proposed test flow is optimized. A series of experiments is conducted to demonstrate the effectiveness, efficiency, and feasibility of the proposed test flow.

  • A Built-in Self-Diagnosis and Repair Design With Fail Pattern Identification for Memories

    Page(s): 2184 - 2194

    With the advent of deep-submicrometer VLSI technology, the capacity and performance of semiconductor memory chips are increasing drastically. This advantage also makes it harder to maintain good yield. Diagnosis and redundancy repair methodologies are thus becoming increasingly important for memories, including the embedded memories that are common in system chips. In this paper, we propose an efficient memory diagnosis and repair scheme based on fail-pattern identification. The proposed diagnosis scheme can distinguish among row, column, and word faults, and subsequently applies Huffman coding to compress the fault syndromes. This approach reduces the amount of data that must be transmitted from the chip under test to the automatic test equipment (ATE) without losing fault information. It also simplifies the analysis that has to be performed on the ATE. The proposed redundancy repair scheme is assisted by the fail-pattern identification approach and a flexible redundancy structure. The area overhead of our built-in self-repair (BISR) design is reasonable, and our repair scheme uses less redundancy than other redundancy schemes under the same repair-rate requirement. Experimental results show that the area overhead of the BISR design is only 4.1% for an 8K × 64 memory and is inversely proportional to the memory size.
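
    The syndrome-compression step can be illustrated with an ordinary Huffman code. In this sketch the fail-pattern classification is assumed to have already produced one label per fault; the labels "bit", "word", "row", and "col" and their frequencies are hypothetical, as are the function names. The code gives the shortest codeword to the most frequent syndrome, and decoding recovers the stream exactly, which is the lossless property the abstract relies on.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free Huffman code (symbol -> bit string) from observed symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs a one-bit codeword
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the codewords of a syndrome stream."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert encode(); works because the code is prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out
```

    For example, a stream of 12 single-bit, 5 word, 2 row, and 1 column syndromes encodes into 31 bits, compared with 40 bits for a fixed 2-bit encoding of the four labels.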

  • Performance and Cost Tradeoffs in Metal-Programmable Structured ASICs (MPSAs)

    Page(s): 2195 - 2208

    As process technology scales, the design effort and nonrecurring engineering (NRE) costs associated with developing integrated circuits are becoming extremely high. Structured ASICs offer one solution to these problems. However, to realize their full potential, their performance and cost advantages, architectures, and CAD must be fully understood, which we believe can lead to wider adoption of structured ASICs. In this paper, we take a step in this direction and investigate the area, delay, power, and cost tradeoffs in metal-programmable structured ASICs (MPSAs). In particular, we quantify the impact of the number of user-defined (custom) metal mask layers on these metrics. Results indicate that, for lowest cost, the number of custom layers should be minimized, especially for small die sizes (e.g., less than 100 mm²). Delay and power, however, can be improved by a few additional custom layers. With two custom metal layers, MPSAs can be 2× to 10× cheaper than cell-based ICs (CBICs).

  • Gate Leakage Impact on Full Open Defects in Interconnect Lines

    Page(s): 2209 - 2220

    An interconnect full open defect breaks the connection between the driver and the gate terminals of downstream transistors, creating a floating line. The behavior of floating lines is known to depend on several factors, namely parasitic capacitances to neighboring structures, the transistor capacitances of the downstream gate(s), and trapped charge. In nanometer CMOS technologies, the reduction of oxide thickness leads to a significant increase in gate tunneling leakage. This new phenomenon influences the behavior of circuits with interconnect full open defects: floating lines can no longer be considered electrically isolated and undergo transient evolutions, reaching a steady state determined by the technology and by the downstream interconnect and gate topology. The occurrence of such defects and the impact of gate tunneling leakage are expected to increase in the future. In this work, interconnect full open defects affecting nanometer CMOS technologies are analyzed, and the defective logic response of downstream gates after reaching the steady state is predicted. Experimental evidence of this behavior is presented for circuits fabricated in 180 nm and 65 nm CMOS technologies. Technology trends show that the impact of gate leakage currents is expected to increase in future technologies.

  • A Reconfigurable FIR Filter Architecture to Trade Off Filter Performance for Dynamic Power Consumption

    Page(s): 2221 - 2228

    This paper presents an architectural approach to the design of a low-power reconfigurable finite impulse response (FIR) filter. The approach is well suited to cases where the filter order is fixed and not changed for particular applications, and the proposed architecture enables an efficient tradeoff between power savings and filter performance. In general, FIR filters exhibit large amplitude variations in input data and coefficients. Considering the amplitudes of both the filter coefficients and the inputs, the proposed FIR filter dynamically changes the filter order. Mathematical analysis of the power savings and filter performance degradation, together with experimental results, shows that the proposed approach achieves significant power savings without seriously compromising filter performance. Power savings reach up to 41.9% with minor performance degradation, and the area overhead of the proposed scheme is less than 5.3% compared with the conventional approach.
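
    One way to picture dynamically changing the filter order is to skip a tap's multiplication whenever both its coefficient and the current input sample are small, since that product can only perturb the output slightly. The sketch below does this with two hypothetical thresholds (coef_th, data_th); the paper's actual amplitude-detection logic and threshold selection are not reproduced here, and the function names are illustrative.

```python
def fir_sample(h, window, coef_th, data_th):
    """One output sample; window[k] holds x[n-k]. Returns (y, taps_skipped)."""
    y, skipped = 0.0, 0
    for hk, xk in zip(h, window):
        if abs(hk) < coef_th and abs(xk) < data_th:
            skipped += 1  # |hk * xk| < coef_th * data_th: drop this multiply
            continue
        y += hk * xk
    return y, skipped


def fir_filter(h, x, coef_th=0.0, data_th=0.0):
    """Filter sequence x from a zero initial state. Returns (outputs, total_taps_skipped)."""
    ys, total_skipped = [], 0
    for n in range(len(x)):
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(len(h))]
        y, s = fir_sample(h, window, coef_th, data_th)
        ys.append(y)
        total_skipped += s
    return ys, total_skipped
```

    With both thresholds at zero no tap is ever skipped and the code reduces to a direct-form FIR filter; each skipped tap changes the output by less than coef_th × data_th, which bounds the accuracy cost of the saved multiplications.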

  • An Analytical Model Relating FPGA Architecture to Logic Density and Depth

    Page(s): 2229 - 2242

    This paper presents an analytical model that relates FPGA architectural parameters to the logic size and depth of an FPGA implementation. In particular, the model relates the lookup-table size, the cluster size, and the number of inputs per cluster to the amount of logic that can be packed into each lookup table and cluster, the number of used inputs per cluster, and the depth of the circuit after technology mapping and clustering. Comparison with experimental results shows that our model has good accuracy. We illustrate how the model can be used in FPGA architectural investigations to complement the experimental approach. The model's accuracy, combined with the simple form of its equations, makes it a powerful tool for FPGA architects to better understand and guide the development of future FPGA architectures.

  • Power Delivery for Multicore Systems

    Page(s): 2243 - 2255

    As the industry moves from single-core to multicore processors, new challenges arise in reliably designing and analyzing power delivery for such systems. We study various assignments of workloads to cores and their effect on global power supply noise and ground bounce. We provide a detailed analysis of single and multiple cores and develop analytical formulas that capture the power supply noise and ground bounce of the system. We introduce metrics to estimate the amount of noise propagated from core to core and propose a supply-noise-aware workload assignment method. Our experiments show that timing constraints can be significantly affected if workloads are not properly assigned.

  • Clock Distribution Networks in 3-D Integrated Systems

    Page(s): 2256 - 2266

    3-D integration is an important technology that addresses fundamental limitations of on-chip interconnects. However, several design issues related to 3-D circuits, such as multiplane synchronization, need to be addressed. This paper presents a comparison of three 3-D clock distribution network topologies. Good agreement is shown between the modeled and experimental results for a 3-D test circuit composed of three device planes, and successful operation of the test circuit at 1.4 GHz is demonstrated. Clock skew, clock delay, signal slew, and power dissipation measurements for the different clock topologies are also provided. The measurements suggest that each topology has advantages and disadvantages with respect to different performance criteria. Consequently, the proper choice of a clock distribution network is dictated not by a single design objective but by the overall 3-D system design requirements, including the availability of resources and the number of bonded planes.

  • 0.84 ps Resolution Clock Skew Measurement via Subsampling

    Page(s): 2267 - 2275

    An all-digital on-chip clock skew measurement system based on subsampling is presented. The clock nodes are subsampled with a near-frequency asynchronous sampling clock, producing beat signals that are skewed in the same proportion but on a much larger time scale. The beat signals are then suitably masked to extract only the skews of the rising edges of the clock signals. We propose a histogram of the arithmetic difference of the beat signals, which decouples the minimum measurable skew from clock jitter and, unlike the conventional XOR-based histogram approach, allows skews arbitrarily close to zero to be measured with a precision limited largely by measurement time. We also show analytically that the proposed approach yields an unbiased estimate of skew. Measured results from a 65 nm delay-measurement front-end indicate that, for an input skew range of ±1 fan-out-of-4 (FO4) delay, a ±3σ resolution of 0.84 ps can be obtained with an integral error of 0.65 ps. We also demonstrate experimentally that frequency modulation of the sampling clock maintains precision, indicating the robustness of the technique to jitter, and show how frequency modulation helps restore precision when the clocks are rationally related.
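
    The time magnification behind subsampling can be seen in a jitter-free model. If the sampling period is T + δ, successive samples step through the clock phase by δ, so a skew s between two clocks shows up as roughly s/δ samples of disagreement per beat period. The sketch below sums the signed difference b1 - b2 (not the XOR) over one beat period and rescales by δ. All names and values are hypothetical, integer femtoseconds are used to avoid rounding, and the rising-edge mask is an idealized phase window rather than the mask the paper derives from the beat signals.

```python
def clock_level(t_fs: int, period_fs: int, rise_fs: int) -> int:
    """Ideal 50%-duty clock that rises at rise_fs (mod period_fs)."""
    return 1 if (t_fs - rise_fs) % period_fs < period_fs // 2 else 0


def estimate_skew_fs(skew_fs: int, period_fs: int = 1_000_000, delta_fs: int = 500) -> int:
    """Estimate the skew of clock 2 relative to clock 1 by subsampling.

    The sampling period is period_fs + delta_fs, so the sampled phase advances
    by delta_fs per sample and one beat period spans period_fs // delta_fs
    samples. Assumes |skew_fs| < period_fs / 4 and no jitter.
    """
    n_beat = period_fs // delta_fs
    acc = 0
    for k in range(n_beat):
        t = k * (period_fs + delta_fs)
        phase = t % period_fs  # equals k * delta_fs for k < n_beat
        # Idealized mask: keep only samples within +/- T/4 of the reference rising
        # edge, so falling edges (which would cancel the difference) are excluded.
        if (phase + period_fs // 4) % period_fs >= period_fs // 2:
            continue
        b1 = clock_level(t, period_fs, 0)        # beat signal of clock 1
        b2 = clock_level(t, period_fs, skew_fs)  # beat signal of clock 2
        acc += b1 - b2                           # signed difference keeps the skew's sign
    return acc * delta_fs                        # map the sample count back to real time
```

    The result is quantized to δ in this noise-free model. Keeping the sign is what lets positive and negative skews be told apart; counting XOR disagreements would return the same magnitude for +s and -s.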

  • Full-Spectrum Spatial–Temporal Dynamic Thermal Analysis for Nanometer-Scale Integrated Circuits

    Page(s): 2276 - 2289

    This paper presents NanoHeat, a multi-resolution, full-chip dynamic integrated circuit (IC) thermal analysis solution that is accurate down to the scale of individual gates and transistors. NanoHeat unifies nanoscale and macroscale dynamic thermal physics models to accurately characterize heat transport from the gate and transistor level up to the chip-package level. A non-homogeneous Arnoldi-based analysis method is proposed for accurate and fast dynamic thermal analysis through a unified adaptive spatial-temporal refinement process. NanoHeat covers the complete spatial and temporal modeling spectrum of IC thermal analysis. Its accuracy and efficiency are evaluated, and it has been applied to a large industrial design. The importance of fine-grained temperature information is illustrated by using NanoHeat to estimate temperature-dependent negative-bias temperature instability (NBTI) effects. NanoHeat has been implemented and publicly released for free academic and personal use.

  • A Generalized Conflict-Free Memory Addressing Scheme for Continuous-Flow Parallel-Processing FFT Processors With Rescheduling

    Page(s): 2290 - 2302

    This paper presents a generalized conflict-free memory addressing scheme for memory-based fast Fourier transform (FFT) processors with parallel arithmetic processing units made up of radix-2^q multi-path delay commutators (MDC). The proposed addressing scheme supports continuous-flow operation with minimum shared-memory requirements. To improve throughput, parallel high-radix processing units are employed. We prove that a conflict-free memory access solution satisfying the constraints of continuous-flow, variable-size, higher-radix, and parallel-processing operation indeed exists. In addition, a rescheduling technique for twiddle-factor multiplication is developed to reduce hardware complexity and enhance hardware efficiency. Results show that the proposed processor achieves high utilization and efficiency, supports flexible configuration for various FFT sizes, and requires fewer computation cycles than conventional radix-2/radix-4 memory-based FFT processors.
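
    For intuition about what conflict-free means here, the classic digit-sum bank assignment is a useful baseline. With N = r^p points stored in r memory banks, a radix-r butterfly at stage s reads the r addresses that differ only in base-r digit s. Assigning address a to bank (sum of its base-r digits) mod r places those r operands in r distinct banks at every stage. This well-known single-unit scheme is offered only as a sketch; it is not the paper's generalized scheme for parallel radix-2^q MDC units, variable FFT sizes, or twiddle-factor rescheduling, and the function names are hypothetical.

```python
def base_digits(a: int, radix: int, p: int) -> list[int]:
    """Base-radix digits of address a, least significant first."""
    return [(a // radix**i) % radix for i in range(p)]


def bank_of(a: int, radix: int, p: int) -> int:
    """Memory bank of address a: digit sum modulo the radix."""
    return sum(base_digits(a, radix, p)) % radix


def index_in_bank(a: int, radix: int) -> int:
    """Row inside the bank: drop the least significant digit."""
    return a // radix


def butterfly_groups(stage: int, radix: int, p: int):
    """Yield the operand addresses of each radix-r butterfly at `stage`.

    The operands of one butterfly differ only in base-r digit `stage`.
    """
    step = radix**stage
    for base in range(radix**p):
        if (base // step) % radix == 0:
            yield [base + j * step for j in range(radix)]
```

    Changing one digit by 0, 1, ..., r-1 changes the digit sum by the same amounts, so the banks are distinct modulo r; and since the lowest digit alone already spreads over all banks, (bank_of, index_in_bank) is a one-to-one mapping of addresses to memory locations.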

  • MZZ-HVS: Multiple Sleep Modes Zig-Zag Horizontal and Vertical Sleep Transistor Sharing to Reduce Leakage Power in On-Chip SRAM Peripheral Circuits

    Page(s): 2303 - 2316

    Recent studies show that peripheral circuits (including decoders, wordline drivers, and input and output drivers) constitute a large portion of cache leakage. In addition, as technology migrates to smaller geometries, the leakage contribution to total power consumption increases faster than that of dynamic power, indicating that leakage will be a major contributor to overall power consumption. This paper presents zig-zag share, a circuit technique that reduces leakage in SRAM peripherals by putting them into a low-leakage sleep mode. The zig-zag share circuit is further extended to enable multiple sleep modes for cache peripherals, each representing a tradeoff between leakage reduction and wakeup delay. Using architectural control of the multiple sleep modes, an integrated technique called MSleep-Share is proposed and applied to L1 and L2 caches. MSleep-Share relies on cache miss information to guide the leakage control mechanism and switch the power mode of the peripheral circuits. The results show leakage reductions of up to 40× in deeply pipelined SRAM peripheral circuits, with small area overhead and small additional delay. This leakage reduction translates into up to 85% overall leakage reduction in on-chip memories.
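
    The tradeoff between leakage reduction and wakeup delay suggests a simple selection rule: pick the deepest sleep mode whose wakeup latency fits in the idle window predicted from a cache miss. The mode table, its leakage ratios, and its wakeup latencies below are hypothetical placeholders, and the policy is an illustrative guess at how miss information could drive mode selection, not MSleep-Share itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepMode:
    name: str
    leakage: float      # leakage relative to active mode (hypothetical values)
    wakeup_cycles: int  # cycles to return to active mode (hypothetical values)


# Hypothetical mode table: deeper sleep saves more leakage but wakes up more slowly.
MODES = (
    SleepMode("active", 1.00, 0),
    SleepMode("shallow", 0.50, 1),
    SleepMode("medium", 0.20, 3),
    SleepMode("deep", 0.05, 8),
)


def choose_mode(idle_cycles: int, modes=MODES) -> SleepMode:
    """Lowest-leakage mode whose wakeup latency fits inside the idle window."""
    feasible = [m for m in modes if m.wakeup_cycles <= idle_cycles]
    return min(feasible, key=lambda m: m.leakage)


def on_cache_miss(next_level_latency: int) -> SleepMode:
    """Hypothetical policy: while a miss is serviced by the next level, this
    level's peripherals are idle for roughly next_level_latency cycles."""
    return choose_mode(next_level_latency)
```

    Because the wakeup latency is hidden inside the miss-service time, the deeper mode costs no extra delay in this idealized model; a misprediction of the idle window is where the real design has to trade leakage against performance.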

  • Fast Bit-Parallel Shifted Polynomial Basis Multiplier Using Weakly Dual Basis Over GF(2^m)

    Page(s): 2317 - 2321

    In this paper, we present a new method to compute the Mastrovito matrix for GF(2^m) generated by an arbitrary irreducible polynomial, using the weakly dual basis of a shifted polynomial basis. In particular, we derive explicit formulas of the proposed multiplier for a special type of irreducible pentanomial, x^m + x^{k_3} + x^{k_2} + x^{k_1} + 1 with k_1 < k_2 ≤ (k_1 + k_3)/2 < k_3 < min(2k_1, m/2). As a result, the time complexity of the proposed multiplier matches or outperforms previously known results. On the other hand, the number of XOR gates of the proposed multiplier is slightly greater than the best known results.
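
    Any bit-parallel Mastrovito or shifted-polynomial-basis design needs a golden reference to be verified against. The sketch below is a plain bit-serial polynomial-basis multiplier with reduction by f(x) = x^m + Σ x^k + 1; it is not the weakly-dual-basis method of the paper, and the function name is hypothetical. The AES field polynomial x^8 + x^4 + x^3 + x + 1 is used only as a familiar pentanomial; it does not satisfy the special-type condition above.

```python
def gf2m_mul(a: int, b: int, m: int, ks: list[int]) -> int:
    """Multiply a, b in GF(2^m), polynomial basis, f(x) = x^m + sum(x^k for k in ks) + 1.

    a and b are bit vectors of polynomial coefficients (bit i = coefficient of x^i),
    both assumed to be reduced, i.e. < 2**m.
    """
    f = (1 << m) | 1
    for k in ks:
        f |= 1 << k
    result = 0
    for i in reversed(range(m)):  # Horner's rule over the bits of b, MSB first
        result <<= 1              # multiply the partial result by x
        if (result >> m) & 1:     # degree reached m: subtract (XOR) f(x)
            result ^= f
        if (b >> i) & 1:          # add a when coefficient i of b is 1
            result ^= a
    return result
```

    For instance, gf2m_mul(0x57, 0x83, 8, [4, 3, 1]) returns 0xC1, the product worked out in the AES specification (FIPS-197), so the function can serve as a check for a bit-parallel implementation.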

  • Low-Cost Dynamic Compensation Scheme for Local Clocks of Next Generation High Performance Microprocessors

    Page(s): 2322 - 2325

    We propose a low-cost scheme for in-field dynamic compensation of undesired skew and duty-cycle variations in the local clocks of high-performance microprocessors and high-end ASICs. Compared with alternative approaches, our solution features lower power consumption, smaller compensation error, and lower or comparable area overhead.

  • Low Cost Hardware Implementation of Logarithm Approximation

    Page(s): 2326 - 2330

    A low-cost, high-speed architecture for computing the binary logarithm is proposed. It is based on the Mitchell approximation with two correction stages: a piecewise linear interpolation with power-of-two slopes and a truncated mantissa, and a LUT-based correction stage that corrects the piecewise interpolation error. The architecture has been implemented in an FPGA device, and the results are compared with other low-cost architectures, showing that it requires less area while achieving high speed.
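
    Mitchell's approximation reads the bits below the leading one directly as the fractional part: for 2^k ≤ x < 2^(k+1), log2(x) ≈ k + (x/2^k - 1), with a worst-case error of about 0.086. The sketch below adds a single LUT correction addressed by the top bits of that fraction. It omits the paper's piecewise-linear stage with power-of-two slopes, uses floating point instead of fixed point, and the table contents (the Mitchell error at each segment midpoint) and function names are assumptions.

```python
import math


def mitchell_log2(x: int) -> float:
    """Mitchell: log2(x) ≈ k + (x / 2**k - 1), k = position of the leading one (x >= 1)."""
    k = x.bit_length() - 1
    return k + (x - (1 << k)) / (1 << k)


def build_lut(lut_bits: int) -> list[float]:
    """Hypothetical table: Mitchell error log2(1 + f) - f at each segment midpoint."""
    n = 1 << lut_bits
    return [math.log2(1 + (i + 0.5) / n) - (i + 0.5) / n for i in range(n)]


def corrected_log2(x: int, lut: list[float], lut_bits: int) -> float:
    """Mitchell approximation plus a LUT correction addressed by the top fraction bits."""
    k = x.bit_length() - 1
    frac = (x - (1 << k)) / (1 << k)   # bits below the leading one, in [0, 1)
    idx = int(frac * (1 << lut_bits))  # top lut_bits bits of the fraction
    return k + frac + lut[idx]
```

    With a 16-entry table the residual error is bounded by the slope of log2(1 + f) - f times half a segment width, roughly 0.014, compared with about 0.086 for the uncorrected approximation.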

  • Defect-Oriented LFSR Reseeding to Target Unmodeled Defects Using Stuck-at Test Sets

    Page(s): 2330 - 2335

    Defect screening is a major challenge for nanoscale CMOS circuits, especially since many defects cannot be accurately modeled using known fault models. The effectiveness of test methods for such circuits can therefore be measured in terms of the coverage obtained for unmodeled faults. In this paper, we present a new defect-oriented dynamic LFSR reseeding technique for test-data compression. The proposed technique is based on a new output-deviation metric for grading stuck-at patterns derived from LFSR seeds. We show that, compared to standard compression-driven dynamic LFSR reseeding and a previously proposed deviation-based method, higher defect coverage is obtained using stuck-at test cubes without any loss of compression.
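
    In LFSR reseeding, the tester stores only a short seed and the on-chip LFSR expands it into a full scan pattern; a seed is usable if the expanded pattern agrees with every care bit of a test cube. The sketch below enumerates all such seeds for a tiny 8-bit LFSR by brute force (practical tools solve a linear system over GF(2) instead). The tap positions and function names are illustrative, and the output-deviation metric the paper uses to grade candidate patterns is not modeled; candidate_seeds simply returns the pool such a metric would choose from.

```python
TAPS = (0, 2, 3, 4)  # illustrative feedback taps; not claimed to be maximal-length


def lfsr_expand(seed: int, taps, width: int, length: int) -> list[int]:
    """Fibonacci LFSR: shift out `length` bits starting from a `width`-bit seed."""
    state = seed
    out = []
    for _ in range(length):
        out.append(state & 1)       # scan-in bit = LSB of the state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1  # XOR of the tapped stages
        state = (state >> 1) | (fb << (width - 1))
    return out


def matches(pattern, cube: str) -> bool:
    """True if the pattern agrees with every care bit ('0'/'1') of the test cube."""
    return all(c == "X" or int(c) == bit for bit, c in zip(pattern, cube))


def candidate_seeds(cube: str, taps=TAPS, width: int = 8) -> list[int]:
    """All nonzero seeds whose expansion covers the cube (brute force, tiny widths only)."""
    return [s for s in range(1, 1 << width)
            if matches(lfsr_expand(s, taps, width, len(cube)), cube)]
```

    When several seeds cover the same cube, their expanded patterns differ in the don't-care positions; that freedom is what a defect-oriented grading metric can exploit without changing the compression ratio.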

  • A Routing-Aware ILS Design Technique

    Page(s): 2335 - 2338

    The Illinois Scan Architecture (ILS) consists of several scan path segments and is useful for reducing test application time and test data volume for high-density chips. In this paper, we propose a layout-aware as well as coverage-driven ILS design scheme. The partitioning of the flip-flops into ILS segments is determined by their geometric locations, whereas the set of flip-flops to be placed in parallel is determined by minimizing the incompatibility relations among the corresponding bits of a test set, to enhance fault coverage in broadcast mode. As a result, the number of serial test patterns is also reduced.
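
    In broadcast mode, the same scan-in bit is shifted into position j of every ILS segment, so a test cube can be applied in broadcast mode only if no position receives conflicting care bits; otherwise it must be applied serially. The sketch below checks those conflicts for a given segment assignment, which is the kind of incompatibility a coverage-driven ordering tries to minimize. The cube format (a string over 0, 1, X indexed by flip-flop) and the function names are assumptions.

```python
def broadcast_compatible(cube: str, segments: list[list[int]]) -> bool:
    """True if `cube` can be shifted in broadcast mode.

    segments: ordered flip-flop ids of each scan segment; position j of every
    segment receives the same broadcast bit.
    """
    depth = max(len(seg) for seg in segments)
    for j in range(depth):
        care = {cube[seg[j]] for seg in segments if j < len(seg)} - {"X"}
        if len(care) > 1:  # a 0 and a 1 would need the same scan-in bit
            return False
    return True


def serial_patterns(cubes: list[str], segments: list[list[int]]) -> list[str]:
    """Cubes that must fall back to serial mode for this segment assignment."""
    return [c for c in cubes if not broadcast_compatible(c, segments)]
```

    For example, with cubes "0101", "0110", and "0X10", segments [[0, 1], [2, 3]] leave two cubes for serial mode, while reordering the second segment to [3, 2] leaves only one.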

  • High Productivity Circuit Methodology for a Semi-Custom Embedded Processor

    Page(s): 2339 - 2342

    A high-productivity methodology for implementing custom circuit blocks inside a synthesized microprocessor is presented. A cell-based design style and script-based preroutes are used with a commercial place and route tool, enabling fast layout for custom blocks. The design methodology supports domino logic, register files, and random logic designs. A semi-custom MIPS microprocessor targeted for the high-volume consumer electronics market is implemented with this methodology.

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): 2343
  • IEEE Foundation [advertisement]

    Page(s): 2344

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
The American University in Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu