
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 3 • Date March 2012

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • A Low-Power Gigabit CMOS Limiting Amplifier Using Negative Impedance Compensation and Its Application

    Page(s): 393 - 399

    This paper presents a low-power gigabit limiting amplifier (LA) for optical receivers. The LA employs the negative impedance compensation technique, not only to enhance the gain and bandwidth simultaneously but also to allow low-voltage, low-power operation. Test chips of the LA were implemented in a standard 0.18-μm CMOS process. They demonstrated 2.5-Gb/s operation with the following results: 40-dB gain, 0.053-UI rms jitter for 2³¹−1 pseudorandom bit sequence inputs, 9.5-mVpp input sensitivity for a bit error rate (BER) of 10⁻¹², and 5.2-mW power dissipation from a single 1.2-V supply. The chip core occupies only 0.25 × 0.1 mm². The proposed LA was then used to realize a low-power gigabit optical receiver, fabricated in the same 0.18-μm CMOS technology. The measured results for the optical receiver chip are 132.6-dBΩ transimpedance gain, 2.7-GHz bandwidth even with a large 1.5-pF input parasitic capacitance, −16-dBm optical sensitivity for 10⁻¹² BER, and 51-mW power dissipation from a single 1.8-V supply. The whole chip occupies 1.75 × 0.45 mm².

  • A 3–5 GHz Current-Reuse gm-Boosted CG LNA for Ultrawideband in 130 nm CMOS

    Page(s): 400 - 409

    This paper presents a low-power CMOS transconductance (gm)-boosted common-gate (CG) ultrawideband (UWB) low-noise amplifier (LNA) architecture. It operates in the 3–5 GHz range and employs a current-reuse technique. The proposed UWB CG LNA uses a common-source (CS) amplifier as the gm-boosting stage, which shares its bias current with the CG amplifying stage. A detailed mathematical analysis of the LNA is carried out, and the design tradeoffs are analyzed. The LNA was designed and fabricated in the IBM 130-nm CMOS process. Within the passband, its input return loss (S11) ranges from −8.4 to −40 dB and its output return loss (S22) from −14 to −15 dB. The LNA exhibits an almost flat forward power gain (S21) of 13 dB, reverse isolation (S12) between −55 and −40 dB, and a noise figure (NF) between 3.5 and 4.5 dB. The complete circuit (with output buffer) draws only 3.4 mW from a 1-V supply.

  • Test Pattern Generation of Relaxed n-Detect Test Sets

    Page(s): 410 - 423

    Defect-oriented testing of digital circuits is difficult, but detecting each modeled fault more than once has been shown to yield high defect coverage. Previous work shows that such test sets, known as multiple-detect or n-detect test sets, are of higher quality for a number of common defects in deep-submicrometer technologies. Methods for multiple-detect test generation usually produce fully specified test patterns. This limits their use in important applications such as low-power test and test compression. This work proposes a systematic methodology for identifying a large number of bits that can be left unspecified in a multiple-detect (n-detect) test set while preserving the original fault coverage. The experimental results demonstrate that the number of specified bits in n-detect test sets, even compact ones, can be significantly reduced without any impact on the n-detect property. In many cases, the size of the test set is also reduced.

  • Test Pattern Generation for Multiple Aggressor Crosstalk Effects Considering Gate Leakage Loading in Presence of Gate Delays

    Page(s): 424 - 436

    Decreasing process geometries and increasing operating frequencies have made VLSI circuits more susceptible to signal-integrity-related failures. Capacitive crosstalk on long signal nets is of particular concern. A typical long net is capacitively coupled to multiple aggressors and also tends to have multiple fan-outs. Gate leakage current originating in the fan-out receivers terminates in the driver, causing a shift in the driver output voltage. This effect becomes more prominent as the gate oxide is scaled more aggressively. Thus, in nanoscale CMOS circuits, noise margin is eroded both by aggressor crosstalk noise and by gate leakage loading from fan-outs. In this paper, we present an automatic test pattern generation solution that uses 0-1 integer linear programming. It maximizes the cumulative voltage noise at a given victim net due to crosstalk and leakage loading while propagating the fault effect to an observation point. The target ISCAS benchmark circuits are assumed to have unit gate delays. The results demonstrate both the viability of the approach and the need to consider both noise sources in signal integrity analysis. Pattern pairs generated by this technique are useful both for manufacturing test and for signal integrity verification.

  • Cogeneration of Fast Motion Estimation Processors and Algorithms for Advanced Video Coding

    Page(s): 437 - 448

    This paper presents a flexible and scalable motion estimation processor, suited to FPGA implementation, that supports the processing requirements of high-definition (HD) video coded with H.264/Advanced Video Coding (AVC). Unlike most previous work, our core is optimized to execute all existing fast block-matching algorithms. We show that these algorithms match or exceed the inter-frame prediction performance of traditional full-search approaches at the HD resolutions commonly in use today. Using our development tools, such algorithms can be described in a C-style syntax that is compiled into our custom instruction set. We show that different HD sequences exhibit different characteristics, which calls for a flexible and configurable solution when targeting embedded applications. Our core and toolset support this by allowing designers to modify the number of functional units to be instantiated. All processor instances remain binary compatible, so the motion estimation algorithm does not need to be recompiled. Through this optimization process, the hardware microarchitecture can be matched to the processing requirements of the selected motion estimation algorithm, leading to a very efficient implementation.
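
    The contrast between full search and fast block matching can be illustrated with a small software model. The sketch below makes several assumptions: 8-bit luma frames held in NumPy arrays, a sum-of-absolute-differences (SAD) cost, and the small diamond search as one representative fast algorithm. The function names, block size, and search range are illustrative. This is not the paper's instruction set or C-style toolchain.

```python
import numpy as np

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    a = cur[by:by + n, bx:bx + n].astype(np.int32)
    b = ref[by + dy:by + dy + n, bx + dx:bx + dx + n].astype(np.int32)
    return int(np.abs(a - b).sum())

def valid(ref, bx, by, dx, dy, n, search_range):
    """True if (dx, dy) lies in the search window and the block stays inside `ref`."""
    h, w = ref.shape
    return (abs(dx) <= search_range and abs(dy) <= search_range
            and 0 <= by + dy <= h - n and 0 <= bx + dx <= w - n)

def full_search(cur, ref, bx, by, n=8, search_range=4):
    """Exhaustive search: evaluate every candidate and return (best SAD, dx, dy)."""
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            if valid(ref, bx, by, dx, dy, n, search_range):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

def small_diamond_search(cur, ref, bx, by, n=8, search_range=4):
    """Fast search: move to the best of the four diamond neighbours until the
    centre is a local minimum. Far fewer SADs, but it can stop in a local minimum."""
    best = (sad(cur, ref, bx, by, 0, 0, n), 0, 0)
    while True:
        centre = best
        for ox, oy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cx, cy = centre[1] + ox, centre[2] + oy
            if valid(ref, bx, by, cx, cy, n, search_range):
                cost = sad(cur, ref, bx, by, cx, cy, n)
                if cost < best[0]:
                    best = (cost, cx, cy)
        if best == centre:
            return best
```

    Full search is optimal within its window by construction. The diamond search trades that guarantee for a much smaller number of SAD evaluations, which is the tradeoff a programmable core lets designers tune per sequence.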

  • Toeplitz Matrix Approach for Binary Field Multiplication Using Quadrinomials

    Page(s): 449 - 458

    Subquadratic space complexity multipliers have recently been proposed for binary fields defined by irreducible trinomials and some specific pentanomials. Other irreducible polynomials can also be used with such multipliers. In particular, nearly all-one polynomials (NAOPs) appear to be better than pentanomials. For improved efficiency, multiplication modulo an NAOP is performed modulo a quadrinomial whose degree is one higher than that of the original NAOP. In this paper, we present a Toeplitz matrix-vector product (TMVP)-based approach for multiplication modulo a quadrinomial and obtain a fully parallel multiplier with subquadratic space complexity. The TMVP-based approach is also useful for designing sequential multipliers: we present two such multipliers that process a two-bit digit every clock cycle. Field-programmable gate-array implementations of the proposed sequential and fully parallel multipliers for GF(2¹⁶³) are also presented.
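
    The subquadratic cost of TMVP-based multipliers comes from splitting a Toeplitz matrix-vector product recursively. The sketch below shows only the generic two-way split over GF(2), written in Python for clarity. It assumes the n×n Toeplitz matrix is stored by its 2n−1 diagonal values. It does not reproduce the paper's quadrinomial-specific reduction or its sequential multipliers.

```python
def tmvp_schoolbook(d, v):
    """Reference O(n^2) Toeplitz matrix-vector product over GF(2).

    The n x n Toeplitz matrix T is stored by its 2n-1 diagonal values:
    T[i][j] = d[i - j + n - 1], so d[0] is the top-right entry.
    """
    n = len(v)
    return [sum(d[i - j + n - 1] & v[j] for j in range(n)) & 1 for i in range(n)]

def xor(a, b):
    """Element-wise addition over GF(2); also adds two Toeplitz matrices by diagonals."""
    return [x ^ y for x, y in zip(a, b)]

def tmvp_split(d, v):
    """Two-way split over GF(2): with T = [[T1, T0], [T2, T1]] and v = [v0; v1],
    T v = [P2 + P0; P2 + P1] where P0 = (T0 + T1) v1, P1 = (T1 + T2) v0 and
    P2 = T1 (v0 + v1). Three half-size products replace four."""
    n = len(v)
    if n % 2:                    # odd sizes, including n = 1: fall back to schoolbook
        return tmvp_schoolbook(d, v)
    m = n // 2
    t0 = d[0:2 * m - 1]          # top-right block
    t1 = d[m:3 * m - 1]          # both diagonal blocks
    t2 = d[2 * m:4 * m - 1]      # bottom-left block
    v0, v1 = v[:m], v[m:]
    p0 = tmvp_split(xor(t0, t1), v1)
    p1 = tmvp_split(xor(t1, t2), v0)
    p2 = tmvp_split(t1, xor(v0, v1))
    return xor(p2, p0) + xor(p2, p1)
```

    For n = 2^k, the recursion uses 3^k bit products instead of 4^k. This is the source of the O(n^log2(3)) space complexity.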

  • NCTU-GR: Efficient Simulated Evolution-Based Rerouting and Congestion-Relaxed Layer Assignment on 3-D Global Routing

    Page(s): 459 - 472

    The increasing complexity of interconnect designs has raised the importance of global routing research. This holds whether the aim is high routability (low overflow) or fast path search that reports wirelength estimates to a placer. This work presents two routing techniques in a high-performance congestion-driven 2-D global router: circular fixed-ordering monotonic routing, and evolution-based rip-up and rerouting with a two-stage cost function. We also propose two efficient via-minimization methods for dynamic-programming-based layer assignment: congestion relaxation by layer shifting, and rip-up and reassignment. Experimental results show that our router achieves routability and wirelength similar to those of the first two winning routers of the ISPD 2008 Routing Contest, while routing 1.05× and 18.47× faster, respectively. Moreover, the proposed layer assignment achieves fewer vias and shorter wirelength than congestion-constrained layer assignment (COLA).
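
    Monotonic routing restricts a two-pin connection to paths that never move away from the target. Because of this restriction, the least-congested such path can be found by dynamic programming over the routing grid. The sketch below assumes a hypothetical per-cell congestion cost on a 2-D grid and routes from the corner (0, 0) to the opposite corner. It omits the circular fixed ordering, the two-stage cost function, and rip-up and rerouting.

```python
def monotonic_route(cost):
    """Minimum-cost monotonic path from (0, 0) to (rows-1, cols-1).

    cost[r][c] is an assumed congestion cost for using grid cell (r, c).
    Each step moves one cell in +r or +c. Returns (total cost, list of cells).
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    best = [[inf] * cols for _ in range(rows)]
    prev = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                best[r][c] = cost[r][c]
                continue
            # a cell can only be entered from below (r-1) or from the left (c-1)
            if r > 0 and best[r - 1][c] < best[r][c]:
                best[r][c], prev[r][c] = best[r - 1][c], (r - 1, c)
            if c > 0 and best[r][c - 1] < best[r][c]:
                best[r][c], prev[r][c] = best[r][c - 1], (r, c - 1)
            best[r][c] += cost[r][c]
    # backtrack from the target to recover the path
    path, node = [], (rows - 1, cols - 1)
    while node is not None:
        path.append(node)
        node = prev[node[0]][node[1]]
    return best[rows - 1][cols - 1], path[::-1]
```

    For example, `monotonic_route([[1, 9, 1], [1, 9, 1], [1, 1, 1]])` detours around the congested middle column and returns a total cost of 5.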

  • SKB-Tree: A Fixed-Outline Driven Representation for Modern Floorplanning Problems

    Page(s): 473 - 484

    In this paper, we propose an SKB-tree representation for two modern floorplanning problems: fixed-outline and voltage-island-driven floorplanning. SKB-tree can dynamically allocate regions for blocks so that all blocks are placed within a specified outline for every solution. This makes it well suited to handling the fixed-outline constraint. Owing to this property, we also apply it to voltage-island-driven floorplanning. Unlike previous works, we constrain blocks of the same voltage to be placed in one region to save power routing resources, simplify power planning, and reduce IR drop. Experimental results show the feasibility of SKB-tree. For the fixed-outline constraint with zero deadspace, SKB-tree achieved significantly better wirelength than A-FP, Parquet 4.0, ZDS, and SAFFOA. SKB-tree obtains better results than other fixed-outline-driven floorplanners because it needs to focus only on wirelength optimization during simulated annealing. For voltage-island-driven floorplanning, SKB-tree also achieves lower power and shorter wirelength.

  • ECOS: Stable Matching Based Metal-Only ECO Synthesis

    Page(s): 485 - 497

    To ease time-to-market pressure and save photomask cost, metal-only engineering change order (ECO) realizes last-minute design changes by revising only the photomasks of the metal layers. This task is challenging because the pre-injected spare cells are limited in number and in cell types, and metal-only ECO must implement the functional and/or timing changes using the available spare cells. In this paper, we propose a stable-matching-based metal-only ECO synthesizer, named ECOS, that can implement incremental design changes correctly without sacrificing timing or routability. Experiments are conducted on nine industrial test cases that reflect the real difficulties faced by designers. Our results show that ECOS is promising for all of them.
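
    ECOS is described only as stable-matching based. The sketch below therefore shows just the textbook Gale-Shapley (deferred acceptance) procedure. Hypothetical ECO gates propose to hypothetical spare cells using made-up preference lists, which could, for instance, be ranked by distance. How ECOS actually builds its preferences and handles cell types, timing, and routability is not reproduced here.

```python
def stable_match(gate_prefs, cell_prefs):
    """Deferred acceptance: gates propose to spare cells in preference order.

    gate_prefs: dict gate -> list of acceptable spare cells, best first.
    cell_prefs: dict cell -> list of acceptable gates, best first.
    Returns dict cell -> gate. A gate that exhausts its list stays unmatched.
    """
    rank = {c: {g: i for i, g in enumerate(p)} for c, p in cell_prefs.items()}
    next_choice = {g: 0 for g in gate_prefs}
    free = list(gate_prefs)
    match = {}
    while free:
        g = free.pop()
        prefs = gate_prefs[g]
        if next_choice[g] >= len(prefs):
            continue                      # no acceptable spare cell left
        c = prefs[next_choice[g]]
        next_choice[g] += 1
        if g not in rank[c]:
            free.append(g)                # this cell cannot host this gate
        elif c not in match:
            match[c] = g                  # tentatively accept
        elif rank[c][g] < rank[c][match[c]]:
            free.append(match[c])         # cell trades up; old gate is free again
            match[c] = g
        else:
            free.append(g)                # rejected; gate tries its next cell
    return match
```

    The result is the gate-optimal stable matching. No gate and spare cell would both prefer each other over their assigned partners, and the outcome does not depend on the order in which free gates propose.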

  • A Framework for Layout-Dependent STI Stress Analysis and Stress-Aware Circuit Optimization

    Page(s): 498 - 511

    With the continuous shrinking of feature size, the effect of stress on IC device and circuit performance can no longer be ignored. In fact, stress engineering is increasingly used in advanced IC manufacturing processes to improve device performance. Unlike intentionally introduced stresses, shallow-trench-isolation (STI) stress is a by-product of the fabrication process. It is exerted by STI wells on the active area of devices and has an increasingly significant impact on circuit behavior. This paper proposes a complete flow to characterize the influence of STI stress on the performance of RF/analog circuits by considering detailed layout and process information. An accurate and efficient finite-element-method-based stress simulator has been developed to extract the stress distribution from IC layouts. The existing MOSFET model is also enhanced to capture the effects of stress on mobility and threshold voltage. With the enhanced model, we are able to study the influence of layout-dependent STI stress on the performance of real circuits and establish corresponding optimization strategies. The proposed flow has been applied to a series of RF/analog IC designs in a 90-nm CMOS technology.

  • Stack Aware Threshold Voltage Assignment in 3-D Multicore Designs

    Page(s): 512 - 522

    Due to the inherent nature of heat flow in 3-D integrated circuits, stacked dies exhibit a wide range of thermal characteristics: die temperature increases progressively with distance from the heat sink. This heterogeneous temperature profile, coupled with the strong dependence of leakage on temperature and process variation (PV), undermines system-level energy efficiency and complicates power provisioning in 3-D multicores. In this paper, we address this power provisioning challenge in 3-D ICs by advocating a novel stack-aware microprocessor design paradigm, in which circuit designers are aware of the intended placement of a die in a 3-D stack. We present a concrete application of this paradigm: a stack-aware threshold voltage (Vt) assignment algorithm for a 3-D multicore system. The algorithm specifically accounts for 1) the change in the role of leakage power; 2) the expected operating frequency; and 3) the dependence of PV-induced leakage variation on Vt levels. Our stack-aware scheme tunes Vt assignment based on the vertical position of the die in the 3-D stack. Detailed simulation-based experiments with the proposed algorithm show a 4%–19% improvement in energy efficiency for a typical multicore system organized as 3-D stacked dies.

  • Flexible and Reconfigurable Mismatch-Tolerant Serial Clock Distribution Networks

    Page(s): 523 - 536

    We present a clock distribution network that emphasizes flexibility and layout independence. It uses a modular, standard-cell-based design approach that mitigates the effects of intra-die temperature and process variations, and it suits a variety of applications and clock domain shapes and sizes. We route the clock line serially, using an averaging technique to eliminate skew between clock regions in a domain. Routing clock lines serially allows optimal wire usage by eliminating the redundant wires otherwise required to match path delays. Our clock network provides control over regional clock skews, can be used in beneficial-skew applications, and facilitates silicon debug. Serial clocking permits routing switches in the clock network and allows post-silicon resizing and reshaping of clock domains. Defective sections of the clock network can be bypassed, providing post-silicon repair capability. The system combines two phases. A closed-loop synchronization phase provides the skew reduction of an actively synchronized clock network, and an open-loop operating phase minimizes power consumption as passive clock networks do. Our clock network provides significant flexibility for application-specific integrated circuit, system-on-chip, and field-programmable gate-array designs, exhibiting good operating characteristics everywhere in the design envelope. Our silicon implementation achieves a maximum edge-to-edge uncertainty of 80 ps for regional clocks, roughly equal to the cycle-to-cycle jitter of the on-chip clock source.

  • Raising FPGA Logic Density Through Synthesis-Inspired Architecture

    Page(s): 537 - 550

    We leverage properties of the logic synthesis netlist to define a new field-programmable gate-array (FPGA) logic element (function generator) architecture and an associated technology mapping algorithm, which together provide improved logic density. We demonstrate that an “extended” logic element with slightly modified K-input lookup tables (LUTs) achieves much of the benefit of an architecture with (K+1)-input LUTs. At the same time, it consumes silicon area close to that of a K-LUT (a K-LUT requires half the area of a (K+1)-LUT). We introduce the notion of “non-inverting paths” in a circuit's and-inverter graph (AIG) and show their utility in mapping into the proposed logic element architectures. We propose a general family of logic element architectures and present results showing that they offer a variety of area/performance tradeoffs. One of our key results concerns average overheads relative to a 6-LUT architecture. Circuits mapped to a traditional 5-LUT architecture need 15% more LUTs and have 14% more depth, whereas our extended 5-LUT architecture requires only 7% more LUTs and 5% more depth. Nearly all of the depth reduction associated with moving from K-input to (K+1)-input LUTs can be achieved with considerably less area using extended K-LUTs. We further show that 6-LUT optimal mapping depths can be achieved when only a small fraction of the hardware LUTs are 6-LUTs and the remainder are extended 5-LUTs, suggesting that a heterogeneous logic block architecture may prove advantageous.

  • Hierarchical Design of an Application-Specific Instruction Set Processor for High-Throughput and Scalable FFT Processing

    Page(s): 551 - 563

    The fast Fourier transform (FFT), a key data processing kernel in communication systems, has been studied intensively for efficient software and hardware implementations. Various orthogonal frequency division multiplexing (OFDM)-based wireless communication standards now impose more stringent requirements on both the throughput and the flexibility of FFT computation. Application-specific instruction-set processors (ASIPs) have emerged as a promising solution to meet these requirements. This paper presents a novel hierarchical design of an ASIP tailored for FFT. We restructure the FFT computation flow into a scalable array structure based on an 8-point butterfly unit (BU). The array can easily be expanded along both the horizontal and vertical dimensions for FFTs of any number of points. We incorporate custom register files to reduce memory accesses and derive a corresponding regular data addressing rule. Building on these microarchitecture modifications, we extend the instruction set architecture (ISA) with new instructions that accelerate FFT operations. The FFT ASIP is implemented on Tensilica's reconfigurable processor platform. It achieves a data throughput of 405.7 Mb/s for a 1K-point FFT, which meets UWB-OFDM specifications. The custom processor occupies 147K gates and consumes 60.7 mW in total. Both figures are acceptable compared with several other designs, including application-specific integrated circuit, digital signal processor, field-programmable gate array, and other ASIP implementations. We also extend the implementation to FFTs of up to 8K points. Performance degrades, but it still meets the requirements of communication standards that demand large FFT sizes.
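
    An 8-point butterfly unit reflects the radix-8 decomposition of the DFT. As a floating-point reference model only, and assuming the transform length is a power of 8, a recursive radix-8 decimation-in-time FFT can be written as follows. It says nothing about the paper's array structure, custom register files, or Tensilica instruction extensions.

```python
import numpy as np

def fft_radix8(x):
    """Recursive radix-8 decimation-in-time FFT; len(x) must be a power of 8."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    if n % 8:
        raise ValueError("length must be a power of 8")
    m = n // 8
    # DFTs of the 8 decimated subsequences x[r], x[r + 8], x[r + 16], ...
    subs = [fft_radix8(x[r::8]) for r in range(8)]
    k = np.arange(n)
    out = np.zeros(n, dtype=complex)
    for r in range(8):
        # twiddle factor W_n^(r*k) applied to the periodic extension of sub-DFT r;
        # for n == 8 this loop is exactly the 8-point butterfly
        out += np.exp(-2j * np.pi * r * k / n) * subs[r][k % m]
    return out
```

    A 1K-point FFT (1024 = 8³ × 2) needs one extra radix-2 stage on top of this model. That detail is left out to keep the sketch short.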

  • A 15 MHz to 600 MHz, 20 mW, 0.38 mm² Split-Control, Fast Coarse Locking Digital DLL in 0.13 μm CMOS

    Page(s): 564 - 568

    This paper presents a digital delay-locked loop (DLL) suitable for generating multiphase clocks in applications such as time-interleaved and pipelined analog-to-digital converters (ADCs). It locks over a very wide (40×) frequency range. The DLL provides 12 uniformly delayed phases and is free of false harmonic locking. A two-stage digital split-control loop is implemented. A fast coarse acquisition locks in four cycles using binary search. A fine linear loop then achieves low jitter (9 ps rms at 600 MHz) and tracks process, voltage, and temperature (PVT) variations. The false harmonic locking detector, the frequency range, and the jitter performance, among other design considerations, are analyzed in detail. The DLL consumes 20 mW and occupies 470 μm × 800 μm in 0.13-μm CMOS.
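
    Locking in four cycles by binary search is consistent with resolving a 4-bit coarse delay code one bit per cycle, most significant bit first, as a successive-approximation register does. The sketch below rests on several assumptions: a 4-bit code, a monotonic delay line modeled by a hypothetical `delay_of(code)` function, and a comparison that reports whether the trial delay still fits within the reference period. The paper's phase detector and harmonic-lock guard are not modeled.

```python
def coarse_lock(delay_of, period, bits=4):
    """Successive-approximation coarse lock (hypothetical model).

    delay_of: assumed monotonically increasing delay for each coarse code.
    Returns the largest code whose delay does not exceed `period`, or 0 if
    none does, using exactly `bits` comparisons (one per cycle).
    """
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)          # tentatively set the next bit
        if delay_of(trial) <= period:    # keep it if the delay still fits
            code = trial
    return code
```

    For a line with `delay_of = lambda c: 100 + 50 * c` (arbitrary units) and a period of 480, the four comparisons settle on code 7.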

  • High-Throughput Interpolator Architecture for Low-Complexity Chase Decoding of RS Codes

    Page(s): 568 - 573

    In this paper, a high-throughput interpolator architecture is presented for soft-decision decoding of Reed-Solomon (RS) codes based on low-complexity Chase (LCC) decoding. We formulate a modified form of Nielson's interpolation algorithm that uses some typical features of LCC decoding. To reduce interpolation latency, the proposed algorithm uses a different schedule, accounts for the limited growth of the polynomials, and shares common interpolation points. Based on the modified Nielson's algorithm, we derive a low-latency architecture that reduces the overall latency of the whole LCC decoder. For an RS(255,239) code with eight test vectors, an LCC decoder using the proposed interpolator achieves at least 39% higher efficiency, in terms of area-delay product, than the best previously reported architectures. We have implemented the proposed interpolator in a Virtex-II FPGA device, where it provides 914 Mb/s of throughput using 806 slices.
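
    In LCC decoding, each of the η least reliable symbol positions takes either the hard-decision symbol or the second most likely symbol, giving 2^η test vectors. Eight test vectors therefore correspond to η = 3. The sketch below enumerates those vectors under that standard formulation. The symbols and reliability values are hypothetical inputs, and the interpolation step that the paper accelerates is not shown.

```python
from itertools import product

def lcc_test_vectors(hard, second, reliability, eta):
    """Enumerate the 2**eta LCC test vectors.

    hard:        hard-decision symbol at each codeword position
    second:      second most likely symbol at each position
    reliability: per-position reliability (smaller means less reliable)
    """
    # the eta least reliable positions are the ones allowed to flip
    weak = sorted(range(len(hard)), key=lambda i: reliability[i])[:eta]
    vectors = []
    for choice in product((False, True), repeat=eta):
        v = list(hard)
        for pos, use_second in zip(weak, choice):
            if use_second:
                v[pos] = second[pos]
        vectors.append(v)
    return vectors
```

    Because all test vectors agree outside the η weak positions, much of the interpolation work is common to every vector. This is the kind of sharing the abstract's "common interpolation points" refers to.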

  • Enhancing Electromagnetic Analysis Using Magnitude Squared Incoherence

    Page(s): 573 - 577

    This paper demonstrates that magnitude squared incoherence (MSI) analysis is effective for localizing hot spots, i.e., points at which focused electromagnetic (EM) analyses can be applied successfully. It also demonstrates that MSI can be applied to enhance differential EM analysis (DEMA) based on difference of means (DoM).
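
    The abstract does not define MSI. The sketch below assumes it is the complement 1 − MSC of the standard magnitude-squared coherence MSC(f) = |Pxy(f)|² / (Pxx(f) Pyy(f)). The coherence is estimated by averaging Hann-windowed, non-overlapping segments (a simplified Welch estimate). The function names are illustrative, and how the paper turns MSI into a hot-spot map is not reproduced.

```python
import numpy as np

def msc(x, y, nseg=8):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy) per frequency bin.

    Uses nseg non-overlapping Hann-windowed segments. Averaging is essential:
    with a single segment the estimate is identically 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    seglen = len(x) // nseg
    win = np.hanning(seglen)
    X = np.fft.rfft(x[: nseg * seglen].reshape(nseg, seglen) * win, axis=1)
    Y = np.fft.rfft(y[: nseg * seglen].reshape(nseg, seglen) * win, axis=1)
    pxy = np.mean(X * np.conj(Y), axis=0)      # averaged cross-spectrum
    pxx = np.mean(np.abs(X) ** 2, axis=0)      # averaged auto-spectra
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    return np.abs(pxy) ** 2 / np.maximum(pxx * pyy, np.finfo(float).tiny)

def msi(x, y, nseg=8):
    """Assumed definition: magnitude-squared incoherence = 1 - MSC."""
    return 1.0 - msc(x, y, nseg)
```

    Two traces that differ only by a gain give MSI close to 0 in every bin. Independent noise traces give MSI close to 1 − 1/nseg on average, reflecting the known bias of the coherence estimate.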

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems information for authors

    Page(s): 578
    Freely Available from IEEE
  • Have you visited lately? www.ieee.org [advertisement]

    Page(s): 579
    Freely Available from IEEE
  • Quality without compromise [advertisement]

    Page(s): 580
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
The American University in Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu