
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 5 • Date May 2014


Displaying Results 1 - 25 of 32
  • Table of contents

    Publication Year: 2014 , Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Publication Year: 2014 , Page(s): C2
    Freely Available from IEEE
  • Nonvolatile CBRAM-Crossbar-Based 3-D-Integrated Hybrid Memory for Data Retention

    Publication Year: 2014 , Page(s): 957 - 970

    This paper explores the design of 3-D-integrated hybrid memory using conductive-bridge random-access memory (CBRAM). Considering the internal states, height, and radius of the conductive bridge of one CBRAM device, an accurate CBRAM device model is developed for CBRAM-crossbar-based nonvolatile memory design with efficient estimation of area, access time, and power. Based on this design platform, a 3-D-integrated hybrid memory is designed by stacking one tier of CBRAM-crossbar with tiers of static random access memory (SRAM) and dynamic random access memory (DRAM), where the CBRAM-crossbar tier is deployed for data retention during power gating of the SRAM/DRAM tiers. A corresponding block-level data-retention scheme is developed that writes back only dirty data from SRAM/DRAM to the CBRAM-crossbar. When compared with phase-change random-access-memory-based system-level data retention, our design achieves 11× faster data-migration speed and 10× less data-migration power. When compared with ferroelectric random-access-memory-based bit-level data retention, our design also achieves 17× smaller area and 56× smaller power under the same data-migration speed.

  • Average-8T Differential-Sensing Subthreshold SRAM With Bit Interleaving and 1k Bits Per Bitline

    Publication Year: 2014 , Page(s): 971 - 982

    This paper presents a new average-8T write/read decoupled (A8T-WRD) SRAM architecture for low-power sub/near-threshold SRAM in power-constrained applications such as biomedical implants and autonomous sensor nodes. The proposed architecture combines several novel concepts for dealing with issues in sub/near-threshold SRAM, including: 1) a differential, data-independent-leakage read port that facilitates robust and faster read operation and alleviates issues in the half-selected cell (pseudo-write) while reducing the area compared to the conventional 8T cell and 2) various configurations, from 14T for a baseline cell to 6.5T for an area-efficient 16-bit cell. These configurations reduce the overall bitcell area and enable low operating voltage. Two memory blocks based on the proposed architecture, with sizes of 16 and 64 kb, respectively, are fabricated in a 0.13-μm CMOS process. The 64 kb prototype has an active area of 0.512 mm², which is 16% less than that of the conventional 8T-cell-based design. The chip is fully functional for the read operation at 260 mV and 245 kHz and for the write operation at 270 mV and 1 MHz. It can hold data down to 170 mV, where the standby power consumption is only 884 nW.

  • On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time

    Publication Year: 2014 , Page(s): 983 - 994

    The coarse-grained reconfigurable architecture (CGRA) is proven to be energy efficient in several specific domains. In CGRAs, the on-chip memory hierarchy, which contains the context memory and the data memory organizations, should be well considered to achieve appropriate tradeoffs among three aspects: 1) performance; 2) area; and 3) power. In this paper, two techniques called the hierarchical configuration context (HCC) and the lifetime-based data-memory organization (LDO), focusing on the context memory and the data memory organizations, are proposed to compress the on-chip memory space and to reduce the reconfiguration time and the data-reference time. In the HCC, the contexts are constructed in a hierarchical fashion to completely eliminate the repetitive portions of the contexts, not only reducing the overall context storage, but also alleviating the context transportation overhead. A fast context-indexing mechanism in the HCC is proposed to achieve fast reconfiguration, as the hierarchically organized contexts can be located and accessed conveniently. In the LDO, the on-chip data are classified into two types, based on the lifetime of data. The short-lifetime data are stored in a first-in first-out buffer to increase the reuse ratio of memory space automatically, whereas the long-lifetime data are stored in random access memory for several time references. The HCC and the LDO are used in a CGRA core called the reconfigurable processing unit (RPU). Two RPUs are integrated in a reconfigurable computing processor (RCP) called the REconfigurable MUlti-media System, High-Performance Processor (REMUS_HPP). Because of the HCC, compared with a traditional nonhierarchical system, the total context storage required in H.264 decoding is reduced by 77%. Because of the LDO, the normalized on-chip data memory size at the same performance level in the REMUS_HPP is only 23.8% and 14.8% of those in XPP-III (a high-performance RCP) and ADRES (a low-power RCP). REMUS_HPP is implemented on a 48.9-mm² silicon die with TSMC 65-nm technology, using a 200-MHz working frequency to achieve 1920 × 1088 at 30 fps H.264 high-profile decoding. Compared with XPP-III, the performance of the REMUS_HPP is boosted by 1.81×, whereas the energy efficiency is 4.75× higher.

  • Reliable Concurrent Error Detection Architectures for Extended Euclidean-Based Division Over GF(2^m)

    Publication Year: 2014 , Page(s): 995 - 1003

    The extended Euclidean algorithm (EEA) is an important scheme for performing the division operation in finite fields. Many sensitive and security-constrained applications, such as those using elliptic curve cryptography for establishing key agreement schemes, augmented encryption approaches, and digital signature algorithms, utilize this operation in their structures. Although much work has been done to realize the EEA efficiently in hardware, research on its reliable implementation is still needed to achieve fault-immune structures. In this regard, this paper presents a new concurrent error detection (CED) scheme to provide reliability for the aforementioned sensitive and constrained applications. Our proposed CED architecture is a step toward more reliable EEA architectures. Through simulations and based on the number of parity bits used, the error detection capability of our CED architecture is derived to be 100% for single-bit errors and close to 99% for the experimented multiple-bit errors. In addition, we present the performance degradations of the proposed approach, leading to low-cost and reliable EEA architectures. The proposed reliable architectures are also suitable for constrained and fault-sensitive embedded applications utilizing EEA hardware implementations.

  • Rate-0.96 LDPC Decoding VLSI for Soft-Decision Error Correction of NAND Flash Memory

    Publication Year: 2014 , Page(s): 1004 - 1015
    Cited by:  Papers (2)

    The reliability of data stored in high-density Flash memory devices tends to decrease rapidly because of the reduced cell size and multilevel cell technology. Soft-decision error correction algorithms that use multiple-precision sensing for reading memory can solve this problem; however, they require very complex hardware for high-throughput decoding. In this paper, we present a rate-0.96 (68254, 65536) shortened Euclidean geometry low-density parity-check code and its VLSI implementation for high-throughput NAND Flash memory systems. The design employs the normalized a posteriori probability (APP)-based algorithm, serial schedule, and conditional update, which lead to simple functional units, halved decoding iterations, and low-power consumption, respectively. A pipelined-parallel architecture is adopted for high-throughput decoding, and memory-reduction techniques are employed to minimize the chip size. The proposed decoder is implemented in 0.13-μm CMOS technology, and the chip size and energy consumption of the decoder are compared with those of a BCH (Bose-Chaudhuri-Hocquenghem) decoding circuit showing comparable error-correcting performance and throughput.
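
    The "rate-0.96 (68254, 65536)" figures in the abstract above can be checked with a few lines of arithmetic. This is only a sketch that assumes the usual (n, k) block-code notation, with n the codeword length and k the number of information bits; the variable names are illustrative and not taken from the paper.

```python
# Assumption: (68254, 65536) is read as (n, k) = (codeword bits, information bits).
n = 68254
k = 65536

parity_bits = n - k   # redundancy added per codeword
rate = k / n          # fraction of each codeword that carries user data

print(f"parity bits per codeword: {parity_bits}")
print(f"code rate: {rate:.4f}")
```

    Under that reading, the quoted rate is simply k/n rounded to two decimal places.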

  • Design of On-Chip Lightweight Sensors for Effective Detection of Recycled ICs

    Publication Year: 2014 , Page(s): 1016 - 1029

    The counterfeiting and recycling of integrated circuits (ICs) have become major issues in recent years, potentially impacting the security and reliability of electronic systems bound for military, financial, or other critical applications. With identical functionality and packaging, it would be extremely difficult to distinguish recycled ICs from unused ICs. In this paper, two types of on-chip lightweight sensors are proposed to identify recycled ICs by measuring circuit usage time when used in the field. Recycled-IC detection based on aging in ring oscillators (RO-based) and on antifuses (AF-based) are the two techniques presented in this paper. For RO-based sensors, statistical data analysis is used to separate the effects of process and temperature variations on the sensor from the aging experienced by the sensor in the ICs. For AF-based sensors, counters and embedded one-time programmable memory are used to record the usage time of ICs by counting the cycles of the system clock or the switching activities of a certain number of nets in the design. Simulation results using 90-nm technology and silicon results from 90-nm test chips show the effectiveness of RO-based sensors for identification of recycled ICs. In addition, the analysis of usage time stored in AF-based sensors shows that recycled ICs, even those used for a very short period, can be accurately identified.

  • Light-Weight On-Chip Structure for Measuring Timing Uncertainty Induced by Noise in Integrated Circuits

    Publication Year: 2014 , Page(s): 1030 - 1041
    Cited by:  Papers (1)

    Noise such as voltage drop and temperature in integrated circuits can cause significant performance variation and even functional failure in lower technology nodes. In this paper, we propose a lightweight on-chip structure that measures timing uncertainty induced by noise during functional and test operations. The proposed on-chip structure facilitates speed characterization under various workloads and test conditions. The basic structure is highly scalable and can be tailored for various applications such as silicon validation, monitoring operating conditions, and validating logic built-in-self-test conditions. Simulation results show that it offers very high measurement resolution in a highly efficient manner.

  • Logical Effort for CMOS-Based Dual Mode Logic Gates

    Publication Year: 2014 , Page(s): 1042 - 1053
    Cited by:  Papers (2)

    Recently, a novel dual mode logic (DML) family was proposed. This logic allows operation in two modes: 1) static and 2) dynamic. DML gates, which can be switched between these modes on-the-fly, feature very low power dissipation in the static mode and high performance in the dynamic mode. A basic DML gate is very simple and is composed of any static logic family gate and an additional clocked transistor. In this paper, we introduce the logical effort (LE) methodology for the CMOS-based DML family. The proposed methodology allows path length minimization, delay optimization, and delay estimation of DML logic. This is done by developing complete and approximated LE models, which allow easy extraction of design optimization parameters, such as the optimum number of stages, gate sizing factors, and delay estimations. The proposed optimization is shown for the dynamic mode of operation. Theoretical mathematical analysis is presented, and the efficiency of the proposed methodology is shown in a standard 40-nm CMOS process.

  • Software/Hardware Parallel Long-Period Random Number Generation Framework Based on the WELL Method

    Publication Year: 2014 , Page(s): 1054 - 1059

    This paper presents a hardware architecture for efficient implementation of the well equidistributed long-period linear (WELL) algorithm. Our design achieves a throughput of one sample per cycle and runs as fast as 423 MHz on a Xilinx XC5VFX130T field-programmable gate array (FPGA) device. This performance is 7.1-fold faster than a dedicated software implementation. The proposed architecture is also implemented on different target devices for comparison with other types of pseudorandom number generators. In addition, we design a software/hardware framework that is capable of dividing the WELL stream into an arbitrary number of independent parallel substreams. With support from software, this framework can obtain speedup roughly proportional to the number of parallel cores. The sequences produced by the single design are verified to be consistent with the standard software generator. In addition, statistical tests of interleaved sequences are performed to check for correlations between different substreams of the parallel framework. We apply our framework to two applications. Experimental results verify the correctness of our framework as well as the better characteristics of the WELL algorithm compared with the Mersenne Twister method.

  • Reconfigurable CORDIC-Based Low-Power DCT Architecture Based on Data Priority

    Publication Year: 2014 , Page(s): 1060 - 1068

    This paper presents a low-power coordinate rotation digital computer (CORDIC)-based reconfigurable discrete cosine transform (DCT) architecture. The main idea of this paper is based on the fact that not all computations in the DCT are equally important in generating the frequency-domain outputs. Considering the difference in importance among the DCT coefficients, the number of CORDIC iterations can be dynamically changed to efficiently trade off image quality for power consumption. Thus, the computational energy can be significantly reduced without seriously compromising the image quality. The proposed CORDIC-based 2-D DCT architecture is implemented using a 0.13-μm CMOS process, and the experimental results show that our reconfigurable DCT achieves power savings ranging from 22.9% to 52.2% over the CORDIC-based Loeffler DCT at the cost of minor image quality degradations.
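
    The trade-off this abstract describes, fewer CORDIC iterations in exchange for a coarser result, can be illustrated with a small floating-point model of the textbook rotation-mode CORDIC. This is a minimal sketch under stated assumptions, not the paper's architecture: the function names `cordic_rotate` and `rotation_error`, the test angle, and the iteration counts are all hypothetical choices made for illustration.

```python
import math

def cordic_rotate(x, y, angle, iterations):
    """Rotate (x, y) by `angle` radians with `iterations` CORDIC micro-rotations.

    Textbook rotation-mode CORDIC; floats stand in for the shift-add datapath.
    Converges for |angle| up to roughly 1.74 rad.
    """
    # Accumulated gain of the micro-rotations, divided out at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x / gain, y / gain

def rotation_error(angle, iterations):
    """Distance between the CORDIC result and the exact rotation of (1, 0)."""
    cx, cy = cordic_rotate(1.0, 0.0, angle, iterations)
    return math.hypot(cx - math.cos(angle), cy - math.sin(angle))

angle = 3 * math.pi / 16  # illustrative angle only
for n in (4, 8, 12):
    print(n, rotation_error(angle, n))
```

    In this model each extra iteration roughly halves the worst-case residual angle, which is why spending fewer iterations on less important coefficients costs accuracy only where it matters least.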

  • Practical Routability-Driven Design Flow for Multilayer Power Networks Using Aluminum-Pad Layer

    Publication Year: 2014 , Page(s): 1069 - 1081

    This paper presents a novel framework to efficiently and effectively build a robust but routing-friendly multilayer power network under IR-drop and electromigration (EM) constraints. The proposed framework first considers the impact of the aluminum-pad layer and provides a conservative analytical model to determine the total metal width for each power layer that can meet the IR-drop and EM constraints. The framework then identifies an optimal irredundant stripe width by considering the number of occupied routing tracks and the potential routing detour caused by the power stripes, without information about cell placement. Next, after the cell placement is done, the framework applies a dynamic-programming approach to further reduce the potential routing detour by relocating the power stripes. A series of experiments is conducted on a 40-nm, 1.1-V, 900-MHz microprocessor to validate the effectiveness and efficiency of the proposed framework.

  • UNION: A Unified Inter/Intrachip Optical Network for Chip Multiprocessors

    Publication Year: 2014 , Page(s): 1082 - 1095

    As modern computing systems become increasingly complex, communication efficiency among and inside chips has become as important as the computation speeds of individual processing cores. Traditionally, to maximize design flexibility, interchip and intrachip communication architectures are separately designed under different constraints. Jointly designing communication architectures for both interchip and intrachip communication could, however, potentially yield better solutions. In this paper, we present a unified inter/intrachip optical network, called UNION, for chip multiprocessors (CMPs). UNION is based on recent progress in nanophotonic technologies. It connects not only cores on a single CMP, but also multiple CMPs in a system. UNION employs a hierarchical optical network to separate interchip communication traffic from intrachip communication traffic. It fully utilizes a single optical network to transmit both payload and control packets. The network controller on each CMP not only manages intrachip communications, but also collaborates with the other controllers to facilitate interchip communications. We compared UNION with a matched electrical counterpart in a 45-nm process. Simulation results for eight real CMP applications show that on average UNION improves CMP performance by 3× while reducing network energy consumption by 88%.

  • High-Resolution All-Digital Duty-Cycle Corrector in 65-nm CMOS Technology

    Publication Year: 2014 , Page(s): 1096 - 1105

    In high-speed data transmission applications, such as double data rate memory and double sampling analog-to-digital converter, the positive and negative edges of the system clock are utilized for data sampling. Thus, these systems require an exact 50% duty cycle of the system clock. In this paper, two wide-range all-digital duty-cycle correctors (ADDCCs) with output clock phase alignment are presented. The proposed phase-alignment ADDCC (PA-ADDCC) not only achieves the desired output/input phase alignment, but also maintains the output duty cycle at 50% with a short locking time. In addition, the proposed high-resolution ADDCC (HR-ADDCC) without a half-cycle delay line can improve the delay resolution and mitigate the delay mismatch problem in a nanometer CMOS process. Experimental results show that the frequency range of the proposed ADDCCs is 263-1020 MHz for the PA-ADDCC and 200-1066 MHz for the HR-ADDCC with a DCC resolution of 3.5 and 1.75 ps, respectively. In addition, the proposed PA-ADDCC and HR-ADDCC are implemented in an all-digital manner to reduce circuit complexity and leakage power in advanced process technologies and, thus, are very suitable for system-on-chip applications.

  • Variation-Aware Variable Latency Design

    Publication Year: 2014 , Page(s): 1106 - 1117

    Although typical digital circuits are designed so that the clock period satisfies worst case path delay constraints, the average input excitation often completes computation in less than a clock cycle. Variable latency units (VLUs) allow for improved throughput by allowing one clock cycle for some computations, and two clock cycles for others, using hold logic to differentiate between the two cases. However, they may experience significant throughput losses due to the effects of process variations. We develop a combined presilicon-postsilicon technique for variation-aware VLU design that ensures high throughput across all manufactured chips. We achieve this by identifying path clusters at the presilicon stage, such that each element of a path cluster is likely to be similarly critical in a manufactured part. We use sensors to determine which path clusters are critical at the postsilicon stage and then activate the appropriate hold logic. Practically, for a small number of path clusters, significant improvements in throughput are achievable. On a set of 32-nm PTM-based ISCAS89 circuits, our scheme offers 15.1% throughput enhancements with only 3.3% area overhead.

  • 1.9-ps Jitter, 10.0-dBm-EMI Reduction Spread-Spectrum Clock Generator With Autocalibration VCO Technique for Serial-ATA Application

    Publication Year: 2014 , Page(s): 1118 - 1126

    A spread-spectrum clock generator (SSCG) that is characterized by a low-jitter voltage-controlled oscillator (VCO) with a high-frequency limiter and an autocalibration function is developed for a Serial-ATA application. The high-frequency limiter prevents the SSCG from going into an unlocked state. The proposed VCO achieves far less jitter than a conventional one because the proposed structure has fewer operating MOSs, which are noise sources. The autocalibration technique calibrates the VCO sensitivity, the maximum frequency of the VCO output signal, and the characteristics of the limit function to prevent process variation from degrading the output jitter and the electromagnetic interference (EMI) reduction of the SSCG. The proposed SSCG with the autocalibration function was fabricated in a 0.13-μm CMOS process. With the proposed autocalibration technique, the variation in the jitter over 250 cycles at 1.5 GHz with spread-spectrum clocking (SSC) is improved from 2.1-7.8 ps to 1.9-3.3 ps, and an EMI reduction of 10.0 dBm is achieved.

  • A 12.5-Gb/s On-Chip Oscilloscope to Measure Eye Diagrams and Jitter Histograms of High-Speed Signals

    Publication Year: 2014 , Page(s): 1127 - 1137

    This paper presents a 12.5-Gb/s on-chip oscilloscope (OCO) circuit to measure eye diagrams and jitter histograms of high-speed digital signals. The proposed circuit adopts a novel architecture to capture both single-ended and differential signals. In addition, it is capable of measuring the eye openings and jitter of the input signals without the need to construct the whole eye diagram, which makes it a suitable candidate for eye-opening monitor circuits. An asynchronous sampling technique and an efficient algorithm are employed to decrease the area of the OCO as well as its processing time. The proposed circuit is fabricated in a 65-nm CMOS technology, and the measurement results show sub-picosecond resolution when the input signals consist of a 10-GHz clock signal and a 12.5-Gb/s pseudorandom binary sequence. The OCO circuit has a power consumption of 1.9 mW, and its core area is 40 × 60 μm.

  • Fast Transistor Threshold Voltage Measurement Method for High-Speed, High-Accuracy Advanced Process Characterization

    Publication Year: 2014 , Page(s): 1138 - 1149

    As process technologies continually advance, process variation has greatly increased and has gradually become one of the most critical factors for IC manufacturing. Furthermore, these increasingly complex processes continue to make greater use of stressors for mobility enhancement, thus requiring large volumes of data for extensive characterization of layout-dependent effects (LDE) for validation of both SPICE models and design for manufacturing. Transistor threshold voltage (Vt) is a commonly used parameter both for characterization during process development and for monitoring of volume manufacturing. To adequately quantify local process variation or LDE, Vt must be measured for a sufficiently large number of devices under test (DUTs) to obtain a statistically representative sample population. The number of Vt measurements required to obtain such a statistically significant result, however, requires extremely long testing time, especially for array-based test structure designs including thousands of DUTs. In this paper, we present a very fast threshold voltage measurement methodology using an operational amplifier-based source-measure unit test configuration, which greatly improves testing efficiency and accuracy, and is not sensitive to process variation. The proposed test methodology can improve Vt testing time by a factor of 5-10 relative to the commonly used binary-search algorithm, and by a factor of ~2 relative to an optimized interpolation algorithm, and achieves better accuracy (standard deviation of Vt = 0.15 mV, versus typical accuracy of ~0.5 mV for the two algorithms mentioned). Furthermore, the layout and configuration of conventional test structures need not be modified to adopt the proposed methodology. The measured results from the most advanced process technology nodes demonstrate the testing efficiency and accuracy of the proposed test structure in characterizing the large number of DUTs required for quantifying process variation or LDEs.

  • FinCANON: A PVT-Aware Integrated Delay and Power Modeling Framework for FinFET-Based Caches and On-Chip Networks

    Publication Year: 2014 , Page(s): 1150 - 1163

    Recently, FinFETs have emerged as promising substitutes for conventional CMOS because of their superior control of short-channel effects and processing scalability. Nevertheless, lithographic constraints, difficulties in workfunction engineering, supply voltage variations, and temperature nonuniformity across the FinFET integrated circuit may lead to process, supply voltage, and temperature (PVT) variations, which are manifested as large spreads in delay and leakage. In this paper, we present FinCANON, an integrated framework for the simulation of power, delay, as well as PVT variations of FinFET-based caches and on-chip networks. FinCANON consists of CACTI-PVT and ORION-PVT that model caches and on-chip networks, respectively. We have developed a FinFET design library to model the circuit-level characteristics as well as their variation trends with respect to various PVT parameters for FinFET logic gates and memory cells, using accurate device simulation. With a statistical static timing analysis technique and macromodel-based methodology, we have also derived PVT variation models for delay and leakage, considering spatial correlations, to characterize the impact of PVT variations on FinFET-based caches and networks-on-chip (NoCs). In addition, we incorporate voltage generators in the FinFET design library to model back-gate biasing of FinFETs. The cache and NoC models are significantly enhanced to be more modular and scalable. We present results for various FinFET design styles and show that mixing different styles may be a promising strategy for optimizing delay and leakage of caches and NoCs.

  • 1-V 365-μW 2.5-MHz Channel Selection Filter for 3G Wireless Receiver in 55-nm CMOS

    Publication Year: 2014 , Page(s): 1164 - 1169

    This paper presents a novel 1-V 2.5-MHz continuous-time filter for 3G wireless application, fabricated using a standard 55-nm CMOS process. The four-pole filter topology includes a single pole-tracking Operational Amplifier (OPAMP) structure to achieve low in-band noise levels, high out-of-band linearity, and reduced power consumption. An automatic frequency tuning circuit is developed to compensate for process and environmental variations. The proposed filter achieves in-band noise of 18 μV rms and out-of-band IIP3 of 33 dBm within 365 μW. The out-of-band spurious-free dynamic range is measured at 76.7 dB, resulting in a figure-of-merit of 1.24 × 10⁻⁴ fJ.

  • Litho-Friendly Decomposition Method for Self-Aligned Triple Patterning

    Publication Year: 2014 , Page(s): 1170 - 1174

    Multiple patterning lithography is the most likely manufacturing process for sub-32 nm technology nodes. Among different multiple patterning methods, self-aligned patterning has attracted much interest due to its robustness against overlay errors. However, self-aligned patterning compliance is subject to the litho-friendliness of the applied decomposition method. This brief establishes self-aligned triple patterning (SATP) decomposition requirements and proposes a litho-friendly layout decomposition method. First, the major SATP litho-friendliness requirements are explained. In-silico experiments on SATP process indicate that layout features printed by the structural spacers are the most accurate ones. Therefore, we propose an ILP-based decomposition which avoids decomposition conflicts and maximizes the use of structural spacers simultaneously. Experiments reveal that the proposed method improves overlay robustness and line-edge roughness of the attempted test cases.

  • Area-Delay Efficient Binary Adders in QCA

    Publication Year: 2014 , Page(s): 1174 - 1179
    Cited by:  Papers (1)

    As transistors decrease in size, more and more of them can be accommodated in a single die, thus increasing chip computational capabilities. However, transistors cannot get much smaller than their current size. The quantum-dot cellular automata (QCA) approach represents one of the possible solutions for overcoming this physical limit, even though the design of logic modules in QCA is not always straightforward. In this brief, we propose a new adder that outperforms all state-of-the-art competitors and achieves the best area-delay tradeoff. These advantages are obtained with an overall area similar to that of the cheaper designs known in the literature. The 64-bit version of the novel adder spans 18.72 μm² of active area and shows a delay of only nine clock cycles, that is, just 36 clock phases.

  • Optimization Scheme to Minimize Reference Resistance Distribution of Spin-Transfer-Torque MRAM

    Publication Year: 2014 , Page(s): 1179 - 1182
    Cited by:  Papers (1)

    Spin-transfer-torque magnetoresistive random access memory (STT-MRAM) is an emerging type of nonvolatile memory with compelling advantages in endurance, scalability, speed, and energy consumption. As the process technology shrinks, STT-MRAM has limited sensing margin due to the decrease in supply voltage and the increase in process variation. Furthermore, the relatively small resistance difference between the two states in STT-MRAM poses challenges for its read/write circuit design to maintain an acceptable sensing margin. The proposed reference-circuit optimization scheme addresses the reference resistance distribution issue to maximize the sensing margin and minimize the read disturbance, with low power consumption. Simulation results show that the optimization scheme significantly improves read reliability in the presence of one or a few reference cell failures. It thus eliminates the need for additional circuits to detect reference cell failures or to reference neighboring blocks.

  • High-Throughput and Low-Complexity BCH Decoding Architecture for Solid-State Drives

    Publication Year: 2014 , Page(s): 1183 - 1187

    This paper presents a high-throughput and low-complexity BCH decoder for NAND flash memory applications, developed to achieve the high data rates demanded by recent serial interface standards. To reduce the decoding latency, a data sequence read from a flash memory channel is re-encoded by using the encoder that is idle at that time. In addition, several optimization methods are proposed to reduce the hardware complexity of a massively parallel BCH decoder and increase the operating frequency. In a 130-nm CMOS process, a (8640, 8192, 32) BCH decoder designed as a prototype provides a decoding throughput of 6.4 Gb/s while occupying an area of 0.85 mm².
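
    As a quick reading aid for the (8640, 8192, 32) code parameters quoted above, the following minimal sketch derives the parity overhead and code rate. It assumes the triple is (n, k, t), with t the number of correctable bit errors, and that the binary BCH code uses m·t parity bits over GF(2^m); neither assumption is stated in the abstract, and the function name is illustrative only.

```python
def bch_summary(n: int, k: int, t: int) -> dict:
    """Summarize a binary BCH code given as (n, k, t).

    Assumption (not from the abstract): parity bits = m * t, where the
    code is defined over GF(2^m) and possibly shortened from the
    natural length 2^m - 1.
    """
    parity_bits = n - k
    # Implied field degree m under the m * t parity-bit assumption.
    m, remainder = divmod(parity_bits, t)
    natural_length = 2 ** m - 1
    return {
        "parity_bits": parity_bits,
        "field_degree_m": m,
        "parity_divides_evenly": remainder == 0,
        "natural_length": natural_length,
        "shortened_by": natural_length - n,
        "code_rate": k / n,
    }


summary = bch_summary(8640, 8192, 32)
for name, value in summary.items():
    print(f"{name}: {value}")
```

    Under these assumptions, the 448 parity bits divide evenly by t = 32, which is consistent with a code over GF(2^14) shortened from a natural length of 16383 bits, and the payload of 8192 bits corresponds to 1 KB of user data per codeword.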


Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu