Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 7 • Date July 2014

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Partial Access Mode: New Method for Reducing Power Consumption of Dynamic Random Access Memory

    Page(s): 1461 - 1469

    Demands have been placed on a dynamic random access memory (DRAM) to not only have increased memory capacity and data transfer speed, but also have reduced operating and standby currents. When a system uses a DRAM, a refresh operation is necessary because of its data retention time restriction: each bit of the DRAM is stored as an amount of electrical charge in a storage capacitor that is discharged by the leakage current. Power consumption for the refresh operation increases in proportion to the memory capacity. We propose a new method to reduce the refresh power consumption by effectively extending the memory cell retention time. Conversion from 1 cell/bit to 2N cells/bit reduces the variation in the retention time among memory cells. Although active power increases by a factor of 2N, the refresh time increases by more than 2N as a consequence of the fact that the majority decision does better than averaging for the tail distribution of retention time. The conversion can be realized very simply from the structure of the DRAM array circuit, and it reduces the frequency of disturbance and power consumption by two orders of magnitude. On the basis of this conversion method, we propose a partial access mode to reduce power consumption dynamically when the full memory capacity is not required.

  • Partial Parity Cache and Data Cache Management Method to Improve the Performance of an SSD-Based RAID

    Page(s): 1470 - 1480

    In this paper, a partial parity cache and data cache management method is presented for reducing the parity updating cost of a solid-state disk (SSD) based redundant array of inexpensive disks (RAID) system, thereby improving the input/output (I/O) performance of the RAID system. SSDs have many advantages compared to hard disk drives. However, it is not advisable to directly add SSDs into a RAID system because doing so will decrease the performance and the lifetime of the SSDs. In the RAID-5 system, parity generation includes read and write operations to the SSDs. Whenever there is a new write request to the RAID, the related parity must be updated and written to the SSDs. Such frequent parity updates result in poor RAID performance and shorten the lifetime of the SSDs. This paper combines the prior methods and the proposed efficient buffer management method with a data cache. The proposed method reduces the number of read and write operations for generating parities in the RAID system. Experimental results show that the I/O performance of the RAID-5 system can be improved by 76% by using the proposed method.

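The parity update cost mentioned in this abstract can be illustrated with the standard RAID-5 read-modify-write rule, in which the new parity is the old parity XORed with the old and the new data. The following is a minimal Python sketch of that general rule only, not of the paper's cache management method; the function names and the 4-byte block contents are illustrative assumptions.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def update_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: two reads (old data, old parity) feed one parity write."""
    return xor_blocks(old_parity, old_data, new_data)


# A stripe with three data blocks and one parity block (hypothetical contents).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(*stripe)

# Overwrite block 1 and update the parity without reading blocks 0 and 2.
new_block = b"\xff\x00\xff\x00"
parity = update_parity(parity, stripe[1], new_block)
stripe[1] = new_block
```

Every small write therefore costs two reads and two writes on the array, which is the overhead a parity cache would aim to absorb.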
  • Energy Efficient Programmable MIMO Decoder Accelerator Chip in 65-nm CMOS

    Page(s): 1481 - 1490

    This paper presents an energy efficient programmable hardware accelerator that targets multiple-input-multiple-output (MIMO) decoding tasks of orthogonal frequency-division multiplexing (OFDM) systems. The work is motivated by the adoption of MIMO and OFDM by almost all existing and emerging high-speed wireless data communication systems. The accelerator was fabricated in 65-nm CMOS technology and occupies a core area of 2.48 mm². It delivers full programmability across different wireless standards (i.e., WiFi, 3G-long term evolution, and WiMax) as well as different MIMO decoding algorithms (i.e., minimum mean square error, singular value decomposition, and maximum likelihood) with extreme energy efficiency. The energy efficiency of our MIMO accelerator chip was compared against dedicated application specific integrated circuits for 4 × 4 QR decomposition, 4 × 4 singular value decomposition, and 2 × 2 minimum mean square error decoding. Despite the programmable nature of our design, it delivered energy efficiencies that were 18% to 28% better than the dedicated solutions reported in the literature. This paper presents the VLSI implementation of the architecture discussed in [14]-[16]. It discusses the implementation decisions and tradeoffs used to ensure minimum overall energy consumption of the resulting accelerator chip without sacrificing programmability. Given its programmability and extreme energy efficiency, the accelerator is an ideal solution for today's smart phones that implement multiple MIMO-OFDM waveforms on the same platform.

  • PaCC: A Parallel Compare and Compress Codec for Area Reduction in Nonvolatile Processors

    Page(s): 1491 - 1505

    Nonvolatile (NV) processors have attracted much attention in recent years due to their zero standby power, resilience to power failures, and instant-on feature. One design challenge of NV processors is the excess area needed by NV registers. This paper introduces a parallel compare and compress (PaCC) architecture to reduce such excess area. A key component of the PaCC architecture is a new codec which effectively balances area and performance. In addition, the PaCC architecture includes a configurable state table to support reference vector selection for different applications. With the proposed vector selection algorithm, the PaCC architecture can outperform other vector selection approaches by over 59% in terms of reduction in the number of NV registers. The proposed architecture has been fully realized at the circuit level and synthesized for Rohm's 0.13-μm ferroelectric-CMOS hybrid process. Results demonstrate that the design can reduce the number of NV registers by 70%-80% with less than 1% overflow possibility, which leads to up to 30% processor area saving. The overall approach is applicable to any NV processor design regardless of the NV material used.

  • Efficient Register Renaming and Recovery for High-Performance Processors

    Page(s): 1506 - 1514

    Modern superscalar processors implement register renaming using either random access memory (RAM) or content-addressable memory (CAM) tables. The design of these structures should address both access time and misprediction recovery penalty. Although direct-mapped RAMs provide faster access times, CAMs are more appropriate to avoid recovery penalties. The presence of associative ports in CAMs, however, prevents them from scaling with the number of physical registers and pipeline width, negatively impacting performance, area, and energy consumption at the rename stage. In this paper, we present a new hybrid RAM-CAM register renaming scheme, which combines the best of both approaches. In a steady state, a RAM provides fast and energy-efficient access to register mappings. On misspeculation, a low-complexity CAM enables immediate recovery. Experimental results show that in a four-way state-of-the-art superscalar processor, the new approach provides almost the same performance as an ideal CAM-based renaming scheme, while dissipating only between 17% and 26% of the original energy and, in some cases, consuming less energy than purely RAM-based renaming schemes. Overall, the silicon area required to implement the hybrid RAM-CAM scheme does not exceed the area required by conventional renaming mechanisms.

  • Memory-Hierarchical and Mode-Adaptive HEVC Intra Prediction Architecture for Quad Full HD Video Decoding

    Page(s): 1515 - 1525

    This paper presents a high-throughput and area-efficient VLSI architecture for intra prediction in the emerging high efficiency video coding standard. Three design techniques are proposed to address the complexity systematically: 1) a hierarchical memory deployment that stores neighboring samples in 4.9 Kb of static RAM (SRAM) instead of 43.2-k gates of registers and increases throughput by processing reference samples in registers; 2) a mode-adaptive scheduling scheme for all prediction units, which provides at least 2 samples/cycle throughput while using low-throughput SRAM and can achieve 2.46 samples/cycle on average based on the experimental results; and 3) resource sharing for multipliers and the readout circuits of reference sample registers, which can save 2.5-k gates. These techniques can efficiently reduce area by 40% but increase power consumption because of additional signal transitions. Signal-gating circuits are then applied to reduce SRAM power by 69% and logic power by 32%, at a cost of only 1.0-k gates. When synthesized at 200 MHz with a 40-nm process, the proposed architecture needs only 27.0-k gates and 4.9 Kb of single-port SRAM. The layout core area is 0.036 mm², and the power consumption is 2.11 mW in the postlayout simulation. The corresponding performance can support quad full high-definition (HD) (3840 × 2160) video decoding at 30 frames/s.

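The throughput figures in this abstract can be related to the quad full HD target with a back-of-envelope check. The sketch below assumes 4:2:0 chroma subsampling (1.5 samples per pixel), which the abstract does not state, and ignores any scheduling or memory overhead; the constant names are illustrative.

```python
WIDTH, HEIGHT, FPS = 3840, 2160, 30   # quad full HD at 30 frames/s (from the abstract)
CLOCK_HZ = 200_000_000                # 200-MHz synthesis target (from the abstract)
MIN_SAMPLES_PER_CYCLE = 2             # guaranteed throughput (from the abstract)
SAMPLES_PER_PIXEL = 1.5               # ASSUMPTION: 4:2:0 chroma subsampling

# Samples that must be predicted every second for the target video format.
required = WIDTH * HEIGHT * SAMPLES_PER_PIXEL * FPS

# Samples the architecture can produce every second at its guaranteed rate.
available = CLOCK_HZ * MIN_SAMPLES_PER_CYCLE

print(f"required:  {required / 1e6:.1f} Msamples/s")
print(f"available: {available / 1e6:.1f} Msamples/s")
```

Under these assumptions the guaranteed rate leaves a small margin over the required sample rate, and the reported average of 2.46 samples/cycle would widen it.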
  • Iterative Linear Interpolation Based on Fuzzy Gradient Model for Low-Cost VLSI Implementation

    Page(s): 1526 - 1538

    In this paper, we propose an iterative linear interpolation (ILI) algorithm, which produces quadratic ILI polynomials to perform the most cost-effective interpolation among state-of-the-art quadratic and cubic methods. Unlike traditional point and area pixel models, the ILI adopts the fuzzy gradient model to estimate gradients of the target point according to its neighbor sample points in different directions. By weighing the gradients using fuzzy membership grades, the ILI estimates the difference between the target point and its neighbor sample points and finally obtains the target point. In 1-D signal reconstructions, using only three multipliers, the ILI clearly outperforms both conventional quadratic Lagrange interpolation and cubic interpolation. To approximate 2-D signals, we use five 1-D ILIs, which cost only eight multipliers to obtain similar peak signal-to-noise ratio (PSNR) performance but better robustness compared with bi-cubic interpolation. Reusing the ILI polynomials of the previous target point, we further reduce the cost of ILI to three multipliers and eight adders. The VLSI implementation using TSMC 0.18-μm technology shows that only 7256 gates are required for running a 200-MHz, 8-bit input/output, 15-bit fixed-point data path, and 10-stage pipelined 2-D ILI, which is the quadratic interpolation of lowest cost but with PSNR performance closest to state-of-the-art bi-cubic methods.

  • ADC-Based Backplane Receiver Design-Space Exploration

    Page(s): 1539 - 1547

    Demand for higher throughput backplane communications, coupled with a desire for design portability and flexibility, has led to high-speed backplane receivers that use front-end analog-to-digital converters (ADCs) and digital equalization. Unfortunately, power and complexity of such receivers can be high and require careful design. This paper presents a parameterized ADC-based backplane receiver model that facilitates design-space exploration to optimize the tradeoffs between power and performance: an accurate behavioral model of front-end ADCs is presented for performance estimation, and detailed power models for the digital equalizer (EQ) blocks are developed for power estimation. Model-based simulations suggest that comparator offset correction resolution is the most critical ADC design parameter where overall receiver performance is concerned. Further receiver design-space exploration reveals that a Pareto optimal frontier exists, which can be used as a guideline to set the initial receiver configurations depending on given power and performance constraints.

  • High Fill Factor Low-Voltage CMOS Image Sensor Based on Time-to-Threshold PWM VLSI Architecture

    Page(s): 1548 - 1556

    This paper presents a CMOS image sensor (CIS) VLSI architecture based on single-inverter time-to-threshold pulsewidth modulation circuitry capable of operating at supply voltages as low as 330 mV while retaining a signal-to-noise ratio (SNR) of 24 dB, an important characteristic demanded by very low voltage portable CIS-based equipment such as disposable medical cameras and on-chip autonomous wireless security vision systems. A 64 × 64 pixel array was fabricated in a standard 130-nm CMOS process, consuming only 5.9 nW/pixel with an integration time of 2 ms at a +0.5 V supply. The high fill factor of 58% facilitated a better SNR at a low supply voltage when compared with other CIS architectures. The pixel has a dynamic range of 54 dB at 7.8 frames per second.

  • Sensitization Input Vector Impact on Propagation Delay for Nanometer CMOS ICs: Analysis and Solutions

    Page(s): 1557 - 1569

    We report and analyze the dependence of complex gate delay on the sensitization vector and its variation (up to 40% in 65-nm CMOS technologies) and include its effect in the path delay estimation (which can be on the order of 16%). The gate delay is computed from a simple polynomial analytical description that requires a one-time library parameter extraction process, making it highly scalable. An STA tool based on a single-pass true path computation is used to determine the critical path list. Since it does not rely on a two-step process, it can be programmed to efficiently find the N true paths of a circuit. Results from various benchmark circuits synthesized for three commercial technologies (130, 90, and 65 nm) are better than those of a commercial tool in the number of paths reported and in the delay estimation for these paths. The impact of delay variation with the sensitization vector for paths with complex gates emerges as a significant mechanism that must be considered, as it is comparable to the impact of parameter variations or interconnect-induced delay.

  • Low-Power Test Generation by Merging of Functional Broadside Test Cubes

    Page(s): 1570 - 1582

    This paper describes a low-power test generation procedure, which targets the switching activity during the fast functional clock cycles of broadside tests. The procedure is based on merging of test cubes that it extracts from functional broadside tests. The use of test cube merging supports test compaction and it can be used for accommodating the constraints of test data compression. The use of functional broadside tests provides a target for the switching activity of low-power tests, which does not exceed the switching activity that is possible during functional operation, or that the circuit is designed for. The use of test cubes that are extracted from functional broadside tests is a unique feature of this procedure. It ensures that the low-power tests would create functional operation conditions in subcircuits that are defined by the test cubes. Experimental results show that the procedure detects all or almost all the transition faults that are detectable by arbitrary (functional and nonfunctional) broadside tests in benchmark circuits.

  • FPGA-Based Bit Error Rate Performance Measurement of Wireless Systems

    Page(s): 1583 - 1592

    This paper presents the bit error rate (BER) performance validation of digital baseband communication systems on a field-programmable gate array (FPGA). The proposed BER tester (BERT) integrates fundamental baseband signal processing modules of a typical wireless communication system along with a realistic fading channel simulator and an accurate Gaussian noise generator onto a single FPGA to provide an accelerated and repeatable test environment in a laboratory setting. Using a developed graphical user interface, the error rate performance of single- and multiple-antenna systems over a wide range of parameters can be rapidly evaluated. The FPGA-based BERT should reduce the need for time-consuming software-based simulations, hence increasing the productivity. This FPGA-based solution is significantly more cost effective than conventional performance measurements made using expensive commercially available test equipment and channel simulators.

  • Novel Class of Energy-Efficient Very High-Speed Conditional Push–Pull Pulsed Latches

    Page(s): 1593 - 1605

    In this paper, a new class of pulsed latches is introduced and experimentally assessed in 65-nm CMOS. Its conditional push-pull pulsed latch topology is based on a push-pull final stage driven by two split paths with a conditional pulse generator. Two circuit implementations of the concept are discussed, with their main difference being in the pulse generator, which can be either shared (CSP3L) or not (CP3L). Measurements show that the proposed topology is very fast, as it outperforms the well-known transmission gate pulsed latch (TGPL) [1] by 1.5×-2×; hence the proposed pulsed latch has the highest performance ever reported. The proposed pulsed latch is also shown to significantly improve the energy efficiency compared to the state of the art. Indeed, a 2.3× improvement in ED³ product (energy × delay³) over TGPL was found for designs targeting minimum ED³. For designs targeting minimum ED, a 1.3× improvement was found in ED product. This comes at the cost of a 1.15×-1.35× cell area penalty, which translates into an overall area increase well below 1% in typical systems. Measurements on 256 replicas confirm that the above benefits are kept in the presence of variations. Accordingly, the proposed class of pulsed latches goes beyond the current state of the art and is well suited for VLSI systems that require both high performance and energy efficiency.

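The two figures of merit named in this abstract, the ED product (energy × delay) and the ED³ product (energy × delay³), differ only in how heavily they weight delay. The Python sketch below shows how an improvement factor would be computed for each; the normalized (energy, delay) pairs are hypothetical values chosen for illustration and are not measurements from the paper.

```python
def ed_product(energy: float, delay: float, exponent: int = 1) -> float:
    """Energy-delay figure of merit: energy * delay**exponent (lower is better)."""
    return energy * delay ** exponent


def improvement(baseline, proposed, exponent: int) -> float:
    """Ratio of baseline to proposed figure of merit (greater than 1 means better)."""
    return ed_product(*baseline, exponent) / ed_product(*proposed, exponent)


# HYPOTHETICAL normalized (energy, delay) pairs, not data from the paper.
baseline = (1.0, 1.0)
proposed = (1.2, 0.75)   # 20% more energy, 25% less delay

ed_gain = improvement(baseline, proposed, 1)
ed3_gain = improvement(baseline, proposed, 3)
print(f"ED improvement:  {ed_gain:.2f}x")
print(f"ED^3 improvement: {ed3_gain:.2f}x")
```

Because delay enters cubed, a design that trades some energy for speed scores noticeably better on ED³ than on ED, which matches the direction of the two improvement factors quoted in the abstract.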
  • System Level Methodology for Interconnect Aware and Temperature Constrained Power Management of 3-D MP-SOCs

    Page(s): 1606 - 1619

    Modern 3-D multiprocessor systems-on-chip (MP-SoC) incorporate processing elements (PEs) and memories within die-stacks interconnected using through-silicon vias (TSVs). The resulting power density of these systems necessitates the inclusion of thermal effects in the architecture space exploration stage of the design process. The number and placement of TSVs influences the thermal conductivity in the vertical direction in die-stacks, and consequently these must be considered during thermal analysis. However, the special requirement of keep out zones (KOZs) for TSVs due to mechanical stress considerations complicates the design of the vertical interconnect, potentially impacting its electrical performance as well. This paper presents an integrated methodology that allows for TSV topology exploration to evaluate the best vertical interconnect structure while considering crosstalk, area overheads, and KOZ requirements using an initial system floorplan. After incorporating feedback from the exploration, the resulting vertical interconnect is included within a temperature-power simulation that estimates the thermal profile of the 3-D stack. Within this methodology, a novel power management scheme for 3-D MP-SoCs that considers both temperature as well as positional information and thermal relationships between PEs, while performing dynamic voltage-frequency scaling (DVFS), is introduced. The scheme effectively maintains smooth temperature profiles, decreases fluctuations in voltage-frequency levels, and increases the aggregate frequency of operation at a lower total power dissipation. Further, the scheme is applied to a stack partitioned into voltage islands, where it is shown to match the conventional per-core DVFS schemes in its performance.

  • An Offset-Canceling Triple-Stage Sensing Circuit for Deep Submicrometer STT-RAM

    Page(s): 1620 - 1624

    Spin-transfer torque random access memory (STT-RAM) is considered to be a leading candidate for next-generation memory. As technology scales, however, the sensing margin of STT-RAM is significantly degraded because of increased process variation. Furthermore, the sensing current should be <20 μA to prevent read disturbance in technologies beyond 45 nm, leading to a further decrease in the sensing margin. To achieve a target yield of six sigma in technologies beyond 45 nm with a sensing current of <20 μA, an offset-canceling triple-stage (OCTS) sensing circuit is proposed in this brief. The OCTS sensing circuit can overcome the sensing margin and read disturbance problems by sacrificing the sensing time. Monte Carlo HSPICE simulation results using a 45-nm technology model show that the OCTS sensing circuit achieves a target yield of six sigma (96.74% for 32 Mb) with a sensing current of 20 μA and a sensing time of 6.4 ns.

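The pairing of "six sigma" with "96.74% for 32 Mb" in this abstract can be related by a short calculation. The sketch below assumes a one-sided Gaussian tail for the per-cell failure probability, independent cells, and 32 Mb = 32 × 2^20 cells; none of these assumptions is stated in the abstract, and the function names are illustrative.

```python
import math

SIGMA = 6.0
BITS = 32 * 2**20   # ASSUMPTION: 32 Mb means 32 * 2^20 cells


def bit_failure_probability(sigma: float) -> float:
    """One-sided Gaussian tail probability Q(sigma) for a single cell."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def array_yield(sigma: float, bits: int) -> float:
    """Probability that all `bits` independent cells are read correctly."""
    p_fail = bit_failure_probability(sigma)
    # log1p keeps precision when p_fail is many orders of magnitude below 1.
    return math.exp(bits * math.log1p(-p_fail))


yield_32mb = array_yield(SIGMA, BITS)
print(f"array yield at {SIGMA:.0f} sigma: {100 * yield_32mb:.2f}%")
```

Under these assumptions the array-level yield comes out at roughly 96.7%, in line with the figure quoted in the abstract.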
  • Input Vector Monitoring Concurrent BIST Architecture Using SRAM Cells

    Page(s): 1625 - 1629

    Input vector monitoring concurrent built-in self test (BIST) schemes perform testing during the normal operation of the circuit without imposing a need to set the circuit offline to perform the test. These schemes are evaluated based on the hardware overhead and the concurrent test latency (CTL), i.e., the time required for the test to complete while the circuit operates normally. In this brief, we present a novel input vector monitoring concurrent BIST scheme, which is based on the idea of monitoring a set (called window) of vectors reaching the circuit inputs during normal operation, and the use of a static-RAM-like structure to store the relative locations of the vectors that reach the circuit inputs in the examined window; the proposed scheme is shown to perform significantly better than previously proposed schemes with respect to the hardware overhead and CTL tradeoff.

  • STT-MRAM Sensing Circuit With Self-Body Biasing in Deep Submicron Technologies

    Page(s): 1630 - 1634

    Conventional spin transfer torque MRAM sensing circuits suffer from a small sensing margin and a large sensing margin variation in deep submicron technologies. The small sensing margin issue becomes worse in the low-leakage process technology due to the higher threshold voltage. In this brief, the self-body biasing (self-BB) scheme is proposed to resolve the small sensing margin issue. In the self-BB scheme, the threshold voltage of load pMOS is adaptively controlled by body bias. Although leakage current flows through the body due to the positive junction bias voltage, it is well suppressed to less than 1% (0.3 μA) of the sensing current and flows only during the sensing operation. To reduce large sensing margin variation, the source degeneration scheme with the longer channel length is used for the load pMOS. The HSPICE simulation results obtained using low-leakage 45-nm model parameters show that the proposed sensing circuit achieves a probability of the read access pass yield (PRAPY Memory) of 100%, whereas the sensing circuit without BB scheme has a PRAPY Memory of 5.8% for a 32-Mb memory with a sensing time of 2 ns.

  • A Method to Extend Orthogonal Latin Square Codes

    Page(s): 1635 - 1639

    Error correction codes (ECCs) are commonly used to protect memories from errors. As multibit errors become more frequent, single error correction codes are not enough and more advanced ECCs are needed. The use of advanced ECCs in memories is, however, limited by their decoding complexity. In this context, one-step majority logic decodable (OS-MLD) codes are an interesting option as the decoding is simple and can be implemented with low delay. Orthogonal Latin squares (OLS) codes are OS-MLD and have been recently considered to protect caches and memories. The main advantage of OLS codes is that they provide a wide range of choices for the block size and the error correction capabilities. In this brief, a method to extend OLS codes is presented. The proposed method enables the extension of the data block size that can be protected with a given number of parity bits thus reducing the overhead. The extended codes are also OS-MLD and have a similar decoding complexity to that of the original OLS codes. The proposed codes have been implemented to evaluate the circuit area and delay needed for different block sizes.

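To make the one-step majority logic decoding mentioned in this abstract concrete, the sketch below builds a conventional (non-extended) OLS code for a 5 × 5 data block with double error correction and decodes it by majority vote. It uses one common construction, with row checks, column checks, and Latin squares of the form (k·i + j) mod m for a prime m. It does not implement the extension method proposed in the brief, and all function and variable names are illustrative assumptions.

```python
M = 5   # Latin-square order; prime, so (k*i + j) mod M yields orthogonal squares
T = 2   # errors corrected; each data bit takes part in 2*T parity checks


def check_groups(m=M, t=T):
    """Return the 2*t*m parity groups as lists of (row, col) data positions."""
    groups = [[(i, j) for j in range(m)] for i in range(m)]    # row checks
    groups += [[(i, j) for i in range(m)] for j in range(m)]   # column checks
    for k in range(1, 2 * t - 1):                              # 2t-2 Latin squares
        for symbol in range(m):
            groups.append([(i, j) for i in range(m) for j in range(m)
                           if (k * i + j) % m == symbol])
    return groups


def encode(data, groups):
    """Parity bit g is the XOR of the data bits in group g."""
    return [sum(data[i][j] for i, j in g) % 2 for g in groups]


def decode(data, parity, groups, t=T):
    """One-step majority logic decoding: flip a bit if more than t of its checks fail."""
    failed = [(sum(data[i][j] for i, j in g) + p) % 2
              for g, p in zip(groups, parity)]
    corrected = [row[:] for row in data]
    for i in range(len(data)):
        for j in range(len(data[i])):
            votes = sum(f for g, f in zip(groups, failed) if (i, j) in g)
            if votes > t:
                corrected[i][j] ^= 1
    return corrected


groups = check_groups()
original = [[(3 * i + j * j) % 2 for j in range(M)] for i in range(M)]
parity = encode(original, groups)

received = [row[:] for row in original]
received[0][0] ^= 1   # two injected data-bit errors
received[3][2] ^= 1
recovered = decode(received, parity, groups)
```

Any two data positions share at most one parity group, so a second error can mask at most one of the four checks on an erroneous bit, and the majority vote still identifies it. Here 25 data bits need 20 parity bits, which is the kind of overhead the extension method aims to reduce.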
  • Improved Analytical Delay Models for RC-Coupled Interconnects

    Page(s): 1639 - 1644

    As process technologies scale into the deep submicrometer region, crosstalk delay is becoming increasingly severe, especially for global on-chip buses. To cope with this problem, accurate delay models of coupled interconnects are needed. In particular, delay models based on analytical approaches are desirable, because they are not only largely transparent to technology, but also explicitly establish the connections between delays of coupled interconnects and transition patterns, thereby enabling crosstalk alleviating techniques such as crosstalk avoidance codes. Unfortunately, existing analytical delay models, such as the widely cited model in [1], have limited accuracy and do not account for loading capacitance. In this brief, we propose analytical delay models for coupled interconnects that address these disadvantages.

  • Beware the Dynamic C-Element

    Page(s): 1644 - 1647

    The C-element is a well known component of asynchronous circuits. To overcome problems of current CMOS technologies, its use has even been extended to specific domains of the synchronous paradigm, such as clock generation, clock gating, and registers. An economical implementation of this component is the dynamic C-element. Its advantages over static implementations are reduced power, transition, and propagation delays as well as lower silicon area. Yet, research evaluating its electrical behavior, functionality, and robustness is scarce. This brief presents an in-depth analysis of the dynamic C-element electrical behavior. The analysis points to a constrained nature, which can lead to undefined output logic values, as well as excessive static power consumption. The brief also proposes a technique for robust design of such components that avoids such undefined values.

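For readers unfamiliar with the component, the logic-level behavior of a two-input C-element can be sketched in a few lines of Python: the output follows the inputs when they agree and holds its previous value when they differ. This is a behavioral model only, with an illustrative class name. It does not capture the electrical effects the brief analyzes, such as the undefined output values that can arise while a dynamic implementation is expected to hold its state.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element (logic level only)."""

    def __init__(self, initial: int = 0):
        self.output = initial

    def update(self, a: int, b: int) -> int:
        # The output follows the inputs only when they agree; otherwise it holds.
        if a == b:
            self.output = a
        return self.output


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]]
print(trace)
```

The second and fourth input pairs change the output, while the first and third leave it at its previous value.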
  • Low-Complexity Low-Latency Architecture for Matching of Data Encoded With Hard Systematic Error-Correcting Codes

    Page(s): 1648 - 1652

    A new architecture for matching the data protected with an error-correcting code (ECC) is presented in this brief to reduce latency and complexity. Based on the fact that the codeword of an ECC is usually represented in a systematic form consisting of the raw data and the parity information generated by encoding, the proposed architecture parallelizes the comparison of the data and that of the parity information. To further reduce the latency and complexity, in addition, a new butterfly-formed weight accumulator (BWA) is proposed for the efficient computation of the Hamming distance. Grounded on the BWA, the proposed architecture examines whether the incoming data matches the stored data if a certain number of erroneous bits are corrected. For a (40, 33) code, the proposed architecture reduces the latency and the hardware complexity by ~32% and 9%, respectively, compared with the most recent implementation.

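The matching idea in this abstract, comparing the data part and the parity part of a systematic codeword and thresholding the total Hamming distance, can be sketched in software. The example below substitutes a small systematic Hamming(7,4) code (single error correction) for the (40, 33) code of the brief, whose details are not given here, and it does not model the butterfly-formed weight accumulator. The function names and the threshold parameter are illustrative assumptions.

```python
def parity_bits(data: int) -> int:
    """Parity part of a systematic Hamming(7,4) codeword for a 4-bit data word."""
    d1, d2, d3, d4 = ((data >> k) & 1 for k in range(4))
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return p1 | (p2 << 1) | (p3 << 2)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def matches(incoming: int, stored_data: int, stored_parity: int, t: int = 1) -> bool:
    """Encode the incoming data and compare both codeword parts.

    The data and parity comparisons are independent of each other, so hardware
    could evaluate them in parallel. A match is declared when the total distance
    is within the number of correctable errors t.
    """
    distance = (hamming_distance(incoming, stored_data)
                + hamming_distance(parity_bits(incoming), stored_parity))
    return distance <= t


word = 0b1011
stored_data, stored_parity = word ^ 0b0001, parity_bits(word)   # one-bit error injected

hit = matches(word, stored_data, stored_parity)            # same word despite the error
miss = matches(0b0011, word, parity_bits(word))            # different word, clean storage
```

Because distinct codewords of this code differ in at least three positions, a stored word with a single-bit error still lies within distance 1 of only its own codeword, so the threshold test decides the match without first decoding the stored word.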
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
The American University in Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu