
Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 8 • Date Aug. 2014

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • MASTER: A Multicore Cache Energy-Saving Technique Using Dynamic Cache Reconfiguration

    Page(s): 1653 - 1665

    With the increasing number of on-chip cores and CMOS scaling, the size of last-level caches (LLCs) is on the rise and hence, managing their leakage energy consumption has become vital for continuing to scale performance. In multicore systems, the locality of the memory access stream is significantly reduced because of multiplexing of access streams from different running programs and hence, leakage energy-saving techniques such as decay cache, which rely on memory access locality, do not save a large amount of energy. The techniques based on way-level allocation provide very coarse granularity and the techniques based on offline profiling become infeasible to use for a large number of cores. We present a multicore cache energy-saving technique using dynamic cache reconfiguration (MASTER) that uses online profiling to predict energy consumption of running programs at multiple LLC sizes. Using these estimates, suitable cache quotas are allocated to different programs using a cache coloring scheme and the unused LLC space is turned off to save energy. Even for four-core systems, the implementation overhead of MASTER is only 0.8% of L2 size. We evaluate MASTER using out-of-order simulations with multiprogrammed workloads from SPEC2006 and compare it with conventional cache leakage energy-saving techniques. The results show that MASTER gives the highest saving in energy and does not harm performance or cause unfairness. For two- and four-core simulations, the average savings in memory subsystem (which includes LLC and main memory) energy over shared baseline LLC are 15% and 11%, respectively. Also, the average values of weighted speedup and fair speedup are close to one (≥0.98).
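The abstract reports weighted speedup and fair speedup without giving their formulas. The sketch below assumes the normalized forms often used in the multicore literature: weighted speedup as the arithmetic mean of per-program IPC ratios (technique over baseline), and fair speedup as the harmonic mean of the same ratios. The function names and the IPC values are hypothetical and are not taken from the paper.

```python
def weighted_speedup(ipc_new, ipc_base):
    """Arithmetic mean of per-program IPC ratios (assumed definition)."""
    ratios = [new / base for new, base in zip(ipc_new, ipc_base)]
    return sum(ratios) / len(ratios)


def fair_speedup(ipc_new, ipc_base):
    """Harmonic mean of per-program IPC ratios (assumed definition)."""
    inverse = [base / new for new, base in zip(ipc_new, ipc_base)]
    return len(inverse) / sum(inverse)


# Hypothetical two-program workload: one program loses 2% IPC,
# the other is unaffected by the cache reconfiguration.
ipc_base = [1.00, 2.00]
ipc_new = [0.98, 2.00]

ws = weighted_speedup(ipc_new, ipc_base)
fs = fair_speedup(ipc_new, ipc_base)
print(f"weighted speedup = {ws:.4f}, fair speedup = {fs:.4f}")
```

Under these definitions both metrics equal one when no program slows down, and the harmonic mean penalizes a slowdown concentrated in one program more than the arithmetic mean does, which is why the two are usually reported together.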

  • Current-Mode Synthetic Control Technique for High-Efficiency DC–DC Boost Converters Over a Wide Load Range

    Page(s): 1666 - 1678

    This paper proposes a current-mode synthetic control (CSC) technique for the design of boost converters to overcome the difficulty in designing a current-ripple hysteresis boost converter and to maintain high conversion efficiency over a wide load range. The CSC technique has a high accuracy because of the additional voltage path through the error amplifier. A smooth load transient response is maintained when the operation transits from continuous conduction mode with a nearly constant switching frequency to discontinuous conduction mode with a load-dependent switching frequency. Generally, ripple performance, light-load efficiency, and switching frequency are traded off in the design of a hysteresis control regulator. In this paper, a balance among the load-dependent switching frequencies at light loads results in high power conversion efficiency compared with a conventional pulsewidth modulation converter and attains compact ripple performance. The experimental results show that the output voltage ripple can be kept <50 mV over a wide load current range from 10 to 400 mA, whereas power conversion efficiency is maintained at 78% at a load current of 10 mA when the switching frequency is decreased from 5 to 2 MHz.

  • State-Aware Dynamic Frequency Selection Scheme for Energy-Harvesting Real-Time Systems

    Page(s): 1679 - 1692

    With the increasing deployment of battery-powered embedded systems such as sensor nodes in extreme environments, harvesting renewable energy from ambient environments to achieve near perpetual operation of a system has attracted considerable research efforts in the recent past. In this paper, the authors propose a dynamic frequency selection scheme for energy-harvesting real-time systems. The proposed scheme characterizes the state of a system from the perspectives of system utilization and harvested energy with respect to a certain period of time. A portion of the battery energy is allocated to a group of tasks in the period of time by jointly considering the system utilization and energy state, and the operating frequency is selected based on the allocated energy. The derived operating frequency is fine tuned to further enhance energy efficiency when overflow occurs. Simulation results demonstrate the effectiveness of the proposed scheme. Compared with the state-of-the-art scheme that decouples the energy and timing design constraints, the proposed scheme achieves comparable deadline miss rate when the battery capacity is lower than 5000 J and achieves about 11.5% lower deadline miss rate when the battery capacity is greater than 30 000 J. The proposed scheme also outperforms the benchmarking scheme in energy efficiency. When the battery is near a full charge or overflow occurs, the proposed scheme incurs less energy waste when compared with the benchmarking algorithm, which is favorable for autonomous operation of the system. Furthermore, the time complexity of the proposed scheme is one order of magnitude lower than that of the benchmarking scheme, which makes the proposed scheme well suited for dynamic scheduling.

  • A Novel Single-Inductor Dual-Input Dual-Output DC–DC Converter With PWM Control for Solar Energy Harvesting System

    Page(s): 1693 - 1704

    A novel single-inductor dual-input dual-output dc-dc converter with pulse width modulation control is proposed for a solar energy harvesting system. The first input of the converter is from photovoltaic (PV) cells and the second input is a rechargeable battery. Apart from the conventional role of providing a regulated output voltage to power the loading circuits, the converter also clamps the PV cells' voltage to the maximum power point value to maximize efficiency. When the PV cells harvest more power than the load, the surplus energy is used to charge the rechargeable battery. When the PV cells cannot harvest sufficient power, the converter schedules the PV cells and the battery to power the load together. A test chip was fabricated using a 0.35-μm CMOS process and measured to verify the operation of the proposed dc-dc converter and to demonstrate the power transfer efficiency of the solar power management system.

  • Energy/Lifetime Cooptimization by Cache Partitioning With Graceful Performance Degradation

    Page(s): 1705 - 1715

    Aging of transistors can adversely impact the long-term reliability of devices in subnanometric technologies. Without any countermeasure, the first component that becomes unreliable will determine the life span of an entire device. The effect is more susceptible in memory arrays, where failure of a single SRAM cell would cause the failure of the whole system. In this paper, we propose a reliability management technique based on the idea of cache partitioning, which deals with cell failures by gracefully degrading its performance. By this partitioning-based strategy, various subblocks will become unreliable at different times, and the cache will keep functioning with reduced efficiency. A coarse-grain implementation of this approach, with the use of a smart aging-driven partitioning algorithm, provides a lifetime extension of more than 2×. On the other hand, a fine-grain strategy, with a single cache line as the unit of power management, stretches the lifetime to its maximum limits with the addition of a small hardware overhead.

  • Dynamic Self-Regulated Charge Pump With Improved Immunity to PVT Variations

    Page(s): 1716 - 1726

    A charge pump (CP) design that simultaneously reduces current mismatch and sensitivity to process, voltage, and temperature variations is presented. The self-biased and self-regulated circuit uses a dual-feedback mechanism with a replica CP to dynamically stabilize and equalize the currents over a wide output voltage range under varying operating conditions. The circuit is further analyzed and designed in 90-nm CMOS. Extensive corner and Monte Carlo runs are used to verify its current matching capabilities as well as its stability and response within a phase locked loop. The CP achieves near perfect current matching for a wide output voltage range and for temperatures ranging from -30 °C to 90 °C and a power supply between 1.08 and 1.32 V.

  • Design and Evaluation of Confidence-Driven Error-Resilient Systems

    Page(s): 1727 - 1737

    Deeply scaled CMOS circuits are increasingly susceptible to transient faults and soft errors; emerging post-CMOS devices can be more vulnerable, sometimes exhibiting erratic errors of arbitrary duration. Applying timing and supply voltage margin is wasteful and becoming ineffective, and conventional checking and sparing techniques provide only a limited error coverage against widely varying errors. We propose a confidence-driven computing (CDC) model for an adaptive protection against nondeterministic errors. The CDC model employs fine-grained temporal redundancy and confidence checking for a faster adaptation and tunable reliability. The CDC model can be extended to deeply scaled CMOS circuits that are mainly affected by transient faults and soft errors, where an early checking (EC) technique can be used to perform independent error checking for more flexibility and better performance. To evaluate the CDC model, we apply a sample-based field-programmable gate array emulation along with real-time error injection. The CDC model is shown to adapt to fluctuating error rates and enhance the system reliability by effectively trading off performance. To evaluate the EC technique at a finer time scale, we create a new event-based simulation to capture path delay distribution, error model, and their interactions. The EC technique improves the system reliability by more than four orders of magnitude when errors are of short duration. Both the CDC model and the EC technique are synthesized in a 45-nm CMOS technology for cost estimates: 1) the area overhead is as low as 12% and 2) energy overhead can be limited to 19%.

  • Test-Quality Optimization for Variable n-Detections of Transition Faults

    Page(s): 1738 - 1749

    Aggressive technology scaling in modern chips has resulted in complicated faulty timing behaviors, which necessitate undesirably long development cycles and high test volumes to ensure product quality. To reduce the test time, cost-effective and timing-efficient test selection algorithms are used to choose optimal test inputs from a large-volume test set. In this paper, we define an approximate longest sensitized path (ALSP) metric to derive the longest sensitized path for all transition faults (TFs) from the detectability of TFs with very low computational complexity. With the ALSP metric, a general public utilities-based parallel test selection method is proposed to choose a small test set with high delay test quality from the timing-unaware n-detection test set. Our results demonstrate the comparison with a commercial automatic test pattern generation tool and a previous timing-aware test selection method targeting small delay defects, and confirm that our test selection algorithm can achieve better delay test coverage and higher n-detection fault coverage with steeper fault coverage curves of ordered patterns, for the same pattern count.
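To make the n-detection fault coverage figure of merit concrete, here is a minimal sketch under the usual definition: a fault counts as n-detected when at least n distinct patterns in the selected test set detect it. The fault names, the detection table, and the function name are hypothetical; the ALSP metric and the paper's selection algorithm are not modeled.

```python
def n_detection_coverage(detections, selected, n):
    """Fraction of faults detected by at least n of the selected patterns.

    detections: dict mapping a fault id to the set of pattern indices
                that detect it (assumed to come from fault simulation).
    selected:   iterable of pattern indices kept in the test set.
    """
    if not detections:
        return 0.0
    chosen = set(selected)
    hit = sum(1 for pats in detections.values() if len(pats & chosen) >= n)
    return hit / len(detections)


# Hypothetical detection table for four transition faults.
detections = {
    "f1": {0, 1, 2},
    "f2": {1},
    "f3": {0, 2},
    "f4": set(),   # undetected by any pattern
}

for n in (1, 2, 3):
    cov = n_detection_coverage(detections, [0, 1, 2], n)
    print(f"{n}-detection coverage = {cov:.2f}")
```

Dropping pattern 1 from this toy set leaves the 2-detection coverage unchanged, which is the kind of redundancy a test selection method tries to exploit.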

  • Backend Dielectric Reliability Full Chip Simulator

    Page(s): 1750 - 1762

    Backend dielectric breakdown degrades the reliability of circuits. A methodology to estimate chip lifetime because of backend dielectric breakdown is presented. It incorporates failures because of parallel tracks, the width effect, and field enhancement due to line ends. It also includes the operating temperature and activity.

  • Reliability-Aware Design Flow for Silicon Photonics On-Chip Interconnect

    Page(s): 1763 - 1776

    Intercore communication in many-core processors presently faces scalability issues similar to those that plagued intracity telecommunications in the 1960s. Optical communication promises to address these challenges now, as then, by providing low latency, high bandwidth, and low power communication. Silicon photonic devices presently are vulnerable to fabrication and temperature-induced variability. Our fabrication and measurement results indicate that such variations degrade interconnection performance and, in extreme cases, the interconnection may fail to function at all. In this paper, we propose a reliability-aware design flow to address variation-induced reliability issues. To mitigate effects of variations, limits of device design techniques are analyzed and requirements from architecture-level design are revealed. Based on this flow, a multilevel reliability management solution is proposed, which includes athermal coating at fabrication-level, voltage tuning at device-level, as well as channel hopping at architecture-level. Simulation results indicate that our solution can fully compensate variations thereby sustaining reliable on-chip optical communication with power efficiency.

  • Reasoning and Learning-Based Dynamic Codec Reconfiguration for Varying Processing Requirements in Network-on-Chip

    Page(s): 1777 - 1790

    Crosstalk interference and high dynamic power consumption in a network-on-chip (NoC) are two increasingly problematic design issues. Using data codecs can reduce the switching activity on wires that causes crosstalk interference and high dynamic power. However, data codecs have different overheads in terms of area and performance, and varying capabilities in reducing crosstalk and dynamic power. To adapt to the wide range of processing requirements incurred by applications and operating environments, a reasoning and learning (REAL) framework is proposed for a reconfigurable NoC. REAL dynamically investigates the tradeoffs among reliability, dynamic power reduction, performance, and hardware resource usage to configure the reconfigurable NoC with an appropriate data codec at runtime. As a proof of concept, a 3 × 3 reconfigurable NoC was implemented on a Xilinx Virtex-4 field-programmable gate array, which required 8.2% fewer slices compared with a conventional NoC. Experiments show that, at the same overheads of performance and hardware resources, the reconfigurable NoC induces a higher probability toward the reduction of crosstalk interference and dynamic power consumption.
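The abstract does not name the codecs that REAL chooses among. As an illustration of how a data codec can cut switching activity, the sketch below implements bus-invert coding, a well-known low-power bus code that is used here only as a stand-in: when more than half of the data lines would toggle, the inverted word is sent together with an extra invert line. The link width, the initial all-zero bus state, and the word sequence are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (data, invert_flag) pairs; bus assumed to start at all zeros."""
    mask = (1 << width) - 1
    prev, encoded = 0, []
    for w in words:
        w &= mask
        if hamming(prev, w) > width // 2:   # more than half the lines toggle
            w, inv = ~w & mask, 1           # send the complement instead
        else:
            inv = 0
        encoded.append((w, inv))
        prev = w
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion using the invert flag."""
    mask = (1 << width) - 1
    return [(w ^ mask) if inv else w for w, inv in encoded]


def count_toggles(values, start=0):
    """Total wire toggles when driving the given values in sequence."""
    total, prev = 0, start
    for v in values:
        total += hamming(prev, v)
        prev = v
    return total


words = [0x00, 0xFF, 0x00, 0xFF]            # worst case for an 8-bit link
enc = bus_invert_encode(words, 8)
raw = count_toggles(words)
# Pack the invert line as one extra wire so its toggles are counted too.
coded = count_toggles([(w << 1) | inv for w, inv in enc])
print(raw, coded)
```

The comparison counts the extra invert wire, since a codec's overhead in wires and logic is part of the tradeoff the abstract describes.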

  • REC-STA: Reconfigurable and Efficient Chip Design With SMO-Based Training Accelerator

    Page(s): 1791 - 1802

    Sequential minimal optimization (SMO) and the Karush-Kuhn-Tucker condition are often used to solve learning problems in support vector machines. However, during hardware implementation of the SMO algorithm, enhancing chip performance without excessively increasing chip area is often a crucial issue. The solution proposed in this paper is a novel reconfigurable and efficient chip design with SMO-based training accelerator (REC-STA). Two novel methods used in the proposed REC-STA are trimode coarse-grained reconfigurable architecture (TCRA) and triple finite-state-machine with dynamic scheduling. The first method modifies the baseline SMO design by developing trimode reconfigurable architectures with parallel and pipeline computing capabilities. The second method provides a schedule for efficient reconfiguration of the TCRA. Use of these methods can remove kernel cache design. For chip manufacturing, the implementation of the REC-STA is synthesized, placed, and routed using the TSMC 0.18-μm technology library. The core size is 2.94 mm × 2.94 mm and the power consumption is 77.3 mW. Compared with the baseline design, the FPGA simulation results show that the proposed architecture requires 50% less memory and 31% fewer gate counts but provides a 16-fold improvement in training performance. The experimental results confirm the efficacy of the proposed architecture and methods.
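For readers unfamiliar with the role of the Karush-Kuhn-Tucker (KKT) conditions in SMO, the sketch below shows the per-sample violation test used in the textbook software formulation of SMO for a soft-margin SVM. It is not the REC-STA hardware design; the function name, the tolerance, and the sample values are assumptions.

```python
def violates_kkt(alpha, y, f_x, C, tol=1e-3):
    """True if a training sample violates the soft-margin SVM KKT conditions.

    alpha: Lagrange multiplier of the sample, 0 <= alpha <= C
    y:     class label, +1 or -1
    f_x:   current decision value f(x) for the sample
    C:     box constraint (regularization parameter)

    KKT conditions at the optimum:
      alpha == 0      ->  y * f(x) >= 1
      0 < alpha < C   ->  y * f(x) == 1
      alpha == C      ->  y * f(x) <= 1
    """
    r = y * f_x - 1.0                      # signed margin residual
    return (r < -tol and alpha < C) or (r > tol and alpha > 0)


C = 1.0
samples = [
    # (alpha, y, f(x))
    (0.0, +1, 2.0),    # outside the margin with alpha = 0: satisfied
    (0.0, +1, 0.5),    # inside the margin but alpha = 0: violated
    (C,   -1, -0.5),   # inside the margin with alpha = C: satisfied
    (C,   -1, -2.0),   # outside the margin but alpha = C: violated
    (0.5, +1, 1.0),    # on the margin with 0 < alpha < C: satisfied
]
for alpha, y, f_x in samples:
    print(alpha, y, f_x, violates_kkt(alpha, y, f_x, C))
```

SMO repeatedly picks a pair of multipliers that includes a violating sample and optimizes that pair analytically; training stops when no sample violates the conditions within the tolerance.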

  • Orchestrating Cache Management and Memory Scheduling for GPGPU Applications

    Page(s): 1803 - 1814

    Modern graphics processing units (GPUs) are delivering tremendous computing horsepower by running tens of thousands of threads concurrently. The massively parallel execution model has been effective to hide the long latency of off-chip memory accesses in graphics and other general computing applications exhibiting regular memory behaviors. With the fast-growing demand for general purpose computing on GPUs (GPGPU), GPU workloads are becoming highly diversified, and thus requiring a synergistic coordination of both computing and memory resources to unleash the computing power of GPUs. Accordingly, recent graphics processors begin to integrate an on-die level-2 (L2) cache. The huge number of threads on GPUs, however, poses significant challenges to L2 cache design. The experiments on a variety of GPGPU applications reveal that the L2 cache may or may not improve the overall performance depending on the characteristics of applications. In this paper, we propose efficient techniques to improve GPGPU performance by orchestrating both L2 cache and memory in a unified framework. The basic philosophy is to exploit the temporal locality among the massive number of concurrent memory requests and minimize the impact of memory divergence behaviors among simultaneously executed groups of threads. Our major contributions are twofold. First, a priority-based cache management is proposed to maximize the chance of frequently revisited data to be kept in the cache. Second, an effective memory scheduling is introduced to reorder memory requests in the memory controller according to the divergence behavior for reducing average waiting time of warps. Simulation results reveal that our techniques enhance the overall performance by 10% on average for memory intensive benchmarks, whereas the maximum gain can be up to 30%.

  • Design and Optimization of Nonvolatile Multibit 1T1R Resistive RAM

    Page(s): 1815 - 1828

    Memristor-based random access memory (RAM) is being explored as a potential replacement for flash memory to sustain the historic trends in the improvement of density, access time, and energy consumption of nonvolatile memory. In this paper, we present the detailed functionality of multibit one-transistor one-memristor (1T1R) cell-based memory arrays, and propose circuit-level performance and energy models for an individual memory cell and the memory array as a whole. We consider titanium dioxide (TiO2)- and hafnium oxide (HfOx)-based memristors, and for these technologies, there is a sub-10% difference between energy and performance computed using our models and HSPICE simulations. Using a performance-driven design approach, the energy-optimized TiO2-based resistive RAM (RRAM) array consumes the least write (4.06 pJ/b) and read energy (188 fJ/b) when storing 3 b/cell for 100-ns write and 1-ns read access times. Similarly, the HfOx-based RRAM array consumes the least write (365 fJ/b) and read energy (173 fJ/b) when storing 3 b/cell for 1-ns write and 200-ns read access times. We also present a detailed analysis of the implications of process, voltage, and temperature variations on the performance and energy consumption of a multibit RRAM cell.
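As a quick worked reading of the reported figures: a cell storing 3 b must distinguish 2^3 = 8 resistance levels, and the per-bit energies can be scaled to a per-cell access. The sketch below assumes that each per-bit figure is the cell access energy divided by the number of bits stored, which the abstract does not state explicitly; the function and variable names are hypothetical.

```python
def levels_per_cell(bits_per_cell):
    """Distinct resistance states needed to store the given number of bits."""
    return 2 ** bits_per_cell


def energy_per_cell_access(energy_per_bit_fj, bits_per_cell):
    """Per-cell access energy in fJ, assuming energy/bit = energy/cell / bits."""
    return energy_per_bit_fj * bits_per_cell


BITS_PER_CELL = 3
# (write, read) energy per bit in fJ, as reported in the abstract.
reported = {
    "TiO2": (4060.0, 188.0),   # 4.06 pJ/b write, 188 fJ/b read
    "HfOx": (365.0, 173.0),    # 365 fJ/b write, 173 fJ/b read
}

print("levels per cell:", levels_per_cell(BITS_PER_CELL))
for material, (write_fj, read_fj) in reported.items():
    w = energy_per_cell_access(write_fj, BITS_PER_CELL)
    r = energy_per_cell_access(read_fj, BITS_PER_CELL)
    print(f"{material}: write ~{w:.0f} fJ/cell, read ~{r:.0f} fJ/cell")
```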

  • Compiler-Assisted STT-RAM-Based Hybrid Cache for Energy Efficient Embedded Systems

    Page(s): 1829 - 1840

    Hybrid caches consisting of static RAM (SRAM) and spin-torque transfer (STT)-RAM have been proposed recently for energy efficiency. To explore the advantages of hybrid cache, most of the management strategies for hybrid caches employ migration-based techniques to dynamically move write-intensive data from STT-RAM to SRAM. These techniques involve additional access operations, and thus lead to extra overheads. In this paper, we propose two compilation-based approaches to improve the energy efficiency and performance of STT-RAM-based hybrid cache by reducing the migration overheads. The first approach, migration-aware data layout, is proposed to reduce the migrations by rearranging the data layout. The second approach, migration-aware cache locking, is proposed to reduce the migrations by locking migration-intensive memory blocks into SRAM part of hybrid cache. Furthermore, experiments show that these two methods can be combined to reduce more migrations. The reduction of migration overheads can improve the energy efficiency and performance of STT-RAM-based hybrid cache. Experimental results show that, combining these two methods, on average, the number of write operations on STT-RAM is reduced by 17.6%, the number of migrations is reduced by 38.9%, the total dynamic energy is reduced by 15.6%, and the total access latency is reduced by 13.8%.

  • Compact NOI Nanodevice Simulation

    Page(s): 1841 - 1844

    The nothing-on-insulator (NOI) transistor was recently proposed and is based on the conduction through a vacuum nanocavity. The main contributions of this brief are: 1) the NOI device study on alternative materials and 2) new key device parameters. The simulations reveal an optimum NOI structure with 15-nm film thicknesses, possessing a subthreshold drain slope of 50 mV/dec, an ION/IOFF ratio of 10^12, and a switching time under 0.3 ps. Although the NOI device is related to the tunnel or vacuum device as phenomena, it distinctly evolves toward a compact nanostructure, with the main advantage of nanometric sizes in all three directions, becoming interesting for very large-scale integration systems.

  • Call for applications and nominations

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu