
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Popular Articles (November 2014)

Includes the top 50 most frequently downloaded documents for this publication according to the most recent monthly usage statistics.
  • 1. Design Framework to Overcome Aging Degradation of the 16 nm VLSI Technology Circuits

    Page(s): 691 - 703

    Intensive scaling of VLSI circuits is a key factor in gaining outstanding performance. However, this scaling has a huge negative impact on circuit reliability, as it increases the undesired effect of aging degradation in ultradeep submicrometer technologies. Nowadays, the bias temperature instability (BTI) aging process has a major negative impact on the reliability of VLSI circuits. This paper presents a comprehensive framework that assists in designing VLSI circuits fortified against BTI aging degradation. The framework contains: 1) novel circuit-level techniques that eliminate the effect of BTI (these techniques successfully decrease the power dissipation by 36% and enhance the reliability of VLSI circuits); 2) an evaluation of the reliability of all circuit-level techniques used to eliminate BTI aging degradation for 16 nm CMOS technology; and 3) a comparison of the efficiency of all circuit-level techniques in terms of power consumption and area.

  • 2. Low-Power Digital Signal Processing Using Approximate Adders

    Page(s): 124 - 137

    Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders.
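    As a concrete illustration of the approximate-cell idea, here is a minimal Python sketch (ours, not the paper's transistor-level designs): it approximates Sum as the complement of Cout, a simplification in the spirit of approximate mirror adders, applies that cell to the low-order bits of a ripple-carry adder, and estimates the resulting error statistics by random sampling. The 16-bit width and the choice of 4 approximate LSBs are arbitrary.

      import random

      def exact_fa(a, b, cin):
          """Exact 1-bit full adder: returns (sum, carry-out)."""
          return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)

      def approx_fa(a, b, cin):
          """Approximate cell: carry kept exact, Sum approximated as NOT(Cout).
          Wrong only for inputs 000 and 111 (2 of the 8 input patterns)."""
          cout = (a & b) | (a & cin) | (b & cin)
          return cout ^ 1, cout

      def ripple_add(x, y, nbits, approx_lsbs):
          """Ripple-carry adder using approx_fa on the approx_lsbs low bits."""
          s, cin = 0, 0
          for i in range(nbits):
              a, b = (x >> i) & 1, (y >> i) & 1
              fa = approx_fa if i < approx_lsbs else exact_fa
              bit, cin = fa(a, b, cin)
              s |= bit << i
          return s | (cin << nbits)

      random.seed(0)
      NBITS, TRIALS, K = 16, 100000, 4      # approximate only the 4 LSBs
      errors = [abs(ripple_add(x, y, NBITS, K) - (x + y))
                for x, y in ((random.getrandbits(NBITS), random.getrandbits(NBITS))
                             for _ in range(TRIALS))]
      print("error rate    : %.3f" % (sum(e > 0 for e in errors) / TRIALS))
      print("mean abs error: %.2f LSBs" % (sum(errors) / TRIALS))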

  • 3. Synthesis of Dual-Rail Adiabatic Logic for Low Power Security Applications

    Page(s): 975 - 988

    Programmable reversible logic is emerging as a prospective logic design style for implementation in low-power, low-frequency applications where minimal impact on circuit heat generation is desirable, such as mitigation of differential power analysis attacks. Adiabatic logic is an implementation of reversible logic in CMOS where the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Recent advances in dual-rail adiabatic logic show reductions in average and differential power, making this design methodology advantageous in applications where security is the primary design metric and the operating frequency is slower, such as smart cards. In this paper, we present an algorithm for the synthesis of adiabatic circuits in CMOS. Then, using the ESPRESSO heuristic for Boolean function minimization on each output node, we reduce the size of the synthesized circuit. Our approach correlates the horizontal offsets in the permutation matrix with the switches required for synthesis instead of using a library of equivalent functions. The synthesis results show that, on average, the proposed algorithm represents an improvement of 36% over the best known reversible designs with the optimized dual-rail cell libraries. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.

  • 4. Application-Specific Wear Leveling for Extending Lifetime of Phase Change Memory in Embedded Systems

    Page(s): 1450 - 1462

    Phase change memory (PCM) has been proposed to replace NOR flash and DRAM in embedded systems because of its attractive features. However, the endurance of PCM greatly limits its adoption in embedded systems. As most embedded systems are application-oriented, we can tackle the endurance problem of PCM by exploiting application-specific features such as fixed access patterns and update frequencies. In this paper, we propose an application-specific wear leveling technique, called Curling-PCM, to evenly distribute write activities across the whole PCM chip to improve the endurance of PCM in embedded systems. The basic idea is to exploit application-specific features in embedded systems and periodically move the hot region across the whole PCM chip. To reduce the overhead of moving the hot region and improve the performance of PCM-based embedded systems, a fine-grained partial wear leveling policy is proposed for Curling-PCM, by which only part of the hot region is moved during each request handling period. Experimental results show that Curling-PCM can effectively distribute write traffic evenly for a prime application of PCM in embedded systems. We expect this paper to serve as a first step toward the full exploration of application-specific features in PCM-based embedded systems.
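    The rotation idea can be sketched in a few lines of Python (illustrative only: the flat page space, region size, and chunk size are hypothetical, not the paper's parameters). The hot region advances by a small chunk each handling period, mirroring the fine-grained partial policy, so that over time writes "curl" across every physical page.

      PCM_PAGES  = 1024     # total physical pages (hypothetical)
      HOT_PAGES  = 64       # size of the application's write-hot region
      MOVE_CHUNK = 8        # pages migrated per handling period (partial move)

      wear   = [0] * PCM_PAGES         # per-page write counts
      offset = 0                       # current rotation of the hot region

      def phys(logical_hot_page):
          """Map a logical hot-region page to its current physical page."""
          return (logical_hot_page + offset) % PCM_PAGES

      def handle_period(writes_per_page):
          """Apply one period of hot-region writes, then partially rotate."""
          global offset
          for lp in range(HOT_PAGES):
              wear[phys(lp)] += writes_per_page
          # fine-grained partial wear leveling: advance by a small chunk only,
          # so each period migrates just MOVE_CHUNK pages' worth of data
          offset = (offset + MOVE_CHUNK) % PCM_PAGES

      for _ in range(2000):            # simulate many handling periods
          handle_period(writes_per_page=10)

      print("max/mean wear ratio: %.2f" % (max(wear) * PCM_PAGES / sum(wear)))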

  • 5. High-Level Synthesis for FPGAs: From Prototyping to Deployment

    Page(s): 473 - 491

    Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond the register transfer level. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including a comparison of HLS solutions versus optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to a hand-coded design.

  • 6. Enabling High-Dimensional Hierarchical Uncertainty Quantification by ANOVA and Tensor-Train Decomposition

    Page(s): 63 - 76

    Hierarchical uncertainty quantification can reduce the computational cost of stochastic circuit simulation by employing spectral methods at different levels. This paper presents an efficient framework to hierarchically simulate some challenging stochastic circuits/systems that include high-dimensional subsystems. Due to the high parameter dimensionality, it is challenging both to extract surrogate models at the low level of the design hierarchy and to handle them in the high-level simulation. In this paper, we develop an analysis-of-variance-based stochastic circuit/microelectromechanical systems simulator to efficiently extract the surrogate models at the low level. In order to avoid the curse of dimensionality, we employ tensor-train decomposition at the high level to construct the basis functions and Gauss quadrature points. As a demonstration, we verify our algorithm on a stochastic oscillator with four MEMS capacitors and 184 random parameters. This challenging example is efficiently simulated by our simulator at the cost of only 10 min in MATLAB on a regular personal computer.

  • 7. Reliability-Driven Software Transformations for Unreliable Hardware

    Page(s): 1597 - 1610

    We propose multiple reliability-driven software transformations targeting unreliable hardware. These transformations reduce the executions of critical instructions and the spatial/temporal vulnerabilities of different instructions with respect to different processor components. The goal is to lower the application's susceptibility to failures. Compared to performance-optimized compilation, our method incurs 60% fewer application failures, averaged over various fault injection scenarios and fault rates.

  • 8. Measuring the Gap Between FPGAs and ASICs

    Page(s): 203 - 215

    This paper presents experimental measurements of the differences between a 90-nm CMOS field-programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for circuits containing only look-up table-based logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 35. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories. We find that these blocks reduce this average area gap significantly, to as little as 18 for our benchmarks, and we estimate that extensive use of these hard blocks could potentially lower the gap to below five. The ratio of critical-path delay, from FPGA to ASIC, is roughly three to four, with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 14 and, with hard blocks, this gap generally becomes smaller.

  • 9. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory

    Page(s): 994 - 1007

    Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated new NVM candidate technologies, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive random-access memory (ReRAM) are regarded as the most promising candidates. As the ultimate goal of this NVM research is to deploy them into multiple levels in the memory hierarchy, it is necessary to explore the wide NVM design space and find the proper implementation at different memory hierarchy levels, from highly latency-optimized caches to highly density-optimized secondary storage. While abundant tools are available as SRAM/DRAM design assistants, similar tools for NVM designs are currently missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies.

  • 10. Statistical Criticality Computation Using the Circuit Delay

    Page(s): 717 - 727

    The statistical nature of gate delays in current-day technologies necessitates the use of measures such as path criticality and node/edge criticality for timing optimization. Node criticalities are typically computed using the complementary path delay. An alternative approach that computes the criticality using the circuit delay has recently been proposed. In this paper, we discuss in detail the use of circuit delay to compute node criticalities and show that the criticality thus found is not equal to the conventional measure found using the complementary path delay. However, there is a monotonic relationship between them, and the two measures can be used interchangeably. We derive new bounds for the global criticality and propose a pruning algorithm based on these bounds to improve the accuracy and speed of computation. The use of this pruning technique results in a significant speedup in criticality computations. We obtain an order of magnitude average speedup for ISCAS benchmarks.

  • 11. Towards Optimal Performance-Area Trade-Off in Adders by Synthesis of Parallel Prefix Structures

    Page(s): 1517 - 1530

    This paper proposes an efficient algorithm to synthesize prefix graph structures that yield adders with the best performance-area trade-off. For designing a parallel prefix adder of a given bit-width, our approach generates prefix graph structures that optimize an objective function, such as the size of the prefix graph, subject to constraints such as bit-wise output logic level. Given a bit-width n and a level restriction L, our algorithm outperforms existing algorithms in minimizing the size of the prefix graph. We also prove its size-optimality when n is a power of two and L = log2(n). Besides prefix graph size optimization and having the best performance-area trade-off, our approach, unlike existing techniques, can 1) handle more complex constraints, such as maximum node fanout or wire length, that impact the performance/area of a design and 2) generate several feasible solutions that minimize the objective function. Generating several size-optimal solutions provides the option to choose adder designs that mitigate constraints such as wire congestion or power consumption that are difficult to model as constraints during logic synthesis. Experimental results demonstrate that our approach improves performance by 3% and area by 9% over even a 64-bit full-custom adder implemented in an industrial high-performance design.
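    For readers unfamiliar with prefix graphs, the following sketch (illustrative only, not the paper's synthesis algorithm) builds one classic structure, the Kogge-Stone network, over (generate, propagate) pairs and reports its size (number of prefix nodes) and level count, the two quantities that the objective and constraint above refer to.

      def gp_combine(left, right):
          """Associative prefix operator on (generate, propagate) pairs."""
          gl, pl = left
          gr, pr = right
          return gl | (pl & gr), pl & pr

      def kogge_stone(gp):
          """Parallel prefix over gp[0..n-1]; returns prefixes, size, levels."""
          n, size, levels = len(gp), 0, 0
          cur, d = list(gp), 1
          while d < n:
              nxt = list(cur)
              for i in range(d, n):
                  nxt[i] = gp_combine(cur[i], cur[i - d])   # span-d combine
                  size += 1                                 # one prefix node
              cur, d, levels = nxt, d * 2, levels + 1
          return cur, size, levels

      # 8-bit example: check the carries against ordinary addition
      a, b, n = 0b10110101, 0b01011011, 8
      gp = [(((a >> i) & 1) & ((b >> i) & 1),   # g_i = a_i AND b_i
             ((a >> i) & 1) ^ ((b >> i) & 1))   # p_i = a_i XOR b_i
            for i in range(n)]
      pre, size, levels = kogge_stone(gp)
      carries = [0] + [g for g, _ in pre]       # c_{i+1} = group generate [i:0]
      s = sum((gp[i][1] ^ carries[i]) << i for i in range(n)) | (carries[n] << n)
      print(s == a + b, "size =", size, "levels =", levels)  # True, 17, 3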

  • 12. System-Level Modeling and Analysis of Thermal Effects in WDM-Based Optical Networks-on-Chip

    Page(s): 1718 - 1731

    Multiprocessor systems-on-chip show a trend toward the integration of tens and hundreds of processor cores on a single chip. With the development of silicon photonics for short-haul optical communication, wavelength division multiplexing (WDM)-based optical networks-on-chip (ONoCs) are emerging on-chip communication architectures that can potentially offer high bandwidth and power efficiency. The thermal sensitivity of photonic devices is one of the main concerns about on-chip optical interconnects. We systematically modeled thermal effects in optical links in WDM-based ONoCs. Based on the proposed thermal models, we developed OTemp, an optical thermal effect modeling platform for optical links in both WDM-based ONoCs and single-wavelength ONoCs. OTemp can be used to simulate the power consumption as well as the optical power loss for optical links under temperature variations. We use case studies to quantitatively analyze the worst-case power consumption for one wavelength in an eight-wavelength WDM-based optical link under different configurations of low-temperature-dependence techniques. Results show that the worst-case power consumption increases dramatically with on-chip temperature variations. Thermal-based adjustment and optimal device settings can help reduce power consumption under temperature variations. Assuming that off-chip vertical-cavity surface-emitting lasers are used as the laser source with a WDM channel spacing of 1 nm: if we use thermal-based adjustment with guard rings for channel remapping, the worst-case total power consumption is 6.7 pJ/bit under the maximum temperature variation of 60°C, and larger channel spacing would result in larger worst-case power consumption; if we use thermal-based adjustment without channel remapping, the worst-case total power consumption is around 9.8 pJ/bit under the maximum temperature variation of 60°C, and in this case the worst-case power consumption would benefit from larger channel spacing.

  • 13. Cell-Aware Test

    Page(s): 1396 - 1409

    This paper describes the new cell-aware test (CAT) approach, which enables a transistor-level and defect-based ATPG on full CMOS-based designs to significantly reduce the defect rate of manufactured ICs, including FinFET technologies. We present results from a defect-oriented CAT fault model generation for 1,940 standard library cells, as well as the application of CAT to several industrial designs. We present high volume production test results from a 32 nm notebook processor and from a 350 nm automotive design, including the achieved defect rate reduction in defective-parts-per-million. We also present CAT diagnosis and physical failure analysis results from one failing part and give an outlook for using the functionality for quickly ramping up the yield in advanced technology nodes.

    Open Access
  • 14. Ant Colony Optimization-Based Fault-Aware Routing in Mesh-Based Network-on-Chip Systems

    Page(s): 1693 - 1705

    The advanced deep submicrometer technology increases the risk of failure for on-chip components. In advanced network-on-chip (NoC) systems, failures constrain the on-chip bandwidth and network throughput. Fault-tolerant routing algorithms aim to alleviate the impact on performance. However, few works have integrated congestion, deadlock, and fault awareness into the channel evaluation function to avoid hotspots around a faulty router. To solve this problem, we propose the ant colony optimization-based fault-aware routing (ACO-FAR) algorithm for load balancing in faulty networks. The behavior of an ant colony facing an obstacle (a failure in the NoC) can be described in three steps: 1) encounter; 2) search; and 3) select. We implement the corresponding mechanisms as: 1) notification of fault information; 2) a path searching mechanism; and 3) a path selecting mechanism. With the proposed ACO-FAR, the router can evaluate the available paths and detour packets through a less-congested fault-free path. Simulation results show that the proposed approach achieves 29.1%-66.5% higher throughput than related works. In addition, ACO-FAR can reduce the undelivered packet ratio to 0.5%-0.02% and balance the distribution of traffic flow in the faulty network.
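    The generic ant-colony mechanism underlying such routing can be sketched as follows (a simplified illustration, not ACO-FAR itself; all parameters are made up): next hops are chosen with probability proportional to link pheromone, pheromone evaporates everywhere, and ants that reach the destination deposit pheromone in inverse proportion to path length, so traffic learns to detour around the faulty router.

      import random

      W = H = 4
      FAULTY = {(1, 1)}                   # one broken router to route around
      ALPHA, RHO, Q = 1.0, 0.1, 1.0       # pheromone weight, evaporation, deposit
      tau = {}                            # pheromone per directed mesh link

      def neighbors(node):
          x, y = node
          for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
              if 0 <= nx < W and 0 <= ny < H and (nx, ny) not in FAULTY:
                  yield (nx, ny)

      def ant_walk(src, dst, max_hops=32):
          """One ant: pheromone-biased loop-free walk; returns a path or None."""
          path, node = [src], src
          while node != dst and len(path) <= max_hops:
              opts = [m for m in neighbors(node) if m not in path]
              if not opts:
                  return None               # dead end; this ant gives up
              weights = [tau.get((node, m), 1.0) ** ALPHA for m in opts]
              node = random.choices(opts, weights)[0]
              path.append(node)
          return path if node == dst else None

      def deposit(path):
          """Evaporate all pheromone, then reinforce the links of a found path."""
          for k in tau:
              tau[k] *= (1.0 - RHO)
          for u, v in zip(path, path[1:]):
              tau[(u, v)] = tau.get((u, v), 1.0) + Q / len(path)  # shorter = more

      random.seed(1)
      for _ in range(500):                # release ants from (0,0) to (3,3)
          p = ant_walk((0, 0), (3, 3))
          if p:
              deposit(p)
      print("pheromone-favored path:", ant_walk((0, 0), (3, 3)))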

  • 15. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives

    Page(s): 3 - 21

    To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have recently been proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from the system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

  • 16. Modeling the “Effective capacitance” for the RC interconnect of CMOS gates

    Page(s): 1526 - 1535

    With finer line widths and faster switching speeds, the resistance of on-chip metal interconnect is having a dominant impact on the timing behavior of logic gates. Specifically, the gates are switching faster and the interconnect delays are getting longer due to scaling. This results in a trend in which the RC interconnect delay is beginning to comprise a larger portion of the overall logic stage delay. This shift in relative delay dominance from the gate to the RC interconnect is increased by resistance shielding. That is, as the gate "resistance" gets smaller and the metal resistance gets larger, the gate no longer "sees" the total net capacitance, and the gate delay may be significantly less than expected. This trend complicates the timing analysis of digital circuits, which relies upon simple, empirical gate delay equations for efficiency. In this paper, we develop an analytical expression for the "effective load capacitance" of an RC interconnect. In addition, when there is significant shielding, the response waveforms at the gate output may have a large exponential tail. We show that this waveform tail can strongly influence the delay of the RC interconnect. Therefore, we propose an extension of the effective capacitance equation that captures the complete waveform response accurately, with a two-piece gate-output-waveform approximation.
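    The effective-capacitance concept can be illustrated numerically (this is a definitional toy, not the paper's analytical expression): drive a pi-model load through a driver resistance, find the 50% crossing time at the gate output, and back out the single capacitance that would give the same crossing; shielding makes this Ceff smaller than the total capacitance. All element values below are arbitrary.

      import math

      Rd, Rw = 200.0, 400.0         # driver and wire resistance (ohms)
      C1, C2 = 50e-15, 250e-15      # near/far pi-model capacitances (farads)
      VDD, DT = 1.0, 1e-14          # step amplitude (V), Euler time step (s)

      def t50_pi():
          """50% crossing time at the gate output (near node), forward Euler."""
          v1 = v2 = t = 0.0
          while v1 < 0.5 * VDD:
              i_drv  = (VDD - v1) / Rd      # current from the driver
              i_wire = (v1 - v2) / Rw       # current into the far node
              v1 += DT * (i_drv - i_wire) / C1
              v2 += DT * i_wire / C2
              t  += DT
          return t

      t_pi  = t50_pi()
      c_eff = t_pi / (Rd * math.log(2))     # single-cap RC: t50 = Rd*C*ln 2
      print("t50 with pi-load = %.1f ps" % (t_pi * 1e12))
      print("Ceff = %.1f fF  vs  total C = %.1f fF"
            % (c_eff * 1e15, (C1 + C2) * 1e15))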

  • 17. A Reliability-Aware Address Mapping Strategy for NAND Flash Memory Storage Systems

    Page(s): 1623 - 1631

    The increasing density of NAND flash memory leads to a dramatic increase in the bit error rate of flash, which greatly reduces the ability of error-correcting codes (ECC) to handle multibit errors. NAND flash memory is normally used to store the file system metadata and page mapping information. Thus, a broken physical page containing metadata may cause an unintended and severe change in the functionality of the entire flash. This paper presents Meta-Cure, a novel hardware and file system interface that transparently protects metadata in the presence of multibit faults. Meta-Cure exploits built-in ECC and replication in order to protect pages containing critical data, such as file system metadata. Redundant pairs are formed at run time and distributed to different physical pages to protect against failures. Meta-Cure requires no changes to the file system, on-chip hierarchy, or hardware implementation of the flash memory chip. We evaluate Meta-Cure under a real embedded platform using a variety of I/O traces. The evaluation platform adopts dual ARM Cortex-A9 processor cores with 64 Gb NAND flash memory. We have evaluated the effectiveness of Meta-Cure on the New Technology File System (NTFS). Experimental results show that the proposed technique can reduce uncorrectable page errors by 70.38% with less than 7.86% time overhead in comparison with conventional error correction techniques.
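    A back-of-the-envelope calculation shows why redundant pairs help (page size, ECC strength, and raw bit-error rate are hypothetical, chosen only to make the effect visible, and the two copies are assumed to fail independently):

      import math

      PAGE_BITS, T_CORR, BER = 4096, 4, 1e-3   # hypothetical page/ECC/BER figures

      def p_uncorrectable(n=PAGE_BITS, t=T_CORR, p=BER):
          """P[Binomial(n, p) > t]: raw bit errors exceed ECC correction power."""
          ok = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))
          return 1.0 - ok

      p1 = p_uncorrectable()        # single copy, ECC only
      p2 = p1 * p1                  # redundant pair, assuming independent copies
      print("ECC only          : %.3e" % p1)
      print("ECC + replication : %.3e" % p2)
      print("reduction factor  : %.1fx" % (p1 / p2))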

  • 18. Placement for Binary-Weighted Capacitive Array in SAR ADC Using Multiple Weighting Methods

    Page(s): 1277 - 1287

    The overall accuracy and linearity of a matching-limited successive-approximation-register analog-to-digital converter are primarily determined by its digital-to-analog converter's (DAC's) matching characteristics. As the resolution of the DAC increases, it is harder to achieve accurate capacitance ratios in the layout, which are affected by systematic and random mismatches. An ideal placement for the DAC array should try to minimize the systematic mismatches first, followed by the random mismatch. This paper proposes a placement strategy, which incorporates a matrix-adjustment method for the DAC, and different placement techniques and weighting methods for the placements of active and dummy unit capacitors. The resulting placement addresses both systematic and random mismatches. We consider the following systematic mismatches: first-order process gradients, second-order lithographic errors, proximity effects, wiring complexity, and asymmetrical fringing parasitics. The experimental results show that the placement strategy achieves smaller capacitance ratio mismatch and shorter computational runtime than those of existing works.

  • 19. Analytical Thermal Model for Self-Heating in Advanced FinFET Devices With Implications for Design and Reliability

    Page(s): 1045 - 1058

    A rigorous analytical thermal model has been formulated for the analysis of self-heating effects in FinFETs, under both steady-state and transient stress conditions. 3-D self-consistent electrothermal simulations, tuned with experimentally measured electrical characteristics, were used to understand the nature of self-heating in FinFETs and calibrate the proposed model. The accuracy of the model has been demonstrated for a wide range of multifin devices by comparing it against finite element simulations. The model has been applied to carry out a detailed sensitivity analysis of self-heating with respect to various FinFET parameters and structures, which are critical for improving circuit performance and electrical overstress/electrostatic discharge (ESD) reliability. The transient model has been used to estimate the thermal time constants of these devices and predict the sensitivity of power-to-failure to various device parameters, for both long and short pulse ESD situations. Suitable modifications to the model are also proposed for evaluating the thermal characteristics of production level FinFET (or Tri-gate FET) structures involving metal-gates, body-tied bulk FinFETs, and trench contacts.

  • 20. Error Detection and Recovery for ECC: A New Approach Against Side-Channel Attacks

    Page(s): 627 - 637

    Side-channel attacks allow an attacker to retrieve secret keys with far less effort than other attacks. Countermeasures against these attacks should be considered during cryptosystem design. This paper presents a novel low-cost error detection and recovery scheme (LOEDAR) to counter fault attacks. The proposed architecture retains the efficiency of the Montgomery ladder algorithm and shows strong resistance to both environmentally induced faults and attacker-introduced faults. Moreover, the proposed LOEDAR scheme is compatible with most existing countermeasures against various power analysis attacks, including differential power analysis and its variants, which makes it extendable to a comprehensive countermeasure against both fault attacks and power analysis attacks.
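    The flavor of invariant-based fault detection on a Montgomery ladder can be sketched as follows, using modular exponentiation for brevity (it shares the ladder structure with elliptic-curve scalar multiplication; this illustrates the general principle, not the LOEDAR architecture itself). The ladder maintains R1 = R0*g (mod n), so checking that relation every iteration catches an injected fault:

      def ladder_pow(g, e, n, inject_fault_at=None):
          """Montgomery-ladder modexp with a per-step invariant check."""
          r0, r1 = 1, g % n
          for i, bit in enumerate(bin(e)[2:]):       # MSB-first exponent scan
              if bit == '0':
                  r1 = (r0 * r1) % n
                  r0 = (r0 * r0) % n
              else:
                  r0 = (r0 * r1) % n
                  r1 = (r1 * r1) % n
              if i == inject_fault_at:
                  r0 ^= 1                            # simulated transient fault
              if (r0 * g - r1) % n != 0:             # invariant R1 = R0*g broken
                  raise RuntimeError("fault detected at step %d" % i)
          return r0

      n, g, e = 1000003, 12345, 0xDEADBEEF
      assert ladder_pow(g, e, n) == pow(g, e, n)     # matches built-in modexp
      try:
          ladder_pow(g, e, n, inject_fault_at=7)
      except RuntimeError as err:
          print(err)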

  • 21. Optimizing the Power Delivery Network in a Smartphone Platform

    Page(s): 36 - 49

    Smartphones consume a significant amount of power. Indeed, they can hardly provide a full day of use between charging operations, even with a 2000 mAh battery. While power minimization and dynamic power management techniques have been heavily explored to improve the power efficiency of modules (processors, memory, display, GPS, etc.) inside a smartphone platform, there is one critical factor that is often overlooked: the power conversion efficiency of the power delivery network (PDN). This paper focuses on dc-dc converters, which play a pivotal role in the PDN of the smartphone platform. Starting from detailed models of the dc-dc converter designs, two optimization methods are presented: 1) static switch sizing to maximize the efficiency of a dc-dc converter under statistical loading profiles and 2) dynamic switch modulation to achieve high efficiency under dynamically varying load conditions. To verify the efficacy of the optimization methods in actual smartphone platforms, this paper also presents a characterization procedure for the PDN. The procedure is as follows: 1) group the modules in the smartphone platform together and use profiling to estimate their average and peak power consumption levels and 2) build an equivalent dc-dc converter model for the power delivery path from the battery source to each group of modules and use linear regression to estimate the conversion efficiency of the corresponding equivalent converter. Experimental results demonstrate that static switch sizing can achieve a 6% power conversion efficiency enhancement, which translates to a 19% reduction in power loss during general usage of the smartphone. Dynamic switch modulation accomplishes a similar improvement under the same conditions, while also achieving high efficiency enhancement under various load conditions.
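    A toy version of static switch sizing looks like this (hypothetical device constants, and the common loss split of conduction loss proportional to 1/W plus switching loss proportional to W, which is a generic modeling choice rather than the paper's detailed converter model):

      RON0 = 0.1        # on-resistance of a unit-width switch (ohm), made up
      QG0  = 1e-9       # gate charge per unit width (C), made up
      VDD  = 3.7        # battery voltage (V)
      FSW  = 2e6        # switching frequency (Hz)

      # statistical loading profile: (load current in A, probability)
      profile = [(0.05, 0.6), (0.3, 0.3), (1.0, 0.1)]

      def expected_loss(w):
          """Expected conduction + switching loss for relative switch width w."""
          return sum(p * (i * i * RON0 / w + w * QG0 * VDD * FSW)
                     for i, p in profile)

      widths = [0.25 * k for k in range(1, 200)]   # candidate relative widths
      best = min(widths, key=expected_loss)
      print("best W = %.2f, expected loss = %.4f W" % (best, expected_loss(best)))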

  • 22. Built-In Self-Test, Diagnosis, and Repair of MultiMode Power Switches

    Page(s): 1231 - 1244

    Recently proposed power-gating structures for intermediate power-off modes offer significant power saving benefits, as they reduce the leakage power during short periods of inactivity. Even though they are very effective for reducing static power consumption, their reliable operation can be compromised by process variations and manufacturing defects. In this paper, we propose a signature analysis technique to efficiently test power-gating structures that provide intermediate power-off modes. Based on this technique, a methodology to repair catastrophic and parametric faults, and to tolerate process variations, is presented. For testing and repairing multimode power switches, we propose a robust built-in self-test and built-in self-repair scheme that reduces test cost and obviates additional manufacturing steps for post-silicon repair. Simulation results highlight the low cost and effectiveness of the proposed method for detecting, diagnosing, and repairing defects.

  • 23. Carbon Nanotube Robust Digital VLSI

    Page(s): 453 - 471

    Carbon nanotube field-effect transistors (CNFETs) are excellent candidates for building highly energy-efficient electronic systems of the future. Fundamental limitations inherent to carbon nanotubes (CNTs) pose major obstacles to the realization of robust CNFET digital very large-scale integration (VLSI): 1) it is nearly impossible to guarantee perfect alignment and positioning of all CNTs despite near-perfect CNT alignment achieved in recent years; 2) CNTs can be metallic or semiconducting depending on chirality; and 3) CNFET circuits can suffer from large performance variations, reduced yield, and increased susceptibility to noise. Today's CNT process improvements alone are inadequate to overcome these challenges. This paper presents an overview of: 1) imperfections and variations inherent to CNTs; 2) design and processing techniques, together with a probabilistic analysis framework, for robust CNFET digital VLSI circuits immune to inherent CNT imperfections and variations; and 3) recent experimental demonstration of CNFET digital circuits that are immune to CNT imperfections. Significant advances in design tools can enable robust and scalable CNFET circuits that overcome the challenges of the CNFET technology while retaining its energy-efficiency benefits.

  • 24. Security Vulnerabilities of Emerging Nonvolatile Main Memories and Countermeasures

    Page(s): 2 - 15

    Emerging nonvolatile memory devices such as phase change memories and memristors are replacing SRAM and DRAM. However, nonvolatile main memories (NVMM) are susceptible to probing attacks even when powered down. This way, they may compromise sensitive data such as passwords and keys that reside in the NVMM. To eliminate this vulnerability, we propose sneak-path encryption (SPE), a hardware-intrinsic encryption technique for memristor-based NVMMs. SPE is instruction set architecture independent and has minimal impact on performance. SPE exploits physical parameters, such as sneak-paths in crossbar memories, to encrypt the data stored in a memristor-based NVMM. SPE is resilient to a number of attacks that may be performed on NVMMs. We use a cycle-accurate simulator to evaluate the performance impact of SPE-based NVMM and compare against other security techniques. SPE can secure an NVMM with a ~1.3% performance overhead.

  • 25. Post-Layout Simulation Time Reduction for Phase-Locked Loop Frequency Synthesizer Using System Identification Techniques

    Page(s): 1751 - 1755

    Compact model extraction of a phase-locked loop (PLL) frequency synthesizer using system identification techniques is proposed to reduce post-layout simulation time. This is the first published compact model for a PLL using system identification techniques. It features an autoregressive exogenous model for the charge pump and the loop filter, with a lookup table for nonlinearity compensation, and a radial basis function neural network for the voltage-controlled oscillator with its nonlinear frequency-voltage relationship, thereby reducing the post-layout simulation time to 26% of that of the original circuits with an accuracy of 93%.

  • 26. Optimizing the NoC Slack Through Voltage and Frequency Scaling in Hard Real-Time Embedded Systems

    Page(s): 1632 - 1643

    Hard real-time embedded systems impose a strict latency requirement on interconnection subsystems. In the case of network-on-chip (NoC), this means each packet of a traffic stream has to be delivered within a time interval. In addition, with the increasing complexity of NoC, it consumes a significant portion of total chip power, which boosts the power footprint of such chips. In this paper, we propose a methodology to minimize the energy consumption of NoC without violating the prespecified latency deadlines of real-time applications. First, we develop a formal approach based on network calculus to obtain the worst-case delay bound of all packets, from which we derive a safe estimate of the number of cycles that a packet can be further delayed in the network without violating its deadline: the worst-case slack. With this information, we then develop an optimization algorithm that trades the slacks for lower NoC energy. Our algorithm recognizes the distribution of slacks for different traffic streams, and assigns different voltages and frequencies to different routers to achieve NoC energy efficiency while meeting the deadlines for all packets. Furthermore, we design a feedback-control strategy to enable dynamic frequency and voltage scaling on the network routers in conjunction with the energy optimization algorithm. It can flexibly improve the energy efficiency of the overall network in response to sporadic traffic patterns at runtime.
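    The network-calculus step can be made concrete with the textbook bound (illustrative numbers): for a token-bucket arrival curve sigma + rho*t and a rate-latency service curve R*(t - T)+, the worst-case delay is sigma/R + T, and the slack is simply the deadline minus this bound.

      def worst_case_delay(sigma, rho, R, T):
          """Delay bound for arrival curve sigma + rho*t, service R*(t - T)+."""
          assert rho <= R, "schedulability requires arrival rate <= service rate"
          return sigma / R + T

      def slack_cycles(deadline, sigma, rho, R, T):
          """Cycles a packet can be slowed (e.g., by DVFS) and still meet its deadline."""
          return deadline - worst_case_delay(sigma, rho, R, T)

      # flow: burst of 8 flits at 0.2 flit/cycle; router: 1 flit/cycle after 12 cycles
      print(worst_case_delay(sigma=8, rho=0.2, R=1.0, T=12))           # 20.0 cycles
      print(slack_cycles(deadline=50, sigma=8, rho=0.2, R=1.0, T=12))  # 30.0 cycles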

  • 27. Architecture and Control Algorithms for Combating Partial Shading in Photovoltaic Systems

    Page(s): 917 - 930

    Partial shading is a serious obstacle to the effective utilization of photovoltaic (PV) systems, since it can result in a significant degradation in the PV system output power. A PV system is organized as a series connection of PV modules, each module comprising a number of series-parallel connected PV cells. Backup PV cell employment and PV module reconfiguration techniques have been proposed to improve the performance of the PV system under partial shading effects. However, these approaches are not very effective, since they are costly in terms of their PV cell count and/or cell connectivity requirements. In contrast, this paper presents a cost-effective, reconfigurable PV module architecture with integrated switches in each PV cell. This paper also presents a dynamic programming algorithm to adaptively produce near-optimal reconfigurations of each PV module so as to maximize the PV system output power under any partial shading pattern. We implement a working prototype of the reconfigurable PV module with 16 PV cells and confirm a 45.2% output power improvement. Using accurate PV cell models extracted from prototype measurements, we demonstrate up to a factor of 2.36 output power improvement for a large-scale PV system comprising three PV modules with 60 PV cells per module.

  • 28. Algorithms for Gate Sizing and Device Parameter Selection for High-Performance Designs

    Page(s): 1558 - 1571

    It is becoming increasingly important to design high-performance circuits with as low power as possible. In this paper, we study the gate sizing and device parameter selection problem for today's industrial designs. We first outline the typical practical problems that make it difficult to use traditional algorithms on high-performance industrial designs. Then, we propose a Lagrangian relaxation-based formulation that decouples timing analysis from optimization without a resulting loss in accuracy. We also propose a graph model that accurately captures discrete cell-type characteristics based on library data. We model the relaxed Lagrangian subproblem as a graph problem and propose algorithms to solve it. In our experiments, we demonstrate the importance of using the signoff timing engine to guide the optimization. We also show the benefit of the graph model we propose to solve the discrete optimization problem. Compared to a state-of-the-art industrial optimization flow, we show that our algorithms can obtain up to 38% leakage power reductions and better overall timing for real high-performance microprocessor blocks.
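    The decomposition that Lagrangian relaxation buys can be seen in a toy chain-sizing example (made-up cell options and a simple additive delay model, not the paper's graph model or signoff timing): for a fixed multiplier lam, each gate independently picks the option minimizing leakage + lam * delay, and lam is updated by subgradient steps.

      options = [                   # per gate: list of (leakage_uW, delay_ps)
          [(1.0, 90), (2.5, 60), (6.0, 40)],
          [(0.8, 120), (2.0, 80), (5.0, 55)],
          [(1.2, 100), (3.0, 70), (7.5, 45)],
      ]
      TMAX = 200.0                  # path delay budget (ps)

      lam, step, best = 0.0, 0.002, None
      for _ in range(300):
          # relaxed subproblem decomposes: best option per gate for current lam
          pick  = [min(opts, key=lambda o: o[0] + lam * o[1]) for opts in options]
          delay = sum(d for _, d in pick)
          leak  = sum(l for l, _ in pick)
          if delay <= TMAX and (best is None or leak < best[0]):
              best = (leak, delay, pick)          # remember best feasible sizing
          lam = max(0.0, lam + step * (delay - TMAX))   # subgradient update

      leak, delay, pick = best
      print("picked cells:", pick)
      print("delay = %.0f ps (budget %.0f), leakage = %.1f uW" % (delay, TMAX, leak))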

  • 29. Error-Correcting Code Aware Memory Subsystem

    Page(s): 1706 - 1717

    Error-correcting codes (ECCs) have been widely used in reliable computer systems to make memory immune to bit errors. However, ECC techniques can severely degrade memory performance, since incomplete-word write requests lead to inefficient read-to-write (RTW) and write-to-read operations of synchronous dynamic random access memory. In this paper, we propose a memory subsystem efficient for ECC operations. Our key idea is that the RTW operations caused by incomplete-word write requests are split and grouped into independent read and write operations, and then the grouped read and write operations are individually scheduled for optimal memory performance under application constraints. Experimental results show that the proposed ECC-aware memory subsystem achieves 17% shorter memory latency, and 12% higher memory utilization, on average, than the latest conventional memory subsystems on industrial multimedia applications. Moreover, the ECC-aware memory subsystem achieves up to 2.5 times higher memory performance on synthetic benchmarks.

  • 30. Exploration and Customization of FPGA-Based Soft Processors

    Page(s): 266 - 277

    As embedded systems designers increasingly use field-programmable gate arrays (FPGAs) while pursuing single-chip designs, they are motivated to have their designs also include soft processors, processors built using FPGA programmable logic. In this paper, we provide: 1) an exploration of the microarchitectural tradeoffs for soft processors and 2) a set of customization techniques that capitalizes on these tradeoffs to improve the efficiency of soft processors for specific applications. Using our infrastructure for automatically generating soft-processor implementations (which span a large area/speed design space while remaining competitive with Altera's Nios II variations), we quantify tradeoffs within soft-processor microarchitecture and explore the impact of tuning the microarchitecture to the application. In addition, we apply a technique of subsetting the instruction set to use only the portion utilized by the application. Through these two techniques, we can improve the performance-per-area of a soft processor for a specific application by an average of 25%.

  • 31. Online Energy-Efficient Task-Graph Scheduling for Multicore Platforms

    Page(s): 1194 - 1207

    Numerous directed acyclic graph (DAG) schedulers have been developed to improve the energy efficiency of various multicore platforms. However, these schedulers make a priori assumptions about the relationship between the task dependencies, and they are unable to adapt online to the characteristics of each application without offline profiling data. Therefore, we propose a novel energy-efficient online scheduling solution for the general DAG model to address the two aforementioned problems. Our proposed scheduler is able to adapt at run-time to the characteristics of each application by making smart foresighted decisions, which take into account the impact of current scheduling decisions on the present and future deadline miss rates and energy efficiency. Moreover, our scheduler is able to efficiently handle execution with very limited resources by avoiding scheduling tasks that are expected to miss their deadlines and do not have an impact on future deadlines. We validate our approach against state-of-the-art solutions. In our first set of experiments, our results with the H.264 video decoder demonstrate that the proposed low-complexity solution for the general DAG model reduces the energy consumption by up to 15% compared to an existing sophisticated and complex scheduler that was specifically built for the H.264 video decoder application. In our second set of experiments, our results with different configurations of synthetic DAGs demonstrate that our proposed solution is able to reduce the energy consumption by up to 55% and the deadline miss rates by up to 99% compared to a second existing scheduling solution. Finally, we show that our DAG flow manager and scheduler have low complexities on a real mobile platform and we show that our solution is resilient to workload prediction errors by using different estimator accuracies.

  • 32. PS3-RAM: A Fast Portable and Scalable Statistical STT-RAM Reliability/Energy Analysis Method

    Page(s): 1644 - 1656

    The development of emerging spin-transfer torque random access memory (STT-RAM) is facing two major technical challenges: poor write reliability and high write energy, both of which are severely impacted by process variations and thermal fluctuations. Evaluations of STT-RAM design metrics and robustness often require a hybrid simulation flow, i.e., modeling the CMOS and magnetic devices with SPICE and macro-magnetic models, respectively. Very often, such a hybrid simulation flow involves expensive Monte Carlo simulations when the design and behavioral variabilities of STT-RAM are taken into account. In this paper, we propose a fast and scalable semi-analytical method, PS3-RAM, enabling efficient statistical simulations in STT-RAM designs. By eliminating the costly macro-magnetic and SPICE simulations, PS3-RAM achieves more than 100,000× runtime speedup with excellent agreement with the results of the conventional simulation method. PS3-RAM can also accurately estimate the STT-RAM write error rate and write energy distributions in both magnetic tunneling junction switching directions under different temperatures, demonstrating great potential in the analysis of STT-RAM reliability and write energy at the early design stage of memory or micro-architecture.

  • 33. An Algorithm for Synthesis of Reversible Logic Circuits

    Page(s): 2317 - 2330

    Reversible logic finds many applications, especially in the area of quantum computing. A completely specified n-input, n-output Boolean function is called reversible if it maps each input assignment to a unique output assignment and vice versa. Logic synthesis for reversible functions differs substantially from traditional logic synthesis and is currently an active area of research. The authors present an algorithm and tool for the synthesis of reversible functions. The algorithm uses the positive-polarity Reed-Muller expansion of a reversible function to synthesize the function as a network of Toffoli gates. At each stage, candidate factors, which represent subexpressions common between the Reed-Muller expansions of multiple outputs, are explored in the order of their attractiveness. The algorithm utilizes a priority-based search tree, and heuristics are used to rapidly prune the search space. The synthesis algorithm currently targets the generalized n-bit Toffoli gate library. However, other algorithms exist that can convert an n-bit Toffoli gate into a cascade of smaller Toffoli gates. Experimental results indicate that the authors' algorithm quickly synthesizes circuits when tested on the set of all reversible functions of three variables. Furthermore, it is able to quickly synthesize all four-variable and most five-variable reversible functions that were in the test suite. The authors also present results for some benchmark functions widely discussed in the literature and some new benchmarks that the authors have developed. The algorithm is shown to synthesize many, but not all, randomly generated reversible functions of as many as 16 variables with a maximum gate count of 25.
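    The targeted gate library is easy to make concrete: the sketch below (an arbitrary example cascade, not output of the authors' algorithm) applies generalized Toffoli gates to bit-vectors and checks that the resulting mapping is a bijection, i.e., that the network is reversible.

      def toffoli(state, controls, target):
          """Flip bit `target` of integer `state` iff all control bits are 1
          (a NOT / CNOT when `controls` is empty / a single bit)."""
          if all((state >> c) & 1 for c in controls):
              state ^= 1 << target
          return state

      def run_cascade(cascade, state):
          for controls, target in cascade:
              state = toffoli(state, controls, target)
          return state

      # example 3-wire cascade: CNOT(0->1), Toffoli({0,1}->2), NOT(0)
      cascade = [((0,), 1), ((0, 1), 2), ((), 0)]

      n = 3
      mapping = [run_cascade(cascade, s) for s in range(2 ** n)]
      assert sorted(mapping) == list(range(2 ** n))   # bijection => reversible
      for s, out in enumerate(mapping):
          print(format(s, "03b"), "->", format(out, "03b"))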

  • 34. Time-domain non-Monte Carlo noise simulation for nonlinear dynamic circuits with arbitrary excitations

    Page(s): 493 - 505

    A time-domain, non-Monte Carlo method for computer simulation of electrical noise in nonlinear dynamic circuits with arbitrary excitations and arbitrary large-signal waveforms is presented. This time-domain noise simulation method is based on results from the theory of stochastic differential equations. The noise simulation method is general in the following sense: any nonlinear dynamic circuit with any kind of excitation, which can be simulated by the transient analysis routine in a circuit simulator, can be simulated by our noise simulator in the time domain to produce the noise variances and covariances of circuit variables as a function of time, provided that noise models for the devices in the circuit are available. Noise correlations between circuit variables at different time points can also be calculated. Previous work on computer simulation of noise in electronic circuits is reviewed, with comparisons to our method. Shot, thermal, and flicker noise models for integrated-circuit devices, in the context of our time-domain noise simulation method, are discussed. The implementation of this noise simulation method in a circuit simulator (SPICE) is described. Two examples of noise simulation (a CMOS inverter and a BJT active mixer) are given.
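    The core idea, propagating noise variance directly in time rather than averaging Monte Carlo sample paths, can be shown on the simplest possible circuit (a single RC node with resistor thermal noise; this toy is ours, not the paper's simulator). The node voltage follows the Ornstein-Uhlenbeck SDE dV = -V/(RC) dt + b dW with b^2 = 2kT/(R*C^2), so its variance P(t) obeys dP/dt = -2P/(RC) + b^2 and must settle at the well-known kT/C value:

      K_B, TEMP = 1.380649e-23, 300.0      # Boltzmann constant, temperature (K)
      R, C = 10e3, 1e-12                   # 10 kOhm, 1 pF
      b2  = 2 * K_B * TEMP / (R * C * C)   # diffusion term of the SDE
      tau = R * C

      P, dt = 0.0, tau / 1000              # start from a noiseless (known) state
      for _ in range(20000):               # integrate the variance ODE for 20*tau
          P += dt * (-2 * P / tau + b2)

      print("simulated variance: %.3e V^2" % P)
      print("kT/C equilibrium  : %.3e V^2" % (K_B * TEMP / C))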

  • 35. Statistical Timing Analysis: From Basic Principles to State of the Art

    Page(s): 589 - 607

    Static-timing analysis (STA) has been one of the most pervasive and successful analysis engines in the design of digital circuits for the last 20 years. However, in recent years, the increased loss of predictability in semiconductor devices has raised concern over the ability of STA to effectively model statistical variations. This has resulted in extensive research in the so-called statistical STA (SSTA), which marks a significant departure from the traditional STA framework. In this paper, we review the recent developments in SSTA. We first discuss its underlying models and assumptions, then survey the major approaches, and close by discussing its remaining key challenges.
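    A core primitive of parametric SSTA is computing approximate moments of max(A, B) for jointly Gaussian arrival times, usually via Clark's 1961 formulas; the sketch below (illustrative numbers) evaluates them and cross-checks against Monte Carlo:

      import math, random

      def clark_max(mu_a, var_a, mu_b, var_b, rho):
          """Mean/variance of max(A, B) for jointly Gaussian A, B (Clark 1961)."""
          theta = math.sqrt(var_a + var_b - 2 * rho * math.sqrt(var_a * var_b))
          alpha = (mu_a - mu_b) / theta
          phi = math.exp(-0.5 * alpha * alpha) / math.sqrt(2 * math.pi)  # pdf
          Phi = 0.5 * (1 + math.erf(alpha / math.sqrt(2)))               # cdf
          m1 = mu_a * Phi + mu_b * (1 - Phi) + theta * phi
          m2 = ((mu_a**2 + var_a) * Phi + (mu_b**2 + var_b) * (1 - Phi)
                + (mu_a + mu_b) * theta * phi)
          return m1, m2 - m1 * m1

      mu, var = clark_max(100, 9.0, 98, 16.0, rho=0.3)

      random.seed(0)
      N, rho = 100000, 0.3
      samples = []
      for _ in range(N):
          za, z = random.gauss(0, 1), random.gauss(0, 1)
          zb = rho * za + math.sqrt(1 - rho * rho) * z   # correlated pair
          samples.append(max(100 + 3 * za, 98 + 4 * zb))
      mc_mu  = sum(samples) / N
      mc_var = sum((s - mc_mu) ** 2 for s in samples) / N
      print("Clark      : mu=%.3f var=%.3f" % (mu, var))
      print("Monte Carlo: mu=%.3f var=%.3f" % (mc_mu, mc_var))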

  • 36. VHDL-AMS and Verilog-AMS as alternative hardware description languages for efficient modeling of multidiscipline systems

    Page(s): 204 - 225

    This paper focuses on commonalities and differences between the two mixed-signal hardware description languages, VHDL-AMS and Verilog-AMS, in the case of modeling heterogeneous or multidiscipline systems. The paper has two objectives. The first one is modeling the structure and the behavior of an airbag system using both the VHDL-AMS and the Verilog-AMS languages. Such a system encompasses several time abstractions (i.e., discrete-time and continuous-time), several disciplines, or energy domains (i.e., electrical, thermal, optical, mechanical, and chemical), and several continuous-time description formalisms (i.e., conservative-law and signal-flow descriptions). The second objective is to discuss the results of the proposed modeling process in terms of the descriptive capabilities of the VHDL-AMS and Verilog-AMS languages and of the generated simulation results. The tools used are the Advance-MS from Mentor Graphics for VHDL-AMS and the AMS Simulator from Cadence Design Systems for Verilog-AMS. This paper shows that both languages offer effective means to describe and simulate multidiscipline systems, though using different descriptive approaches. It also highlights current tool limitations, since full language definitions are not yet supported.

  • 37. Macromodeling of the Memristor in SPICE

    Page(s): 632 - 636

    In this paper, we present a new SPICE (simulation program with integrated circuit emphasis) macromodel of the recently physically implemented memristor. This macromodel can be a powerful tool for electrical engineers to design and experiment with new memristor circuits. Our simulation results show behavior similar to the already published measurements of the physical implementation. Our approach provides a solution for modeling the boundary conditions that follows exactly the published mathematical model of HP Labs. The functionality of our macromodel is demonstrated with computer simulations. The source code of our macromodel is provided.
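
    The HP Labs equations that any such macromodel must reproduce are compact: M(x) = R_on·x + R_off·(1 − x), with dx/dt = (μ_v·R_on/D²)·i(t) and the state x = w/D confined to [0, 1]. The sketch below integrates them in plain Python with a Joglekar-style window (an assumption for the demo; the paper's boundary treatment is its own contribution), which is enough to reproduce the characteristic pinched hysteresis loop.

    ```python
    # Minimal integration of the HP Labs memristor model (not the paper's
    # SPICE netlist). Window f(x) = 1 - (2x - 1)^(2p) is one common way to
    # keep the state off the hard boundaries.
    import math

    R_on, R_off = 100.0, 16e3       # ohms
    D, mu_v = 10e-9, 1e-14          # length [m], ion mobility [m^2/(s*V)]
    p = 10                          # window exponent (assumed)

    def simulate(freq=1.0, v0=1.0, x=0.1, dt=1e-5, cycles=2):
        """Drive v(t) = v0*sin(2*pi*freq*t); return (t, v, i) samples."""
        t, out = 0.0, []
        while t < cycles / freq:
            m = R_on * x + R_off * (1.0 - x)        # memristance M(x)
            v = v0 * math.sin(2.0 * math.pi * freq * t)
            i = v / m
            f = 1.0 - (2.0 * x - 1.0) ** (2 * p)    # boundary window
            x = min(max(x + dt * mu_v * R_on / D**2 * i * f, 0.0), 1.0)
            out.append((t, v, i))
            t += dt
        return out                                  # plot i vs v: pinched loop

    trace = simulate()
    print("samples: %d, final current: %.3e A" % (len(trace), trace[-1][2]))
    ```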

  • 38. Energy-Efficient Datacenters

    Page(s): 1465 - 1484

    Pervasive use of cloud computing and the resulting rise in the number of datacenters and hosting centers (which provide platform or software services to clients who do not have the means to set up and operate their own computing facilities) have brought forth many concerns, including electrical energy cost, peak power dissipation, cooling, and carbon emission. With power consumption becoming an ever more important issue for the operation and maintenance of hosting centers, corporate and business owners are increasingly concerned. Furthermore, provisioning resources in a cost-optimal manner so as to meet performance criteria such as throughput or response time has become a critical challenge. The goal of this paper is to provide an introduction to resource provisioning and power and thermal management problems in datacenters, and to review strategies that maximize datacenter energy efficiency subject to peak or total power consumption and thermal constraints, while meeting stipulated service-level agreements in terms of task throughput and/or response time.

  • 39. A Blind Dynamic Fingerprinting Technique for Sequential Circuit Intellectual Property Protection

    Page(s): 76 - 89

    Design fingerprinting is a means to trace illegally redistributed intellectual property (IP) by creating a unique IP instance with a different signature for each user. Existing fingerprinting techniques for hardware IP protection focus on lowering the design effort needed to create a large number of different IP instances, without paying much attention to the ease of fingerprint detection upon IP integration. This paper presents the first dynamic fingerprinting technique for sequential circuit IPs that enables both the owner and the legal buyers of an IP embedded in a chip to be readily identified in the field. The proposed fingerprint is an oblivious ownership watermark independently endorsed by each user through a blind signature protocol. Thus, authorship can also be proved through the detection of different users' fingerprints, without the need to separately embed an identical IP owner's signature in all fingerprinted instances. The proposed technique is applicable to both application-specific integrated circuit and field-programmable gate array IPs. Our analyses show that the fingerprint is immune to collusion attacks and can withstand all perceivable attacks, with a lower probability of removal than state-of-the-art FSM watermarking schemes. The probability of coincidence of a 32-bit fingerprint is on the order of 10^-10, and up to 10^35 32-bit fingerprinted instances can be generated for a small design of 100 flip-flops.
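
    The quoted coincidence probability is easy to sanity-check (a back-of-the-envelope calculation, not the paper's full combinatorial analysis, which also derives the instance count from the 100 flip-flops):

    ```python
    # A uniformly random 32-bit fingerprint matches a given one with
    # probability 2^-32, which is indeed on the order of 10^-10.
    p = 2.0 ** -32
    print(f"P(coincidence, 32-bit fingerprint) = {p:.2e}")   # ~2.33e-10
    ```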

  • 40. Testing of Flow-Based Microfluidic Biochips: Fault Modeling, Test Generation, and Experimental Demonstration

    Page(s): 1463 - 1475

    Recent advances in flow-based microfluidics have led to the emergence of biochemistry-on-a-chip as a new paradigm in clinical diagnostics and biomolecular recognition. However, a potential roadblock to the deployment of microfluidic biochips is the lack of test techniques to screen defective devices before they are used for biochemical analysis. Defective chips lead to repetition of experiments, which is undesirable due to high reagent cost and the limited availability of samples. Prior work on fault detection in biochips has been limited to digital ("droplet") microfluidics and other electrode-based technology platforms. This paper proposes the first approach for automated testing of flow-based microfluidic biochips that are designed using membrane-based valves for flow control. The proposed test technique is based on a behavioral abstraction of physical defects in microchannels and valves. The flow paths and flow control in the microfluidic device are modeled as a logic circuit composed of Boolean gates, which allows test generation to be carried out using standard automatic test pattern generation tools. The tests derived from the logic circuit model are then mapped to fluidic operations involving pumps and pressure sensors in the biochip. Feedback from the pressure sensors is compared with the expected responses of the logic circuit model, whereby the types and positions of defects are identified. We show how a fabricated biochip can be tested using the proposed method, and demonstrate experimental results for two additional fabricated chips.
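
    The key modeling step, valves as Boolean inputs and flow paths as gates, can be paraphrased in a few lines (an illustrative sketch with a hypothetical two-sensor chip, not the paper's ATPG flow): a path conducts iff all of its valves are open (AND), a sensor sees flow iff some path to it conducts (OR), and a blocked valve behaves like a stuck-at-0 input.

    ```python
    # Hypothetical chip: two pressure sensors, four valves. A test vector
    # is a valve setting whose sensor readout differs between the fault-
    # free chip and the chip with one valve stuck closed.
    from itertools import product

    PATHS = {"S1": [("v1", "v2"), ("v1", "v3")],   # sensor -> valve paths
             "S2": [("v3", "v4")]}
    VALVES = ["v1", "v2", "v3", "v4"]

    def readout(open_valves, stuck_closed=frozenset()):
        eff = open_valves - stuck_closed           # stuck valve never conducts
        return {s: any(all(v in eff for v in path) for path in paths)
                for s, paths in PATHS.items()}

    def find_test(fault):
        for bits in product([0, 1], repeat=len(VALVES)):
            setting = {v for v, b in zip(VALVES, bits) if b}
            if readout(setting) != readout(setting, {fault}):
                return setting                     # fault is observable
        return None

    for v in VALVES:
        print(f"{v} stuck-closed exposed by opening: {sorted(find_test(v))}")
    ```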

  • 41. Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS

    Page(s): 1859 - 1880

    In this paper, we analyze and model the failure probabilities (access-time failure, read/write failure, and hold failure) of static random-access memory (SRAM) cells due to process-parameter variations. A method to predict the yield of a memory chip based on the cell-failure probability is proposed. A methodology to statistically design the SRAM cell and the memory organization is proposed using the failure-probability and yield-prediction models. The developed design strategy statistically sizes the different transistors of the SRAM cell and optimizes the number of redundant columns to be used in the SRAM array, to minimize the failure probability of a memory chip under area and leakage constraints. The developed method can be used in an early stage of the design cycle to enhance memory yield in the nanometer regime.
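
    The redundancy-optimization step rests on standard yield bookkeeping, which is worth making explicit (illustrative numbers and a simplified binomial model, not the paper's full formulation): a column fails if any of its cells fails, and a chip with r spare columns survives up to r column failures.

    ```python
    # Simplified chip-yield model: independent cell failures, column
    # repair via redundant columns, binomial count of failed columns.
    from math import comb

    def column_fail_prob(p_cell, cells_per_column):
        return 1.0 - (1.0 - p_cell) ** cells_per_column

    def chip_yield(p_col, n_columns, n_redundant):
        n = n_columns + n_redundant                # chip works if <= r fail
        return sum(comb(n, k) * p_col**k * (1.0 - p_col)**(n - k)
                   for k in range(n_redundant + 1))

    p_col = column_fail_prob(p_cell=1e-7, cells_per_column=512)
    for r in (0, 2, 4, 8):
        print(f"r = {r} spare columns -> yield {chip_yield(p_col, 1024, r):.5f}")
    ```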

  • 42. Yield-Aware Pareto Front Extraction for Discrete Hierarchical Optimization of Analog Circuits

    Page(s): 1437 - 1449

    This paper presents an efficient method for extracting a yield-aware Pareto front between two competing metrics of an analog circuit block, with the purpose of performing hierarchical, system-level optimization using the component-level Pareto fronts as metamodels. The proposed method consists of three steps: finding a set of Pareto-optimal design points by tracing them on a discrete grid, estimating the yield distribution of each optimal design point using a control-variate technique, and constructing a yield-aware Pareto front by interpolation. The proposed algorithm is demonstrated on the problem of finding the optimal power allocation among the components of a clock recovery path so as to minimize the final clock jitter. The algorithm can estimate the Pareto front of each circuit block within a 2% error, expressing the minimum achievable jitter at 99% yield for different power budgets, while requiring only 600-1100 Monte Carlo simulation samples in total.
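
    The control-variate step is the classical variance-reduction trick, sketched below in its generic textbook form (the surrogate metric and distributions are assumptions, not the paper's circuit-specific estimator): if a cheap quantity g(X) correlates with the expensive metric f(X) and E[g] is known, then f − c·(g − E[g]) is an unbiased estimator with smaller variance, which is why so few Monte Carlo samples suffice.

    ```python
    # Generic control-variate demo: same mean, visibly smaller standard
    # error than plain Monte Carlo on correlated samples.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(size=n)                           # process parameters
    f = np.exp(0.3 * x) + 0.05 * rng.normal(size=n)  # "expensive" metric
    g = x                                            # cheap surrogate, E[g] = 0

    c = np.cov(f, g)[0, 1] / np.var(g, ddof=1)       # near-optimal coefficient
    cv = f - c * g                                   # control-variate estimator

    se = lambda a: a.std(ddof=1) / np.sqrt(n)
    print(f"plain MC        : {f.mean():.4f} +/- {se(f):.4f}")
    print(f"control variate : {cv.mean():.4f} +/- {se(cv):.4f}")
    ```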

  • 43. A Reliable Routing Architecture and Algorithm for NoCs

    Page(s): 726 - 739

    Aggressive transistor scaling continues to drive increasingly complex digital designs. The large number of transistors available today enables the development of chip multiprocessors that include many cores on one die communicating through an on-chip interconnect. As the number of cores increases, scalable communication platforms, such as networks-on-chip (NoCs), have become more popular. However, as the sole communication medium, these interconnects are a single point of failure, so any permanent fault in the NoC can cause the entire system to fail. Compounding the problem, transistors have become increasingly susceptible to wear-out-related failures as their critical dimensions shrink. As a result, the on-chip network has become a critically exposed unit that must be protected. To this end, we present Vicis, a fault-tolerant architecture and companion routing protocol that is robust to a large number of permanent failures, allowing communication to continue in the face of permanent transistor failures. Vicis uses a two-level approach. First, it attempts to work around errors within a router by leveraging reconfigurable architectural components. Second, when faults within a router disable a link's connectivity, or even an entire router, Vicis reroutes around the faulty node or link with a novel, distributed routing algorithm for meshes and tori. Tolerating permanent faults in both the router components and the reliability hardware itself, Vicis enables graceful performance degradation of networks-on-chip.
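
    The routing goal, keep packets flowing whenever any fault-free path survives, can be illustrated with a centralized stand-in (a BFS sketch over a 2-D mesh with assumed failed links; Vicis itself uses a distributed algorithm that also covers tori):

    ```python
    # Breadth-first rerouting on a 4x4 mesh with two broken links: the
    # returned path detours around them, or None if the mesh partitions.
    from collections import deque

    W = H = 4                                       # 4x4 mesh of routers
    FAILED = {((1, 1), (2, 1)), ((1, 2), (1, 3))}   # assumed broken links

    def neighbors(n):
        x, y = n
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            m = (x + dx, y + dy)
            if 0 <= m[0] < W and 0 <= m[1] < H \
                    and (n, m) not in FAILED and (m, n) not in FAILED:
                yield m

    def route(src, dst):                            # BFS around failures
        prev, frontier = {src: None}, deque([src])
        while frontier:
            n = frontier.popleft()
            if n == dst:
                path = []
                while n is not None:
                    path.append(n)
                    n = prev[n]
                return path[::-1]
            for m in neighbors(n):
                if m not in prev:
                    prev[m] = n
                    frontier.append(m)
        return None                                 # partitioned network

    print(route((0, 0), (3, 3)))                    # detours around faults
    ```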

  • 44. Analysis of on-chip inductance effects for distributed RLC interconnects

    Page(s): 904 - 915

    This paper introduces an accurate analysis of on-chip inductance effects for distributed RLC interconnects that takes into account the effect of both the series resistance and the output parasitic capacitance of the driver. Using rigorous first-principles calculations, accurate expressions for the transfer function of these lines and their time-domain response are presented for the first time. Building on these, a new and computationally efficient performance optimization technique for distributed RLC interconnects is introduced. The new optimization technique is employed to analyze the impact of line inductance on circuit behavior and to illustrate the implications of technology scaling on wire inductance. It is shown that the reduction in driver output resistance and input capacitance with scaling can make deep-submicron designs increasingly susceptible to inductance effects if global interconnects are not scaled. For scaled global interconnects with increasing line resistance per unit length, as prescribed by the International Technology Roadmap for Semiconductors, the effect of inductance on interconnect performance actually diminishes.
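
    A common rule of thumb captures the regime the paper analyzes rigorously (this is the standard two-sided criterion with assumed per-unit-length values, not the paper's closed-form expressions): inductance shapes the response only for wire lengths between t_r/(2·sqrt(LC)) and (2/R)·sqrt(L/C), so scaling that raises R per unit length shrinks this window, consistent with the paper's conclusion.

    ```python
    # Window of wire lengths where inductance matters; all values assumed.
    from math import sqrt

    R = 25.0       # ohm/mm, line resistance per unit length (assumed)
    L = 0.5e-9     # H/mm   (assumed)
    C = 0.2e-12    # F/mm   (assumed)
    t_r = 50e-12   # driver rise time [s] (assumed)

    l_min = t_r / (2.0 * sqrt(L * C))   # below this: L has no time to act
    l_max = (2.0 / R) * sqrt(L / C)     # above this: RC damping dominates
    print(f"inductance matters for {l_min:.2f} mm < length < {l_max:.2f} mm")
    ```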

  • 45. Effective Post-Silicon Validation of System-on-Chips Using Quick Error Detection

    Page(s): 1573 - 1590

    This paper presents the Quick Error Detection (QED) technique for systematically creating families of post-silicon validation tests that quickly detect bugs inside processor cores and uncore components (cache controllers, memory controllers, and on-chip interconnection networks) of multicore systems-on-chip (SoCs). Such quick detection is essential because long error-detection latency, the time elapsed between the occurrence of an error due to a bug and its manifestation as an observable failure, severely limits the effectiveness of traditional post-silicon validation approaches. QED can be implemented completely in software, without any hardware modification, and hence is readily applicable to existing designs. Results from multiple hardware platforms, including the Intel® Core™ i7 SoC and a state-of-the-art commercial multicore SoC, along with simulation results from an OpenSPARC T2-like multicore SoC with bug scenarios drawn from commercial multicore SoCs, demonstrate that: 1) error-detection latencies of post-silicon validation tests can be very long, up to billions of clock cycles, especially for bugs inside uncore components; 2) QED shortens error-detection latencies by up to nine orders of magnitude, to only a few hundred cycles for most bug scenarios; and 3) QED enables up to a fourfold increase in bug coverage.
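
    The essence of the software transformation is duplicate-and-check with frequent comparisons (a Python analogue for intuition; real QED rewrites instruction and load/store sequences at the ISA level): the shorter the interval between checks, the shorter the error-detection latency.

    ```python
    def qed_style_sum(values, check_every=4):
        """Sum `values` twice in lockstep, comparing every few operations."""
        a = b = 0                         # original and duplicated registers
        for i, v in enumerate(values, start=1):
            a += v
            b += v                        # duplicated instruction
            if i % check_every == 0 and a != b:
                raise RuntimeError(f"mismatch caught within {check_every} ops")
        if a != b:
            raise RuntimeError("mismatch caught at final check")
        return a

    print(qed_style_sum(range(100)))      # fault-free run returns 4950
    ```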

  • 46. Deterministic Synthesis of Hybrid Application-Specific Network-on-Chip Topologies

    Page(s): 1503 - 1516

    Networks-on-chip (NoCs) enable cost-efficient and effective communication between the processing elements inside modern systems-on-chip (SoCs). NoCs with regular topologies such as meshes, tori, rings, and trees are well suited for general-purpose many-core SoCs, but these topologies can prove suboptimal for SoCs with predefined application characteristics and traffic patterns. Such SoCs benefit from application-specific NoC topologies, designed and optimized according to the application characteristics. This paper proposes a synthesis approach for creating hybrid, application-specific NoCs from an input floorplan and a set of use cases describing the applications running on the SoC. The method considers latency, port-count, and link-length constraints. It produces hybrid topologies that utilize both NoC routers and shared buses. Furthermore, the proposed approach can insert intermediate relay routers that act as bridges or repeaters and help to reduce cost further. Finally, the approach creates a deadlock-free routing of the communication flows, either by finding deadlock-free paths or by inserting virtual channels. The benefits of the proposed method are demonstrated by comparing it with state-of-the-art approaches on a generic and an industrial SoC example.

  • 47. Silicon Effect-Aware Full-Chip Extraction and Mitigation of TSV-to-TSV Coupling

    Page(s): 1900 - 1913

    This paper presents a silicon effect-aware multi-TSV model. The through-silicon-via (TSV) depletion region, the silicon-substrate discharging path, and the electric field distribution around neighboring TSVs are modeled and studied in full-chip designs. Verification against a field solver and full-chip TSV-to-TSV coupling analysis in both the worst case and the average case show that this model is accurate and efficient. It is found that 3-D nets receive more noise than their 2-D counterparts due to TSV-to-TSV coupling. To alleviate this coupling noise on TSV nets, two new optimization methods are investigated: one is to place guard rings around the victim TSV so as to form a stronger discharging path; the other is to adopt differential signal transmission to improve noise immunity. These techniques have been implemented on 3-D IC designs with TSVs placed both regularly and irregularly. Full-chip analysis results show that our approaches are effective in noise reduction with small area overhead.

  • 48. A digital design flow for secure integrated circuits

    Page(s): 1197 - 1208

    Small embedded integrated circuits (ICs) such as smart cards are vulnerable to so-called side-channel attacks (SCAs). An attacker can gain information by monitoring the power consumption, execution time, electromagnetic radiation, and other information leaked by the switching behavior of digital complementary metal-oxide-semiconductor (CMOS) gates. This paper presents a digital very large scale integration (VLSI) design flow to create secure, power-analysis-attack-resistant ICs. The design flow starts from a normal design in a hardware description language such as very-high-speed integrated circuit (VHSIC) hardware description language (VHDL) or Verilog and provides a direct path to an SCA-resistant layout. Instead of a full-custom layout or an iterative design process with extensive simulations, a few key modifications are incorporated into a regular synchronous CMOS standard-cell design flow. The basis for power-analysis-attack resistance is discussed. This paper describes how to adjust the library databases such that the regular single-ended static CMOS standard cells implement a dynamic and differential logic style, and such that 20 000+ differential nets can be routed in parallel. This paper also explains how to modify the constraints and rules files for the synthesis, place, and differential-route procedures. Measurement-based experimental results demonstrate that the secure digital design flow is a functional technique to thwart side-channel power analysis: it successfully protects a prototype Advanced Encryption Standard (AES) IC fabricated in a 0.18-μm CMOS technology.
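
    The dynamic and differential logic style at the heart of the flow can be sketched functionally (a WDDL-like model for intuition; the actual flow works at the standard-cell and routing level): every signal travels as a complementary wire pair, so each evaluation asserts exactly one wire per pair regardless of the data, flattening the power profile a power-analysis attacker measures.

    ```python
    # Functional model of dual-rail differential gates: each value is a
    # (true_rail, false_rail) pair that stays complementary by construction.
    def wddl_and(a, b):                   # dual-rail AND gate
        (at, af), (bt, bf) = a, b
        return (at and bt, af or bf)      # AND on true rails, OR on false

    def wddl_or(a, b):                    # dual-rail OR gate
        (at, af), (bt, bf) = a, b
        return (at or bt, af and bf)

    def encode(x):                        # single-ended -> dual-rail
        return (x, not x)

    for x in (False, True):
        for y in (False, True):
            t, f = wddl_and(encode(x), encode(y))
            assert t == (x and y) and f == (not t)   # rails complementary
    print("dual-rail AND verified: exactly one rail asserted per evaluation")
    ```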

  • 49. Microfluidics-Based Biochips: Technology Issues, Implementation Platforms, and Design-Automation Challenges

    Page(s): 211 - 223

    Microfluidics-based biochips are soon expected to revolutionize clinical diagnosis, deoxyribonucleic acid (DNA) sequencing, and other laboratory procedures involving molecular biology. In contrast to continuous-flow systems that rely on permanently etched microchannels, micropumps, and microvalves, digital microfluidics offers a scalable system architecture and dynamic reconfigurability: groups of unit cells in a microfluidic array can be reconfigured to change their functionality during the concurrent execution of a set of bioassays. As more bioassays are executed concurrently on a biochip, system integration and design complexity are expected to increase dramatically. This paper presents an overview of an integrated system-level design methodology that attempts to address key issues in the synthesis, testing, and reconfiguration of digital microfluidics-based biochips. Different actuation mechanisms for microfluidics-based biochips, and the associated design-automation trends and challenges, are also discussed. The proposed top-down design-automation approach is expected to relieve biochip users of the burden of manually optimizing bioassays, time-consuming hardware design, and costly testing and maintenance procedures, and it will facilitate the integration of fluidic components with microelectronic components in next-generation systems-on-chip (SoCs).

  • 50. Efficient and Concurrent Reliable Realization of the Secure Cryptographic SHA-3 Algorithm

    Page(s): 1105 - 1109

    The secure hash algorithm SHA-3 was selected in 2012 and will be used to provide security to any application that requires hashing, pseudorandom number generation, and integrity checking. The algorithm was selected based on benchmarks such as security, performance, and complexity. In this paper, in order to provide reliable architectures for this algorithm, an efficient concurrent error detection scheme for the selected SHA-3 algorithm, i.e., Keccak, is proposed. To the best of our knowledge, effective countermeasures for potential reliability issues in the hardware implementations of this algorithm have not been presented to date. In proposing the error detection approach, our aim is to keep the complexity and performance overheads acceptable while maintaining high error coverage. In this regard, we present a low-complexity recomputing-with-rotated-operands scheme, which is a step forward in reducing the hardware overhead of the proposed error detection approach. Moreover, we perform injection-based fault simulations and show that error coverage close to 100% is achieved. Furthermore, we have designed the proposed scheme, and ASIC analysis shows that acceptable complexity and performance overheads are incurred. By utilizing the proposed high-performance concurrent error detection scheme, more reliable and robust hardware implementations of the newly standardized SHA-3 are realized.
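
    The recomputing-with-rotated-operands principle exploits the fact that Keccak's steps are bitwise along the lane (z) axis, so they commute with lane rotation. The sketch below demonstrates this on a simplified chi step over one row of 64-bit lanes (an illustration of the invariance, not the proposed hardware architecture): recomputing on rotated operands and comparing against the rotated first result catches a fault in either pass.

    ```python
    # chi commutes with lane rotation, so rotated recomputation gives a
    # concurrent check: any disagreement flags a fault.
    import random

    MASK = (1 << 64) - 1

    def rotl(x, r):                        # rotate a 64-bit lane left by r
        return ((x << r) | (x >> (64 - r))) & MASK

    def chi_row(row):                      # chi on one 5-lane row (bitwise)
        return [(row[i] ^ (~row[(i + 1) % 5] & row[(i + 2) % 5])) & MASK
                for i in range(5)]

    def rero_agrees(row, r=13):            # r: assumed rotation amount
        recomputed = chi_row([rotl(lane, r) for lane in row])
        return [rotl(lane, r) for lane in chi_row(row)] == recomputed

    random.seed(1)
    row = [random.getrandbits(64) for _ in range(5)]
    assert rero_agrees(row)                # fault-free passes agree

    faulty = chi_row(row)
    faulty[2] ^= 1 << 7                    # inject a single-bit fault
    assert [rotl(l, 13) for l in faulty] != chi_row([rotl(l, 13) for l in row])
    print("rotated recomputation flags the injected fault")
    ```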


Aims & Scope

The purpose of this Transactions is to publish papers of interest to individuals in the areas of computer-aided design of integrated circuits and systems.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief

VIJAYKRISHNAN NARAYANAN
Pennsylvania State University
Dept. of Computer Science and Engineering
354D IST Building
University Park, PA 16802, USA
vijay@cse.psu.edu