IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Popular Articles (December 2014)

Includes the top 50 most frequently downloaded documents for this publication according to the most recent monthly usage statistics.
  • 1. Native Simulation of MPSoC Using Hardware-Assisted Virtualization

    Page(s): 1074 - 1087

    Integration of multiple heterogeneous processors into a single system-on-a-chip is a clear trend in embedded devices. Designing and verifying these devices requires high-speed and easy-to-build simulation platforms. Among the software simulation approaches, native simulation is a good candidate since the embedded software is executed natively on the host machine and no instruction set simulator development effort is necessary. However, existing native simulation approaches are such that the simulated software shares the memory space of the modeled hardware modules and the host operating system, making it impractical to support legacy code running on the target platform. To overcome this issue, seldom mentioned in the literature, we propose the addition of a transparent address space translation layer to separate the target address space from that of the host simulator. For this, we exploit the hardware-assisted virtualization technology now available on most general-purpose processors. Experiments show that this solution does not degrade the native simulation speed, while keeping the ability to accomplish software performance evaluation.

  • 2. Low-Power Digital Signal Processing Using Approximate Adders

    Page(s): 124 - 137

    Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders. [See the illustrative sketch below.]

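    The paper's approximate adders are transistor-level cells; as a purely illustrative bit-level analogue (our own construction, not the authors' cells), the Python sketch below approximates the k low-order bits of an adder and measures the resulting mean error, mirroring the accuracy/complexity trade-off the abstract describes.

        # Hedged sketch: bit-level analogue of approximate addition.
        # The low k bits use simplified (inexact) sum/carry logic; the
        # remaining bits use an exact full adder.  The particular
        # approximation is hypothetical, chosen only for illustration.
        import random

        def approx_add(a, b, width=16, k=4):
            result, carry = 0, 0
            for i in range(width):
                ai, bi = (a >> i) & 1, (b >> i) & 1
                if i < k:
                    s = ai | bi              # approximate sum
                    carry = ai & bi          # simplified carry
                else:
                    s = ai ^ bi ^ carry      # exact full adder
                    carry = (ai & bi) | (carry & (ai ^ bi))
                result |= s << i
            return result | (carry << width)  # keep the carry-out

        total = 0
        for _ in range(10_000):
            x, y = random.getrandbits(16), random.getrandbits(16)
            total += abs(approx_add(x, y) - (x + y))
        print("mean absolute error:", total / 10_000)
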
  • 3. A formal approach to the scheduling problem in high level synthesis

    Page(s): 464 - 475

    An integer linear programming (ILP) model for the scheduling problem in high-level synthesis is presented. In addition to time-constrained scheduling and resource-constrained scheduling, a scheduling problem called feasible scheduling, which provides a paradigm for exploring the solution space, is constructed. Extensive consideration is given to the following applications: scheduling with chaining, multicycle operations by nonpipelined function units, and multicycle operations by pipelined function units; functional pipelining; loop folding; mutually exclusive operations; scheduling under bus constraint; and minimizing lifetimes of variables. The number of variables in the formulation is O(s×n), where s and n are the number of control steps and operations, respectively. Since the as soon as possible (ASAP), as late as possible (ALAP), and list scheduling techniques are used to reduce the solution space, the formulation becomes very efficient. A solution to a practical problem, such as the fifth-order filter, can be found optimally in a few seconds. [See the illustrative sketch below.]

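    As a hedged reconstruction (our notation, not necessarily the authors') of the time-constrained core of such an ILP model: let x_{i,s} ∈ {0,1} indicate that operation i is assigned to control step s, with s confined to the ASAP/ALAP window [S_i, L_i]:

        \min \sum_k c_k N_k \quad \text{subject to}
        \sum_{s=S_i}^{L_i} x_{i,s} = 1 \quad \forall i \qquad \text{(each operation scheduled exactly once)}
        \sum_{i \in \mathrm{ops}(k)} x_{i,s} \le N_k \quad \forall k,\, s \qquad \text{(function-unit bound)}
        \sum_s s\,x_{j,s} - \sum_s s\,x_{i,s} \ge 1 \quad \forall (i \to j) \in E \qquad \text{(data dependence)}

    Here N_k is the number of function units of type k, c_k their cost, and E the dependence relation. Because each x_{i,s} exists only within its ASAP/ALAP window, the variable count stays O(s×n), matching the abstract's claim.
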
  • 4. Design Framework to Overcome Aging Degradation of the 16 nm VLSI Technology Circuits

    Page(s): 691 - 703

    Intensive scaling of VLSI circuits is a key factor for gaining outstanding performance. However, this scaling has a huge negative impact on circuit reliability, as it increases the undesired effect of aging degradation in ultradeep-submicrometer technologies. Nowadays, the bias temperature instability (BTI) aging process has a major negative impact on the reliability of VLSI circuits. This paper presents a comprehensive framework that assists in designing VLSI circuits fortified against BTI aging degradation. The framework contains: 1) novel circuit-level techniques that eliminate the effect of BTI (these techniques successfully decrease power dissipation by 36% and enhance the reliability of VLSI circuits); 2) an evaluation of the reliability of all circuit-level techniques used to eliminate BTI aging degradation for 16 nm CMOS technology; and 3) a comparison of the efficiency of all circuit-level techniques in terms of power consumption and area.

  • 5. Application-Specific Wear Leveling for Extending Lifetime of Phase Change Memory in Embedded Systems

    Page(s): 1450 - 1462

    Phase change memory (PCM) has been proposed to replace NOR flash and DRAM in embedded systems because of its attractive features. However, the limited endurance of PCM greatly restricts its adoption in embedded systems. As most embedded systems are application-oriented, we can tackle the endurance problem of PCM by exploiting application-specific features such as fixed access patterns and update frequencies. In this paper, we propose an application-specific wear leveling technique, called Curling-PCM, to evenly distribute write activities across the whole PCM chip and thereby improve the endurance of PCM in embedded systems. The basic idea is to exploit application-specific features in embedded systems and periodically move the hot region across the whole PCM chip. To reduce the overhead of moving the hot region and improve the performance of PCM-based embedded systems, a fine-grained partial wear leveling policy is proposed for Curling-PCM, by which only part of the hot region is moved during each request-handling period. Experimental results show that Curling-PCM can effectively distribute write traffic evenly for a prime application of PCM in embedded systems. We expect this paper to serve as a first step toward the full exploration of application-specific features in PCM-based embedded systems. [See the illustrative sketch below.]

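    A minimal sketch of region-rotation wear leveling in the spirit of Curling-PCM (the class name, period constant, and offset bookkeeping are our own illustration, not the paper's policy):

        # Hedged sketch: the application's hot region is periodically
        # shifted across the physical PCM space so that writes spread
        # over the whole chip.  Collision handling between the rotated
        # hot region and cold data is omitted for brevity.
        class RotatingWearLeveler:
            def __init__(self, total_blocks, hot_len, period):
                self.total = total_blocks    # physical blocks on chip
                self.hot_len = hot_len       # size of the hot region
                self.period = period         # writes between moves
                self.offset = 0              # current hot-region start
                self.writes = 0

            def translate(self, logical_block):
                """Map a logical block to its current physical block."""
                if logical_block < self.hot_len:   # hot data: rotated
                    return (self.offset + logical_block) % self.total
                return logical_block               # cold data: in place

            def on_write(self):
                self.writes += 1
                if self.writes % self.period == 0:
                    # Advance the hot region; the paper's fine-grained
                    # partial wear leveling would migrate only part of
                    # it per request-handling period.
                    self.offset = (self.offset + 1) % self.total
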
  • 6. A Novel Built-In Self-Authentication Technique to Prevent Inserting Hardware Trojans

    Page(s): 1778 - 1791

    With the rapid globalization of the semiconductor industry, hardware Trojans have become a significant threat to government agencies and enterprises that require secure and reliable systems for their critical applications. Because of the diversity of hardware Trojans and the randomness associated with process variations, hardware Trojan detection is a challenging problem. In this paper, we propose a novel technique, called built-in self-authentication (BISA), which can make hardware Trojan insertion by an untrusted Graphic Data System (GDSII) developer or an untrusted foundry considerably more difficult and easier to detect. The unused spaces in the circuit layout represent the best opportunity for these entities to insert Trojans. BISA works by eliminating this spare space and filling it with functional filler cells instead of nonfunctional filler cells. A self-testing procedure generates a digital signature that will differ if any BISA cells are changed by hardware Trojan insertion. We demonstrate that BISA can be applied to any flat or bottom-up hierarchical design with negligible overhead in terms of area, power, and timing.

  • 7. Security Vulnerabilities of Emerging Nonvolatile Main Memories and Countermeasures

    Page(s): 2 - 15

    Emerging nonvolatile memory devices such as phase change memories and memristors are replacing SRAM and DRAM. However, nonvolatile main memories (NVMMs) are susceptible to probing attacks even when powered down. As a result, they may compromise sensitive data such as passwords and keys that reside in the NVMM. To eliminate this vulnerability, we propose sneak-path encryption (SPE), a hardware-intrinsic encryption technique for memristor-based NVMMs. SPE is instruction set architecture independent and has minimal impact on performance. SPE exploits physical parameters, such as sneak paths in crossbar memories, to encrypt the data stored in a memristor-based NVMM. SPE is resilient to a number of attacks that may be performed on NVMMs. We use a cycle-accurate simulator to evaluate the performance impact of an SPE-based NVMM and compare it against other security techniques. SPE can secure an NVMM with a ~1.3% performance overhead.

  • 8. Novel Techniques for High-Sensitivity Hardware Trojan Detection Using Thermal and Power Maps

    Page(s): 1792 - 1805

    Hardware Trojans are malicious alterations or injections of unwanted circuitry into integrated circuits (ICs) by untrustworthy factories. They pose a great threat to the security of modern ICs through various unwanted activities, such as bypassing or disabling the security fence of a system, leaking confidential information, or deranging or destroying the entire chip. Traditional testing strategies are becoming ineffective, since they suffer from decreased sensitivity toward small Trojans because of growing chip sizes and the large amount of process variation present in nanometer technologies. The production volume, along with decreased controllability and observability of complex IC internals, makes it difficult to efficiently perform Trojan detection using typical structural tests such as path latency and leakage power. In this paper, we propose a completely new post-silicon multimodal approach using runtime thermal and power maps for Trojan detection and localization. Utilizing this framework, we propose two different Trojan detection methods involving 2-D principal component analysis: first, supervised thresholding, for the case where a training data set is available; and second, unsupervised clustering, which requires no prior characterization data of the chip. We introduce L1 regularization in the thermal-to-power inversion procedure, which improves Trojan detection accuracy. To characterize ICs accurately, we perform our experiments in the presence of realistic CMOS process variation. Our experimental evaluations reveal that our proposed methodology can detect very small Trojans whose power consumption is three to four orders of magnitude smaller than the total power usage of the chip, and it scales very well because of the spatial view of IC internals provided by the thermal mapping.

  • 9. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory

    Page(s): 994 - 1007

    Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated new NVM candidate technologies, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive random-access memory (ReRAM) are regarded as the most promising candidates. As the ultimate goal of this NVM research is to deploy them into multiple levels in the memory hierarchy, it is necessary to explore the wide NVM design space and find the proper implementation at different memory hierarchy levels, from highly latency-optimized caches to highly density-optimized secondary storage. While abundant tools are available as SRAM/DRAM design assistants, similar tools for NVM designs are currently missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies.

  • 10. Process-Variation Tolerant Channel-Adaptive Virtually Zero-Margin Low-Power Wireless Receiver Systems

    Page(s): 1764 - 1777

    This paper presents a process-variation tolerant, continuously channel-adaptive wireless front-end architecture and related adaptation algorithms that allow a radio-frequency transceiver to function with minimum power under all channel conditions and manufacturing process corners. Current wireless transceiver front-ends are designed for worst-case channel conditions, and a limited degree of post-manufacture tuning is performed to compensate for process variations. It is shown how the proposed architecture can result in significant power savings over current practice without compromising system-level bit-error rate (while keeping the end-user experience unaffected). In contrast to traditional wireless circuits with limited tunability, such a zero-margin design is achieved by closed-loop adaptation of the wireless front-end circuits to ensure that they consume only the minimum power and deliver just enough performance (and not any more) for any channel condition. The adaptation methodology is applied to a WLAN receiver design, and hardware measurement data for an adaptive receiver are presented, showing a >3× power improvement under best-case channel conditions.

  • 11. Silicon Effect-Aware Full-Chip Extraction and Mitigation of TSV-to-TSV Coupling

    Page(s): 1900 - 1913

    This paper presents a silicon-effect-aware multi-TSV model. The through-silicon-via (TSV) depletion region, the silicon-substrate discharging path, and the electric-field distribution around neighboring TSVs are modeled and studied in full-chip designs. Verification with a field solver and full-chip TSV-to-TSV coupling analysis in both the worst case and the average case show that this model is accurate and efficient. It is found that 3-D nets receive more noise than their 2-D counterparts due to TSV-to-TSV coupling. To alleviate this coupling noise on TSV nets, two new optimization methods are investigated. One is to place guard rings around the victim TSV so as to form a stronger discharging path; the other is to adopt differential signal transmission to improve noise immunity. These techniques have been implemented on 3-D IC designs with TSVs placed regularly or irregularly. Full-chip analysis results show that our approaches are effective in noise reduction with small area overhead.

  • 12. High-Level Synthesis for FPGAs: From Prototyping to Deployment

    Page(s): 473 - 491

    Escalating system-on-chip design complexity is pushing the design community to raise the level of abstraction beyond the register transfer level. Despite the unsuccessful adoption of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for field-programmable gate array (FPGA) designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper, we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including a comparison of HLS solutions with optimized manual designs. In particular, the experiment on a sphere decoder shows that the HLS solution can achieve an 11-31% reduction in FPGA resource usage with improved design productivity compared to the hand-coded design.

  • 13. Measuring the Gap Between FPGAs and ASICs

    Page(s): 203 - 215

    This paper presents experimental measurements of the differences between a 90-nm CMOS field programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for circuits containing only look-up table-based logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 35. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories. We find that these blocks reduce this average area gap significantly to as little as 18 for our benchmarks, and we estimate that extensive use of these hard blocks could potentially lower the gap to below five. The ratio of critical-path delay, from FPGA to ASIC, is roughly three to four, with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 14 times and, with hard blocks, this gap generally becomes smaller.

  • 14. RICE: rapid interconnect circuit evaluation using AWE

    Page(s): 763 - 776

    This paper describes the Rapid Interconnect Circuit Evaluator (RICE) software developed specifically to analyze RC and RLC interconnect circuit models of virtually any size and complexity. RICE focuses specifically on the passive interconnect problem by applying the moment-matching technique of Asymptotic Waveform Evaluation (AWE) and application-specific circuit analysis techniques to yield large gains in run-time efficiency over circuit simulation without sacrificing accuracy. Moreover, this focus of AWE on passive interconnect problems permits the use of moment-matching techniques that produce stable, pre-characterized, reduced-order models for RC and RLC interconnects. RICE is demonstrated to be as accurate as a transient circuit simulation with hundreds or thousands of times the efficiency. The use of RICE is demonstrated on several VLSI interconnect and off-chip microstrip models. [See the illustrative sketch below.]

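    To make the moment-matching idea concrete, the sketch below computes response moments for a simple RC line using the standard RC-tree recursion (m_0 = 1 and m_k(i) = -Σ_j R_ij C_j m_{k-1}(j), where R_ij is the resistance shared by the source-to-i and source-to-j paths); AWE then matches 2q such moments to a q-pole reduced-order model. This illustrates only the underlying mathematics; RICE itself handles general RC/RLC interconnect far more efficiently.

        # Hedged sketch: moments of an RC line driven at its root.
        # -m[1][i] is the Elmore delay at node i.
        def moments(r, c, order):
            """r[i]: resistance into node i; c[i]: capacitance at node i."""
            n = len(r)
            # Shared path resistance between nodes i and j.
            R = [[sum(r[:min(i, j) + 1]) for j in range(n)] for i in range(n)]
            m = [[1.0] * n]                       # m0 = 1 at every node
            for k in range(1, order + 1):
                m.append([-sum(R[i][j] * c[j] * m[k - 1][j] for j in range(n))
                          for i in range(n)])
            return m

        # Uniform 5-segment RC line (illustrative element values).
        m = moments([1.0] * 5, [1e-3] * 5, order=4)
        print("Elmore delay at the sink:", -m[1][-1])
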
  • 15. General-Purpose Nonlinear Model-Order Reduction Using Piecewise-Polynomial Representations

    Page(s): 249 - 264

    We present algorithms for automated macromodeling of nonlinear mixed-signal system blocks. A key feature of our methods is that they automate the generation of general-purpose macromodels that are suitable for a wide range of time- and frequency-domain analyses important in mixed-signal design flows. In our approach, a nonlinear circuit or system is approximated using piecewise-polynomial (PWP) representations. Each polynomial system is reduced to a smaller one via weakly nonlinear polynomial model-reduction methods. Our approach, dubbed PWP, generalizes recent trajectory-based piecewise-linear approaches and ties them with polynomial-based model-order reduction, which inherently captures stronger nonlinearities within each region. PWP-generated macromodels not only reproduce small-signal distortion and intermodulation properties well but also retain fidelity in large-signal transient analyses. The reduced models can be used as drop-in replacements for large subsystems to achieve fast system-level simulation using a variety of time- and frequency-domain analyses (such as dc, ac, transient, harmonic balance, etc.). For the polynomial reduction step within PWP, we also present a novel technique [dubbed multiple pseudoinput (MPI)] that combines concepts from proper orthogonal decomposition with Krylov-subspace projection. We illustrate the use of PWP and MPI with several examples (including op-amps and I/O buffers) and provide important implementation details. Our experiments indicate that it is easy to obtain speedups of about an order of magnitude with push-button nonlinear macromodel-generation algorithms. [See the illustrative sketch below.]

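    In outline (our notation, not the authors' exact formulation), PWP approximates a nonlinear system dx/dt = f(x) + b u(t) by a weighted combination of polynomial expansions around chosen points x_i:

        f(x) \approx \sum_i w_i(x)\left[ f(x_i) + A_i (x - x_i) + W_i \big((x - x_i) \otimes (x - x_i)\big) + \cdots \right], \qquad \sum_i w_i(x) = 1,

    after which each polynomial piece is reduced by Krylov-subspace projection x ≈ V z. The quadratic and higher terms are what let each region capture stronger nonlinearities than a trajectory-based piecewise-linear model.
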
  • 16. Fault Tolerant Network on Chip Switching With Graceful Performance Degradation

    Page(s): 883 - 896

    The structural redundancy inherent to on-chip interconnection networks [networks-on-chip (NoCs)] can be exploited by adaptive routing algorithms in order to provide connectivity even if network components are out of service due to faults, which will appear at an increasing rate in future chip technology nodes. This paper is based on a new, fine-grained functional fault model and a corresponding distributed fault diagnosis method that facilitate determining the fault status of individual NoC switches and their adjacent communication links. Whereas previous work on network fault tolerance assumes switches to be either available or fully out of service, we present a novel adaptive routing algorithm that employs the remaining functionality of partly defective switches. Using diagnostic information, transient faults are handled with a retransmission scheme that avoids the latency penalty of end-to-end repeat requests. Thereby, graceful degradation of NoC communication performance can be achieved even under high failure rates.

  • 17. High-Level Synthesis With Behavioral-Level Multicycle Path Analysis

    Page(s): 1832 - 1845

    High-level synthesis (HLS) tools generate register-transfer level (RTL) hardware descriptions from behavioral-level specifications through resource allocation, scheduling, and binding. Traditionally, HLS tools build datapath pipelines by inserting pipeline registers to break combinational logic into single-cycle segments; accurately determining the number of cycles available for signal propagation has proven infeasible at the RT level. Thus, RT-level timing analyses must pessimistically assume each path has at most one cycle for signal propagation. This leads to false positives in critical-path analyses, prevents RTL synthesis tools from optimizing real critical paths, and forces HLS flows to insert pipeline registers without improving hardware quality. In this paper, we present an efficient behavioral-level multicycle path analysis (BL-MCPA) algorithm that leverages control-data flow information to reduce the time complexity of multicycle path analysis from exponential to polynomial. BL-MCPA helps eliminate false positives in timing analysis and improves the reported fmax by 15% on average. With BL-MCPA, we avoid unnecessary pipeline register insertion and reduce execution latency by 25% and register usage by 29% under a user fmax constraint of 300 MHz. Using BL-MCPA, we replace large multiplexers (MUXs) with pipelined MUX trees and reduce the execution latency of hardware by up to 67% on designs whose performance is limited by large MUXs.

  • 18. CASA: Contention-Aware Scratchpad Memory Allocation for Online Hybrid On-Chip Memory Management

    Page(s): 1806 - 1817

    Scratchpad memory (SPM) has been increasingly used in embedded systems due to its higher efficiency in terms of energy and area compared to ordinary cache. A hybrid on-chip memory architecture that combines SPM with a mini-cache has been proposed. One key issue for hybrid on-chip memory architectures is to reduce the number of off-chip memory accesses and the energy consumption. Existing methods achieve this by moving the most frequently accessed data into SPM. However, these methods may be ineffective because the main source of off-chip memory accesses may not be the most frequently accessed data. Instead, most off-chip memory accesses are caused by cache misses, so reducing the latter will reduce the former. Cache misses are mainly caused by data contending for cache lines. Therefore, this paper proposes a contention-aware SPM allocation method for hybrid on-chip memory management. The number of cache misses for a page is used as a metric to determine whether a page should be moved to SPM. When the number of misses for a page exceeds a threshold, the page is moved to SPM, reducing cache contention. Experimental results show that the proposed method can reduce the energy-delay product by 35% to 53% compared to a cache-only on-chip memory architecture and by 19% to 31% compared to an existing hybrid on-chip memory architecture. [See the illustrative sketch below.]

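    A minimal sketch of the miss-driven allocation policy described above (the threshold value, names, and eviction handling are our own illustration, not the paper's):

        # Hedged sketch: pages migrate to SPM when their cache-miss
        # count, not their raw access frequency, crosses a threshold.
        from collections import defaultdict

        class ContentionAwareSPM:
            def __init__(self, spm_pages, miss_threshold):
                self.spm = set()                # pages resident in SPM
                self.capacity = spm_pages
                self.threshold = miss_threshold
                self.misses = defaultdict(int)  # per-page miss counters

            def on_cache_miss(self, page):
                self.misses[page] += 1
                if (page not in self.spm
                        and self.misses[page] >= self.threshold
                        and len(self.spm) < self.capacity):
                    self.spm.add(page)          # contention relieved
                    self.misses[page] = 0
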
  • 19. Cell-Aware Test

    Page(s): 1396 - 1409

    This paper describes the new cell-aware test (CAT) approach, which enables transistor-level, defect-based ATPG on full-CMOS designs, including FinFET technologies, to significantly reduce the defect rate of manufactured ICs. We present results from defect-oriented CAT fault model generation for 1,940 standard library cells, as well as the application of CAT to several industrial designs. We present high-volume production test results from a 32 nm notebook processor and from a 350 nm automotive design, including the achieved defect rate reduction in defective parts per million. We also present CAT diagnosis and physical failure analysis results from one failing part and give an outlook on using this functionality to quickly ramp up yield in advanced technology nodes.

    Open Access
  • 20. Electromigration Study for Multiscale Power/Ground Vias in TSV-Based 3-D ICs

    Page(s): 1873 - 1885

    Electromigration (EM) in power distribution networks (PDNs) is a major reliability issue in 3-D ICs. While the EM issues of local vias and through-silicon-vias (TSVs) have been studied separately, the interplay of TSVs and conventional local vias in 3-D ICs has not been well investigated. This co-design is necessary when die-to-die vertical power delivery uses both TSVs and local interconnects. In this paper, we model EM for the PDNs of 3-D ICs with a focus on the multiscale via (MSV) structure, i.e., TSVs and local vias used together for vertical power delivery. We study the impact of structure, material, and preexisting void conditions on the EM-related lifetime of our MSV structures. We also investigate the transient IR-voltage change of full-chip-level 3-D PDNs with MSVs using our model. The experimental results demonstrate that our EM modeling can effectively capture the EM reliability of full-chip-level 3-D PDNs with MSVs, which is hard to achieve with traditional EM analysis based on an individual local via or TSV.

  • 21. Analytical Thermal Model for Self-Heating in Advanced FinFET Devices With Implications for Design and Reliability

    Page(s): 1045 - 1058

    A rigorous analytical thermal model has been formulated for the analysis of self-heating effects in FinFETs, under both steady-state and transient stress conditions. 3-D self-consistent electrothermal simulations, tuned with experimentally measured electrical characteristics, were used to understand the nature of self-heating in FinFETs and calibrate the proposed model. The accuracy of the model has been demonstrated for a wide range of multifin devices by comparing it against finite element simulations. The model has been applied to carry out a detailed sensitivity analysis of self-heating with respect to various FinFET parameters and structures, which are critical for improving circuit performance and electrical overstress/electrostatic discharge (ESD) reliability. The transient model has been used to estimate the thermal time constants of these devices and predict the sensitivity of power-to-failure to various device parameters, for both long and short pulse ESD situations. Suitable modifications to the model are also proposed for evaluating the thermal characteristics of production-level FinFET (or Tri-gate FET) structures involving metal gates, body-tied bulk FinFETs, and trench contacts.

  • 22. A New Frontier in Ultralow Power Wireless Links: Network-on-Chip and Chip-to-Chip Interconnects

    Page(s): 186 - 198

    This paper explores the general framework and prospects for on-chip and off-chip wireless interconnects implemented for high-performance computing (HPC) systems in the context of micro-power wireless design. HPC interconnects demand very high (≥ 10 Gb/s) transmission rates using ultraefficient (~1 pJ/bit) transceivers over extremely short (≤ 100 cm) ranges. In an attempt to design such wireless interconnects, first a model for the wireless communication channel properties is developed. The use of CMOS-based energy-efficient on-off keying (OOK) transceiver architectures operating in the 60-90 GHz bands is considered as a practical solution. In order to address the strict performance requirements of wireless HPC interconnects, and taking advantage of recent developments in device scaling, compact low-power and innovative circuits based on novel double-gate MOSFETs (DG-MOSFETs) are proposed for the implementation of the architecture. The performance of a compact low-noise amplifier (LNA) design using common-source (CS) inductive degeneration with 32 nm DG-MOSFETs is investigated by quantitative analysis and simulation. The proposed inductor-less two-stage cascode cascade LNA is optimized for 90 GHz operation and has the advantage of gain switching over its CMOS counterpart without the use of additional switching transistors, which makes it remarkably power efficient and faster. As further examples of efficient and compact DG-MOSFET circuits for OOK transceiver design, a three-stage CS 5 dB tunable power amplifier operating up to 90 GHz and a novel 90 GHz voltage-controlled oscillator are also presented. This is followed by the proposal of an array of four monopole antennas studied using a full-wave EM solver.

  • 23. Towards Optimal Performance-Area Trade-Off in Adders by Synthesis of Parallel Prefix Structures

    Page(s): 1517 - 1530

    This paper proposes an efficient algorithm to synthesize prefix graph structures that yield adders with the best performance-area trade-off. For a parallel prefix adder of a given bit-width, our approach generates prefix graph structures that optimize an objective function, such as the size of the prefix graph, subject to constraints such as bit-wise output logic level. Given bit-width n and a level restriction L, our algorithm outperforms existing algorithms in minimizing the size of the prefix graph. We also prove its size-optimality when n is a power of two and L = log2 n. Besides prefix graph size optimization and the best performance-area trade-off, our approach, unlike existing techniques, can 1) handle more complex constraints, such as maximum node fanout or wire length, that impact the performance/area of a design and 2) generate several feasible solutions that minimize the objective function. Generating several size-optimal solutions provides the option to choose adder designs that mitigate constraints such as wire congestion or power consumption, which are difficult to model during logic synthesis. Experimental results demonstrate that our approach improves performance by 3% and area by 9% over even a 64-bit full-custom adder implemented in an industrial high-performance design. [See the illustrative sketch below.]

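    The object being synthesized is a prefix graph over (generate, propagate) pairs combined with the associative operator (g, p) o (g', p') = (g OR (p AND g'), p AND p'). The sketch below evaluates one classic prefix structure (Kogge-Stone, log2 n levels); the paper searches the space of such graphs for minimum size under a level constraint. Illustrative only.

        # Hedged sketch: addition via a Kogge-Stone prefix graph.
        def prefix_add(a, b, n=8):
            g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate
            p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate
            G, P = g[:], p[:]
            d = 1
            while d < n:                        # log2(n) prefix levels
                for i in range(n - 1, d - 1, -1):
                    G[i] |= P[i] & G[i - d]     # prefix operator
                    P[i] &= P[i - d]
                d *= 2
            carry = [0] + G[:-1]                # carry into bit i
            s = sum((p[i] ^ carry[i]) << i for i in range(n))
            return s | (G[n - 1] << n)          # append carry-out

        assert prefix_add(100, 57) == 157
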
  • 24. Optimizing the Power Delivery Network in a Smartphone Platform

    Page(s): 36 - 49

    Smartphones consume a significant amount of power. Indeed, they can hardly provide a full day of use between charging operations, even with a 2000 mAh battery. While power minimization and dynamic power management techniques have been heavily explored to improve the power efficiency of modules (processors, memory, display, GPS, etc.) inside a smartphone platform, one critical factor is often overlooked: the power conversion efficiency of the power delivery network (PDN). This paper focuses on dc-dc converters, which play a pivotal role in the PDN of the smartphone platform. Starting from detailed models of the dc-dc converter designs, two optimization methods are presented: 1) static switch sizing to maximize the efficiency of a dc-dc converter under statistical loading profiles and 2) dynamic switch modulation to achieve high efficiency under dynamically varying load conditions. To verify the efficacy of the optimization methods in actual smartphone platforms, this paper also presents a characterization procedure for the PDN. The procedure is as follows: 1) group the modules in the smartphone platform together and use profiling to estimate their average and peak power consumption levels and 2) build an equivalent dc-dc converter model for the power delivery path from the battery source to each group of modules and use linear regression to estimate the conversion efficiency of the corresponding equivalent converter. Experimental results demonstrate that static switch sizing can achieve a 6% power conversion efficiency enhancement, which translates to a 19% reduction in power loss during general usage of the smartphone. Dynamic switch modulation accomplishes a similar improvement under the same conditions, while also achieving high efficiency across various load conditions.

  • 25. Ant Colony Optimization-Based Fault-Aware Routing in Mesh-Based Network-on-Chip Systems

    Page(s): 1693 - 1705

    The advanced deep-submicrometer technology increases the risk of failure for on-chip components. In advanced network-on-chip (NoC) systems, such failures constrain the on-chip bandwidth and network throughput. Fault-tolerant routing algorithms aim to alleviate the impact on performance. However, few works have integrated congestion-, deadlock-, and fault-awareness information into the channel evaluation function to avoid hotspots around a faulty router. To solve this problem, we propose the ant colony optimization-based fault-aware routing (ACO-FAR) algorithm for load balancing in faulty networks. The behavior of an ant colony facing an obstacle (a failure in the NoC) can be described in three steps: 1) encounter; 2) search; and 3) select. We implement the corresponding mechanisms as: 1) notification of fault information; 2) a path-searching mechanism; and 3) a path-selecting mechanism. With the proposed ACO-FAR, the router can evaluate the available paths and detour packets through a less-congested fault-free path. Simulation results show that the proposed algorithm achieves 29.1%-66.5% higher throughput than related works. In addition, ACO-FAR can reduce the undelivered-packet ratio to 0.5%-0.02% and balance the distribution of traffic flow in the faulty network. [See the illustrative sketch below.]

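    The ant-colony machinery referenced above can be made concrete with a small sketch (the weights, constants, and evaluation function are our own illustration, not the paper's): each output channel carries a pheromone value, faulty channels get zero desirability, and routing decisions are made probabilistically, with pheromone reinforced along delivered paths.

        # Hedged sketch of ACO-style channel selection and update.
        import random

        ALPHA, BETA, RHO = 1.0, 2.0, 0.1   # pheromone/heuristic weights, evaporation

        def choose_channel(channels, pheromone, free_slots, faulty):
            weight = {c: 0.0 if faulty[c]
                      else (pheromone[c] ** ALPHA) * (free_slots[c] ** BETA)
                      for c in channels}
            r, acc = random.uniform(0, sum(weight.values())), 0.0
            for c in channels:
                acc += weight[c]
                if acc >= r:
                    return c
            return channels[-1]            # numerical fallback

        def update_pheromone(pheromone, used, reward):
            for c in pheromone:            # evaporate everywhere
                pheromone[c] *= 1 - RHO
            pheromone[used] += reward      # reinforce the delivered path
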
  • 26. Path-Congestion-Aware Adaptive Routing With a Contention Prediction Scheme for Network-on-Chip Systems

    Page(s): 113 - 126

    Network-on-chip systems can achieve higher performance than bus systems for chip multiprocessors. However, as the complexity of the network increases, channel and switch congestion become major performance bottlenecks. An effective adaptive routing algorithm can help minimize path congestion through load balancing. However, conventional adaptive routing schemes only use channel-based information to detect the congestion status. Lacking switch-based information, channel-based information can hardly reveal the real congestion status along the routing path. Therefore, in this paper, we remodel the path congestion information to expose hidden spatial congestion information and improve the effectiveness of routing path selection. We propose a path-congestion-aware adaptive routing (PCAR) scheme based on the following techniques: 1) a path-congestion-aware selection strategy that simultaneously considers switch congestion and channel congestion and 2) a contention-prediction technique that uses the rate of change in the buffer level to predict possible switch contention. The experimental results show that the proposed PCAR scheme can achieve a high saturation throughput, with an improvement of 15.4%-48.7% compared to existing routing schemes. The proposed PCAR method also includes a VLSI architecture with higher area efficiency, an improvement of 16%-35.7% compared with other router designs.

  • 27. Automation of IC layout with analog constraints

    Page(s): 923 - 942

    A methodology for the automatic synthesis of full-custom IC layout with analog constraints is presented. The methodology guarantees that all performance constraints are met when feasible; otherwise, infeasibility is detected as soon as possible, thus providing a robust and efficient design environment. In the proposed approach, performance specifications are translated into lower-level bounds on parasitics or geometric parameters using sensitivity analysis. The bounds can be used by a set of specialized layout tools performing stack generation, placement, routing, and compaction. For each tool, a detailed description is provided of its functionality, of the way constraints are mapped and enforced, and of its impact on the design flow. Examples drawn from industrial applications are reported to illustrate the effectiveness of the approach.

  • 28. PS3-RAM: A Fast Portable and Scalable Statistical STT-RAM Reliability/Energy Analysis Method

    Page(s): 1644 - 1656

    The development of emerging spin-transfer torque random access memory (STT-RAM) is facing two major technical challenges: poor write reliability and high write energy, both of which are severely impacted by process variations and thermal fluctuations. The evaluation of STT-RAM design metrics and robustness often requires a hybrid simulation flow, i.e., modeling the CMOS and magnetic devices with SPICE and macro-magnetic models, respectively. Very often, such a hybrid simulation flow involves expensive Monte Carlo simulations when the design and behavioral variabilities of STT-RAM are taken into account. In this paper, we propose a fast and scalable semi-analytical method, PS3-RAM, which enables efficient statistical simulations in STT-RAM designs. By eliminating the costly macro-magnetic and SPICE simulations, PS3-RAM achieves a runtime speedup of more than 100 000× with excellent agreement with the results of the conventional simulation method. PS3-RAM can also accurately estimate the STT-RAM write error rate and write energy distributions for both magnetic tunneling junction switching directions under different temperatures, demonstrating great potential in the analysis of STT-RAM reliability and write energy at the early design stage of memory or micro-architecture.

  • 29. Reuse-Based Optimization for Prebond and Post-Bond Testing of 3-D-Stacked ICs

    Page(s): 122 - 135

    Three-dimensional (3-D) stacking of integrated circuits (ICs) using through-silicon-vias (TSVs) is a promising integration platform for next-generation ICs. Since TSVs are not fully accessible prior to bonding, it is difficult to test the combinational logic between scan flip-flops and TSVs at the prebond stage. In order to increase testability, it has been advocated that wrapper cells (WCs) be added at both ends of a TSV. However, a drawback of WCs is that they incur area overhead and lead to higher latency and performance degradation on functional paths. Prior work proposed the reuse of scan cells to achieve high testability, thereby reducing the number of WCs that need to be inserted; however, practical timing considerations were overlooked and the number of inserted WCs was still high. We show that the general problem of minimizing the number of WCs is equivalent to the graph-theoretic minimum clique-partitioning problem and is therefore NP-hard. We adopt efficient heuristic methods to solve the problem and describe a timing-guided and layout-aware solution. We evaluate the heuristic methods using an exact solution technique based on integer linear programming. We also present a design-for-test optimization technique to leverage the reuse-based method during post-bond testing. Results are presented for 3-D-stack implementations of the ITC'99 and OpenCore benchmark circuits.

  • 30. Efficient Multilayer Obstacle-Avoiding Rectilinear Steiner Tree Construction Based on Geometric Reduction

    Page(s): 1928 - 1941

    Given a set of pin-vertices, an obstacle-avoiding rectilinear Steiner minimal tree (OARSMT) connects all the pin-vertices, possibly through Steiner points, using vertical and horizontal segments with minimal wirelength and without intersecting any obstacle. To deal with multiple routing layers and preferred routing orientations, we consider the multilayer obstacle-avoiding rectilinear Steiner minimal tree (ML-OARSMT) problem and the obstacle-avoiding preferred direction Steiner tree (OAPD-ST) problem. First, we prove that the multilayer case is theoretically different from the 2-D case and propose a reduction to transform a multilayer instance into a 3-D instance. Based on this reduction, we apply computational-geometry techniques to develop an efficient algorithm, utilizing existing OARSMT heuristics, for the ML-OARSMT and OAPD-ST problems. Furthermore, we develop an advanced Steiner-point selection scheme to avoid inferior Steiner points and improve the solution quality. Experimental results show that our algorithm provides solutions of excellent quality with a significant speed-up compared to previously known results.

  • 31. Statistical Timing Analysis: From Basic Principles to State of the Art

    Page(s): 589 - 607

    Static timing analysis (STA) has been one of the most pervasive and successful analysis engines in the design of digital circuits for the last 20 years. However, in recent years, the increased loss of predictability in semiconductor devices has raised concern over the ability of STA to effectively model statistical variations. This has resulted in extensive research in so-called statistical STA (SSTA), which marks a significant departure from the traditional STA framework. In this paper, we review the recent developments in SSTA. We first discuss its underlying models and assumptions, then survey the major approaches, and close by discussing its remaining key challenges. [See the illustrative sketch below.]

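    The survey's starting point can be summarized with the canonical first-order delay model used by most parameterized SSTA approaches (standard notation, not specific to this paper):

        d = d_0 + \sum_{i=1}^{n} a_i\,\Delta X_i + a_{n+1}\,\Delta R_a,

    where the ΔX_i are globally correlated variation sources, ΔR_a is an independent random term, and the a_i are sensitivities. Sums of canonical forms remain canonical; the max of two canonical forms does not, and re-approximating it (e.g., by Clark's moment matching) is the central technical step on which the surveyed approaches differ.
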
  • 32. An Algorithm for Synthesis of Reversible Logic Circuits

    Page(s): 2317 - 2330

    Reversible logic finds many applications, especially in the area of quantum computing. A completely specified n-input, n-output Boolean function is called reversible if it maps each input assignment to a unique output assignment and vice versa. Logic synthesis for reversible functions differs substantially from traditional logic synthesis and is currently an active area of research. The authors present an algorithm and tool for the synthesis of reversible functions. The algorithm uses the positive-polarity Reed-Muller expansion of a reversible function to synthesize the function as a network of Toffoli gates. At each stage, candidate factors, which represent subexpressions common between the Reed-Muller expansions of multiple outputs, are explored in the order of their attractiveness. The algorithm utilizes a priority-based search tree, and heuristics are used to rapidly prune the search space. The synthesis algorithm currently targets the generalized n-bit Toffoli gate library. However, other algorithms exist that can convert an n-bit Toffoli gate into a cascade of smaller Toffoli gates. Experimental results indicate that the authors' algorithm quickly synthesizes circuits when tested on the set of all reversible functions of three variables. Furthermore, it is able to quickly synthesize all four-variable and most five-variable reversible functions that were in the test suite. The authors also present results for some benchmark functions widely discussed in literature and some new benchmarks that the authors have developed. The algorithm is shown to synthesize many, but not all, randomly generated reversible functions of as many as 16 variables with a maximum gate count of 25. [See the illustrative sketch below.]

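    Reversibility and the Toffoli-cascade target can be illustrated directly (the gate list below is our own toy example, not from the paper):

        # Hedged sketch: simulate a generalized-Toffoli cascade and
        # check reversibility (bijectivity on the 2^n input space).
        def toffoli(state, controls, target):
            if all((state >> c) & 1 for c in controls):
                state ^= 1 << target   # flip target when all controls are 1
            return state

        def run_cascade(state, gates):
            for controls, target in gates:
                state = toffoli(state, controls, target)
            return state

        def is_reversible(f, n):
            return len({f(x) for x in range(1 << n)}) == 1 << n

        # 3-bit example: CNOT(0 -> 1) followed by Toffoli({0,1} -> 2).
        gates = [((0,), 1), ((0, 1), 2)]
        assert is_reversible(lambda x: run_cascade(x, gates), 3)
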
  • 33. A Reliability-Aware Address Mapping Strategy for NAND Flash Memory Storage Systems

    Page(s): 1623 - 1631

    The increasing density of NAND flash memory leads to a dramatic increase in the bit error rate of flash, which greatly reduces the ability of error-correcting codes (ECCs) to handle multibit errors. NAND flash memory is normally used to store file system metadata and page mapping information. Thus, a broken physical page containing metadata may cause an unintended and severe change in the functionality of the entire flash. This paper presents Meta-Cure, a novel hardware and file system interface that transparently protects metadata in the presence of multibit faults. Meta-Cure exploits built-in ECC and replication in order to protect pages containing critical data, such as file system metadata. Redundant pairs are formed at run time and distributed to different physical pages to protect against failures. Meta-Cure requires no changes to the file system, the on-chip hierarchy, or the hardware implementation of the flash memory chip. We evaluate Meta-Cure on a real embedded platform using a variety of I/O traces. The evaluation platform uses dual ARM Cortex-A9 processor cores with 64 Gb of NAND flash memory, and we have evaluated the effectiveness of Meta-Cure on the New Technology File System (NTFS). Experimental results show that the proposed technique can reduce uncorrectable page errors by 70.38% with less than 7.86% time overhead compared with conventional error correction techniques.

  • 34. Enabling High-Dimensional Hierarchical Uncertainty Quantification by ANOVA and Tensor-Train Decomposition

    Page(s): 63 - 76

    Hierarchical uncertainty quantification can reduce the computational cost of stochastic circuit simulation by employing spectral methods at different levels. This paper presents an efficient framework to hierarchically simulate some challenging stochastic circuits/systems that include high-dimensional subsystems. Due to the high parameter dimensionality, it is challenging both to extract surrogate models at the low level of the design hierarchy and to handle them in the high-level simulation. In this paper, we develop an efficient analysis-of-variance-based stochastic circuit/microelectromechanical-systems (MEMS) simulator to extract the surrogate models at the low level. In order to avoid the curse of dimensionality, we employ tensor-train decomposition at the high level to construct the basis functions and Gauss quadrature points. As a demonstration, we verify our algorithm on a stochastic oscillator with four MEMS capacitors and 184 random parameters. This challenging example is simulated efficiently by our simulator at the cost of only 10 minutes in MATLAB on a regular personal computer. [See the illustrative sketch below.]

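    The tensor-train idea used at the high level can be stated compactly (standard TT notation): a d-way tensor is held in factored form

        \mathcal{A}(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d],

    where each G_k[i_k] is an r_{k-1} × r_k matrix with r_0 = r_d = 1, so storage falls from \prod_k n_k to \sum_k n_k r_{k-1} r_k. This is what keeps basis construction and quadrature tractable at 184 random parameters.
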
  • 35. A Blind Dynamic Fingerprinting Technique for Sequential Circuit Intellectual Property Protection

    Page(s): 76 - 89

    Design fingerprinting is a means to trace illegally redistributed intellectual property (IP) by creating a unique IP instance with a different signature for each user. Existing fingerprinting techniques for hardware IP protection focus on lowering the design effort to create a large number of different IP instances without paying much attention to the ease of fingerprint detection upon IP integration. This paper presents the first dynamic fingerprinting technique on sequential circuit IPs to enable both the owner and legal buyers of an IP embedded in a chip to be readily identified in the field. The proposed fingerprint is an oblivious ownership watermark independently endorsed by each user through a blind signature protocol. Thus, authorship can also be proved through the detection of different users' fingerprints without the need to separately embed an identical IP owner's signature in all fingerprinted instances. The proposed technique is applicable to both application-specific integrated circuit and field-programmable gate array IPs. Our analyses show that the fingerprint is immune to collusion attack and can withstand all perceivable attacks, with a lower probability of removal than state-of-the-art FSM watermarking schemes. The probability of coincidence of a 32-bit fingerprint is on the order of 10^-10, and up to 10^35 32-bit fingerprinted instances can be generated for a small design of 100 flip-flops. [See the illustrative sketch below.]

  • 36. Out-of-Order Parallel Discrete Event Simulation for Transaction Level Models

    Page(s): 1859 - 1872

    The validation of system models at the transaction level typically relies on discrete event (DE) simulation. In order to reduce simulation time, parallel discrete event simulation (PDES) can be used to exploit the multiple cores available on today's host PCs. However, the total order of time imposed by regular DE simulators becomes a bottleneck that severely limits the benefits of parallel simulation. In this paper, we present a new out-of-order (OoO) PDES technique for simulating transaction-level models on multicore hosts. By localizing the simulation time to individual threads and carefully handling events at different times, a system model can be simulated following a partial order of time without loss of accuracy. Aided by advanced static analysis at compile time and table-based decisions at run time, threads can be issued early, reducing the idle time of available cores. Our proposed OoO PDES technique shows high performance gains in simulation speed with only a small increase in compile time. Using six embedded application examples, we also show the speed trade-off for multicore PDES based on different multithreading libraries.
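
    The sketch below illustrates the out-of-order issue test in spirit: a thread may run ahead of the global minimum time only if every thread it shares state with has already advanced past it. The Thread class, the dependency table, and the conservative rule are assumptions standing in for the paper's compile-time static analysis and run-time decision tables.

        class Thread:
            def __init__(self, name, local_time):
                self.name, self.local_time = name, local_time

        def can_issue(thread, threads, depends_on):
            """Allow out-of-order issue only when no thread this one shares
            state with still lags behind it in simulated time."""
            return all(other.local_time >= thread.local_time
                       for other in threads
                       if other is not thread
                       and other.name in depends_on[thread.name])

        threads = [Thread("cpu", 40), Thread("dma", 25), Thread("timer", 60)]
        depends_on = {"cpu": {"dma"}, "dma": set(), "timer": set()}

        for t in threads:
            print(f"{t.name}@{t.local_time}: issue now? "
                  f"{can_issue(t, threads, depends_on)}")
        # cpu must wait for dma (still at t=25), while dma and timer share
        # no state with lagging threads and can run ahead of the global
        # minimum time without loss of accuracy.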

  • 37. Reliability-Driven Software Transformations for Unreliable Hardware

    Page(s): 1597 - 1610

    We propose multiple reliability-driven software transformations targeting unreliable hardware. These transformations reduce the number of executions of critical instructions and the spatial/temporal vulnerabilities of different instructions with respect to different processor components. The goal is to lower the application's susceptibility to failures. Compared to performance-optimized compilation, our method incurs 60% fewer application failures, averaged over various fault-injection scenarios and fault rates.

  • 38. Efficient and Concurrent Reliable Realization of the Secure Cryptographic SHA-3 Algorithm

    Page(s): 1105 - 1109

    The secure hash algorithm (SHA)-3 was selected in 2012 and will be used to provide security to any application that requires hashing, pseudo-random number generation, and integrity checking. This algorithm was selected based on benchmarks such as security, performance, and complexity. In this paper, in order to provide reliable architectures for this algorithm, an efficient concurrent error detection scheme for the selected SHA-3 algorithm, i.e., Keccak, is proposed. To the best of our knowledge, effective countermeasures for potential reliability issues in the hardware implementations of this algorithm have not been presented to date. In proposing the error detection approach, our aim is to achieve acceptable complexity and performance overheads while maintaining high error coverage. In this regard, we present a low-complexity recomputing-with-rotated-operands-based scheme, which is a step toward reducing the hardware overhead of the proposed error detection approach. Moreover, we perform injection-based fault simulations and show that error coverage close to 100% is achieved. Furthermore, we have designed the proposed scheme, and application-specific integrated circuit (ASIC) analysis shows that acceptable complexity and performance overheads are reached. By utilizing the proposed high-performance concurrent error detection scheme, more reliable and robust hardware implementations of the newly standardized SHA-3 are realized.
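
    The detection principle behind recomputing with rotated operands can be shown on a toy rotation-equivariant function (emphatically not Keccak): compute once normally, once on rotated operands, rotate the second result back, and compare. All values below are illustrative assumptions.

        MASK = (1 << 64) - 1

        def rotl(x, r):
            return ((x << r) | (x >> (64 - r))) & MASK

        def toy_round(x, fault=0):
            # Rotation-equivariant by construction, since rotations commute;
            # 'fault' injects a transient bit-flip for demonstration.
            return (rotl(x, 7) ^ rotl(x, 13) ^ fault) & MASK

        x = 0x0123456789ABCDEF
        primary = toy_round(x)
        # Recompute on the rotated operand, then rotate the result back.
        redundant = rotl(toy_round(rotl(x, 1)), 63)
        print("fault-free match:", primary == redundant)        # True

        redundant_bad = rotl(toy_round(rotl(x, 1), fault=1 << 5), 63)
        print("fault detected:", primary != redundant_bad)      # True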

  • 39. GORDIAN: VLSI placement by quadratic programming and slicing optimization

    Page(s): 356 - 365

    The authors present a placement method for cell-based layout styles. It is composed of alternating and interacting global optimization and partitioning steps, followed by an optimization of the area utilization. Methods using the divide-and-conquer paradigm usually lose the global view by generating smaller and smaller subproblems. In contrast, GORDIAN maintains the simultaneous treatment of all cells over all global optimization steps, thereby considering constraints that reflect the current dissection of the circuit. The global optimizations are performed by solving quadratic programming problems that possess unique global minima. Improved partitioning schemes for the stepwise refinement of the placement are introduced. The area utilization is optimized by an exhaustive slicing procedure. The placement method is applied to real-world problems, and excellent results in terms of placement quality and computation time are obtained.
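
    The core of such a global optimization step can be sketched in a few lines: minimizing the weighted squared wirelength with some fixed pad positions reduces to a symmetric positive-definite linear system with a unique minimum. The tiny netlist below is an assumed example, not taken from the paper.

        import numpy as np

        # Edges (i, j, weight); nodes 0-2 are movable cells, 3-4 fixed pads.
        edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 3, 2.0), (2, 4, 2.0)]
        fixed = {3: 0.0, 4: 10.0}          # pad x-coordinates
        movable = [0, 1, 2]

        n = len(movable)
        A = np.zeros((n, n))
        b = np.zeros(n)
        idx = {v: k for k, v in enumerate(movable)}

        # Stamp each quadratic term w*(x_i - x_j)^2 into the system.
        for i, j, w in edges:
            for u, v in ((i, j), (j, i)):
                if u in idx:
                    A[idx[u], idx[u]] += w
                    if v in idx:
                        A[idx[u], idx[v]] -= w
                    else:
                        b[idx[u]] += w * fixed[v]

        x = np.linalg.solve(A, b)          # unique global minimum (A is SPD)
        print(dict(zip(movable, x.round(3))))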

  • 40. Synthesis of Dual-Rail Adiabatic Logic for Low Power Security Applications

    Page(s): 975 - 988

    Programmable reversible logic is emerging as a prospective logic design style for implementation in low-power, low-frequency applications where minimal impact on circuit heat generation is desirable, such as mitigation of differential power analysis attacks. Adiabatic logic is an implementation of reversible logic in CMOS in which the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Recent advances in dual-rail adiabatic logic show reductions in average and differential power, making this design methodology advantageous in applications where security is the primary design metric and the operating frequency is slower, such as smart cards. In this paper, we present an algorithm for the synthesis of adiabatic circuits in CMOS. Then, using the ESPRESSO heuristic for Boolean function minimization on each output node, we reduce the size of the synthesized circuit. Our approach correlates the horizontal offsets in the permutation matrix with the switches required for synthesis, instead of using a library of equivalent functions. The synthesis results show that, on average, the proposed algorithm represents an improvement of 36% over the best known reversible designs with the optimized dual-rail cell libraries. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.

  • 41. Placement for Binary-Weighted Capacitive Array in SAR ADC Using Multiple Weighting Methods

    Page(s): 1277 - 1287

    The overall accuracy and linearity of a matching-limited successive-approximation-register analog-to-digital converter are primarily determined by its digital-to-analog converter's (DAC's) matching characteristics. As the resolution of the DAC increases, it becomes harder to achieve accurate capacitance ratios in the layout, which are affected by systematic and random mismatches. An ideal placement for the DAC array should minimize the systematic mismatches first, followed by the random mismatch. This paper proposes a placement strategy, which incorporates a matrix-adjustment method for the DAC, and different placement techniques and weighting methods for the placements of active and dummy unit capacitors. The resulting placement addresses both systematic and random mismatches. We consider five sources of systematic mismatch: first-order process gradients, second-order lithographic errors, proximity effects, wiring complexity, and asymmetrical fringing parasitics. The experimental results show that the placement strategy achieves smaller capacitance ratio mismatch and shorter computational runtime than existing works.
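
    A small numeric experiment illustrates why unit-capacitor placement matters for the first of these error sources: under a linear process gradient, a common-centroid arrangement cancels the systematic ratio error that a sequential arrangement accumulates. The 1-D row and gradient slope are assumptions for illustration only.

        def unit_cap(position, slope=0.002, nominal=1.0):
            # First-order (linear) process gradient across the row.
            return nominal * (1.0 + slope * position)

        # Capacitors A and B, each built from 4 unit caps in one row.
        sequential      = {"A": [0, 1, 2, 3], "B": [4, 5, 6, 7]}
        common_centroid = {"A": [0, 2, 5, 7], "B": [1, 3, 4, 6]}

        for name, layout in (("sequential", sequential),
                             ("common-centroid", common_centroid)):
            ca = sum(unit_cap(p) for p in layout["A"])
            cb = sum(unit_cap(p) for p in layout["B"])
            print(f"{name:16s} ratio B/A = {cb / ca:.6f}")  # ideal is 1.0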

  • 42. Fast Monte Carlo-Based Estimation of Analog Parametric Test Metrics

    Page(s): 1977 - 1990

    The accepted approach in industry today to ensure outgoing quality in high-volume manufacturing of analog circuits is to directly measure datasheet specifications. To reduce the associated costs, it is desirable to eliminate specification tests or replace them with lower-cost alternative tests. However, this is too risky if the resultant fault coverage and yield coverage metrics of the new test approach are not estimated accurately. This paper proposes a methodology to efficiently derive a set of the most probable failing and marginally functional circuit instances. Based on this set, we can readily define and estimate fault coverage and yield coverage metrics. Our methodology reduces the required number of Monte Carlo simulations by one or more orders of magnitude. As an illustrative example, the methodology is applied to a radio frequency low-noise amplifier.
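
    For orientation, the brute-force Monte Carlo estimation that the paper's methodology accelerates looks roughly as follows; the scalar "circuit", its Gaussian parameter, and both test limits are invented stand-ins.

        import random

        random.seed(0)
        N = 100_000
        spec_fail = both_fail = both_pass = 0

        for _ in range(N):
            p = random.gauss(0.0, 1.0)                    # process parameter
            gain = 20.0 + 2.0 * p                         # datasheet performance
            alt = 1.5 + 0.14 * p + random.gauss(0, 0.02)  # cheap indirect test
            f_spec = gain < 16.5                          # fails specification
            f_alt = alt < 1.26                            # fails alternative test
            spec_fail += f_spec
            both_fail += f_spec and f_alt
            both_pass += (not f_spec) and (not f_alt)

        fault_coverage = both_fail / spec_fail         # failing parts caught
        yield_coverage = both_pass / (N - spec_fail)   # good parts passed
        print(f"fault coverage {fault_coverage:.3f}, "
              f"yield coverage {yield_coverage:.3f}")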

  • 43. Modeling the “Effective capacitance” for the RC interconnect of CMOS gates

    Page(s): 1526 - 1535

    With finer line widths and faster switching speeds, the resistance of on-chip metal interconnect is having a dominant impact on the timing behavior of logic gates. Specifically, the gates are switching faster and the interconnect delays are getting longer due to scaling. This results in a trend in which the RC interconnect delay is beginning to comprise a larger portion of the overall logic stage delay. This shift in relative delay dominance from the gate to the RC interconnect is increased by resistance shielding. That is, as the gate “resistance” gets smaller and the metal resistance gets larger, the gate no longer “sees” the total net capacitance, and the gate delay may be significantly less than expected. This trend complicates the timing analysis of digital circuits, which relies upon simple, empirical gate delay equations for efficiency. In this paper, we develop an analytical expression for the “effective load capacitance” of an RC interconnect. In addition, when there is significant shielding, the response waveforms at the gate output may have a large exponential tail. We show that this waveform tail can strongly influence the delay of the RC interconnect. Therefore, we propose an extension of the effective capacitance equation that captures the complete waveform response accurately, with a two-piece gate-output-waveform approximation.
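
    The shielding effect can be reproduced numerically: drive a C1-R-C2 pi-load through a Thevenin resistance, then search for the single lumped capacitance giving the same 50% delay at the driver output. This delay-matching sketch conveys the concept only; the element values are assumptions, and the paper derives an analytical expression rather than iterating.

        def delay_pi(rd, c1, r, c2, vdd=1.0, dt=1e-13):
            """50% crossing at the driver output for a step through Rd into
            a C1 - R - C2 pi-load, by forward-Euler integration."""
            v1 = v2 = t = 0.0
            while v1 < 0.5 * vdd:
                i_drv = (vdd - v1) / rd
                i_wire = (v1 - v2) / r
                v1 += dt * (i_drv - i_wire) / c1
                v2 += dt * i_wire / c2
                t += dt
            return t

        def delay_lumped(rd, c, vdd=1.0, dt=1e-13):
            v = t = 0.0
            while v < 0.5 * vdd:
                v += dt * (vdd - v) / (rd * c)
                t += dt
            return t

        rd, c1, r, c2 = 1e3, 20e-15, 2e3, 80e-15
        target = delay_pi(rd, c1, r, c2)

        lo, hi = c1, c1 + c2                   # bisect for the matching cap
        for _ in range(40):
            mid = 0.5 * (lo + hi)
            if delay_lumped(rd, mid) < target:
                lo = mid
            else:
                hi = mid

        print(f"Ctotal = {(c1 + c2) * 1e15:.0f} fF, Ceff = {hi * 1e15:.1f} fF")
        # Ceff < Ctotal: the far capacitance is partially shielded by R.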

  • 44. Deterministic Synthesis of Hybrid Application-Specific Network-on-Chip Topologies

    Page(s): 1503 - 1516

    Networks-on-chip (NoCs) enable cost-efficient and effective communication between the processing elements inside modern systems-on-chip (SoCs). NoCs with regular topologies such as meshes, tori, rings, and trees are well suited for general-purpose many-core SoCs. These topologies might prove suboptimal for SoCs with predefined application characteristics and traffic patterns. Such SoCs benefit from application-specific NoC topologies, designed and optimized according to the application characteristics. This paper proposes a synthesis approach for creating hybrid, application-specific NoCs from an input floorplan and a set of use cases describing the applications running on the SoC. The method considers latency, port count, and link length constraints. It produces hybrid topologies that utilize both NoC routers and shared buses. Furthermore, the proposed approach can insert intermediate relay routers that act as bridges or repeaters and help to reduce the cost further. Finally, the approach creates a deadlock-free routing of the communication flows, either by finding deadlock-free paths or by inserting virtual channels. The benefits of the proposed method are demonstrated by comparing it to state-of-the-art approaches on generic and industrial SoC examples.
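
    One standard way to verify deadlock freedom of a chosen routing, sketched below, is to build the channel dependency graph (CDG) induced by the flows' paths and check it for cycles; a cycle means the routing must be changed or virtual channels inserted. The three-router ring is an assumed toy example, not from the paper.

        def has_cycle(graph):
            """DFS cycle detection on a dict node -> set of successors."""
            WHITE, GRAY, BLACK = 0, 1, 2
            color = {v: WHITE for v in graph}

            def visit(v):
                color[v] = GRAY
                for w in graph.get(v, ()):
                    if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                        return True
                color[v] = BLACK
                return False

            return any(color[v] == WHITE and visit(v) for v in graph)

        # Channels of a 3-router ring; each flow's path adds dependencies
        # between the consecutive channels it occupies.
        paths = [[("a", "b"), ("b", "c")],
                 [("b", "c"), ("c", "a")],
                 [("c", "a"), ("a", "b")]]  # together these close a cycle

        cdg = {}
        for path in paths:
            for ch1, ch2 in zip(path, path[1:]):
                cdg.setdefault(ch1, set()).add(ch2)
                cdg.setdefault(ch2, set())

        print("deadlock possible:", has_cycle(cdg))  # True -> re-route/add VCs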

  • 45. Interconnect Testing and Test-Path Scheduling for Interposer-Based 2.5-D ICs

    Page(s): 136 - 149

    Interposer-based 2.5-D integrated circuits (ICs) are seen today as a first step toward the eventual industry adoption of 3-D ICs based on through-silicon vias (TSVs). The TSVs and the redistribution layer (RDL) in the silicon interposer, and the micro-bumps in the assembled chip, must be adequately tested for product qualification. We present an efficient interconnect-test solution that targets TSVs, RDL wires, and micro-bumps for shorts, opens, and delay faults. The proposed test technique is fully compatible with the IEEE 1149.1 Standard. To reduce test cost, we also present a test-path design and scheduling technique that minimizes a composite cost function based on test time and the design-for-test overhead in terms of the additional TSVs and micro-bumps needed for test access. The locations of the dies on the interposer are taken into consideration in order to determine the order of dies in a single test path. We present simulation results to demonstrate the effectiveness of fault detection, and synthesis results to evaluate the hardware cost per die relative to the IEEE 1149.1 Standard. We also present test-path design and test-scheduling results to highlight the effectiveness of the optimization technique.
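
    In miniature, the scheduling objective might look like the following brute-force search over die orderings; the coordinates, cost weights, and toy time/overhead models are all assumptions, and the paper uses an optimization technique rather than enumeration.

        from itertools import permutations

        dies = {"D1": (0, 0), "D2": (10, 0), "D3": (10, 8), "D4": (0, 8)}
        alpha, beta = 1.0, 0.5     # weights of the composite cost function

        def path_length(order):
            cost = 0.0
            for a, b in zip(order, order[1:]):
                (x1, y1), (x2, y2) = dies[a], dies[b]
                cost += abs(x1 - x2) + abs(y1 - y2)  # Manhattan distance
            return cost

        def composite_cost(order):
            length = path_length(order)
            test_time = 5.0 * len(order) + 0.2 * length  # toy time model
            dft_overhead = length  # extra TSVs/micro-bumps ~ path length
            return alpha * test_time + beta * dft_overhead

        best = min(permutations(dies), key=composite_cost)
        print("best test path:", " -> ".join(best),
              f"(cost {composite_cost(best):.1f})")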

  • 46. NTUplace4h: A Novel Routability-Driven Placement Algorithm for Hierarchical Mixed-Size Circuit Designs

    Page(s): 1914 - 1927

    A wirelength-driven placer that ignores routability can produce placements with irresolvable routing congestion. Therefore, it is desirable to develop an effective routability-driven placer for modern mixed-size designs employing hierarchical methodologies for faster turnaround time. In this paper, we propose a novel routability-driven analytical placement algorithm for hierarchical mixed-size circuit designs. This paper presents a novel design hierarchy identification technique to effectively identify design hierarchies and guide placement for better wirelength and routability. The proposed algorithm optimizes routability from four major aspects: 1) narrow channel handling; 2) pin density; 3) routing overflow optimization; and 4) net congestion optimization. Routability-driven legalization and detailed placement are also proposed to further optimize routing congestion. Compared with the participating teams of the 2012 ICCAD Design Hierarchy Aware Routability-driven Placement Contest, our placer achieves the best quality (both average overflow and wirelength) and the best overall score (additionally considering running time).

  • 47. Charge Allocation in Hybrid Electrical Energy Storage Systems

    Page(s): 1003 - 1016

    A hybrid electrical energy storage (HEES) system consists of multiple banks of heterogeneous electrical energy storage (EES) elements placed between a power source and load devices, providing charge storage and retrieval functions. An HEES system has two desired functions: 1) reducing electricity costs by storing electricity obtained from the power grid at off-peak times, when its price is lower, for use at peak times in place of electricity that would have to be bought then at higher prices and 2) alleviating problems, such as excessive power fluctuation and undependable power supply, that are associated with the use of large amounts of renewable energy on the grid. To perform these functions, appropriate charge management policies must be developed in order to efficiently store and retrieve electrical energy while attaining performance metrics that are close to the respective best values across the constituent EES banks in the HEES system. This paper is the first to formally describe the global charge allocation problem in HEES systems, namely, distributing a specified level of incoming power to a subset of destination EES banks so that maximum charge allocation efficiency is achieved. The problem is formulated as a mixed-integer nonlinear program with the objective function set to the global charge allocation efficiency and the constraints capturing key requirements and features of the system, such as the energy conservation law, power conversion losses in the chargers, the rate capacity, and self-discharge effects in the EES elements. A rigorous algorithm is provided to obtain near-optimal charge allocation efficiency under a daily charge allocation schedule. A photovoltaic array is used as an example of the power source for the charge allocation process, and a heuristic is provided to predict the solar radiation level with high accuracy. Simulation results using this photovoltaic cell array and a representative HEES system demonstrate up to a 25% gain in charge allocation efficiency from employing the proposed algorithm.
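
    To convey the flavor of the problem, here is a greedy sketch that splits incoming power in small increments, always giving the next increment to the bank with the highest marginal charging efficiency. The linear efficiency curves are invented, and the paper solves a much richer mixed-integer nonlinear program instead.

        def efficiency(bank, p):
            """Assumed charger efficiency at input power p; conversion
            losses grow with load."""
            a, b = bank
            return a - b * p

        banks = {"battery": (0.92, 0.004), "supercap": (0.97, 0.012)}
        incoming, step = 50.0, 1.0
        alloc = {name: 0.0 for name in banks}

        p = 0.0
        while p < incoming:
            # Give the next increment to the best marginal efficiency.
            best = max(banks, key=lambda n: efficiency(banks[n], alloc[n]))
            alloc[best] += step
            p += step

        # Average efficiency of a linear curve over [0, A] is a - b*A/2.
        stored = sum(alloc[n] * efficiency(banks[n], alloc[n] / 2)
                     for n in banks)
        print(alloc, f"~{stored:.1f} W stored of {incoming:.0f} W in")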

  • 48. Software-Based Self-Test for Small Caches in Microprocessors

    Page(s): 1991 - 2004

    Nowadays, on-line testing is essential for modern microprocessors to detect latent defects that either escape manufacturing testing or appear during system operation. Small memories, such as L1 caches and translation lookaside buffers (TLBs), are not usually equipped with memory built-in self-test (MBIST) hardware. Software-based self-test (SBST) is a flexible and low-cost solution for on-line March test application and error detection in such small memories. Although L1 caches and TLBs are small components, their reliable operation is crucial for system performance due to the large penalties incurred when L1 cache or TLB misses occur. In this paper, an SBST program development methodology is proposed for on-line testing of small cache memories in microprocessors. To overcome testability challenges that are due to the “hidden” or implicit operation of such memories, the proposed SBST methodology exploits: 1) existing special-purpose instructions that modern instruction set architectures implement to access these cache arrays for debug-diagnostic (DD) purposes, termed hereafter direct cache access instructions and 2) performance monitoring and trap handling mechanisms. In addition, the proposed SBST methodology combines features that are crucial for on-line testing: a) compact test validation; b) simplified coding style; c) low invasiveness of the test program; and d) small memory footprint. The methodology is comprehensively demonstrated on the instruction and data L1 cache arrays and the instruction and data TLB arrays of the OpenSPARC T1. Experimental results show that exploiting such DD instructions yields significant test time improvements (up to 86% for the instruction L1 cache, up to 87% for the data L1 cache, up to 37% for the D-TLB, and up to 91% for the I-TLB) when compared to SBST solutions that do not utilize these types of instructions.
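
    As a flavor of the test content an SBST program applies, the sketch below expands the classic March C- element sequence into an ordered list of read/write operations; mapping these onto direct cache access instructions is target-specific and omitted here.

        # March C-: ⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)
        MARCH_C_MINUS = [
            ("up",   ["w0"]),
            ("up",   ["r0", "w1"]),
            ("up",   ["r1", "w0"]),
            ("down", ["r0", "w1"]),
            ("down", ["r1", "w0"]),
            ("down", ["r0"]),
        ]

        def expand(march, n_addresses):
            """Yield (address, op) pairs in March order for an n-entry array."""
            for order, ops in march:
                addrs = (range(n_addresses) if order == "up"
                         else reversed(range(n_addresses)))
                for a in addrs:
                    for op in ops:
                        yield a, op

        ops = list(expand(MARCH_C_MINUS, 4))
        print(len(ops), "operations (10n), e.g.", ops[:5])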

  • 49. Aging Adaption in Integrated Circuits Using a Novel Built-In Sensor

    Page(s): 109 - 121

    As process technology further scales, aging, noise, and variations in integrated circuits (ICs) and systems become a major challenge to both the semiconductor and electronic design automation (EDA) industries; they may cause significantly increased mismatch between modeled and actual silicon behavior, and even IC failure in the field. Therefore, the addition of accurate and low-cost on-chip sensors is of great value for reducing this mismatch and performing in-field measurements. This paper presents a novel standard-cell-based sensor for reliability analysis of digital ICs (called Radic), designed to better understand the characteristics of gate and functional path aging and the impact of process variations on timing performance, and to perform in-field aging measurements. The Radic sensor has been fabricated on two floating-gate Freescale SoCs in a very advanced technology. The measurement results demonstrate that the resolution can be better than 0.1 ps, and the accuracy is maintained throughout aging and process variation. Additionally, a built-in aging adaption system based on the Radic sensor is proposed to perform in-field aging adaption. Simulation results verify that, compared with designs using a fixed aging guardband, the proposed aging adaption system releases 80% of the aging timing margin, saves silicon area by 1.02%-3.16% at most target frequencies, and prevents aging-induced failure.
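
    The adaption idea can be sketched as a control loop that widens the timing margin only as the sensor actually observes degradation, instead of provisioning a fixed end-of-life guardband. The drift model and margins below are invented numbers, not measured Radic data.

        nominal_period = 1.00   # ns
        fixed_guardband = 0.10  # worst-case end-of-life margin, ns

        def adapted_period(sensed_path_delay, margin=0.02):
            """Clock period tracks measured delay plus a small safety margin."""
            return sensed_path_delay + margin

        for year in range(0, 11, 2):
            sensed = 0.90 + 0.008 * year   # slow NBTI/HCI-style drift, ns
            print(f"year {year:2d}: fixed {nominal_period + fixed_guardband:.2f} ns, "
                  f"adaptive {adapted_period(sensed):.2f} ns")
        # Early in life the adaptive period is much shorter, releasing most
        # of the fixed aging margin while still tracking degradation.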

  • 50. Compact Lateral Thermal Resistance Model of TSVs for Fast Finite-Difference Based Thermal Analysis of 3-D Stacked ICs

    Page(s): 1490 - 1502

    Thermal issues are the leading design constraint for 3-D stacked integrated circuits (ICs), and through-silicon vias (TSVs) are used to effectively reduce the temperature of 3-D ICs. Normally, a TSV is considered a good thermal conductor in its vertical direction, and its vertical thermal resistance has been well modeled. However, the lateral heat transfer of TSVs, which is also important, was largely ignored in the past. In this paper, we propose an accurate physics-based model for the lateral thermal resistance of TSVs in terms of physical and material parameters, and study the conditions for model accuracy. For TSV arrays or farms, we show that the space or pitch between TSVs has a significant impact on TSV thermal behavior and should be properly considered in TSV models. The proposed lateral thermal resistance model is fully compatible with existing modeling approaches, allowing a more accurate, complete TSV thermal model to be built. The new TSV thermal model can easily be integrated into a finite-difference (FD) based thermal analysis framework to improve analysis efficiency. The accuracy of the model is validated against a commercial finite element tool, COMSOL. Experimental results show that the improved TSV thermal model (with the proposed lateral thermal model) greatly improves the accuracy of the FD method in thermal simulation compared with the existing method.
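
    As a rough worked example of why lateral resistance is nontrivial, the textbook cylindrical-shell conduction formula R = ln(r_out / r_in) / (2 * pi * k * L) can be applied to an assumed TSV geometry; this generic physics relation is not the paper's calibrated model.

        import math

        def shell_resistance(r_in, r_out, k, length):
            """Thermal resistance (K/W) of a cylindrical shell, radial flow."""
            return math.log(r_out / r_in) / (2 * math.pi * k * length)

        r_cu, t_liner, height = 2.5e-6, 0.1e-6, 50e-6  # assumed radius, liner, height
        k_sio2, k_si = 1.4, 150.0                      # conductivities, W/(m K)

        r_liner = shell_resistance(r_cu, r_cu + t_liner, k_sio2, height)
        r_si = shell_resistance(r_cu + t_liner, 10e-6, k_si, height)
        print(f"liner: {r_liner:.0f} K/W, silicon out to 10 um: {r_si:.1f} K/W")
        # The thin dielectric liner dominates the lateral path, one reason
        # a TSV cannot be treated as laterally transparent.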


Aims & Scope

The purpose of this Transactions is to publish papers of interest to individuals in the areas of computer-aided design of integrated circuits and systems.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief

VIJAYKRISHNAN NARAYANAN
Pennsylvania State University
Dept. of Computer Science and Engineering
354D IST Building
University Park, PA 16802, USA
vijay@cse.psu.edu