Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

Issue 7 • Date July 2014

  • Table of contents

    Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems society information

    Page(s): C2
    Freely Available from IEEE
  • Multiple-Population Moment Estimation: Exploiting Interpopulation Correlation for Efficient Moment Estimation in Analog/Mixed-Signal Validation

    Page(s): 961 - 974

    Moment estimation is an important problem during circuit validation, in both presilicon and postsilicon stages. From the estimated moments, the probability of failure and parametric yield can be estimated at each circuit configuration and corner, and these metrics are used for design optimization and for making product qualification decisions. The problem is especially difficult if only a very small sample size is allowed for measurement or simulation, as is the case for complex analog/mixed-signal circuits. In this paper, we propose an efficient moment estimation method, called multiple-population moment estimation (MPME), that significantly improves estimation accuracy under small sample sizes. The key idea is to leverage the data collected under different corners/configurations to improve the accuracy of moment estimation at each individual corner/configuration. Mathematically, we employ the hierarchical Bayesian framework to exploit the underlying correlation in the data. We apply the proposed method to several datasets, including postsilicon measurements of a commercial high-speed I/O link, and demonstrate an average error reduction of up to 2×, which translates to a significant reduction in validation time and cost.
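
    The idea of borrowing strength across corners can be illustrated with a much simpler estimator than the one the paper describes. The sketch below is not MPME: it assumes a normal-normal model and a method-of-moments empirical-Bayes shrinkage of each corner's sample mean toward the grand mean. The function name `shrink_means` and the input layout are hypothetical.

```python
from statistics import mean, variance


def shrink_means(samples_by_corner):
    """Shrink each corner's sample mean toward the grand mean.

    Hypothetical helper: a method-of-moments empirical-Bayes estimator
    under a normal-normal model, NOT the MPME algorithm from the paper.
    Each corner needs at least two samples, and there must be at least
    two corners.
    """
    names = list(samples_by_corner)
    means = {k: mean(samples_by_corner[k]) for k in names}
    grand = mean(list(means.values()))
    # Average sampling variance of a corner mean (within-corner noise).
    noise = mean(
        [variance(samples_by_corner[k]) / len(samples_by_corner[k]) for k in names]
    )
    # Estimated spread of the true corner means, floored at zero.
    between = max(variance(list(means.values())) - noise, 0.0)
    total = between + noise
    weight = between / total if total > 0 else 0.0
    # weight near 1 keeps the raw mean; weight near 0 pools all corners.
    return {k: grand + weight * (means[k] - grand) for k in names}
```

    When the corners look alike, the weight drops toward zero and every corner is pulled to the pooled mean; when they clearly differ, the raw per-corner means are kept almost unchanged.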

  • Synthesis of Dual-Rail Adiabatic Logic for Low Power Security Applications

    Page(s): 975 - 988

    Programmable reversible logic is emerging as a prospective logic design style for implementation in low-power, low-frequency applications where minimal impact on circuit heat generation is desirable, such as mitigation of differential power analysis attacks. Adiabatic logic is an implementation of reversible logic in CMOS where the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Recent advances in dual-rail adiabatic logic show reduction in average and differential power, making this design methodology advantageous in applications where security is the primary design metric and operating frequency is slower, such as Smart Cards. In this paper, we present an algorithm for synthesis of adiabatic circuits in CMOS. Then, applying the ESPRESSO heuristic for Boolean function minimization to each output node, we reduce the size of the synthesized circuit. Our approach correlates the horizontal offsets in the permutation matrix with the necessary switches required for synthesis instead of using a library of equivalent functions. The synthesis results show that, on average, the proposed algorithm represents an improvement of 36% over the best known reversible designs with the optimized dual-rail cell libraries. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.

  • A Low Overhead Quasi-Delay-Insensitive (QDI) Asynchronous Data Path Synthesis Based on Microcell-Interleaving Genetic Algorithm (MIGA)

    Page(s): 989 - 1002

    In this paper, we propose a design approach to mitigate the hardware overhead of the data completion detection circuit in quasi-delay-insensitive (QDI) asynchronous-logic circuits. In this proposed design approach, three novelties are highlighted. Firstly, a novel microcell-interleaving approach is proposed to reduce the number of completion detection (CD) circuits while retaining the required QDI attribute. Secondly, we analyze the performance of the QDI circuits based on the proposed microcell-interleaving approach graphically in terms of power dissipation, transistor count, and delay, and determine the upper and lower boundaries of these performance profiles. Thirdly, we propose a microcell-interleaving genetic algorithm (MIGA) to stochastically optimize the proposed microcell-interleaving approach on power dissipation, transistor count, and delay. To validate the proposed design approach, a complete performance profile of the ISCAS-85 C499 circuit is investigated on the basis of differential cascode voltage switch logic (DCVSL) and dynamic strong indicating (DSI) microcells. We demonstrate the efficiency of the proposed design approach by benchmarking against the competing DCVSL, null convention logic, and DSI designs on five ISCAS-85 circuits. Specifically, the proposed designs, on average, are 1.77× better in power dissipation, 1.4× better in area, and 1.58× better in a composite metric of power × area × delay, and reasonably slower for the lowest power dissipation points. We further demonstrate the practicality of the proposed design approach by implementing an 8-tap 16-bit asynchronous QDI finite impulse response filter. Finally, we demonstrate the ~10% and ~11% improved efficiency of the proposed MIGA over the greedy algorithm and dynamic programming, respectively.

  • Four-Valued Reasoning and Cyclic Circuits

    Page(s): 1003 - 1016

    Allowing cycles in a logic circuit can be advantageous, for example, by reducing the number of gates required to implement a given Boolean function, or a set of functions. However, a cyclic circuit may easily be ill behaved. For instance, an output wire may oscillate instead of reaching a steady state. Propositional three-valued logic has long been used in tests for good behavior of cyclic circuits; a symbolic evaluation method known as ternary analysis provides one criterion for good behavior under certain assumptions about wire and gate delay. We revisit ternary analysis and argue for the use of four truth values. The fourth truth value allows for the distinction of undefined and underspecified behavior. The ability to underspecify behavior is useful because, in a quest for smaller circuits, an implementor can capitalize on degrees of freedom offered in the specification. Moreover, a fourth truth value is attractive because, rather than complicating (ternary) circuit analysis, it introduces a pleasant symmetry, in the form of contra-duality, as well as providing a convenient framework for manipulating specifications. We use this symmetry to provide fixed point results that clarify how two-, three-, and four-valued analyses are related, and to explain some observations about ternary analysis.

  • Accelerated Harmonic-Balance Analysis Using a Graphical Processing Unit Platform

    Page(s): 1017 - 1030

    This paper describes a new approach to accelerate the simulation of the steady-state response of nonlinear circuits using the harmonic-balance (HB) technique. The approach focuses on the direct factorization of the Jacobian matrix of the HB nonlinear equations using a graphical processing unit (GPU) platform. The computational core of the proposed approach is a block-wise version of the KLU factorization algorithm, where scalar arithmetic operations are replaced by block-aware matrix operations. For a large number of harmonics, or excitation tones, or both, the Block-KLU (BKLU) approach effectively raises the ratio of floating-point operations to other operations and, therefore, becomes an ideal vehicle for implementation on a GPU-based platform. Motivated by this fact, we develop a GPU-based framework to implement the BKLU. The proposed approach yields a speedup of up to 89 times over conventional direct factorization on a CPU.

  • Stacking Signal TSV for Thermal Dissipation in Global Routing for 3-D IC

    Page(s): 1031 - 1042

    With no further shrinking of device size, 3-D chip stacking by through-silicon-via (TSV) has been identified as an effective way to achieve better performance in speed and power. However, such a solution inevitably encounters challenges in thermal dissipation since stacked dies generate a significant amount of heat per unit volume. We leverage an integrated design methodology of stacked-signal-TSVs to minimize temperature. Based on this structure, a three-stage TSV locating algorithm in global routing is designed. We demonstrate that our results, compared with baseline circuits, achieve a 17% temperature reduction with 3% wiring overhead and no performance loss, as calculated by the 3-D Elmore delay model. Compared with the paper by Cong and Zhang, where additional thermal TSVs are inserted, our experimental results have on average 23% fewer TSVs with the same temperature constraint. Compared with the paper by Pathak and Lim, where movable signal TSVs are relocated to reduce temperature in hotspot regions, our result has 8% more temperature reduction with the same number of signal TSVs.

  • Learning-Based Power Management for Multicore Processors via Idle Period Manipulation

    Page(s): 1043 - 1055

    Learning-based dynamic power management (DPM) techniques, being able to adapt to varying system conditions and workloads, have attracted a lot of research attention recently. To the best of our knowledge, however, none of the existing learning-based DPM solutions are dedicated to power reduction in multicore processors, although they can be utilized by treating each processor core as a standalone entity and conducting DPM for them separately. In this paper, by including task allocation into our learning-based DPM framework for multicore processors, we are able to manipulate idle periods on processor cores to achieve a better tradeoff between power consumption and system performance. Experimental results show that the proposed solution significantly outperforms existing DPM techniques.

  • A Variability-Aware Adaptive Test Flow for Test Quality Improvement

    Page(s): 1056 - 1066

    In this paper, we propose a process-variability-aware adaptive test flow that realizes efficient and comprehensive detection of parametric faults. A parametric fault is essentially a malfunction in a large-scale integration chip, which is caused by the variability in fabrication processes. In our adaptive test framework, test pattern sets are altered on individual chips in order to apply the optimal set of test patterns for each chip, and thus the test coverage is improved and the test time is reduced. The test pattern is chosen on the basis of parameter estimations measured using an on-chip sensor with respect to statistical timing information. We also propose a novel metric to quantify the test coverage, suitable for evaluating the test quality of parametric faults. Our experimental results using an industrial design show that the proposed flow significantly improves the parametric fault coverage and test efficiency compared to conventional test flows.

  • Test-Delivery Optimization in Manycore SOCs

    Page(s): 1067 - 1080

    We present two test-data delivery optimization algorithms for system-on-chip (SoC) designs with hundreds of cores, where a network-on-chip (NoC) is used as the interconnection fabric. We first present an effective algorithm based on a subset-sum formulation to solve the test-delivery problem in NoCs with arbitrary topology that use dedicated routing. We further propose an algorithm for the important class of NoCs with grid topology and XY routing. The proposed algorithm is the first to cooptimize the number of access points, access-point locations, pin distribution to access points, and assignment of cores to access points for optimal test resource utilization of such NoCs. Test-time minimization is modeled as an NoC partitioning problem and solved with dynamic programming in polynomial time. Both the proposed methods yield high-quality results and are scalable to large SoCs with many cores. We present results on synthetic grid topology NoC-based SoCs constructed using cores from the ITC'02 benchmark, and demonstrate the scalability of our approach for two SoCs of the future, one with nearly 1000 cores and the other with 1600 cores. Test scheduling under power constraints is also incorporated in the optimization framework.
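
    The abstract does not say how test delivery is mapped onto subset-sum, so the sketch below only shows the generic building block: a dynamic program that finds a subset of non-negative integer weights hitting a target exactly. Reading the weights as per-core test times and the target as an access point's time budget is an assumption made here for illustration, as is the function name.

```python
def subset_sum(weights, target):
    """Return indices of a subset of `weights` summing to `target`, or None.

    Generic subset-sum dynamic program over reachable sums; assumes
    non-negative integer weights. Illustrative only, not the paper's
    test-delivery algorithm.
    """
    reach = {0: []}  # reachable sum -> indices that produce it
    for i, w in enumerate(weights):
        # Snapshot the items so each weight is used at most once.
        for s, picked in list(reach.items()):
            t = s + w
            if t <= target and t not in reach:
                reach[t] = picked + [i]
    return reach.get(target)
```

    The table never holds more than `target + 1` sums, so the running time is pseudo-polynomial: proportional to the number of weights times the target.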

  • A Hybrid Approach for Fast and Accurate Trace Signal Selection for Post-Silicon Debug

    Page(s): 1081 - 1094

    A major challenge in post-silicon debug is the lack of observability of the internal signals of a chip. Trace buffer technology provides one avenue to address this challenge by online tracing of a few selected state elements. Due to the limited bandwidth of the trace buffer, only a few state elements can be selected for tracing. Recent research has focused on the automated trace signal selection problem, which aims to maximize restoration of the untraced state elements using the few traced signals. Existing techniques can be categorized into high-quality but slow simulation-based techniques and lower-quality but much faster metric-based techniques. This paper presents a new trace signal selection technique that has comparable or better quality than simulation-based techniques while having a fast runtime comparable to that of metric-based techniques.

  • Selection of Functional Test Sequences With Overlaps

    Page(s): 1095 - 1099

    Functional test sequences may be generated for simulation-based design verification, and used as manufacturing tests or for speed binning. A class of earlier procedures selects functional test sequences for target faults from a set of available sequences in order to reduce the storage requirements and test application time. This paper describes a procedure that reduces the storage requirements further by using the selected sequences for producing additional sequences, which are referred to as overlaps. In an overlap, the first vectors of one selected sequence and the last vectors of another are combined. Overlaps thus combine initialization, fault activation and fault propagation conditions from two sequences to detect additional faults, making it unnecessary to select other sequences. Overlaps can also be used for increasing the fault coverage with respect to a fault model that was not targeted during the selection of the sequences.
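
    The construction of an overlap, as the abstract words it, can be sketched in a few lines. The function name, the list-of-strings representation of test vectors, and the explicit length parameters are assumptions; how the paper chooses the two lengths is not stated here.

```python
def overlap(seq_a, seq_b, n_first, n_last):
    """Join the first n_first vectors of seq_a with the last n_last of seq_b.

    Hypothetical helper: sequences are lists of test vectors, and both
    lengths must be positive and within the bounds of their sequence.
    """
    if not (0 < n_first <= len(seq_a) and 0 < n_last <= len(seq_b)):
        raise ValueError("requested more vectors than a sequence holds")
    return seq_a[:n_first] + seq_b[-n_last:]
```

    Because the overlap is derived from two stored sequences, only the pair of sequence identifiers and the two lengths need to be kept, rather than a third full sequence.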

  • A Global Maximum Error Controller-Based Method for Linearization Point Selection in Trajectory Piecewise-Linear Model Order Reduction

    Page(s): 1100 - 1104

    We propose a new linearization point selection method based on a global maximum error controller for trajectory piecewise-linear (TPWL) model order reduction (MOR). This method is based on the simple fact that the simulation cost of the TPWL model is very low, and selects the state at which the responses of the current TPWL model and the full nonlinear model have the maximum difference as a new linearization point. Numerical results show that the proposed method can generate a TPWL model of smaller size and higher accuracy, and can easily be extended to generate a TPWL model for multiple training inputs.
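
    The selection rule can be illustrated on a scalar toy problem. The sketch below is a simplification, not the paper's method: it assumes a one-dimensional function instead of a circuit model, uses the tangent line at the nearest linearization point instead of a weighted combination of linearized models, and skips the order-reduction step. All names are hypothetical.

```python
import math


def tpwl_eval(f, df, points, x):
    """Evaluate the tangent-line model of the nearest linearization point."""
    p = min(points, key=lambda q: abs(x - q))
    return f(p) + df(p) * (x - p)


def select_points(f, df, trajectory, tol):
    """Add the trajectory state with the largest model error until <= tol.

    Assumes tol >= 0. The error at a linearization point is exactly zero,
    so every pass adds a state that was not selected before.
    """
    points = [trajectory[0]]
    while True:
        errs = [abs(f(x) - tpwl_eval(f, df, points, x)) for x in trajectory]
        worst = max(range(len(trajectory)), key=errs.__getitem__)
        if errs[worst] <= tol:
            return points
        points.append(trajectory[worst])
```

    For example, `select_points(math.sin, math.cos, [i * 0.1 for i in range(32)], 0.01)` returns a subset of the 32 sampled states that keeps the tangent-line error within the tolerance everywhere on the trajectory.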

  • Efficient and Concurrent Reliable Realization of the Secure Cryptographic SHA-3 Algorithm

    Page(s): 1105 - 1109

    The secure hash algorithm (SHA)-3 was selected in 2012 and will be used to provide security to any application which requires hashing, pseudo-random number generation, and integrity checking. This algorithm was selected based on various benchmarks such as security, performance, and complexity. In this paper, in order to provide reliable architectures for this algorithm, an efficient concurrent error detection scheme for the selected SHA-3 algorithm, i.e., Keccak, is proposed. To the best of our knowledge, effective countermeasures for potential reliability issues in the hardware implementations of this algorithm have not been presented to date. In proposing the error detection approach, our aim is to have acceptable complexity and performance overheads while maintaining high error coverage. In this regard, we present a low-complexity recomputing with rotated operands-based scheme, which is a step forward toward reducing the hardware overhead of the proposed error detection approach. Moreover, we perform injection-based fault simulations and show that error coverage close to 100% is achieved. Furthermore, we have designed the proposed scheme and, through ASIC analysis, show that acceptable complexity and performance overheads are reached. By utilizing the proposed high-performance concurrent error detection scheme, more reliable and robust hardware implementations for the newly standardized SHA-3 are realized.
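
    Recomputing with rotated operands works when the operation commutes with rotation. As a software toy, the sketch below checks this for the chi step of Keccak on one row of five 64-bit lanes: chi is purely bitwise, so rotating every lane, applying chi, and rotating back must reproduce the direct result. Only chi is shown; the paper's hardware architecture and its treatment of the other steps are not reproduced, and the function names are hypothetical.

```python
MASK = (1 << 64) - 1


def rotl(x, r):
    """Rotate a 64-bit lane left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def chi_row(row):
    """Keccak chi step on one row of five 64-bit lanes."""
    return [
        row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5] & MASK)
        for x in range(5)
    ]


def chi_row_checked(row, r=1):
    """Recompute chi on rotated lanes and compare; raise on mismatch."""
    first = chi_row(row)
    rotated = chi_row([rotl(v, r) for v in row])
    second = [rotl(v, 64 - r) for v in rotated]  # undo the rotation
    if first != second:
        raise RuntimeError("fault detected in chi computation")
    return first
```

    In fault-free software the two results always agree; the comparison becomes useful when a fault corrupts one of the two passes, because a fault tied to a fixed bit position lands on a different logical bit in the rotated run.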

  • Adaptive Paired Page Prebackup Scheme for MLC NAND Flash Memory

    Page(s): 1110 - 1114

    Multilevel cell (MLC) NAND flash memory is more cost-effective than single-level cell NAND flash memory, as it can store two or more bits in a memory cell. However, in MLC flash memory, a programming operation can corrupt the paired page under abnormal termination. In order to solve the paired page problem, a backup scheme is generally used, which inevitably causes performance degradation and shortens the lifespan of flash memory. In this paper, we propose a more efficient paired page prebackup scheme for MLC flash memory. It adaptively exploits interleaving, copyback operations, and parity data to reduce the prebackup overhead. In experiments, the proposed scheme reduced the backup overhead by up to 78%.
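
    The abstract mentions parity data without detailing how it is used. One common way parity can stand in for full page copies is bytewise XOR: a single parity page protects a group of pages, and any one lost page can be rebuilt from the rest. The sketch below shows that generic mechanism under this assumption; it is not the paper's scheme, and the function names are hypothetical.

```python
from functools import reduce


def parity_page(pages):
    """Bytewise XOR of equally sized pages (given as bytes objects)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*pages))


def recover(pages, parity, lost):
    """Rebuild pages[lost] from the parity page and the surviving pages."""
    survivors = [p for i, p in enumerate(pages) if i != lost]
    return parity_page(survivors + [parity])
```

    Storing one parity page for a group costs a single page write instead of one backup write per page, at the price of tolerating only one corrupted page per group.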

  • Open Access

    Page(s): 1115
    Freely Available from IEEE
  • Together, we are advancing technology

    Page(s): 1116
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information

    Page(s): C3
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems information for authors

    Page(s): C4
    Freely Available from IEEE

Aims & Scope

The purpose of this Transactions is to publish papers of interest to individuals in the areas of computer-aided design of integrated circuits and systems.


Meet Our Editors

Editor-in-Chief

VIJAYKRISHNAN NARAYANAN
Pennsylvania State University
Dept. of Computer Science and Engineering
354D IST Building
University Park, PA 16802, USA
vijay@cse.psu.edu