Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

Issue 5 • Date May 2014

  • Table of contents

    Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems society information

    Page(s): C2
    Freely Available from IEEE
  • Statistical Analysis of MUX-Based Physical Unclonable Functions

    Page(s): 649 - 662

    Physical unclonable functions (PUFs) can store secret keys in integrated circuits (ICs) by exploiting the uncontrollable randomness due to manufacturing process variations. These PUFs can be used for authentication of devices and for key generation in security applications. This paper presents a rigorous statistical analysis of various types of multiplexer-based (MUX-based) PUFs, including the original MUX PUF, the feed-forward MUX PUFs, the modified feed-forward MUX PUFs, and the multiplexer-demultiplexer (MUX/DeMUX) PUF. The modified feed-forward MUX PUF is a new structure introduced in this paper. Three types of feed-forward PUFs are analyzed: feed-forward overlap, feed-forward cascade, and feed-forward separate. The performance analysis quantifies interchip and intrachip variations as a function of the number of stages, the process variation variance, the environmental noise variance, and the arbiter skew for different PUFs. Three other performance metrics are also introduced and analyzed in this paper: reliability, uniqueness, and randomness. A PUF is more reliable if it has less intrachip variation. A PUF is more unique if the interchip variation is closer to 50%. A PUF is more random if its response bit is 0 or 1 with equal probability. Our statistical analysis shows that the intrachip variation is less dependent on the number of stages, N, if N is greater than ten. However, the interchip variation is dependent on N if N is less than 100. It is shown that the feed-forward PUFs have higher intrachip variation than MUX PUFs; however, the modified feed-forward PUFs have significantly lower intrachip variation than the feed-forward PUFs. It is shown that the modified feed-forward cascade MUX PUF has the best uniqueness and randomness, while the original MUX PUF has the best reliability. The analysis presented in this paper can be used by the designer to choose an appropriate PUF based on the application's requirements. This eliminates the need to fabricate and test many PUFs in order to select an appropriate one.
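The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes, as is common in the PUF literature but not stated in the abstract itself, that intrachip and interchip variation are measured as fractional Hamming distances between response bit strings. All function names here are hypothetical and are not taken from the paper.

```python
def hamming_fraction(a, b):
    """Fraction of bit positions at which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def intrachip_variation(reference, remeasurements):
    """Mean distance between one chip's reference response and its
    re-measurements under noise. Lower means more reliable."""
    total = sum(hamming_fraction(reference, r) for r in remeasurements)
    return total / len(remeasurements)


def interchip_variation(responses):
    """Mean pairwise distance between responses of different chips to the
    same challenges. Closer to 0.5 means more unique."""
    n = len(responses)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(hamming_fraction(responses[i], responses[j]) for i, j in pairs)
    return total / len(pairs)


def ones_fraction(response):
    """Fraction of response bits equal to 1. Closer to 0.5 means more random."""
    return sum(response) / len(response)
```

For example, `intrachip_variation([0, 1, 1, 0], [[0, 1, 1, 0], [0, 1, 0, 0]])` averages a distance of 0 and a distance of 0.25, which gives 0.125.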
  • Energy-Efficient Adaptive Pipelined MPSoCs for Multimedia Applications

    Page(s): 663 - 676

    Pipelined MPSoCs provide a high-throughput implementation platform for multimedia applications. They are typically balanced at design time considering worst-case scenarios so that a given throughput can be fulfilled at all times. Such worst-case pipelined MPSoCs lack runtime adaptability and result in inefficient resource utilization and high power/energy consumption under a dynamic workload. In this paper, we propose a novel adaptive architecture and a distributed runtime processor manager to enable runtime adaptation in pipelined MPSoCs. The proposed architecture consists of main processors and auxiliary processors, where a main processor uses a differing number of auxiliary processors depending on runtime workload variations. The runtime processor manager combines the application's execution and knowledge with offline profiling and statistical information to proactively predict the auxiliary processors that should be used by a main processor. The idle auxiliary processors are then deactivated using clock- or power-gating. Each main processor with a pool of auxiliary processors has its own runtime manager, which is independent of the other main processors, enabling a distributed runtime manager. Our experiments with an H.264 video encoder for HD720p resolution at 30 frames/s show that the adaptive pipelined MPSoC consumed up to 29% less energy (computed using processors and caches) than a worst-case pipelined MPSoC, while delivering a minimum of 28.75 frames/s. Our results show that adaptive pipelined MPSoCs can emerge as an energy-efficient implementation platform for advanced multimedia applications.
  • RESI: Register-Embedded Self-Immunity for Reliability Enhancement

    Page(s): 677 - 690

    Technology scaling in the nano-CMOS era has reached a point where coping with the failures produced by soft errors has become one of the key reliability challenges. Because a register file is accessed more frequently than any other architectural component, register file protection is imperative to keep errors from propagating throughout a computing system. Furthermore, negative bias temperature instability (NBTI) has emerged as a major concern due to its negative impact on the lifetime of pMOS devices. Indeed, many of the pMOS transistors most affected by NBTI are in the register files, as these are implemented as SRAM cells, which are particularly vulnerable due to their small structure size. Based on our observation that some register bits are not continuously used to represent a value stored in a register, we present a technique that exploits unused bits to improve the register file immunity against soft errors and mitigate NBTI effects. We show that our technique can reduce, on average, the register file vulnerability against multiple bit upsets by 97% (up to 100%), resulting in a high system fault coverage under various scenarios, while consuming less power and still occupying a similar area footprint compared to protecting the register file against single bit upsets (SBUs) only. The achieved result is 63% better than the state of the art in register file protection. To compare and quantify the effect of our technique, we observe its impact on the processor's temperature using an infrared thermal camera and show that, due to consuming less power per area, our technique also operates at a lower temperature compared to protecting the register file against SBUs only. Finally, we investigate how our technique additionally moderates the stress induced by NBTI in register file SRAM cells.
  • Design Framework to Overcome Aging Degradation of the 16 nm VLSI Technology Circuits

    Page(s): 691 - 703

    Intensive scaling of VLSI circuits is a key factor in gaining outstanding performance. However, this scaling has a huge negative impact on circuit reliability, as it increases the undesired effect of aging degradation in ultradeep submicrometer technologies. Nowadays, the bias temperature instability (BTI) aging process has a major negative impact on VLSI circuit reliability. This paper presents a comprehensive framework that assists in designing VLSI circuits fortified against BTI aging degradation. The framework contains: 1) novel circuit-level techniques that eliminate the effect of BTI (these techniques successfully decrease the power dissipation by 36% and enhance the reliability of VLSI circuits); 2) an evaluation of the reliability of all circuit-level techniques used to eliminate BTI aging degradation for 16 nm CMOS technology; and 3) a comparison of the efficiency of all circuit-level techniques in terms of power consumption and area.
  • Aging-Aware Design of Microprocessor Instruction Pipelines

    Page(s): 704 - 716

    As complementary metal-oxide-semiconductor technologies enter nanometer scales, microprocessors become more vulnerable to transistor aging, mainly due to bias temperature instability and hot carrier injection. These phenomena lead to increasing device delays during the operational lifetime, which result in growing delays of the instruction pipeline stages. However, different stages age at different rates. Hence, a previously delay-balanced pipeline becomes increasingly imbalanced, resulting in a non-optimized design in terms of lifetime [i.e., mean time to failure (MTTF)], frequency, area, and power consumption. In this paper, we propose an aging-aware, MTTF-balanced pipeline design, in which the pipeline stage delays are balanced at the desired lifetime rather than at design time. This can lead to significant MTTF (lifetime) improvements as well as additional performance, area, and power benefits. Our experimental results show that for two different microprocessors, MTTF can be extended by at least 2.3 times while achieving an additional 10% energy improvement with no penalty on delay and area. If the demand for performance is higher than that for a longer MTTF, it is also possible to improve the clock frequency by 2%.
  • Statistical Criticality Computation Using the Circuit Delay

    Page(s): 717 - 727

    The statistical nature of gate delays in current-day technologies necessitates the use of measures such as path criticality and node/edge criticality for timing optimization. Node criticalities are typically computed using the complementary path delay. An alternative approach that computes the criticality using the circuit delay has been recently proposed. In this paper, we discuss in detail the use of circuit delay to compute node criticalities and show that the criticality thus found is not equal to the conventional measure found using complementary path delay. However, there is a monotonic relationship between them, and the two measures can be used interchangeably. We derive new bounds for the global criticality and propose a pruning algorithm based on these bounds to improve the accuracy and speed of computation. The use of this pruning technique results in a significant speedup in criticality computations. We obtain an order of magnitude average speedup for ISCAS benchmarks.
  • Calculation of Generalized Polynomial-Chaos Basis Functions and Gauss Quadrature Rules in Hierarchical Uncertainty Quantification

    Page(s): 728 - 740

    Stochastic spectral methods are efficient techniques for uncertainty quantification. Recently they have shown excellent performance in the statistical analysis of integrated circuits. In stochastic spectral methods, one needs to determine a set of orthonormal polynomials and a proper numerical quadrature rule. The former are used as the basis functions in a generalized polynomial chaos expansion. The latter is used to compute the integrals involved in stochastic spectral methods. Obtaining such information requires knowing the density function of the random input a priori. However, individual system components are often described by surrogate models rather than density functions. In order to apply stochastic spectral methods in hierarchical uncertainty quantification, we first propose to construct physically consistent closed-form density functions by two monotone interpolation schemes. Then, by exploiting the special forms of the obtained density functions, we determine the generalized polynomial-chaos basis functions and the Gauss quadrature rules that are required by a stochastic spectral simulator. The effectiveness of our proposed algorithm is verified by both synthetic and practical circuit examples.
  • E-Beam Lithography Character and Stencil Co-Optimization

    Page(s): 741 - 751

    Electron-beam maskless lithography is being actively explored by the semiconductor industry for chip production in the sub-22 nm regime. Character projection allows a complex pattern to be printed in one e-beam shot, rather than merely a single rectangle or triangle as in variable-shaped beam projection. However, those circuit patterns that do not match any character on the stencil still have to be written by variable-shaped beam projection. We investigate a new problem of character and stencil co-optimization with blank space sharing between characters so as to minimize the total number of shots required for printing a circuit. We exploit the fact that the blank spaces on the sides of a character can be adjusted by moving the pattern to be printed within its projection region, which facilitates blank space sharing and packs more characters into the stencil. Even though the co-optimization problem is shown to be NP-complete, we are able to design an elegant approximation algorithm, CASCO. Experiments confirm that the solutions by CASCO are nearly optimal. Compared to the published state of the art, CASCO reduces the shot count by 1.59 times, while it is also orders of magnitude faster.
  • Time-Scalable Mapping for Circuit-Switched GALS Chip Multiprocessor Platforms

    Page(s): 752 - 762

    We study the problem of mapping concurrent tasks of an application to cores of a chip multiprocessor that utilizes a circuit-switched interconnect and globally asynchronous locally synchronous (GALS) clocking domains. We develop a configurable algorithm that naturally handles a number of practical requirements, such as architectural features of the target platform, core failures, and hardware accelerators, and in addition is scalable to a large number of tasks and cores. Experiments with several real-life applications show that our algorithm outperforms manual mapping, integer linear programming-based mapping after ten days of solver run time, and a recent packet-switched network-on-chip-based task mapper, through which we underscore the unique requirements of task mapping for circuit-switched GALS architectures.
  • Thermal-Aware On-Line Scheduler for 3-D Many-Core Processor Throughput Optimization

    Page(s): 763 - 773

    The 3-D many-core processor (3-D MCP) has become an emerging technology to tackle the power wall problem due to the rapidly increasing number of transistors. However, because of the inherent heat removal limitation, thermal issues must be taken into consideration when maximizing the throughput of a 3-D MCP, which is expressed as a weighted sum of the speeds. Since the temperature of a core strongly depends on its location in the 3-D IC, a proper task allocation can alleviate the thermal problem and improve the throughput. Nevertheless, conventional techniques require computationally intensive thermal simulation, which prohibits their use in online applications. In this paper, we propose an efficient online task allocation and task migration algorithm that attempts to maximize the throughput of a 3-D MCP while simultaneously considering unfinished tasks left from the last scheduling interval and new incoming tasks of the current scheduling interval. The results of our experiments show that our proposed method achieves a 20.82X runtime speedup. These results are comparable to the exhaustive solutions obtained from the optimization-modeling software LINGO. In addition, on average, our throughput results, with and without consideration of unfinished tasks, are only 4.39% and 0.69% worse, respectively, than those of the exhaustive method. In 128 task-to-core allocations, our method takes only 0.951 ms, which is 59.39 times faster than the previous work.
  • Contactless Pre-Bond TSV Test and Diagnosis Using Ring Oscillators and Multiple Voltage Levels

    Page(s): 774 - 785

    Defects in through-silicon vias (TSVs) due to fabrication steps decrease the yield and reliability of 3-D stacked integrated circuits; hence, these defects need to be screened early in the manufacturing flow. Before wafer thinning, TSVs are buried in silicon and cannot be mechanically contacted, which severely limits the test access. Although TSVs become exposed after wafer thinning, probing on them is difficult because of TSV dimensions and the risk of probe-induced damage. To circumvent these problems, we propose a non-invasive method for pre-bond TSV test that does not require TSV probing. We use open TSVs as capacitive loads of their driving gates and measure the propagation delay by means of ring oscillators. Defects in TSVs cause variations in their resistor-capacitor parameters and therefore lead to variations in the propagation delay. By measuring these variations, we can detect resistive open and leakage faults. We exploit different voltage levels to increase the sensitivity of the test and its robustness against random process variations. We provide a method to create a regression model to predict the defect size for a given measured period of the ring oscillator, and a method for accuracy analysis. Results on fault detection effectiveness are presented through HSPICE simulations using realistic models for a 45 nm CMOS technology. The estimated design-for-testability area cost of our method is negligible for realistic dies.
  • A New Fuse Architecture and a New Post-Share Redundancy Scheme for Yield Enhancement in 3-D-Stacked Memories

    Page(s): 786 - 797

    3-D-stacked memory using through-silicon vias (TSVs) has emerged as a good alternative for overcoming the limitations of 2-D memory technology. Among the many issues with 3-D-stacked memory, yield is one of the major challenges for mass production. This paper proposes a new fuse architecture and redundancy scheme to improve the yield of 3-D-stacked memories. The new fuse architecture is based on the observation that unused redundancies in prebond repair cause inefficiency. Therefore, the new fuse architecture provides a way to share redundancies between prebond and postbond repairs. There are two operation modes. One is an enable mode for collecting the used redundancy information. The other is a mask mode for obtaining faulty redundancy information using a short test algorithm. Using the new fuse architecture, a new redundancy scheme called the post-share scheme is developed to achieve optimal yield. The post-share scheme allocates a fixed number of spare rows and columns for each repair, just like other schemes. However, only allocated redundancies are used in prebond repair, while both the redundancies allocated for postbond repair and the unused redundancies from prebond repair can be used for postbond repair. Experimental results show that the post-share redundancy scheme significantly increases the final yield of 3-D-stacked memories and that the increase in area overhead is small.
  • Efficient Variation-Aware Delay Fault Simulation Methodology for Resistive Open and Bridge Defects

    Page(s): 798 - 810

    SPICE offers an accurate method of simulating defect behavior. However, as demonstrated by recent research, it requires long computation time to simulate defect behavior when considering process variation. To the best of our knowledge, there is no efficient variation-aware delay fault simulation methodology for resistive opens and resistive bridges. This paper presents a fast and accurate delay fault simulation methodology for these two defects. It is fast because it speeds up delay fault computation by employing two efficient algorithms. The first algorithm is used to calculate the transient gate output voltage, which is a key variable needed to compute delay faults. It employs a three-step strategy to accelerate the computation of the transient gate output voltage without compromising accuracy. The second algorithm uses the bisection method to efficiently compute the delay fault behavior of a fault site. The proposed methodology (PM) has been incorporated in an open-source SPICE (NGSPICE) with the BSIM4.7 transistor model. The methodology has been validated by comparing results with HSPICE using industrial designs from the IWLS 2005 benchmarks, with realistic fault sites extracted from synthesized designs. Simulations are carried out using a 65-nm gate library (for illustration). When compared with HSPICE, results show that the PM is on average up to 52 times faster with ≤ 4.2% error in accuracy for resistive open defects and 39 times faster with ≤ 5.2% error in accuracy for resistive bridge defects.
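The abstract does not say which quantity the bisection method is applied to. As a purely illustrative sketch, the code below shows generic bisection and applies it to a hypothetical toy model in which gate delay grows linearly with defect resistance. The names `bisect_root`, `toy_delay_ns`, and `CLOCK_LIMIT_NS` and all numeric values are assumptions made for this example, not details of the paper's algorithm.

```python
def bisect_root(f, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with f(x) close to 0, given a sign change on the interval."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if (f_lo > 0) == (f_hi > 0):
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        # Stop on an exact root or once the bracket is narrower than 2 * tol.
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half of the interval that still brackets the sign change.
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def toy_delay_ns(resistance_kohm):
    """Hypothetical delay model: 1 ns base delay plus 0.02 ns per kilo-ohm."""
    return 1.0 + 0.02 * resistance_kohm


CLOCK_LIMIT_NS = 2.0  # assumed timing limit for the toy example

# Smallest resistance at which the toy delay reaches the timing limit.
critical_r = bisect_root(lambda r: toy_delay_ns(r) - CLOCK_LIMIT_NS, 0.0, 1000.0)
```

In this toy model the delay reaches the limit at 50 kilo-ohms, so `critical_r` converges to approximately 50. Each iteration halves the search interval, which is why bisection needs only a few dozen function evaluations where a dense resistance sweep would need many more.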
  • 2014 IEEE compound semiconductor IC symposium

    Page(s): 811
    Freely Available from IEEE
  • IEEE Open Access Publishing

    Page(s): 812
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information

    Page(s): C3
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems information for authors

    Page(s): C4
    Freely Available from IEEE

Aims & Scope

The purpose of this Transactions is to publish papers of interest to individuals in the areas of computer-aided design of integrated circuits and systems.

Meet Our Editors

Editor-in-Chief

VIJAYKRISHNAN NARAYANAN
Pennsylvania State University
Dept. of Computer Science and Engineering
354D IST Building
University Park, PA 16802, USA
vijay@cse.psu.edu