Search Results

You searched for: (("Authors":"marculescu, d") OR "Authors":"diana marculescu")
116 Results returned
  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Distributed reinforcement learning for power limited many-core system performance optimization

    Zhuo Chen ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

    Publication Year: 2015 , Page(s): 1521 - 1526

    IEEE Conference Publications

    As power density emerges as the main constraint for many-core systems, controlling power consumption under the Thermal Design Power (TDP) while maximizing performance becomes increasingly critical. To dynamically save power, Dynamic Voltage Frequency Scaling (DVFS) techniques have proved to be effective and are widely available commercially. In this paper, we present an On-line Distributed Reinforcement Learning (OD-RL) based DVFS control algorithm for many-core system performance improvement under power constraints. At the finer grain, a per-core Reinforcement Learning (RL) method is used to learn the optimal control policy of the Voltage/Frequency (VF) levels in a system model-free manner. At the coarser grain, an efficient global power budget reallocation algorithm is used to maximize the overall performance. The experiments show that, compared to state-of-the-art algorithms, OD-RL achieves 1) up to 98% less budget overshoot, 2) up to 44.3x better throughput per over-the-budget energy and up to 23% higher energy efficiency, and 3) a speedup of two orders of magnitude for systems with hundreds of cores.

    Procrustes: Power Constrained Performance Improvement Using Extended Maximize-then-Swap Algorithm

    Liu, G. ; Park, J. ; Marculescu, D.
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

    Volume: PP , Issue: 99
    DOI: 10.1109/TCAD.2015.2421911
    Publication Year: 2015 , Page(s): 1

    IEEE Early Access Articles

    This paper proposes an efficient algorithm that maximizes performance under power constraints and is applicable in the general context of traditional dynamic voltage/frequency scaling, core heterogeneity, and emerging dynamic microarchitectural adaptation. Performance maximization in these scenarios can essentially be viewed as mapping application threads to appropriate core states that have various power/performance characteristics. Such problems are formulated as a generic 0-1 integer linear program (ILP). The proposed algorithm is an iterative heuristic-based solution. Compared with an optimal solution generated by a commercial ILP solver, the proposed algorithm produces results less than 1% away from optimum on average, with more than two orders of magnitude improvement in runtime. The algorithm can be brought online for hundred-core heterogeneous systems, as it scales to systems comprised of 256 cores with less than one millisecond of overhead in the worst case. The intrinsic history awareness also provides flexibility to control the cost induced by switching voltage/frequency pairs, migrating threads across cores, or turning microarchitectural resources on and off.

    A comprehensive and accurate latency model for Network-on-Chip performance analysis

    Zhiliang Qian ; Da-Cheng Juan ; Bogdan, P. ; Chi-Ying Tsui ; Marculescu, D. ; Marculescu, R.
    Design Automation Conference (ASP-DAC), 2014 19th Asia and South Pacific

    DOI: 10.1109/ASPDAC.2014.6742910
    Publication Year: 2014 , Page(s): 323 - 328

    IEEE Conference Publications

    In this work, we propose a new, accurate, and comprehensive analytical model for Network-on-Chip (NoC) performance analysis. Given the application communication graph, the NoC architecture, and the routing algorithm, the proposed framework analyzes the link dependencies and then determines the ordering of queuing analysis for performance modeling. The channel waiting times in the links are estimated using a generalized G/G/1/K queuing model, which can tackle bursty traffic and dependent arrival times with general service time distributions. The proposed model is general and can be used to analyze various traffic scenarios for NoC platforms with arbitrary buffer and packet lengths. Experimental results on both synthetic and real applications demonstrate the accuracy and scalability of the newly proposed model.

    Energy-efficient VFI-partitioned multicore design using wireless NoC architectures

    Kim, R. ; Guangshuo Liu ; Wettin, P. ; Marculescu, R. ; Marculescu, D. ; Pande, P.P.
    Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2014 International Conference on

    DOI: 10.1145/2656106.2656120
    Publication Year: 2014 , Page(s): 1 - 9

    IEEE Conference Publications

    In recent years, multiple Voltage Frequency Island (VFI)-based designs have increasingly made their way into both commercial and research multicore platforms. On the other hand, the wireless Network-on-Chip (WiNoC) architecture has emerged as an energy-efficient and high-bandwidth communication backbone for massively integrated multicore platforms. It therefore becomes possible to exploit the small-world effects induced by the wireless links of a WiNoC to achieve efficient inter-VFI data exchanges. In this work, we demonstrate that WiNoCs can provide better latency and energy profiles than traditional mesh-like architectures for VFI-partitioned multicore designs. The performance gains and energy efficiency are achieved due to the low-power wireless shortcuts in conjunction with the small-world architecture. Indeed, our experimental results show energy improvements as large as 40% for multithreaded application benchmarks.

    The EDA challenges in the dark silicon era

    Shafique, M. ; Garg, S. ; Henkel, J. ; Marculescu, D.
    Design Automation Conference (DAC), 2014 51st ACM/EDAC/IEEE

    DOI: 10.1145/2593069.2593229
    Publication Year: 2014 , Page(s): 1 - 6

    IEEE Conference Publications

    Technology scaling has resulted in smaller and faster transistors in successive technology generations. However, transistor power consumption no longer scales commensurately with integration density and, consequently, it is projected that in future technology nodes it will only be possible to simultaneously power on a fraction of cores on a multi-core chip in order to stay within the power budget. The part of the chip that is powered off is referred to as dark silicon and brings new challenges as well as opportunities for the design community, particularly in the context of the interaction of dark silicon with thermal, reliability and variability concerns. In this perspectives paper we describe these new challenges and opportunities, and provide preliminary experimental evidence in their support.

    Power-Planning-Aware Soft Error Hardening via Selective Voltage Assignment

    Kai-Chiang Wu ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 22 , Issue: 1
    DOI: 10.1109/TVLSI.2012.2236658
    Publication Year: 2014 , Page(s): 136 - 145

    IEEE Journals & Magazines

    Soft errors, which have been a significant concern in memories, are now a main factor in reliability degradation of logic circuits. This paper presents a power-planning-aware methodology using dual supply voltages for soft error hardening. Given a constraint on power overhead, our proposed framework can minimize the soft error rate (SER) of a circuit via selective voltage assignment. In the 70-nm predictive technology model, circuit SER can be reduced by 23% on top of SER-aware gate resizing. For power-planning awareness, a bi-partitioning technique based on a simplified version of the Fiduccia-Mattheyses (FM) algorithm is presented. The simplified FM-based partitioning refines the result of selective voltage assignment by decreasing the number of connections across voltage islands, while maintaining the SER reduction that has been accomplished.

    SLIC: Statistical learning in chip

    Blanton, R.D. ; Xin Li ; Ken Mai ; Marculescu, D. ; Marculescu, R. ; Paramesh, J. ; Schneider, J. ; Thomas, D.E.
    Integrated Circuits (ISIC), 2014 14th International Symposium on

    DOI: 10.1109/ISICIR.2014.7029574
    Publication Year: 2014 , Page(s): 119 - 123

    IEEE Conference Publications

    Despite best efforts, integrated systems are “born” (manufactured) with a unique “personality” that stems from our inability to precisely fabricate their underlying circuits and to create software a priori for controlling the resulting uncertainty. It is possible to use sophisticated test methods to identify the best-performing systems, but this would result in unacceptable yields and correspondingly high costs. The system personality is further shaped by its environment (e.g., temperature, noise and supply voltage) and usage (i.e., the frequency and type of applications executed), and since both can fluctuate over time, so can the system's personality. Systems also “grow old” and degrade due to various wear-out mechanisms (e.g., negative-bias temperature instability), and unexpectedly due to various early-life failure sources. These “nature and nurture” influences make it extremely difficult to design a system that will operate optimally for all possible personalities. To address this challenge, we propose to develop statistical learning in-chip (SLIC). SLIC is a holistic approach to integrated system design based on continuously learning key personality traits on-line, for self-evolving a system to a state that optimizes performance hierarchically across the circuit, platform, and application levels. SLIC will not only optimize integrated-system performance but also reduce costs through yield enhancement, since systems that would previously have been deemed to have weak personalities (unreliable, faulty, etc.) can now be recovered through the use of SLIC.

    A Low-Cost, Systematic Methodology for Soft Error Robustness of Logic Circuits

    Kai-Chiang Wu ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 21 , Issue: 2
    DOI: 10.1109/TVLSI.2012.2184145
    Publication Year: 2013 , Page(s): 367 - 379
    Cited by:  Papers (4)

    IEEE Journals & Magazines

    Due to current technology scaling trends such as shrinking feature sizes and decreasing supply voltages, circuit reliability is becoming more susceptible to radiation-induced transient faults (soft errors). Soft errors, which have been a great concern in memories, are now a main factor in reliability degradation of logic circuits as well. In this paper, we present a systematic and integrated methodology for circuit robustness to soft errors. The proposed soft error rate (SER) reduction framework, based on redundancy addition and removal (RAR), aims at eliminating those gates with a large contribution to the overall SER. Several metrics and constraints are introduced to guide the RAR-based approach toward SER reduction. Furthermore, we integrate a resizing strategy into our framework as a post-RAR additive SER optimization. The strategy can identify the most critical gates to be upsized and thereby minimize area and power overheads while maintaining a high level of soft error robustness. Experimental results show that the proposed RAR-based framework can achieve up to 70% reduction in output failure probability. On average, about 23% SER reduction is obtained with less than 4% area overhead.

    Impact of manufacturing process variations on performance and thermal characteristics of 3D ICs: Emerging challenges and new solutions

    Da-Cheng Juan ; Garg, S. ; Marculescu, D.
    Circuits and Systems (ISCAS), 2013 IEEE International Symposium on

    DOI: 10.1109/ISCAS.2013.6571900
    Publication Year: 2013 , Page(s): 541 - 544

    IEEE Conference Publications

    Manufacturing process variations have become an important concern in the design of integrated circuits (ICs) in the nanometer era. Process variations result in variability in the performance, power and thermal characteristics of ICs and, as a result, parametric yield loss. In this paper, we examine how process variations impact 3D ICs compared to their planar (or 2D) counterparts. Using analytical models and empirical evaluation, we show that, from both a clock frequency and a thermal perspective, 3D ICs are more severely impacted by process variations than their equivalent 2D implementations. While conventional variability mitigation techniques can be used to increase the resilience of 3D ICs to process variations, there are new opportunities for variability mitigation that are unique to 3D integration. In particular, in a die-to-die 3D bonding process, the decision of which die from one tier are bonded with which die from another can be made post-fabrication, after the bare die have been tested and assigned to frequency and leakage bins. In addition, for symmetric 3D designs, it is possible to decide the die stacking order for each 3D chip post-manufacturing. We show that this flexibility in the bonding process can, in fact, result in significant performance and thermal yield improvement for 3D ICs.

    Learning the optimal operating point for many-core systems with extended range voltage/frequency scaling

    Da-Cheng Juan ; Garg, S. ; Jinpyo Park ; Marculescu, D.
    Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2013 International Conference on

    DOI: 10.1109/CODES-ISSS.2013.6658995
    Publication Year: 2013 , Page(s): 1 - 10
    Cited by:  Papers (1)

    IEEE Conference Publications

    Near-Threshold Computing (NTC) has emerged as a solution that promises to significantly increase the energy efficiency of next-generation multi-core systems. This paper evaluates and analyzes the behavior of dynamic voltage and frequency scaling (DVFS) control algorithms for multi-core systems operating under near-threshold, nominal, or turbo-mode conditions. We adapt the model selection technique from machine learning to learn the relationship between performance and power. The theoretical results show that the resulting models satisfy convexity properties essential to efficiently determining optimal voltage/frequency operating points for minimizing energy consumption under throughput constraints or maximizing throughput under a given power budget. Our experimental results show that, compared with DVFS in the conventional operating range, extended range DVFS control including turbo-mode and near-threshold operation achieves an additional (1) 13.28% average energy reduction under iso-performance conditions, and (2) 7.54% average throughput increase under iso-power conditions.

    Hardware-efficient stereo estimation using a residual-based approach

    Sharma, A.A. ; Neelathalli, K. ; Marculescu, D. ; Nurvitadhi, E.
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

    DOI: 10.1109/ICASSP.2013.6638145
    Publication Year: 2013 , Page(s): 2693 - 2696

    IEEE Conference Publications

    Many promising embedded computer vision applications, such as stereo estimation, rely on inference computation on Markov Random Fields (MRFs). Sequential Tree-Reweighted Message passing (TRW-S) is a superior MRF solving method, which provides better convergence and energy than others (e.g., belief propagation). Since software TRW-S solvers are slow, custom TRW-S hardware has been proposed to improve execution efficiency. This paper proposes hardware mechanisms to further optimize TRW-S hardware efficiency, by tracking differences in input message values (residues) and skipping computation when values no longer change (residue is zero). Evaluations of our hardware mechanisms using Middlebury benchmark show 1.6x to 6x potential reduction in computation (depending on design parameters) while increasing energy by only 0.4% to 4.8%.

    Dynamic behavior of cell signaling networks - Model design and analysis automation

    Miskov-Zivanov, N. ; Marculescu, D. ; Faeder, J.R.
    Design Automation Conference (DAC), 2013 50th ACM / EDAC / IEEE

    Publication Year: 2013 , Page(s): 1 - 6

    IEEE Conference Publications

    Recent work has presented logical models and shown the benefits of applying logical approaches to studying the dynamics of biological networks. In this work, we develop a methodology for automating the design of such models by utilizing methods and algorithms from the field of electronic design automation. We anticipate that automated discrete model development will greatly improve the efficiency of qualitative analysis of biological networks.

    “Scaling” the impact of EDA education: Preliminary findings from the CCC workshop series on extreme scale design automation

    Bahar, I. ; Jones, A.K. ; Katkoori, S. ; Madden, P.H. ; Marculescu, D. ; Markov, I.L.
    Microelectronic Systems Education (MSE), 2013 IEEE International Conference on

    DOI: 10.1109/MSE.2013.6566706
    Publication Year: 2013 , Page(s): 64 - 67

    IEEE Conference Publications

    The breakdown of Dennard scaling implies radical changes in the design, integration, manufacturing and deployment of new electronic systems. These changes, along with labor-force and macro-economic trends, undermine the status quo in the semiconductor and electronic design automation (EDA) fields. Of particular concern is a fairly static and aging workforce and a decline in new students interested in these fields. Recognizing the dramatic changes afoot, a series of Computing Community Consortium (CCC) sponsored workshops has been organized to identify key steps to take to secure the future growth of the electronics industry. This paper shares some preliminary findings from the first of these workshops, emphasizing challenges in finding and preparing the next generation of electronic design professionals.

    Dynamic thread mapping for high-performance, power-efficient heterogeneous many-core systems

    Guangshuo Liu ; Jinpyo Park ; Marculescu, D.
    Computer Design (ICCD), 2013 IEEE 31st International Conference on

    DOI: 10.1109/ICCD.2013.6657025
    Publication Year: 2013 , Page(s): 54 - 61
    Cited by:  Papers (1)

    IEEE Conference Publications

    This paper addresses the problem of dynamic thread mapping in heterogeneous many-core systems via an efficient algorithm that maximizes performance under power constraints. Heterogeneous many-core systems are composed of multiple core types with different power-performance characteristics. As is well documented in the literature, the generic mapping problem is an NP-complete problem that can be formulated as a 0-1 integer linear program and is therefore prohibitively expensive to solve optimally in an online scenario. However, in real applications, thread mapping decisions need to be responsive to workload phase changes. This paper proposes an iterative approach bounding the runtime as O(n²/m) for mapping multi-threaded applications on n cores comprising m core types. Compared with an optimal solution, the proposed algorithm produces results less than 0.6% away from optimum on average, with two orders of magnitude improvement in runtime. Results show that performance improvement can reach 16% under iso-power constraints compared to a random mapping. The algorithm can be brought online for hundred-core heterogeneous systems, as it scales to systems comprised of 256 cores with less than one millisecond of overhead.

    HaDeS: Architectural synthesis for heterogeneous dark silicon chip multi-processors

    Turakhia, Y. ; Raghunathan, B. ; Garg, S. ; Marculescu, D.
    Design Automation Conference (DAC), 2013 50th ACM / EDAC / IEEE

    Publication Year: 2013 , Page(s): 1 - 7
    Cited by:  Papers (4)

    IEEE Conference Publications

    In this paper, we propose an efficient iterative optimization based approach for architectural synthesis of dark silicon heterogeneous chip multi-processors (CMPs). The goal is to determine the optimal number of cores of each type to provision the CMP with, such that the area and power budgets are met and the application performance is maximized. We consider general-purpose multi-threaded applications with a varying degree of parallelism (DOP) that can be set at run-time, and propose an accurate analytical model to predict the execution time of such applications on heterogeneous CMPs. Our experimental results illustrate that the synthesized heterogeneous dark silicon CMPs provide between 19% and 60% performance improvements over conventional homogeneous designs for variable and fixed DOP scenarios, respectively.

    Mitigating the Impact of Process Variation on the Performance of 3-D Integrated Circuits

    Garg, S. ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 21 , Issue: 10
    DOI: 10.1109/TVLSI.2012.2226762
    Publication Year: 2013 , Page(s): 1903 - 1914

    IEEE Journals & Magazines

    Three-dimensional die-stacking architectures have been proposed as a promising solution to the increasing interconnect delay that is observed in scaled technologies. Although prior research has extensively evaluated the performance advantage of moving from a 2-D to a 3-D design style, the impact of process parameter variations on 3-D designs has not been studied in detail. In this paper, we attempt to bridge this gap by proposing a variability-aware design framework for fully synchronous (FS) and multiple clock-domain (MCD) 3-D systems. To mitigate the impact of process variations on 3-D designs, we propose a variability-aware 3-D integration strategy for MCD 3-D systems that maximizes the probability of the design meeting specified system performance constraints. The proposed optimization strategy is shown to significantly outperform the FS and MCD 3-D implementations that are conventionally assembled. For example, the MCD designs assembled with the proposed integration strategy provide, on average, 44% and 16.33% higher absolute yield than the FS and conventional MCD designs, respectively, at the 50% yield point of the conventional MCD designs.

    System-Level Leakage Variability Mitigation for MPSoC Platforms Using Body-Bias Islands

    Garg, Siddharth ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 20 , Issue: 12
    DOI: 10.1109/TVLSI.2011.2171512
    Publication Year: 2012 , Page(s): 2289 - 2301
    Cited by:  Papers (3)

    IEEE Journals & Magazines

    Adaptive body biasing (ABB) is a widely used technique to mitigate the increasing impact of manufacturing process variations on leakage power dissipation. The efficacy of the ABB technique can be improved by partitioning a design into a number of “body-bias islands,” each with its individual body-bias voltage. In this paper, we propose a system-level leakage variability mitigation technique to partition a multiprocessor system into body-bias islands at the processing element (PE) granularity at design time, and to optimally assign body-bias voltages to each island post-fabrication. As opposed to prior gate- and circuit-level partitioning techniques that constrain the global clock frequency of the system, we allow each island to run at a different speed and constrain only the relevant system performance metrics - in our case the execution deadlines. Experimental results show the efficacy of the proposed methodology; we demonstrate up to 40% and 60% reduction in the mean and standard deviation of leakage power dissipation, respectively, compared to a baseline system without ABB. Furthermore, the proposed design-time partitioning is, on average, 38× faster than a previously proposed Monte Carlo-based technique, while providing similar reductions in leakage power dissipation.

    Exploiting Process Variability in Voltage/Frequency Control

    Herbert, S. ; Garg, Siddharth ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 20 , Issue: 8
    DOI: 10.1109/TVLSI.2011.2160001
    Publication Year: 2012 , Page(s): 1392 - 1404
    Cited by:  Papers (9)

    IEEE Journals & Magazines

    Fine-grained dynamic voltage/frequency scaling (DVFS) is an important tool in managing the balance between power and performance in chip-multiprocessors. Although manufacturing process variations are giving rise to significant core-to-core variations in power and performance, traditional DVFS controllers are unaware of these variations. Exploiting the different power profiles of the cores can significantly improve energy efficiency. Process variations do not significantly affect dynamic power, so less-leaky processing units are more energy-efficient than their leakier counterparts at a given supply voltage and frequency. Taking advantage of this observation, three existing DVFS control algorithms are modified to shift work from inefficient, leaky processing units to efficient, less leaky ones, maintaining performance while reducing total power consumption. This work-shifting is carried out both between dies in a given speed bin and between voltage/frequency islands on a given die. The gains enabled by incorporating variability-awareness into the three DVFS algorithms are demonstrated on both multithreaded and multiprogrammed workloads. For a baseline 16-core design with per-core voltage/frequency islands (VFIs) and a 4×4 mesh on-chip network, the aggregate power per squared throughput (power/throughput² or P/T²) over all fabricated dies is reduced by 9.2%, 5.7%, and 7.7% for the three controllers. Chip multiprocessor designs using other VFI granularities and network topologies are also examined.

  • Freely Available from IEEE

    Guest Editorial Special Section on PAR-CAD: Parallel CAD Algorithms and CAD for Parallel Architectures/Systems

    Marculescu, D. ; Li, P.
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

    Volume: 31 , Issue: 1
    DOI: 10.1109/TCAD.2011.2175038
    Publication Year: 2012 , Page(s): 7 - 8

    IEEE Journals & Magazines

    Mitigating lifetime underestimation: A system-level approach considering temperature variations and correlations between failure mechanisms

    Kai-Chiang Wu ; Ming-Chao Lee ; Marculescu, D. ; Shih-Chieh Chang
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012

    DOI: 10.1109/DATE.2012.6176687
    Publication Year: 2012 , Page(s): 1269 - 1274

    IEEE Conference Publications

    Lifetime (long-term) reliability has been a main design challenge as technology scaling continues. Time-dependent dielectric breakdown (TDDB), negative bias temperature instability (NBTI), and electromigration (EM) are some of the critical failure mechanisms affecting lifetime reliability. Due to the correlation between different failure mechanisms and their significant dependence on the operating temperature, existing models assuming a constant failure rate and additive impact of failure mechanisms will underestimate the lifetime of a system, usually measured by mean-time-to-failure (MTTF). In this paper, we propose a new methodology that evaluates system lifetime in MTTF and relies on Monte-Carlo simulation for verifying results. Temperature variations and the correlation between failure mechanisms are considered so as to mitigate lifetime underestimation. The proposed methodology, when applied to an Alpha 21264 processor, provides a less pessimistic lifetime evaluation than the existing models based on the sum of failure rates. Our experimental results also indicate that, by considering the correlation of TDDB and NBTI, the lifetime of a system is likely not dominated by TDDB or NBTI, but by EM or other failure mechanisms.

    Statistical thermal modeling and optimization considering leakage power variations

    Da-Cheng Juan ; Yi-Lin Chuang ; Marculescu, D. ; Yao-Wen Chang
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012

    DOI: 10.1109/DATE.2012.6176544
    Publication Year: 2012 , Page(s): 605 - 610
    Cited by:  Papers (3)

    IEEE Conference Publications

    Unaddressed thermal issues can seriously hinder the development of reliable and low power systems. In this paper, we propose a statistical approach for analyzing thermal behavior under leakage power variations stemming from the manufacturing process. Based on the proposed models, we develop floorplanning techniques targeting thermal optimization. The experimental results show that peak temperature is reduced by up to 8.8°C, while thermal-induced leakage power and maximum thermal variance are reduced by 13% and 17%, respectively, with no additional area overhead compared with the best performance-driven optimized design.

    A learning-based autoregressive model for fast transient thermal analysis of chip-multiprocessors

    Da-Cheng Juan ; Huapeng Zhou ; Marculescu, D. ; Xin Li
    Design Automation Conference (ASP-DAC), 2012 17th Asia and South Pacific

    DOI: 10.1109/ASPDAC.2012.6165027
    Publication Year: 2012 , Page(s): 597 - 602
    Cited by:  Papers (6)

    IEEE Conference Publications

    Thermal issues have become critical roadblocks for the development of advanced chip-multiprocessors (CMPs). In this paper, we introduce a new angle on transient thermal analysis: predicting the thermal profile instead of calculating it. We develop a systematic framework that can learn different thermal profiles of a CMP by using an autoregressive (AR) model. The proposed AR model can serve as a fast alternative for predicting the transient temperature of a CMP with reasonably good accuracy. Experimental results show that the proposed AR model can achieve approximately 113X speed-up over existing thermal profile estimation methods, while introducing an error of only 0.8°C on average.
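
    The idea of predicting a temperature trace with an AR model, rather than solving the thermal equations, can be sketched as below. This is a minimal illustration, not the paper's framework: it assumes a second-order model with no intercept, fitted by ordinary least squares to a single series of temperature deviations from ambient. The function names and the model order are hypothetical.

```python
def fit_ar2(series):
    """Least-squares fit of T[k] = a1*T[k-1] + a2*T[k-2] (no intercept)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(2, len(series)):
        x1, x2, y = series[k - 1], series[k - 2], series[k]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] [a1, a2] = [r1, r2].
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("series does not excite both AR terms")
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    return a1, a2


def predict_next(history, a1, a2):
    """One-step-ahead prediction from the two most recent samples."""
    return a1 * history[-1] + a2 * history[-2]


if __name__ == "__main__":
    # Synthetic cooling trace (deviation from ambient, in degrees C).
    trace = [10.0, 9.0]
    for _ in range(28):
        trace.append(1.5 * trace[-1] - 0.56 * trace[-2])
    a1, a2 = fit_ar2(trace)
    print("fitted coefficients:", a1, a2)
    print("next sample:", predict_next(trace, a1, a2))
```

    Once the coefficients are fitted, each new prediction costs two multiplications and one addition, which is where the speed advantage of this kind of model over a full thermal simulation comes from.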

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Special session 4A: New topics parametric yield and reliability of 3D integrated circuits: New challenges and solutions

    Garg, Siddharth ; Marculescu, D.
    VLSI Test Symposium (VTS), 2011 IEEE 29th

    DOI: 10.1109/VTS.2011.5783764
    Publication Year: 2011 , Page(s): 99

    IEEE Conference Publications

    3D integration is a promising new technology that offers numerous potential benefits including reduced wire length, high tier-to-tier bandwidth and low latency, and the possibility for heterogeneous integration of disparate technologies. As a result, 3D integrated circuits (IC) are being aggressively investigated as a potential replacement for conventional planar ICs in both academia and industry.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Aging-aware timing analysis and optimization considering path sensitization

    Kai-Chiang Wu ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011

    DOI: 10.1109/DATE.2011.5763249
    Publication Year: 2011 , Page(s): 1 - 6
    Cited by:  Papers (3)

    IEEE Conference Publications

    Device aging, which causes significant loss in circuit performance and lifetime, has been a main factor in reliability degradation of nanoscale designs. Aggressive technology scaling trends, such as thinner gate oxide without proportional down-scaling of supply voltage, necessitate an aging-aware analysis and optimization flow during early design stages. Since only a small portion of critical and near-critical paths can be sensitized and may determine the circuit delay under aging, path sensitization should also be explicitly addressed for more accurate and efficient optimization. In this paper, we first investigate the impact of path sensitization on aging-aware timing analysis and then present a novel framework for aging-aware timing optimization considering path sensitization. By extracting and manipulating critical sub-circuits accounting for the effective circuit delay, our proposed framework can reduce aging-induced performance degradation to only 1.21% or one-seventh of the original performance loss with less than 2% area overhead.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Statistical thermal evaluation and mitigation techniques for 3D Chip-Multiprocessors in the presence of process variations

    Da-Cheng Juan ; Garg, Siddharth ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011

    DOI: 10.1109/DATE.2011.5763067
    Publication Year: 2011 , Page(s): 1 - 6
    Cited by:  Papers (4)

    IEEE Conference Publications

    Thermal issues have become critical roadblocks for achieving highly reliable three-dimensional (3D) integrated circuits. This paper performs both the evaluation and mitigation of the impact of leakage power variations on the temperature profile of 3D Chip-Multiprocessors (CMPs). Furthermore, this paper provides a learning-based model to predict the maximum temperature, based on which a simple, yet effective tier-stacking algorithm to mitigate the impact of variations on the temperature profile of 3D CMPs is proposed. Results show that (1) the proposed prediction model achieves more than 98% accuracy, (2) a 4-tier 3D implementation can be more than 40°C hotter than its 2D counterpart and (3) the proposed tier-stacking algorithm significantly improves the thermal yield from 44.4% to 81.1% for a 3D CMP.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Regulatory network analysis acceleration with reconfigurable hardware

    Miskov-Zivanov, Natasa ; Bresticker, Andrew ; Krishnaswamy, Deepa ; Venkatakrishnan, Sreesan ; Kashinkunti, Prashant ; Marculescu, D. ; Faeder, J.R.
    Engineering in Medicine and Biology Society, EMBC, 2011 Annual International Conference of the IEEE

    DOI: 10.1109/IEMBS.2011.6089916
    Publication Year: 2011 , Page(s): 149 - 152

    IEEE Conference Publications

    In medical research it is of great importance to be able to quickly obtain answers to inquiries about system response to different stimuli. Modeling the dynamics of biological regulatory networks is a promising approach to achieve this goal, but existing modeling approaches suffer from complexity issues and become inefficient with large networks. In order to improve the efficiency, we propose the implementation of models of regulatory networks in hardware, which allows for highly parallel simulation of these networks. We find that our FPGA implementation of an example model of peripheral naïve T cell differentiation provides five orders of magnitude speedup when compared to software simulation.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Analysis and mitigation of NBTI-induced performance degradation for power-gated circuits

    Kai-Chiang Wu ; Marculescu, D. ; Ming-Chao Lee ; Shih-Chieh Chang
    Low Power Electronics and Design (ISLPED) 2011 International Symposium on

    DOI: 10.1109/ISLPED.2011.5993626
    Publication Year: 2011 , Page(s): 139 - 144
    Cited by:  Papers (4)

    IEEE Conference Publications

    Device aging, which causes significant loss in circuit performance and lifetime, has been a main factor in reliability degradation of nanoscale designs. Aggressive technology scaling trends, such as thinner gate oxide without proportional downscaling of supply voltage, necessitate an aging-aware analysis and optimization flow in the early design stages. Since PMOS sleep transistors in power-gated circuits suffer from static NBTI during active mode and age very rapidly, the aging of power-gated circuits should be explicitly addressed. In this paper, for power-gated circuits, we present a novel methodology for analyzing and mitigating NBTI-induced performance degradation. Aging effects on both logic networks and sleep transistors are jointly considered for accurate analysis. By introducing 25% redundant sleep transistors with reverse body bias applied, the proposed methodology can significantly mitigate the long-term performance degradation and thus extend the circuit lifetime by 3X.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    PRICE: Power reduction by placement and clock-network co-synthesis for pulsed-latch designs

    Yi-Lin Chuang ; Hong-Ting Lin ; Tsung-Yi Ho ; Yao-Wen Chang ; Marculescu, D.
    Computer-Aided Design (ICCAD), 2011 IEEE/ACM International Conference on

    DOI: 10.1109/ICCAD.2011.6105310
    Publication Year: 2011 , Page(s): 85 - 90
    Cited by:  Papers (1)

    IEEE Conference Publications

    Pulsed latches have emerged as a popular technique to reduce the power consumption and delay for clock networks. However, the current physical synthesis flow for pulsed latches still performs circuit placement and clock-network synthesis separately, which limits achievable power reduction. This paper presents the first work in the literature to perform placement and clock-network co-synthesis for pulsed-latch designs. With the interplay between placement and clock-network synthesis, the clock-network power and timing can be optimized simultaneously. Novel progressive network forces are introduced to globally guide the placer for iterative improvements, while the clock-network synthesizer makes use of updated latch locations to optimize power and timing locally. Experimental results show that our framework can substantially minimize power consumption and improve timing slacks, compared to existing synthesis flows.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Clock skew scheduling for soft-error-tolerant sequential circuits

    Kai-Chiang Wu ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2010

    DOI: 10.1109/DATE.2010.5456956
    Publication Year: 2010 , Page(s): 717 - 722
    Cited by:  Papers (1)

    IEEE Conference Publications

    Soft errors have been a critical reliability concern in nanoscale integrated circuits, especially in sequential circuits where a latched error can be propagated for multiple clock cycles and affect more than one output, more than once. This paper presents an analytical methodology for enhancing the soft error tolerance of sequential circuits. By using clock skew scheduling, we propose to minimize the probability of unwanted transient pulses being latched and also prevent latched errors from propagating through sequential circuits repeatedly. The overall methodology is formulated as a piecewise linear programming problem whose optimal solution can be found by existing mixed integer linear programming solvers. Experiments reveal that 30-40% reduction in the soft error rate for a wide range of benchmarks can be achieved.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Process variation aware performance modeling and dynamic power management for multi-core systems

    Garg, Siddharth ; Marculescu, D. ; Herbert, S.X.
    Computer-Aided Design (ICCAD), 2010 IEEE/ACM International Conference on

    DOI: 10.1109/ICCAD.2010.5654293
    Publication Year: 2010 , Page(s): 89 - 92

    IEEE Conference Publications

    Emerging multi-core platforms are increasingly impacted by the manufacturing process variations that introduce core-to-core and chip-to-chip differences in their power and performance characteristics. This can result in unacceptable yield loss since a large fraction of manufactured parts may not meet the design specifications. In this work, we present some promising, recently proposed solutions to mitigate the impact of process variations on multi-core platforms that deal with variability aware performance modeling, and static and dynamic power reduction. These solutions demonstrate the significant benefits that can be reaped if variability information is considered at the micro-architecture and system level design abstractions.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Formal modeling and reasoning for reliability analysis

    Miskov-Zivanov, Natasa ; Marculescu, D.
    Design Automation Conference (DAC), 2010 47th ACM/IEEE

    Publication Year: 2010 , Page(s): 531 - 536

    IEEE Conference Publications

    Transient faults in logic circuits are an important reliability concern for future technology nodes. In order to guide the design process and the choice of circuit optimization techniques, it is important to accurately and efficiently model transient faults and their propagation through logic circuits, while evaluating the error rates resulting from transient faults. To this end, we give an overview of the existing formal methods for modeling and reasoning about transient faults. We describe the main aspects of transient fault propagation and the advantages and drawbacks of different approaches to modeling them.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Custom Feedback control: Enabling truly scalable on-chip power management for MPSoCs

    Garg, Siddharth ; Marculescu, D. ; Marculescu, R.
    Low-Power Electronics and Design (ISLPED), 2010 ACM/IEEE International Symposium on

    Publication Year: 2010 , Page(s): 425 - 430
    Cited by:  Papers (2)

    IEEE Conference Publications

    In this paper, we propose Custom Feedback Control, a new dynamic voltage and frequency control architecture for MPSoC designs that bridges the gap between the two extreme points on the performance versus implementation cost tradeoff curve, i.e., fully-centralized and fully-decentralized control architectures. We outline a methodology to efficiently explore the vast design space of Custom Feedback Control architectures, enabling designers to synthesize controllers that meet both the performance and implementation cost criteria. Our experimental results on an MPSoC platform running a video-encoding application demonstrate that, for the same energy dissipation, Custom Feedback Control can achieve within 5% of the performance of a fully-centralized controller with only 17% of the implementation cost. In contrast, the performance of a fully-decentralized controller can be up to 2.5X worse than that of the fully-centralized controller.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Multiple Transient Faults in Combinational and Sequential Circuits: A Systematic Approach

    Miskov-Zivanov, Natasa ; Marculescu, D.
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

    Volume: 29 , Issue: 10
    DOI: 10.1109/TCAD.2010.2061131
    Publication Year: 2010 , Page(s): 1614 - 1627
    Cited by:  Papers (20)

    IEEE Journals & Magazines

    Transient faults in logic circuits are becoming an important reliability concern for future technology nodes. Radiation-induced faults have received significant attention in recent years, while multiple transients originating from a single radiation hit are predicted to occur more often. Furthermore, some effects, like reconvergent fanout-induced glitches, are more pronounced in the case of multiple faults. Therefore, to guide the design process and the choice of circuit optimization techniques, it is important to model multiple faults and their propagation through logic circuits, while evaluating the changes in error rates resulting from multiple simultaneous faults. In this paper, we show how output error probabilities change with increasing number of simultaneous faults and we also analyze the impact of multiple errors in state flip-flops, during the cycles following the cycle when fault(s) occurred. The results obtained using the proposed framework show that output error probability resulting from multiple-event transient or multiple-bit upsets can vary across different outputs and different circuits by several orders of magnitude. The results also show that the impact of different masking factors varies across circuits and this information can be valuable for customizing protection techniques.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Variation-aware dynamic voltage/frequency scaling

    Herbert, S. ; Marculescu, D.
    High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on

    DOI: 10.1109/HPCA.2009.4798265
    Publication Year: 2009 , Page(s): 301 - 312
    Cited by:  Papers (21)  |  Patents (2)

    IEEE Conference Publications

    Fine-grained dynamic voltage/frequency scaling (DVFS) is an important tool in managing the balance between power and performance in chip-multiprocessors. Although manufacturing process variations are giving rise to significant core-to-core variations in power and performance, traditional DVFS controllers are unaware of these variations. Exploiting the different power/performance profiles of the cores can significantly improve energy-efficiency. Two hardware DVFS control algorithms are considered and the gains enabled by incorporating variability-awareness are demonstrated on multithreaded commercial workloads. For a design with per-core voltage/frequency islands (VFIs), the mean power per unit throughput for a simple threshold-based controller is reduced by 8.0% when variability-awareness is added. A complex greedy-search controller sees an even larger reduction of 15.4%. The variability-aware versions of the two controllers achieve power/throughput reductions of 2.1% and 9.9% relative to LinOpt, a recent software variability-aware DVFS scheme. Designs which apply DVFS at a coarser granularity are also considered, and the variability-aware schemes maintain significant improvement over the variability-unaware ones. With four cores per VFI, variability-awareness reduces power/throughput by 6.5% and 9.2% for the threshold-based and greedy-search controllers, respectively.
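
    A threshold-based DVFS controller of the general kind mentioned above can be sketched as follows. The abstract does not give the controller's rules, so everything here is an assumption for illustration: the voltage/frequency table, the utilization metric, the threshold values, and the idea of expressing variability-awareness by passing different thresholds for each core.

```python
# Hypothetical (voltage in V, frequency in GHz) operating points, lowest first.
VF_LEVELS = [(0.8, 1.0), (0.9, 1.5), (1.0, 2.0), (1.1, 2.5)]


def next_level(level, utilization, low=0.3, high=0.8):
    """Step one VF level up or down when utilization crosses a threshold."""
    if utilization > high and level < len(VF_LEVELS) - 1:
        return level + 1  # core is busy: raise voltage and frequency
    if utilization < low and level > 0:
        return level - 1  # core is mostly idle: save power
    return level          # inside the dead band, or already at a limit


if __name__ == "__main__":
    # A core found to be leakier than average might use a higher 'high'
    # threshold, so that it is sped up less eagerly (hypothetical policy).
    level = 1
    for util in [0.9, 0.95, 0.5, 0.1, 0.1]:
        level = next_level(level, util, low=0.3, high=0.85)
        print(util, "->", VF_LEVELS[level])
```

    The dead band between the two thresholds keeps the controller from oscillating between adjacent levels when utilization hovers near a single cut-off.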

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Power Management of Voltage/Frequency Island-Based Systems Using Hardware-Based Methods

    Choudhary, P. ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 17 , Issue: 3
    DOI: 10.1109/TVLSI.2008.2005309
    Publication Year: 2009 , Page(s): 427 - 438
    Cited by:  Papers (7)

    IEEE Journals & Magazines

    Shrinking technology nodes combined with the need for higher clock speeds have made it increasingly difficult to distribute a single global clock across a chip while meeting the power requirements of the design. Globally asynchronous locally synchronous (GALS) design style can help achieve low power consumption and modularity of a design while greatly reducing the number of global interconnects. Such multiple clock domain architectures can benefit from having frequency/voltage values assigned to each domain based on workload requirements. The work presented in this paper proposes a new hardware-based approach to dynamically change the frequencies and potentially voltages of a voltage-frequency island (VFI) system driven by a dynamic workload. This technique tries to change the frequency of a synchronous island such that it will have efficient power utilization while satisfying performance constraints. In recent years, there have been major developments, both in industry and academia, in the field of multiprocessor systems. Such multiprocessor systems are very good candidates for VFI design style implementation, where one or more processors can be part of a single VFI. To demonstrate the feasibility of our proposed method, we have implemented a multiprocessor system for a field-programmable gate array (FPGA) platform that uses independently generated clocks for each processor. The results from the FPGA platform confirm the claim that the power consumption of a system can potentially be reduced while maintaining the performance of many applications. Our work concentrates primarily on embedded systems, but the idea can be explored for general-purpose computing as well.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    3D-GCP: An analytical model for the impact of process variations on the critical path delay distribution of 3D ICs

    Garg, Siddharth ; Marculescu, D.
    Quality of Electronic Design, 2009. ISQED 2009. Quality Electronic Design

    DOI: 10.1109/ISQED.2009.4810285
    Publication Year: 2009 , Page(s): 147 - 155
    Cited by:  Papers (13)

    IEEE Conference Publications

    3D Integrated Circuits (ICs) have been recently proposed as a solution to the increasing wire delay concerns in scaled technologies. At the same time, technology scaling leads to increasing variability in manufacturing process parameters, making it imperative to quantify the impact of these variations on performance. In this work, we take, to the best of our knowledge, the first step towards formally modeling the impact of process variations on the clock frequency of fully-synchronous (FS) 3D ICs. The proposed analytical models demonstrate theoretically and experimentally that 3D designs behave very differently under the impact of process variations as compared to equivalent 2D designs. In particular, for the same number of critical paths, we show that a 3D design is always less likely to meet a pre-defined frequency target compared to its 2D counterpart. Furthermore, as opposed to models for 2D ICs, the 3D models need to accurately account for not only within-die (WID) critical paths, i.e., paths that lie entirely within one of the die layers, but also die-to-die (D2D) critical paths that use through-silicon vias (TSVs) to span across multiple dies in the 3D stack. Finally, we show, theoretically and experimentally, that the mapping of critical paths to the die layers of a 3D IC can also affect the timing yield of a design, while the mapping issue does not arise in the 2D case since there is only a single die layer in a 2D IC. The accuracy of the proposed models is experimentally verified and found to be in excellent agreement with detailed SPICE and gate-level Monte Carlo (MC) simulations.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    System-level process variability analysis and mitigation for 3D MPSoCs

    Garg, Siddharth ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE '09.

    DOI: 10.1109/DATE.2009.5090739
    Publication Year: 2009 , Page(s): 604 - 609
    Cited by:  Papers (5)

    IEEE Conference Publications

    While prior research has extensively evaluated the performance advantage of moving from a 2D to a 3D design style, the impact of process parameter variations on 3D designs has been largely ignored. In this paper, we attempt to bridge this gap by proposing a variability-aware design framework for fully-synchronous (FS) and multiple clock-domain (MCD) 3D systems. First, we develop analytical system-level models of the impact of process variations on the performance of FS 3D designs. The accuracy of the model is demonstrated by comparing against transistor-level Monte Carlo simulations in SPICE: we observe a maximum error of only 0.7% (average 0.31% error) in the mean of the maximum critical path delay distribution. Second, to mitigate the impact of process variations on 3D designs, we propose a variability-aware 3D integration strategy for MCD 3D systems that maximizes the probability of the design meeting specified system performance constraints. The proposed optimization strategy is shown to significantly outperform FS and MCD 3D implementations that are conventionally assembled; for example, the MCD designs assembled with the proposed integration strategy provide, on average, 44% and 16.33% higher absolute yield than the FS and conventional MCD designs respectively, at the 50% yield point of the conventional MCD designs.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Mitigating the Impact of Variability on Chip-Multiprocessor Power and Performance

    Herbert, S. ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 17 , Issue: 10
    DOI: 10.1109/TVLSI.2009.2020394
    Publication Year: 2009 , Page(s): 1520 - 1533
    Cited by:  Papers (5)

    IEEE Journals & Magazines

    Chip-multiprocessors (CMPs) have emerged as a popular means of exploiting growing transistor budgets. However, the same technology scaling that increases the number of transistors on a single die also creates greater variability in their key power- and performance-determining characteristics. As the number of cores and amount of memory per die increase, individual core and cache tiles will become small enough that traditional sources of intra-die power and performance variations will result in tile-to-tile (T2T) variations. We start from low-level models of the phenomena involved and create models for how systematic within-die process variations, random within-die process variations, and thermal variations manifest themselves as T2T variations. Current commercial CMP designs are partitioned into fine-grained frequency islands (FIs) to allow per-core control of clock frequencies. We use our models to evaluate leveraging this partitioning to address T2T variations. Exploiting the FI partitioning improves performance by an average of 8.4% relative to the fully-synchronous baseline when both process and thermal variability are addressed simultaneously, highlighting the importance of an integrated approach. The FI design can also achieve performance 7.1% higher than the baseline at fixed power or draw 24.2% less power at equal performance.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Technology-driven limits on DVFS controllability of multiple voltage-frequency island designs: A system-level perspective

    Garg, Siddharth ; Marculescu, D. ; Marculescu, R. ; Ogras, U.
    Design Automation Conference, 2009. DAC '09. 46th ACM/IEEE

    Publication Year: 2009 , Page(s): 818 - 821
    Cited by:  Papers (2)

    IEEE Conference Publications

    In this paper, we consider the case of network-on-chip (NoC) based multiple processor systems-on-chip (MPSoCs) implemented using multiple voltage and frequency islands (VFIs) that rely on fine-grained dynamic voltage and frequency scaling (DVFS) for run-time control of the system power dissipation. Specifically, we present a framework to compute theoretical bounds on the performance of DVFS controllers for such systems under the impact of three important technology-driven constraints: reliability- and temperature-driven upper limits on the maximum supply voltage; inductive-noise-driven constraints on the maximum rate of change of voltage/frequency; and increasing manufacturing process variations. Our experimental results show that, for the benchmarks considered, any DVFS control algorithm will lose up to 87% performance, measured in terms of the number of steps required to reach a reference steady state, in the presence of maximum frequency and maximum frequency increment constraints. In addition, increasing process variations can lead to up to 60% of fabricated chips being unable to meet the specified DVFS control specifications, irrespective of the DVFS algorithm used.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Design and Management of Voltage-Frequency Island Partitioned Networks-on-Chip

    Ogras, U.Y. ; Marculescu, R. ; Marculescu, D. ; Eun Gu Jung
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 17 , Issue: 3
    DOI: 10.1109/TVLSI.2008.2011229
    Publication Year: 2009 , Page(s): 330 - 341
    Cited by:  Papers (21)

    IEEE Journals & Magazines

    The design of many-core systems-on-chip (SoCs) has become increasingly challenging due to high levels of integration, excessive energy consumption and clock distribution problems. To deal with these issues, we consider network-on-chip (NoC) architectures partitioned into several voltage-frequency islands (VFIs) and propose a design methodology for runtime energy management. The proposed approach minimizes the energy consumption subject to performance constraints. Then, we present efficient techniques for on-the-fly workload monitoring and management to ensure that the system can cope with variability in the workload and various technology-related parameters. Simulation results demonstrate the effectiveness of our approach in reducing the overall system energy consumption for a real video application. Finally, the results and functional correctness are validated using a field-programmable gate array (FPGA) prototype for an NoC with multiple VFIs.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    Joint logic restructuring and pin reordering against NBTI-induced performance degradation

    Kai-Chiang Wu ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE '09.

    DOI: 10.1109/DATE.2009.5090636
    Publication Year: 2009 , Page(s): 75 - 80
    Cited by:  Papers (10)

    IEEE Conference Publications

    Negative Bias Temperature Instability (NBTI), a PMOS aging phenomenon causing significant loss in circuit performance and lifetime, has become a critical challenge for temporal reliability concerns in nanoscale designs. Aggressive technology scaling trends, such as thinner gate oxide without proportional downscaling of supply voltage, necessitate a design optimization flow considering NBTI effects at the early stages. In this paper, we present a novel framework using joint logic restructuring and pin reordering to mitigate NBTI-induced performance degradation. Based on detecting functional symmetries and transistor stacking effects, the proposed methodology involves only wire perturbation and introduces no gate area overhead at all. Experimental results reveal that, by using this approach, on average 56% of performance loss due to NBTI can be recovered. Moreover, our methodology reduces the number of critical transistors remaining under severe NBTI and thus, transistor resizing can be applied to further mitigate NBTI effects with low area overhead.

  • Full text access may be available. Click article title to sign in or learn about subscription options.

    A systematic approach to modeling and analysis of transient faults in logic circuits

    Miskov-Zivanov, N. ; Marculescu, D.
    Quality of Electronic Design, 2009. ISQED 2009. Quality Electronic Design

    DOI: 10.1109/ISQED.2009.4810329
    Publication Year: 2009 , Page(s): 408 - 413
    Cited by:  Papers (6)

    IEEE Conference Publications

    With technology scaling, the occurrence rate of not only single, but also multiple transients resulting from a single hit is increasing. In this work, we consider the effect of these multiple-event transients on the outputs of logic circuits. Our framework allows for the analysis of soft errors in logic circuits, including several aspects: estimation of the effect of both single and multiple transient faults on both combinational and sequential circuits, analysis of the impact of multiple flip-flop upsets in sequential circuits, and analysis of transient behavior of the soft error rate in the cycles following the hit. The proposed framework can be used to estimate the impact of transient faults stemming not only from radiation, but also other physical phenomena. The results obtained using the proposed framework show that output error rates resulting from multiple-event transients or multiple-bit upsets can vary across different circuits by several orders of magnitude.

  • Freely Available from IEEE

    Guest Editorial Special Section on Low-Power Electronics and Design

    Marculescu, D. ; Henkel, J.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 16 , Issue: 6
    DOI: 10.1109/TVLSI.2008.2000343
    Publication Year: 2008 , Page(s): 609 - 610

    IEEE Journals & Magazines

    Process-Driven Variability Analysis of Single and Multiple Voltage–Frequency Island Latency-Constrained Systems

    Marculescu, D. ; Garg, Siddharth
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

    Volume: 27 , Issue: 5
    DOI: 10.1109/TCAD.2008.917969
    Publication Year: 2008 , Page(s): 893 - 905
    Cited by:  Papers (6)

    IEEE Journals & Magazines

    The problem of determining bounds for application completion times running on generic systems comprising single or multiple voltage-frequency islands (VFIs) with arbitrary topologies is addressed in the context of manufacturing-process-driven variability. The approach provides an exact solution for the system-level timing yield in synchronous single-voltage (SSV) and VFI systems with an underlying tree-based topology and a tight upper bound for generic non-tree-based topologies. The results show that: 1) timing yield for the overall source-to-sink completion time for generic systems can be modeled in an exact manner for both SSV and VFI systems and 2) multiple-VFI latency-constrained systems can achieve up to two times higher timing yield than their SSV counterparts. The results are formally proven and are supported by experimental results on two embedded applications, namely, a software-defined radio and a Moving Pictures Expert Group 2 encoder.

    Variation-adaptive feedback control for networks-on-chip with multiple clock domains

    Ogras, U.Y. ; Marculescu, R. ; Marculescu, D.
    Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE

    Publication Year: 2008 , Page(s): 614 - 619
    Cited by:  Papers (3)

    IEEE Conference Publications

    This paper discusses the use of networks-on-chip (NoCs) consisting of multiple voltage-frequency islands to cope with power consumption, clock distribution and parameter variation problems in future multiprocessor systems-on-chip (MPSoCs). In this architecture, communication within each island is synchronous, while communication across different islands is achieved via mixed-clock mixed-voltage queues. In order to dynamically control the speed of each domain in the presence of parameter and workload variations, we propose a robust feedback control methodology. Towards this end, we first develop a state-space model based on the utilization of the inter-domain queues. Then, we identify the theoretical conditions under which the network is controllable. Finally, we synthesize state feedback controllers to cope with workload variations and minimize power consumption. Experimental results demonstrate robustness to parameter variations and more than 40% energy savings by exploiting workload variations through dynamic voltage-frequency scaling (DVFS) for a hardware MPEG-2 encoder design.
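
    The queue-based feedback idea in this abstract can be illustrated with a toy model. The sketch below is not the paper's state-space model or its synthesized controller: it assumes a single inter-domain queue, a constant arrival rate, and a scalar proportional feedback law, and every name and parameter value (simulate_queue_control, gain, q_ref, and so on) is hypothetical.

```python
def simulate_queue_control(arrival=0.6, gain=0.1, q_ref=4.0, f_nominal=0.5,
                           f_min=0.2, f_max=1.0, steps=200):
    """Toy single-queue model: q[k+1] = q[k] + arrival - f[k].

    The consumer domain's normalized frequency f is set by proportional
    feedback on the queue occupancy q (all values are assumptions).
    Returns the final occupancy and the last applied frequency.
    """
    q = 0.0
    f = f_nominal
    for _ in range(steps):
        # Feedback law: speed up when the queue is fuller than the reference.
        f = f_nominal + gain * (q - q_ref)
        # Clamp to the (hypothetical) supported frequency range.
        f = min(f_max, max(f_min, f))
        # Queue update: items arrive at a fixed rate and drain at rate f.
        q = max(0.0, q + arrival - f)
    return q, f


if __name__ == "__main__":
    occupancy, frequency = simulate_queue_control()
    print(f"steady-state occupancy={occupancy:.3f}, frequency={frequency:.3f}")
```

    Under these assumptions the loop settles where the drain rate equals the arrival rate, so the domain runs no faster than its workload requires, which is the intuition behind saving energy through DVFS.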

    Characterizing chip-multiprocessor variability-tolerance

    Herbert, S. ; Marculescu, D.
    Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE

    Publication Year: 2008 , Page(s): 313 - 318
    Cited by:  Papers (5)

    IEEE Conference Publications

    Spatially-correlated intra-die process variations result in significant core-to-core frequency variations in chip-multiprocessors. An analytical model for frequency island chip-multiprocessor throughput is introduced. The improved variability-tolerance of FI-CMPs over their globally-clocked counterparts is quantified across a range of core counts and sizes under constant die area. The benefits are highest for designs consisting of many small cores, with the throughput of a globally-clocked design with 70 small cores increasing by 8.8% when per-core frequency islands are used. The small-core FI-CMP also loses only 7.2% of its nominal performance to process variations, the least among any of the designs.
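
    The comparison between globally-clocked and frequency-island designs can be made concrete with a small Monte Carlo sketch. This is not the paper's analytical model: it assumes independent Gaussian per-core frequency variation (the abstract considers spatially correlated variation), throughput proportional to clock frequency, and hypothetical names and parameter values throughout, so it is not expected to reproduce the reported percentages.

```python
import random


def simulate_throughput(num_cores=70, sigma=0.05, trials=2000, seed=0):
    """Return mean per-core throughput (nominal = 1.0) for a globally
    clocked chip and for a chip with per-core frequency islands."""
    rng = random.Random(seed)
    global_total = 0.0
    island_total = 0.0
    for _ in range(trials):
        # Hypothetical maximum frequency of each core after variation.
        freqs = [max(0.0, rng.gauss(1.0, sigma)) for _ in range(num_cores)]
        # Globally clocked: every core is limited by the slowest core.
        global_total += min(freqs)
        # Per-core islands: each core runs at its own maximum frequency.
        island_total += sum(freqs) / num_cores
    return global_total / trials, island_total / trials


if __name__ == "__main__":
    g, fi = simulate_throughput()
    print(f"globally clocked: {g:.3f}, frequency islands: {fi:.3f}")
```

    Because the minimum of many samples falls further below the mean as the core count grows, this toy model also suggests why the benefit is largest for designs with many small cores.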

    Power-aware soft error hardening via selective voltage scaling

    Kai-Chiang Wu ; Marculescu, D.
    Computer Design, 2008. ICCD 2008. IEEE International Conference on

    DOI: 10.1109/ICCD.2008.4751877
    Publication Year: 2008 , Page(s): 301 - 306
    Cited by:  Papers (6)

    IEEE Conference Publications

    Nanoscale integrated circuits are becoming increasingly sensitive to radiation-induced transient faults (soft errors) due to current technology scaling trends, such as shrinking feature sizes and reducing supply voltages. Soft errors, which have been a significant concern in memories, are now a main factor in reliability degradation of logic circuits. This paper presents a power-aware methodology using dual supply voltages for soft error hardening. Given a constraint on power overhead, our proposed framework can minimize the soft error rate (SER) of a circuit via selective voltage scaling. On average, circuit SER can be reduced by 33.45% for various sizes of transient glitches with only 11.74% energy increase. The overhead in normalized power-delay-area product per 1% SER reduction is 0.64%, 1.33X less than that of existing state-of-the-art approaches.

    Soft error rate reduction using redundancy addition and removal

    Kai-Chiang Wu ; Marculescu, D.
    Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific

    DOI: 10.1109/ASPDAC.2008.4484014
    Publication Year: 2008 , Page(s): 559 - 564
    Cited by:  Papers (5)

    IEEE Conference Publications

    Due to current technology scaling trends such as shrinking feature sizes and reducing supply voltages, circuit reliability has become more susceptible to radiation-induced transient faults (soft errors). Soft errors, which have been a great concern in memories, are now a main factor in reliability degradation of logic circuits. In this paper, we propose a novel framework based on redundancy addition and removal (RAR) for soft error rate (SER) reduction. Several metrics and constraints are introduced to guide our proposed framework towards SER reduction in an efficient manner. Experimental results show that up to 70% reduction in output failure probability can be achieved with relatively low area overhead.

    Modeling and Optimization for Soft-Error Reliability of Sequential Circuits

    Miskov-Zivanov, N. ; Marculescu, D.
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

    Volume: 27 , Issue: 5
    DOI: 10.1109/TCAD.2008.917591
    Publication Year: 2008 , Page(s): 803 - 816
    Cited by:  Papers (16)

    IEEE Journals & Magazines

    Due to reduction in device feature size and supply voltage, the sensitivity to radiation-induced transient faults of digital systems dramatically increases. In this paper, we present two approaches to evaluating the susceptibility of sequential circuits to soft errors. The first approach uses the Markov chain theory but can only provide steady-state behavior information. The second approach uses symbolic modeling based on binary decision diagrams/algebraic decision diagrams and circuit unrolling. The soft-error rate (SER) evaluation using this approach is demonstrated by the set of experimental results, which show that, for most of the benchmarks used, the SER decreases well below a given threshold (10^-7 FIT) within ten clock cycles after the hit. The results obtained with the proposed symbolic framework are within 4% average error and up to 11 000x faster when compared to HSPICE detailed circuit simulation. The framework can be used for selective gate sizing targeting radiation hardening, leading up to 80% SER reduction when applied to a subset of ISCAS'89 benchmarks.

    Analysis of dynamic voltage/frequency scaling in chip-multiprocessors

    Herbert, S. ; Marculescu, D.
    Low Power Electronics and Design (ISLPED), 2007 ACM/IEEE International Symposium on

    DOI: 10.1145/1283780.1283790
    Publication Year: 2007 , Page(s): 38 - 43
    Cited by:  Papers (52)  |  Patents (3)

    IEEE Conference Publications

    Fine-grained dynamic voltage/frequency scaling (DVFS) demonstrates great promise for improving the energy-efficiency of chip-multiprocessors (CMPs), which have emerged as a popular way for designers to exploit growing transistor budgets. We examine the tradeoffs involved in the choice of both DVFS control scheme and method by which the processor is partitioned into voltage/frequency islands (VFIs). We simulate real multithreaded commercial and scientific workloads, demonstrating the large real-world potential of DVFS for CMPs. Contrary to the conventional wisdom, we find that the benefits of per-core DVFS are not necessarily large enough to overcome the complexity of having many independent VFIs per chip.

    MARS-S: Modeling and Reduction of Soft Errors in Sequential Circuits

    Miskov-Zivanov, N. ; Marculescu, D.
    Quality Electronic Design, 2007. ISQED '07. 8th International Symposium on

    DOI: 10.1109/ISQED.2007.100
    Publication Year: 2007 , Page(s): 893 - 898
    Cited by:  Papers (7)

    IEEE Conference Publications

    Due to the shrinking of feature size and reduction in supply voltages, nanoscale circuits have become more susceptible to radiation induced transient faults. In this paper, the authors use a symbolic framework based on BDDs and ADDs that enables analysis of sequential circuit reliability from different aspects: output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns. The framework can be used for selective gate sizing targeting radiation hardening which is done only for gates with error impact exceeding a certain threshold. Using such a technique, SER can be reduced by 80% for various threshold values, when applied to a subset of ISCAS'89 benchmarks.

    On the impact of manufacturing process variations on the lifetime of sensor networks

    Garg, Siddharth ; Marculescu, D.
    Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2007 5th IEEE/ACM/IFIP International Conference on

    Publication Year: 2007 , Page(s): 203 - 208

    IEEE Conference Publications

    As an emerging technology, sensor networks provide the ability to accurately monitor the characteristics of wide geographical areas over long periods of time. The lifetime of individual nodes in a sensor network depends strongly on the leakage power that the nodes dissipate in the idle state, especially for low-throughput applications. With the introduction of advanced low power design techniques, such as sub-threshold voltage design styles, and the migration of fabrication processes to smaller technology generations, variability in leakage power dissipation of the sensor nodes will lead to increased variability in their lifetimes. In this paper, we analyze how this increased variability in the lifetime of individual sensor nodes affects the performance and lifetime of the network as a whole. We demonstrate how sensor network designers can use the proposed analysis framework to trade off the cost of a sensor network deployment against the performance it offers. Our results indicate that up to 37% improvement in the critical lifetime of a sensor network (defined as the expected time at which the sensor network becomes disconnected) can be obtained over a baseline design with a 20% increase in the cost of the individual sensor nodes.

    An 0.9 x 1.2", Low Power, Energy-Harvesting System with Custom Multi-Channel Communication Interface

    Stanley-Marbell, P. ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition, 2007. DATE '07

    DOI: 10.1109/DATE.2007.364560
    Publication Year: 2007 , Page(s): 1 - 6
    Cited by:  Papers (2)

    IEEE Conference Publications

    Presented is a self-powered computing system, sunflower, that uses a novel combination of a PIN photodiode array, switching regulators, and a supercapacitor, to provide a small footprint renewable energy source. The design provides software-controlled power-adaptation facilities, for both the main processor and its peripherals. The system's power consumption is characterized, and its energy-scavenging efficiency is quantified with field measurements under a variety of weather conditions.

    Soft Error Rate Analysis for Sequential Circuits

    Miskov-Zivanov, N. ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition, 2007. DATE '07

    DOI: 10.1109/DATE.2007.364500
    Publication Year: 2007 , Page(s): 1 - 6
    Cited by:  Papers (8)

    IEEE Conference Publications

    Due to reduction in device feature size and supply voltage, the sensitivity to radiation induced transient faults (soft errors) of digital systems increases dramatically. Intensive research has been done so far in modeling and analysis of combinational circuit susceptibility to soft errors, while sequential circuits have received much less attention. In this paper, we present an approach for evaluating the susceptibility of sequential circuits to soft errors. The proposed approach uses symbolic modeling based on BDDs/ADDs and probabilistic sequential circuit analysis. The SER evaluation is demonstrated by the set of experimental results, which show that, for most of the benchmarks used, the SER decreases well below a given threshold (10^-7 FIT) within ten clock cycles after the hit. The results obtained with the proposed symbolic framework are within 4% average error and up to 11000X faster when compared to HSPICE detailed circuit simulation.

    System-Level Process Variation Driven Throughput Analysis for Single and Multiple Voltage-Frequency Island Designs

    Garg, Siddharth ; Marculescu, D.
    Design, Automation & Test in Europe Conference & Exhibition, 2007. DATE '07

    DOI: 10.1109/DATE.2007.364625
    Publication Year: 2007 , Page(s): 1 - 6
    Cited by:  Papers (5)

    IEEE Conference Publications

    Manufacturing process variations are the primary cause of timing yield loss in aggressively scaled technologies. In this paper, we analyze the impact of process variations on the throughput (rate) characteristics of embedded systems comprised of multiple voltage-frequency islands (VFIs) represented as component graphs. We provide an efficient, yet accurate method to compute the throughput of an application in a probabilistic scenario and show that systems implemented with multiple VFIs are more likely to meet throughput constraints than their fully synchronous counterparts. The proposed framework allows designers to investigate the impact of architectural decisions such as the granularity of VFI partitioning on their designs, while determining the likelihood of a system meeting specified throughput constraints. An implementation of the proposed framework is accurate within 1.2% of Monte Carlo simulation while yielding speed-ups ranging from 78x to 260x, for a set of synthetic benchmarks. Results on a real benchmark (MPEG-2 encoder) show that a nine clock domain implementation gives 100% yield for a throughput constraint for which a fully synchronous design only yields 25%. For the same throughput constraint, a three clock domain architecture yields 78%.

    Architectures for silicon nanoelectronics and beyond

    Bahar, R.I. ; Lau, C. ; Hammerstrom, D. ; Marculescu, D. ; Harlow, J. ; Orailoglu, A. ; Joyner, W.H. ; Pedram, M.
    Computer

    Volume: 40 , Issue: 1
    DOI: 10.1109/MC.2007.7
    Publication Year: 2007 , Page(s): 25 - 33
    Cited by:  Papers (25)

    IEEE Journals & Magazines

    Although nanoelectronics won't replace CMOS for some time, research is needed now to develop the architectures, methods, and tools to maximally leverage nanoscale devices and terascale capacity. Addressing the complementary architectural and system issues involved requires greater collaboration at all levels. The effective use of nanotechnology calls for total system solutions.

    Challenges and Promising Results in NoC Prototyping Using FPGAs

    Ogras, U.Y. ; Marculescu, R. ; Hyung Gyu Lee ; Choudhary, P. ; Marculescu, D. ; Kaufman, M. ; Nelson, P.
    Micro, IEEE

    Volume: 27 , Issue: 5
    DOI: 10.1109/MM.2007.4378786
    Publication Year: 2007 , Page(s): 86 - 95
    Cited by:  Papers (11)

    IEEE Journals & Magazines

    Although a significant amount of theoretical work supports the potential of NoC architectures, such results need to be demonstrated by actual implementations before the NoC paradigm becomes a reality. Besides demonstrating the feasibility of the overall approach, prototyping enables accurate evaluation of power, performance, area, and various design trade-offs. This article presents four NoC prototypes, discusses the challenges associated with their design, and assesses the potential of the NoC approach.

    Voltage-Frequency Island Partitioning for GALS-based Networks-on-Chip

    Ogras, U.Y. ; Marculescu, R. ; Choudhary, P. ; Marculescu, D.
    Design Automation Conference, 2007. DAC '07. 44th ACM/IEEE

    Publication Year: 2007 , Page(s): 110 - 115
    Cited by:  Papers (11)

    IEEE Conference Publications

    Due to high levels of integration and complexity, the design of multi-core SoCs has become increasingly challenging. In particular, energy consumption and distributing a single global clock signal throughout a chip have become major design bottlenecks. To deal with these issues, a globally asynchronous, locally synchronous (GALS) design is considered for achieving low power consumption and modular design. Such a design style fits nicely with the concept of voltage-frequency islands (VFIs) which has been recently introduced for achieving fine-grain system-level power management. This paper proposes a design methodology for partitioning an NoC architecture into multiple VFIs and assigning supply and threshold voltage levels to each VFI. Simulation results show about 40% savings for a real video application and demonstrate the effectiveness of our approach in reducing the overall system energy consumption. The results and functional correctness are validated using an FPGA prototype for an NoC with multiple VFIs.

    Design and analysis of a low power VLIW DSP core

    Chan-Hao Chang ; Marculescu, D.
    Emerging VLSI Technologies and Architectures, 2006. IEEE Computer Society Annual Symposium on

    DOI: 10.1109/ISVLSI.2006.36
    Publication Year: 2006
    Cited by:  Papers (2)

    IEEE Conference Publications

    Power consumption has been the primary issue in processor design, with various power reduction strategies being adopted from system-level to circuit-level. In order to develop a power efficient system, architecture design, compiler optimization, as well as user evaluation must be employed in a unified framework. This paper presents an architecture-level power/performance simulator for a VLIW DSP processor core. Relying on parameterized power models and cycle accurate simulation, it provides fast and accurate power estimation for architecture exploration. Furthermore, the proposed modeling methodology can be used with minimal changes in the evaluation of other VLIW processor cores or for characterizing the efficiency of compiler-driven power efficient transformations.

    Circuit Reliability Analysis Using Symbolic Techniques

    Miskov-Zivanov, N. ; Marculescu, D.
    Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

    Volume: 25 , Issue: 12
    DOI: 10.1109/TCAD.2006.882592
    Publication Year: 2006 , Page(s): 2638 - 2649
    Cited by:  Papers (32)

    IEEE Journals & Magazines

    Due to the shrinking of feature size and the significant reduction in noise margins, nanoscale circuits have become more susceptible to manufacturing defects, noise-related transient faults, and interference from radiation. Traditionally, soft errors have been a much greater concern in memories than in logic circuits. However, as technology continues to scale, logic circuits are becoming more susceptible to soft errors than memories. To estimate the susceptibility to errors in combinational logic, the use of binary decision diagrams (BDDs) and algebraic decision diagrams (ADDs) for the unified symbolic analysis of circuit reliability is proposed. A framework that uses BDDs and ADDs and enables the analysis of combinational circuit reliability from different aspects, e.g., output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns, is presented. This is demonstrated by the set of experimental results, which show that the mean output error susceptibility can vary from less than 0.1% for large circuits and short glitches (20% cycle time) to about 30% for very small circuits and long enough glitches (50% cycle time).

    Hardware based frequency/voltage control of voltage frequency island systems

    Marculescu, D. ; Choudhary, P.
    Hardware/Software Codesign and System Synthesis, 2006. CODES+ISSS '06. Proceedings of the 4th International Conference

    DOI: 10.1145/1176254.1176265
    Publication Year: 2006 , Page(s): 34 - 39
    Cited by:  Papers (12)

    IEEE Conference Publications

    The ability to do fine grain power management via local voltage selection has shown much promise via the use of voltage/frequency islands (VFIs). VFI-based designs combine the advantages of using fine-grain speed and voltage control for reducing energy requirements, while allowing for maintaining performance constraints. We propose a hardware based technique to dynamically change the clock frequencies and potentially voltages of a VFI system driven by the dynamic workload. This technique tries to change the frequency of a synchronous island such that it will have efficient power utilization while satisfying performance constraints. We propose a hardware design that can be used to change the frequencies of various synchronous islands interconnected together by mixed-clock/mixed-voltage FIFO interfaces. Results show up to 65% power savings for the set of benchmarks considered with no loss in throughput.

    System-Level Process-Driven Variability Analysis for Single and Multiple Voltage-Frequency Island Systems

    Marculescu, D. ; Garg, Siddharth
    Computer-Aided Design, 2006. ICCAD '06. IEEE/ACM International Conference on

    DOI: 10.1109/ICCAD.2006.320171
    Publication Year: 2006 , Page(s): 541 - 546
    Cited by:  Papers (8)

    IEEE Conference Publications

    The problem of determining bounds for application completion times running on generic systems comprised of single or multiple voltage-frequency islands (VFIs) with arbitrary topologies is addressed in the context of manufacturing-driven variability. The approach provides an exact solution for the system-level timing yield in single clock, single voltage (SSV) and VFI systems with an underlying tree-based topology, and a tight upper bound for generic, non-tree based topologies. The results show that: (a) timing yield for overall source-to-sink completion time for generic systems can be modeled in an exact manner for both SSV and VFI systems; and (b) multiple VFI, latency-constrained systems can achieve 11-90% higher timing yield than their SSV counterparts. The results are proven formally and supported by experimental results on two embedded applications, namely software defined radio and MPEG2 encoder.

    MARS-C: modeling and reduction of soft errors in combinational circuits

    Miskov-Zivanov, N. ; Marculescu, D.
    Design Automation Conference, 2006 43rd ACM/IEEE

    DOI: 10.1109/DAC.2006.229323
    Publication Year: 2006 , Page(s): 767 - 772
    Cited by:  Papers (44)

    IEEE Conference Publications

    Due to the shrinking of feature size and reduction in supply voltages, nanoscale circuits have become more susceptible to radiation induced transient faults. In this paper, we present a symbolic framework based on BDDs and ADDs that enables analysis of combinational circuit reliability from different aspects: output susceptibility to error, influence of individual gates on individual outputs and overall circuit reliability, and the dependence of circuit reliability on glitch duration, amplitude, and input patterns. This is demonstrated by the set of experimental results, which show that the mean output error susceptibility can vary from less than 0.1%, for large circuits and small glitches, to about 30% for very small circuits and large enough glitches. The results obtained with the proposed symbolic framework are within 7% average error and up to 5000x speedup when compared to HSPICE detailed circuit simulation. The framework can be used for selective gate sizing targeting radiation hardening which is done only for gates with error impact exceeding a certain threshold. Using such a technique, soft error rate (SER) can be reduced by 25-67% for various threshold values, when applied to a subset of ISCAS'85 and MCNC '91 benchmarks.

    Variability and energy awareness: a microarchitecture-level perspective

    Marculescu, D. ; Talpes, E.
    Design Automation Conference, 2005. Proceedings. 42nd

    DOI: 10.1109/DAC.2005.193764
    Publication Year: 2005 , Page(s): 11 - 16
    Cited by:  Papers (9)

    IEEE Conference Publications

    This paper proposes microarchitecture-level models for within die (WID) process and system parameter variability that can be included in the design of high-performance processors. Since decisions taken at microarchitecture level have the largest impact on both performance and power, on one hand, and global variability effect, on the other hand, models and associated metrics are needed for their joint characterization and analysis. To assess how these variations affect or are affected by microarchitecture decisions, we propose a joint performance, power and variability metric that is able to distinguish among various design choices. As a design-driver for the modeling methodology, we consider a clustered high-performance processor implementation, along with its globally asynchronous, locally synchronous (GALS) counterpart. Results show that, when comparing the baseline, synchronous and its GALS counterpart, microarchitecture-driven impact of process variability translates into 2-10% faster local clocks for the GALS case, while when taking into account the effect of on-chip temperature variability, local clocks can be 8-18% faster. If, in addition, voltage scaling (DVS) is employed, the GALS architecture with DVS is 26% better in terms of the joint quality metric employing energy, performance, and variability.

    Energy bounds for fault-tolerant nanoscale designs

    Marculescu, D.
    Design, Automation and Test in Europe, 2005. Proceedings

    DOI: 10.1109/DATE.2005.135
    Publication Year: 2005 , Page(s): 74 - 79 Vol. 1
    Cited by:  Papers (2)

    IEEE Conference Publications

    The problem of determining lower bounds for the energy cost of a given nanoscale design is addressed via a complexity theory-based approach. The paper provides a theoretical framework that is able to assess the trade-offs existing in nanoscale designs between the amount of redundancy needed for a given level of resilience to errors and the associated energy cost. Circuit size, logic depth and error resilience are analyzed and brought together in a theoretical framework that can be seamlessly integrated with automated synthesis tools and can guide the design process of nanoscale systems comprised of failure prone devices. The impact of redundancy addition on the switching energy and its relationship with leakage energy is modeled in detail. Results show that 99% error resilience is possible for fault-tolerant designs, but at the expense of at least 40% more energy if individual gates fail independently with probability of 1%.

    System level power and performance modeling of GALS point-to-point communication interfaces

    Niyogi, K. ; Marculescu, D.
    Low Power Electronics and Design, 2005. ISLPED '05. Proceedings of the 2005 International Symposium on

    DOI: 10.1109/LPE.2005.195551
    Publication Year: 2005 , Page(s): 381 - 386
    Cited by:  Papers (2)

    IEEE Conference Publications

    Due to difficulties in distributing a single global clock signal over increasingly large chip areas, a globally asynchronous, locally synchronous design is considered a promising technique in the system on a chip (SoC) era. In the context of today's increasingly complex SoCs, there is a need for design methodologies that start at higher levels of abstraction. Much of the previous work has been devoted to design of asynchronous communication schemes such as mixed clock FIFOs and pausible clocks for globally asynchronous, locally synchronous systems, but at low levels of abstraction, such as circuit level. To enable early design evaluation of such schemes, this paper proposes to use a SystemC-based modeling methodology for the asynchronous communication among various locally synchronous islands. The modeling framework encompasses various levels of abstraction and enables system-level validation of circuit or RT level hardware descriptions, as well as their impact on high-level design decisions. View full abstract»

    Execution cache-based microarchitecture for power-efficient superscalar processors

    Talpes, E. ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 13 , Issue: 1
    DOI: 10.1109/TVLSI.2004.840406
    Publication Year: 2005 , Page(s): 14 - 26
    Cited by:  Papers (1)  |  Patents (1)

    IEEE Journals & Magazines

    This paper investigates a possible solution to the problem of power consumption in superscalar, out-of-order processors by proposing a new microarchitecture, specifically designed to reduce increasing power requirements of high-end processors. More precisely, we show that by modifying the well-established superscalar processor architecture, significant savings can be achieved in terms of power consumption. Our approach aims at limiting the growing amount of power used in a typical processor for dynamic optimizations (including out-of-order scheduling and register renaming). Our proposed approach achieves significant power savings by reusing as much as possible from the work done by the front-end of a typical superscalar, out-of-order pipeline, via the use of a special cache nested deeply into the processor structure. By reusing instructions that are already decoded, reordered, and have their registers already renamed, the front end of the pipeline can be turned off for large periods of time with significant savings in the overall power consumption. Experimental results show up to 35% (30% on average) savings in average energy per committed instruction, and 35% (20% on average) savings in energy-delay product, with about 9% average performance loss, over a large spectrum of SPEC95 and SPEC2000 benchmarks.

    Speed and voltage selection for GALS systems based on voltage/frequency islands

    Niyogi, K. ; Marculescu, D.
    Design Automation Conference, 2005. Proceedings of the ASP-DAC 2005. Asia and South Pacific

    Volume: 1
    DOI: 10.1109/ASPDAC.2005.1466176
    Publication Year: 2005 , Page(s): 292 - 297 Vol. 1
    Cited by:  Papers (28)

    IEEE Conference Publications

    Due to increasing clock speeds and shrinking technologies, distributing a single global clock signal throughout a chip is becoming a difficult and challenging proposition. In this paper, we address the problem of energy optimal local speed and voltage selection in frequency/voltage island based systems under given performance constraints. Our results show that static voltage and speed assignment can achieve up to 42% savings in total energy for various media and signal processing applications, while application specific dynamic approaches provide up to 44% energy savings in the case of MPEG-2 encoder application, when compared to a single clocked system architecture.

    Increased scalability and power efficiency by using multiple speed pipelines

    Talpes, E. ; Marculescu, D.
    Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on

    DOI: 10.1109/ISCA.2005.33
    Publication Year: 2005 , Page(s): 310 - 321
    Cited by:  Patents (1)

    IEEE Conference Publications

    One of the most important problems faced by microarchitecture designers is the poor scalability of some of the current solutions with increased clock frequencies and wider pipelines. As several studies show, internal processor structures scale differently with decreasing device sizes. While in some cases the access latency is determined by the speed of the logic circuitry, for others it is dominated by the interconnect delay. Furthermore, while some stages can be super-pipelined with relatively small performance loss, others must be kept atomic. This paper proposes a possible solution to this problem, avoiding the traditional trade-off between parallelism and clock speed. First, allowing instructions to enter and leave the Issue Window in an asynchronous manner enables faster speeds in the front-end at the expense of small synchronization latencies. Second, using an Execution Cache for storing instructions that are already scheduled allows for bypassing the issue circuitry and thus clocking the execution core at higher frequencies. Combined, these two mechanisms result in a 50% to 60% performance increase for our test microarchitecture, without requiring a completely new scheduling mechanism. Furthermore, the proposed microarchitecture requires significantly less energy, with a 30% reduction in a 0.13um or 20% in a 0.06um process technology over the original baseline.

    Energy awareness and uncertainty in microarchitecture-level design

    Marculescu, D. ; Talpes, E.
    Micro, IEEE

    Volume: 25 , Issue: 5
    DOI: 10.1109/MM.2005.86
    Publication Year: 2005 , Page(s): 64 - 76
    Cited by:  Papers (7)

    IEEE Journals & Magazines

    The authors present microarchitecture-level statistical models for characterizing process and system parameter variability, concentrating on gate length and on-chip temperature variations. To assess the effect of microarchitecture decisions on these variations, and vice versa, they propose a joint performance, power, and variability metric that distinguishes among various design choices.

    Toward a multiple clock/voltage island design style for power-aware processors

    Talpes, E. ; Marculescu, D.
    Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

    Volume: 13 , Issue: 5
    DOI: 10.1109/TVLSI.2005.844305
    Publication Year: 2005 , Page(s): 591 - 603
    Cited by:  Papers (14)  |  Patents (4)

    IEEE Journals & Magazines

    Enabled by the continuous advancement in fabrication technology, present-day synchronous microprocessors include more than 100 million transistors and have clock speeds well in excess of the 1-GHz mark. Distributing a low-skew clock signal in this frequency range to all areas of a large chip is a task of growing complexity. As a solution to this problem, designers have recently suggested the use of frequency islands that are locally clocked and externally communicate with each other using mixed clock communication schemes. Such a design style fits nicely with the recently proposed concept of voltage islands that, in addition, can potentially enable fine-grain dynamic power management by simultaneous voltage and frequency scaling. This paper proposes a design exploration framework for application-adaptive multiple-clock processors which provides the means for analyzing and identifying the right interdomain communication scheme and the proper granularity for the choice of voltage/frequency islands in case of superscalar, out-of-order processors. In addition, the presented design exploration framework allows for comparative analysis of newly proposed or already published application-driven dynamic power management strategies. Such a design exploration framework and accompanying results can help designers and computer architects in choosing the right design strategy for achieving better power-performance tradeoffs in multiple-clock high-end processors.

    Application adaptive energy efficient clustered architectures

    Marculescu, D.
    Low Power Electronics and Design, 2004. ISLPED '04. Proceedings of the 2004 International Symposium on

    DOI: 10.1109/LPE.2004.1349363
    Publication Year: 2004 , Page(s): 344 - 349
    Cited by:  Patents (7)

    IEEE Conference Publications

    As clock frequency and die area increase, achieving energy efficiency while distributing a low-skew, global clock signal becomes increasingly difficult. Challenges imposed by deep-submicron technologies can be alleviated by using a multiple voltage/multiple frequency island design style, otherwise called the globally asynchronous, locally synchronous (GALS) design paradigm. This paper proposes a clustered architecture that enables application-adaptive energy efficiency through the use of dynamic voltage scaling for application code that is rendered non-critical for the overall performance, at run-time. As opposed to task scheduling using dynamic voltage scaling (DVS) that exploits workload variations across applications, our approach targets workload variations within the same application, while classifying code on-the-fly as critical or non-critical and adapting to changes in the criticality of such code portions. Our results show that application adaptive variable voltage/variable frequency clustered architectures are up to 22% better in energy and 11% better in energy-delay product than their non-adaptive counterparts, while providing up to 31% more energy savings when compared to DVS applied globally.

    Impact of technology scaling on energy aware execution cache-based microarchitectures

    Talpes, E. ; Marculescu, D.
    Low Power Electronics and Design, 2004. ISLPED '04. Proceedings of the 2004 International Symposium on

    DOI: 10.1109/LPE.2004.1349306
    Publication Year: 2004 , Page(s): 50 - 53

    IEEE Conference Publications

    Reducing total power consumption in high performance microprocessors can be achieved by limiting the amount of logic involved in decoding, scheduling and executing each instruction. One of the solutions to this problem involves the use of a microarchitecture based on an Execution Cache (EC) whose role is to cache already done work for later reuse. In this paper, we explore the design space for such a microarchitecture, looking at how the cache size, associativity and replacement algorithm affect the overall performance and power efficiency. We also look at the scalability of this solution across next process generations, evaluating the energy efficiency of such caching mechanisms in the presence of increasing leakage power. Over a spectrum of SPEC2000 benchmarks, an average of 35% energy reduction is achieved for technologies ranging from 130nm to 90nm and 65nm, at the expense of a negligible performance hit.

    Impact of Technology Scaling on Energy Aware Execution Cache-Based Microarchitectures

    Marculescu, D.
    Low Power Electronics and Design, 2004. ISLPED '04. Proceedings of the 2004 International Symposium on

    Publication Year: 2004 , Page(s): 50 - 53

    IEEE Conference Publications

    Reducing total power consumption in high performance microprocessors can be achieved by limiting the amount of logic involved in decoding, scheduling and executing each instruction. One of the solutions to this problem involves the use of a microarchitecture based on an Execution Cache (EC) whose role is to cache already done work for later reuse. In this paper, we explore the design space for such a microarchitecture, looking at how the cache size, associativity and replacement algorithm affect the overall performance and power efficiency. We also look at the scalability of this solution across next process generations, evaluating the energy efficiency of such caching mechanisms in the presence of increasing leakage power. Over a spectrum of SPEC2000 benchmarks, an average of 35% energy reduction is achieved for technologies ranging from 130nm to 90nm and 65nm, at the expense of a negligible performance hit.

    Mixed-clock issue queue design for energy aware, high-performance cores

    Rapaka, V.S.P. ; Talpes, E. ; Marculescu, D.
    Design Automation Conference, 2004. Proceedings of the ASP-DAC 2004. Asia and South Pacific

    DOI: 10.1109/ASPDAC.2004.1337603
    Publication Year: 2004 , Page(s): 380 - 383
    Cited by:  Papers (2)

    IEEE Conference Publications

    The globally-asynchronous, locally-synchronous (GALS) design style has recently started to gain interest as a possible solution to increased design complexity and power and thermal costs, as well as an enabler of fine-grain speed and voltage management. Due to its inherent complexity, a possible driver application for such a design style is the case of superscalar, out-of-order processors. We propose a novel mixed-clock issue queue design, and compare and contrast this new implementation with existing synchronous or mixed-clock versions of issue queues, used in standalone mode or in conjunction with mixed-clock FIFO (first-in, first-out) buffers for inter-domain synchronization. Both transistor-level SPICE simulation and cycle-accurate microarchitectural analysis show that cores using mixed-clock issue queues are able to provide better energy-performance operating points when compared to their synchronous or asynchronous FIFO-based counterparts.
