Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on

Issue 3 • Date March 2007

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • Guest Editorial [intro. to the special issue on the 2006 IEEE/ACM Design, Automation and Test in Europe Conference]

    Page(s): 405 - 407
    Freely Available from IEEE
  • A Framework for Cosynthesis of Memory and Communication Architectures for MPSoC

    Page(s): 408 - 420

    Memory and communication architectures have a significant impact on the cost, performance, and time-to-market of complex multiprocessor system-on-chip (MPSoC) designs. The memory architecture dictates most of the data traffic flow in a design, which in turn influences the design of the communication architecture. Thus, there is a need to cosynthesize the memory and communication architectures to avoid making suboptimal design decisions. This is in contrast to traditional platform-based design approaches where memory and communication architectures are synthesized separately. In this paper, the authors propose an automated application-specific cosynthesis framework for memory and communication architecture (COSMECA) in MPSoC designs. The primary objective is to design a communication architecture having the least number of buses, which satisfies performance and memory-area constraints, while the secondary objective is to reduce the memory-area cost. Results of applying COSMECA to several industrial-strength MPSoC applications from the networking domain indicate a saving of as much as 40% in the number of buses and 29% in memory area compared to the traditional approach.

  • A Layout-Aware Analysis of Networks-on-Chip and Traditional Interconnects for MPSoCs

    Page(s): 421 - 434

    The ever-shrinking lithographic technologies available to chip designers enable performance and functionality breakthroughs; yet, they bring new hard problems. For example, multiprocessor systems-on-chip featuring several processing elements can be conceived, but efficiently interconnecting them while keeping the design complexity manageable is a challenge. Traditional buses are easy to deploy, but cannot provide enough bandwidth for such complex systems. A departure from legacy architectures is therefore called for. One radical path is represented by packet-switching networks-on-chip, whereas a more conservative approach interleaves bandwidth-rich components (e.g., crossbars) within the preexisting fabrics. This paper is aimed at analyzing the strengths and weaknesses of these alternative approaches by performing a thorough analysis based on actual chip floorplans after the interconnection place-and-route stages and after a clock tree has been distributed across the layout. Performance, area, and power results will be discussed while keeping an eye on the scalability prospects in future technology nodes.

  • Introduction of Architecturally Visible Storage in Instruction Set Extensions

    Page(s): 435 - 446

    Instruction set extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding application-specific functional units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and the main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques and achieve an average speedup of 2.8× over pure software execution with a small area overhead. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption.

  • Low-Power Optimization by Smart Bit-Width Allocation in a SystemC-Based ASIC Design Environment

    Page(s): 447 - 455

    The modern era of embedded system design is geared toward the design of low-power systems. One way to reduce power in an application-specific integrated circuit (ASIC) implementation is to reduce the bit-width precision of its computation units. This paper describes algorithms to optimize the bit widths of fixed-point variables for low power in a SystemC-based ASIC design environment. We propose an optimal bit-width allocation algorithm for two variables and a greedy heuristic that works for any number of variables. The algorithms are used in the automation of converting floating-point SystemC programs into ASIC synthesizable SystemC programs. Expected inputs are profiled to estimate errors in the finite precision conversions. Experimental results for the tradeoffs between quantization error, power consumption, and hardware resources used are reported on a set of four SystemC benchmarks that are mapped onto a 0.18-μm ASIC cell library from Artisan Components. We demonstrate that it is possible to reduce the power consumption by 50% on average by allowing roundoff errors to increase from 0.5% to 1%.
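
    The idea of profiling expected inputs to estimate finite-precision error can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the round-to-nearest quantizer, and the linear search over fractional bit widths are all assumptions made for illustration, and the sketch handles a single variable only.

```python
def quantize(x, frac_bits):
    """Round x to the nearest fixed-point value with `frac_bits` fractional bits."""
    scale = 1 << frac_bits  # 2**frac_bits
    return round(x * scale) / scale


def max_relative_error(samples, frac_bits):
    """Largest relative roundoff error over the profiled (nonzero) samples."""
    return max(abs(quantize(x, frac_bits) - x) / abs(x) for x in samples if x != 0)


def smallest_frac_bits(samples, tolerance, max_bits=32):
    """Hypothetical helper: fewest fractional bits that keep the profiled
    relative error within `tolerance`, or None if max_bits is not enough."""
    for bits in range(max_bits + 1):
        if max_relative_error(samples, bits) <= tolerance:
            return bits
    return None


if __name__ == "__main__":
    profiled = [0.3, 1.7, 2.25]  # made-up profile data, not from the paper
    for tol in (0.01, 0.005):
        print(tol, smallest_frac_bits(profiled, tol))
```

    Relaxing the tolerance can only keep or lower the returned bit width, which is the tradeoff between roundoff error and hardware cost that the abstract describes.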

  • Optimizing Sequential Cycles Through Shannon Decomposition and Retiming

    Page(s): 456 - 467

    Optimizing sequential cycles is essential for many types of high-performance circuits, such as pipelines for packet processing. Retiming is a powerful technique for speeding up pipelines, but it is stymied by tight sequential cycles. Designers usually attack such cycles by manually combining Shannon decomposition with retiming (effectively a form of speculation), but such manual decomposition is error prone. We propose an efficient algorithm that simultaneously applies Shannon decomposition and retiming to optimize circuits with tight sequential cycles. While the algorithm is only able to improve certain circuits (roughly half of the benchmarks we tried), the performance increase can be dramatic (7%-61%) with only a modest increase in area (1%-12%). The algorithm is also fast, making it a practical addition to a synthesis flow.
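
    For readers unfamiliar with Shannon decomposition, the identity f = x·f(x=1) + x'·f(x=0) can be checked with a short Python sketch. This shows only the Boolean identity, not the paper's combined decomposition-and-retiming algorithm; the function names and the example function are assumptions made for illustration.

```python
from itertools import product


def shannon_cofactors(f, index):
    """Return the (negative, positive) cofactors of f with respect to input `index`."""
    def negative(*args):
        fixed = list(args)
        fixed[index] = False  # force the chosen input to 0
        return f(*fixed)

    def positive(*args):
        fixed = list(args)
        fixed[index] = True  # force the chosen input to 1
        return f(*fixed)

    return negative, positive


def shannon_expand(f, index):
    """Rebuild f as a 2:1 multiplexer that selects between its two cofactors."""
    negative, positive = shannon_cofactors(f, index)

    def expanded(*args):
        return positive(*args) if args[index] else negative(*args)

    return expanded


def equivalent(f, g, num_inputs):
    """Exhaustively compare two Boolean functions over all input combinations."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=num_inputs))


def example(a, b, c):
    # Hypothetical function; `a` plays the role of a late-arriving signal.
    return (a and b) or ((not a) and c)
```

    Both cofactors can be evaluated before the selecting input arrives, which is why the abstract describes the combination with retiming as a form of speculation.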

  • Computing the Soft Error Rate of a Combinational Logic Circuit Using Parameterized Descriptors

    Page(s): 468 - 479

    Soft errors have emerged as an important reliability challenge for nanoscale very-large-scale integration designs. In this paper, we present a fast and efficient soft error rate (SER) analysis methodology for combinational circuits. We first present a novel parametric waveform model based on the Weibull function to represent particle strikes at individual nodes in the circuit. We then describe the construction of the descriptor object that efficiently captures the correlation between the transient waveforms and their associated rate distribution functions. The proposed algorithm consists of operations to inject, propagate, and merge these descriptors while traversing forward along the gates in a circuit. The parameterized waveforms enable an efficient static approach to calculate the SER of a circuit. We exercise the proposed approach on a wide variety of combinational circuits and observe that the runtime of our algorithm is linear in the size of the circuit. The runtimes for soft error estimation were observed to be on the order of 1 s, compared to several minutes or even hours for previously proposed methods.

  • Systematic Methodology for Designing Reconfigurable ∆Σ Modulator Topologies for Multimode Communication Systems

    Page(s): 480 - 496

    This paper proposes a systematic methodology for designing reconfigurable continuous-time ∆Σ modulator topologies. Topologies are optimized by minimizing the complexity of the topologies, maximizing the sharing of circuits between the different modes, maximizing the topology robustness with respect to circuit nonidealities, and minimizing the total power consumption. This paper presents a case study for designing topologies for a three-mode reconfigurable ∆Σ modulator and compares the obtained topologies with a state-of-the-art design and topologies obtained using the ∆Σ Toolbox.

  • Quantifier Structure in Search-Based Procedures for QBFs

    Page(s): 497 - 507

    The best currently available solvers for quantified Boolean formulas (QBFs) process their input in prenex form, i.e., all the quantifiers have to appear in the prefix of the formula separated from the purely propositional part representing the matrix. However, in many QBFs derived from applications, the propositional part is intertwined with the quantifier structure. To tackle this problem, the standard approach is to convert such QBFs into prenex form, thereby losing structural information about the prefix. In the case of search-based solvers, the prenex-form conversion introduces additional constraints on the branching heuristic and reduces the benefits of the learning mechanisms. In this paper, we show that conversion to prenex form is not necessary: current search-based solvers can be naturally extended in order to handle nonprenex QBFs and to exploit the original quantifier structure. We highlight the two mentioned drawbacks of the conversion to prenex form with a simple example, and we show that our ideas can also be useful for solving QBFs in prenex form. To validate our claims, we implemented our ideas in the state-of-the-art search-based solver QuBE and conducted an extensive experimental analysis. The results show that very substantial speedups can be obtained.

  • SLOPES: Hardware–Software Cosynthesis of Low-Power Real-Time Distributed Embedded Systems With Dynamically Reconfigurable FPGAs

    Page(s): 508 - 526

    In this paper, we present a multiobjective hardware-software cosynthesis system, called SLOPES, for multirate low-power real-time distributed embedded systems consisting of dynamically reconfigurable field-programmable gate arrays (FPGAs), processors, and heterogeneous communication resources. This cosynthesis algorithm simultaneously optimizes system price and average power consumption. First, we present an evolutionary algorithm that automatically determines the quantities and types of system resources, assigns tasks to different potentially reconfigurable processing elements, and assigns communication events to communication resources. Second, we propose a dynamic priority multirate scheduling algorithm to determine the times at which all the tasks and communication events in the system occur. This two-dimensional scheduling algorithm determines task priorities based on real-time constraints and detailed frame-by-frame FPGA reconfiguration overhead information. Experimental results indicate that the proposed method reduces schedule length by an average of 34.3% and reconfiguration energy by an average of 40.4%, compared to a method that does not consider the effect of partial reconfiguration during synthesis. SLOPES yields multiple system architectures that trade off system price and average power consumption under real-time constraints.

  • Timing-Aware Power-Noise Reduction in Placement

    Page(s): 527 - 541

    We describe a placement-level decoupling capacitance (decap) insertion technique whose objective is to reduce power noise, taking into account circuit timing. Our approach consists of prediction and correction steps. Before placement, we estimate the power noise of each cell considering the switching frequency of the cells that, after placement, will most likely be in its neighborhood. If a frequently switching cell has neighbors that switch infrequently, it is unlikely that this cell will suffer from a power-noise problem. Based on the cell power-noise estimation, we add decap padding to each cell. Then, we invoke a standard cell placement tool and perform power grid analysis. We eliminate the power grid noise by gate sizing. Our technique can allocate decaps to improve power noise, power consumption, and timing. We propose two gate-sizing algorithms. The first one uses a sequence of linear programs (SLP) formulation, and the second one uses a budgeting-based heuristic algorithm. The SLP algorithm can produce better power-noise results than the heuristic, at the expense of runtime. Experimental results show that our techniques can effectively reduce power noise and still meet timing constraints.

  • Automated Energy/Performance Macromodeling of Embedded Software

    Page(s): 542 - 552

    This paper presents an automatic methodology to perform characterization-based high-level software macromodeling. Macromodeling-based estimation can be used to speed up simulation-based software-performance/energy estimation. High-level software macromodels, which are significantly faster to evaluate than detailed models of the target hardware platform, can be used instead of the latter during simulation, resulting in orders-of-magnitude simulation speedup. However, in order to realize this potential, significant challenges need to be overcome in both the generation and use of macromodels, including how to identify the parameters to be used in the macromodel and how to define the template function to which the macromodel is fitted. The authors' methodology attempts to address the aforementioned issues. Given a subprogram to be macromodeled for execution time and/or energy consumption, the proposed methodology automates the steps of parameter identification, data collection through detailed simulation, macromodel template selection, and fitting. The authors propose a novel technique to identify potential macromodel parameters and perform data collection, which draws from the concept of data-structure serialization used in distributed programming. They utilize symbolic-regression techniques to concurrently filter out irrelevant macromodel parameters, construct a macromodel function, and derive the optimal coefficient values to minimize fitting error. Experiments with several realistic benchmarks suggest that the proposed methodology improves estimation accuracy and enables wide applicability of macromodeling to complex embedded software, while realizing its potential for estimation speedup.

  • Application-Dependent Delay Testing of FPGAs

    Page(s): 553 - 563

    Testing of field-programmable gate array (FPGA) resources used for mapping a particular design (application-dependent testing) is a key factor in FPGA defect tolerance for yield enhancement and cost reduction as well as online testing in adaptive reliable computing. The majority of the FPGA real estate is dedicated to the interconnect network, and defects in the interconnects manifest themselves as delay faults. In this paper, a very thorough application-dependent interconnect delay testing technique is presented. Achieving a high coverage on path delay fault has been traditionally intractable for application-specific integrated circuits. However, by leveraging the reconfigurability of FPGAs, the presented technique is able to achieve 100% robust path delay coverage on all the paths in the design. This automatically results in 100% transition fault coverage. The required number of test configurations is two or four, depending on the structure of the design. Algorithms with linear time complexity are presented for automatic test configuration and test vector generation.

  • Performance-Driven Crosstalk Elimination at Postcompiler Level—The Case of Low-Crosstalk Op-Code Assignment

    Page(s): 564 - 573

    Significant advances in very large-scale integration process technology have scaled the feature size down. One effect of this scaling is that coupling capacitances have grown in inverse proportion to the square of the scaling factor. This crosstalk effect will not only increase the power consumption but also lengthen the propagation delay. Since the data sequences on an instruction bus are known at compile time, this paper presents two compiler algorithms, rescheduling and renaming, for performance improvement by eliminating crosstalk effects on an instruction bus. The results show that our crosstalk-eliminating postcompiler algorithms significantly reduce the dynamic instruction overhead from 11.50% to 0.52% by eliminating the 4·C crosstalk. Due to the effective 4·C crosstalk elimination, our proposed method can improve the instruction fetch time by up to 9.59%.
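
    The abstract does not define the 4·C class. A convention often used in the crosstalk literature, assumed here, measures the effective coupling on a switching wire as |d_j - d_(j-1)| + |d_j - d_(j+1)| in units of the coupling capacitance C, where d is +1 for a rising wire, -1 for a falling wire, and 0 for a quiet one. The Python sketch below applies that assumed model to consecutive bus words; the function names are hypothetical, and it is not the paper's rescheduling or renaming algorithm.

```python
def deltas(old_word, new_word):
    """Per-wire transition: +1 rising, -1 falling, 0 unchanged.
    Words are equal-length bit strings such as "0101"."""
    return [int(new) - int(old) for old, new in zip(old_word, new_word)]


def crosstalk_classes(old_word, new_word):
    """Crosstalk class (0 to 4, in units of C) of each interior wire
    under the assumed model; edge wires are skipped for simplicity."""
    d = deltas(old_word, new_word)
    classes = []
    for j in range(1, len(d) - 1):
        if d[j] == 0:
            classes.append(0)  # a quiet wire suffers no delay penalty
        else:
            classes.append(abs(d[j] - d[j - 1]) + abs(d[j] - d[j + 1]))
    return classes


def has_4c(old_word, new_word):
    """True if any interior wire switches against both of its neighbors."""
    return 4 in crosstalk_classes(old_word, new_word)


if __name__ == "__main__":
    # Middle wire falls while both neighbors rise: the worst case.
    print(has_4c("010", "101"))
    # All wires rise together: no opposing transitions.
    print(has_4c("000", "111"))
```

    A compile-time pass could use such a check to flag consecutive instruction words on the bus, since the abstract notes that these sequences are known before execution.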

  • A Yield Model for Integrated Circuits and its Application to Statistical Timing Analysis

    Page(s): 574 - 591

    A model for process-induced parameter variations is proposed, combining die-to-die, within-die systematic, and within-die random variations. This model is put to use toward finding suitable timing margins and device file settings, to verify whether a circuit meets a desired timing yield. While this parameter model is cognizant of within-die correlations, it does not require specific variation models, layout information, or prior knowledge of intrachip covariance trends. The approach works with a "generic" critical path, leading to what is referred to as a "process-specific" statistical-timing-analysis technique that depends only on the process technology, transistor parameters, and circuit style. A key feature is that the variation model can be easily built from process data. The derived results are "full-chip," applicable with ease to circuits with millions of components. As such, this provides a way to do a statistical timing analysis without the need for detailed statistical analysis of every path in the design.

  • Efficient Verification of Hazard-Freedom in Gate-Level Timed Asynchronous Circuits

    Page(s): 592 - 605

    This paper presents an efficient method for verifying hazard-freedom in gate-level timed asynchronous circuits. Timed circuits are a class of asynchronous circuits that are optimized using explicit timing information. In asynchronous circuits, correct operation requires that there are no hazards in the circuit implementation. Therefore, when designing an asynchronous circuit, each internal node and output of the circuit must be verified for hazard-freedom to ensure correct operation. Current verification algorithms for timed circuits require an explicit state exploration that often results in state explosion for even modest-sized examples. The goal of this paper is to abstract the behavior of internal nodes and utilize this information to make a conservative determination of hazard-freedom for each node in the circuit. Experimental results indicate that this approach is substantially more efficient than existing timing verification tools. These results also indicate that this method scales well for large examples that could not be previously analyzed, in that it is capable of analyzing these circuits in less than a second. While this method is conservative in that some false hazards may be reported, our results indicate that their number is small.

  • On a Generalized Framework for Modeling the Effects of Process Variations on Circuit Delay Performance Using Response Surface Methodology

    Page(s): 606 - 614

    A generalized methodology for modeling the effects of process variations on circuit delay performance is proposed by directly relating the variations in process parameters to variations in the delay metric of a digital circuit. The 2-input NAND gate is used as a library element for 65 nm gate length technology, whose delay is extensively characterized by mixed-mode simulations. This information is then used in the general-purpose circuit simulator SEQUEL, by incorporating appropriate templates for the NAND gate library. A 4-bit × 4-bit Wallace tree multiplier circuit, consisting of about 300 2-input NAND gates, is used as a representative combinational circuit to demonstrate the proposed methodology. The variation in the multiplier delay is characterized by an extensive Monte Carlo analysis. To extend this methodology for a generic technology library with a variety of library elements, modeling of NAND gate delays by response surface methodology (RSM), in terms of process parameters, is carried out using design of experiments (DOE). A simple piecewise quadratic model, based on the least squares method (LSM), is proposed for one-parameter variation to address significant cubic effects observed in the delay response function. Then, a hybrid model for gate delays is generated by superimposing the interaction terms of the DOE-RSM model upon the quadratic model of one-parameter variation to address the generalized case of simultaneous variations in multiple process parameters. The proposed methodology has been demonstrated for a NAND gate library with 266 gates, and the simplicity and generality of the approach make it equally applicable to a large library of cells for both statistical timing analysis and statistical circuit simulation at the gate level.

  • 2007 IEEE International Symposium on Circuits and Systems (ISCAS 2007)

    Page(s): 615
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Information for authors

    Page(s): 616
    Freely Available from IEEE
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

The purpose of this Transactions is to publish papers of interest to individuals in the areas of computer-aided design of integrated circuits and systems.

Meet Our Editors

Editor-in-Chief

VIJAYKRISHNAN NARAYANAN
Pennsylvania State University
Dept. of Computer Science and Engineering
354D IST Building
University Park, PA 16802, USA
vijay@cse.psu.edu