
Computer Aided Design, 2004. ICCAD-2004. IEEE/ACM International Conference on

Date 7-11 Nov. 2004

  • Accurate estimation of global buffer delay within a floorplan

    Page(s): 706 - 711

    Closed-form expressions for buffered interconnect delay approximation have been around for some time. However, previous approaches assume that buffers are free to be placed anywhere. In practice, designs frequently have large blocks that make the ideal buffer insertion solution unrealizable. The theory of Otten (1998) is extended to show how the blocks can be incorporated into a simple delay estimation technique that applies to both two-pin and multi-pin nets. Even though the formula uses a single buffer type, it predicts delay with remarkable accuracy when compared to an optimal realizable buffer insertion solution. Potential applications include wire planning and timing analysis during floorplanning or global routing. Our experiments show that our approach accurately predicts delay when compared to constructing a realizable buffer insertion with multiple buffer types.

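    As a rough illustration of the kind of closed-form estimate the abstract builds on, the sketch below computes the optimal buffer spacing and the minimum delay per unit length of an unobstructed, uniformly buffered wire under an Elmore delay model with one buffer type. Per segment of length l, the delay per unit length is (Tb + Rb*Cb)/l + Rb*c + r*Cb + r*c*l/2, which is minimized at l = sqrt(2(Tb + Rb*Cb)/(r*c)). The parameter names (wire resistance `r` and capacitance `c` per unit length, buffer output resistance `Rb`, input capacitance `Cb`, intrinsic delay `Tb`) are assumptions for this example, and the paper's treatment of blocks and multi-pin nets is not reproduced.

```python
import math


def segment_delay(length, r, c, Rb, Cb, Tb):
    """Elmore delay of one buffer driving a uniform wire of `length` into the next buffer."""
    # Intrinsic buffer delay + driver resistance charging wire and load
    # + wire resistance charging the load + distributed wire RC term.
    return Tb + Rb * (c * length + Cb) + r * length * Cb + 0.5 * r * c * length ** 2


def optimal_spacing(r, c, Rb, Cb, Tb):
    """Buffer spacing that minimizes delay per unit length (set the derivative to zero)."""
    return math.sqrt(2.0 * (Tb + Rb * Cb) / (r * c))


def delay_per_length(r, c, Rb, Cb, Tb):
    """Minimum delay per unit length of an ideally buffered, unobstructed line."""
    return Rb * c + r * Cb + math.sqrt(2.0 * r * c * (Tb + Rb * Cb))


def estimate_net_delay(distance, r, c, Rb, Cb, Tb):
    """Linear delay estimate for a two-pin net when buffers can be placed anywhere."""
    return distance * delay_per_length(r, c, Rb, Cb, Tb)


if __name__ == "__main__":
    params = dict(r=1.0, c=1.0, Rb=2.0, Cb=0.5, Tb=1.0)  # arbitrary consistent units
    print(optimal_spacing(**params))   # 2.0
    print(delay_per_length(**params))  # 4.5
```

    Because the estimate is linear in distance, it is cheap enough to evaluate for every net during floorplanning; blockages that force buffers off their ideal positions are exactly what this simple form ignores.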
  • Simultaneous short-path and long-path timing optimization for FPGAs

    Page(s): 838 - 845

    This work presents the routing cost valleys (RCV) algorithm - the first published algorithm that simultaneously optimizes all short- and long-path timing constraints in a field-programmable gate array (FPGA). RCV consists of a new slack allocation algorithm that produces both minimum and maximum delay budgets for each circuit connection, and a new router that strives to meet and, if possible, surpass these connection delay constraints. RCV achieves excellent results. On a set of 100 large circuits, RCV significantly improves both long-path and short-path timing slack compared with an earlier computer-aided design (CAD) system that focuses solely on long-path timing. Even with no short-path timing constraints, RCV improves the clock speed of circuits by 3.9% on average. Finally, RCV meets timing on all 72 peripheral component interconnect (PCI) cores tested, while an earlier algorithm fails to achieve timing on all 72 cores.

  • A stochastic integral equation method for modeling the rough surface effect on interconnect capacitance

    Page(s): 887 - 891

    In this work we describe a stochastic integral equation method for computing the mean value and the variance of the capacitance of interconnects with random surface roughness. An ensemble-average Green's function is combined with a matrix Neumann expansion to compute the nominal capacitance and its variance. This method avoids time-consuming Monte Carlo simulations and the discretization of rough surfaces. Numerical experiments show that the results of the new method agree very well with Monte Carlo simulation results.

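    The matrix Neumann expansion mentioned above rests on the identity (A0 + dA)^-1 = sum over k of (-A0^-1 dA)^k A0^-1, valid when the spectral radius of A0^-1 dA is below 1. The sketch below applies a truncated version of that series to a small dense system in plain Python. It is a deterministic illustration only: the ensemble-average Green's function, the integral-equation discretization, and the variance computation from the paper are not reproduced, and the function names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (for small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def neumann_solve(A0, dA, b, order):
    """Approximate (A0 + dA)^-1 b with the Neumann series truncated after `order` terms.

    term_0 = A0^-1 b, term_{k+1} = -A0^-1 (dA term_k); the result is the sum of all terms.
    Only systems with the unperturbed matrix A0 are ever solved.
    """
    term = solve(A0, b)
    total = term[:]
    for _ in range(order):
        term = [-v for v in solve(A0, matvec(dA, term))]
        total = [t + v for t, v in zip(total, term)]
    return total
```

    The appeal in a roughness setting is that A0 (the smooth-surface problem) is factored once, while the random perturbation dA only enters through matrix-vector products.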
  • Robust analog/RF circuit design with projection-based posynomial modeling

    Page(s): 855 - 862

    We propose a robust analog design tool (ROAD) for post-tuning analog/RF circuits. Starting from an initial design derived from hand analysis or analog circuit synthesis based on simplified models, ROAD extracts accurate posynomial performance models via transistor-level simulation and optimizes the circuit by geometric programming. Importantly, ROAD sets up all design constraints to include large-scale process variations to facilitate the tradeoff between yield and performance. A novel convex formulation of the robust design problem is utilized to improve the optimization efficiency and to produce a solution that is superior to other local tuning methods. In addition, a novel projection-based approach for posynomial fitting is used to facilitate scaling to large problem sizes. A new implicit power iteration algorithm is proposed to find the optimal projection space and extract the posynomial coefficients with robust convergence. The efficacy of ROAD is demonstrated on several circuit examples.

  • Application-specific buffer space allocation for networks-on-chip router design

    Page(s): 354 - 361

    We present a system-level buffer planning algorithm that can be used to customize the router design in networks-on-chip (NoCs). More precisely, given the traffic characteristics of the target application and the buffering space budget, our algorithm automatically assigns the buffer depth for each input channel, in different routers across the chip, to match the communication pattern, such that the overall performance is maximized. This is in sharp contrast to the uniform assignment of buffering resources (currently used in NoC design), which can significantly degrade overall system performance. For instance, for a complex audio/video application, about 85% savings in buffering resources can be achieved by smart buffer allocation using our algorithm without any reduction in performance.

  • Optimal wire retiming without binary search

    Page(s): 452 - 458

    The problem of retiming over a netlist of macro-blocks to achieve the minimal clock period, where the block internal structures may not be changed and flip-flops may not be inserted on some wire segments, is called the optimal wire retiming problem. To the best of our knowledge, no polynomial-time approach to this problem has previously been published, and whether one exists has remained an open question. We present a new algorithm that solves the optimal wire retiming problem with polynomial worst-case time complexity. Since the new algorithm avoids binary search and is essentially incremental, it has the potential to be combined with other optimization techniques. Experimental results show that the new algorithm is very efficient in practice.

  • A novel clock distribution and dynamic de-skewing methodology

    Page(s): 626 - 631

    In present-day VLSI ICs, intra-die processing variations are becoming harder to control, resulting in a large skew in the clock signals at the end of the clock distribution network. We describe a buffered H-tree technique to distribute the clock signal and to de-skew a clock network. The clock shielding wires (which are connected to GND in normal operation) are, in de-skewing mode, used to selectively return the clock signal for de-skewing, and for serial communication with the clock distribution sites for skew adjustment. Our forward and return clock networks are buffered, with identically sized and co-located wires and buffers. As a result, both networks exhibit identical delay characteristics in the presence of intra-die process variations. Unlike existing approaches, our method uses a single phase detection circuit and can achieve a very low maximum chip-level clock skew. This skew value does not depend on the resolution of the phase detector. Further, our technique can be applied dynamically, either at boot time or periodically during the operation of the IC, as necessary. Additionally, our buffered H-tree enables efficient clock gating by allowing the user to turn off clocks in the distribution network itself, thus disabling entire sections of the clock network. We demonstrate the utility of our technique on a 6-level H-tree clock distribution network. In a clock distribution network that is initially skewed by up to 300ps, our technique can de-skew signals to within 4ps of each other. We show that the total wiring area of our clock distribution and de-skewing methodology is about 35% higher than that of a traditional H-tree (which has no de-skewing functionality), while the active logic area overhead is about 25%. The power consumption of our network is 5% lower than that of a traditional H-tree network with no de-skewing functionality.

  • A chip-level electrostatic discharge simulation strategy

    Page(s): 315 - 318

    This work presents a chip-level charged device model (CDM) electrostatic discharge (ESD) simulation method. The chip-level simulation is formulated as a DC analysis problem. A network reduction algorithm based on random walks is proposed for rapid analysis, and to support incremental design. A benchmark with a 2.3M-node VDD net and 1000 I/O pads is checked in 13 minutes, and 10 re-simulations for incremental changes take a total of 9 minutes.

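    The abstract does not spell out its random-walk network reduction, so the sketch below shows only the generic random-walk idea for DC analysis of a resistive network with no current injections. By Kirchhoff's current law, each free node's voltage is the conductance-weighted average of its neighbors' voltages, so a walk that steps to neighbor i with probability g_i / sum(g) and reports the voltage of the fixed node where it stops is an unbiased estimator of the starting node's voltage. The function name, the adjacency format, and the walk count are hypothetical; current sources, CDM modeling, and incremental updates are not modeled.

```python
import random


def random_walk_voltage(neighbors, fixed, start, walks=20000, seed=0):
    """Monte Carlo estimate of the DC voltage at `start` in a source-free resistive network.

    neighbors: dict mapping each free node to a list of (neighbor, conductance) pairs.
    fixed:     dict mapping boundary nodes (e.g. pads) to known voltages; walks stop there.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        node = start
        while node not in fixed:
            nodes = [n for n, _ in neighbors[node]]
            weights = [g for _, g in neighbors[node]]
            # Step to a neighbor with probability proportional to its conductance.
            node = rng.choices(nodes, weights=weights)[0]
        total += fixed[node]
    return total / walks


if __name__ == "__main__":
    # Chain a - b - c - d of equal resistors with a held at 1 V and d at 0 V.
    net = {"b": [("a", 1.0), ("c", 1.0)], "c": [("b", 1.0), ("d", 1.0)]}
    print(random_walk_voltage(net, {"a": 1.0, "d": 0.0}, "b"))  # close to 2/3
```

    A property that makes this family of methods attractive for incremental design is locality: each node is estimated independently, so after a small netlist change only the affected nodes need to be re-sampled.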
  • Computation of signal threshold crossing times directly from higher order moments

    Page(s): 246 - 253

    This work introduces a simple method for calculating the times at which any signal crosses a pre-specified threshold voltage (e.g. 10%, 20%, 50%, etc.) directly from the moments. The method can use higher-order moments to asymptotically improve the accuracy of the estimated crossing times. This technique bypasses the steps involved in calculating poles and residues to obtain time-domain information. Once q moments are calculated, only 2q multiplications and (q-1) additions are required to determine any threshold crossing time at a given node. Moreover, this technique avoids other problems such as pole instability. Several orders of approximation are presented for different threshold crossing times, depending on the number of moments involved. For example, the worst-case errors of first- through seventh-order (one to seven moments) approximations of the 50% RC delay are 1650%, 192.26%, 11.31%, 3.37%, 2.57%, 2.56%, and 1.43%, respectively. If the whole waveform is required, it can easily be determined by interpolating between different threshold crossing points. The presented technique works for RC circuits with both step and non-step inputs, including piecewise linear waveforms.

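    For contrast with the higher-order method described above, the sketch below shows the simplest way to turn a moment into a crossing time: a single-pole fit to the step response using the first moment (the Elmore delay) as the time constant. This is not the paper's formula; it assumes v(t) = 1 - exp(-t/tau) with tau equal to the magnitude of the first moment, so the crossing time for a normalized threshold theta is -tau * ln(1 - theta). The function name is hypothetical.

```python
import math


def single_pole_crossing(m1, threshold):
    """Threshold-crossing time of a single-pole step response.

    m1:        magnitude of the first moment (Elmore delay), used as the time constant.
    threshold: normalized threshold in (0, 1), e.g. 0.5 for the 50% delay.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return -m1 * math.log(1.0 - threshold)


if __name__ == "__main__":
    tau = 1.0
    print(single_pole_crossing(tau, 0.5))  # ln 2, about 0.693 * tau
    # 10%-90% rise time of a single pole is tau * ln 9.
    print(single_pole_crossing(tau, 0.9) - single_pole_crossing(tau, 0.1))
```

    A one-moment model like this cannot capture the shape of the response at a node far from the driver, which is why additional moments are needed to reach the accuracies quoted in the abstract.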
  • Hybrid techniques for electrostatic analysis of nanowires

    Page(s): 241 - 244

    We propose an efficient approach, namely the hybrid BIE/Poisson/Schrodinger approach, for electrostatic analysis of nanowires. In this approach, the interior and the exterior domain electrostatics are described by Poisson's equation (or Poisson's equation coupled with Schrodinger's equation when quantum-mechanical effects are dominant) and the boundary integral formulation of the potential equation, respectively. We employ a meshless finite cloud method and a boundary cloud method to solve the coupled equations self-consistently. The proposed approach significantly reduces the computational cost and provides a higher accuracy of the solution.

  • Adaptive sampling and modeling of analog circuit performance parameters with pseudo-cubic splines

    Page(s): 931 - 938

    Many approaches to analog performance parameter macro modeling have been investigated by the research community. These models are typically derived from discrete data obtained from circuit simulation using numerous input combinations of component sizes for a given circuit topology. The simulations are computationally intensive; therefore, it is advantageous to reduce the number of simulations necessary to build an accurate macro model. We present a new algorithm for adaptively sampling multi-dimensional black-box functions based on Duchon pseudo-cubic splines. The splines readily and accurately model high-dimensional functions from discrete unstructured data and, unlike many other interpolation methods, require no parameter tuning. The adaptive sampler, in conjunction with pseudo-cubic splines, is used to accurately model various analog performance parameters for an operational amplifier topology using fewer sample points than traditional gridded and quasi-random sampling methodologies.

  • FLUTE: fast lookup table based wirelength estimation technique

    Page(s): 696 - 701

    Wirelength estimation is an important tool for guiding design optimization in early design stages. In this paper, we present a wirelength estimation technique called FLUTE. Our technique is based on a pre-computed lookup table, which makes wirelength estimation very fast and very accurate for low-degree nets. We show experimentally that for FLUTE, RMST, and HPWL, the average wirelength errors are 0.72%, 4.23%, and -8.71%, respectively, and the normalized runtimes are 1, 1.24, and 0.16, respectively.

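    The two baselines in the comparison above are easy to state precisely. The sketch below computes the half-perimeter wirelength (HPWL) of a net's bounding box and the length of a rectilinear minimum spanning tree (RMST) using Prim's algorithm with Manhattan distances. It does not implement FLUTE's lookup tables; the function names are chosen for this example.

```python
def hpwl(pins):
    """Half-perimeter of the bounding box of the pins (a lower bound on Steiner tree length)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def rmst(pins):
    """Length of a rectilinear minimum spanning tree, O(n^2) Prim's algorithm."""
    n = len(pins)
    if n < 2:
        return 0
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest Manhattan connection to the current tree
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        ux, uy = pins[u]
        for v in range(n):
            if not in_tree[v]:
                d = abs(ux - pins[v][0]) + abs(uy - pins[v][1])
                if d < best[v]:
                    best[v] = d
    return total


if __name__ == "__main__":
    net = [(0, 0), (4, 0), (2, 3)]
    print(hpwl(net))  # 7
    print(rmst(net))  # 9
```

    For this three-pin net the optimal rectilinear Steiner tree (a Steiner point at (2, 0)) has length 7, equal to the HPWL, while the spanning tree overestimates at 9. This matches the signs of the errors reported in the abstract: RMST tends to overestimate and HPWL to underestimate on larger nets.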
  • Backend CAD flows for "restrictive design rules"

    Page(s): 739 - 746

    To meet the challenges of deep-subwavelength technologies (particularly 130 nm and beyond), lithography has come to rely increasingly on data processes such as shape fill, optical proximity correction (OPC), and resolution enhancement techniques (RETs) such as altPSM. For emerging technologies (65 nm and beyond), the computational cost and complexity of these techniques are themselves becoming bottlenecks in the design-to-silicon flow. This has motivated recent calls for restrictive design rules, such as fixed width/pitch/orientation of gate-forming polysilicon features. We have been exploring how design might take advantage of these restrictions, and we present some preliminary ideas for reducing computational cost throughout the back end of the design flow, through the post-tapeout data processes, while improving quality of results: the reliability of OPC/RET algorithms and the accuracy of models of manufactured products. We also believe that the underlying technology, including simulation and analysis, may be applicable to a variety of approaches to design for manufacturability (DFM).

  • Minimizing the number of test configurations for FPGAs

    Page(s): 899 - 902

    FPGA test cost can be greatly reduced by minimizing the number of test configurations. A test technique is presented for FPGAs with multiplexer-based routing architectures in which multiple logical paths through each multiplexer are enabled instead of only one path. It is shown that for Xilinx Virtex-II and Spartan-3 FPGAs, only 8 test configurations are required to achieve 100% stuck-at, PIP stuck-on, and PIP stuck-off fault coverage.

  • Leakage control through fine-grained placement and sizing of sleep transistors

    Page(s): 533 - 536

    Leakage power is becoming increasingly important with technology scaling. Multi-threshold CMOS (MTCMOS) technology has become a popular technique for standby power reduction. Sleep transistor insertion is an effective application of MTCMOS technology for reducing leakage power. In this work we present a fine-grained approach in which each gate in the circuit is given an independent sleep transistor. Key advantages of this approach include better utilization of circuit slack and improved signal integrity (poor signal integrity is a major disadvantage of clustering-based approaches). To this end, we propose an optimal polynomial-time fine-grained sleep transistor sizing algorithm. We also prove that the selective sleep transistor placement problem is NP-complete and propose an effective heuristic. Finally, to reduce the sleep transistor area penalty (which can be high since clustering is not performed), we propose a placement-area-constrained sleep transistor sizing formulation. Our experiments show that, on average, the sleep transistor placement and optimal sizing algorithm gave 69.7% and 59.0% savings in leakage power compared with conventional fixed-delay-penalty algorithms for 5% and 7% circuit slowdown, respectively. Moreover, the post-placement area penalty was less than 5%, which is comparable to clustering schemes according to Mohab Anis et al. (2003).

  • Clock schedule verification under process variations

    Page(s): 619 - 625

    With aggressive scaling of feature sizes in VLSI fabrication, process variations have become a critical issue in design, especially for high-performance ICs. Because high-performance IC designs usually use level-sensitive latches for their speed, their clock schedules need to be verified. With process variations, the verification must compute the probability of correct clocking. Because of complex statistical correlations, it is difficult for traditional iterative approaches to obtain accurate results. Instead, a statistical check of the structural conditions for correct clocking is proposed, where the central problem is to compute the probability of having a positive cycle in a graph with random edge weights. The proposed method traverses the graph only once to avoid correlations among iterations, and it considers not only data delay variations but also clock skew variations. Experimental results show that the proposed approach has an average error of 0.14% in comparison with Monte Carlo simulations.

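    The central quantity named above, the probability that a graph with random edge weights contains a positive cycle, can be estimated by brute force, which is the kind of Monte Carlo reference the abstract compares against. The sketch below samples edge weights, negates them, and runs Bellman-Ford from a virtual source to detect a negative (originally positive) cycle. It assumes independent Gaussian edge weights, which ignores the correlations the paper's single-traversal method is designed to handle; the function names and the edge tuple format are hypothetical.

```python
import random


def has_positive_cycle(n, edges):
    """Detect a positive-weight cycle among nodes 0..n-1; edges are (u, v, weight).

    Bellman-Ford on negated weights, with a virtual source connected to every
    node at weight 0 (modeled by initializing all distances to 0). If distances
    still change after n rounds of relaxation, a negative cycle exists.
    """
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] - w < dist[v]:
                dist[v] = dist[u] - w
                changed = True
        if not changed:
            return False
    return True


def prob_positive_cycle(n, edges, trials=2000, seed=0):
    """Monte Carlo estimate; edges are (u, v, mean, sigma) with independent Gaussian weights."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [(u, v, rng.gauss(mu, sigma)) for u, v, mu, sigma in edges]
        if has_positive_cycle(n, sample):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Two-edge loop whose total weight is N(0, 2): positive about half the time.
    print(prob_positive_cycle(2, [(0, 1, 0.0, 1.0), (1, 0, 0.0, 1.0)]))
```

    Each trial costs a full Bellman-Ford run, which is what makes sampling expensive on large latch graphs and motivates an analytical, single-pass alternative.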
  • Timing macro-modeling of IP blocks with crosstalk

    Page(s): 155 - 159

    With the increase of design complexities and the decrease of minimal feature sizes, IP reuse is becoming a common practice while crosstalk is becoming a critical issue that must be considered. This work presents two macro-models for specifying the timing behaviors of combinational hard IP blocks with crosstalk effects. The gray-box model keeps a coupling graph and lists the conditions on relative input arrival time combinations for couplings not to take effect. The black-box model stores the output response windows for a basic set of relative input arrival time combinations, and computes the output arrival time for any given input arrival time combination through the union of some combinations in the basic set. Both macro-models are conservative, and can greatly reduce the pessimism existing in the conventional "pin-to-pin" model. This is the first work to deal with timing macro-modeling of combinational hard IP blocks with the consideration of crosstalk effects.

  • Techniques for improving the accuracy of geometric-programming based analog circuit design optimization

    Page(s): 863 - 870

    We present techniques for improving the accuracy of geometric-programming (GP) based analog circuit design optimization. We describe major sources of discrepancies between the results from optimization and simulation, and propose several methods to reduce the error. Device modeling based on convex piecewise-linear (PWL) function fitting is introduced to create accurate active and passive device models. We also show that in selected cases GP can enable nonconvex constraints such as bias constraints using monotonicity, which help reduce the error. Lastly, we suggest a simple method to take the modeling error into account in GP optimization, which results in a robust design over the inherent errors in GP device models. Two-stage operational amplifier and on-chip spiral inductor designs are given as examples to demonstrate the presented ideas.

  • Efficient full-chip thermal modeling and analysis

    Page(s): 319 - 326

    The ever-increasing power consumption and packaging density of integrated systems create on-chip temperatures and gradients that can substantially affect performance and reliability. While it is conceptually understood that a thermal equivalent circuit can be constructed to characterize the temperature gradients across the chip, direct and iterative solutions of the corresponding 3D equations are often intractable for full-chip analysis. Multigrid-accelerated iterative methods can be applied to solve the equivalent circuit problem, which is provably symmetric positive definite; however, explicitly building the matrix is intractable for most full-chip problems. In this work we present a multigrid iterative approach for full-chip thermal analysis that does not require explicit construction of the equivalent circuit matrix. We propose specific multigrid treatments to cope with the strong anisotropy of the full-chip thermal problem, which is created by the vast differences in material thermal properties and chip geometries. Importantly, we demonstrate that a full-chip thermal problem can be analyzed accurately and efficiently only with careful thermal modeling assumptions and appropriate choices of grid hierarchy, multigrid operators, and smoothing steps across grid points. Experimental results demonstrate the efficacy of the proposed multigrid methodology. Our prototype thermal simulator solves a steady-state problem with more than 10 million unknowns in 125 CPU seconds with a peak memory usage of 231 megabytes.

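    The phrase "does not require explicit construction of the equivalent circuit matrix" can be illustrated with the smoothing step that multigrid relies on. The sketch below performs matrix-free Gauss-Seidel sweeps on a 1D chain of thermal conductances: each interior node satisfies k[i-1](T[i-1] - T[i]) + k[i](T[i+1] - T[i]) + q[i] = 0, which is solved for T[i] from its neighbors directly, so no matrix is ever assembled. It is a deliberately small assumption-laden example: 1D instead of 3D, fixed boundary temperatures, and no coarse-grid correction, so on its own it converges far more slowly than the multigrid method described in the abstract. The function name and argument layout are hypothetical.

```python
def gauss_seidel_sweeps(T, k, q, sweeps):
    """In-place matrix-free Gauss-Seidel on a 1D thermal resistor chain.

    T: node temperatures; T[0] and T[-1] are fixed (Dirichlet) boundaries.
    k: k[i] is the thermal conductance between node i and node i + 1 (len(T) - 1 entries).
    q: heat injected at each node (len(T) entries; boundary entries are ignored).
    """
    for _ in range(sweeps):
        for i in range(1, len(T) - 1):
            # Local heat balance solved for T[i] using the latest neighbor values.
            T[i] = (k[i - 1] * T[i - 1] + k[i] * T[i + 1] + q[i]) / (k[i - 1] + k[i])
    return T


if __name__ == "__main__":
    # Uniform chain, no heat sources, ends held at 0 and 1: the solution is linear.
    print(gauss_seidel_sweeps([0.0] * 6 + [1.0], [1.0] * 6, [0.0] * 7, sweeps=500))
```

    Gauss-Seidel damps high-frequency error quickly but smooth error slowly, and strongly varying conductances (the anisotropy mentioned above) make this worse; multigrid addresses it by correcting the smooth error on coarser grids.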
  • Fast simulation of VLSI interconnects

    Page(s): 93 - 98

    This work introduces an efficient and accurate interconnect simulation technique. A new formulation for typical VLSI interconnect structures is proposed that, in addition to providing a compact set of modeling equations, offers the potential to exploit sparsity at the simulation level. Simulations show that our approach can achieve a 50× improvement in computation time and memory over INDUCTWISE (which in turn has been shown to be 400× faster than SPICE) while preserving simulation accuracy.

  • High-level synthesis: an essential ingredient for designing complex ASICs

    Page(s): 775 - 782

    It is common wisdom that synthesizing hardware from higher-level descriptions than Verilog incurs a performance penalty. The case study here shows that this need not be the case. If the higher-level language has suitable semantics, it is possible to synthesize hardware that is competitive with hand-written Verilog RTL. Differences in the hardware quality are dominated by architecture differences and, therefore, it is more important to explore multiple hardware architectures. This exploration is not practical without quality synthesis from higher-level languages.

  • A robust cell-level crosstalk delay change analysis

    Page(s): 147 - 154

    In this work we present a robust and efficient methodology for crosstalk-induced delay change analysis in ASIC design styles. The approach employs optimization methods to search for the worst aggressor alignment, and it computes the crosstalk-induced delay change at each stage, considering the impact on downstream logic. Computational efficiency is achieved using pre-characterized current models for drivers and compact macromodels for interconnect. The proposed methodology has been implemented in a commercial noise analysis tool. Experimental results obtained on industrial designs demonstrate the high accuracy and reduced pessimism of the proposed methodology.
