Computer Design, 2001. ICCD 2001. Proceedings. 2001 International Conference on

Date 23-26 Sept. 2001

  • Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001

    Freely Available from IEEE
  • The in-car computing network: an embedded systems challenge

    Page(s): 3

    Summary form only given, as follows. Modern vehicles are in fact computer networks on wheels. Up to 60 electronic control units are connected using various networking technologies such as CAN (Controller Area Network) or MOST (optical fibres for multimedia content). Many of the control units use state-of-the-art microcontrollers and have complex analog and digital interfacing circuitry. Software is playing an ever-increasing role in the definition of vehicle control functions. The talk gives an overview of modern vehicle electronic architecture and its inherent hardware and software challenges. The growth fields of automotive electronics will be highlighted and examples of next-generation applications will be given.
  • Moore's law meets Shannon's law: the evolution of the communication's industry

    Page(s): 5

    Summary form only given, as follows. The insatiable demand for data and connectivity at the user level, driven primarily by the ever-increasing horsepower of the desktop computer, has dramatically impacted the evolution of the communications market. In a period of 20 years we have progressed from 300 baud modems to multi-terabit fiber backbones. However, the downside to rapid evolution is that it often creates industry fragmentation and broad market swings. The presenter will briefly review the history of the communications industry, demonstrating how it has evolved in relation to the desire and need to move vast amounts of data swiftly and cost-effectively. He will also examine the industry today: what new demands and requirements are testing today's designers, how the current technology constraints are being addressed, and how and where Intel intends to play. Finally, he will discuss the complex requirements of the communications industry moving into the future and the likely direction of potential solutions to meet these rapidly changing demands.
  • Gate sizing to eliminate crosstalk induced timing violation

    Page(s): 186 - 191

    Digital circuits manufactured in deep sub-micron technologies may experience crosstalk-induced delay and noise signals. Crosstalk-induced delay can be quite significant and sensitive to the driver strength of coupling neighbors. In this paper, we propose gate-sizing techniques to reduce delay in the presence of crosstalk effects. The techniques are based on our previously proposed (2001) crosstalk-aware static timing analysis. Our experiments show that the proposed techniques are effective and may help designers achieve faster timing closure.
  • Author index

    Page(s): 557 - 559
    Freely Available from IEEE
  • Arithmetic transforms for verifying compositions of sequential datapaths

    Page(s): 348 - 353

    We address the issue of obtaining compact canonical representations of datapath circuits with sequential elements, for the purpose of equivalence checking. First, we demonstrate the mechanisms for efficient compositional construction of the arithmetic transform (AT), which is the underlying function representation used in modern word-level decision diagrams. Next, we introduce a way of generating the canonical transforms of the sequential datapath circuits. Using these principles, we verify by AT the highly sequential distributed arithmetic architectures.
  • Hierarchical image computation with dynamic conjunction scheduling

    Page(s): 354 - 359

    Image computation is the core operation for optimization and formal verification of sequential systems like controllers or protocols. State exploration techniques based on ordered binary decision diagrams (OBDDs) use a partitioned representation of the transition relation to keep the OBDD-sizes manageable. This paper presents algorithms for building a hierarchically partitioned transition relation and conjunction scheduling based on this partitioning. The conjunction scheduling algorithm allows one to dynamically reorder partitions and is targeted to improve the AndExist operation. Model checking experiments prove the effectiveness of the new algorithms.
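    As a rough illustration of what image computation over a partitioned transition relation means, the sketch below uses explicit state sets instead of OBDDs. It is not the paper's algorithm: the function names, the predicate form of each partition, and the brute-force enumeration are all assumptions made for this example.

```python
from itertools import product


def image(states, partitions, n_bits):
    """Return every state reachable in one step from `states`.

    `partitions` is a list of predicates part(cur, nxt) -> bool whose
    conjunction forms the transition relation (hypothetical encoding).
    """
    result = set()
    for cur in states:
        # Brute-force stand-in for existential quantification over `cur`.
        for nxt in product((0, 1), repeat=n_bits):
            if all(part(cur, nxt) for part in partitions):
                result.add(nxt)
    return result


def reachable(initial, partitions, n_bits):
    """Iterate image computation until no new states appear."""
    seen = set(initial)
    frontier = set(initial)
    while frontier:
        frontier = image(frontier, partitions, n_bits) - seen
        seen |= frontier
    return seen
```

    In this toy form the order of `partitions` changes only how early a candidate is rejected; in an OBDD-based tool the conjunction order affects intermediate diagram sizes, which is the cost that scheduling aims to control.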
  • A performance analysis of the active memory system

    Page(s): 493 - 496

    One major problem of using Java in real-time and embedded devices is the non-deterministic turnaround time of dynamic memory management systems (memory allocation and garbage collection). For the allocation, the nondeterminism is often contributed by the time to perform searching, splitting, and coalescing. For the garbage collection, the turnaround time is usually determined by the size of the heap, the number of live objects, the number of objects collected, and the amount of garbage collected. Even the current state-of-the-art garbage collectors (generational and incremental schemes) may or may not guarantee the worst-case latency. Moreover, such schemes often prolong overall garbage collection time. In this paper, the performance analysis of the proposed Active Memory Module (AMM) for embedded systems is presented. Unlike the software counterparts, the AMM can perform a memory allocation in a predictable and bounded fashion (14 cycles). Moreover, it can also yield a bounded sweeping time regardless of the number of live objects or heap size. By utilizing the proposed system, the overall speed-up can be as high as 23% over the JDK 1.2.2 running in classic mode.
  • Combined IEEE compliant and truncated floating point multipliers for reduced power dissipation

    Page(s): 497 - 500

    Truncated multiplication can be used to significantly reduce power dissipation for applications that do not require correctly rounded results. This paper presents a power efficient method for designing floating point multipliers that can perform either correctly rounded IEEE compliant multiplication or truncated multiplication, based on an input control signal. Compared to conventional IEEE floating point multipliers, these multipliers require only a small amount of additional area and delay, yet provide a significant reduction in power dissipation for applications that do not require IEEE compliant results.
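    To make the idea of truncated multiplication concrete, here is a minimal sketch for unsigned integers. It models the general technique (not forming the partial-product bits in the least-significant columns), not the floating point design in the paper; the function names and the parameter `k` are assumptions for this example, and no correction constant is modeled.

```python
def truncated_multiply(a: int, b: int, n: int, k: int) -> int:
    """Multiply two n-bit unsigned integers while skipping every
    partial-product bit that falls in the k least-significant columns."""
    total = 0
    for i in range(n):              # bit position in a
        if not (a >> i) & 1:
            continue
        for j in range(n):          # bit position in b
            if (b >> j) & 1 and i + j >= k:
                total += 1 << (i + j)
    return total


def max_truncation_error(k: int) -> int:
    """Upper bound on the dropped value, assuming k <= n:
    column c holds at most c + 1 partial-product bits."""
    return sum((c + 1) << c for c in range(k))
```

    With `k = 0` the result is the exact product. Larger `k` removes more of the partial-product array, which is where a hardware implementation would save switching activity, at the cost of an error bounded by `max_truncation_error(k)`.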
  • A low-power cache design for CalmRISC™-based systems

    Page(s): 394 - 399

    Lowering power consumption in microprocessors, whether used in portables or not, has now become one of the most critical design concerns. On-chip cache memories tend to occupy dominant chip area in microprocessors, and it becomes increasingly important to design power-efficient cache memories. This paper describes an experimental low-power on-chip cache system designed for a 32-bit processor core called CalmRISC™-32. A number of architectural optimizations were applied to the instruction and data caches, which significantly decrease the number of tag and data memory accesses and the amount of memory traffic to and from off-chip memory. Implemented in a 0.18 μm CMOS technology, the presented instruction and data caches consume 90 μA/MHz and 72 μA/MHz at 1.8 V, respectively.
  • Pre-routing estimation of shielding for RLC signal integrity

    Page(s): 553 - 556

    The formula-based Keff model is a figure of merit for inductive coupling, and has been used to solve the simultaneous shield insertion and net ordering (SINO) and simultaneous signal and power routing (SPR) problems. The authors first show that the Keff model has a high fidelity compared to the SPICE-computed noise under an accurate RLC circuit model. We then develop simple yet accurate formulae to estimate numbers of shields needed by optimal SINO solutions under the Keff model. Extensive experiments show that our pre-routing estimation has errors less than 10% compared to solutions given by detailed SINO algorithms. These formulae can be used effectively as a pre-routing congestion estimation for layout planning and synthesis.
  • Introduction to generalized symbolic trajectory evaluation

    Page(s): 360 - 365

    Symbolic trajectory evaluation (STE) is a lattice-based model checking technology based on a form of symbolic simulation. It offers an alternative to 'classical' symbolic model checking that, within its domain of applicability, often is much easier to use and much less sensitive to state explosion. The limitation of STE, however, is that it can only express and verify properties over finite time intervals. In this paper, we present a generalized STE (GSTE) that extends STE style model checking to properties over infinite time intervals. We further strengthen the power of GSTE by introducing a form of backward symbolic simulation. It can be shown that these extensions, together with a notion of fairness, give STE the power to verify all ω-regular properties. We use a large-scale industrial memory design to demonstrate the power and practicality of GSTE.
  • Determining schedules for reducing power consumption using multiple supply voltages

    Page(s): 546 - 552

    Dynamic power is the main source of power consumption in CMOS circuits. It depends on the square of the supply voltage. It may be significantly reduced by scaling down the supply voltage of some computational elements in the circuit, with the penalty of an increase of their execution delay. To reduce the dynamic power consumption, without degrading the performance determined assuming that the circuit operates at the highest available supply voltage, the supply voltage of computational elements off critical paths can be scaled down. Defined here as MinPdyn, the problem of minimizing the dynamic power consumption, under performance constraints, by scaling down the supply voltage of computational elements on non-critical paths is NP-hard in general. Solving MinPdyn for multi-phase clocked sequential circuits may reduce their power consumption and the required number of registers. Reducing the number of registers also reduces the power consumption, the number of control signals, and the area of the circuit. In this paper, we focus on devising methods to efficiently solve MinPdyn for designs modeled as cyclic or acyclic graphs. More precisely, once the circuit is optimized for timing constraints, we look for schedules that allow the computational elements of the circuit to operate at the lowest possible supply voltage. We present an integer linear programming formulation for that problem, which we use to devise a polynomial time solvable method and an exact algorithm based on a branch-and-bound technique. Experimental results confirm the effectiveness of the method, and power reduction factors as high as 53.84% were obtained. They also show that the exact algorithm produces optimal results in a small number of tries, which is due to the rules used to prune useless solutions.
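    The quadratic dependence on supply voltage mentioned in this abstract can be checked with a few lines of arithmetic. The sketch below uses the standard first-order CMOS model, P = activity × C × Vdd² × f; the function names and the example voltages in the note that follows are illustrative assumptions, not values from the paper.

```python
def dynamic_power(activity: float, capacitance: float,
                  vdd: float, frequency: float) -> float:
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f."""
    return activity * capacitance * vdd ** 2 * frequency


def power_saving(vdd_high: float, vdd_low: float) -> float:
    """Fractional dynamic-power saving for one element moved from
    vdd_high to vdd_low with activity, capacitance, and frequency fixed."""
    return 1.0 - (vdd_low / vdd_high) ** 2
```

    For example, under this model an element moved from a hypothetical 5.0 V rail to 2.5 V would use a quarter of the dynamic power, a saving of 75%. The delay penalty of the lower voltage is not modeled here, which is why only elements off the critical paths are candidates.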
  • Efficient algorithms for subcircuit enumeration and classification for the module identification problem

    Page(s): 519 - 522

    The problem of extracting RTL modules from a gate level netlist has many interesting applications in digital design (V.K. Madiseti, 1999; P. Schaumont et al., 1999; K. Singh and P. Subrahmunyam, 1995), because it provides a conceptual description of the circuit. We approach this transformation by solving two subproblems: the identification of potential modules (candidate subcircuits) and testing them for functional equivalence to known high-level modules (subcircuit identification). We present a technique for unique and comprehensive enumeration of subgraphs of an arbitrary graph, as well as a method of recognizing subgraph isomorphisms. Combined, these results provide a solution to the problem of candidate subcircuit enumeration. These techniques provide both theoretical and practical contributions within design automation and graph theory.
  • Automatic generation and validation of memory test models for high performance microprocessors

    Page(s): 526 - 529

    Memory model generation for test has traditionally been a complex, manual process subject to the test engineer's skill and the designer's cleverness. The paper describes an automated memory model generation and validation framework for embedded memories of custom digital designs. The proposed framework allows the description of the characteristics of the memory through a GUI or a language called Memory Description Language (MDL). The framework accepts user-specified information to generate a customized memory primitive with simulation, phase-accurate, and test views. The generated memory primitive undergoes a two-step validation process to ensure its correct functionality and is then used as a library primitive to describe embedded memories.
  • Static energy reduction techniques for microprocessor caches

    Page(s): 276 - 283

    Microprocessor performance has been improved by increasing the capacity of on-chip caches. However, the performance gain comes at the price of increased static energy consumption due to sub-threshold leakage current. This paper compares three techniques for reducing static energy consumption in on-chip level-1 and level-2 caches. One technique employs low-leakage transistors in the memory cell. Another technique, power supply switching, can be used to turn off the memory cells and discard their contents. A third alternative is dynamic threshold modulation, which places the memory cells in a standby state that preserves cell contents. In our experiments, we explore the energy/performance trade-offs of these techniques and find that dynamic threshold modulation achieves the best results for level-1 caches, improving the energy-delay product by 2% in a level-1 instruction cache and 7% in a level-1 data cache. Low-leakage transistors perform best for the level-2 cache as they reduce the static energy by up to 98% and improve the energy-delay product by more than a factor of 50.
  • On-chip oscilloscopes for noninvasive time-domain measurement of waveforms

    Page(s): 221 - 226

    High-speed digital design is becoming increasingly analog. In particular, interconnect response at high frequencies can be non-monotonic with "porch steps" and ringing. Crosstalk (both capacitive and inductive) can result in glitches on wires that can produce functional failures in receiving circuits. Most of these important effects are not addressed with traditional ATPG and BIST techniques, which are limited to the binary abstraction. In this work, we explore the feasibility of integrating primitive sampling oscilloscopes on-chip to provide waveforms on selective critical nets for test and diagnosis. The oscilloscopes rely on subsampling techniques to achieve sub-10 psec timing accuracy. High speed samplers are combined with DLLs and a simple 8-bit ADC to convert the waveforms into digital data that can be incorporated as part of the chip scan chain. We will describe the design and measurement of a chip we have fabricated to incorporate these oscilloscopes with a high frequency interconnect structure in a TSMC 0.25 μm process.
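    The abstract does not detail its subsampling scheme, so the following is only a generic sketch of equivalent-time sampling, one common way to get fine timing resolution from a slow sampler on a repetitive signal. The function name and parameters are assumptions, and time is in arbitrary units.

```python
import math


def subsample(waveform, period: float, step: float, count: int):
    """Take one sample per repetition of a periodic signal, delaying
    each sample by `step` more than the previous one.

    Returns (equivalent_time, value) pairs with resolution `step`.
    """
    samples = []
    for k in range(count):
        real_time = k * (period + step)   # when the sampler actually fires
        equivalent_time = k * step        # position within one period
        samples.append((equivalent_time, waveform(real_time)))
    return samples
```

    The sampler only has to fire once per signal period, yet the collected points trace the waveform with a resolution of `step`. This relies on the signal repeating exactly from one period to the next.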
  • On the micro-architectural impact of clock distribution using multiple PLLs

    Page(s): 214 - 220

    Clock distribution has traditionally been a circuit design problem with negligible micro-architectural impact. However, for clock distribution networks using multiple phase-locked loops (PLLs), this will most likely not be the case. This paper discusses the micro-architectural impact of using multiple PLLs for clock distribution. Two PLL phase synchronization algorithms are presented and analyzed. They are compared in terms of efficiency, performance, and complexity. For both, the micro-architectural impact is small, but certainly not negligible.
  • Mutable functional units and their applications on microprocessors

    Page(s): 234 - 239

    Functional units are the heart of microprocessors as they execute binary instructions of a program. Current microprocessors typically have several types of functional units. In this paper, we propose a new functional unit that combines a floating-point adder and an integer arithmetic and logic unit into a single unit. This functional unit reconfigures itself at run-time to serve different instructions from the program instruction stream. We call such units mutable functional units or MFUs. MFUs can be used in microprocessors to improve functional unit utilization, reduce power consumption, and improve performance without adding extra functional units. MFUs only require minor modifications to the existing floating-point adder design. We show that overheads of reconfiguration are small, typically 0 to 1 clock cycle, and at most 2 clock cycles. We demonstrate how integration with a typical current microprocessor can be achieved. This integration allows speedups of non-numerical applications by 8% to 14% while keeping the number of functional units constant. We also show that various enhancements to the base architecture that increase the instruction fetch rate affect the speedups positively.
  • Compiler-directed classification of value locality behavior

    Page(s): 240 - 248

    Value prediction has been suggested as a way to increase the instruction-level parallelism available in a superscalar processor. One of the potential difficulties in cost-effectively predicting values for a given instruction, however, is selecting the proper type of predictor. We propose a compiler-directed classification scheme that statically partitions instructions in a program into several groups, each of which is associated with a specific value predictability pattern. This value predictability pattern is encoded into the instructions to identify the type of value predictor that is best suited for each instruction at run-time. Both an ideal profile-based compiler implementation and an implementation based on the GCC compiler are studied. We use execution-driven simulation and SPEC95 and SPEC2000 benchmarks to study the performance of this approach. This work also demonstrates the connection between value locality and source-level program structures, thereby leading to a deeper understanding of the causes of this behavior.
  • MCOMA: a multithreaded COMA architecture

    Page(s): 523 - 525

    The authors present a new Cache Only Memory Architecture, MCOMA, whose performance exceeds that of existing COMA architectures. This performance gain is obtained by basing the execution on multithreading principles and the provision of a separate search interconnection network between group directories to optimize the data search. As a result, our proposed architecture benefits from the data locality of COMA as well as the latency tolerance of multithreaded execution. To demonstrate the performance of our proposed model, we developed an execution-driven simulator on top of MINT, a MIPS interpreter (J.E. Veenstra and R.J. Fowler, 1994).
  • Efficient function approximation for embedded and ASIC applications

    Page(s): 507 - 510

    In embedded systems and application specific integrated circuits (ASICs) that typically do not have a floating-point processor, measured data or function-sampled data is commonly described by means of an analytic function derived using standard numerical methods. The resultant errors are not caused by rounding the coefficients but by translating a real solution to a restricted fixed-point environment. A genetic algorithm has been constructed that discovers a superior piecewise polynomial approximation with coefficients restricted to the integer target space. This paper discusses the problem being solved and presents an overview of the implemented solution.
  • Lower bound based DDD minimization for efficient symbolic circuit analysis

    Page(s): 374 - 379

    The determinant decision diagram (DDD) is a variant of binary decision diagrams (BDDs) for representing symbolic matrix determinants and cofactors in symbolic circuit analysis. Inspired by the ideas of Rudell (1993) and Drechsler et al. (2001) on BDD minimization, we present a lower-bound based sifting algorithm for reordering the DDD vertices to minimize the DDD size. Our contributions are (1) an adaptation of Rudell's sifting technique for DDD minimization with new rules for determining vertex signs, and (2) tighter lower bounds developed specifically for DDDs. On a set of DDD examples from symbolic circuit analysis, experimental results have demonstrated that the proposed lower-bound based reordering algorithm can effectively reduce DDD sizes. It has also been demonstrated that sifting with lower bounds uses about 50% less computation compared to sifting without using lower bounds, and sifting with the new lower bounds reduces the computation further by up to 8% compared to sifting with Drechsler's lower bounds for BDDs.
  • Performance optimization by wire and buffer sizing under the transmission line model

    Page(s): 192 - 198

    As the operating frequency increases to the gigahertz range and the rise time of a signal is less than or comparable to the time-of-flight delay of a line, it is necessary to consider the transmission line behavior for delay computation. We present an analytical formula for the delay computation under the transmission line model. Extensive simulations with SPICE show the high fidelity of the formula. Compared with the previous works of Elmore (1948) and Ismail et al. (2000), our model leads to smaller average errors in delay estimation. Based on this formula, we show the property that the minimum delay for a transmission line with reflection occurs when the number of round trips is minimized. In addition, we show that the delay of a circuit path is a posynomial function in wire and buffer sizes, implying that a local optimum is equal to the global optimum. Thus, we can apply any efficient search algorithm such as the well-known gradient search procedure to compute the globally optimal solution. Experimental results show that simultaneous wire and buffer sizing is very effective for performance optimization under the transmission line model.
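    For context, the Elmore (1948) delay that this abstract uses as a baseline can be sketched in a few lines. This is the classical first-order RC estimate, not the paper's transmission-line formula, which the abstract does not give. The function name, the lumped model (each segment's capacitance placed at its far end), and the unitless values in the usage note are assumptions.

```python
def elmore_delay(r_driver: float, segments, c_load: float) -> float:
    """Elmore delay of an RC ladder from driver to load.

    `segments` is a list of (resistance, capacitance) pairs ordered
    from driver to load. Each resistor is multiplied by the total
    capacitance downstream of it.
    """
    downstream = sum(c for _, c in segments) + c_load
    delay = r_driver * downstream       # the driver sees every capacitor
    for r, c in segments:
        delay += r * downstream         # this segment's own cap and beyond
        downstream -= c                 # the next resistor no longer sees it
    return delay
```

    For instance, `elmore_delay(2.0, [(1.0, 1.0), (1.0, 1.0)], 1.0)` evaluates to 2·3 + 1·3 + 1·2 = 11. Because the model contains no inductance, it cannot capture reflections or time-of-flight, which is the regime the abstract addresses.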