Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 4 • Date Dec. 1999

  • High-level synthesis of recoverable VLSI microarchitectures

    Page(s): 401 - 410

    Two algorithms that combine the operations of scheduling and recovery-point insertion for high-level synthesis of recoverable microarchitectures are presented. The first uses a prioritized cost function in which functional unit (FU) cost is minimized first and register cost second. The second algorithm minimizes a weighted sum of FU and register costs. Both algorithms are optimal according to their respective cost functions and require less than 10 min of central processing unit (CPU) time on widely used high-level synthesis benchmarks. The best previous result reported several hours of CPU time for some of the same benchmarks on a computer of similar computational power.
  • Accurate prediction of quality metrics for logic level designs targeted toward lookup-table-based FPGAs

    Page(s): 411 - 418

    The importance of efficient area and timing estimation techniques is well established in high-level synthesis (HLS), since they allow more efficient exploration of the design space while giving HLS tools the capability to predict the effects of technology-specific tools on the design space. Much of the previous work has focused on estimation techniques that use very simple cost models based solely on the gate and/or literal count. Those models are not accurate enough to allow effective design space exploration, since the effects of interconnect can dominate the final design cost. The situation becomes even worse when the design is targeted to field-programmable gate array (FPGA) technologies, since the wire delay may contribute up to 60% of the overall design delay. In this paper, we present an approach to estimating area and timing for lookup-table-based FPGAs that takes into account not only gate area and delay, but also the wiring effects. We select the Xilinx XC4000 series as our main focus because of its popularity. We tested our estimator with several benchmarks, and the results show that it produces accurate area and timing estimates efficiently.
  • Partitioning and pipelining for performance-constrained hardware/software systems

    Page(s): 419 - 432

    In order to satisfy cost and performance requirements, digital signal processing and telecommunication systems are generally implemented with a combination of different components, from custom-designed chips to off-the-shelf processors. These components vary in their area, performance, programmability and so on, and the system functionality is partitioned amongst the components to best utilize this tradeoff. However, for performance-critical designs, it is not sufficient to implement only the critical sections as custom-designed high-performance hardware; it is also necessary to pipeline the system at several levels of granularity. We present a design flow and an algorithm to first allocate software and hardware components, and then partition and pipeline a throughput-constrained specification amongst the selected components. This is performed to best satisfy the throughput constraint at minimal application-specific integrated-circuit cost. Our ability to incorporate partitioning with pipelining at several levels of granularity enables us to attain high-throughput designs, and also distinguishes this paper from previously proposed hardware/software partitioning algorithms.
  • Minimizing the required memory bandwidth in VLSI system realizations

    Page(s): 433 - 441

    In this paper, we present the problem of storage bandwidth optimization (SBO) in VLSI system realizations. Our goal is to minimize the required memory bandwidth within the given cycle budget by adding ordering constraints to the flow graph. This allows the subsequent memory allocation and assignment tasks to come up with a cheaper memory architecture with fewer memories and memory ports. The importance and the effect of SBO are shown on realistic examples in both the video and asynchronous transfer-mode (ATM) domains. We show that it is important to take into account which data is being accessed in parallel, instead of only considering the number of simultaneous memory accesses. Our problem formulation leads to the optimization of a conflict (hyper) graph. For the target domain of ATM, only flat graphs without loops have to be treated. For this subproblem, a prototype tool has been implemented to demonstrate the feasibility of automating this important system design step.
  • Figures of merit to characterize the importance of on-chip inductance

    Page(s): 442 - 449

    A closed-form solution for the output signal of a CMOS inverter driving an RLC transmission line is presented. This solution is based on the alpha power law for deep submicrometer technologies. Two figures of merit are presented that are useful for determining whether a section of interconnect should be modeled as an RLC or an RC impedance. The first is the damping factor of a lumped RLC circuit, which is shown to be a useful criterion. The second is the ratio of the rise time of the input signal at the driver of an interconnect line to the time of flight of the signals across the line. AS/X circuit simulations of an RLC transmission line and a five-section RC Π circuit based on a 0.25-μm IBM CMOS technology are used to quantify and determine the relative accuracy of an RC model. One primary result of this paper is evidence demonstrating that a range for the length of the interconnect exists for which inductance effects are prominent. Furthermore, it is shown that under certain conditions, inductance effects are negligible regardless of the length of the section of interconnect.
  • Low-energy CSMT carry generators and binary adders

    Page(s): 450 - 462

    This paper presents novel hybrid carry-select modified-tree (CSMT) adder architectures for binary carry generators and adders using multiplexers only. These architectures not only require the fewest multiplexers, but also consume the least energy for a specified latency. These architectures are based on a carry-select configuration where each block can be a carry-select, tree, or modified-tree block. The modified-tree blocks permit ripple in the carry-generation process, which leads to a dramatic reduction in the number of multiplexers as well as in power consumption. It is shown that, for a block length W, the carry-select block and the modified-tree block with internal ripple of (log_2 W - 1) multiplexer stages require the same number of multiplexers. This is a powerful result because the longer carry-select blocks can be replaced by the modified-tree blocks without increasing the multiplexer complexity. The advantage of this approach is a reduction in power consumption, since the amount of ripple in the carry-select block grows linearly with W, while that in the modified-tree block grows logarithmically with W. It is shown that for fastest adder/subtractor designs, the proposed CSMT architecture can reduce the multiplexer complexity by about 40% for word lengths ranging from 8 to 32, when compared with known tree approaches. It is shown that, for a certain specified latency and specified number of multiplexers, a family of carry-select and CSMT adders can be designed. It is shown that, for a specified latency, the carry-select adders with a larger number of blocks and smaller block lengths consume less power. Through extensive simulations, CSMT adder configurations that minimize energy consumption or power-latency product, which are approximately 5% to 10% less than those of known tree and best carry-select adders, are obtained. Finally, based on novel latency-matching and block-increment techniques introduced in this paper, a systematic design methodology for design of CSMT adders with least latency, least number of multiplexers, and least energy consumption is presented.
  • Dynamic algorithm transformations (DAT)-a systematic approach to low-power reconfigurable signal processing

    Page(s): 463 - 476

    In this paper, dynamic algorithm transformations (DATs) for designing low-power reconfigurable signal-processing systems are presented. These transformations minimize energy dissipation while maintaining a specified level of mean squared error or signal-to-noise ratio. This is achieved by modeling the nonstationarities in the input as temporal/spatial transitions between states in the input state-space. The reconfigurable hardware fabric is characterized by its configuration state-space. The configurable parameters are taken to be the filter taps, coefficient and data precisions, and supply voltage V_dd. An energy-optimal reconfiguration strategy is derived as a mapping from the input to the configuration state-space. In this strategy, taps are powered down starting with the tap with the smallest value of w_k^2 / Σ_m(w_k), where w_k and Σ_m(w_k) are, respectively, the coefficient and energy dissipation of the kth tap. Optimal values for precision and supply voltage V_dd are subsequently computed from the roundoff error and critical path delay requirements, respectively. The DAT-based adaptive filter is employed as a near-end crosstalk (NEXT) canceller in a 155.52-Mb/s asynchronous transfer mode-local area network transceiver over category-3 wiring. Simulation results indicate that the energy savings range from -2% to 87% as the cable length varies from 110 to 40 m, respectively, with an average saving of 69%. An average saving of 62% is achieved for the case where the supply voltage V_dd is kept fixed.
  • Statistical analysis of timing rules for high-speed synchronous VLSI systems

    Page(s): 477 - 482

    Timing skew has been the major limitation for high-speed synchronous operation of a VLSI system. In this paper, a statistical timing model that accounts for both static and random timing skew is proposed. Based on this model, we analyze the timing rules of a synchronous VLSI system consisting of multiple pipelined stages, establish the yield of the system as a function of its device characteristics, and derive the relationship between the maximum throughput of such a system and its timing skew. The following timing schemes are evaluated: conventional pipelining, in which the transmitter cannot initiate the next cycle until the receiver has received the data, and wave pipelining, in which the transmitter initiates the next cycle as soon as the current data has been sent out. The results show that the yield of a VLSI system using either of the pipelining schemes exhibits threshold behavior for Gaussian distributed static skew. Furthermore, the system throughput is shown to be very sensitive to the random skew.
  • Pausible clocking-based heterogeneous systems

    Page(s): 482 - 488

    This paper describes a novel communication scheme, which is guaranteed to be free of synchronization failures, amongst multiple synchronous and asynchronous modules operating independently. In this scheme, communication between every pair of modules is done through an asynchronous first-in first-out (FIFO) channel; communication between a module and the FIFO is done using request/acknowledge handshaking. Synchronization of handshake signals to the local module clock is done in an unconventional way: the local clock, built out of a ring oscillator, is paused or stretched, if necessary, to ensure that the handshake signal satisfies setup and hold time constraints with respect to the local clock. In order to validate this scheme, we implemented a test chip in 0.5-μm CMOS. This chip is designed as a ring, composed of two synchronous modules, an asynchronous module, and two asynchronous FIFOs. Each module functions as a receiver to one module and a sender to another module. Test results show that the chip functions reliably up to 456 MHz.
  • EmGen-a module generator for logic emulation applications

    Page(s): 488 - 492

    Logic emulation is a technique that uses dynamically reprogrammable systems for prototyping and design verification. Using an emulator, designers can realize designs through a software configuration process and perform real-time design verification before fabricating the chip in silicon. However, mapping designs onto an emulator involves multiphase design tasks, which is a very time-consuming process. Hence, shortening the time to emulation is always the main concern for the logic-emulation design process. One approach to shortening the design processing time is to replace portions of the design with macro cells. This paper presents a module generator for logic-emulation applications, which is able to generate macro cells of arbitrarily complex functions described in hardware description languages. Furthermore, the module generator can effectively generate a multiple field-programmable gate array (FPGA) macro for large macros that cannot fit in a single FPGA chip. Experiments using the module generator for logic emulation are reported. The results demonstrate that the module generator can effectively and efficiently generate complex macros from their register-transfer-level description. In addition, the results show that the design processing time is significantly shortened when the module generation method is incorporated into the logic-emulation design flow.
  • Novel techniques for bus power consumption reduction in realizations of sum-of-product computation

    Page(s): 492 - 497

    Novel techniques for power-efficient implementation of sum-of-product computation are presented. The proposed techniques aim at reducing the switching activity required for the successive evaluation of the partial products in the buses connecting the storage elements, where data and coefficients are stored, to the functional units. This is achieved through reordering the sequence of evaluation of the partial products. Heuristics based on the traveling salesman problem are proposed to perform the reordering for different categories of algorithms. Information related to both data (dynamic) and coefficients (static) is used to drive the reordering. Experimental results from the application of the proposed techniques to several signal-processing algorithms show that significant switching activity savings can be achieved.
  • Author index

    Page(s): Ind
    Freely Available from IEEE
  • Subject index

    Page(s): Ind
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
The American University in Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu