Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 1 • Date March 1995

  • Optimum and heuristic transformation techniques for simultaneous optimization of latency and throughput

    Page(s): 2 - 19

    Although throughput alone can be arbitrarily improved for several classes of systems using previously published techniques, none of those approaches are effective when latency constraints, which are increasingly important in embedded DSP systems, are considered. After formally establishing the relationship between latency and throughput in general computation, we explore the effect of pipelining on latency, and establish necessary and sufficient conditions under which pipelining does not alter latency. Many systems are either linear, or have subsystems that are linear. For such cases we have used a state-space based approach that treats various transformations in an integrated fashion, and answers analytically whether it is possible to simultaneously meet any given combination of constraints on latency and throughput. The analytic approach is constructive in nature, and produces a complete implementation when feasibility conditions are fulfilled. We also present a suboptimal but hardware efficient heuristic approach for the special case of initially-relaxed single-input single-output linear time-invariant computations. A novel software platform consisting of a high-level synthesis system coupled to a symbolic algebra system was used to implement the proposed algorithm transformations. Instead of optimizing to improve throughput and latency, our transformations can also be used to increase the implementation efficiency while achieving the same latency and throughput as the original design.

  • System level hardware module generation

    Page(s): 20 - 35

    In complex modern day electronic systems, far more time is spent in designing the boards, writing the software to drive and integrate the hardware, and other such system level issues, than is spent in designing any application-specific ICs that may be needed. Unfortunately, most of the research in computer-aided design has been focussed on the more glamorous ASIC design problem, as a result of which the design methodologies and tools at the system level are much more primitive than at the chip level. We have developed a design framework for application-specific systems, called SIERA, that addresses the higher level aspects of system design, including multichip design issues at the board-level, and hardware-software codesign and integration, in addition to the design of individual ASICs. SIERA allows rapid-prototyping of multiboard systems where the functionality is implemented using a mix of dedicated hardware modules and ASICs, as well as software running on programmable hardware modules. A key step in the design methodology provided by SIERA is that of generating the physical implementation of the system hardware from a description of the system architecture. The analogue of this problem at the chip level is referred to as silicon assembly or silicon compilation. In this paper we address this problem at the system level, and describe how the generation and interfacing of board-level modules, board-level physical design, simulation of custom boards, and the overall management of board design are handled in SIERA. While some of the problems could be solved by adapting or extending techniques from the existing ASIC design tools, others required new approaches. Case-studies of several real-life applications are also presented to demonstrate the effectiveness of the board-level physical design methodology embodied in SIERA compared to the traditional PCB design systems.

  • Design and realization of high-performance wave-pipelined 8×8 b multiplier in CMOS technology

    Page(s): 36 - 48

    Wave pipelining is a design technique for increasing the throughput of a digital circuit or system without introducing pipelining registers between adjacent combinational logic blocks in the circuit/system. However, this requires balancing of the delays along all the paths from the input to the output, which comes in the way of its implementation. Static CMOS is inherently susceptible to delay variation with input data, and hence, receives a low priority for wave pipelined digital design. On the other hand, ECL and CML, which are amenable to wave pipelining, lack the compactness and low power attributes of CMOS. In this paper we attempt to exploit wave pipelining in CMOS technology. We use a single generic building block in Normal Process Complementary Pass Transistor Logic (NPCPL), modeled after CPL, to achieve equal delay along all the propagation paths in the logic structure. An 8×8 b multiplier is designed using this logic in a 0.8 μm technology. The carry-save multiplier architecture is modified suitably to support wave pipelining, viz., the logic depths of all the paths are made identical. The 1 mm × 0.6 mm multiplier core supports a throughput of 400 MHz and dissipates a total power of 0.6 W. We develop simple enhancements to the NPCPL building blocks that allow the multiplier to sustain throughputs in excess of 600 MHz. The methodology can be extended to introduce wave pipelining in other circuits as well.

  • Bus-invert coding for low-power I/O

    Page(s): 49 - 58

    Technology trends and especially portable applications drive the quest for low-power VLSI design. Solutions that involve algorithmic, structural or physical transformations are sought. The focus is on developing low-power circuits without affecting too much the performance (area, latency, period). For CMOS circuits most power is dissipated as dynamic power for charging and discharging node capacitances. This is why many promising results in low-power design are obtained by minimizing the number of transitions inside the CMOS circuit. While it is generally accepted that, because of the large capacitances involved, much of the power dissipated by an IC is at the I/O, little has been specifically done for decreasing the I/O power dissipation. We propose the bus-invert method of coding the I/O which lowers the bus activity and thus decreases the I/O peak power dissipation by 50% and the I/O average power dissipation by up to 25%. The method is general but applies best for dealing with buses. This is fortunate because buses are indeed most likely to have very large capacitances associated with them and consequently dissipate a lot of power.
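
    The abstract above does not spell out the coding rule, so the following is only a minimal sketch of bus-invert coding as it is commonly described: if sending the next word unchanged would toggle more than half of the bus lines (counting one extra invert line), the inverted word is sent instead and the invert line is raised. The function names, the 8-bit default width, and the all-zero initial bus state are assumptions made for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Encode words as (bus_value, invert_bit) pairs.

    Assumes the bus and the invert line both start at 0.
    """
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0
    encoded = []
    for word in words:
        word &= mask
        # Transitions if the word is sent as-is with the invert line low:
        # data-line toggles plus a toggle of the invert line if it was high.
        toggles = hamming(prev_bus, word) + prev_inv
        if toggles > width // 2:
            bus, inv = (~word) & mask, 1   # send the complement, raise invert
        else:
            bus, inv = word, 0
        encoded.append((bus, inv))
        prev_bus, prev_inv = bus, inv
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~bus) & mask if inv else bus for bus, inv in encoded]


def line_transitions(encoded):
    """Per-cycle toggle count over the data lines plus the invert line."""
    prev_bus, prev_inv = 0, 0
    counts = []
    for bus, inv in encoded:
        counts.append(hamming(prev_bus, bus) + (prev_inv ^ inv))
        prev_bus, prev_inv = bus, inv
    return counts


if __name__ == "__main__":
    data = [0x00, 0xFF, 0x0F, 0xF0]
    enc = bus_invert_encode(data)
    print(enc)
    print(line_transitions(enc))
```

    With an even bus width, this rule keeps the number of toggling lines per cycle at or below half the width, which is where the halving of peak I/O activity comes from in this sketch.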

  • Testing complex couplings in multiport memories

    Page(s): 59 - 71

    In this paper, the effects of simultaneous write access on the fault modeling of multiport RAMs are investigated. New fault models representing more accurately the actual faults in such memories are then defined. Subsequently, a general algorithm that ensures the detection of all faults belonging to the new fault model is proposed. Unfortunately, the obtained algorithms are of O(n²) complexity, which is not practical for real purposes. In order to reduce the complexity of the former test algorithm a topological approach has been developed. Finally, a BIST implementation of one of the proposed topological algorithms is presented.

  • Cumulative balance testing of logic circuits

    Page(s): 72 - 83

    We present a new test response compression method called cumulative balance testing (CBT) that extends both balance testing and accumulator compression testing. CBT uses an accumulated balance signature, and it guarantees very high error coverage (over 99%) for various error models. We demonstrate that the single stuck-line (SSL) fault coverage of CBT for many of the ISCAS 85 combinational benchmark circuits is 100%, and for all but one circuit, the fault coverage is over 99.5%. To make processor circuits self-testing, any existing accumulators and counters can be exploited to implement CBT. Its ease of implementation, provably high error coverage, and exceptionally high SSL fault coverage, even with reduced (nonexhaustive) test sets, make CBT suitable for the built-in self testing of processor circuits that require a guaranteed level of test confidence.

  • A buffer distribution algorithm for high-performance clock net optimization

    Page(s): 84 - 98

    We propose a new approach for optimizing clock trees, especially for high-speed circuits. Our approach provides a useful guideline to a designer, by user-specified parameters, and three of these tradeoffs are provided in this paper. (1) First, to provide a "good" tradeoff between skew and wire length, a new clock tree routing scheme is proposed. The technique is based on a combination of hierarchical bottom-up geometric matching and minimum rectilinear Steiner tree. Our experiments complement the theoretical results. (2) For high-speed clock distribution in the transmission line mode (e.g., multichip modules) where interconnection delay dominates the clock delay, buffer congestion might exist in a layout. Using many buffers in a small wiring area results in substantial interline crosstalks as well as wirability, when the elongation of the imbalanced subtrees is necessary. Placing buffers evenly (locally or globally) over the plane at the minimum impact on wire length increase helps avoid buffer congestion and results in less crosstalk between clock wires. Thus, an effective technique for buffer distribution is proposed. Experimental results verify the effectiveness of the proposed algorithms. (3) Finally, a postprocessing step constraining on phase-delay is also proposed. The technique is based on a combination of hierarchical bottom-up geometric matching and bounded radius minimum spanning tree. The proposed algorithm has an important application in MCM clock net synthesis as well as VLSI clock net synthesis.

  • A unified design methodology for CMOS tapered buffers

    Page(s): 99 - 111

    In this paper, the various disparate approaches to CMOS tapered buffer design are unified into an integrated design methodology. Circuit speed, power dissipation, physical area, and system reliability are the four performance criteria of concern in tapered buffers, and each places a separate, often conflicting, constraint on the design of a tapered buffer. Enhanced short-channel tapered buffer design equations are presented for propagation delay and power dissipation, as well as a new split-capacitor model of hot-carrier reliability of tapered buffers and a two-component physical area model. Each performance criterion is individually investigated and analyzed with respect to the number of stages and tapering factor, and the interaction of the four criteria is examined to develop both a qualitative and a quantitative understanding of the various design tradeoffs. The creation of process dependent look-up tables for optimal buffer design is described, and a methodology to apply these look-up tables to application-specific tapered buffers for both unconstrained and constrained systems is developed.
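
    To make the dependence on the number of stages and the tapering factor concrete, here is a rough sketch of the classic first-order delay model for a tapered buffer, not the enhanced short-channel equations of the paper. It assumes N identical-ratio stages driving a load that is `cap_ratio` times the input capacitance, a per-stage delay proportional to the tapering factor, and no self-loading; all names and the unit delay `t0` are hypothetical.

```python
def tapered_buffer_delay(cap_ratio: float, n_stages: int, t0: float = 1.0):
    """Return (total_delay, tapering_factor) under the first-order model.

    cap_ratio is C_load / C_in; each stage is 'beta' times larger than the
    previous one, with beta ** n_stages == cap_ratio.
    """
    beta = cap_ratio ** (1.0 / n_stages)
    return n_stages * beta * t0, beta


def best_stage_count(cap_ratio: float, max_stages: int = 20) -> int:
    """Stage count with the smallest first-order delay (speed only)."""
    return min(
        range(1, max_stages + 1),
        key=lambda n: tapered_buffer_delay(cap_ratio, n)[0],
    )


if __name__ == "__main__":
    ratio = 1000.0
    n = best_stage_count(ratio)
    delay, beta = tapered_buffer_delay(ratio, n)
    print(n, round(beta, 3), round(delay, 3))
```

    In this simplified model the delay is minimized when the stage count is close to ln(cap_ratio), which puts the tapering factor near e. Power, area, and reliability, the other three criteria named in the abstract, generally favor fewer and more steeply tapered stages, which is why a speed-only optimum is only a starting point.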

  • A practical methodology for the statistical design of complex logic products for performance

    Page(s): 112 - 123

    Contradictory trends in the industrial design environment have increased uncertainty while decreasing the tolerance to uncertainty. Worst case design techniques, still widely used in industry, do not provide the accuracy required to design under these conditions. On the other hand, statistical design techniques do provide a significant improvement in accuracy, by virtue of their "circuit adaptive" behavior, but at a substantial cost in computational effort. One practical solution to improving the accuracy of worst case design without sacrificing efficiency is considered here. It integrates an efficient statistical circuit simulator with worst case design tools into a hierarchical performance design process. It employs two stages of worst case analysis, calibrated with statistical circuit simulation, serving as filters to screen out circuits that easily meet their performance requirements. This focuses the use of statistical circuit simulation on those circuits for which the improved accuracy provides significant benefit. This methodology has been applied with outstanding results in design and manufacturing.

  • Physical models and algorithms for optoelectronic MCM layout

    Page(s): 124 - 135

    Future computers will need to incorporate the parallelism of optical interconnections in order to achieve projected performance within reasonable size, power and speed constraints. This is necessary since optical interconnections have advantages in size, power, and speed over "long" distance communication. These features make optical interconnects ideal for inter-module connections in multichip module systems. Free-space optical interconnection can be one form of optical interconnections. Computer generated holograms (CGHs) are extremely attractive optical components for use in free space optical interconnections due to their ability to be computer designed. We will show that the fabrication limitations of CGHs for general interconnection networks create the need for placement algorithms for large processing element (PE) arrays. In this paper, we will demonstrate that these fundamental CGH fabrication limitations greatly influence the computer aided design of optoelectronic interconnect networks that utilize CGHs for optical interconnections. Specifically, we show that the minimum feature size directly affects the logical placement of processing elements. Various physical models for free-space optical interconnects in parallel optoelectronic MCM systems are then identified from which we derive several logical models for analysis. We then analyze these cases and present algorithms to solve the associated layout problems. Design examples are given to illustrate the benefits of utilizing these placement algorithms in real optoelectronic interconnection networks.

  • An architecture for a DSP field-programmable gate array

    Page(s): 136 - 141

    This paper describes an application specific architecture for field-programmable gate arrays (FPGAs). Emphasis is placed on the logic module architecture and channel segmentation for the FPGAs targeted for application areas related to digital signal processing (DSP). The proposed logic module architecture is well-suited for efficient implementation of frequently used logic functions in the DSP application area. This is mainly because it is possible to implement most of these functions using one logic module, which results in a reduction in both the net lengths and the number of antifuses used. The performance improvements are achieved by customizing the logic module architecture and the programmable interconnect to suit the requirements of DSP applications.

  • On general zero-skew clock net construction

    Page(s): 141 - 146

    We propose a simulated annealing based zero-skew clock net construction algorithm that works in any routing spaces, from Manhattan to Euclidean, with the added flexibility of optimizing either the wire length or the propagation delay. We first devise an O(log n) tree grafting perturbation function to construct a zero-skew clock tree under the Elmore delay model. This tree grafting scheme is able to explore the entire solution space asymptotically. A Gauss-Seidel iteration procedure is then applied to optimize the Steiner point positions. Experimental results have shown that our algorithm can achieve substantial delay reduction and encouraging wire length minimization compared to previous works.
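
    For readers unfamiliar with the Elmore delay model mentioned above, the sketch below computes Elmore delays on a small lumped RC tree and reports the skew between sinks. It is not the paper's tree grafting or annealing procedure. The array-based tree encoding (each node stores its parent index, the resistance of the wire from its parent, and a lumped node capacitance, with parents listed before children) is an assumption made to keep the example short.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay from the root (node 0) to every node of an RC tree.

    parent[i] : index of node i's parent (ignored for the root, node 0)
    res[i]    : resistance of the wire from parent[i] to node i
    cap[i]    : capacitance lumped at node i
    Nodes must be ordered so that parent[i] < i for every i > 0.
    """
    n = len(parent)
    # Total capacitance in the subtree rooted at each node.
    downstream = list(cap)
    for i in range(n - 1, 0, -1):
        downstream[parent[i]] += downstream[i]
    # Each wire contributes R * (capacitance it has to charge).
    delay = [0.0] * n
    for i in range(1, n):
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay


def skew(delays, sinks):
    """Largest difference in arrival time among the given sink nodes."""
    arrivals = [delays[s] for s in sinks]
    return max(arrivals) - min(arrivals)


if __name__ == "__main__":
    # Root 0 drives two sinks through unequal wires; the product of wire
    # resistance and sink capacitance is the same on both branches.
    parent = [-1, 0, 0]
    res = [0.0, 2.0, 1.0]
    cap = [0.0, 1.0, 2.0]
    d = elmore_delays(parent, res, cap)
    print(d, skew(d, [1, 2]))
```

    The example tree is zero-skew even though its two branches are physically different, which illustrates why zero-skew construction under this model balances delay rather than wire length.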

  • C-testable design techniques for iterative logic arrays

    Page(s): 146 - 152

    A design-for-testability (DFT) approach for VLSI iterative logic arrays (ILA's) is proposed, which results in a small constant number of test patterns. Our technique applies to arrays with an arbitrary dimension, and to arrays with various connection types, e.g., hexagonal or octagonal ones. Bilateral ILA's are also discussed. The DFT technique makes general ILA's C-testable by using a truth-table augmentation approach. We propose an output-assignment algorithm for minimizing the hardware overhead. We give a CMOS systolic array multiplier as an example, and show that an overhead of no more than 5.88% is sufficient to make it C-testable, i.e., 100% single cell-fault testable with only 18 test patterns regardless of the word length of the multiplier. Our technique guarantees that the test set is easy to generate. Its corresponding built-in-self-test structures are also very simple.


Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu