Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 1 • Date March 1996

  • A CMOS IC for Gb/s Viterbi decoding: system design and VLSI implementation

    Publication Year: 1996 , Page(s): 17 - 31
    Cited by:  Papers (14)  |  Patents (3)

    At present, the Viterbi algorithm (VA) is widely used in communication systems for decoding and equalization. The achievable speed of conventional Viterbi decoders (VD's) is limited by the inherent nonlinear add-compare-select (ACS) recursion. The aim of this paper is to describe system design and VLSI implementation of a complex system of fabricated ASIC's for high speed Viterbi decoding using the "minimized method" (MM) parallelized VA. We particularly emphasize the interaction between system design, architecture and VLSI implementation as well as system partitioning issues and the resulting requirements for the system design flow. Our design objectives were 1) to achieve the same decoding performance as a conventional VD using the parallelized algorithm, 2) to achieve a speed of more than 1 Gb/s, and 3) to realize a system for this task using a single cascadable ASIC. With a minimum system configuration of four identical ASIC's produced by using 1.0 μm CMOS technology, the design objective of a decoding speed of 1.2 Gb/s is achieved. This means, compared to previous implementations of Viterbi decoders, the speed is increased by an order of magnitude.
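    The add-compare-select recursion named in the abstract above can be illustrated with a minimal sketch. This is the conventional sequential ACS step, not the parallelized "minimized method" of the paper; the function name `acs_step`, the two-state trellis, and all metric values are hypothetical and chosen only for illustration.

```python
def acs_step(path_metrics, branch_metrics, predecessors):
    """One add-compare-select step over all trellis states.

    path_metrics[s]         -- accumulated metric of the survivor path ending in state s
    predecessors[s]         -- states that have a transition into state s
    branch_metrics[(p, s)]  -- cost of the transition p -> s at this time step
    """
    new_metrics, decisions = [], []
    for s, preds in enumerate(predecessors):
        # Add: extend every incoming survivor by its branch metric.
        candidates = [(path_metrics[p] + branch_metrics[(p, s)], p) for p in preds]
        # Compare and select: keep the cheapest extension
        # (ties go to the lower-numbered predecessor).
        best_metric, best_pred = min(candidates)
        new_metrics.append(best_metric)
        decisions.append(best_pred)
    return new_metrics, decisions


# Hypothetical two-state trellis in which each state is reachable from both states.
predecessors = [[0, 1], [0, 1]]
metrics = [0, 5]
step_costs = [
    {(0, 0): 1, (1, 0): 0, (0, 1): 3, (1, 1): 2},
    {(0, 0): 2, (1, 0): 0, (0, 1): 0, (1, 1): 4},
]
history = []
for costs in step_costs:
    # Each step needs the metrics produced by the previous step.
    metrics, decisions = acs_step(metrics, costs, predecessors)
    history.append(decisions)
```

    Because every step consumes the metrics produced by the step before it, the loop cannot be pipelined naively; this data dependency is the speed limit the abstract refers to.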
  • System design for pixel-parallel image processing

    Publication Year: 1996 , Page(s): 32 - 41
    Cited by:  Papers (11)  |  Patents (12)

    A system design for performing low-level image processing tasks in real time is presented. The design is based on large processor-per-pixel arrays implemented using integrated circuit technology. Two integrated circuit architectures are summarized: an associative parallel processor and a parallel processor employing DRAM cells. In both architectures, the layout pitch of one-bit-wide logic is matched to the pitch of memory cells to form high-density processing element arrays. The system design features an efficient control path implementation, providing high processing element array utilization without demanding complex controller hardware. Sequences of array instructions are generated by a host computer before processing begins, then stored in a simple controller. Once processing begins, the host computer initiates stored sequences to perform pixel-parallel operations. A programming framework implemented using the C++ programming language supports application development. A prototype system employs associative parallel processor devices, a controller, and the programming framework. Three sample applications, smoothing and segmentation, median filtering, and optical flow, establish the suitability of the system for real-time image processing.
  • Predictive system shutdown and other architectural techniques for energy efficient programmable computation

    Publication Year: 1996 , Page(s): 42 - 55
    Cited by:  Papers (135)  |  Patents (3)

    With the popularity of portable devices such as personal digital assistants and personal communicators, as well as with increasing awareness of the economic and environmental costs of power consumption by desktop computers, energy efficiency has emerged as an important issue in the design of electronic systems. While power efficient ASIC's with dedicated architectures have addressed the energy efficiency issue for niche applications such as DSP, much of the computation continues to be implemented as software running on programmable processors such as microprocessors, microcontrollers, and programmable DSP's. Not only is this true for general purpose computation on personal computers and workstations, but also for portable devices, application-specific systems etc. In fact, firmware and embedded software executing on RISC and DSP processor cores that are embedded in ASIC's has emerged as a leading implementation methodology for speech coding, modem functionality, video compression, communication protocol processing etc. This paper describes architectural techniques for energy efficient implementation of programmable computation, particularly focussing on the computation needed in portable devices where event-driven user interfaces, communication protocols, and signal processing play a dominant role. Two key approaches described here are predictive system shutdown and extended voltage scaling. Results indicate that a large reduction in power consumption can be achieved over current day solutions with little or no loss in system performance.
  • Programmable active memories: reconfigurable systems come of age

    Publication Year: 1996 , Page(s): 56 - 69
    Cited by:  Papers (110)  |  Patents (44)

    Programmable active memories (PAM) are a novel form of universal reconfigurable hardware coprocessor. Based on field-programmable gate array (FPGA) technology, a PAM is a virtual machine, controlled by a standard microprocessor, which can be dynamically and indefinitely reconfigured into a large number of application-specific circuits. PAM's offer a new mixture of hardware performance and software versatility. We review the important architectural features of PAM's, through the example of DECPeRLe-1, an experimental device built in 1992. PAM programming is presented, in contrast to classical gate-array and full custom circuit design. Our emphasis is on large, code-generated synchronous systems descriptions; no compromise is made with regard to the performance of the target circuits. We exhibit a dozen applications where PAM technology proves superior, both in performance and cost, to every other existing technology, including supercomputers, massively parallel machines, and conventional custom hardware. The fields covered include computer arithmetic, cryptography, error correction, image analysis, stereo vision, video compression, sound synthesis, neural networks, high-energy physics, thermodynamics, biology and astronomy. At comparable cost, the computing power virtually available in a PAM exceeds that of conventional processors by a factor of 10 to 1000, depending on the specific application, in 1992. A technology shrink increases the performance gap between conventional processors and PAM's. By Noyce's law, we predict by how much the performance gap will widen with time.
  • System design methodologies: aiming at the 100 h design cycle

    Publication Year: 1996 , Page(s): 70 - 82
    Cited by:  Papers (12)  |  Patents (1)

    As methodologies and tools for chip-level design mature, design effort becomes focused on increasingly higher levels of abstraction. We present a tutorial on a design methodology for chip and system design and present a test case that justifies the future goal of a 100 h design cycle.
  • PSM: an object-oriented synthesis approach to multiprocessor system design

    Publication Year: 1996 , Page(s): 83 - 97
    Cited by:  Papers (6)

    Although multiprocessor systems are becoming a trend today, few synthesis tools currently available can actually automate the design of multiprocessor systems. Performance synthesis methodology (PSM) is an object-oriented system-level synthesis approach to multiprocessor system design. Since PSM was designed specifically for the synthesis of multiprocessor systems, it is not only much more efficient when synthesizing parallel systems, but also produces better parallel systems than currently available uniprocessor system-level synthesis tools. Colored Petri nets used in modeling system components and object modeling technique used in the design process have both contributed to the shortening of system development time and to the reduction of design cost. First, user specification consisting of functional models and performance constraints is translated into architecture models. Then, the system is configured by selecting the method of control, the memory organization, the type of processor, and the type of system interconnection. Finally, a heuristic design space exploration algorithm is used to generate several near-optimal design alternatives. The best architecture is chosen by evaluating the design alternatives using a flexible performance estimation formula that mainly considers system level design features, such as system throughput, utilization, reliability, scalability, fault-tolerance, and cost. Several systems were successfully synthesized using this top-down object-oriented PSM, thus showing its feasibility as a design automation tool for parallel systems.
  • An efficient graph algorithm for FSM scheduling

    Publication Year: 1996 , Page(s): 98 - 112
    Cited by:  Papers (1)

    This paper presents a new algorithm for scheduling control-dominated designs during high-level synthesis. Our algorithm can schedule systems with arbitrary control flow, including conditional branches and multiple loops. It can handle both upper bound and lower bound timing constraints. The timing constraints can cross basic block boundaries, span different iterations of a loop, and form interlocking cycles in the control flow. A scheduling problem is described by the behavior finite-state machine model, an automaton model for the behavioral specification and synthesis of control-dominated systems. We optimize the performance of the produced digital circuit implementation by minimizing the execution time of each state transition in the state transition graph. The finite-state machines (FSM) scheduling algorithm is based on previous work on cylindrical layout compaction; we extend that work to handle upper bound constraints, allow multiple loops, and not require an initial feasible solution. Experimental results for examples derived from real designs and benchmark descriptions demonstrate that the algorithm can handle complex combinations of constraints very efficiently.
  • Ravel-XL: a hardware accelerator for assigned-delay compiled-code logic gate simulation

    Publication Year: 1996 , Page(s): 113 - 129
    Cited by:  Papers (2)

    Ravel-XL is a single-board hardware accelerator for gate-level digital logic simulation. It uses a standard levelized-code approach to statically schedule gate evaluations. However, unlike previous approaches based on levelized-code scheduling, it is not limited to zero- or unit-delay gate models and can provide timing accuracy comparable to that obtained from event-driven methods. We review the synchronous waveform algebra that forms the basis of the Ravel-XL simulation algorithm, present an architecture for its hardware realization, and describe an implementation of this architecture as a single VLSI chip. The chip has about 900,000 transistors on a die that is approximately 1.4 cm², requires a 256 pin package and is designed to run at 33 MHz. A Ravel-XL board consisting of the processor chip and local instruction and data memory can simulate up to one billion gates at a rate of approximately 6.6 million gate evaluations per second. To better appreciate the tradeoffs made in designing Ravel-XL, we compare its capabilities to those of other commercial and research software simulators and hardware accelerators.
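    As background for the levelized-code approach mentioned above, the following is a minimal sketch of static, zero-delay levelized gate simulation. It is the basic scheme the abstract contrasts Ravel-XL with, not the assigned-delay waveform algebra of the paper; the netlist format, gate names, and function names are assumptions made for the example. It relies on `graphlib`, which is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}


def levelize(netlist):
    """Return gate outputs ordered so that every gate follows its fan-in gates."""
    # Only gate outputs are dependencies; primary inputs are not in the netlist.
    deps = {out: [i for i in ins if i in netlist] for out, (_, ins) in netlist.items()}
    return list(TopologicalSorter(deps).static_order())


def simulate(netlist, schedule, inputs):
    """Evaluate each gate exactly once, in the precomputed order."""
    values = dict(inputs)
    for out in schedule:
        kind, ins = netlist[out]
        values[out] = GATE_FUNCS[kind](*(values[i] for i in ins))
    return values


# Hypothetical netlist: output name -> (gate type, input names).
netlist = {
    "any": ("OR", ("sum", "carry")),
    "sum": ("XOR", ("a", "b")),
    "carry": ("AND", ("a", "b")),
}
schedule = levelize(netlist)  # computed once, reused for every input vector
result = simulate(netlist, schedule, {"a": 1, "b": 1})
```

    The schedule is fixed before simulation starts, so no event queue is needed at run time; the cost is that every gate is evaluated for every input vector, whether or not its inputs changed.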
  • Sensing circuit for on-line detection of delay faults

    Publication Year: 1996 , Page(s): 130 - 133
    Cited by:  Papers (21)

    A sensing circuit for on-line testing of delay faults is presented. It can be used to monitor the outputs of circuits that are either general, or designed to be self-checking with respect to steady-state errors. Detailed analyses of the proposed circuit have shown that it is preferable to alternate solutions from the point of view of both the accuracy and the self-testing capability that make it suitable for self-checking applications. Checking architectures for delay faults, making use of the proposed sensing circuit and of standard checkers, are presented.
  • Finite field inversion over the dual basis

    Publication Year: 1996 , Page(s): 134 - 137
    Cited by:  Papers (8)  |  Patents (2)

    In this transaction brief we consider the design of dual basis inversion circuits for GF(2^m). Two architectures are presented, one bit-serial and one bit-parallel, both of which are based on Fermat's theorem. Finite field inverters based on Fermat's theorem have previously been presented which operate over the normal basis and the polynomial basis. However, there are two advantages to be gained by forcing inversion circuits to operate over the dual basis. First, these inversion circuits can be utilized in circuits using hardware efficient dual basis multipliers without any extra basis converters. And second, the inversion circuits themselves can take advantage of dual basis multipliers, thus reducing their own hardware levels. As both these approaches require squaring in a finite field to take place, a theorem is presented which allows circuits to be easily designed to carry out squaring over the dual basis.
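    Inversion by Fermat's theorem, as referred to above, rests on the identity a^(-1) = a^(2^m - 2) for every nonzero a in GF(2^m). The sketch below works in the polynomial basis for simplicity, not in the dual basis studied in the brief; the field GF(2^4), the reduction polynomial x^4 + x + 1, and the function names are assumptions chosen to keep the example small.

```python
M = 4            # field degree, assumed for this example
POLY = 0b10011   # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^M) in the polynomial basis."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^M) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY            # reduce modulo the field polynomial
    return result


def gf_inv(a):
    """Invert a nonzero element using a^(2^M - 2) = a^(-1)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    # 2^M - 2 = 2 + 4 + ... + 2^(M-1), so the inverse is the product
    # a^2 * a^4 * ... * a^(2^(M-1)): M-1 squarings and M-2 multiplications.
    power = gf_mul(a, a)         # a^2
    result = power
    for _ in range(M - 2):
        power = gf_mul(power, power)
        result = gf_mul(result, power)
    return result


inverses = {a: gf_inv(a) for a in range(1, 1 << M)}
```

    The computation consists only of squarings and multiplications, which is why the abstract singles out squaring as the operation that both architectures need.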
  • An evaluation of asynchronous addition

    Publication Year: 1996 , Page(s): 137 - 140
    Cited by:  Papers (21)  |  Patents (2)

    There is considerable interest at present in the design of asynchronous systems based on the use of self-timing components for arithmetic and other operations. Amongst the advantages claimed for asynchronous design are ease of design, high speed, low power, and device speed independence. An often quoted example of the speed improvement possible from self-timed hardware is parallel binary addition, where the carry signals in the worst case must propagate through n stages before the sum can be guaranteed correct. In practice, however, it is not possible to achieve significant speed advantage from the method, and this paper shows that asynchronous adders only give a performance improvement over more conventional hardware in very limited conditions, where the size and regularity of the layout are at a premium.
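    The worst-case carry propagation described above can be made concrete with a small sketch that measures the longest run of adder stages a single carry ripples through. It only illustrates the worst-case versus typical-case argument usually made for self-timed adders; it does not model circuit delays or reproduce the paper's evaluation. The function name, word length, and random seed are assumptions.

```python
import random


def longest_carry_chain(a, b, n):
    """Longest run of stages one carry ripples through when adding two n-bit numbers."""
    longest = current = 0
    carry = 0
    for i in range(n):
        x, y = (a >> i) & 1, (b >> i) & 1
        generate, propagate = x & y, x ^ y
        if generate:
            current = 1          # a new carry starts at this stage
        elif propagate and carry:
            current += 1         # an incoming carry passes through this stage
        else:
            current = 0          # the carry is absorbed, or there was none
        carry = generate | (propagate & carry)
        longest = max(longest, current)
    return longest


n = 32
worst = longest_carry_chain((1 << n) - 1, 1, n)  # all ones plus one: carry crosses every stage

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n) for _ in range(1000)]
average = sum(samples) / len(samples)
```

    Comparing `worst` with `average` shows the gap that the self-timed argument relies on; the abstract's point is that this gap rarely translates into a real speed advantage once practical hardware is considered.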
  • Built-in self-test (BIST) design of high-speed carry-free dividers

    Publication Year: 1996 , Page(s): 141 - 145
    Cited by:  Papers (4)

    This paper presents the built-in self-test (BIST) design of a C-testable high-speed carry-free divider which can be fully tested by 72 test patterns irrespective of the divider size. Using a graph labeling scheme, the test patterns, expected outputs, and control signals can be represented by sets of labels and generated by simple circuitry. As a result, test patterns can be easily generated inside chips, responses to test patterns need not be stored, and use of expensive test equipment is not necessary. Results show that the hardware cost for generating such labels is virtually constant irrespective of the circuit size. For the BIST design of a 64 b C-testable divider, its hardware overhead is less than 5%.
  • Self-Timed Design in GaAs - Case Study of a High-Speed, Parallel Multiplier

    Publication Year: 1996
    Cited by:  Papers (5)

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu