Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 3 • Date Sept. 1999

  • Efficient realizations of encoders and decoders based on the 2-D discrete wavelet transform

    Publication Year: 1999 , Page(s): 289 - 298
    Cited by:  Papers (3)  |  Patents (1)

    In this paper, we present architectures and scheduling algorithms for encoders and decoders that are based on the two-dimensional discrete wavelet transform. We consider the design of encoders and decoders individually, as well as in an integrated encoder-decoder system. We propose architectures ranging from single-instruction multiple-data processor arrays to folded architectures that are suitable for single-chip implementations. The scheduling algorithms for the folded architectures range from those that try to minimize the latency to those that try to minimize the storage and keep the data flow regular. We include a comparison of the performance of these algorithms to aid the designer in choosing one that is best suited for a specific application.

  • Reconfigurable pipelined 2-D convolvers for fast digital signal processing

    Publication Year: 1999 , Page(s): 299 - 308
    Cited by:  Papers (29)  |  Patents (12)

    In order to make software applications simpler to write and easier to maintain, a software digital signal-processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides a high-level interface and mechanisms, so developers only need to know how to use the algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSPs, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. Therefore, the exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated. However, the proposed concept is not limited to a particular processor.

  • Low-power memory mapping through reducing address bus activity

    Publication Year: 1999 , Page(s): 309 - 320
    Cited by:  Papers (28)  |  Patents (3)

    Arrays in behavioral specifications that are too large to fit into on-chip registers are usually mapped to off-chip memories during behavioral synthesis. We address the problem of system power reduction through transition count minimization on the memory address bus when these arrays are accessed from memory. We exploit regularity and spatial locality in the memory accesses and determine the mapping of behavioral array references to physical memory locations to minimize address bus transitions. We describe array mapping strategies for two important memory configurations: all behavioral arrays mapped to a single off-chip memory and arrays mapped into multiple memory modules drawn from a library. For the single memory configuration, we describe a heuristic for selecting a memory mapping scheme to achieve low power for each behavioral array. For mapping into a library of multiple memory modules, we formulate the problem as three logical-to-physical memory mapping subtasks and present experiments demonstrating the transition count reductions based on our approach. Our experiments on several image processing benchmarks show power savings of up to 63% through reduced transition activity on the memory address bus in the single memory case. We also observe a further transition count reduction by a factor of 1.5-6.7 over a straightforward mapping scheme in the multiple memories configuration.
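
The quantity this abstract minimizes, the number of bit transitions on the address bus, can be illustrated with a minimal Python sketch. It is not the authors' heuristic: the function names, the 6 x 10 array size, and the column-by-column access pattern are all assumptions chosen for illustration. The sketch only counts bit flips for the same access pattern under two textbook mappings (row-major and column-major).

```python
def bus_transitions(addresses):
    """Count bit flips on the address bus over consecutive accesses."""
    addresses = list(addresses)
    return sum(bin(a ^ b).count("1") for a, b in zip(addresses, addresses[1:]))


def row_major(r, c, rows, cols):
    """Hypothetical mapping: element (r, c) stored row by row."""
    return r * cols + c


def column_major(r, c, rows, cols):
    """Hypothetical mapping: element (r, c) stored column by column."""
    return c * rows + r


def trace(mapping, accesses, rows, cols):
    """Translate a list of (row, col) accesses into physical addresses."""
    return [mapping(r, c, rows, cols) for r, c in accesses]


# Assumed access pattern: walk the array one column at a time.
rows, cols = 6, 10
column_walk = [(r, c) for c in range(cols) for r in range(rows)]

for name, mapping in (("row-major", row_major), ("column-major", column_major)):
    print(name, bus_transitions(trace(mapping, column_walk, rows, cols)))
```

Which mapping gives fewer transitions depends on the access pattern, which is why the choice is made per array in the approach described above.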

  • The design of a SRAM-based field-programmable gate array-Part II: Circuit design and layout

    Publication Year: 1999 , Page(s): 321 - 330
    Cited by:  Papers (30)  |  Patents (138)

    For Part I, see ibid., vol. 7, pp. 191-197 (1999). Field-programmable gate arrays (FPGAs) are now widely used for the implementation of digital systems, and many commercial architectures are available. Although the literature and data books contain detailed descriptions of these architectures, there is very little information on how the high-level architecture was chosen and no information on the circuit-level or physical design of the devices. In Part I of this paper, we described the high-level architectural design of a static random-access memory programmable FPGA. This paper will address the circuit-design issues through to the physical layout. We address area-speed tradeoffs in the design of the logic block circuits and in the connections between the logic and the routing structure. All commercial FPGA designs are done using full-custom hand layout to obtain absolute minimum die sizes. This is both labor- and time-intensive. We propose a design style with a minitile that contains a portion of all the components in the logic tile, resulting in less full-custom effort. The minitile is replicated in a 4×4 array to create a macro tile. The minitile is optimized for layout density and speed, and is customized in the array by adding appropriate vias. This technique also permits easy changing of the hard-wired connections in the logic block architecture and the segmentation length distribution in the routing architecture.

  • A design space exploration scheme for data-path synthesis

    Publication Year: 1999 , Page(s): 331 - 338
    Cited by:  Papers (3)

    In this paper, we examine the multicriteria optimization involved in scheduling for data-path synthesis (DPS). The criteria we examine are the area cost of the components and schedule time. Scheduling for DPS is a well-known NP-complete problem. We present a method to find nondominated schedules using a combination of restricted search and heuristic scheduling techniques. Our method supports design with architectural constraints such as the total number of functional units, buses, etc. The schedules produced have been taken to completion using GABIND as written by Mandal et al., and the results are promising.
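
To make the notion of a nondominated schedule concrete, here is a minimal Python sketch. It is not the paper's search method: it assumes each candidate schedule has already been summarized as a hypothetical (area, time) pair, lower being better on both criteria, and it simply filters out every candidate that another candidate matches or beats on both. The function name and the sample values are illustrative.

```python
def nondominated(points):
    """Return the (area, time) pairs that no other pair dominates.

    A pair dominates another if it is no worse on both criteria and
    strictly better on at least one (lower is better for both).
    """
    front = []
    best_time = float("inf")
    # Sorting by area (then time) means a point survives only if its time
    # is strictly lower than that of every cheaper-or-equal-area point.
    for area, time in sorted(set(points)):
        if time < best_time:
            front.append((area, time))
            best_time = time
    return front


# Hypothetical candidates: (area cost, schedule length in control steps).
candidates = [(10, 5), (8, 7), (12, 5), (9, 7), (15, 3)]
print(nondominated(candidates))
```

The surviving pairs form the area-time tradeoff curve from which a designer would pick a schedule.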

  • A memory-based architecture for MPEG2 system protocol LSIs

    Publication Year: 1999 , Page(s): 339 - 344
    Cited by:  Papers (3)

    This paper proposes a memory-based architecture for MPEG2 system protocol large-scale integrated circuits (LSIs), and demonstrates its flexibility and performance. The memory-based architecture implements the full functionality of the MPEG2 system protocol for both multiplexing and demultiplexing MPEG2-encoded streams. It consists of a core central processing unit, memories, and dedicated application-specific hardware. It is designed and optimized by hardware/software codesign techniques. The LSIs provide sufficient performance and flexibility for real-time application of the MPEG2 system protocol. They were fabricated with 0.5 μm CMOS embedded gate array process technology. They are now in use on MPEG2 codec systems for several multimedia communication and storage services.

  • Cost-effective VLSI architectures and buffer size optimization for full-search block matching algorithms

    Publication Year: 1999 , Page(s): 345 - 358
    Cited by:  Papers (14)  |  Patents (4)

    This paper presents two efficient very large scale integration (VLSI) architectures and buffer size optimization for full-search block matching algorithms. Starting from an overlapped data flow of search area, both systolic- and semisystolic-array architectural solutions are derived. By exploiting stream memory banks, not only can input/output (I/O) bandwidth be minimized, but processor element efficiency can also be improved. In addition, the controller structure for both solutions is very straightforward, making them very suitable for VLSI implementation to meet computational requirements. Moreover, by exploring the dependency graph, we focus on the problem of reducing the internal buffer size under minimal I/O bandwidth constraint to derive guidelines on reducing redundant internal buffer as well as to achieve area-efficient VLSI architectures. Simulation results show that, for N=P=16 (N is the reference block size and P is the search range), I/O bandwidth can be reduced by a factor of 2.4, while buffer size increases by less than 38%. Two prototype chips for N=P=16 have been designed and fabricated. Test results show that clock rate can be up to 90 MHz, implying that more than 87.9 K motion vectors per second can be achieved to meet real-time requirements specified in the MPEG-2 MP@ML coding standard.
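
For readers unfamiliar with full-search block matching, the following Python sketch shows the computation in plain software form. It says nothing about the systolic architectures themselves. Two points are assumptions, not statements from the abstract: the sum of absolute differences (SAD) as the matching criterion, and a displacement range of -P to P-1 in each direction. That range gives (2P)^2 = 1024 candidates for P = 16, and 90 MHz / 1024 is about 87.9 K, which would be consistent with the quoted rate if one candidate were evaluated per clock cycle.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref, bx, by, n, p):
    """Try every displacement in [-p, p-1]^2 (assumed range) and return
    (dx, dy, cost) for the best match; frames are lists of rows."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p):
        for dx in range(-p, p):
            y, x = by + dy, bx + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > height or x + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Every candidate reuses most of the pixels of its neighbours, which is the overlapped data flow that the architectures above exploit to cut I/O bandwidth.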

  • Information-theoretic bounds on average signal transition activity [VLSI systems]

    Publication Year: 1999 , Page(s): 359 - 368
    Cited by:  Papers (14)  |  Patents (1)

    Transitions on high-capacitance buses in very large scale integration systems result in considerable system power dissipation. Therefore, various coding schemes have been proposed in the literature to encode the input signal in order to reduce the number of transitions. In this paper, we derive lower and upper bounds on the average signal transition activity via an information-theoretic approach, in which symbols generated by a process (possibly correlated) with entropy rate H are coded with an average of R bits per symbol. The bounds are asymptotically achievable if the process is stationary and ergodic. We also present a coding algorithm based on the Lempel-Ziv data-compression algorithm to achieve the bounds. Bounds are also obtained on the expected number of ones (or zeros). These results are applied to determine the activity-reducing efficiency of different coding algorithms such as entropy coding, transition signaling, and bus-invert coding, and to determine the lower bound on the power-delay product given H and R. Two examples are provided where transition activity within 4% and 9% of the lower bound is achieved when blocks of eight symbols and 13 symbols, respectively, are coded at a time.
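
Of the coding schemes named in this abstract, bus-invert coding is the simplest to illustrate. The Python sketch below follows the commonly described form of the scheme and is not taken from the paper: a word is sent inverted, with an extra invert line set, whenever sending it as-is would flip more than half of the data lines. The function names, the initial bus state of 0, and the decision to ignore transitions on the invert line itself are simplifying assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Encode words for a bus of `width` data lines plus one invert line.

    Returns a list of (value_on_bus, invert_bit) pairs.
    """
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        if hamming(prev, word) > width // 2:
            value, invert = ~word & mask, 1  # fewer flips if inverted
        else:
            value, invert = word, 0
        encoded.append((value, invert))
        prev = value
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [~value & mask if invert else value for value, invert in encoded]


def data_line_transitions(values):
    """Bit flips on the data lines, starting from the assumed state 0."""
    prev, total = 0, 0
    for value in values:
        total += hamming(prev, value)
        prev = value
    return total
```

With this rule no single transfer flips more than half of the data lines, which is the kind of activity reduction the bounds above are used to evaluate.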

  • VLSI architectures for turbo codes

    Publication Year: 1999 , Page(s): 369 - 379
    Cited by:  Papers (67)  |  Patents (20)

    Great interest has been attracted in recent years by a new error-correcting coding technique, known as "turbo coding," which has been proven to offer performance closer to the Shannon limit than traditional concatenated codes. In this paper, several very large scale integration (VLSI) architectures suitable for turbo decoder implementation are proposed and compared in terms of complexity and performance; the impact on the VLSI complexity of system parameters like the state number, number of iterations, and code rate is evaluated for the different solutions. The results of this architectural study have then been exploited for the design of a specific decoder, implementing a serial concatenation scheme with 2/3 and 3/4 codes; the designed circuit occupies 35 mm², supports a 2 Mb/s data rate, and for a bit error probability of 10⁻⁶, yields a coding gain larger than 7 dB, with ten iterations.

  • A structure-oriented power modeling technique for macrocells

    Publication Year: 1999 , Page(s): 380 - 391
    Cited by:  Patents (1)

    To characterize the power consumption of a macrocell, a general method involves recording the power consumption of all possible input transition events in look-up tables. However, though this approach is accurate, the size of the table becomes very large. In this paper, we propose a new power modeling technique that takes advantage of the structural information of a macrocell. In this approach, a subset of primary inputs and internal nodes in the macrocell is selected as the state variables to build a state transition graph (STG). These state variables can model the steady-state transitions completely. Moreover, by selecting the characterization patterns properly, the STG can also model the glitch power in the macrocell accurately. To further reduce the complexity of the STG, an incomplete power modeling technique is presented. Without losing much accuracy, the property of compatible patterns is exploited for a macrocell to further reduce the number of edges in the corresponding STG. Experimental results show that our modeling techniques can provide SPICE-like accuracy, while the size of the look-up table is significantly reduced.

  • Bus crosstalk fault-detection capabilities of error-detecting codes for on-line testing

    Publication Year: 1999 , Page(s): 392 - 396
    Cited by:  Papers (14)

    This paper analyzes some of the most common error-detecting codes used in self-checking circuits with respect to the errors induced by crosstalk faults (CFs). The electrical-level behavior of circuits in the presence of CFs has been analyzed by considering these faults as parametric. A logic-level model providing the probability of errors has been abstracted and applied to the case of functional unit outputs (buses). Finally, the probability of detectable and undetectable errors has been evaluated for the parity, two-rail, m-out-of-n, and Berger codes, thus providing some design hints.


Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu