By Topic

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 3 • Date March 2004

Filter Results

Displaying Results 1 - 21 of 21
  • Table of contents

    Publication Year: 2004 , Page(s): c1
    Save to Project icon | Request Permissions | PDF file iconPDF (44 KB)  
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Publication Year: 2004 , Page(s): c2
    Save to Project icon | Request Permissions | PDF file iconPDF (33 KB)  
    Freely Available from IEEE
  • Guest Editorial

    Publication Year: 2004 , Page(s): 233 - 234
    Save to Project icon | Request Permissions | PDF file iconPDF (60 KB)  
    Freely Available from IEEE
  • Power-delay product minimization in high-performance 64-bit carry-select adders

    Publication Year: 2004 , Page(s): 235 - 244
    Cited by:  Papers (13)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (450 KB) |  | HTML iconHTML  

    This paper analyzes methods to minimize the power-delay product of 64-bit carry-select adders intended for high-performance and low-power applications. A first realization in 0.18-/spl mu/m partially depleted (PD) silicon-on-insulator (SOI), using complex branch-based logic (BBL) cells, results in a delay of 720 ps and a power dissipation of 96 mW at 1.5 V. The reduction of the stack height in the critical path, combined with the optimization of the global carry network with cell sharing and the selection of 8-bit pre-sums, leads to a reduction of the power-delay product by 75%. The automatic tuning of the transistor widths in 0.13-/spl mu/m PD SOI produces an energy-efficient 64-bit adder which has a delay of 326 ps and a power dissipation of 23 mW only at 1.1 V. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • DCG: deterministic clock-gating for low-power microprocessor design

    Publication Year: 2004 , Page(s): 245 - 254
    Cited by:  Papers (16)  |  Patents (1)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (628 KB) |  | HTML iconHTML  

    With the scaling of technology and the need for higher performance and more functionality, power dissipation is becoming a major bottleneck for microprocessor designs. Because clock power can be significant in high-performance processors, we propose a deterministic clock-gating (DCG) technique which effectively reduces clock power. DCG is based on the key observation that for many of the pipelined stages of a modern processor, the circuit block usage in the near future is known a few cycles ahead of time. Our experiments show an average of 19.9% reduction in processor power with virtually no performance loss for an eight-issue, out-of-order superscalar by applying DCG to execution units, pipeline latches, D-cache wordline decoders, and result bus drivers. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Memory energy minimization by data compression: algorithms, architectures and implementation

    Publication Year: 2004 , Page(s): 255 - 268
    Cited by:  Papers (3)  |  Patents (1)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (629 KB)  

    Storing data in compressed form is becoming common practice in high-performance systems, where memory bandwidth constitutes a serious bottleneck to program execution speed. In this paper, we suggest hardware-assisted data compression as a tool for reducing energy consumption of processor-based systems. We propose a novel and efficient architecture for on-the-fly data compression and decompression whose field of operation is the cache-to-memory path. Uncompressed cache lines are compressed before they are written back to main memory, and decompressed when cache refills take place. We explore two classes of table-based compression schemes. The first, based on offline data profiling, is particularly suitable to embedded systems, where predictability of the data set is usually higher than in general-purpose systems. The second solution we introduce is adaptive, that is, it takes decisions on whether data words should be compressed according to the data statistics of the program being executed. We describe in details the architecture of the compression/decompression unit and we provide an insight about its implementation as a hardware (HW) block. We present experimental results concerning memory traffic and energy consumption in the cache-to-memory path of a core-based system running standard benchmark programs. The obtained energy savings range from 8%-39% when profile-driven compression is adopted, and from 7%-26% when the adaptive scheme is used. Performance improvements are also achieved as a by-product, showing the practical applicability of the proposed approach. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Memory-access-aware data structure transformations for embedded software with dynamic data accesses

    Publication Year: 2004 , Page(s): 269 - 280
    Cited by:  Papers (5)  |  Patents (1)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (289 KB) |  | HTML iconHTML  

    Embedded systems are evolving from traditional, stand-alone devices to devices that participate in Internet activity. The days of simple, manifest embedded software [e.g. a simple finite-impulse response (FIR) algorithm on a digital signal processor (DSP] are over. Complex, nonmanifest code, executed on a variety of embedded platforms in a distributed manner, characterizes next generation embedded software. One dominant niche, which we concentrate on, is embedded, multimedia software. The need is present to map large scale, dynamic, multimedia software onto an embedded system in a systematic and highly optimized manner. The objective of this paper is to introduce high-level, systematically applicable, data structure transformations and to show in detail the practical feasibility of our optimizations on three real-life multimedia case studies. We derive Pareto tradeoff points in terms of accesses versus memory footprint and obtain significant gains in execution time and power consumption with respect to the initial implementation choices. Our approach is a first step to systematically applying high-level data structure transformations in the context of memory-efficient and low-power multimedia systems. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Compiler-directed scratch pad memory optimization for embedded multiprocessors

    Publication Year: 2004 , Page(s): 281 - 287
    Cited by:  Papers (15)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (500 KB) |  | HTML iconHTML  

    This paper presents a compiler strategy to optimize data accesses in regular array-intensive applications running on embedded multiprocessor environments. Specifically, we propose an optimization algorithm that targets at reducing extra off-chip memory accesses caused by interprocessor communication. This is achieved by increasing the application-wide reuse of data that resides in scratch-pad memories of processors. Our results obtained using four array-intensive image processing applications indicate that exploiting interprocessor data sharing can reduce energy-delay product significantly on a four-processor embedded system. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • The effect of LUT and cluster size on deep-submicron FPGA performance and density

    Publication Year: 2004 , Page(s): 288 - 298
    Cited by:  Papers (88)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (557 KB) |  | HTML iconHTML  

    In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits are synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster size from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and cluster size of between 3-10 provides the best area-delay product for an FPGA. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Trading off transient fault tolerance and power consumption in deep submicron (DSM) VLSI circuits

    Publication Year: 2004 , Page(s): 299 - 311
    Cited by:  Papers (24)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (403 KB) |  | HTML iconHTML  

    High fault tolerance for transient faults and low-power consumption are key objectives in the design of critical embedded systems. Systems like smart cards, PDAs, wearable computers, pacemakers, defibrillators, and other electronic gadgets must not only be designed for fault tolerance but also for ultra-low-power consumption due to limited battery life. In this paper, a highly accurate method of estimating fault tolerance in terms of mean time to failure (MTTF) is presented. The estimation is based on circuit-level simulations (HSPICE) and uses a double exponential current-source fault model. Using counters, it is shown that the transient fault tolerance and power dissipation of low-power circuits are at odds and allow for a power fault-tolerance tradeoff. Architecture and circuit level fault tolerance and low-power techniques are used to demonstrate and quantify this tradeoff. Estimates show that incorporation of these techniques results either in a design with an MTTF of 36 years and power consumption of 102 /spl mu/W or a design with an MTTF of 12 years and power consumption of 20 /spl mu/W. Depending on the criticality of the system and the power budget, certain techniques might be preferred over others, resulting in either a more fault tolerant or a lower power design, at the sacrifice of the alternative objective. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Overview of a compiler for synthesizing MATLAB programs onto FPGAs

    Publication Year: 2004 , Page(s): 312 - 324
    Cited by:  Papers (13)  |  Patents (5)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (610 KB) |  | HTML iconHTML  

    This paper describes a behavioral synthesis tool called AccelFPGA which reads in high-level descriptions of digital signal processing (DSP) applications written in MATLAB, and automatically generates synthesizable register transfer level (RTL) models and simulation testbenches in VHDL or Verilog. The RTL models can be synthesized using commercial logic synthesis tools and place and route tools onto field-programmable gate arrays (FPGAs). This paper describes how powerful directives are used to provide high-level architectural tradeoffs for the DSP designer. Experimental results are reported on a set of eight MATLAB benchmarks that are mapped onto the Xilinx Virtex II and Altera Stratix FPGAs. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • A CAM with mixed serial-parallel comparison for use in low energy caches

    Publication Year: 2004 , Page(s): 325 - 329
    Cited by:  Papers (19)
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (239 KB) |  | HTML iconHTML  

    A novel, low-energy content addressable memory (CAM) structure is presented which achieves an approximately four-fold improvement in energy per access, compared to a standard parallel CAM, when used as tag storage for caches. It exploits the address patterns commonly found in application programs, where testing the four least significant bits of the tag is sufficient to determine over 90% of the tag mismatches; the proposed CAM checks those bits first and evaluates the remainder of the tag only if they match. Although, the energy savings come at the cost of a 25% increase in search time, the proposed CAM organization also supports a parallel operating mode without a speed loss but with reduced energy savings. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • IEEE International Symposium on Circuits and Systems (ISCAS 2004)

    Publication Year: 2004 , Page(s): 330
    Save to Project icon | Request Permissions | PDF file iconPDF (527 KB)  
    Freely Available from IEEE
  • The 13th International Workshop on Logic & Synthesis (IWLS)

    Publication Year: 2004 , Page(s): 331
    Save to Project icon | Request Permissions | PDF file iconPDF (138 KB)  
    Freely Available from IEEE
  • IEEE International Symposium on Low Power Electronics and Design (ISLPED'04)

    Publication Year: 2004 , Page(s): 332
    Save to Project icon | Request Permissions | PDF file iconPDF (163 KB)  
    Freely Available from IEEE
  • 11th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2004)

    Publication Year: 2004 , Page(s): 333
    Save to Project icon | Request Permissions | PDF file iconPDF (495 KB)  
    Freely Available from IEEE
  • IEEE Member Digital Library [advertisement]

    Publication Year: 2004 , Page(s): 334
    Save to Project icon | Request Permissions | PDF file iconPDF (179 KB)  
    Freely Available from IEEE
  • Join IEEE

    Publication Year: 2004 , Page(s): 335
    Save to Project icon | Request Permissions | PDF file iconPDF (322 KB)  
    Freely Available from IEEE
  • Leading the field since 1884 [advertisement]

    Publication Year: 2004 , Page(s): 336
    Save to Project icon | Request Permissions | PDF file iconPDF (218 KB)  
    Freely Available from IEEE
  • IEEE Circuits and Systems Society Information

    Publication Year: 2004 , Page(s): c3
    Save to Project icon | Request Permissions | PDF file iconPDF (26 KB)  
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Publication Year: 2004 , Page(s): c4
    Save to Project icon | Request Permissions | PDF file iconPDF (28 KB)  
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following: • System Specification, Design and Partitioning, • System-level Test, • Reliable VLSI/ULSI Systems, • High Performance Computing and Communication Systems, • Wafer Scale Integration and Multichip Modules (MCMs), • High-Speed Interconnects in Microelectronic Systems, • VLSI/ULSI Neural Networks and Their Applications, • Adaptive Computing Systems with FPGA components, • Mixed Analog/Digital Systems, • Cost, Performance Tradeoffs of VLSI/ULSI Systems, • Adaptive Computing Using Reconfigurable Components (FPGAs) 

Full Aims & Scope

Meet Our Editors

Editor-in-Chief

Krishnendu Chakrabarty
Department of Electrical Engineering
Duke University
Durham, NC 27708 USA
Krish@duke.edu