Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 1 • Date Jan. 2005

  • Table of contents

    Page(s): c1
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): c2
    Freely Available from IEEE
  • The CSI multimedia architecture

    Page(s): 1 - 13

    An instruction set extension designed to accelerate multimedia applications is presented and evaluated. In the proposed complex streamed instruction (CSI) set, a single instruction can process vector data streams of arbitrary length and stride and combines complex memory accesses (with implicit prefetching), program control for vector sectioning, and complex computations on multiple data in a single operation. In this way, CSI eliminates overhead instructions (such as instructions for data sectioning, alignment, reorganization, and packing/unpacking) often needed in applications utilizing MMX-like extensions and accelerates key multimedia kernels. Simulation results demonstrate that a superscalar processor extended with CSI outperforms the same processor enhanced with Sun's VIS extension by a factor of up to 7.77 on key multimedia kernels and by up to 35% on full applications.

  • Execution cache-based microarchitecture for power-efficient superscalar processors

    Page(s): 14 - 26

    This paper investigates a possible solution to the problem of power consumption in superscalar, out-of-order processors by proposing a new microarchitecture, specifically designed to reduce increasing power requirements of high-end processors. More precisely, we show that by modifying the well-established superscalar processor architecture, significant savings can be achieved in terms of power consumption. Our approach aims at limiting the growing amount of power used in a typical processor for dynamic optimizations (including out-of-order scheduling and register renaming). Our proposed approach achieves significant power savings by reusing as much as possible from the work done by the front-end of a typical superscalar, out-of-order pipeline, via the use of a special cache nested deeply into the processor structure. By reusing instructions that are already decoded, reordered, and have their registers already renamed, the front end of the pipeline can be turned off for large periods of time with significant savings in the overall power consumption. Experimental results show up to 35% (30% on average) savings in average energy per committed instruction, and 35% (20% on average) savings in energy-delay product, with about 9% average performance loss, over a large spectrum of SPEC95 and SPEC2000 benchmarks.

  • A process-tolerant cache architecture for improved yield in nanoscale technologies

    Page(s): 27 - 38

    Process parameter variations are expected to be significantly high in a sub-50-nm technology regime, which can severely affect the yield, unless very conservative design techniques are employed. The parameter variations are random in nature and are expected to be more pronounced in minimum geometry transistors commonly used in memories such as SRAM. Consequently, a large number of cells in a memory are expected to be faulty due to variations in different process parameters. We analyze the impact of process variation on the different failure mechanisms in SRAM cells. We also propose a process-tolerant cache architecture suitable for high-performance memory. This technique dynamically detects and replaces faulty cells by dynamically resizing the cache. It surpasses all the contemporary fault tolerant schemes such as row/column redundancy and error-correcting code (ECC) in handling failures due to process variation. Experimental results on a 64-K direct map L1 cache show that the proposed technique can achieve 94% yield compared to its original 33% yield (standard cache) in a 45-nm predictive technology under $\sigma_{Vt\text{-inter}} = \sigma_{Vt\text{-intra}} = 30$ mV.

  • Optimum and heuristic synthesis of multiple word-length architectures

    Page(s): 39 - 57

    This paper explores the problem of architectural synthesis (scheduling, allocation, and binding) for multiple word-length systems. It is demonstrated that the resource allocation and binding problem, and the interaction between scheduling, allocation, and binding, are complicated by the existence of multiple word-length operators. Both optimum and heuristic approaches to the combined problem are formulated. The optimum solution involves modeling as an integer linear program, while the heuristic solution considers intertwined scheduling, binding, and resource word-length selection. Techniques are introduced to perform scheduling with incomplete word-length information, to combine binding and word-length selection, and to refine word-length information based on critical path analysis. Results are presented for several benchmark and artificial examples, demonstrating that resource savings of up to 46% are possible by considering these problems within the proposed unified framework.

  • Non-RAM-based architectural designs of wavelet-based digital systems based on novel nonlinear I/O data space transformations

    Page(s): 58 - 74

    The designs of application specific integrated circuits and/or multiprocessor systems are usually required in order to improve the performance of multidimensional applications such as digital-image processing and computer vision. Wavelet-based algorithms have been found promising among these applications due to the features of hierarchical signal analysis and multiresolution analysis. Because of the large size of multidimensional input data, off-chip random access memory (RAM) based systems have always been necessary for calculating algorithms in these applications, where either memory address pointers or data preprocessing and rearrangements in off-chip memories are employed. This paper establishes and follows novel concepts in data dependence analysis for generalized and arbitrarily multidimensional wavelet-based algorithms, i.e., the wavelet-adjacent field and the super wavelet-dependence vector. Based on them, a series of novel nonlinear I/O data space transformations for variable localization and dependence graph regularization for wavelet algorithms is proposed. It leads to general designs of non-RAM-based architectures for wavelet-based algorithms where off-chip communications for intermediate calculation results are eliminated without preprocessing or rearranging input data.

  • Current demand balancing: a technique for minimization of current surge in high performance clock-gated microprocessors

    Page(s): 75 - 85

    We propose an integrated architectural and physical planning approach to minimize the current surge in high-performance clock-gated microprocessors. In our approach, we use priority assignment optimization (PAO) and dynamic functional unit (FU) selection (DFS) to balance current demand in the floorplan. Two complementary methods, FU ordering with submodule design and issue pattern management, are also proposed to enhance the above techniques. Experimental results show that at the 0.18-µm technology node, the PAO can reduce the peak noise by 11.75% and consequently, the decoupling capacitance (Decap) requirement by 24.22% without any degradation in instructions per cycle (IPC). Moreover, an enhanced DFS reduces the peak noise by 13.39% as well as Decap requirement by 29.58%. Experiments at the 90-nm technology node show that our methodology can further reduce the peak noise and the Decap requirement by 16.57% and 44.85% with PAO, or 18.16% and 47.58% with DFS. We also show that our approach does not increase the clock period for 0.18-µm technology and beyond.

  • Virtex FPGA implementation of a pipelined adaptive LMS predictor for electronic support measures receivers

    Page(s): 86 - 95

    High-speed field-programmable gate array (FPGA) implementations of an adaptive least mean square (LMS) filter, with application in an electronic support measures (ESM) digital receiver, are presented. They employ "fine-grained" pipelining, i.e., pipelining within the processor, which results in an increased output latency when used in the LMS recursive system. Therefore, the major challenge is to maintain a low-latency output whilst increasing the number of pipeline stages in the filter for higher speeds. Using the delayed LMS (DLMS) algorithm, fine-grained pipelined FPGA implementations using both the direct form (DF) and the transposed form (TF) are considered and compared. It is shown that the direct form LMS filter utilizes the FPGA resources more efficiently, thereby allowing a 120 MHz sampling rate.
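
The delayed LMS idea mentioned in the abstract above can be illustrated in software. The following is a minimal sketch, assuming the textbook DLMS update in which the coefficients are adjusted using an error and input vector that are `delay` samples old; it is not the paper's FPGA design, and the function name, tap count, step size, and delay are hypothetical choices made for illustration.

```python
import math


def dlms_predict(x, taps=4, mu=0.05, delay=2):
    """One-step-ahead linear prediction with a delayed LMS coefficient update.

    Assumed textbook form: w(n+1) = w(n) + mu * e(n-delay) * u(n-delay).
    With delay=0 this reduces to the ordinary LMS update.
    """
    w = [0.0] * taps          # filter coefficients
    errors = []               # prediction error at each step
    pending = []              # (input vector, error) pairs awaiting delayed use
    for n in range(taps, len(x)):
        u = [x[n - 1 - k] for k in range(taps)]       # most recent samples first
        y = sum(wk * uk for wk, uk in zip(w, u))      # predicted x[n]
        e = x[n] - y
        errors.append(e)
        pending.append((u, e))
        if len(pending) > delay:
            u_old, e_old = pending.pop(0)             # values from `delay` steps ago
            w = [wk + mu * e_old * uk for wk, uk in zip(w, u_old)]
    return w, errors


if __name__ == "__main__":
    signal = [math.sin(1.0 * n) for n in range(2000)]  # noise-free test tone
    weights, err = dlms_predict(signal)
    print("final |error|:", abs(err[-1]))
```

The `pending` queue plays the role of the pipeline registers: a deeper pipeline corresponds to a larger `delay`, which in general calls for a smaller step size `mu` to keep the adaptation stable.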

  • Layout techniques for FPGA switch blocks

    Page(s): 96 - 105

    This paper presents abstract layout techniques for a variety of field-programmable gate array switch block architectures. For subset switch blocks of small size, we find the optimal implementations using a simple metric. We also develop a tractable heuristic that returns the optimal results for small switch blocks and good results for large switch blocks. We show how it is possible to transform universal switch blocks into a subset architecture by using the decomposition property of universal switch blocks. This allows universal switch blocks to exploit the same layout methodologies as presented for subset architectures.

  • On exploring inter-iteration parallelism within rate-balanced multirate multidimensional DSP algorithms

    Page(s): 106 - 125

    Although the notion of parallelism in multidimensional applications has existed for a long time, it is so far unknown what the bound (if any) of inter-iteration parallelism in multirate multidimensional digital signal processing (DSP) algorithms is, and whether the maximum inter-iteration parallelism can be achieved for arbitrary multirate data flow algorithms. This paper explores the bound of inter-iteration parallelism within rate-balanced multirate multidimensional DSP algorithms and proves that this parallelism can always be achieved in a hardware system given the availability of a large number of processors and the interconnections between them.

  • A robust self-calibrating transmission scheme for on-chip networks

    Page(s): 126 - 139

    Systems-on-Chip (SoC) design involves several challenges, stemming from the extreme miniaturization of the physical features and from the large number of devices and wires on a chip. Since most SoCs are used within embedded systems, specific concerns are increasingly related to correct, reliable, and robust operation. We believe that in the future most SoCs will be assembled by using large-scale macro-cells and interconnected by means of on-chip networks. We examine some physical properties of on-chip interconnect busses, with the goal of achieving fast, reliable, and low-energy communication. These objectives are reached by dynamically scaling down the voltage swing, while ensuring data integrity, in spite of the decreased signal-to-noise ratio, by means of encoding and retransmission schemes. In particular, we describe a closed-loop voltage swing controller that samples the error retransmission rate to determine the operational voltage swing. We present a control policy which achieves our goals with minimal complexity; such simplicity is demonstrated by implementing the policy in a synthesizable controller. Such a controller is an embodiment of a self-calibrating circuit that compensates for significant manufacturing parameter deviations and environmental variations. Experimental results show that energy savings amount to up to 42%, while at the same time meeting performance requirements.
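
The closed-loop idea in the abstract above (sample the retransmission rate, then adjust the voltage swing) can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the abstract does not give the actual control policy, so the function name, the two rate thresholds, the step size, and the swing limits below are all hypothetical.

```python
def next_swing(swing_mv, retransmissions, transfers, *,
               step_mv=50, min_mv=200, max_mv=1000,
               low_rate=0.01, high_rate=0.05):
    """Return the voltage swing (in mV) to use for the next sampling window.

    Hypothetical policy: raise the swing when too many transfers had to be
    retransmitted, lower it when the link is nearly error-free, and otherwise
    hold. All numeric defaults are illustrative assumptions.
    """
    rate = retransmissions / transfers if transfers else 0.0
    if rate > high_rate:
        swing_mv += step_mv      # too many errors: spend more energy per bit
    elif rate < low_rate:
        swing_mv -= step_mv      # link is clean: try a lower swing
    # Keep the result inside the assumed operating range.
    return max(min_mv, min(max_mv, swing_mv))


if __name__ == "__main__":
    swing = 600
    # (retransmissions, transfers) observed in successive windows
    for window in [(0, 100), (0, 100), (8, 100), (2, 100)]:
        swing = next_swing(swing, *window)
        print(swing)
```

The gap between `low_rate` and `high_rate` acts as a hysteresis band, so the swing does not oscillate when the measured rate sits near a single threshold.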

  • Synchronization overhead in SOC compressed test

    Page(s): 140 - 152

    Test data compression is an enabling technology for low-cost test. Compression schemes, however, require communication between the system under test and the automated test equipment. This communication, referred to in this paper as synchronization overhead, may hinder the effective deployment of this new test technology for core-based systems-on-chip. This paper analyzes the sources of synchronization overhead and discusses the different tradeoffs, such as area overhead, test time, and automatic test equipment extensions. A novel scalable and programmable on-chip distribution architecture is proposed, which addresses the synchronization overhead problem and facilitates the use of low-cost testers for manufacturing test. The design of the proposed architecture is introduced in a generic framework, and the implementation issues (including the test controller and test set preparation) have been considered for a particular case.

  • VLSI implementation of new arithmetic residue to binary decoders

    Page(s): 153 - 158

    This paper introduces two arithmetic decoders that decode the residue number system into its binary equivalent. The first one deals with the moduli set $(2^n,\ 2^n-1,\ 2^n+1,\ 2^n-2^{(n+1)/2}+1,\ 2^n+2^{(n+1)/2}+1)$, while the other deals with the moduli set $(2^{n+1},\ 2^n-1,\ 2^n+1,\ 2^n-2^{(n+1)/2}+1,\ 2^n+2^{(n+1)/2}+1)$, where n is odd. Compact forms for the multiplicative inverse of each modulus are introduced, which facilitate the implementation of these arithmetic decoders. The proposed hardware realizations for these decoders are based on using six carry save adders and one carry propagate adder. The hardware and time requirements of these decoders are much better than other similar decoders found in literature. A sub-micron silicon implementation for the decoder has been performed and reported.
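
To make the residue-to-binary step concrete, here is a minimal software sketch for the first moduli set with odd n. It assumes ordinary Chinese Remainder Theorem reconstruction with generic modular inverses; it does not reproduce the paper's compact inverse forms or its adder-based hardware, and the function names are hypothetical.

```python
from math import gcd, prod


def moduli_set(n):
    """First five-moduli set from the abstract; n must be odd and at least 3."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    half = 2 ** ((n + 1) // 2)
    return [2 ** n, 2 ** n - 1, 2 ** n + 1, 2 ** n - half + 1, 2 ** n + half + 1]


def to_residues(x, moduli):
    """Forward conversion: binary integer to its residues."""
    return [x % m for m in moduli]


def crt_decode(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem.

    Requires pairwise coprime moduli. Uses pow(a, -1, m) for the modular
    inverse, which needs Python 3.8 or later.
    """
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError("moduli must be pairwise coprime")
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m                 # product of all other moduli
        inverse = pow(partial, -1, m)        # multiplicative inverse mod m
        total += r * partial * inverse
    return total % big_m


if __name__ == "__main__":
    mods = moduli_set(3)
    value = 12345
    print(mods, crt_decode(to_residues(value, mods), mods) == value)
```

Any integer in the range from 0 up to (but not including) the product of the moduli round-trips through `to_residues` and `crt_decode`; values outside that range alias.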

  • Impact of on-chip interconnect frequency-dependent R(f)L(f) on digital and RF design

    Page(s): 158 - 162

    On-chip global interconnect exhibits clear frequency dependence in both resistance (R) and inductance (L). In this paper, its impact on modern digital and radio frequency (RF) circuit design is examined. First, a physical and compact ladder circuit model is developed to capture this behavior, which only employs frequency independent R and L elements, and thus, supports transient analysis. Using this new model we demonstrate that the use of dc values for R and L is sufficient for timing analysis (i.e., 50% delay and slew rate) in digital designs. However, RL frequency dependence is critical for the analysis of signal integrity, shield line insertion, power supply stability, and RF inductor performance.

  • Call for participation for 2005 IEEE International Symposium on Circuits and Systems (ISCAS2005)

    Page(s): 163
    Freely Available from IEEE
  • International Symposium on Low Power Electronics and Design (ISLPED'05)

    Page(s): 164
    Freely Available from IEEE
  • Quality without compromise [advertisement]

    Page(s): 165
    Freely Available from IEEE
  • Have you visited lately? www.ieee.org

    Page(s): 166
    Freely Available from IEEE
  • IEEE copyright form

    Page(s): 167 - 168
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): c3
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): c4
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu