Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 2 • Date April 2003

  • Driver modeling and alignment for worst-case delay noise

    Page(s): 157 - 166

    In this paper, we present a new approach to model the impact of cross-coupling noise on interconnect delay. We introduce a new linear driver model that accurately models the noise pulse induced on a switching signal net due to cross-coupling capacitance. The proposed model effectively captures the nonlinear behavior of the victim-driver gate during its transition and has an average error below 8%, whereas the traditional approach using a Thevenin model incurs an average error of 48%. The proposed linear driver model enables the use of linear superposition, which allows the analysis of large interconnects and an efficient determination of the worst-case transition times of the aggressor nets. We propose a new approach to determine the worst-case alignment of the aggressor net transitions with respect to the victim net transition, emphasizing the need to maximize not merely the delay of the interconnect alone but the combined delay of the interconnect and receiver gate. We show that in the presence of multiple aggressor nets, the worst-case delay may occur when their noise peaks are not aligned, although the error incurred from aligning all peaks is small in practice. We then show that the worst-case alignment time of the combined noise pulse from all aggressors with respect to the victim transition is a nonlinear function of the receiver gate output loading, the victim transition time, and the noise pulse width and height. To efficiently compute the worst-case alignment time, we propose a new representation of the alignment such that it closely fits a linear function of the input variables. The worst-case alignment time is then computed for a gate using a precharacterization approach, requiring only eight sample points while maintaining a small error. The proposed methods were implemented in an industrial noise analysis tool called ClariNet. Results on industrial designs, including a large PPC microprocessor design, are presented to demonstrate the effectiveness of our approach.

  • A synthesis-for-transparency approach for hierarchical and system-on-a-chip test

    Page(s): 167 - 179

    Test data propagation through modules and test vector translation are two major challenges encountered in hierarchical testing. We propose a new synthesis-for-test approach in which multiplexers are embedded in the behavioral models of the various modules constituting a hierarchical system. This approach can also be applied to system-on-a-chip designs in which synthesizable models are available for the embedded cores. The embedded multiplexers provide complete single-cycle transparency, thereby offering a straightforward yet effective solution to the problems of test data propagation and test vector translation. In order to determine module I/O bitwidths for single-cycle transparency, a global analysis is carried out using a graph-theoretic framework and an optimization method based on integer linear programming. Case studies using high-level synthesis benchmarks and an industrial-strength benchmark show that synthesis for transparency introduces very little area and performance overhead.

  • Ground bounce in digital VLSI circuits

    Page(s): 180 - 193

    This paper is concerned with the analysis and optimization of the ground bounce in digital CMOS circuits. First, an analytical method for calculating the ground bounce is presented. The proposed method relies on accurate models of the short-channel MOS device and the chip-package interface parasitics. Next, the effect of ground bounce on the propagation delay and the optimum tapering factor of a multistage buffer is discussed, and a mathematical relationship for total propagation delay in the presence of the ground bounce is obtained. The effect of an on-chip decoupling capacitor on the ground bounce waveform and circuit speed is then analyzed, and a closed-form expression for the peak value of the differential-mode component of the ground bounce in terms of the on-chip decoupling capacitor is provided. Finally, a design methodology for controlling the switching times of the output drivers to minimize the ground bounce is presented.

  • A true single-phase energy-recovery multiplier

    Page(s): 194 - 207

    In this paper, we present the design and experimental evaluation of an 8-bit energy-recovery multiplier with built-in self-test logic and an internal single-phase sinusoidal power-clock generator. Both the multiplier and the built-in self-test have been designed in SCAL-D, a true single-phase adiabatic logic family. Fabricated in a 0.5-μm standard n-well CMOS process, the chip has an active area of 0.47 mm². Correct chip operation has been verified for clock rates up to 140 MHz. Moreover, chip dissipation measurements correlate well with HSPICE simulation results. For a selection of biasing conditions that yield correct operation at 140 MHz, total measured average dissipation for the multiplier and the power-clock generator is 250 pJ per operation.

  • Mapping deep nested do-loop DSP algorithms to large scale FPGA array structures

    Page(s): 208 - 217

    Recently, field-programmable gate array (FPGA) technology has made significant advances in both speed and capacity. Millions of logic gates are now available for reconfiguration programming. To fully exploit the potential of so many programmable devices, powerful design methodologies must be developed. In this paper, we propose a novel systematic computer-aided design methodology that can efficiently implement deeply nested do-loop algorithms on an FPGA. Specifically, our design methodology maps the loop dependence graph onto a linear array of locally connected processing elements to exploit parallelism. Due to the regular structure of this linear array of processors, it can be easily implemented on an FPGA. While this method is based on conventional systolic array design methodology, our proposed approach exhibits two distinct features that contribute to its superior performance: 1) we developed a novel multiple-order dependence graph representation that is able to efficiently represent distinct, yet correct, algorithm execution orders; 2) we developed new FPGA-specific architectural constraints during the mapping process. As such, FPGA implementations based on our approach will utilize far fewer lookup tables while achieving superior performance.

  • A 50-MHz dB-linear programmable-gain amplifier with 98-dB dynamic range and 2-dB gain steps for 3 V power supply

    Page(s): 218 - 223

    The programmable-gain amplifier (PGA) circuit introduced in this paper has a dynamic gain range of 98 dB with 2 dB gain steps, is controlled by six gain control bits, and operates from a 3 V power supply. It has been fabricated in a 0.5 μm, 15 GHz f_T Si BiCMOS process and draws 13 mA. The active die area taken up by the circuit is 400 μm × 1170 μm. A noise figure (NF) of 4.9 dB was measured at the maximum gain setting. In addition, an analysis of the bias current generation to provide dB-linear gain control is presented.

  • Maximizing throughput over parallel wire structures in the deep submicrometer regime

    Page(s): 224 - 243

    In a parallel multiwire structure, the exact spacing and size of the wires determine both the resistance and the distribution of the capacitance between the ground plane and the adjacent signal-carrying conductors, and have a direct effect on the delay. Using closed-form equations that map the geometry to the wire parasitics and empirical switch-factor-based delay models that show how repeaters can be optimized to compensate for dynamic effects, we devise a method of analysis for optimizing throughput over a given metal area. This analysis is used to show that there is a clear optimum configuration for the wires which maximizes the total bandwidth. Additionally, closed-form equations are derived, the roots of which give close to optimal solutions. It is shown that for wide buses, the optimal wire width and spacing are independent of the total width of the bus, allowing easy optimization of on-chip buses. Our analysis and results are valid for lossy interconnects as are typical of wires in submicron technologies.

  • High-performance FIR filter design based on sharing multiplication

    Page(s): 244 - 253

    Finite impulse response (FIR) filtering can be expressed as multiplications of vectors by scalars. We present high-speed designs for FIR filters based on a computation sharing multiplier which specifically targets computation re-use in vector-scalar products. The performance of the proposed implementation is compared with implementations based on carry-save and Wallace tree multipliers in 0.35-μm technology. We show that the sharing multiplier scheme improves speed by approximately 52% and 33% with respect to the FIR filter implementations based on the carry-save multiplier and Wallace tree multiplier, respectively. In addition, the sharing multiplier scheme has a smaller power-delay product than the other multiplier schemes. Using voltage scaling, power consumption of the FIR filter based on the computation sharing multiplier can be reduced to 41% of that of the FIR filter based on the Wallace tree multiplier for the same frequency of operation.
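
    The statement that FIR filtering reduces to vector-scalar products can be illustrated with a transposed-form filter, in which every input sample multiplies the whole coefficient vector. The sketch below is only an illustration under that assumption: the function names are hypothetical, and it does not reproduce the paper's computation sharing multiplier, only the vector-scalar structure such a multiplier could exploit.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def fir_transposed(x, h):
    """Transposed-form FIR: each input sample scales the coefficient vector."""
    state = [0] * len(h)  # state[i] is the partial sum due out i samples from now
    y = []
    for sample in x:
        # One vector-scalar product per input sample.
        products = [hk * sample for hk in h]
        state = [s + p for s, p in zip(state, products)]
        y.append(state[0])
        # Shift the partial sums one step toward the output.
        state = state[1:] + [0]
    return y


x = [1, 2, 3, 4]
h = [1, 0, -1]
print(fir_direct(x, h))
print(fir_transposed(x, h))
```

    Both functions compute the same output sequence; the transposed form simply makes the per-sample vector-scalar product explicit.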

  • Energy-efficiency bounds for deep submicron VLSI systems in the presence of noise

    Page(s): 254 - 269

    In this paper, we present an algorithm for computing the bounds on energy-efficiency of digital very large scale integration (VLSI) systems in the presence of deep submicron noise. The proposed algorithm is based on a soft-decision channel model of noisy VLSI systems and employs information-theoretic arguments. Bounds on energy-efficiency are computed for multimodule systems, static gates, dynamic circuits, and noise-tolerant dynamic circuits in 0.25-μm CMOS technology. As the complexity of the proposed algorithm grows linearly with the size of the system, it is suitable for computing the bounds on energy-efficiency for complex VLSI systems. A key result presented is that noise-tolerant dynamic circuits offer the best tradeoff between energy-efficiency and noise-immunity when compared to static and domino circuits. Furthermore, employing a 16-bit noise-tolerant Manchester adder in a CDMA receiver, we demonstrate a 31.2%-51.4% energy reduction over conventional systems when operating in the presence of noise. In addition, we compute the lower bounds on energy dissipation for this CDMA receiver and show that these lower bounds are 2.8× below the actual energy consumed, and that noise-tolerance reduces the gap between the lower bounds and actual energy dissipation by a factor of 1.9.

  • Variable voltage task scheduling algorithms for minimizing energy/power

    Page(s): 270 - 276

    In this paper, we propose variable voltage task scheduling algorithms that minimize energy or minimize peak power for the case when the task arrival times, deadline times, execution times, periods, and switching activities are given. We consider aperiodic (earliest due date, earliest deadline first) as well as periodic (rate monotonic, earliest deadline first) scheduling algorithms. We use the Lagrange multiplier method to theoretically determine the relation between the task voltages such that the energy or peak power is minimum, and then develop an iterative algorithm that satisfies the relation. The asymptotic complexity of the existing scheduling algorithms changes very mildly with the application of the proposed algorithms. We show experimentally (random experiments as well as real-life cases) that the voltage assignment obtained by the proposed low-complexity algorithm is very close to that of the optimal energy (0.1% error) and optimal peak power (1% error) assignment.

  • Designing fast on-chip interconnects for deep submicrometer technologies

    Page(s): 276 - 280

    This paper proposes a solution to the problem of improving the speed of on-chip interconnects, or wire delay, for deep submicron technologies where coupling capacitance dominates the total line capacitance. Simultaneous redundant switching is proposed to reduce interconnect delays. It is shown to reduce delay by more than 25% for a 10-mm long interconnect in a 0.12-μm CMOS process compared to using shielding and increased spacing. The paper also proposes possible design approaches to reduce the delay in local interconnects.

  • New power-of-2 RNS scaling scheme for cell-based IC design

    Page(s): 280 - 283

    Previous scaling schemes are based on the conversion of the nonpositional residue number system (RNS) digits into a positional number system via the Chinese remainder theorem (CRT) or mixed-radix conversion (MRC) and the back conversion into RNS, with an associated size and speed penalty in cell-based integrated circuit (CBIC) designs. This paper presents a new scaling approach, which allows faster and more efficient schemes, because the scaling uses only RNS operations within the small word length channels.
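
    The conversion-based scaling that the abstract attributes to previous schemes can be sketched as follows: reconstruct the positional value with the CRT, scale by a power of two, and convert back to residues. This is a minimal illustration only. The moduli, the function names, and the use of truncating division for the scaling step are assumptions made here, and the paper's own RNS-only scheme is not reproduced.

```python
from math import prod


def to_rns(x, moduli):
    """Forward conversion: one residue per (pairwise coprime) modulus."""
    return [x % m for m in moduli]


def from_rns_crt(residues, moduli):
    """Back conversion to a positional integer via the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi modulo m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def scale_via_crt(residues, moduli, k):
    """Scale by 2**k the conversion-based way: RNS -> integer -> shift -> RNS."""
    x = from_rns_crt(residues, moduli)
    return to_rns(x >> k, moduli)  # assumption: scaling truncates toward zero


moduli = [7, 9, 11]           # hypothetical pairwise coprime moduli
digits = to_rns(500, moduli)  # [3, 5, 5]
print(from_rns_crt(digits, moduli))
print(scale_via_crt(digits, moduli, 2))
```

    The full-width reconstruction inside `from_rns_crt` is the step that operates outside the small word length channels, which is the cost the abstract says the new approach avoids.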

  • Routing on field-programmable switch matrices

    Page(s): 283 - 287

    In this paper, we address the problem of routing nets on field-programmable gate arrays (FPGAs) interconnected by a switch matrix. We extend the switch matrix architecture proposed by Zhu et al. (1993) to route nets between FPGA chips in a multi-FPGA system. Given a limited number of routing resources in the form of programmable connection points within a two-dimensional switch matrix, this problem examines the issue of how to route given net traffic through the switch matrix structure. First, we define the problem as a general undirected graph in which each vertex has a single color among six possible colors and formulate it as a constraint satisfaction problem. This is further modeled as a 0-1 multidimensional knapsack problem for which a fast approximate solution is applied. Experimental results show that the accuracy of our proposed heuristic is quite high for moderately large switch matrices.

  • High-speed VLSI architecture for parallel Reed-Solomon decoder

    Page(s): 288 - 294

    This paper presents a high-speed parallel Reed-Solomon (RS) (255,239) decoder architecture using the modified Euclidean algorithm for high-speed multigigabit-per-second fiber-optic systems. Pipelining and parallelizing allow inputs to be received at very high fiber-optic rates and outputs to be delivered at correspondingly high rates with minimum delay. A parallel processing architecture results in speed-ups of as much as or more than 10 Gb/s, since the maximum achievable clock frequency is generally bounded by the critical path of the modified Euclidean algorithm block. The parallel RS decoders have been designed and implemented in 0.13-μm CMOS standard cell technology at a supply voltage of 1.1 V. It is suggested that a parallel RS decoder that can keep up with optical transmission rates, i.e., 10 Gb/s and beyond, could be implemented. The proposed four-channel parallel RS decoder operates at a clock frequency of 770 MHz and has a data processing rate of 26.6 Gb/s.


Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu