By Topic

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 5 • Date Oct. 2001

Filter Results

Displaying Results 1 - 16 of 16
  • A nonseparable VLSI architecture for two-dimensional discrete periodized wavelet transform

    Page(s): 565 - 576
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (337 KB)  

    A modified two-dimensional (2-D) discrete periodized wavelet transform (DPWT) based on the homeomorphic high-pass filter and the 2-D operator correlation algorithm is developed in this paper. The advantages of this modified 2-D DPWT are that it can reduce the multiplication counts and the complexity of boundary data processing in comparison to other conventional 2-D DPWT for perfect reconstruction. In addition, a parallel-pipeline architecture of the nonseparable computation algorithm is also proposed to implement this modified 2-D DPWT. This architecture has properties of noninterleaving input data, short bus width request, and short latency. The analysis of the finite precision performance shows that nearly half of the bit length can be saved by using this nonseparable computation algorithm. The operation of the boundary data processing is also described in detail. In the three-stage decomposition of an N/spl times/N image, the latency is found to be N/sup 2/+2N+18. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Technology mapping for high-performance static CMOS and pass transistor logic designs

    Page(s): 577 - 589
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (249 KB)  

    Two new techniques for mapping circuits are proposed in this paper. The first method, called the odd-level transistor replacement (OTR) method, has a goal that is similar to that of technology mapping, but without the restriction of a fixed library size and maps a circuit to a virtual library of complex static CMOS gates. The second technique, the static CMOS/pass transistor logic (PTL) method, uses a mix of static CMOS and PTL to realize the circuit and utilizes the relation between PTL and binary decision diagrams. The methods are very efficient and can handle all of the ISCAS'85 benchmark circuits in minutes. A comparison of the results with traditional technology mapping using SIS on different libraries shows an average delay reduction above 18% for OTR, and an average delay reduction above 35% for the static CMOS/PTL method, with significant savings in the area. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • A low-power high-performance current-mode multiport SRAM

    Page(s): 590 - 598
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (353 KB) |  | HTML iconHTML  

    This paper presents a new approach for energy reduction and speed improvement of multiport SRAMs. The key idea is to use current-mode for both read and write operations. To toggle a memory cell, a very small voltage swing is first created on the high-capacitive bit lines. This voltage is then translated into a differential current being injected into the cell, which in turn allows complementary potential to be developed on the cell nodes. As compared to the conventional write approach, SPICE simulations using a 0.35-/spl mu/m CMOS process have shown 2.8 to 9.9/spl times/ in energy savings and 1.02 to 6.36/spl times/ reduction in delay, for memory sizes of 32 to 1 K words. We also present a current-mode sense-amplifier that operates in a similar fashion as the write circuit. The design and implementation of a pipelined 32/spl times/64 three-port register file utilizing the proposed technique is described. Measurements of the register file chip have confirmed the feasibility of the approach. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Multiclock selection and synthesis for CDFGs using optimal clock sets and genetic algorithms

    Page(s): 599 - 607
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (175 KB) |  | HTML iconHTML  

    Selecting a clock period is an essential step in implementing hardware from behavioral descriptions. Current methods either estimate the clock prior to scheduling or involve exhaustive runs of the high-level synthesis tools to obtain a globally optimum clock period. Further, the potential benefits of allowing the use of multiple clocks for performance optimization has not been investigated. This paper presents a clock selection method that works simultaneously with synthesis by selecting a clock from an optimal clock set. The synthesis is iterative and is optimized by evolutionary techniques. The method is very flexible and can accommodate a large set of potentially optimal clocks. We also present multirate clock synthesis with path-dependent clock selection where different paths in a control data flow graph (CDFG) are optimized with different clock periods. The results shown prove the method's effectiveness. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Power estimation in adiabatic circuits: a simple and accurate model

    Page(s): 608 - 615
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (191 KB) |  | HTML iconHTML  

    A simple procedure to evaluate the energy consumption of adiabatic gate circuits is proposed and validated. The proposed strategy is based on a linearization of the circuit and simplifying the analytical result obtained on the equivalent network. The approach leads to simple relationships which can be used for a pencil-and-paper evaluation or implemented on software. The accuracy of the results is validated by means of Spice simulations on an adiabatic full adder designed with a 0.8 /spl mu/m technology. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • On gate level power optimization using dual-supply voltages

    Page(s): 616 - 629
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (428 KB) |  | HTML iconHTML  

    In this paper, we present an approach for applying two supply voltages to optimize power in CMOS digital circuits under the timing constraints. Given a technology-mapped network, we first analyze the power/delay model and the timing slack distribution in the network. Then a new strategy is developed for timing-constrained optimization issues by making full use of stacks. Based on this strategy, the power reduction is translated into the polynomial-time-solvable maximal-weighted-independent-set problem on transitive graphs. Since different supply voltages used in the circuit lead to totally different power consumption, we propose a fast heuristic approach to predict the optimum dual-supply voltages by looking at the lower bound of power consumption in the given circuit. To deal with the possible power penalty due to the level converters at the interface of different supply voltages, we use a "constrained F-M" algorithm to minimize the number of level converters. We have implemented our approach under an SIS environment. Experiment shows that the resulting lower bound of power is tight for most circuits and that the predicted "optimum" supply voltages are exactly or very close to the best choice of actual ones. The total power saving of up to 26% (average of about 20%) is achieved without degrading the circuit performance, compared to the average power improvement of about 7% by the gate sizing technique based on a standard cell library. Our technique provides the power-delay tradeoff by specifying different timing constraints in circuits for power optimization. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Discrete-time battery models for system-level low-power design

    Page(s): 630 - 640
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (208 KB)  

    For portable applications, long battery lifetime is the ultimate design goal. Therefore, the availability of battery and voltage converter models providing accurate estimates of battery lifetime is key for system-level low-power design frameworks. In this paper, we introduce a discrete-time model for the complete power supply subsystem that closely approximates the behavior of its circuit-level continuous-time counterpart. The model is abstract and efficient enough to enable event-driven simulation of digital systems described at a very high level of abstraction and that includes, among their components, also the power supply. The model gives the designer the possibility of estimating battery lifetime during system-level design exploration, as shown by the results we have collected on meaningful case studies. In addition, it is flexible and it can thus be employed for different battery chemistries. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • High-speed architectures for Reed-Solomon decoders

    Page(s): 641 - 655
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (381 KB) |  | HTML iconHTML  

    New high-speed VLSI architectures for decoding Reed-Solomon codes with the Berlekamp-Massey algorithm are presented in this paper. The speed bottleneck in the Berlekamp-Massey algorithm is in the iterative computation of discrepancies followed by the updating of the error-locator polynomial. This bottleneck is eliminated via a series of algorithmic transformations that result in a fully systolic architecture in which a single array of processors computes both the error-locator and the error-evaluator polynomials. In contrast to conventional Berlekamp-Massey architectures in which the critical path passes through two multipliers and 1+[log/sub 2/,(t+1)] adders, the critical path in the proposed architecture passes through only one multiplier and one adder, which is comparable to the critical path in architectures based on the extended Euclidean algorithm. More interestingly, the proposed architecture requires approximately 25% fewer multipliers and a simpler control structure than the architectures based on the popular extended Euclidean algorithm. For block-interleaved Reed-Solomon codes, embedding the interleaver memory into the decoder results in a further reduction of the critical path delay to just one XOR gate and one multiplexer, leading to speed-ups of as much as an order of magnitude over conventional architectures. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Narrow bus encoding for low-power DSP systems

    Page(s): 656 - 660
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (107 KB) |  | HTML iconHTML  

    High levels of integration in integrated circuits often lead to the problem of running out of pins. Narrow data buses can be used to alleviate this problem provided that the degraded performance due to wait cycles can be tolerated. We address bus coding methods for low-power core-based systems incorporating narrow buses. We show that transition signaling combined with bus-invert coding, which we call BITS coding, is particularly suitable for the data patterns of typical DSP applications on narrow data buses. The application of BITS coding to real circuit design is limited by the extra bus line introduced, which changes the pinout of the chip. We propose a new coding method, which does not require the extra bus line but retains the advantage of BITS. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Delay fault testing of IP-based designs via symbolic path modeling

    Page(s): 661 - 678
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (767 KB) |  | HTML iconHTML  

    Predesigned blocks called intellectual property (IP) cores are increasingly used for complex system-on-a-chip (SoC) designs. The implementation details of IP cores are often unknown or unavailable, so delay testing of such designs is difficult. We propose a method that can test paths traversing both IP cores and user-defined blocks, an increasingly important but little-studied problem. It models representative paths in IP circuits using an efficient form of binary decision diagram (BDD) and generates test vectors from the BDD model. We also present a partitioning technique, which reduces the BDD size by orders of magnitude and makes the proposed method practical for large designs. Experimental results are presented that show that it robustly tests selected paths without using extra logic and, at the same time, protects the intellectual contents of IP cores. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Resynthesis of combinational logic circuits for improved path delay fault testability using comparison units

    Page(s): 679 - 689
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (218 KB)  

    We propose a resynthesis method that modifies a given circuit to reduce the number of paths in the circuit and thus improve its path delay fault testability. The resynthesis procedure is based on replacing subcircuits of the given circuit by structures called comparison units. A subcircuit can be replaced by a comparison unit if it implements a function belonging to the class of comparison functions defined here. Comparison units are fully testable for stuck-at faults and for path delay faults. In addition, they have small numbers of paths and gates. These properties make them effective building blocks for resynthesis to improve the path delay fault testability of a circuit. Experimental results demonstrate considerable reductions in the number of paths and increased path delay fault testability. These are achieved without increasing the number of gates, or the number of gates along the longest path in the circuit. The random pattern testability for stuck-at faults remains unchanged. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Modeling of mixed control and dataflow systems in MASCOT

    Page(s): 690 - 703
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (184 KB) |  | HTML iconHTML  

    The Matlab and SDL Codesign Technique (MASCOT) method integrates modeling of dataflow and control dominated parts at the system level. Based on the established languages specification and description language (SDL) and Matlab, MASCOT provides a modeling and simulation technique which realizes the communication and synchronization between the two domains. Moreover, it offers modeling guidelines for a disciplined and efficient way of using the technique. Most of the tedious details of modeling synchronization and communication is handled automatically and is transparent to the user. Consequently, the user can focus on the application and on the important tradeoffs to be made at the system level. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Statistical skew modeling for general clock distribution networks in presence of process variations

    Page(s): 704 - 717
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (354 KB) |  | HTML iconHTML  

    Clock skew modeling is important in the performance evaluation and prediction of clock distribution networks. This paper addresses the problem of statistical skew modeling for general clock distribution networks in the presence of process variations. The only available statistical skew model is not suitable for modeling the clock skews of general clock distribution networks in which clock paths are not identical. The old model is also too conservative for estimating the clock skew of a well-balanced clock network that has identical but strongly correlated clock paths (for instance, a well-balanced H-tree). In order to provide a more accurate and more general statistical skew model for general clock distributions, we propose a new approach to estimating the mean values and variances of both clock skews and the maximal clock delay of general clock distribution networks. Based on the new approach, a closed-form model is also obtained for well-balanced H-tree clock distribution networks. The paths delay correlation caused by the overlapped parts of path lengths is considered in the new approach, so the mean values and the variances of both clock skews and the maximal clock delay are accurately estimated for general clock distribution networks. This enables an accurate estimate of yields of both clock skew and maximal clock delay to be made for a general clock distribution network. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • On effective I/sub DDQ/ testing of low-voltage CMOS circuits using leakage control techniques

    Page(s): 718 - 725
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (304 KB)  

    The use of low-threshold devices in low-voltage CMOS circuits leads to an exponential increase in the intrinsic leakage current. This threatens the effectiveness of I/sub DDQ/ testing for such low-voltage circuits because it is difficult to differentiate a defect-free circuit from defective circuits. Recently, several leakage control techniques have been proposed to reduce intrinsic leakage current, which may benefit I/sub DDQ/ testing. In this paper, we investigate the possibilities of applying different leakage control techniques to improve the fault coverage of I/sub DDQ/ testing. Results on a large number of benchmarks indicate that dual-threshold and vector control techniques can be very effective in improving fault coverage for I/sub DDQ/ testing for some circuits. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Highly parallel and energy-efficient exhaustive minimum distance search engine using hybrid digital/analog circuit techniques

    Page(s): 726 - 729
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (112 KB) |  | HTML iconHTML  

    A minimum distance search engine (MDSE) is presented as a hardware accelerator for various exhaustive pattern-matching systems. This chip executes highly parallel computations of L/sub 1/-norms between an input query and stored multiple reference records, and searches for the minimum distance among them in a highly parallel fashion. Our architectural-level estimation shows that this MDSE can reduce energy dissipation by orders of magnitude as the number of records increases, compared with the conventional systems. We have designed a prototype 4-bit 8-word MDSE composed of merged memory logic (MML) and digital/analog-mixed winner-take-all circuit (DAM-WTAC) by using hybrid digital/analog circuit techniques. It was fabricated with a 0.6-/spl mu/m single-poly triple-metal CMOS technology. Experimental results show that our chip works properly at 3 V/10 MHz and has approximately four times larger throughput as well as four times higher energy efficiency, compared with the existing 8-bit microcontrollers. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • An on-chip march pattern generator for testing embedded memory cores

    Page(s): 730 - 735
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (180 KB) |  | HTML iconHTML  

    In this correspondence, we propose an effective approach to integrate 40 existing march algorithms into an embedded low hardware overhead test pattern generator to test the various kinds of word-oriented memory cores. Each march algorithm is characterized by several sets of up/down address orders, read/write signals, read/write data, and lengths of read/write operations. These characteristics are stored on chip so that any desired march algorithm can be generated with very little external control. An efficient procedure to reduce the memory storage for these characteristics is presented. We use only two programmable cyclic shift registers to generate the various read/write signals and data within the steps of the algorithms. Therefore, the proposed pattern generator is capable of generating any march algorithm with small area overhead. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following: • System Specification, Design and Partitioning, • System-level Test, • Reliable VLSI/ULSI Systems, • High Performance Computing and Communication Systems, • Wafer Scale Integration and Multichip Modules (MCMs), • High-Speed Interconnects in Microelectronic Systems, • VLSI/ULSI Neural Networks and Their Applications, • Adaptive Computing Systems with FPGA components, • Mixed Analog/Digital Systems, • Cost, Performance Tradeoffs of VLSI/ULSI Systems, • Adaptive Computing Using Reconfigurable Components (FPGAs) 

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu