IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Issue 9 • September 2006

  • Table of contents

    Page(s): C1
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
  • A Lossless Data Compression and Decompression Algorithm and Its Hardware Architecture

    Page(s): 925 - 936

    In this paper, we propose a new two-stage hardware architecture that combines the features of the parallel dictionary LZW (PDLZW) algorithm and an approximated adaptive Huffman (AH) algorithm. In this architecture, the AH algorithm uses an ordered list instead of a tree-based structure to speed up the compression data rate. The resulting architecture not only outperforms the AH algorithm while using only one-fourth of the hardware resources but is also competitive with the LZW algorithm (compress). In addition, both the compression and decompression rates of the proposed architecture are greater than those of the AH algorithm, even when realized in software.

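As background for the first stage, the Python sketch below is textbook LZW with a single, unbounded dictionary. It is not PDLZW: as its name says, PDLZW organizes its dictionary for parallel lookup, which this sketch does not model, and the approximated adaptive Huffman second stage is omitted entirely. The function names are illustrative.

```python
from typing import List


def lzw_encode(data: bytes) -> List[int]:
    """Textbook LZW: emit the code of the longest known prefix, then learn prefix + next byte."""
    table = {bytes([i]): i for i in range(256)}  # all single-byte strings are pre-loaded
    prefix = b""
    codes = []
    for byte in data:
        candidate = prefix + bytes([byte])
        if candidate in table:
            prefix = candidate                # keep extending the current match
        else:
            codes.append(table[prefix])       # output the longest match found so far
            table[candidate] = len(table)     # learn the new string
            prefix = bytes([byte])
    if prefix:
        codes.append(table[prefix])
    return codes


def lzw_decode(codes: List[int]) -> bytes:
    """Rebuild the same dictionary on the fly from the code stream."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]   # code is being defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

For example, `lzw_decode(lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT"))` returns the original bytes using fewer codes than input bytes. A hardware design bounds the dictionary size, which this sketch does not.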
  • High-Speed Interpolation Architecture for Soft-Decision Decoding of Reed–Solomon Codes

    Page(s): 937 - 950

    Algebraic soft-decision decoding of Reed-Solomon (RS) codes delivers promising coding gains over conventional hard-decision decoding. The most computationally demanding step in soft-decision decoding of RS codes is bivariate polynomial interpolation. In this paper, we present an interpolation architecture based on a hybrid data format that is well suited for high-speed implementation of soft-decision decoders. We show that this architecture is highly scalable and can be extensively pipelined. It also enables maximum overlap in time between computations at adjacent iterations. We estimate that the proposed architecture can achieve significantly higher throughput than conventional designs with equivalent or lower hardware complexity.

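Interpolation means finding a nonzero bivariate polynomial Q(x, y) that vanishes at every received point (in soft-decision decoding, with a prescribed multiplicity per point). The Python sketch below shows only the simplest case, multiplicity one, by solving the underlying homogeneous linear system with Gaussian elimination. The prime field GF(929), the (1, k-1)-weighted degree bound search, and all function names are assumptions for illustration; practical RS decoders work over GF(2^m) and use iterative algorithms such as Koetter's rather than dense elimination, and nothing here models the paper's hybrid data format architecture.

```python
P = 929  # assumed prime field size; real RS codes typically use GF(2^m)


def monomials(k, max_wdeg):
    """Monomials x^i y^j whose (1, k-1)-weighted degree i + (k-1)*j is at most max_wdeg (k >= 2)."""
    return [(i, j)
            for j in range(max_wdeg // (k - 1) + 1)
            for i in range(max_wdeg - (k - 1) * j + 1)]


def nullspace_vector(rows, ncols, p=P):
    """Return one nonzero c with rows @ c == 0 (mod p); requires ncols > len(rows)."""
    m = [[v % p for v in row] for row in rows]
    pivots = []
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue                              # no pivot: column c stays free
        m[r], m[piv] = m[piv], m[r]
        inv = pow(m[r][c], p - 2, p)              # modular inverse via Fermat's little theorem
        m[r] = [v * inv % p for v in m[r]]
        for i in range(len(m)):                   # eliminate column c from every other row
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    sol = [0] * ncols
    sol[free] = 1                                 # fix one free variable, solve for the pivots
    for row, c in enumerate(pivots):
        sol[c] = -m[row][free] % p
    return sol


def interpolate(points, k, p=P):
    """Find a nonzero Q(x, y) with Q(x_i, y_i) == 0 at every point (multiplicity one)."""
    max_wdeg = 0
    while len(monomials(k, max_wdeg)) <= len(points):
        max_wdeg += 1                             # more unknowns than constraints => a solution exists
    mons = monomials(k, max_wdeg)
    rows = [[pow(x, i, p) * pow(y, j, p) % p for (i, j) in mons] for (x, y) in points]
    return dict(zip(mons, nullspace_vector(rows, len(mons), p)))


def evaluate(q, x, y, p=P):
    """Evaluate Q at (x, y) over GF(p)."""
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in q.items()) % p
```

In list decoding, candidate messages are then read off as factors y - f(x) of Q; that factorization step is not shown.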
  • Fast Decimal Floating-Point Division

    Page(s): 951 - 961

    A new implementation of decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division (named after D. Sweeney, J. E. Robertson, and T. D. Tocher), with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples: the magnitude of each quotient digit is calculated by comparing the truncated partial remainder with limited-precision multiples of the divisor, and its sign is determined concurrently from the polarity of the truncated partial remainder. A timing evaluation using logic synthesis shows a significant decrease in division execution time compared with one of the fastest DFP dividers reported in the open literature.

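A digit-recurrence divider produces one quotient digit per iteration from the shifted partial remainder, w[j+1] = 10 w[j] - q[j+1] d. The Python sketch below follows the selection idea described above in simplified form: the digit magnitude comes from comparing the partial remainder with multiples of the divisor, and the sign from the remainder's polarity. Assumptions: operands are integers on a common scale with 0 <= x < d, the digit set is -9 to 9, and comparisons are exact. The paper's hardware instead compares a truncated remainder with limited-precision multiples (which a redundant digit set can tolerate) and keeps the remainder in a decimal signed-digit format that is not modeled here.

```python
from typing import List, Tuple


def decimal_srt_divide(x: int, d: int, n_digits: int) -> Tuple[List[int], int]:
    """Radix-10 digit recurrence with signed quotient digits in -9..9.

    x and d are integers on a common scale with 0 <= x < d, so the quotient is a fraction.
    Returns (digits, remainder) with x * 10**n_digits == Q * d + remainder, where Q is the
    integer whose signed decimal digits are `digits`, and |remainder| <= d.
    """
    if not 0 <= x < d:
        raise ValueError("expected 0 <= x < d")
    w = x                                          # partial remainder, invariant |w| <= d
    digits = []
    for _ in range(n_digits):
        r = 10 * w                                 # shift: 10 * w[j]
        mag = min(9, (2 * abs(r) + d) // (2 * d))  # magnitude: nearest multiple k*d to |r|, capped at 9
        q = mag if r >= 0 else -mag                # sign: polarity of the partial remainder
        w = r - q * d                              # w[j+1] = 10 * w[j] - q[j+1] * d
        digits.append(q)
    return digits, w


def digits_to_int(digits: List[int]) -> int:
    """Convert signed decimal digits (most significant first) to an ordinary integer."""
    value = 0
    for q in digits:
        value = 10 * value + q
    return value
```

For 2/3 with six digits, the sketch selects the digits 7, -3, -3, -3, -3, -3, which convert to 666667 with remainder -1. Signed digits let a digit that was chosen too large be corrected by negative digits later, which is what makes inexact, truncated comparisons workable in hardware.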
  • An Efficient Digital VLSI Implementation of Gaussian Mixture Models-Based Classifier

    Page(s): 962 - 974

    Gaussian mixture model (GMM)-based classifiers have received increasing attention in many pattern recognition applications. Improved performance has been demonstrated in many applications, but such classifiers can require large storage and complex processing units because of the exponential calculations and the large number of coefficients involved. This poses a serious problem for portable real-time pattern recognition applications. In this paper, the performance of GMM and its hardware complexity are first analyzed and compared with a number of benchmark algorithms. Next, an efficient digital hardware implementation is proposed. A number of design strategies are proposed to achieve the best possible tradeoffs between circuit complexity and real-time processing. First, a serial-parallel vector-matrix multiplier combined with an efficient pipelining technique is used. A novel exponential calculation circuit based on a piecewise-linear approximation is proposed to reduce hardware complexity. The precision requirements of the GMM parameters in our classifier are also studied for various classification problems. The proposed hardware implementation features programmability and flexibility, making it possible to use the proposed architecture for different applications with different topologies and precision requirements. To validate the proposed approach, a prototype was implemented in 0.25-μm CMOS technology, and its operation was successfully tested for a gas identification application.

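The exponential in each Gaussian term is the costly operation. The Python sketch below replaces exp(-t) by linear interpolation between precomputed knots, a generic piecewise-linear scheme chosen for illustration; the range [0, 8], the 16 segments, the diagonal-covariance model, and the function names are assumptions, not the paper's parameters. Because exp(-t) is convex, the chord over a segment of width h overestimates it by at most h²/8 · max e^(-t) = 0.5²/8 = 0.03125 here.

```python
import math

T_MAX = 8.0       # assumed input range; exp(-t) is treated as 0 beyond it
SEGMENTS = 16     # assumed number of linear segments
STEP = T_MAX / SEGMENTS
KNOTS = [math.exp(-k * STEP) for k in range(SEGMENTS + 1)]  # small lookup table


def exp_neg_pwl(t):
    """Piecewise-linear approximation of exp(-t) for t >= 0."""
    if t >= T_MAX:
        return 0.0
    k = int(t / STEP)                 # segment index
    frac = t / STEP - k               # position inside the segment, in [0, 1)
    return KNOTS[k] + frac * (KNOTS[k + 1] - KNOTS[k])


def gmm_score(x, weights, means, variances):
    """Diagonal-covariance GMM likelihood, omitting the (2*pi)^(-D/2) factor shared by all classes."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
        total += w / math.sqrt(math.prod(var)) * exp_neg_pwl(0.5 * mahal)
    return total


def classify(x, models):
    """Return the label whose GMM (weights, means, variances) gives the highest score."""
    return max(models, key=lambda label: gmm_score(x, *models[label]))
```

With one unit-variance component at (0, 0) for class "a" and one at (3, 3) for class "b", the point (0.2, -0.1) is assigned to "a". Fewer segments shrink the table at the cost of a larger error bound.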
  • An Asynchronous Low-Power High-Performance Sequential Decoder Implemented With QDI Templates

    Page(s): 975 - 985

    This paper presents the design of a channel-based asynchronous sequential decoder implemented with quasi-delay-insensitive (QDI) templates. PowerMill simulation results in TSMC 0.25-μm CMOS technology show that the circuit runs at 430 MHz and consumes 32 mW. Techniques for effectively partitioning and implementing the top-level design, as well as the implementation of fast shift registers, memories, and various other structures, are discussed. Compared with a previously designed synchronous Fano decoder, the asynchronous version consumes one-third of the power and runs at 2.15 times the speed, assuming standard process normalization. The work also introduces a standard-cell library and a back-end design flow for asynchronous designs based on precharged half-buffer (PCHB) templates.

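The two reported ratios can be combined into a rough energy comparison. Assuming, as a simplification not stated in the abstract, that decoding throughput scales with the reported speed, energy per decoded symbol is power divided by throughput:

$$\frac{E_{\mathrm{async}}}{E_{\mathrm{sync}}} = \frac{P_{\mathrm{async}}/f_{\mathrm{async}}}{P_{\mathrm{sync}}/f_{\mathrm{sync}}} = \frac{1/3}{2.15} = \frac{1}{6.45} \approx 0.155$$

Under that assumption, the asynchronous decoder would use roughly 6.5 times less energy per decoded symbol.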
  • Architecture and Compiler Optimizations for Data Bandwidth Improvement in Configurable Processors

    Page(s): 986 - 997

    Many commercially available embedded processors can extend their base instruction set for a specific application domain. While steady progress has been made in tools and methodologies for automatic instruction set extension in configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck. In this paper, we first present a quantitative analysis of the data bandwidth limitation in configurable processors and then propose a novel low-cost architectural extension and associated compilation techniques to address the problem. Specifically, we embed a single control bit in the instruction op-codes to selectively copy execution results to a set of hash-mapped shadow registers in the write-back stage. This efficiently reduces the communication overhead of data transfers between the core processor and the custom logic. We also present a novel algorithm that performs global shadow register binding simultaneously with hash function generation, to take full advantage of the extension. Applying our approach yields nearly optimal performance speedup.

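The write-back copy mechanism can be modeled behaviorally in Python. Everything in this sketch is hypothetical: four shadow slots, a modulo hash, a custom instruction named `custom_add4` that needs four operands while the register file is assumed to provide only two read ports. It illustrates the idea, not the paper's hardware or its hash functions.

```python
from dataclasses import dataclass, field
from typing import List

NUM_SHADOW = 4  # hypothetical number of shadow registers


def shadow_slot(dest_reg: int) -> int:
    """Hypothetical hash: map an architectural destination register to a shadow slot."""
    return dest_reg % NUM_SHADOW


@dataclass
class Core:
    regfile: List[int] = field(default_factory=lambda: [0] * 32)
    shadow: List[int] = field(default_factory=lambda: [0] * NUM_SHADOW)

    def write_back(self, dest: int, value: int, copy_bit: bool) -> None:
        """Write-back stage: the op-code's control bit selects an extra copy into a shadow slot."""
        self.regfile[dest] = value
        if copy_bit:
            self.shadow[shadow_slot(dest)] = value

    def custom_add4(self, ra: int, rb: int, rc: int, rd: int) -> int:
        """Four-input custom op: ra and rb use the two register-file read ports,
        while rc and rd are fetched from the shadow slots their results were copied to."""
        a, b = self.regfile[ra], self.regfile[rb]
        c, d = self.shadow[shadow_slot(rc)], self.shadow[shadow_slot(rd)]
        return a + b + c + d
```

After `write_back(5, 30, True)` and `write_back(6, 40, True)`, a custom instruction can read both values without extra register-file ports. A later copied write to register 9 lands in the same slot as register 5 (9 % 4 == 1) and silently replaces its value, which is why register binding and hash function generation have to be solved together.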
  • Via-Configurable Routing Architectures and Fast Design Mappability Estimation for Regular Fabrics

    Page(s): 998 - 1009

    In this paper, we describe a new via-configurable routing architecture that shows much better throughput and performance than previous structures. We demonstrate how to construct a single-via-mask fabric to reduce mask cost further, and we analyze the penalties it incurs. To address the routability problems common in fabric-based designs, we propose an efficient white-space allocation and an incremental cell movement scheme, which provide fast design convergence and early prediction of a circuit's mappability to a given fabric.

  • Mapping Data-Parallel Tasks Onto Partially Reconfigurable Hybrid Processor Architectures

    Page(s): 1010 - 1023

    Reconfigurable hybrid processor systems provide a flexible platform for mapping data-parallel applications while offering considerable speedup over software implementations. However, reconfiguration overhead is a significant deterrent to mapping applications onto reconfigurable hardware. Partial runtime reconfiguration is one approach to reducing this overhead. In this paper, we present a methodology to map data-parallel tasks onto hardware that supports partial reconfiguration. The aim is to obtain the maximum possible speedup for a given reconfiguration time, bus speed, and computation speed. The proposed approach uses multiple identical but independent processing units in the reconfigurable hardware. We show that, under nonzero reconfiguration overhead, there is an upper limit on the number of processing units beyond which execution time cannot be reduced further. We obtain solutions for the minimum processing time, the corresponding load distribution, and the schedule for data transfer. To demonstrate the applicability of the analysis, we present 1) plots showing how processing time varies with different parameters; 2) hardware simulations of two examples, a 1-D discrete wavelet transform and a finite impulse response filter, targeted to Xilinx field-programmable gate arrays (FPGAs); and 3) experimental results for a hardware prototype implemented on an FPGA board.

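Why an upper limit on the number of units can arise is visible in a deliberately simplified divisible-load model. This model is our own assumption, not the paper's, and it ignores bus transfer time. N identical units are reconfigured one at a time, so unit i is ready at i·T_r, and a workload of W time units is split so that all units finish together at time T. Then the sum over i = 1..N of (T - i·T_r) equals W, which gives T(N) = W/N + T_r(N + 1)/2. This is minimized near N* = sqrt(2W/T_r); beyond that point, each extra unit adds more reconfiguration delay than it saves. Unit N must also receive a positive share, which requires T(N) > N·T_r. The Python sketch below evaluates the model; the function names are illustrative.

```python
def finish_time(n_units: int, work: float, t_reconf: float) -> float:
    """Common finish time T(N) = W/N + T_r (N + 1) / 2 in the toy sequential-reconfiguration model."""
    return work / n_units + t_reconf * (n_units + 1) / 2


def is_feasible(n_units: int, work: float, t_reconf: float) -> bool:
    """The last unit, ready at N * T_r, must still receive a positive share of the work."""
    return finish_time(n_units, work, t_reconf) > n_units * t_reconf


def best_unit_count(work: float, t_reconf: float, n_max: int = 1000) -> int:
    """Exhaustively pick the feasible unit count with the smallest finish time."""
    feasible = [n for n in range(1, n_max + 1) if is_feasible(n, work, t_reconf)]
    return min(feasible, key=lambda n: finish_time(n, work, t_reconf))
```

With W = 100 and T_r = 2, the best choice is N = 10 (T = 21, with unit shares 19, 17, ..., 1), matching sqrt(2 · 100 / 2) = 10; adding more units only increases T.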
  • Application-Dependent Testing of FPGAs

    Page(s): 1024 - 1033

    Testing techniques are presented for the interconnect and logic resources of an arbitrary design implemented in a field-programmable gate array (FPGA). The target fault list includes all stuck-at, open, and pairwise bridging faults in the mapped design. For interconnect testing, only the configuration of the used logic blocks is changed; the structure of the design remains unchanged. For logic block testing, the configuration of the used logic resources remains unchanged, while the interconnect configuration and unused logic resources are modified. Logic testing is performed in a single test configuration, whereas interconnect testing requires a logarithmic number of test configurations. The approach achieves 100% fault coverage.

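A logarithmic number of interconnect test configurations is what the classic counting-sequence assignment provides, sketched below in Python as background; the abstract does not say whether the paper uses exactly this code assignment. Each net gets a distinct binary code that is neither all zeros nor all ones, and configuration k drives every net with bit k of its code. Every net then carries both 0 and 1 at some point, which exposes stuck-at faults on it, and any two nets disagree in at least one configuration, so a bridge between them (for example, a wired-AND or wired-OR short) corrupts one of the two values. For n nets, ceil(log2(n + 2)) configurations suffice. The function names are illustrative.

```python
from itertools import combinations
from typing import List


def counting_sequence_configs(num_nets: int) -> List[List[int]]:
    """Value driven on each net in each test configuration (one inner list per configuration)."""
    width = (num_nets + 1).bit_length()      # smallest w with 2**w >= num_nets + 2
    codes = range(1, num_nets + 1)           # never all-zeros; all-ones (2**w - 1) exceeds num_nets
    return [[(code >> k) & 1 for code in codes] for k in range(width)]


def detects_all(configs: List[List[int]]) -> bool:
    """Check the two properties the scheme relies on."""
    per_net = list(zip(*configs))            # the code word each net sees across configurations
    toggles = all(0 in word and 1 in word for word in per_net)   # stuck-at-0 and stuck-at-1 visible
    distinct = all(u != v for u, v in combinations(per_net, 2))  # any bridged pair disagrees somewhere
    return toggles and distinct
```

For example, 10 nets need 4 configurations and 1000 nets need only 10, which is the logarithmic growth the abstract refers to.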
  • Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis

    Page(s): 1034 - 1039

    Power consumption caused by redundant switching in datapath modules is an important design concern for high-performance applications. Operand isolation schemes that reduce this redundant switching incur considerable overhead in delay, power, and area. This paper presents novel operand isolation techniques based on supply gating that reduce the overhead of the isolating circuitry. The proposed schemes also target leakage minimization and apply additional operand isolation within the internal logic of the datapath to further reduce power consumption. We integrate the proposed techniques with power/delay models to develop a flow for low-power datapath synthesis. Simulation results show that the proposed operand isolation techniques reduce power consumption by at least 40% compared with the original circuit, with minimal area overhead (5%) and delay penalty (0.15%).

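The benefit of operand isolation can be illustrated with a toggle-count model in Python: when a datapath module's result will not be used, its inputs are held, so the module does not switch. This models generic gate- or latch-based isolation at the bit-transition level only. It does not capture the paper's supply-gating circuits, leakage, or the delay and area overheads discussed above; the function names and the random workload are assumptions.

```python
import random
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions that change between two input words."""
    return bin(a ^ b).count("1")


def input_toggles(operands: List[Tuple[int, int]], result_used: List[bool], isolate: bool) -> int:
    """Count bit transitions at the module inputs, a rough proxy for switching power.

    With isolation, inputs hold their previous value in cycles whose result is not used.
    """
    prev = (0, 0)
    toggles = 0
    for (a, b), used in zip(operands, result_used):
        cur = (a, b) if used or not isolate else prev
        toggles += hamming(cur[0], prev[0]) + hamming(cur[1], prev[1])
        prev = cur
    return toggles


rng = random.Random(0)                                  # fixed seed for a repeatable workload
ops = [(rng.getrandbits(16), rng.getrandbits(16)) for _ in range(1000)]
used = [rng.random() < 0.25 for _ in range(1000)]      # assume 25% of results are consumed
baseline = input_toggles(ops, used, isolate=False)
isolated = input_toggles(ops, used, isolate=True)
```

Isolation can never increase the count in this model: jumping directly between two used operand values flips no more bits than passing through the skipped values (triangle inequality for Hamming distance).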
  • Hybrid-Scheduling for Reduced Energy Consumption in High-Performance Processors

    Page(s): 1039 - 1043

    This paper develops a technique that uniquely combines the advantages of compile-time static scheduling and hardware dynamic scheduling to reduce energy consumption in dynamically scheduled processors. In this hybrid-scheduling paradigm, regions of the application that contain large amounts of parallelism visible at compile time bypass the dynamic scheduling hardware and execute in a low-power static mode. Experiments on several media and scientific benchmarks demonstrate that the proposed scheme can significantly reduce energy consumption with negligible performance degradation.

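The abstract does not say how regions are selected. As a hypothetical illustration, a compiler could estimate a region's instruction-level parallelism as its operation count divided by its critical-path length, and route high-ILP regions to the static mode. The threshold of 2.0 and both function names are assumptions, and the sketch only covers this classification step, not the scheduling itself.

```python
from typing import List, Tuple


def critical_path(n_ops: int, deps: List[Tuple[int, int]]) -> int:
    """Length (in operations) of the longest dependence chain.

    deps holds (producer, consumer) pairs with producer < consumer, i.e. ops are in program order.
    """
    depth = [1] * n_ops
    # Visiting edges by consumer index finalizes each producer's depth before it is used.
    for producer, consumer in sorted(deps, key=lambda e: e[1]):
        depth[consumer] = max(depth[consumer], depth[producer] + 1)
    return max(depth, default=0)


def choose_mode(n_ops: int, deps: List[Tuple[int, int]], ilp_threshold: float = 2.0) -> str:
    """Hypothetical region classifier: enough compile-time parallelism -> static (low-power) mode."""
    if n_ops < 1:
        raise ValueError("region must contain at least one operation")
    ilp = n_ops / critical_path(n_ops, deps)
    return "static" if ilp >= ilp_threshold else "dynamic"
```

Eight independent operations give an ILP of 8 and are classified as static; a chain of eight dependent operations gives an ILP of 1 and stays on the dynamic scheduler.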
  • Minimal Energy Asynchronous Dynamic Adders

    Page(s): 1043 - 1047

    In battery-operated portable or implantable digital devices, where battery life must be maximized, it is necessary to minimize not only power consumption but also energy dissipation. Typical energy optimization measures include voltage reduction and operating at the slowest possible speed. We employ additional methods, including hybrid asynchronous dynamic design to enable operation over a wide range of battery voltages, aggregation of large combinational logic blocks, and transistor sizing and reordering. We demonstrate the methods on simple adders and discuss extension to other circuits. Three novel adders are proposed and analyzed: a 2-bit pass-transistor logic (PTL) adder and two dynamic 2-bit adders. Circuit simulations in a 0.18-μm process at low voltage show that leakage energy is below 1%. The proposed adders achieve up to 40% energy savings relative to previously published results while also operating faster.

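The arithmetic behind 2-bit adder blocks can be checked with a purely functional Python model: each block adds two 2-bit slices and a carry-in using the ordinary sum and carry equations, and blocks are chained in ripple fashion. This is only a Boolean reference model; it says nothing about the PTL or dynamic circuit realizations, energy, or timing studied in the paper, and the function names are illustrative.

```python
from typing import Tuple


def add2(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One 2-bit adder block: a, b in 0..3, cin in {0, 1}; returns (2-bit sum, carry-out)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 ^ b0 ^ cin
    c1 = (a0 & b0) | (cin & (a0 ^ b0))     # internal carry between the two bit positions
    s1 = a1 ^ b1 ^ c1
    cout = (a1 & b1) | (c1 & (a1 ^ b1))
    return s0 | (s1 << 1), cout


def add_n(x: int, y: int, n_bits: int) -> Tuple[int, int]:
    """Ripple-chain n_bits / 2 two-bit blocks; returns (n-bit sum, final carry)."""
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    carry, result = 0, 0
    for k in range(0, n_bits, 2):
        s, carry = add2((x >> k) & 3, (y >> k) & 3, carry)
        result |= s << k
    return result, carry
```

An exhaustive check over all pairs of 6-bit operands confirms that three chained blocks reproduce ordinary addition; grouping two bit positions per block halves the number of carry hops in the chain.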
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): C4

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu