
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Issue 11 • Nov. 2008

  • Table of contents

    Page(s): C1
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
  • Test Data Compression Using Selective Encoding of Scan Slices

    Page(s): 1429 - 1440

    We present a selective encoding method that reduces test data volume and test application time for scan testing of intellectual property (IP) cores. This method encodes the slices of test data that are fed to the scan chains in every clock cycle. To drive N scan chains, we use only c tester channels, where c = ⌈log2(N+1)⌉ + 2. In the best case, we can achieve compression by a factor of N/c using only one tester clock cycle per slice. We derive a sufficient condition on the distribution of care bits that allows us to achieve the best-case compression. We also derive a probabilistic lower bound on the compression for a given care-bit density. Unlike popular compression methods such as embedded deterministic test (EDT), the proposed approach is suitable for IP cores because it does not require structural information for fault simulation, dynamic compaction, or interleaved test generation. The on-chip decoder is small, independent of the circuit under test and the test set, and can be shared between different circuits. We present compression results for a number of industrial circuits and compare our results to other recent compression methods targeted at IP cores.
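
    The channel-count formula is easy to sanity-check numerically. The sketch below (ours, not the authors' tooling) computes the required tester channels c and the best-case compression factor N/c for a few scan-chain counts:

    ```python
    import math

    def tester_channels(n_chains: int) -> int:
        # c = ceil(log2(N + 1)) + 2 tester channels drive N scan chains.
        return math.ceil(math.log2(n_chains + 1)) + 2

    for n in (64, 100, 255, 512):
        c = tester_channels(n)
        # Best case: one tester clock cycle per slice gives N/c compression.
        print(f"N={n:4d} chains -> c={c:2d} channels, best-case compression {n/c:.1f}x")
    ```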

  • Systematic Software-Based Self-Test for Pipelined Processors

    Page(s): 1441 - 1453

    Software-based self-test (SBST) has recently emerged as an effective methodology for the manufacturing test of processors and other components in systems-on-chip (SoCs). By moving test-related functions from external resources to the SoC's interior, in the form of test programs that the on-chip processor executes, SBST significantly reduces the need for high-cost, big-iron testers, and enables high-quality at-speed testing and performance binning. Thus far, SBST approaches have focused almost exclusively on the functional (programmer-visible) components of the processor. In this paper, we analyze the challenges involved in testing an important component of modern processors, namely, the pipelining logic, and propose a systematic SBST methodology to address them. We first demonstrate that SBST programs that only target the functional components of the processor are not sufficient to test the pipeline logic, resulting in a significant loss of overall processor fault coverage. We further identify the testability hotspots in the pipeline logic using two fully pipelined reduced instruction set computer (RISC) processor benchmarks. Finally, we develop a systematic SBST methodology that enhances existing SBST programs so that they comprehensively test the pipeline logic. The proposed methodology is complementary to previous SBST techniques that target functional components (their results can form the input to our methodology, so we can reuse the test development effort behind preexisting SBST programs). We automate our methodology and incorporate it in an integrated software environment (developed using Java, XML, and ArchC) for the automatic generation of SBST routines for microprocessors. We apply the methodology to the two complex benchmark RISC processors with respect to two fault models: the stuck-at fault model and the transition delay fault model. Simulation results show that our methodology provides significant improvements for both fault models, both for the entire processor (12% fault coverage improvement on average) and for the pipeline logic itself (19% fault coverage improvement on average), compared to a conventional SBST approach.

  • Dynamic Memory Access Management for High-Performance DSP Applications Using High-Level Synthesis

    Page(s): 1454 - 1464

    Multimedia applications such as video and image processing are often characterized by a huge number of data accesses. In many digital signal processing applications, array access patterns are regular and periodic. In these cases, optimized architectures using pipelined memory access controllers can be generated. In this paper, we focus on implementing memory interfacing modules that can be automatically generated from a high-level synthesis tool and that can efficiently handle predictable address patterns as well as random ones (i.e., dynamic address computations). The benefits of balancing dynamic address computations between the datapath and dedicated computation units in the memory controller are also analyzed, along with operator bitwidth optimization and data locality, to save power and reduce latency.
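
    To make the distinction the abstract draws concrete, the sketch below (our illustration, not the paper's generated RTL) contrasts a regular affine address stream, which a pipelined controller can issue every cycle, with a dynamic, data-dependent address computation that must be resolved at run time:

    ```python
    def affine_addresses(base, stride, count):
        # Regular, periodic pattern: fully predictable, so a pipelined
        # address generator can issue one address per clock cycle.
        return [base + stride * i for i in range(count)]

    def dynamic_addresses(base, index_table):
        # Data-dependent (indirect) pattern: each address depends on
        # run-time data and must be computed on the fly.
        return [base + idx for idx in index_table]

    print(affine_addresses(0x1000, 4, 8))           # predictable stream
    print(dynamic_addresses(0x2000, [7, 2, 2, 5]))  # irregular stream
    ```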

  • Hardware Supported Task Scheduling on Dynamically Reconfigurable SoC Architectures

    Page(s): 1465 - 1474

    Dynamically reconfigurable system-on-a-chip (RSoC) technology features embedded microprocessors that are dispersed on the same die with significant amounts of programmable logic fabric. In this paper, we present a strategy to solve the recently emerging problem of how to utilize the flexible but still limited RSoC resources effectively for a multi-task application. The major contribution of this paper is a dynamic task scheduling algorithm, implementable in fixed or reconfigurable hardware, that performs online scheduling of task systems onto RSoC-type architectures. Results from extensive simulations demonstrate the benefits of the proposed dynamic scheduling approach compared with static scheduling techniques taken from the technical literature.
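
    The paper's algorithm is not reproduced here, but a minimal online scheduler over a one-dimensional model of the reconfigurable area conveys the flavor: tasks arrive with an area demand and a runtime, and the scheduler starts each one as soon as enough area is free (a deliberate simplification of ours; the actual RSoC resource model is richer):

    ```python
    import heapq

    def online_schedule(tasks, total_area):
        """tasks: list of (arrival, area, runtime); returns (arrival, start) pairs."""
        free, running, schedule = total_area, [], []  # running: heap of (finish, area)
        for arrival, area, runtime in sorted(tasks):
            t = arrival
            # Reclaim area from tasks that finished before this arrival...
            while running and running[0][0] <= t:
                free += heapq.heappop(running)[1]
            # ...and, if still short of area, wait for running tasks to end.
            while free < area:
                finish, freed = heapq.heappop(running)
                t, free = max(t, finish), free + freed
            free -= area
            heapq.heappush(running, (t + runtime, area))
            schedule.append((arrival, t))
        return schedule

    print(online_schedule([(0, 60, 5), (1, 50, 3), (2, 30, 4)], total_area=100))
    ```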

  • Hotspots Elimination and Temperature Flattening in VLSI Circuits

    Page(s): 1475 - 1487

    This paper proposes a new solution to the problem of eliminating hotspots from gate-level netlists and examines the effects of timing constraints on temperature reduction and overall temperature flattening across the chip. Our core technique consists of three steps. First, a thermal analysis is carried out for logic netlists. (The netlists are assumed to be either isolated or embedded in a larger system with macro-cells.) We then apply a new technique, called isothermal logic partitioning (LP-temp), to the netlists; it builds isothermal logic clusters and splits each cluster exceeding the maximum allowed temperature through its hottest point, enlarging the contact area through which the hotspot can cool. Finally, the entire system is re-placed using a custom-designed temperature-aware floorplanner so that the temperature across the entire system is reduced and flattened. We have developed a thermal-aware design flow, integrating our thermal-aware logic partitioning technique with a timing- and thermal-aware floorplanner. Two cases were analyzed: (tight timing) LP-temp combined with the timing- and thermal-aware floorplanner, where the units partitioned by LP-temp are re-placed locally under a tight timing budget (5% timing degradation); and (loose timing) LP-temp combined with thermal-aware re-placement under a loose timing budget (10% timing degradation). Experiments on a set of benchmark designs confirm that our temperature reduction technique is effective, producing designs with an average of 5.54% and 9.9% more peak-temperature reduction for the tight and loose timing cases, respectively, than designs produced by a conventional thermal-aware floorplanner without LP-temp. We also analyzed the effect of the proposed technique on field-programmable gate arrays (FPGAs) in order to contrast its effectiveness on systems with hotspots on hard macros. Results show that our technique reduces the temperature in these systems by 3.40% and 6.61% on average for the loose and tight timing constraints, respectively, compared to the thermal-aware floorplanner without LP-temp.
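
    The split step lends itself to a small sketch. Under a deliberately simplified model of ours (a cluster is a list of (cell, temperature) pairs in placement order, and "splitting through the hottest point" means dividing the list at its peak-temperature cell), the recursion looks like this:

    ```python
    def split_hot_clusters(cluster, t_max):
        """cluster: list of (cell, temp) in placement order; returns sub-clusters."""
        if len(cluster) < 2 or max(t for _, t in cluster) <= t_max:
            return [cluster]  # cool enough, or indivisible
        # Split through the hottest cell, keeping both halves non-empty.
        i = max(range(len(cluster)), key=lambda k: cluster[k][1])
        i = min(max(i, 1), len(cluster) - 1)
        return (split_hot_clusters(cluster[:i], t_max) +
                split_hot_clusters(cluster[i:], t_max))

    print(split_hot_clusters([("a", 70), ("b", 95), ("c", 80), ("d", 60)], t_max=85))
    ```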

  • A Design-Specific and Thermally-Aware Methodology for Trading-Off Power and Performance in Leakage-Dominant CMOS Technologies

    Page(s): 1488 - 1498

    As CMOS technology scales deeper into the nanometer regime, factors such as leakage power and chip temperature emerge as critically important concerns for high-performance VLSI design. Consequently, enhancing processing performance is no longer the only factor that dominates circuit design considerations. This paper, for the first time, proposes a systematic methodology to determine a generalized design optimization metric for simultaneously trading off power and performance in nanometer-scale integrated circuits to achieve design-specific targets. The methodology incorporates interconnect effects as well as electrothermal couplings between substrate temperature, power, and performance for nanometer-scale design optimization. Implications of choosing a specific design optimization metric on power, performance, and operating temperature are illustrated and discussed. The proposed methodology is shown to provide a more meaningful optimization metric and basis for power-performance tradeoff analysis, with consideration of chip-level thermal management, including maximum allowable operating temperature and packaging/cooling solutions. Furthermore, implications of CMOS technology scaling and parameter variations for the proposed methodology are discussed.
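
    The paper derives its own design-specific metric; as a stand-in, the familiar energy-delay family M = E * D^n illustrates how a single exponent shifts the optimum between low-power and high-performance operating points (the numbers below are illustrative, not from the paper):

    ```python
    operating_points = [  # (label, energy in pJ, delay in ns)
        ("low-Vdd",  1.0, 4.0),
        ("nominal",  1.8, 2.0),
        ("high-Vdd", 4.5, 1.2),
    ]

    for n in (0, 1, 2, 3):  # n = 0 minimizes energy; larger n weights delay more
        best = min(operating_points, key=lambda p: p[1] * p[2] ** n)
        print(f"E*D^{n}: best operating point is {best[0]}")
    ```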

  • A Novel Mutation-Based Validation Paradigm for High-Level Hardware Descriptions

    Page(s): 1499 - 1512

    We present a mutation-based validation paradigm (MVP) technology that can handle complete high-level microprocessor implementations and is based on explicit design error modeling, design error simulation, and model-directed test vector generation. We first present a control-based coverage measure aimed at exposing design errors that incorrectly set control signal values. We then describe MVP's high-level concurrent design error simulator, which can handle the various modeled design errors. Next, we present fundamental techniques and data structures for analyzing high-level circuit implementations, together with optimizations that speed up their processing and, consequently, MVP's overall test generation. We also introduce a new automatic test vector generation technique for high-level hardware descriptions that generates a test sequence by efficiently solving constraints on multiple finite state machines. To further speed up test generation, MVP is empowered with learning abilities that profile various aspects of the test generation process. Our experimental results show that MVP's learning abilities and automated test vector generation make it significantly more effective than random or pseudorandom validation techniques.
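
    The core loop of any mutation-based flow fits in a few lines. The design-error model below (a stuck control signal on a toy multiplexer) is our stand-in for MVP's richer error models:

    ```python
    def mutation_coverage(golden, mutants, test_vectors):
        """A mutant is detected if some test vector distinguishes it
        from the golden model; coverage = detected / total mutants."""
        detected = sum(
            any(m(v) != golden(v) for v in test_vectors) for m in mutants
        )
        return detected / len(mutants)

    # Toy design: a 2-input mux; errors stick the select control signal.
    golden = lambda v: v["a"] if v["sel"] else v["b"]
    mutants = [lambda v: v["a"],   # select stuck-at-1
               lambda v: v["b"]]   # select stuck-at-0
    tests = [{"a": 1, "b": 0, "sel": 0}, {"a": 0, "b": 1, "sel": 1}]
    print(f"coverage: {mutation_coverage(golden, mutants, tests):.0%}")
    ```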

  • Design and Application of Power Optimized High-Speed CMOS Frequency Dividers

    Page(s): 1513 - 1520

    At gigahertz frequencies, switching activity and internal node capacitance quickly translate into high power consumption. The power-optimized design of high-speed CMOS logic-based frequency dividers is therefore sensitive to circuit partitioning and to the choice of flip-flop type and logic family. On the basis of two circuit examples, the design of highly power-optimized dividers based on conventional CMOS logic is demonstrated. First, a divide-by-15 circuit based on sense-amplifier and master-slave flip-flops is discussed. A 5.5-GHz demonstrator implemented in a 90-nm low-power CMOS technology consumes only 190 µW/GHz at a supply voltage of 1.1 V. Second, an even faster CMOS divider concept without static power consumption, except leakage, is proposed. The circuit divides an input signal by two and generates four phases with a highly accurate phase skew of 90°. The maximum operating frequency is 11.6 GHz at a supply voltage of 1.5 V under slow-process and worst-case operating conditions. Higher frequencies can be achieved by a hybrid approach in which the signal is first divided by two in a single current-mode logic (CML) stage and then by another factor of two in the proposed circuit to generate the four phases. The divider is applied to dual-modulus prescalers and IQ receivers. A variant of the circuit contains an intrinsic phase rotator, allowing prescalers without any phase synchronization. As a result, power consumption is reduced not only by the efficient divider implementation but also by a simplified architecture of the overall prescaler.
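
    The quoted figure of merit converts directly into absolute power at a given input rate; for instance, the 190 µW/GHz demonstrator draws roughly 1 mW at its full 5.5-GHz input rate:

    ```python
    FOM_UW_PER_GHZ = 190  # measured figure of merit from the abstract
    for f_ghz in (1.0, 2.5, 5.5):
        # Divider power scales roughly linearly with input frequency.
        print(f"{f_ghz} GHz -> about {FOM_UW_PER_GHZ * f_ghz / 1000:.2f} mW")
    ```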

  • GlitchLess: Dynamic Power Minimization in FPGAs Through Edge Alignment and Glitch Filtering

    Page(s): 1521 - 1534

    This paper describes GlitchLess, a circuit-level technique for reducing power in field-programmable gate arrays (FPGAs) by eliminating unnecessary logic transitions called glitches. This is done by adding programmable delay elements to the logic blocks of the FPGA. After routing a circuit and performing static timing analysis, these delay elements are programmed to align the arrival times at the inputs of each lookup table (LUT), thereby preventing new glitches from being generated. The delay elements also behave as filters that eliminate glitches generated by upstream logic or off-chip circuitry. On average, the proposed implementation eliminates 87% of the glitching, which reduces overall FPGA power by 17%. The added circuitry increases overall FPGA area by 6% and critical-path delay by less than 1%. Furthermore, since the technique is applied after routing, it requires little or no modification to the routing architecture or computer-aided design (CAD) flow.
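
    The alignment step has a compact core: given post-routing arrival times at a LUT's inputs, pad each early input so all edges arrive together. The sketch below is a simplified model of ours; the step size of the programmable delay elements is hypothetical:

    ```python
    def alignment_delays(arrival_times_ps, step_ps=25):
        """Delay to add to each LUT input so edges arrive together.
        Delays round down to the programmable step, so the latest
        (critical) input is never delayed."""
        latest = max(arrival_times_ps)
        return [((latest - t) // step_ps) * step_ps for t in arrival_times_ps]

    # Inputs arriving at 300, 450, and 420 ps: pad the early ones.
    print(alignment_delays([300, 450, 420]))  # -> [150, 0, 25]
    ```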

  • A Robust 4-PAM Signaling Scheme for Inter-Chip Links Using Coding in Space

    Page(s): 1535 - 1544

    Increasing demand for high-speed inter-chip interconnects requires faster links that consume less power. Channel coding can be used to lower the required signal-to-noise ratio for a specific bit error rate in a channel. Numerous codes can approach the theoretical Shannon limit, the maximum information transfer rate of a communication channel for a particular noise level, but their complexity prohibits their use in high-speed inter-chip applications. A low-complexity signaling scheme is proposed here that achieves 3-5 dB of coding gain over uncoded four-level pulse amplitude modulation (PAM). The receiver for this signaling scheme, along with a regular 4-PAM receiver, was designed and implemented in a 0.18-µm standard CMOS technology. Experimental results show that the receiver is functional up to 2.5 Gb/s; this was verified with a bit error rate tester (BERT), achieving error-free operation at a 2.5-Gb/s channel transfer rate. The entire receiver for this scheme consumes 22 mW at 2.5 Gb/s and occupies an area of 0.2 mm².
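
    For orientation, a plain (uncoded) Gray-mapped 4-PAM modulator and slicer look like the sketch below; the paper's contribution is a spatial code layered on top of such a baseline, which is not reproduced here:

    ```python
    # Gray-coded 4-PAM: adjacent levels differ in one bit, so a slicing
    # error typically corrupts only one of the two bits per symbol.
    LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}
    BITS = {v: k for k, v in LEVELS.items()}

    def modulate(bits):
        return [LEVELS[p] for p in zip(bits[0::2], bits[1::2])]

    def slice_symbol(voltage):
        # Nearest-level decision (thresholds at -2, 0, +2).
        return min(LEVELS.values(), key=lambda lvl: abs(lvl - voltage))

    symbols = modulate([0, 1, 1, 0, 1, 1])
    print(symbols)                                         # [-1, 3, 1]
    print([BITS[slice_symbol(s + 0.4)] for s in symbols])  # noisy rx, decoded
    ```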

  • Reliability and Availability in Reconfigurable Computing: A Basis for a Common Solution

    Page(s): 1545 - 1558

    Dynamically reconfigurable SRAM-based field-programmable gate arrays (FPGAs) enable the implementation of reconfigurable computing systems where several applications may be run simultaneously, sharing the available resources according to their own immediate functional requirements. To exclude malfunctioning due to faulty elements, the reliability of all FPGA resources must be guaranteed. Since resource allocation takes place asynchronously, an online structural test scheme is the only way of ensuring reliable system operation. On the other hand, this test scheme should not disturb the operation of the circuit, otherwise availability would be compromised. System performance is also influenced by the efficiency of the management strategies that must be able to dynamically allocate enough resources when requested by each application. As those resources are allocated and later released, many small free resource blocks are created, which are left unused due to performance and routing restrictions. To avoid wasting logic resources, the FPGA logic space must be defragmented regularly. This paper presents a non-intrusive active replication procedure that supports the proposed test methodology and the implementation of defragmentation strategies, assuring both the availability of resources and their perfect working condition, without disturbing system operation.

  • Clock-Skew Test Module for Exploring Reliable Clock-Distribution Under Process and Global Voltage-Temperature Variations

    Page(s): 1559 - 1566

    This paper presents a clock-skew test module for exploring reliable clock distribution under process, voltage, and temperature (PVT) variations. The proposed test module enables direct evaluation of two important issues: 1) process variations in clock skew and 2) robustness against race problems under environmental variations such as voltage and temperature. The test module was fabricated in a 90-nm low-power process for system-on-chip (SoC). It contains eight blocks, including H-tree blocks and clock tree synthesis (CTS)-tree blocks (i.e., blocks formed by clock-tree synthesis), each of which has 1024 flip-flop (FF) pairs with small hold-time margins. A statistical method has been developed for analyzing the measured hold-time margins of the 1024 FF pairs across 80 chips. An example analysis of the measured results is presented, confirming the effectiveness of the proposed test module and analysis method for reliable clock-distribution design.
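
    The statistical step can be mimicked with standard estimators: pool the measured hold-time margins and see how far the worst pair sits from the mean. The data below are synthetic stand-ins with the paper's dimensions (1024 FF pairs, 80 chips), not measured values:

    ```python
    import random, statistics

    random.seed(1)
    # Synthetic hold-time margins (ps): 1024 FF pairs on each of 80 chips.
    chips = [[random.gauss(40, 8) for _ in range(1024)] for _ in range(80)]

    margins = [m for chip in chips for m in chip]
    mu, sigma = statistics.mean(margins), statistics.stdev(margins)
    worst = min(margins)
    print(f"mean={mu:.1f} ps, sigma={sigma:.1f} ps, worst pair={worst:.1f} ps")
    print(f"worst pair sits {(mu - worst) / sigma:.1f} sigma below the mean")
    ```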

  • Elixir: High-Throughput Cost-Effective Dual-Field Processors and the Design Framework for Elliptic Curve Cryptography

    Page(s): 1567 - 1580

    We present a design framework that consists of a high-throughput, parallel, and scalable elliptic curve cryptographic (ECC) processor and a cost-effectiveness methodology for design exploration. A two-phase scheduling methodology is proposed to optimize ECC arithmetic over both GF(p) and GF(2^m). Based on this methodology, a parallel and scalable ECC architecture is also proposed. Our dual-field ECC architecture supports arbitrary elliptic curves and arbitrary finite fields with different field sizes. Optimization for a variety of applications with different area/throughput requirements can be achieved rapidly and efficiently. Using 0.13-µm CMOS technology, a 160-bit ECC processor core is implemented that performs elliptic-curve scalar multiplication in 340 µs over GF(p) and 155 µs over GF(2^m). A comparison of speed and area overhead among different ECC designs justifies the cost-effectiveness of the proposed ECC architecture and its design methodology.
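
    For orientation, scalar multiplication (the operation timed above) reduces to repeated point doubling and addition. The textbook double-and-add loop below uses affine coordinates over a tiny toy curve and omits the side-channel hardening a real processor needs:

    ```python
    def ec_add(P, Q, a, p):
        """Point addition on y^2 = x^3 + ax + b over GF(p); None = infinity."""
        if P is None: return Q
        if Q is None: return P
        if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
            return None
        if P == Q:  # tangent slope for doubling, chord slope otherwise
            lam = (3 * P[0] ** 2 + a) * pow(2 * P[1], -1, p) % p
        else:
            lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
        x = (lam ** 2 - P[0] - Q[0]) % p
        return (x, (lam * (P[0] - x) - P[1]) % p)

    def scalar_mult(k, P, a, p):
        # Double-and-add: double at each bit of k, add P on 1-bits.
        R = None
        for bit in bin(k)[2:]:
            R = ec_add(R, R, a, p)
            if bit == "1":
                R = ec_add(R, P, a, p)
        return R

    # Toy curve y^2 = x^3 + 2x + 2 over GF(17), base point (5, 1).
    print(scalar_mult(7, (5, 1), a=2, p=17))  # -> (0, 6)
    ```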

  • Layout of Decoupling Capacitors in IP Blocks for 90-nm CMOS

    Page(s): 1581 - 1588

    On-chip decoupling capacitors (decaps) in the form of MOS transistors are widely used to reduce power supply noise. This paper provides guidelines for standard cell layouts of decaps for use within intellectual property (IP) blocks in application-specific integrated circuit (ASIC) designs. At 90-nm CMOS technology and below, a tradeoff exists between high-frequency effects and electrostatic discharge (ESD) reliability when designing the layout of such decaps. In this paper, the high-frequency effects are modeled using simple equations, and a metric is developed to determine the optimal number of fingers based on the frequency response. Then, a cross-coupled design, recently introduced by cell library developers to handle ESD problems, is described. Unfortunately, it suffers from poor response times due to the large resistance inherent in its design. Improved cross-coupled designs are presented that properly balance frequency response against ESD performance, while greatly reducing thin-oxide gate leakage.
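
    The finger-count tradeoff follows from a first-order RC view: splitting the decap into n parallel fingers shortens each channel, cutting the series resistance roughly quadratically, so more of the capacitance remains effective at high frequency. The model and numbers below are our simplification, not the paper's equations:

    ```python
    import math

    def effective_decap_fF(c_total_fF, r_one_finger_ohm, n_fingers, f_ghz):
        # Toy model: n fingers cut channel resistance ~quadratically
        # (shorter channel per finger, n fingers in parallel); the
        # effective capacitance rolls off past the RC pole.
        r = r_one_finger_ohm / n_fingers ** 2
        w = 2 * math.pi * f_ghz * 1e9
        c = c_total_fF * 1e-15
        return c / (1 + (w * r * c) ** 2) * 1e15

    for n in (1, 2, 4, 8):
        print(f"{n} finger(s): {effective_decap_fF(500, 400, n, 5):.0f} fF effective")
    ```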

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): C4

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:

  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University in Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu