By Topic

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 8 • Date Aug. 2006

Filter Results

Displaying Results 1 - 21 of 21
  • Table of contents

    Page(s): c1
    Save to Project icon | Request Permissions | PDF file iconPDF (41 KB)  
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): c2
    Save to Project icon | Request Permissions | PDF file iconPDF (35 KB)  
    Freely Available from IEEE
  • Guest Editorial

    Page(s): 789 - 790
    Save to Project icon | Request Permissions | PDF file iconPDF (331 KB)  
    Freely Available from IEEE
  • Retargetable pipeline hazard detection for partially bypassed processors

    Page(s): 791 - 801
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1059 KB) |  | HTML iconHTML  

    Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and the power consumption of embedded processor systems, architects seek a compromise between the design parameters by implementing partial bypassing in processors. However, partial bypassing in processors presents challenges for compilation. Traditional data hazard detection and/or avoidance techniques used in retargetable compilers that assume a constant value of operation latency, break down in the presence of partial bypassing. In this article, we present the concept of operation tables (OTs) that can be used to accurately detect data hazards, even in the presence of incomplete bypassing. OTs integrate the detection of all kinds of pipeline hazards in a unified framework, and can, therefore, be easily deployed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor running embedded applications from the MiBench suite, demonstrate that accurate pipeline hazard detection by OTs can result in up to 20% performance improvement over the best performing GCC generated code. Finally, we demonstrate the usefulness of OTs over various bypass configurations of the Intel XScale View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Overlay techniques for scratchpad memories in low power embedded processors

    Page(s): 802 - 815
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1618 KB) |  | HTML iconHTML  

    Energy consumption is one of the important parameters to be optimized during the design of portable embedded systems. Thus, most of the contemporary portable devices feature low-power processors coupled with on-chip memories (e.g., caches, scratchpads). Scratchpads are better than traditional caches in terms of power, performance, area, and predictability. However, unlike caches they depend upon software allocation techniques for their utilization. In this paper, we present scratchpad overlay techniques which analyze the application and insert instructions to dynamically copy both variables and code segments onto the scratchpad at runtime. We demonstrate that the problem of overlaying scratchpad is an extension of the Global Register Allocation problem. We present optimal and near-optimal approaches for solving the scratchpad overlay problem. The near-optimal scratchpad overlay approach achieves close to the optimal results and is significantly faster than the optimal approach. Our approaches improve upon the previously known static allocation technique for assigning both variables and code segments onto the scratchpad. The evaluation of the approaches for ARM7 processor reports, average energy, and execution time reductions of 26% and 14% over the static approach, respectively. Additional experiments comparing the overlayed scratchpads against unified caches of the same size, report average energy, and execution time savings of 20% and 10%, respectively. We also report data memory energy reductions of 45%-57% due to the insertion of a 1024-bytes scratchpad memory in the memory hierarchy of a digital signal processor (DSP) View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Exploiting statistical information for implementation of instruction scratchpad memory in embedded system

    Page(s): 816 - 829
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1484 KB) |  | HTML iconHTML  

    A method to both reduce energy and improve performance in a processor-based embedded system is described in this paper. Comprising of a scratchpad memory instead of an instruction cache, the target system dynamically (at runtime) copies into the scratchpad code segments that are determined to be beneficial (in terms of energy efficiency and/or speed) to execute from the scratchpad. We develop a heuristic algorithm to select such code segments based on a metric, called concomitance. Concomitance is derived from the temporal relationships of instructions. A hardware controller is designed and implemented for managing the scratchpad memory. Strategically placed custom instructions in the program inform the hardware controller when to copy instructions from the main memory to the scratchpad. A novel heuristic algorithm is implemented for determining locations within the program where to insert these custom instructions. For a set of realistic benchmarks, experimental results indicate the method uses 41.9% lower energy (on average) and improves performance by 40.0% (on average) when compared to a traditional cache system which is identical in size View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Expression equivalence checking using interval analysis

    Page(s): 830 - 842
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (959 KB) |  | HTML iconHTML  

    Arithmetic expressions are the fundamental building blocks of hardware and software systems. An important problem in computational theory is to decide if two arithmetic expressions are equivalent. However, the general problem of equivalence checking, in digital computers, belongs to the NP Hard class of problems. Moreover, existing general techniques for solving this decision problem are applicable to very simple expressions and impractical when applied to more complex expressions found in programs written in high-level languages. In this paper, we propose a method for solving the arithmetic expression equivalence problem using partial evaluation. In particular, our technique is specifically designed to solve the problem of equivalence checking of arithmetic expressions obtained from high-level language descriptions of hardware/software systems. In our method, we use interval analysis to substantially prune the domain space of arithmetic expressions and limit the evaluation effort to a sufficiently limited set of subspaces. Our results show that the proposed method is fast enough to be of use in practice View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Probabilistic delay budget assignment for synthesis of soft real-time applications

    Page(s): 843 - 853
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (995 KB) |  | HTML iconHTML  

    Unlike their hard real-time counterparts, soft real-time applications are only expected to guarantee their "expected delay" over input data space. This paradigm shift calls for customized statistical design techniques to replace the conventional pessimistic worst case analysis methodologies. We present a novel statistical time-budgeting algorithm to translate the application expected delay constraint into its components' local delay constraints. We utilize the mathematical properties of the problem to quickly calculate the system expected delay and incrementally estimate the component utility variation with its timing relaxation. Our algorithm determines the optimal maximum weighted timing relaxation of an application under expected delay constraint. Experimental results on core-based synthesis of several multimedia applications targeting field-programmable gate arrays show that our technique always improves the design area. Furthermore, it consistently outperforms optimal time budgeting under hard real-time constraint, which is the best existing competitor. Design area improvements were up to 26% and averaged about 17% on several MediaBench applications View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • SHIM: a deterministic model for heterogeneous embedded systems

    Page(s): 854 - 867
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (587 KB) |  | HTML iconHTML  

    Typical embedded hardware/software systems are implemented using a combination of C and an HDL such as Verilog. While each is well-behaved in isolation, combining the two gives a nondeterministic model of computation whose ultimate behavior must be validated through expensive (cycle-accurate) simulation. We propose an alternative for describing such systems. Our software/hardware integration medium (shim) model, effectively Kahn networks with rendezvous communication, provides deterministic concurrency. We present the Tiny-shim language for such systems and its semantics, demonstrate how to implement it in hardware and software, and discuss how it can be used to model a real-world system. By providing a powerful, deterministic formalism for expressing systems, designing systems, and verifying their correctness will become easier View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Scenario-oriented design for single-chip heterogeneous multiprocessors

    Page(s): 868 - 880
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1161 KB)  

    Single-chip heterogeneous multiprocessors (SCHMs) are arising to meet the computational demands of portable and handheld devices. These computing systems are not fully custom designs traditionally targeted by the design automation community, general-purpose designs traditionally targeted by the computer architecture community, nor pure embedded designs traditionally targeted by the real-time community. An entirely new design philosophy will be needed for this hybrid class of computing. The programming of the device will be drawn from a narrower set of applications with execution that persists in the system over a longer period of time than for general-purpose programming. However, the devices will still be programmable, not only at the level of the individual processing element, but across multiple processing elements and even the entire chip. The design of other programmable single chip computers has enjoyed an era where the design tradeoffs could be captured in simulators such as SimpleScalar and performance could be evaluated to the SPEC benchmarks. Motivated by this, we describe new benchmark-based design strategies for SCHMs which we refer to as scenario-oriented design. We include an example and results View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • An automated design tool for analog layouts

    Page(s): 881 - 894
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2228 KB) |  | HTML iconHTML  

    In this paper, a layout synthesis tool for the design of analog integrated circuits (ICs) is presented. This tool offers great flexibility that allows analog circuit designers to bring their special design knowledge and experiences into the synthesis process to create high-quality analog circuit layouts. Different from conventional layout systems that are limited to the optimization of single devices, our layout generation tool attempts to optimize more complex modules. This tool includes a complete tool suite that covers the following three major analog physical designs stages. 1) Module Generation: designers can develop and maintain their own technology- and application-independent module generators for subcircuits using an in-house developed description language. 2) Placement: a two-stage placement technique, tailored for the analog placement design, is proposed. In particular, this placement algorithm features a novel genetic placement stage followed by a fast simulated reannealing scheme. 3) Routing: the minimum-Steiner-tree-based global routing is developed, and it is actually integrated into the placement procedure to improve reliability and routability of the placement solutions. Following the global routing, a compaction-based constructive detailed routing finally completes the interconnection of the entire layout. Several testing circuits have been applied to demonstrate the design efficiency and the effectiveness of this tool. Experimental results show that this new layout tool is capable of producing high quality layouts comparable to those manually done by layout experts but with much less design time View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Low-power high-performance nand match line content addressable memories

    Page(s): 895 - 905
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1375 KB) |  | HTML iconHTML  

    Content addressable memory (CAM) is used in fully associative VLSI lookup circuits for cache memory, translation lookaside buffers (TLBs), and in Internet Protocol (IP) address comparison. In this paper, the use of dynamic nand match lines is investigated and compared to conventional nor match lines in cache applications. To achieve high speed, a hierarchical match line is used. The dynamic stack charge-sharing noise immunity, speed, and actual power savings over a conventional nor match line are investigated using benchmark data. While random patterns are often used when benchmarking CAM match line power, it is shown to be optimistic compared to address trace data, since addresses are highly correlated. It is also shown that proper address input ordering can provide additional power savings. A simple model for the power in dynamic circuit stacks is derived and compared to power simulation data. Additionally, impact on cache tag area and the scaling of the nand stack to future processes is described View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Placement for large-scale floating-gate field-programable analog arrays

    Page(s): 906 - 910
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (289 KB) |  | HTML iconHTML  

    Modern advances in reconfigurable analog technologies are allowing field-programmable analog arrays (FPAAs) to dramatically grow in size, flexibility, and usefulness. Our goal in this paper is to develop the first placement algorithm for large-scale floating-gate-based FPAAs with a focus on the minimization of the parasitic effects on interconnects under various device-related constraints. Our FPAA clustering algorithm first groups analog components into a set of clusters so that the total number of routing switches used is minimized and all IO paths are balanced in terms of routing switches used. Our FPAA placement algorithm then maps each cluster to a computational analog block (CAB) of the target FPAA while focusing on routing switch usage and balance again. Experimental results demonstrate the effectiveness of our approach View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Virtual memory window for application-specific reconfigurable coprocessors

    Page(s): 910 - 915
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (484 KB) |  | HTML iconHTML  

    The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms, restrain the widespread use of reconfigurable accelerators and limit the designer productivity. Furthermore, communication between SW and HW parts of codesigned applications are typically exposed to SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access the user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, the virtualization layer improves the coprocessor execution-applications achieve better performance without any user intervention. We use two different reconfigurable system-on-chip (SoC) running Linux and codesigned applications to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to the virtualisation is limited. Dynamic prefetching in the virtualisation layer further reduces the abstraction overhead View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • New degree computationless modified euclid algorithm and architecture for Reed-Solomon decoder

    Page(s): 915 - 920
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (302 KB) |  | HTML iconHTML  

    This paper proposes a new degree computationless modified Euclid (DCME) algorithm and its dedicated architecture for Reed-Solomon (RS) decoder. This architecture has low hardware complexity compared with conventional modified Euclid (ME) architectures, since it can completely remove the degree computation and comparison circuits. The architecture employing a systolic array requires only the latency of 2t clock cycles to solve the key equation without initial latency. In addition, the DCME architecture using 3t+2 basic cells has regularity and scalability since it uses only one processing element. Hence, the proposed DCME architecture provides the short latency and low-cost RS decoding. The DCME architecture has been synthesized using the 0.25-mum Faraday CMOS standard cell library and operates at 200 MHz. The gate count of the DCME architecture is 21 760. Hence, the RS decoder using the proposed DCME architecture can reduce the total gate count by at least 23% and the total latency to at least 10% compared with conventional ME decoders View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • 2007 IEEE International Symposium on Circuits and Systems (ISCAS 2007)

    Page(s): 921
    Save to Project icon | Request Permissions | PDF file iconPDF (746 KB)  
    Freely Available from IEEE
  • Explore IEL IEEE's most comprehensive resource [advertisement]

    Page(s): 922
    Save to Project icon | Request Permissions | PDF file iconPDF (341 KB)  
    Freely Available from IEEE
  • IEEE order form for reprints

    Page(s): 923
    Save to Project icon | Request Permissions | PDF file iconPDF (354 KB)  
    Freely Available from IEEE
  • Have you visited lately? www.ieee.org [advertisement]

    Page(s): 924
    Save to Project icon | Request Permissions | PDF file iconPDF (220 KB)  
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): c3
    Save to Project icon | Request Permissions | PDF file iconPDF (25 KB)  
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems Information for authors

    Page(s): c4
    Save to Project icon | Request Permissions | PDF file iconPDF (29 KB)  
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following: • System Specification, Design and Partitioning, • System-level Test, • Reliable VLSI/ULSI Systems, • High Performance Computing and Communication Systems, • Wafer Scale Integration and Multichip Modules (MCMs), • High-Speed Interconnects in Microelectronic Systems, • VLSI/ULSI Neural Networks and Their Applications, • Adaptive Computing Systems with FPGA components, • Mixed Analog/Digital Systems, • Cost, Performance Tradeoffs of VLSI/ULSI Systems, • Adaptive Computing Using Reconfigurable Components (FPGAs) 

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu