VLSI, 2001. Proceedings. IEEE Computer Society Workshop on

Date 19-20 April 2001

  • Proceedings IEEE Computer Society Workshop on VLSI 2001. Emerging Technologies for VLSI Systems

  • Author index

    Page(s): 177
  • A memory management approach for efficient implementation of multimedia kernels on programmable architectures

    Page(s): 171 - 176

    A methodology for power optimization of the data memory hierarchy and instruction memory is introduced. The impact of the methodology on a set of widely used multimedia application kernels, namely Full Search (FS), Hierarchical Search (HS), Parallel Hierarchical One Dimension Search (PHODS), and Three Step Logarithmic Search (3SLS), is demonstrated. We find the power-optimal data memory hierarchy by applying the appropriate data-reuse transformation, while the instruction power optimization is done using a suitable cache memory. Using data-reuse transformations, performance optimization techniques, and instruction-level transformations, we perform an exhaustive exploration of all the possible alternatives to reach power-efficient solutions. For the ARM embedded processor, the experimental results demonstrate the efficiency of the methodology in terms of power for all the multimedia kernels.

  • Application of output prediction logic to differential CMOS

    Page(s): 57 - 65

    We apply the output prediction logic (OPL) technique to the differential CMOS logic family. Including the effects of process, voltage and temperature (PVT) variations, we show that OPL-differential CMOS is more than 40% faster than the single-rail OPL-dynamic logic family, and nearly 5 times faster than optimized static CMOS. We also demonstrate an OPL-differential 64:2 compressor that is 37% faster than the OPL-dynamic version. Finally, we show that OPL-differential is nearly twice as fast as differential domino.

  • Electronic nanotechnology and reconfigurable computing

    Page(s): 10 - 15

    Chemically assembled electronic nanotechnology (CAEN) is a promising alternative to CMOS for constructing circuits with feature sizes in the tens of nanometers range. In this paper we describe some of the recent advances in CAEN and how they influence the design of digital circuits. We show how reconfigurability supports inexpensive manufacturing. Finally, we describe a molecular latch that overcomes the lack of a viable CAEN-based transistor.

  • A low-energy adaptive bus coding scheme

    Page(s): 118 - 122

    We have extracted run-time memory access traces from the Mediabench benchmark set. These traces exhibit a high degree of repetition. We propose an adaptive bus coding scheme that will reduce transition activity by exploiting value repetition. For this scheme, we introduce an extra bitline similar to bus-invert coding.

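The abstract above compares its extra bitline to bus-invert coding. As a point of reference only, here is a minimal Python sketch of classic bus-invert coding, not of the paper's adaptive scheme. The function names, the 8-bit bus width, and the all-zero initial bus state are assumptions made for illustration.

```python
def bus_invert_encode(words, width=8):
    """Encode words with classic bus-invert coding.

    Returns a list of (bus_value, invert_bit) pairs. The bus is assumed
    to start in the all-zero state (an illustrative assumption).
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for word in words:
        word &= mask
        # Count the data lines that would toggle if the word were sent as-is.
        toggles = bin(prev ^ word).count("1")
        if toggles > width // 2:
            # More than half the lines would flip: send the complement
            # and raise the extra invert line instead.
            sent, invert = word ^ mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(value ^ mask) if invert else value for value, invert in encoded]


def count_transitions(values, start=0):
    """Total number of bit flips seen on the bus for a sequence of values."""
    total, prev = 0, start
    for value in values:
        total += bin(prev ^ value).count("1")
        prev = value
    return total
```

Alternating between 0x00 and 0xFF is the worst case for an uncoded bus; with the invert line, the data lines stay constant and only the single extra line toggles.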
  • A pipelined LNS ALU

    Page(s): 155 - 161

    A new ALU design is proposed that is more economical than a conventional Logarithmic Number System (LNS) ALU for pipelined multiply-accumulate applications (such as FIR filters). A novel interpolator that accepts both positive and negative arguments allows rearrangement of the fixed-point adders that implement the LNS addition algorithm. The area for the resulting circuit is essentially the same as the traditional LNS approach, but the critical path for the proposed circuit is shorter, allowing a faster cycle time and/or a shorter latency. To make the advantages of the improved LNS ALU available to end users, new primitive operations (increment-multiply and multiply-increment-multiply) should be supported instead of the more traditional add and multiply-accumulate operations. The Verilog coding for such a novel increment-multiply module is given.

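For readers unfamiliar with LNS arithmetic, the following Python sketch shows the general idea behind the abstract above: a value is stored as its base-2 logarithm, so multiplication becomes addition, while addition needs the function log2(1 + 2^d), which hardware typically approximates with tables and interpolation. This is a floating-point illustration under simplifying assumptions (positive values only, no sign bit, exact math instead of an interpolator); the function names are hypothetical and it does not reproduce the paper's increment-multiply primitives.

```python
import math


def to_lns(x):
    """Represent a positive real number by its base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return math.log2(x)


def from_lns(e):
    """Convert an LNS exponent back to an ordinary real number."""
    return 2.0 ** e


def lns_mul(a, b):
    """Multiplication in LNS is a single addition of the exponents."""
    return a + b


def lns_add(a, b):
    """Addition in LNS: max(a, b) + log2(1 + 2**(-|a - b|)).

    The log2(1 + 2**d) term is the function an LNS ALU has to approximate.
    """
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def lns_mac(acc, a, b):
    """Multiply-accumulate: acc + a * b, all operands in LNS form."""
    return lns_add(acc, lns_mul(a, b))
```

A FIR filter tap then costs one exponent addition plus one evaluation of the log2(1 + 2^d) function, which is why the critical path through that function matters.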
  • Energy-efficient link layer for wireless microsensor networks

    Page(s): 16 - 21

    Wireless microsensors are being used to form large, dense networks for the purposes of long-term environmental sensing and data collection. Unfortunately, these networks are typically deployed in remote environments where energy sources are limited. Thus, designing fault-tolerant wireless microsensor networks with long system lifetimes can be challenging. By applying energy-efficient techniques at all levels of the system hierarchy, system lifetime can be extended. In this paper, energy-efficient techniques that adapt underlying communication parameters will be presented in the context of wireless microsensor networks. In particular, the effect of adapting link and physical layer parameters, such as output transmit power and error control coding, on system energy consumption will be examined.

  • LUT-based FPGA technology mapping for power minimization with optimal depth

    Page(s): 123 - 128

    In this paper, we study the technology mapping problem for LUT-based FPGAs targeting power minimization. We present the PowerMap algorithm to generate a mapping solution that minimizes power consumption while keeping the delay optimal. We compute min-height K-feasible cuts for critical nodes to optimize the depth and compute min-weight K-feasible cuts for noncritical nodes to minimize the power consumption of the mapping solution. We have implemented PowerMap in C and tested it on a number of MCNC benchmark circuits. Compared to FlowMap, a delay-optimal mapper, our algorithm reduces the power consumption by 17.8% and uses 9.4% fewer LUTs without any depth penalty.

  • Structural design composition for C++ hardware models

    Page(s): 36 - 40

    This paper addresses the modeling of layout structure in high-level C++ models. Researchers agree that the level of abstraction for integrated circuit design needs to be raised. New languages and methodologies are being proposed, most of them from the software engineering domain. However, one of the fundamental hardware design challenges is often overlooked as push-button synthesis solutions are sought: physical design predictability. In this paper we describe how C++ constructs should be used to capture structural and physical implementation concerns. Our explanation relies on the importance of the floorplan and component placement estimations at high levels of abstraction. We highlight how using object-oriented mechanisms eases the structural modeling of circuit components, and we present a C++ class library design to specify these structural concerns.

  • VLIW scheduling for energy and performance

    Page(s): 111 - 117

    We present and evaluate several instruction scheduling algorithms that reorder a given sequence of instructions while taking energy considerations into account. We first compare a performance-oriented scheduling technique with three energy-oriented instruction scheduling algorithms from both performance (execution cycles of the resulting schedules) and energy consumption points of view. Then, we propose scheduling algorithms that consider energy and performance at the same time. The results obtained using randomly generated directed acyclic graphs show that these techniques are quite successful in reducing energy consumption and that their performance (in terms of execution cycles) is comparable to that of pure performance-oriented scheduling.

  • A low power SIMD architecture for affine-based texture mapping

    Page(s): 129 - 132

    This paper presents a novel low-power SIMD architecture for texture mapping using affine transformation. Low power has been achieved by exploiting the properties of the affine transformation to reduce the computational cost. The architecture has been prototyped using 0.35 μm CMOS technology with three layers of metal. The proposed architecture can be used in video object motion tracking and texture warping processors.

  • A 1-GSPS CMOS flash A/D converter for system-on-chip applications

    Page(s): 135 - 139

    This paper presents the design and performance of an ultrafast CMOS flash A/D converter. Although the featured A/D converter is designed in CMOS, its performance is comparable to that of currently available GaAs technology. To achieve high speed in CMOS, the featured A/D converter utilizes the Threshold Inverter Quantization (TIQ) technique. A 6-bit TIQ-based flash A/D converter was designed with 0.25 μm standard CMOS technology parameters. It operates with sampling rates up to 1 GSPS, dissipates 66.87 mW of power at 2.5 V, and occupies 0.013 mm² area. The proposed A/D converter is suitable for System-on-Chip (SoC) applications in wireless products and other ultra-high-speed applications.

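As background to the abstract above, a flash converter compares the input against all quantization thresholds at once, producing a thermometer code that an encoder reduces to a binary output; a 6-bit converter therefore needs 2^6 - 1 = 63 comparators. The Python sketch below models that behavior only. The uniformly spaced, ideal thresholds and the 0 to 2.5 V input range are assumptions for illustration and do not model the TIQ circuit itself; the function name is hypothetical.

```python
def flash_adc(vin, bits=6, vref=2.5):
    """Behavioral model of an ideal flash A/D converter.

    Returns (code, thermometer), where thermometer holds one comparator
    output per threshold and code is the number of comparators that fired.
    Thresholds are assumed to be uniformly spaced between 0 and vref.
    """
    n_comparators = (1 << bits) - 1  # 63 comparators for 6 bits
    thresholds = [vref * (k + 1) / (n_comparators + 1)
                  for k in range(n_comparators)]
    # All comparisons happen in parallel in hardware; here it is a loop.
    thermometer = [1 if vin > t else 0 for t in thresholds]
    # Thermometer-to-binary encoding: count the ones.
    code = sum(thermometer)
    return code, thermometer
```

For example, `flash_adc(1.26)` fires every comparator whose threshold is at or below 1.25 V and leaves the rest low.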
  • Evaluating metastability in electronic circuits for random number generation

    Page(s): 99 - 101

    This paper presents a method for evaluating the metastability of a flip-flop circuit for random number generation applications. It is well known that digital circuits can exhibit metastable behavior when the input to a flip-flop is asynchronous to the system clock. In the past, extensive research has been focused on eliminating metastability in digital systems. Here, we present some preliminary results of our research to exploit metastable behavior in sequential logic circuits to produce random bit streams for random number generation. In particular, we explore the idea of tapping the electronic noise present in D-type flip-flops to produce random bit streams for use as a one-time cryptographic key-pad for encryption algorithms. This research will serve as a basis for further research into the very-large-scale integration (VLSI) of random number generators (RNGs).

  • Reducing register and phase requirements for synchronous circuits derived using software pipelining techniques

    Page(s): 71 - 77

    A method based on a modulo scheduling algorithm for software pipelining has been recently proposed to optimize clocked circuits. The resulting circuits are multi-phase clocked circuits, where all clocks have the same period. To preserve the functionality of the original circuit, registers must be placed after minimizing the clock period. The placement of these registers is derived from an arbitrary schedule determined during a clock period minimization step. A good schedule may allow one to decrease the number of registers and the number of phases needed in the final circuit. Decreasing the number of registers contributes to minimizing the area occupied by the circuit and reduces its power consumption, while decreasing the number of phases reduces the complexity of the clock generation and distribution task. In this paper, we propose polynomial-time-solvable methods to choose a good schedule once the clock period is minimized. The methods have been tested on a subset of the ISCAS89 benchmarks. Experimental results show that the number of registers which must be inserted in the final circuit, and the number of phases, have been significantly decreased compared to the case where an arbitrary schedule is chosen.

  • A multi-PLL clock distribution architecture for gigascale integration

    Page(s): 30 - 35

    This paper proposes a new semi-distributed architecture for clock distribution that is suitable for gigascale integration. First, the limitations associated with conventional clock distribution networks are discussed. Next, some of the alternative solutions to the clock distribution problem are reviewed and compared in terms of architecture, power dissipation, clock inaccuracy, and ease of implementation. The compatibility of the alternatives with established design-for-testability and design-for-debuggability techniques is also evaluated. Then, the proposed architecture is introduced. It employs an array of phase-locked loops (PLLs) synchronized using digital feedback. The new architecture addresses the limitations associated with conventional clocking networks, but does not suffer from the practical shortcomings affecting the alternatives proposed so far.

  • Towards a very high bandwidth wireless battery powered device

    Page(s): 3 - 9

    We discuss the hardware and software challenges in building a 2 Mbit per second wireless battery-powered communications device. Of primary importance is power dissipation. To achieve aggressive power targets, a host of new techniques are required at all levels of the design hierarchy. Techniques for parallelizing saturating arithmetic will become important because of the software optimizations they enable. Highly configurable programmable structures will enable multiprotocol SOC solutions. To program complex SOCs, new compiler techniques will be required. Hardware implementations will need to be intimately aware of these software techniques. In particular, both signal processing code written in C and control code written in Java will drive new compilation techniques to enable broadband 3G wireless systems.

  • Design and implementation of a coarse-grained dynamically reconfigurable hardware architecture

    Page(s): 41 - 46

    This paper presents the hardware structure and application of a coarse-grained dynamically reconfigurable hardware architecture dedicated to wireless communication systems. The application-tailored architecture, called DReAM (Dynamically Reconfigurable Hardware Architecture for Mobile Communication Systems), is a research project at the Darmstadt University of Technology. It covers the complete design process, from analyzing the requirements for the dedicated application field, through the specification and VHDL implementation of the architecture, up to the physical layout for the final chip. In the following we provide an overview of the major design stages, starting with a motivation for choosing the concept of distributed arithmetic in reconfigurable computing.

  • Transient fault sensitivity analysis of analog-to-digital converters (ADCs)

    Page(s): 140 - 145

    Reliability of systems used in space, avionic and biomedical applications is highly critical. Such systems consist of an analog front-end to collect data, an ADC to convert the collected data to digital form, and a digital unit to process it. It is important to analyze the fault sensitivities of each of these to effectively gauge and improve the reliability of the system. This paper addresses the issue of fault sensitivity of ADCs. A generic methodology for analyzing the fault sensitivity of ADCs is presented. A novel concept of “node weights” specific to α-particle induced transient faults is introduced to increase the accuracy of such an analysis.

  • A linear threshold gate implementation in single electron technology

    Page(s): 93 - 98

    In this paper we focus on the design of threshold logic functions in Single Electron Tunneling (SET) technology, using the tunnel junction's specific behavior, i.e., the ability to control the transport of individual electrons. We introduce a novel design of an n-input linear threshold gate which can accommodate both positive and negative weights and built-in signal amplification, using 1 tunnel junction and n+2 true capacitors. As an example we present a 4-input threshold gate with both positive and negative weights.

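The logic function that a linear threshold gate computes can be stated in a few lines: the output is 1 when the weighted sum of the Boolean inputs reaches the threshold, and 0 otherwise. The Python sketch below models only that function, not the SET circuit. The function name, the example weights (2, 1, -1, -2), and the threshold of 1 are hypothetical values chosen for illustration and are not taken from the paper.

```python
def threshold_gate(inputs, weights, threshold):
    """Linear threshold gate: 1 if sum(w_i * x_i) >= threshold, else 0.

    inputs are 0/1 values; weights may be positive or negative.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical 4-input gate with mixed-sign weights.
EXAMPLE_WEIGHTS = (2, 1, -1, -2)
EXAMPLE_THRESHOLD = 1
```

With these example values, an input whose negative-weight bits cancel the positive ones, such as (1, 0, 0, 1), yields 0, while (1, 0, 0, 0) yields 1.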
  • Current sensing techniques for global interconnects in very deep submicron (VDSM) CMOS

    Page(s): 66 - 70

    Sensing current instead of voltage provides an alternative for signaling on the long wires that are increasingly limiting the performance of CMOS as it scales into the VDSM regime (<0.25 μm). Current-mode techniques have been proposed for sensing bit-lines. We present a comparative study of current sensing with the optimal repeater insertion technique for wires from 0.35 cm to 1.75 cm in length. Simulation results using SPICE for 0.18 μm showed that current sensing was faster and lower-power when compared to the optimal repeater insertion technique. While the power dissipated by the optimal repeater circuit increased linearly with line length, power dissipated by the current-sensing circuit was almost constant for longer lines. Inductance had little effect on the differential current-sensing technique.

  • Built-in self-testable data path synthesis

    Page(s): 78 - 84

    In this paper, we describe a high-level data path allocation algorithm to facilitate built-in self-test (BIST). It generates a self-testable data path design while maximizing the sharing of modules and test registers. This sharing means that only a small number of registers need to be modified for BIST, thereby decreasing the hardware area, which is one of the major overheads of the BIST technique. In our approach, both module allocation and register allocation are performed incrementally. In each iteration, module allocation is guided by a testability balance technique, while register allocation aims at increasing the sharing degrees of registers. With a variety of benchmarks, we demonstrate the advantage of our approach compared with other conventional approaches.

  • A hybrid wave-pipelined network router

    Page(s): 165 - 170

    In this paper a novel hybrid wave-pipelined bit-pattern associative router is presented. A router is an important component in communication network systems. The bit-pattern associative router (BPAR) allows for flexibility and can accommodate a large number of routing algorithms. Wave-pipelining is a high-performance approach which implements pipelining in logic without using intermediate registers. In this study a hybrid wave-pipelined approach has been proposed and implemented. Hybrid wave-pipelining allows for the reduction of the delay difference between the maximum and minimum delays by narrowing the gap between each stage of the system. This approach yields narrow “computing cones” that allow faster clocks to be run. This is the first study in wave-pipelining that deals with a system that has a substantially different set of pipeline stages. The bit-pattern associative router has three stages: condition match, selection function, and port assignment. Each stage's data delay paths are tightly controlled to optimize the proper propagation of signals. The simulation results show that using hybrid wave-pipelining significantly reduces the clock period, and that circuit delays become the limiting factor, preventing further clock cycle time reduction.

  • Improved power estimation for behavioral and gate level designs

    Page(s): 102 - 107

    A technique is presented for accurately computing the power of digital circuits described by behavioral- and gate-level designs. Accurate power estimation for high-level designs provides early warning of potential power problems, supporting design flexibility and a reduction of time and cost. The technique uses a behavioral VHDL specification or gate-level netlist as input. For a variety of combinational benchmark circuits, assuming the zero-delay model and uncorrelated primary inputs, the approach has been tested and compared with the Berkeley SIS power estimator. The proposed technique has been implemented in a program called the Behavioral Level Activity and Power Estimator (BLAPE). Experimental results demonstrate time savings with an average error of less than 1.00%.

  • System design of low-energy wearable computers with wireless networking

    Page(s): 25 - 29

    The paper describes a system-level design approach to the wearable computers and wireless networks project at Carnegie Mellon University (CMU). Over almost the last ten years we have designed and fabricated twenty new generations of wearable computers, with most of them using wireless network infrastructure. We emphasize the importance of wireless communication and the amount of energy it requires. A system-level approach to power/performance optimization is going to be a crucial catalyst for making wearable computers an everyday tool for the general public.
