
Advanced Research in VLSI, 1995. Proceedings, Sixteenth Conference on

Date: 27-29 March 1995

  • Proceedings. Sixteenth Conference on Advanced Research in VLSI

    Publication Year: 1995
  • Author index

    Publication Year: 1995
  • Bit-serial bidirectional A/D/A conversion

    Publication Year: 1995, Page(s): 108-120
    Cited by: Papers (1)

    A fault-tolerant VLSI architecture implementing a bi-directional bit-serial A/D/A (analog-to-digital and digital-to-analog) converter is presented. Both functions of algorithmic D/A conversion and successive approximation A/D conversion are combined into a single device, converting bits in the order from most to least significant. The MSB-first order allows for robust implementation, relatively insensitive to component mismatches, offsets and nonlinearities. Also, since the A/D conversion makes use of the intermediate D/A conversion results, matched monotonic characteristics are obtained in both directions of conversion. The final D/A result is available at the end of A/D conversion, and can be used directly in applications calling for analog quantization. More general use of the A/D/A converter allows for bi-directional read/write digital access to local analog information in VLSI. The cell supports dense integration of low-power data conversion units along with digital processors or sensory circuitry in a standard CMOS process. Experimental results from a prototype VLSI implementation are included. Including control logic, the A/D/A cell measures 216 μm×315 μm in a 2 μm CMOS process, and achieves 8-bit untrimmed monotonicity at 200 μW power consumption for a 20 μsec conversion cycle.
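
    The MSB-first successive approximation loop described above can be sketched as a behavioral model. This is a minimal illustration, assuming an ideal D/A converter (no mismatch, offset, or nonlinearity) and hypothetical `dac`/`sar_convert` functions; it is not the authors' circuit. Each trial code tentatively sets the next bit, and the final D/A level is returned with the code, mirroring the point that the D/A result is available at the end of A/D conversion.

```python
def dac(code: int, n_bits: int, v_ref: float) -> float:
    """Ideal D/A conversion: map an n-bit code to a level in [0, v_ref)."""
    return v_ref * code / (1 << n_bits)


def sar_convert(v_in: float, n_bits: int = 8, v_ref: float = 1.0) -> tuple[int, float]:
    """MSB-first successive approximation A/D conversion (idealized).

    Returns the digital code and the final D/A level, i.e. the quantized input.
    """
    code = 0
    for bit in range(n_bits - 1, -1, -1):      # most significant bit first
        trial = code | (1 << bit)               # tentatively set this bit
        if v_in >= dac(trial, n_bits, v_ref):   # comparator decision
            code = trial                        # keep the bit
    return code, dac(code, n_bits, v_ref)


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 0.999):
        print(v, sar_convert(v))
```

    Because every decision compares the input against the same D/A levels used for reconstruction, the code is monotonic in `v_in` in this idealized model.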
  • OPTIMUS: a new program for OPTIMizing linear circuits with number-splitting and shift-and-add decompositions

    Publication Year: 1995, Page(s): 258-271
    Cited by: Papers (1)

    Most behavioral synthesis tools perform limited architectural transformations to optimize hardware and power. Previously, researchers have proposed decomposing multiplications into shifts and adds, achieving average hardware savings of a factor of 2.5. In this paper, we propose a new program called OPTIMUS, and related algorithms, that combine an architectural transformation procedure called number-splitting with shift-and-add decomposition to obtain up to an additional twofold savings, for an overall hardware savings of up to a factor of 5. The number-splitting transformation changes the circuit interconnections and the descriptions of constant multipliers. The scheme is based on numerical matrix transformation algorithms that allow a given matrix to be expressed as the product of several matrices while maintaining numerical accuracy.
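
    As a reference point for the shift-and-add decomposition that OPTIMUS builds on, the sketch below replaces multiplication by a non-negative integer constant with shifts, additions, and subtractions. It compares a plain binary recoding with canonical signed-digit (CSD) recoding, a standard technique; it does not implement the paper's number-splitting transformation, and all function names are hypothetical.

```python
def binary_terms(c: int) -> list[tuple[int, int]]:
    """Plain binary recoding of c >= 0 as (shift, sign) pairs, all signs +1."""
    return [(i, 1) for i in range(c.bit_length()) if (c >> i) & 1]


def csd_terms(c: int) -> list[tuple[int, int]]:
    """Canonical signed-digit recoding of c >= 0: no two adjacent nonzero digits."""
    terms, shift = [], 0
    while c != 0:
        if c & 1:
            digit = 2 - (c & 3)     # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append((shift, digit))
            c -= digit              # c is now divisible by 4
        c >>= 1
        shift += 1
    return terms


def shift_add_multiply(x: int, terms: list[tuple[int, int]]) -> int:
    """Evaluate x * c using only shifts, additions, and subtractions."""
    return sum(sign * (x << shift) for shift, sign in terms)


def adder_count(terms: list[tuple[int, int]]) -> int:
    """Adders/subtractors needed to combine the shifted terms."""
    return max(len(terms) - 1, 0)
```

    For c = 15, the binary form needs three adders (8 + 4 + 2 + 1), while CSD needs a single subtractor (16 - 1).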
  • Energy recovery for low-power CMOS

    Publication Year: 1995, Page(s): 415-429
    Cited by: Papers (9) | Patents (2)

    Energy recovery, as a means to trade off power dissipation for performance in CMOS logic circuits, is analyzed and investigated. A mathematical model is presented to estimate the efficiency for two energy-recovery approaches under varying conditions of voltage swing, transition time, and MOS device parameters. This model can be directly compared to the well-known model for supply-voltage scaling, which is the prevalent method for trading power dissipation for performance. The two models are evaluated against SPICE simulations. Excluding body effects, which would not be present in CMOS process technologies such as Silicon-On-Insulator (SOI), the simulations and the equations agree to within 10%. The simulations also indicate that energy recovery, when implemented with circuit techniques such as bootstrapping, can significantly outperform the supply-voltage-scaled approach across a wide range of operating frequencies. To further investigate this result, two eight-bit adder designs, one based on supply-voltage scaling and the other on energy recovery, are simulated and compared.
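
    The abstract does not reproduce the paper's model, but the first-order comparison that energy recovery rests on can be sketched. Assuming a load capacitance C charged to V through an effective resistance R, an abrupt transition dissipates C·V²/2 in the resistance, while ramping the supply over a time T much longer than RC dissipates roughly (RC/T)·C·V². These are textbook approximations, not the paper's equations; they ignore body effects, leakage, and the cost of generating the ramp.

```python
def conventional_energy(c: float, v: float) -> float:
    """Energy dissipated when charging C to V abruptly through a resistance: C*V^2/2."""
    return 0.5 * c * v * v


def adiabatic_energy(c: float, v: float, r: float, t_ramp: float) -> float:
    """First-order energy dissipated when the supply ramps linearly over t_ramp.

    Valid only for t_ramp >> R*C; dissipation then falls as 1/t_ramp.
    """
    return (r * c / t_ramp) * c * v * v
```

    With C = 1 pF, V = 5 V, R = 1 kΩ, and T = 100 ns, the sketch gives 12.5 pJ for the abrupt case and 0.25 pJ for the ramped case; doubling T halves the ramped figure.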
  • Dynamic CMOS circuit techniques for delay and power reduction in parallel adders

    Publication Year: 1995, Page(s): 121-130

    The successful design of high-speed parallel adders depends mainly on fast calculation of carry signals. A technique based on combining Manchester-Carry chains (MCC) with Clock-and-Data pre-charged dynamic logic blocks (CDPD) is suggested and analysed. This technique, as well as pure MCC and CDPD techniques, was incorporated into the design of carry calculation trees. Simulations indicate that combining MCCs with CDPD gates, instead of using trees consisting solely of either MCCs or CDPD gates, makes possible an 11-25% decrease in delay together with a 19-29% reduction in power consumption.
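
    The carry recurrence that a Manchester carry chain evaluates can be modeled at the logic level. This minimal sketch, with a hypothetical `manchester_add` function, computes generate and propagate for each bit and passes the carry along the chain; it does not model the dynamic precharge and evaluate behavior of MCC or CDPD circuits, which is where the paper's delay and power results come from.

```python
def manchester_add(a: int, b: int, n_bits: int, carry_in: int = 0) -> int:
    """Add two n-bit numbers by passing a carry through generate/propagate logic."""
    carry = carry_in
    total = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        generate = a_i & b_i      # this bit produces a carry on its own
        propagate = a_i ^ b_i     # this bit passes the incoming carry along
        total |= (propagate ^ carry) << i
        carry = generate | (propagate & carry)   # carry is killed when a_i = b_i = 0
    return total | (carry << n_bits)             # carry out becomes the top bit
```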
  • Silicon VLSI processing architectures incorporating integrated optoelectronic devices

    Publication Year: 1995, Page(s): 17-27
    Cited by: Papers (8)

    Integrated optoelectronic interconnects offer a potentially lower cost, higher density alternative to wire-based technologies for I/O and inter-chip communication. This paper outlines two systems being designed at Georgia Tech which incorporate integrated thin film optoelectronic devices onto high throughput VLSI digital processors. The first system places an array of thin film detectors on top of SIMD processing elements allowing direct area connections between sensors and processors. This allows extremely fast frame processing rates (1-10 thousand frames per second) which are required in high speed and scanned imaging systems. The second system presented incorporates inter-chip IR optoelectronic channels which pass transparently through silicon. These links allow communication between three dimensionally stacked chips supporting high throughput interconnect topologies. This paper demonstrates the potential of optoelectronic integrated VLSI systems for providing extremely dense and lightweight solutions in applications such as image processing.
  • Efficient retiming under a general delay model

    Publication Year: 1995, Page(s): 368-382
    Cited by: Papers (2)

    The polynomial-time retiming algorithms developed in the eighties assumed simple delay models that neglected several timing issues that arise in logic design. Recent retiming algorithms for more comprehensive delay models rely on non-linear formulations and run in worst-case exponential time using branch-and-bound techniques. In this paper, we investigate the retiming problem for edge-triggered circuits under a general delay model that handles load-dependent gate delays, register delays, interconnect delays, and clock skew. We show that in this model the retiming problem can be expressed as a set of integer linear programming constraints that can be solved using general ILP techniques. For the special case where clock skew is monotonic and all registers have equal propagation delays, we give an integer monotonic programming formulation of the retiming problem, and we present an efficient algorithm for solving it. Our algorithm retimes any given edge-triggered circuit to achieve a specified clock period in $O(V^3 F)$ steps, where $V$ is the number of logic gates in the circuit and $F$ is bounded by the number of registers in the circuit. A straightforward extension of our algorithm determines a minimum clock period retiming in $O(V^3 F \lg V)$ steps.
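
    For orientation, the sketch below shows the classic retiming transformation that these algorithms generalize: each gate v receives an integer lag r(v), and an edge u→v holding w registers holds w + r(v) - r(u) registers after retiming. It assumes fixed gate delays and no clock skew, i.e. the simple model the abstract contrasts with, and it only evaluates a given retiming rather than computing one. The function names and the three-gate ring are hypothetical.

```python
def retime(edges, lag):
    """Apply lags to edges (u, v, registers); reject illegal (negative) results."""
    retimed = []
    for u, v, w in edges:
        w_new = w + lag.get(v, 0) - lag.get(u, 0)
        if w_new < 0:
            raise ValueError(f"edge {u}->{v} would need {w_new} registers")
        retimed.append((u, v, w_new))
    return retimed


def clock_period(delays, edges):
    """Longest purely combinational path, i.e. over edges holding no registers.

    Assumes the register-free subgraph is acyclic, as in any well-formed
    synchronous circuit.
    """
    successors = {v: [] for v in delays}
    for u, v, w in edges:
        if w == 0:
            successors[u].append(v)
    memo = {}

    def longest_from(v):
        if v not in memo:
            memo[v] = delays[v] + max((longest_from(s) for s in successors[v]), default=0)
        return memo[v]

    return max(longest_from(v) for v in delays)


if __name__ == "__main__":
    delays = {"a": 1, "b": 1, "c": 1}
    ring = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
    print(clock_period(delays, ring))                       # 3
    print(clock_period(delays, retime(ring, {"c": 1})))     # 2
```

    Giving gate c a lag of 1 moves one register from its output edge to its input edge, which shortens the longest register-free path from three gates to two.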
  • Combined DRAM and logic chip for massively parallel systems

    Publication Year: 1995, Page(s): 4-16
    Cited by: Papers (23) | Patents (1)

    A new 5 V 0.8 μm CMOS technology merges 100 K custom circuits and 4.5 Mb DRAM onto a single die that supports both high density memory and significant computing logic. One of the first chips built with this technology implements a unique Processor-In-Memory (PIM) computer architecture termed EXECUBE and has 8 separate 25 MHz CPU macros and 16 separate 32 K×9 b DRAM macros on a single die. These macros are organized together to provide a single part type for scalable massively parallel processing applications, particularly embedded ones where minimal glue logic is desired. Each chip delivers 50 MIPS of performance at 2.7 W. This paper overviews the basic chip technology and organization, offers some projections on the future of EXECUBE-like PIM chips, and finally draws some lessons as to why this technology should radically affect the way we think about computer architecture.
  • Low latency self-timed flow-through FIFOs

    Publication Year: 1995, Page(s): 76-90
    Cited by: Papers (11) | Patents (1)

    Self-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in this type of FIFO, as the communication required to send new data to the FIFO is local to only the first element of the FIFO. Circuit density can also be high because the control overhead is very small. However, because data must travel through every cell in the FIFO when moving from input to output, latencies can be long. This paper describes some alternative approaches to building self-timed flow-through FIFOs that reduce the latency while retaining the high throughput and relative simplicity of a flow-through design. Five designs are presented: a standard linear flow-through FIFO in which the data pass through every latch in the FIFO; a parallel FIFO in which data are delivered in turn to a set of parallel flow-through FIFOs; a tree FIFO in which data are fanned out into a tree of simple FIFOs; a square FIFO in which the tree is organized as a square array to achieve better layout packing; and a folded FIFO in which data try to skip as many of the empty FIFO cells as possible to find the shortest path to the output.
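
    The single-C-element control of a linear flow-through FIFO, and why its latency grows with length, can be sketched with a step-by-step model. This assumes two-phase (transition) signalling in the style of a micropipeline: each stage's C-element sees the request from its left neighbour and the inverted acknowledge from its right neighbour. Updates happen in synchronous steps, a simplification of the real asynchronous behavior, and `simulate_control` is a hypothetical helper.

```python
def c_element(a: int, b: int, previous: int) -> int:
    """Muller C-element: the output follows the inputs when they agree, else holds."""
    return a if a == b else previous


def simulate_control(n_stages: int, steps: int) -> list[list[int]]:
    """Step the C-element control chain of a linear two-phase flow-through FIFO.

    The input raises one request; the output side never acknowledges.
    Returns the control state after each step.
    """
    control = [0] * n_stages
    request_in, ack_out = 1, 0
    history = []
    for _ in range(steps):
        nxt = control[:]
        for i in range(n_stages):
            left = request_in if i == 0 else control[i - 1]
            right = ack_out if i == n_stages - 1 else control[i + 1]
            nxt[i] = c_element(left, 1 - right, control[i])
        control = nxt
        history.append(control[:])
    return history
```

    In this model a single data item needs one step per stage to reach the output, which is the latency cost the parallel, tree, square, and folded organizations aim to reduce.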
  • HAL: heuristic algorithms for layout synthesis

    Publication Year: 1995, Page(s): 185-199
    Cited by: Patents (1)

    This paper describes graph-theory-based algorithms for layout synthesis of leaf cells. A new layout style, termed the 1-1/2-D layout style, is used for the layouts. The transistors are aligned based on common poly gates or common circuit nodes between two sets of transistors. The two sets can be a set of PMOS transistors and a set of NMOS transistors, or both sets can be formed by transistors of the same type. This layout style and the choice of transistor sets provide a unique capability of making efficient use of the layout area for circuits with a large difference in the number of PMOS and NMOS transistors. The algorithms can thus be used to form symbolic layouts for a general class of CMOS circuits, e.g., static dual circuitry, static CMOS circuitry with non-dual pullup and pulldown networks, and dynamic logic styles (e.g., CPL, Domino, etc.). The algorithms have been implemented in GENIE (Mentor Graphics). Despite offering extra features not usually found in other algorithms in the literature, these algorithms provide highly competitive results when compared with handcrafted layouts and other published algorithms. They are not only flexible in supporting various circuit styles but also efficient in run time.
  • A technique for high-speed, fine-resolution pattern generation and its CMOS implementation

    Publication Year: 1995, Page(s): 131-148
    Cited by: Papers (2)

    This paper presents an architecture for generating a high-speed data pattern with precise edge placement (resolution) by using the matched delay technique. The technique involves passing clock and data signals through arrays of matched delay elements in such a way that the data rate and resolution of the generated data stream are controlled by the difference of these matched delays. This difference can be made much smaller than an absolute gate delay. Since the resolution of conventional designs is determined by these absolute delays, the matched delay technique yields a much finer resolution than traditional methods and, in addition, generates high data rate patterns without the need for a high-speed clock. The matched delay technique lends itself to high-precision and high-speed applications such as fast network interfaces or test pattern generators. This paper also describes a matched delay data generator submitted for fabrication in a MOSIS 1.2 μm CMOS technology. This implementation uses biased delay elements to internally compensate for temperature and process variations. Simulations indicate that the implementation described in this paper can generate data signals with on-chip bit rates of 833 Mb/s and resolutions of 100 ps.
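
    The key relationship, that resolution is set by a difference of matched delays rather than by an absolute gate delay, can be made concrete with a Vernier-style model. This sketch assumes each stage adds `d_slow_ps` of delay to one signal and `d_fast_ps` to the other, so after k stages their edges are offset by k times the difference; the function names are hypothetical, and the 600 ps and 500 ps values below are invented only so that the difference equals the 100 ps resolution quoted above.

```python
def stage_offsets(d_slow_ps: int, d_fast_ps: int, n_stages: int) -> list[int]:
    """Relative edge offset (ps) between the two signals after 0..n_stages stages."""
    step = d_slow_ps - d_fast_ps          # the resolution
    return [k * step for k in range(n_stages + 1)]


def place_edge(target_ps: int, d_slow_ps: int, d_fast_ps: int) -> tuple[int, int]:
    """Pick the stage whose offset is nearest target_ps (halves round up)."""
    step = d_slow_ps - d_fast_ps
    k = (target_ps + step // 2) // step
    return k, k * step
```

    Here `stage_offsets(600, 500, 4)` yields offsets in 100 ps steps even though every individual element is at least 500 ps long.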
  • A multi-sender asynchronous extension to the AER protocol

    Publication Year: 1995, Page(s): 158-169
    Cited by: Papers (20)

    The address-event representation (AER) is an asynchronous point-to-point communications protocol for silicon neural systems. This paper describes an extension of the AER protocol that allows multiple AER senders to share a common bus. A fully-functional silicon implementation of the extended protocol is described, as well as a functional board-level system of several of these chips sharing a common bus.
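
    The idea of several address-event senders sharing one bus can be illustrated with a small scheduling model. The abstract does not say how the extension arbitrates or identifies senders, so both choices here are assumptions: requests are granted in request-time order with ties going to the lower sender id, and each bus word carries a hypothetical sender id above the local address bits. It models only ordering and bus occupancy, not the asynchronous handshake.

```python
import heapq


def share_bus(events, addr_bits: int, cycle: int):
    """Serialize address events from several senders onto one shared bus.

    events: iterable of (request_time, sender_id, address).
    Returns a list of (grant_time, bus_word).
    """
    queue = list(events)
    heapq.heapify(queue)                   # earliest request first, then lower sender_id
    bus_free_at = 0
    granted = []
    while queue:
        t, sender, address = heapq.heappop(queue)
        start = max(t, bus_free_at)        # wait while another sender holds the bus
        granted.append((start, (sender << addr_bits) | address))
        bus_free_at = start + cycle
    return granted


def decode(word: int, addr_bits: int) -> tuple[int, int]:
    """Recover (sender_id, address) from a bus word."""
    return word >> addr_bits, word & ((1 << addr_bits) - 1)
```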
  • Abacus: a 1024 processor 8 ns SIMD array

    Publication Year: 1995, Page(s): 28-40
    Cited by: Papers (8)

    Describes the Abacus machine at a number of levels. Presents the microarchitecture of the PE comprising the reconfigurable bit-parallel array, a set of arithmetic and communication primitives, details of the VLSI implementation, and system-level design issues of a high-speed SIMD array. The most concrete goal of the Abacus project was to design and build a machine that could be used by members of the MIT Artificial Intelligence Laboratory for real-time early vision processing. Along the way, we explored several architectural ideas.
  • Non-dissipative rail drivers for adiabatic circuits

    Publication Year: 1995, Page(s): 404-414
    Cited by: Papers (8) | Patents (9)

    Energy dissipation of CMOS circuits is becoming a major concern in the design of digital systems. Earlier, we presented a new form of CMOS charge recovery logic (SCRL), with an energy dissipation per operation that falls linearly with operating frequency, as opposed to the constant energy required for conventional CMOS circuits. These SCRL circuits, along with most adiabatic circuit techniques proposed to date, require a set of gradually swinging power supply rails that in effect force all charge transfers within the system to occur quasistatically. Proposals to date for generating these swinging rails have relied on a power MOSFET to gate the oscillation of an inductor, forming an RLC circuit. Even under ideal conditions, dissipation in this MOSFET degraded the overall energy savings of SCRL circuits from 1/T dependence to 1/√T. SCRL and other adiabatic circuits thus exhibited inferior overall energy saving performance when compared with supply voltage scaling of conventional CMOS circuits. In this paper, we present a technique for generating the required rail waveforms without the series power MOSFET to gate the inductor. This new rail driver circuit relies on adding multiple harmonics of the base frequency to generate a rail waveform of any desired shape. Our Harmonic Rail Driver (HRD) can be built using only passive reactive components or by using correctly trimmed transmission line segments. It is non-dissipative to within the achievable Q's of these components. Using HRDs to power and control SCRL circuits, we restore the overall dissipation of SCRL circuits to its attractive 1/T dependence.
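
    The principle of building a rail waveform by adding harmonics of the base frequency can be illustrated numerically with a truncated Fourier series. This sketch assumes a trapezoidal target shape with hypothetical proportions, computes its harmonic content from uniform samples, and rebuilds it from a limited number of harmonics. It says nothing about the reactive components or transmission lines an HRD actually uses.

```python
import math


def fourier_coefficients(samples, n_harmonics):
    """Mean and (cos, sin) coefficients of harmonics 1..n_harmonics for one period."""
    n = len(samples)
    mean = sum(samples) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a = 2 / n * sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        b = 2 / n * sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        coeffs.append((a, b))
    return mean, coeffs


def synthesize(mean, coeffs, n):
    """Sum the DC term and the given harmonics at n points over one period."""
    return [
        mean + sum(a * math.cos(2 * math.pi * k * i / n) + b * math.sin(2 * math.pi * k * i / n)
                   for k, (a, b) in enumerate(coeffs, start=1))
        for i in range(n)
    ]


def trapezoid(n=64):
    """Target rail: ramp up over n/8, hold high 3n/8, ramp down n/8, hold low 3n/8."""
    r = n // 8
    up = [i / r for i in range(r)]
    down = [1 - i / r for i in range(r)]
    return up + [1.0] * (3 * r) + down + [0.0] * (3 * r)


def rms_error(x, y):
    """Root-mean-square difference between two equal-length sample lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))
```

    Adding harmonics can only lower the RMS error at the sample points, because the sampled harmonics are orthogonal; for this half-wave-symmetric trapezoid the even harmonics come out essentially zero, so only odd ones contribute.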
  • Quasi-algebraic decompositions of switching functions

    Publication Year: 1995, Page(s): 358-367
    Cited by: Papers (4)

    Brayton (1982-90) and others have developed a rich theory of decomposition of switching functions based on algebraic manipulations of monomials. In this theory, a product $g(X_g) \cdot h(X_h)$ is algebraic if $X_g \cap X_h = \emptyset$. There are efficient methods for determining whether a function has an algebraic product. If a function does not have an algebraic product, there are good methods for obtaining a decomposition of the form $f = g \cdot h + r$, where $g \cdot h$ is an algebraic product. Algebraic decompositions have the desirable properties that they are canonical and preserve testability. In this paper we generalize the concept of an algebraic product to decompositions of the form $f(X) = g(X_g) \circ h(X_h)$, where $\circ$ is any binary Boolean operation and $|X_g \cap X_h| = k$ for some $k \geq 0$. We call these decompositions quasi-algebraic decompositions. We begin by showing that we may restrict ourselves to the case where $\circ$ is $+$ (sum), $\cdot$ (product), or $\oplus$ (exclusive-or). We then give necessary and sufficient conditions for a function to have a quasi-algebraic decomposition for a given $X_g$ and $X_h$. If a function has such a decomposition, we show how to determine the functions $g$ and $h$ in a canonical manner. We also show that these decompositions are fully SSL testable. Finally, using standard benchmark circuits, we show that quasi-algebraic decompositions occur often and are useful in reducing circuit size.
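
    For the special case of disjoint supports ($k = 0$), whether $f = g \oplus h$ or $f = g \cdot h$ exists can be checked directly on a truth table arranged as a matrix whose rows index assignments to $X_g$ and whose columns index assignments to $X_h$. The brute-force sketch below assumes that arrangement and is not the paper's method for general $k$: $f = g \oplus h$ holds exactly when $f[a][b] \oplus f[a][0] \oplus f[0][b] \oplus f[0][0] = 0$ for all $a, b$, and $f = g \cdot h$ holds exactly when all nonzero rows are identical. The function names are hypothetical.

```python
def xor_decompose(f):
    """Return (g, h) with f[a][b] == g[a] ^ h[b] for all a, b, or None."""
    for a in range(len(f)):
        for b in range(len(f[0])):
            if f[a][b] ^ f[a][0] ^ f[0][b] ^ f[0][0]:
                return None
    g = [row[0] for row in f]
    h = [f[0][b] ^ f[0][0] for b in range(len(f[0]))]
    return g, h


def and_decompose(f):
    """Return (g, h) with f[a][b] == g[a] & h[b] for all a, b, or None."""
    nonzero_rows = [row for row in f if any(row)]
    if any(row != nonzero_rows[0] for row in nonzero_rows):
        return None
    h = list(nonzero_rows[0]) if nonzero_rows else [0] * len(f[0])
    g = [1 if any(row) else 0 for row in f]
    return g, h


def table(func, n_g, n_h):
    """Truth table of func(a, b) as a 2**n_g by 2**n_h matrix of 0/1."""
    return [[func(a, b) for b in range(2 ** n_h)] for a in range(2 ** n_g)]
```

    For example, with $X_g = \{x_1\}$ and $X_h = \{x_2, x_3\}$ (column index b encodes $x_2 x_3$ as a 2-bit number), $x_1 \oplus x_2 x_3$ has an XOR decomposition but no AND decomposition, while $x_1 (x_2 + x_3)$ has the reverse.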
  • An evaluation of bipartitioning techniques

    Publication Year: 1995, Page(s): 383-402
    Cited by: Papers (24)

    Logic partitioning is an important issue in VLSI CAD, and has been an active area of research for at least the last 25 years. Numerous approaches have been developed and many different techniques have been combined for a wide range of applications. In this paper, we examine many of the existing techniques for logic bipartitioning and present a methodology for determining the best mix of approaches. The result is a novel bipartitioning algorithm that includes both new and pre-existing techniques. Our algorithm produces results that are at least 17% better than the state-of-the-art while also being efficient in run time.
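
    Move-based bipartitioners rest on two quantities: the net cut of a partition and the gain of moving one cell across it, in the style of Fiduccia-Mattheyses. The sketch below computes both for a small hypergraph and applies a plain greedy improvement loop with a minimum side size. It is an assumed, simplified baseline for illustration, not the combined algorithm the paper arrives at; the netlist in the usage note is hypothetical.

```python
def cut_size(nets, side):
    """Number of nets with cells on both sides of the partition."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)


def move_gain(nets, side, v):
    """Decrease in cut size if cell v switches sides."""
    gain = 0
    for net in nets:
        if v not in net or len(net) < 2:
            continue
        same_side = sum(1 for u in net if side[u] == side[v])
        if same_side == 1:
            gain += 1          # v is alone on its side: moving it uncuts the net
        elif same_side == len(net):
            gain -= 1          # net is entirely on v's side: moving v cuts it
    return gain


def greedy_improve(nets, side, min_side):
    """Repeatedly make the best strictly positive move that keeps both sides >= min_side."""
    side = dict(side)
    while True:
        best, best_gain = None, 0
        for v in side:
            size = sum(1 for u in side if side[u] == side[v])
            if size - 1 < min_side:
                continue
            gain = move_gain(nets, side, v)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:
            return side
        side[best] = 1 - side[best]
```

    On two triangles of cells joined by one net, for instance nets `[(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]`, the natural split {0, 1, 2} | {3, 4, 5} cuts a single net. Pure greedy descent can stall in a local minimum, which is why practical partitioners add tentative moves, multiple passes, and other refinements.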
  • Single-transistor transparent-latch clocking

    Publication Year: 1995, Page(s): 331-341
    Cited by: Papers (2)

    We propose a single-phase clocking scheme for CMOS VLSI designs in which the traditional master-slave data latches are replaced with transparent latches, each implemented using a single NMOS transistor. The clocking scheme places a constraint on the allowable width of the clock pulses, which can be satisfied by a clock driver integrated with a dynamic buffer. Our example shows that the power dissipation of the single-transistor latch can be 80% less than that of the two-phase static flip-flop and 70% less than that of the true single-phase latch. The low power dissipation of the single-transistor latch can therefore be used to improve the gain in architecture-driven voltage scaling, where one reduces the supply voltage to reduce power dissipation and applies pipelining to compensate for the reduced speed. In our example, the fraction of power dissipation due to the overhead of the pipelining latches is only 4.7% for the single-transistor latch, versus 15% and 22% for the true single-phase latch and the two-phase static flip-flop, respectively. The single-transistor latch is also very small, which can have a major impact on reducing the area of latch-intensive architectures such as filter structures used in digital signal processing. Our example of a fixed-coefficient transposed-form FIR filter shows that we can reduce the area by 20% in comparison with designs using the true single-phase latch.
  • A 590,000 transistor 48,000 pixel, contrast sensitive, edge enhancing, CMOS imager-silicon retina

    Publication Year: 1995, Page(s): 225-240
    Cited by: Papers (45)

    We present an experimental analog VLSI focal plane processor for the phototransduction, local gain control and edge enhancement of natural images. The single-chip system incorporates 590,000 transistors in 48,000 pixels, and it has been fabricated on a 9.5 mm×9.3 mm die in a 1.2 μm n-well, double-metal, double-poly, digital-oriented CMOS technology. The organization of the system abstracts from the structure and function of the vertebrate distal retina. The adopted design style, current-mode subthreshold CMOS using circuits of minimal complexity, offers the possibility of ultra-low power dissipation and area efficiency, commensurate with VLSI integration.
  • Code density optimization for embedded DSP processors using data compression techniques

    Publication Year: 1995, Page(s): 272-285
    Cited by: Papers (27) | Patents (7)

    We address the problem of code size minimization in VLSI systems with embedded DSP processors. Reducing code size reduces the production cost of embedded systems. We use data compression methods to develop code size minimization strategies. We present a framework for code size minimization where the compressed data consists of a dictionary and a skeleton. The dictionary can be computed using popular text compression algorithms. We describe two methods to execute the compressed code that have varying performance characteristics and varying degrees of freedom in compressing the code. Experimental results obtained with a TMS320C25 code generator are presented.
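
    The dictionary-plus-skeleton split can be illustrated with a deliberately naive compressor. This sketch assumes fixed-length instruction sequences that occur at least twice become dictionary entries, and the skeleton replaces them with references in a single left-to-right scan; the paper instead computes its dictionary with text compression algorithms. The instruction strings in the usage note are only illustrative tokens, not real code-generator output.

```python
from collections import Counter


def compress(program, seq_len=2):
    """Split a program into a dictionary of repeated sequences and a skeleton.

    Skeleton entries are ("ref", dictionary_index) or ("lit", instruction).
    """
    windows = Counter(tuple(program[i:i + seq_len])
                      for i in range(len(program) - seq_len + 1))
    dictionary = [seq for seq, count in windows.items() if count >= 2]
    index = {seq: i for i, seq in enumerate(dictionary)}
    skeleton, i = [], 0
    while i < len(program):
        seq = tuple(program[i:i + seq_len])
        if seq in index:                 # replace a repeated sequence by a reference
            skeleton.append(("ref", index[seq]))
            i += seq_len
        else:                            # leave other instructions uncompressed
            skeleton.append(("lit", program[i]))
            i += 1
    return dictionary, skeleton


def decompress(dictionary, skeleton):
    """Expand the skeleton back into the original instruction sequence."""
    program = []
    for kind, value in skeleton:
        if kind == "ref":
            program.extend(dictionary[value])
        else:
            program.append(value)
    return program
```

    For `["LAC", "ADD", "SACL", "LAC", "ADD", "SACL", "NOP"]` the skeleton has five entries instead of seven instructions. The second dictionary entry, `("ADD", "SACL")`, ends up unused because the scan consumes `ADD` as part of the first entry; a real dictionary builder would account for such overlaps.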
  • Optimization of combinational and sequential logic circuits for low power using precomputation

    Publication Year: 1995, Page(s): 430-444
    Cited by: Papers (11) | Patents (2)

    Precomputation is a recently proposed logic optimization technique which selectively disables the inputs of a sequential logic circuit, thereby reducing switching activity and power dissipation, without changing logic functionality. In this paper, we present new precomputation architectures for both combinational and sequential logic and describe new precomputation-based logic synthesis methods that optimize logic circuits for low power. We present a general precomputation architecture for sequential logic circuits and show that it is significantly more powerful than the architectures previously treated in the literature. In this architecture, output values required in a particular clock cycle are selectively precomputed one clock cycle earlier, and the original logic circuit is “turned off” in the succeeding clock cycle. The very power of this architecture makes the synthesis of precomputation logic a challenging problem, and we present a method to automatically synthesize precomputation logic for this architecture. We introduce a powerful precomputation architecture for combinational logic circuits that uses transmission gates or transparent latches to disable parts of the logic. Unlike in the sequential circuit architecture, precomputation occurs in an early portion of a clock cycle, and parts of the combinational logic circuit are “turned off” in a later portion of the same clock cycle. Furthermore, precomputation is not restricted to the primary inputs. Preliminary results obtained using the described methods are presented. Up to 66 percent reductions in switching activity and power dissipation are possible using the proposed architectures. For many examples, the proposed architectures result in significantly less power dissipation than previously developed methods.
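
    A frequently used illustration of precomputation (assumed here; the abstract does not give it) is an n-bit comparator computing A > B. Whenever the most significant bits of A and B differ, those two bits already determine the result, so the registers feeding the remaining bits can stay disabled for that cycle. The sketch below models only the logic function and counts how often the full comparator would be needed; it does not model clocking or power, and the function names are hypothetical.

```python
def compare_with_precompute(a: int, b: int, n_bits: int) -> tuple[bool, bool]:
    """Return (a > b, full_comparator_needed) for n-bit unsigned a and b."""
    msb_a = (a >> (n_bits - 1)) & 1
    msb_b = (b >> (n_bits - 1)) & 1
    if msb_a != msb_b:
        return msb_a == 1, False    # precomputed: lower-bit registers stay disabled
    return a > b, True              # MSBs equal: load all inputs and evaluate fully


def disabled_fraction(n_bits: int) -> float:
    """Fraction of all input pairs for which the full comparator is switched off."""
    pairs = [(a, b) for a in range(2 ** n_bits) for b in range(2 ** n_bits)]
    off = sum(1 for a, b in pairs if not compare_with_precompute(a, b, n_bits)[1])
    return off / len(pairs)
```

    With uniformly distributed inputs the MSBs differ in exactly half of all input pairs, so in this idealized model the lower-bit logic is idle 50% of the time.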
  • Automatic synthesis of gate-level timed circuits with choice

    Publication Year: 1995, Page(s): 42-58
    Cited by: Papers (9)

    This paper presents a CAD tool for the automatic synthesis of gate-level timed circuits from general specifications to basic gates such as AND gates, OR gates, and C-elements. Timed circuits are a class of asynchronous circuits that incorporate explicit timing information in the specification, which is used throughout the synthesis procedure to optimize the design. Our procedure begins with a textual specification capable of specifying conditional operation, or choice. This specification is systematically transformed to a graphical representation which can be analyzed using an exact and efficient timing analysis algorithm to find the reachable state space. From this state space, a timed circuit that is hazard-free at the gate level is derived, facilitating the use of semi-custom components, such as standard cells and gate arrays. Because timing information is used to guide the synthesis to reduce circuit complexity while guaranteeing correct operation, the resulting timed circuit implementations are up to 40 percent smaller and 50 percent faster than those produced using other design methodologies.
  • Algorithms for the optimal state assignment of asynchronous state machines

    Publication Year: 1995, Page(s): 59-75
    Cited by: Papers (2)

    This paper presents a method for the optimal state assignment of asynchronous state machines. Unlike state assignment for synchronous state machines, state codes must be chosen carefully to ensure the avoidance of critical races and logic hazards. Two related problems are considered: (i) optimal critical race-free state assignment; and (ii) optimal hazard-free and critical race-free state assignment for normal fundamental mode machines. Analogous to a paradigm successfully used for the optimal state assignment of synchronous machines, each problem is formulated as an input encoding problem. Solutions are targeted to sum-of-products implementations. Initial results indicate output logic improvements of up to 20% for the hazard-free algorithm, and more modest improvements for the optimal critical race-free algorithm.
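
    Critical races in a candidate assignment can be detected with a subcube-intersection test in the spirit of Tracey's condition: within one input column, the smallest cube spanning the codes of a transition's source and destination must not intersect the corresponding cube of any transition with a different destination, with stable states counted as transitions to themselves. The sketch below only checks a given assignment; it does not search for an optimal one, and the four-state example is hypothetical.

```python
def span_cube(u: str, v: str) -> str:
    """Smallest cube containing codes u and v ('-' where they differ)."""
    return "".join(a if a == b else "-" for a, b in zip(u, v))


def cubes_intersect(x: str, y: str) -> bool:
    """Two cubes intersect unless some position is fixed to different values."""
    return all(a == b or "-" in (a, b) for a, b in zip(x, y))


def critical_races(columns, code):
    """List transition pairs that may race critically under the given encoding.

    columns maps an input value to its (source, destination) transitions;
    stable states appear as (s, s). code maps each state to a bit string.
    """
    races = []
    for column, transitions in columns.items():
        for i, (s1, d1) in enumerate(transitions):
            for s2, d2 in transitions[i + 1:]:
                if d1 == d2:
                    continue            # same destination: no critical race
                if cubes_intersect(span_cube(code[s1], code[d1]),
                                   span_cube(code[s2], code[d2])):
                    races.append((column, (s1, d1), (s2, d2)))
    return races
```

    With transitions A→B and C→D in one column, the codes A=00, B=01, C=10, D=11 keep the two spanning cubes (0- and 1-) apart, whereas A=00, B=11, C=01, D=10 makes both span the whole space, so the check reports a race.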
  • Recursive layout generation

    Publication Year: 1995, Page(s): 172-184

    We present a recursive method for generating layout for VLSI chips based on integrating layout directives in the netlist description. The method allows seamless integration of hand-drawn and synthesized layout, so that hand layout need only be used where the increase in density is justified. Layout is generated automatically with predictable results; small changes in the source result in small changes of the overall layout. The system is versatile enough to build dense BiCMOS VLSI microprocessor chips automatically.
  • Array-of-arrays architecture for parallel floating point multiplication

    Publication Year: 1995, Page(s): 150-157
    Cited by: Papers (1) | Patents (4)

    This paper presents a new architecture style for the design of a parallel floating point multiplier. The proposed architecture is a synergy of trees and arrays. Architectural models were designed to implement the 53-bit mantissa path of IEEE Standard 754 floating point multiplication and were tested for functionality in Verilog. The design, implemented in dual-rail domino logic, was simulated in HSpice with estimated capacitive load models in a 1 μm CMOS technology. A multiplication latency of 10 ns (23.3 FO4) at a 4.3 V supply and 120°C can be achieved with the best topology of the array-of-arrays architecture. The estimated multiplier area is 3 mm×6 mm.
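
    The organization named in the title, several linear arrays whose outputs are merged by a tree, can be illustrated arithmetically. This sketch assumes the partial-product rows of a 53-bit mantissa product are split evenly among a hypothetical `n_arrays` arrays, each accumulated row by row, with the array outputs then summed pairwise; it reports rows per array and tree depth as a rough proxy for latency. It ignores carry-save representation, any operand recoding, and dual-rail domino circuit detail, so it is not the paper's topology.

```python
import math


def array_of_arrays_multiply(a: int, b: int, n_bits: int = 53, n_arrays: int = 4):
    """Multiply n-bit a and b by splitting partial products among linear arrays.

    Returns (product, rows_per_array, tree_depth).
    """
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    rows_per_array = math.ceil(n_bits / n_arrays)
    partial_sums = []
    for start in range(0, n_bits, rows_per_array):
        acc = 0
        for row in rows[start:start + rows_per_array]:   # linear (array) accumulation
            acc += row
        partial_sums.append(acc)
    depth = 0
    while len(partial_sums) > 1:                          # tree merge of array outputs
        merged = [partial_sums[i] + partial_sums[i + 1]
                  for i in range(0, len(partial_sums) - 1, 2)]
        if len(partial_sums) % 2:
            merged.append(partial_sums[-1])               # odd one out passes through
        partial_sums = merged
        depth += 1
    return partial_sums[0], rows_per_array, depth
```

    With four arrays, each array accumulates 14 of the 53 rows and two tree levels merge the results, instead of one array accumulating all 53 rows in sequence.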