Solid-State Circuits, IEEE Journal of

Issue 11 • Date Nov. 2001

  • Introduction to the memory section

    Page(s): 1699
    Freely Available from IEEE
  • Introduction to the signal processing section

    Page(s): 1756 - 1757
    Freely Available from IEEE
  • An embedded 32-b microprocessor core for low-power and high-performance applications

    Page(s): 1599 - 1608

    An embedded RISC microprocessor core fabricated in a six-layer metal 0.18-μm CMOS process implementing the ARM™ V.5TE instruction set is described. The core described is the first implementation of the Intel XScale Microarchitecture™. The microprocessor core, which includes caches, memory management units, and a bus controller, comprises a hard-embedded block 16.77 mm² in size. The implementation is primarily custom logic in a variety of circuit styles. The processor dissipates 450 mW at 1.3 V, 600 MHz, and scales between 55 mW at 0.7 V, 200 MHz, and 900 mW at 1.65 V, 800 MHz. Architectural performance is 1000 MIPS at 800 MHz with efficiency ranging from over 850 MIPS/W at 1.65 V to over 4500 MIPS/W at 0.75 V. Architectural and circuit design approaches for low power and high performance are described and measured results from the initial implementation are shown. The first implementation VLSI chip has a 3.3-V pin interface and supports a 0.75-1.65-V core voltage range.
  • A 0.18-μm CMOS IA-32 processor with a 4-GHz integer execution unit

    Page(s): 1617 - 1627

    This paper describes the main features and functions of the Pentium® 4 processor microarchitecture. We present the front-end of the machine, including its new form of instruction cache called the trace cache, and describe the out-of-order execution engine, including a low latency double-pumped arithmetic logic unit (ALU) that runs at 4 GHz. We also discuss the memory subsystem, including the low-latency Level 1 data cache that is accessed in two clock cycles. We then describe some of the key features that contribute to the Pentium® 4 processor's floating-point and multimedia performance. We provide some key performance numbers for this processor, comparing it to the Pentium® III processor.
  • A 1.8-GHz instruction window buffer for an out-of-order microprocessor core

    Page(s): 1628 - 1635

    To address the challenges in microprocessor designs beyond a gigahertz, an instruction window buffer (IWB) was designed. The IWB implements the processor parts for renaming, reservation station, and reorder buffer as a unified buffer. Measured results on an experimental chip demonstrated operation of the IWB macros supporting 1.8 GHz, with the chip being at the fast end of the process distribution. The technology is 0.18-μm CMOS8S bulk technology with seven levels of copper interconnect and a 1.5-V supply voltage.
  • VLSI implementation of a 100-μW multirate FSK receiver

    Page(s): 1821 - 1828

    A very low-power frequency-shift keying (FSK) receiver has been designed for dual-purpose operation: deep space applications and general purpose baseband processing. The receiver is based on a novel, almost all-digital architecture. It supports a wide range of data rates and is very robust against large and fast frequency offsets due to Doppler effects. The architecture utilizes subsampling and 1-b data processing together with an FFT-based detection scheme to enable power consumption dramatically lower than a conventional implementation. A system/hardware co-design approach allows the use of a number of circuit-level power reduction techniques while still meeting system-level constraints. In particular, we designed a combination of fully parallel and word-serial decimation stages to simultaneously optimize power consumption and silicon area. We also designed a very efficient FFT block that uses approximate arithmetic and pruning to greatly reduce overall complexity. Additional modules, such as direct digital frequency synthesizer (DDFS) and magnitude computation, have also been optimized in view of the targeted system parameters: signal-to-noise ratio and bit-error rate. The entire architecture has been made maximally flexible and power efficient by utilizing local clock gating and a simple interstage handshaking mechanism. The receiver has been implemented in 0.25-μm CMOS technology and takes up under 1 mm². The power consumption is below 100 μW for data rates below 20 kb/s. Rates up to 2 Mb/s are supported.
  • Sub-500-ps 64-b ALUs in 0.18-μm SOI/bulk CMOS: design and scaling trends

    Page(s): 1636 - 1646

    In this paper, we present: 1) design of a single-rail energy-efficient 64-b Han-Carlson ALU, operating at 482 ps in 1.5 V, 0.18-μm bulk CMOS; 2) direct port of this ALU to 0.18-μm partially depleted SOI process; 3) SOI-optimal redesign of the ALU using a novel deep-stack quaternary-tree architecture; 4) margining for max-delay pushout due to reverse body bias in SOI designs; and 5) performance scaling trends of the ALU designs in 0.13-μm generation. We show that a direct port of the Han-Carlson ALU to 0.18-μm SOI offers 14% performance improvement after margining. A redesign of the ALU, using an SOI-favored deep-stack architecture, improves the margined speedup to 19%. A 10% margin was required for the SOI designs, to account for reverse body-bias-induced max-delay pushout. Preconditioning the intermediate stack nodes in the dynamic ALU designs reduced this margin to 2%. Scaling the ALUs to 0.13-μm generation reduces the overall SOI speedup for both architectures to 9% and 16%, respectively, confirming the trend that speedup offered by SOI technology decreases with scaling.
  • Rotary traveling-wave oscillator arrays: a new clock technology

    Page(s): 1654 - 1665

    Rotary traveling-wave oscillators (RTWOs) represent a new transmission-line approach to gigahertz-rate clock generation. Using the inherently stable LC characteristics of on-chip VLSI interconnect, the clock distribution network becomes a low-impedance distributed oscillator. The RTWO operates by creating a rotating traveling wave within a closed-loop differential transmission line. Distributed CMOS inverters serve as both transmission-line amplifiers and latches to power the oscillation and ensure rotational lock. Load capacitance is absorbed into the transmission-line constants whereby energy is recirculated giving an adiabatic quality. Unusually for an LC oscillator, multiphase (360°) square waves are produced directly. RTWO structures are compact and can be wired together to form rotary oscillator arrays (ROAs) to distribute a phase-locked clock over a large chip. The principle is scalable to very high clock frequencies. Issues related to interconnect and field coupling dominate the design process for RTWOs. Taking precautions to avoid unwanted signal couplings, the rise and fall times of 20 ps, suggested by simulation, may be realized at low power consumption. Experimental results of the 0.25-μm CMOS test chip with 950-MHz and 3.4-GHz rings are presented, indicating 5.5-ps jitter and 34-dB power supply rejection ratio (PSRR). Design errors in the test chip precluded meaningful rise and fall time measurements.
  • A multigigabit DRAM technology with 6F² open-bitline cell, distributed overdriven sensing, and stacked-flash fuse

    Page(s): 1721 - 1727

    A multigigabit DRAM technology was developed that features a low-noise 6F² open-bitline cell with fully utilized edge arrays, distributed overdriven sensing for operation below 1 V, and a highly reliable post-packaging repair scheme using a stacked-flash fuse. This technology, which can be used to fabricate a 0.13-μm 180-mm² 1-Gb DRAM assembled in a 400-mil package, was verified using a 57.6-mm², 200-MHz array-cycle, 256-Mb test chip with 0.109-μm² cells.
  • An 80/20-MHz 160-mW multimedia processor integrated with embedded DRAM, MPEG-4 accelerator and 3-D rendering engine for mobile applications

    Page(s): 1758 - 1767

    A low-power multimedia processor for mobile applications is presented. An 80-MHz 32-b RISC with enhanced multiplier, two 20-MHz hardware accelerators with 7.125-Mb embedded DRAM for MPEG-4 visual SP@L1 decoding and 3-D graphics processing, 2-kB dual-port SRAM, and peripheral blocks are integrated together on a single chip. MPEG-4 SP@L1 video decoding and 3-D graphics rendering with 16-b depth-buffer, alpha-blending, double-buffering, and Gouraud-shading features at 2.2-Mpolygons/s speed are realized with the help of the dedicated hardware accelerators. The architecture of the processor is optimized in terms of power consumption and performance, and various low-power circuit techniques are adopted in each hardware block. The chip is implemented using 0.18-μm embedded memory logic (EML) technology. Its area is 84 mm², and power consumption is 160 mW when all of the functions are activated.
  • A dual-mode NAND flash memory: 1-Gb multilevel and high-performance 512-Mb single-level modes

    Page(s): 1700 - 1706

    A 116.7-mm² NAND flash memory having two modes, 1-Gb multilevel program cell (MLC) and high-performance 512-Mb single-level program cell (SLC) modes, is fabricated with a 0.15-μm CMOS technology. Utilizing simultaneous operation of four independent banks, the device achieves 1.6 and 6.9 MB/s program throughputs for MLC and SLC modes, respectively. The two-step bitline setup scheme suppresses the peak current below 60 mA. The wordline ramping technique avoids program disturbance. The SLC mode uses the 0.5-V incremental step pulse and self-boosting program inhibit scheme to achieve high program performance, and the MLC mode uses 0.15-V incremental step pulse and local self-boosting program inhibit scheme to tightly control the cell threshold voltage Vth distributions. With the small wordline and bitline pitches of 0.3-μm and 0.36-μm, respectively, the cell Vth shift due to the floating gate coupling is about 0.2 V. The read margins between two adjacent program states are optimized, resulting in the nonuniform cell Vth distribution for MLC mode.
  • A 250-MHz single-chip multiprocessor for audio and video signal processing

    Page(s): 1768 - 1774

    A 250-MHz single-chip multiprocessor, which can implement multichannel decoding, encoding, and transcoding of various audio and video standards, was fabricated using 0.25-μm CMOS technology and consumes 2.38 W at 2.5 V. The multiprocessor integrates four processors and 64-kB shared level-2 cache and exploits coarse-grained parallelism inherent in audio and video signal processing with multithreaded programming. Three coprocessors and scratch-pad memory have been added to each processing element and perform subword parallel processing, background data transfer, and bitstream processing for audio and video signal processing. Useful-skew and clock gating have been utilized to achieve high-speed operation and low power consumption. Consequently, the multiprocessor achieves MPEG2 (MP@HL) video decoding at 20 frames/s.
  • A 2.5-GHz four-phase clock generator with scalable no-feedback-loop architecture

    Page(s): 1666 - 1672

    An accurate yet simple multiphase clock generator has been developed by using a delay compensation technique based on phase interpolation that supplies a multiphase clock signal without increasing local circuit area. This generator is applied to the 2.5-GHz four-phase clock distribution of a 5-Gb/s×8-channel receiver fabricated with 0.13-μm CMOS technology. The four-phase generator in the receiver consumes 30 mW and occupies only 0.009 mm². It requires only 1.5 clock cycles to produce accurate phase differences and can operate from 1.5 to 2.8 GHz, with a range of phase error within ±5
  • 80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band

    Page(s): 1829 - 1838

    With the advent of mobile communications, voice telecommunications became wireless. Future applications, however, target multimedia, messaging, and high-speed Internet access, all expressing the need for a broadband high-speed wireless access technique. Both the domestic multimedia and the wireless local area network (WLAN) business markets are addressed. Established systems deliver 2-11 Mb/s based on spectrally inefficient spread-spectrum techniques, where scalability has reached a limit. The next generation of modems requires spectrally more efficient low-power and highly integrated solutions. We describe here the design of two digital baseband orthogonal frequency division multiplex (OFDM) signal processing ASICs, implementing respectively a quaternary phase-shift keying (QPSK)-based 80-Mb/s and a 64 quadrature amplitude modulation (QAM)-based 72-Mb/s digital inner transceiver. The latter partially matches the Hiperlan/2 and IEEE 802.11a standards. Joint development of signal processing algorithms and architectures along with on-chip data transfer, control, and partitioning leads to a low-power, yet flexible and scalable implementation. Both ASICs were designed in a unique object-oriented C++ design flow starting from algorithm level. The ASICs were successfully tested in a 5-GHz testbed both for file data transfer and web-cam multimedia transmission.
  • A 4-GHz clock system for a high-performance system-on-a-chip design

    Page(s): 1693 - 1698

    A digital system's clocks must have not only low jitter, but also well-controlled duty cycles in order to facilitate versatile clocking techniques. Power-supply noise is often the most common and dominant source of jitter on a phase-locked loop's (PLL) output clock. Jitter can be minimized by regulating the supply to the PLL's noise-sensitive analog circuit blocks in order to filter out supply noise. This paper introduces a PLL-based clock generator intended for use in a high-speed highly integrated system-on-a-chip design. The generator produces clocks with accurate duty cycles and phase relationships by means of a high-speed divider design. The PLL also achieves a power-supply rejection ratio (PSRR) greater than 40 dB while operating at frequencies exceeding 4 GHz. The high level of noise rejection exceeds that of earlier designs by using a combination of both passive and active filtering of the PLL's analog supply voltage. The PLL system has been integrated in a 0.15-μm single-poly 5-metal digital CMOS technology. The measured performance indicates that at a 4-GHz output frequency the circuit achieves a PSRR greater than 40 dB. The peak cycle-to-cycle jitter is 25 ps at 700 MHz and a 2.8-GHz VCO frequency with a 500-mV step on the regulator's 3.3-V supply. The total power dissipated by the prototype is 130 mW and its active area is 1.48×1.00 mm².
  • A 150-MHz graphics rendering processor with 256-Mb embedded DRAM

    Page(s): 1775 - 1784

    A 150-MHz graphics rendering processor with an integrated 256-Mb embedded DRAM, delivering a rendering rate of 75 M polygons/s, is presented. 287.5 M transistors are integrated on a 21.3×21.7 mm² die in a 0.18-μm embedded DRAM CMOS process with six layers of metal. Design methodologies for hierarchical electrical and physical design of this very large-scale IC, including power distribution, fully hierarchical timing design, and verification utilizing a newly developed nonlinear model, clock design, propagation delay, and crosstalk noise management in multi-millimeter RC transmission lines, are presented.
  • A low-jitter 125-1250-MHz process-independent and ripple-poleless 0.18-μm CMOS PLL based on a sample-reset loop filter

    Page(s): 1673 - 1683

    This paper describes a low-jitter phase-locked loop (PLL) implemented in a 0.18-μm CMOS process. A sample-reset loop filter architecture is used that averages the oscillator proportional control current, which provides the feedforward zero, over an entire update period and hence leads to a ripple-free control signal. The ripple-free control current eliminates the need for an additional filtering pole, leading to a nearly 90° phase margin which minimizes input jitter peaking and transient locking overshoot. The PLL damping factor is made insensitive to process variations by making it dependent only upon a bandgap voltage and ratios of circuit elements. This ensures tracking between the natural frequency and the stabilizing zero. The PLL has a frequency range of 125-1250 MHz, frequency resolution better than 500 kHz, and rms jitter less than 0.9% of the oscillator period.
  • A multigigahertz clocking scheme for the Pentium® 4 microprocessor

    Page(s): 1647 - 1653

    Core and I/O clock design for the Pentium® 4 microprocessor is described. Two phase-locked loops generate core and I/O clocks supporting concurrent multiple frequencies. A clock distribution network with skew optimization and jitter reduction is designed to achieve low clock inaccuracies for processors at frequencies ≥2 GHz for the core and ≥4 GHz for the rapid execution engine. A global medium clock frequency is distributed. Local clock drivers generate pulsed or regular (nonpulsed) clocks at fast, medium, and slow frequencies. A 3.2-GB/s system bus is achieved using a dedicated I/O phase-locked loop with glitch protection and detection. Silicon speed path tools and clock debug features are designed to enable a short debug cycle.
  • An energy-efficient reconfigurable public-key cryptography processor

    Page(s): 1808 - 1820

    The ever-increasing demand for security in portable energy-constrained environments that lack a coherent security architecture has resulted in the need to provide energy-efficient algorithm-agile cryptographic hardware. Domain-specific reconfigurability is utilized to provide the required flexibility, without incurring the high overhead costs associated with generic reprogrammable logic. The resulting implementation is capable of performing an entire suite of cryptographic primitives over the integers modulo N, binary Galois fields and nonsupersingular elliptic curves over GF(2^n), with fully programmable moduli, field polynomials and curve parameters ranging in size from 8 to 1024 bits. The resulting processor consumes a maximum of 75 mW when operating at a clock rate of 50 MHz and a 2-V supply voltage. In ultralow-power mode (3 MHz at 0.7 V) the processor consumes at most 525 μW. Measured performance and energy efficiency indicate a comparable level of performance to previously reported dedicated hardware implementations, while providing all of the flexibility of a software-based implementation. In addition, the processor is two to three orders of magnitude more energy efficient than optimized software and reprogrammable logic-based implementations.
  • A 300-MHz fixed-delay tree search-DFE analog CMOS disk-drive read channel

    Page(s): 1795 - 1807

    A decision feedback equalizer (DFE) with digital error detection and correction implements a fixed-delay tree search with depth of 2. The disk-drive read waveform is first equalized to EPR4 for clock recovery and then re-equalized to the DFE target. A mostly analog implementation of this read channel in 0.6-μm CMOS implements a tapped delay-line forward filter with a cascade of track-and-hold circuits and variable transconductors. Using MTR (2,k) code, the compact read channel IC surpasses a conventional EPR4 read channel with Viterbi detector at user densities in the range 2.0-3.0.
  • The first MAJC microprocessor: a dual CPU system-on-a-chip

    Page(s): 1609 - 1616

    The first implementation of the MAJC architecture achieves high performance by using very long instruction word (VLIW), single instruction multiple data (SIMD), and chip multiprocessing. The chip integrates two processors, a memory controller, two high-speed parallel I/O interfaces, and a PCI controller. The chip, fabricated in a 0.22-μm CMOS process with six layers of copper interconnect, contains 13 million transistors and operates at 500 MHz. It is packaged in a 624-pin ceramic column grid array using flip-chip assembly technology.
  • A mixed-signal 0.18-μm CMOS SoC for DVD systems with 432-MSample/s PRML read channel and 16-Mb embedded DRAM

    Page(s): 1785 - 1794

    This paper describes a fully integrated single-chip CMOS mixed-signal system on a chip (SoC) for DVD RAM and ROM systems. It integrates a 32-b RISC CPU, formatter, servo digital signal processor (DSP), 16-Mb DRAM, error correction code (ECC), ATA interface, and partial-response-maximum-likelihood (PRML) read channel with 7-b interpolated parallel analog-to-digital converter (ADC). Increasing the bus bandwidth by using embedded DRAM, a hardware ECC engine, and four parallel digital finite-impulse response (FIR) filters contributes to the high playback speed of 16×. PR(3,4,4,3) architecture has been used in the read channel system for optical disc systems. The obtained wide tangential tilt margin of ±0.6° is due to the use of this PRML read channel technique. The interpolated parallel scheme has attained a high number of effective bits of 6.3 for 72-MHz input frequency at 432-MSample/s operation without any calibration technique, with low power consumption of 180 mW in a small core size of 1.05 mm². This SoC has been fabricated in 0.18-μm 1PS3AL CMOS embedded DRAM technology. It contains 24 million transistors in a 144-mm² die and consumes 1.2 W at 432-MSample/s operation. This low power consumption allows the use of a low-cost plastic package. As a result, we can compose highly reliable DVD RAM and ROM systems with this SoC and some tiny components.
  • A 126.6-mm² AND-type 512-Mb flash memory with 1.8-V power supply

    Page(s): 1707 - 1712

    A 512-Mb flash memory, which is applicable to removable flash media of portable equipment such as audio players, has been developed. The chip is fabricated with a 0.18-μm CMOS process on a 126.6-mm² die, and uses a multilevel technique (2 bit/1 cell). The memory cell is AND-type, which is suitable for multilevel operation. This paper reports new techniques adopted in the 512-Mb flash memory. First, techniques for low voltage operation are described. The charge pump, control of pumps, and the reference voltage generator are improved to generate internal voltage stably for multilevel flash memory. Next, a method for reducing total memory cost in the removable flash media is described. A new operation mode named read-modify-write is introduced on the chip. This feature makes the memory system simple, because the controller does not have to track sector-erase information.
  • Universal-Vdd 0.65-2.0-V 32-kB cache using a voltage-adapted timing-generation scheme and a lithographically symmetrical cell

    Page(s): 1738 - 1744

    A universal-Vdd 32-kB four-way-set-associative embedded cache has been developed. A test cache chip was fabricated by using 0.18-μm enhanced CMOS technology, and it was found to continuously operate from 0.65 to 2.0 V. Its operating frequency and power are from 120 MHz and 1.7 mW at 0.65 V to 1.04 GHz and 530 mW at 2.0 V. The cache is based on two new circuit techniques: a voltage-adapted timing-generation scheme with plural dummy cells for the wider voltage-range operation, and use of a lithographically symmetrical cell for lower voltage operation.
  • A 76-mm² 8-Mb chain ferroelectric memory

    Page(s): 1713 - 1720

    This paper demonstrates the first 8-Mb chain ferroelectric RAM (chain FeRAM) with 0.25-μm 2-metal CMOS technology. A small die of 76 mm² and a high average cell/chip area efficiency of 57.4% have been realized by introducing not only chain architecture but also four new techniques: 1) a one-pitch shift cell realizes small cell size of 5.2 μm²; 2) a new hierarchical wordline architecture reduces row-decoder and plate-driver areas without an extra metal layer; 3) a small-area dummy cell scheme reduces dummy capacitor size to 1/3 of the conventional one; and 4) a new array activation scheme reduces dataline and second amplifier areas. As a result, the chain architecture with these new techniques reduces die size to 65% of that of the conventional FeRAM. Moreover, a ferroelectric capacitor overdrive scheme enables sufficient polarization switching without overbiasing the memory cell array. This scheme lowers the minimum operation voltage by 0.23 V, and enables 2.5-V Vdd operation. Thanks to fast cell plateline drive of chain architecture, the 8-Mb chain FeRAM has achieved the fastest random access time, 40 ns, and read/write cycle time, 70 ns, at 3.0 V so far reported.
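
The cell/chip area efficiency quoted in the chain FeRAM abstract can be checked from the other figures it reports. The sketch below is only an illustration: it assumes that "8 Mb" means 8 × 2^20 one-bit cells and that efficiency is total cell area divided by die area; the abstract does not state how the figure was derived, and the function and constant names are hypothetical.

```python
# Sanity check of the cell/chip area efficiency quoted for the 8-Mb chain FeRAM.
# Assumptions (not stated in the abstract): 8 Mb = 8 * 2**20 one-bit cells,
# and efficiency = total cell area / die area.

CELL_AREA_UM2 = 5.2        # cell size reported in the abstract, in um^2
DIE_AREA_MM2 = 76.0        # die size reported in the abstract, in mm^2
NUM_CELLS = 8 * 2**20      # assumed binary megabits
UM2_PER_MM2 = 1_000_000    # 1 mm^2 = 10^6 um^2


def cell_area_efficiency(cell_um2: float, cells: int, die_mm2: float) -> float:
    """Return total cell area divided by die area (dimensionless)."""
    total_cell_mm2 = cell_um2 * cells / UM2_PER_MM2
    return total_cell_mm2 / die_mm2


efficiency = cell_area_efficiency(CELL_AREA_UM2, NUM_CELLS, DIE_AREA_MM2)
print(f"cell/chip area efficiency: {efficiency:.1%}")
```

Under these assumptions the result rounds to the 57.4% given in the abstract, which suggests the reported efficiency is the ratio of raw cell area to die area.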

Aims & Scope

The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits with particular emphasis on transistor-level design of integrated circuits.

Meet Our Editors

Editor-in-Chief
Michael Flynn
University of Michigan