VLSI, 2009. ISVLSI '09. IEEE Computer Society Annual Symposium on

Date 13-15 May 2009

Displaying Results 1 - 25 of 60
  • [Front cover]

    Publication Year: 2009 , Page(s): C1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2009 , Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2009 , Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2009 , Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2009 , Page(s): v - ix
    Freely Available from IEEE
  • Conference Information

    Publication Year: 2009 , Page(s): x - xiv
    Freely Available from IEEE
  • Overview of the Scalable Communications Core: A Reconfigurable Wireless Baseband in 65nm CMOS

    Publication Year: 2009 , Page(s): 1 - 6
    Cited by:  Papers (4)

    The scalable communications core (SCC) is a flexible baseband processor that consists of a heterogeneous set of coarse-grained, programmable accelerators connected via a packet-based 3-ary 2-cube Network-on-Chip (NoC). SCC supports multiple wireless protocols to meet the demand for ubiquitous communications and computing with low power and area. We have recently completed a prototype test chip in a 65 nm process and validated it for WiFi and WiMAX protocols. The area and energy efficiency of our test chip are comparable to those of other basebands found in the literature. To demonstrate its flexibility, additional protocols have been mapped to the architecture.
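
    As an illustration of the topology named in this abstract: a k-ary n-cube has k nodes along each of n dimensions with wraparound links, so a 3-ary 2-cube is a 3 x 3 torus with nine nodes. The sketch below is a minimal enumeration of that topology, assuming bidirectional links; the function names are hypothetical and nothing here reflects the SCC router design itself.

```python
from itertools import product

K = 3  # radix: nodes per dimension (the "3-ary" part)
N = 2  # number of dimensions (the "2-cube" part)


def nodes(k=K, n=N):
    """Return every node coordinate of a k-ary n-cube."""
    return list(product(range(k), repeat=n))


def neighbors(node, k=K):
    """Return the nodes one hop away, assuming bidirectional wraparound links."""
    result = []
    for dim, coord in enumerate(node):
        for step in (-1, 1):
            nxt = list(node)
            nxt[dim] = (coord + step) % k  # wrap around at the edges
            result.append(tuple(nxt))
    return result
```

    With these defaults, nodes() yields nine coordinates and every node has four distinct neighbors.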

  • Reduction of Current Mismatch in PLL Charge Pump

    Publication Year: 2009 , Page(s): 7 - 12
    Cited by:  Papers (2)

    Low static phase offset is desired in Phase Locked Loops (PLLs) employed in high-speed I/O interfaces and frequency synthesizers. In this work, non-idealities in the phase frequency detector and charge pump that contribute to static phase offset have been studied and their relative contributions analyzed in detail. A new charge pump architecture with reduced mismatch between the Up and Dn current sources is presented. It makes use of a single two-stage amplifier for both current steering and reduction of mismatch. The efficacy of this architecture has been demonstrated with simulation results on a PLL running at an input reference frequency of 500 MHz in 65 nm CMOS technology.

  • Low Phase-Noise and Wide Tuning-Range CMOS Differential VCO for Frequency ΔΣ Modulator

    Publication Year: 2009 , Page(s): 13 - 18

    In this paper, a novel low phase-noise and wide tuning-range CMOS differential voltage-controlled oscillator (VCO) for a frequency ΔΣ modulator (FDSM) is presented. The VCO, which converts an analog input voltage to phase information, is based on a differential ring oscillator with a modified symmetric load and positive feedback in the differential delay cells, combined with a new bias circuit. The proposed two-stage VCO, operating at a low supply voltage of 0.6 V, achieves a low power consumption of 212 µW and a wide tuning range by increasing the operating frequency range by about 22%. The phase noise is -132 dBc/Hz at 600 kHz offset from the centre frequency of 480 MHz. The new VCO has good linearity, which reduces harmonic distortion in the ΔΣ modulator. The circuits are designed using a 65 nm CMOS process.

  • A Low-power Low-cost Optical Router for Optical Networks-on-Chip in Multiprocessor Systems-on-Chip

    Publication Year: 2009 , Page(s): 19 - 24
    Cited by:  Papers (23)

    Networks-on-chip (NoCs) can improve the communication bandwidth and power efficiency of multiprocessor systems-on-chip (MPSoC). However, traditional metallic interconnects consume a significant amount of power to deliver the even higher communication bandwidth required in the near future. Optical NoCs are based on optical interconnects and optical routers, and have significant bandwidth and power advantages. This paper proposes a high-performance, low-power, low-cost optical router, Cygnus, for optical NoCs. Cygnus is non-blocking and based on silicon microresonators. We compared Cygnus with other microresonator-based routers, and analyzed in detail their power consumption, optical power insertion loss, and number of microresonators used. The results show that Cygnus has the lowest power consumption and losses, and requires the fewest microresonators. For example, Cygnus has 50% less power consumption, 51% less optical power insertion loss, and 20% fewer microresonators than the optimized traditional optical crossbar router. Compared to a high-performance 45 nm electronic router, Cygnus consumes 96% less power. Moreover, the passive routing feature of Cygnus guarantees that, when a dimension-order routing algorithm is used, the maximum power consumption to route a packet through a network is a small constant, regardless of the network size. For example, the maximum power consumption is 4.80 fJ/bit under current technologies. We simulated and analyzed an 8 × 8 2D mesh NoC built from Cygnus and showed the end-to-end delay and network throughput under different offered loads and packet sizes.
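
    The dimension-order routing mentioned in this abstract can be sketched for a 2D mesh as follows. This is a minimal, generic XY-routing illustration (correct the X coordinate first, then Y); the function name and coordinate convention are assumptions, and the sketch models only the path, not the Cygnus router or its power consumption.

```python
def xy_route(src, dst):
    """Return the list of mesh nodes visited from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    # Travel along the X dimension until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the Y dimension; the route turns at most once.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

    On an 8 × 8 mesh, the corner-to-corner route from (0, 0) to (7, 7) takes 14 hops and changes dimension exactly once.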

  • High Performance Non-blocking Switch Design in 3D Die-Stacking Technology

    Publication Year: 2009 , Page(s): 25 - 30
    Cited by:  Papers (3)

    Die stacking is a promising new technology that enables integration of devices in the third dimension. It allows the stacking of multiple active layers directly on top of one another with short, dense die-to-die vias providing communication. Previous work has shown significant benefits at all design targets, from stacking memory on logic to partitioning individual architectural units across multiple layers. Many high-speed processor units (ALUs, register files, caches, and instruction schedulers) have been designed in 3D, achieving significant, simultaneous power savings and performance boosts. Other work has looked at the implementation of network-on-chip in a die stack but restricted the focus to planar designs of the various units (processors, routers, etc.). This work follows up on these two research areas to explore the 3D design of router components, specifically the crossbar. We examine the implementation of a crossbar and two multistage interconnect networks to determine the potential benefits of 3D implementations. Compared to equivalent planar designs, we achieve a maximum delay reduction of 26% and maximum power savings of 24%.

  • Leakage Power and Side Channel Security of Nanoscale Cryptosystem-on-Chip (CoC)

    Publication Year: 2009 , Page(s): 31 - 36

    This paper investigates the viability of using leakage power consumption as a source of side channel information. The side channel effect is characterized in leakage power. It is shown that the increasing trend of leakage power is highly correlated with the security vulnerability of cryptosystems. Addressing the severity of the side channel threat in nanoscale Cryptosystem-on-Chip (CoC), we examine leakage reduction techniques for the side channel security application. The results show that, among the circuit-based reduction techniques, high-Vth transistor assignment, which significantly reduces both the average and the standard deviation of the leakage power, can be exploited as a side-channel-aware leakage reduction technique in the design and implementation of CoC in the submicron era. The findings in this work, which are presented for the first time, are crucial for the development of side-channel-resistant cryptosystems in upcoming CMOS technologies.

  • Context-aware Post Routing Redundant Via Insertion

    Publication Year: 2009 , Page(s): 37 - 42

    Effective algorithms have been invented for post-routing redundant via insertion (RVI). However, implementations of these algorithms often ignore some practical issues. In this article, we implement a post-routing RVI algorithm that takes interconnect contexts into account during RVI. Experimental results show that our context-aware RVI on average raises the via1 (vias between metal layers 1 and 2) insertion rate from 37.4% to 72.1% and the total insertion rate from 72.5% to 85.8%. On average, it increases the RVI rate of critical paths by 3.6%. In addition, with redundant pin-area minimization, our approach reduces the metal 1 and metal 2 area used for RVI at pins by 3%.

  • Efficient Rerouting Algorithms for Congestion Mitigation

    Publication Year: 2009 , Page(s): 43 - 48

    Congestion mitigation and overflow avoidance are two of the major goals of the global routing stage. With a significant increase in the chip size and routing complexity, congestion and overflow have become critical issues in physical design automation. In this paper we present several routing algorithms for congestion reduction and overflow avoidance. Our methods are based on ripping up nets that go through the congested regions and replacing them with congestion-aware Steiner trees. We propose several efficient algorithms for finding congestion-aware Steiner trees and evaluate their performance using the ISPD routing benchmarks. We also show that the novel technique of network coding contributes to further improvements in routability and reduction of congestion. Accordingly, we propose an algorithm for identifying efficient congestion-aware network coding topologies and evaluate its performance.

  • A Non-Uniform Grid Based Ground Plane Model for High Performance Nodes: The Impact of Heterogeneous Cores on Ground Voltage Gradient

    Publication Year: 2009 , Page(s): 49 - 54

    With the shift towards heterogeneous core architectures imminent, the uniform grid based ground plane model that is currently employed for chip-multiprocessors will no longer suffice. It is practically impossible to achieve absolute zero potential at all grid nodes of the uniform ground plane model with the advent of heterogeneous cores. Differential injection of current into the ground plane by different heterogeneous core partitions results in voltage gradients across the ground plane, which are detrimental to the operation of the processor. The extremely stochastic spiking activity of different cores further accentuates the problem. To overcome the problem of varying voltage distribution across the ground plane, we propose a first-ever ground plane model structured as a non-uniform RLC interconnect grid. A simulated annealing optimization is employed, with each node in the grid as the 'temperature' parameter and impedance as the cost function 'δe', to arrive at the non-uniform grid structure.

  • On-the-Fly Evaluation of FPGA-Based True Random Number Generator

    Publication Year: 2009 , Page(s): 55 - 60
    Cited by:  Papers (4)

    Many embedded security chips require a high-quality digital True Random Number Generator (TRNG). Recently, several new TRNGs with innovative architectures have been proposed in the literature. Moreover, some of them do not need the post-processing unit usually required in TRNG constructions. As a result, the TRNG data rate is enhanced and the produced random bits depend only on the noise source and its sampling. However, selecting a TRNG can be a delicate problem. In a hardware context (e.g. Field-Programmable Gate Array (FPGA) or Application-Specific Integrated Circuit (ASIC) implementation), the design area and power consumption are important criteria. To the best of our knowledge, no effective comparison of several TRNGs appears in the literature. This paper evaluates the randomness behavior, the area and the power consumption of the latest TRNGs. These investigations are carried out under real conditions by implementing the TRNGs in FPGA circuits.

  • Lifetime Reliability Aware Design Flow Techniques for Dual-Vdd Based Platform FPGAs

    Publication Year: 2009 , Page(s): 61 - 66
    Cited by:  Papers (1)

    Increasing on-chip power densities with aggressive technology scaling has led to a low-power FPGA fabric with dual supply voltages. Such low-power techniques, coupled with the heterogeneity of components on an FPGA, have led to non-uniform aging of components due to temperature- and voltage-dependent failure mechanisms. In this paper, we present techniques in the placement and routing stages of the design flow that increase the average lifetime of components by ensuring uniform aging. We first study the impact of temperature and voltage variations on the lifetime reliability of components. In the presence of such variations, we study the impact of aging in FPGA interconnects due to Electromigration (EM), and dielectric breakdown due to Time Dependent Dielectric Breakdown (TDDB). We also consider the performance degradation due to Hot Carrier Instability (HCI) in our design flow optimizations. The proposed reliability-aware design flow techniques achieve an average of 65.8% and 75% improvement in the lifetime of LUTs and interconnect wires, respectively.

  • A Self-Reconfigurable Platform for Scalable DCT Computation Using Compressed Partial Bitstreams and BlockRAM Prefetching

    Publication Year: 2009 , Page(s): 67 - 72

    In this paper, we propose a self-reconfigurable platform which can reconfigure the architecture of DCT computations at run-time using dynamic partial reconfiguration. The scalable architecture of DCT computations can compute different numbers of DCT coefficients in zigzag scan order to adapt to different requirements, such as power consumption, hardware resources, and performance. We propose a configuration manager, implemented in the embedded processor, that adaptively controls the reconfiguration of the scalable DCT architecture at run-time. In addition, we use the LZSS algorithm to compress the partial bitstreams and on-chip BlockRAM as a cache to reduce the latency overhead of loading the partial bitstreams from off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration of the partial bitstreams. The experimental results show that our approach can reduce the external memory accesses by 69% and can achieve a reconfiguration rate of 400 MBytes/s. Detailed trade-offs of power, throughput, and quality are investigated and used as a criterion for self-reconfiguration.

  • High-Speed Low-Current Duobinary Signaling Over Active Terminated Chip-to-Chip Interconnect

    Publication Year: 2009 , Page(s): 73 - 78
    Cited by:  Papers (1)

    In this work we propose a high-speed low-current duobinary signaling scheme over an active terminated chip-to-chip interconnect. The active termination scheme eliminates the need for any dedicated passive terminator at both the transmitter and the receiver, while avoiding signal reflection. Elimination of the passive terminator helps to reduce the transmitted signal level without affecting signal detectability at the receiver and also removes the thermal noise of the terminator. To implement bandwidth-efficient duobinary signaling, we present a current-mode high-speed precoder operating at 10 Gb/s. A low-current active terminated driver based on a modified Cherry-Hooper topology is proposed. At the receive end, we propose an active terminated current-mode receiver (Rx) with a regulated gate cascode (RGC) based transimpedance amplifier (TIA). Folded active inductor peaking is used to enhance the bandwidth of this TIA. We also propose a low-power broadband equalizer topology for channel equalization. The duobinary transmitter and receiver circuits are implemented in 1.8 V, 0.18 µm digital CMOS technology with an fT of 27 GHz. The designed high-speed duobinary Tx/Rx circuits work up to 8 Gb/s while transmitting data over a 29.5-inch FR4 PCB trace for the targeted bit error rate (BER) of 10^-12. The power consumed in the transmitter and receiver circuits is 42.9 mW at 8 Gb/s.

  • Modern Floorplanning with Boundary Clustering Constraint

    Publication Year: 2009 , Page(s): 79 - 84

    With the development of SOC designs, modern floorplanning typically needs to provide extra options to meet the different emerging requirements in hierarchical designs, such as a boundary constraint for I/O connection, a clustering constraint for performance and reliability, etc. This paper addresses modern floorplanning with a boundary clustering constraint. It has been empirically shown that modern constraints severely restrict the solution space; that is, a large number of randomly generated floorplans might be infeasible. In order to search the feasible solutions effectively, the feasibility conditions based on the B*-tree representation with a boundary clustering constraint are investigated. These properties, coupled with an efficient simulated annealing algorithm, provide a way to produce feasible floorplans by dynamic repairing, which can transform an infeasible solution into a feasible one if the constraint is violated. Our algorithm is verified using the MCNC and GSRC benchmarks, and the empirical results show that it can obtain promising solutions in acceptable time.

  • An Efficient Hardware Architecture for Multimedia Encryption and Authentication Using the Discrete Wavelet Transform

    Publication Year: 2009 , Page(s): 85 - 90

    This paper introduces a zero-overhead encryption and authentication scheme for real-time embedded multimedia systems. The parametrized construction of the Discrete Wavelet Transform (DWT) compression block is used to introduce a free parameter in the design. It allows building a keyspace for lightweight multimedia encryption. The parametrization yields rational coefficients leading to an efficient fixed point hardware implementation. A clock speed of over 240 MHz was achieved on a Xilinx Virtex 5 FPGA. Comparison with existing approaches was performed to indicate the high throughput and low hardware overhead in adding the security feature to the DWT architecture.

  • A New Placement Algorithm for Reduction of Soft Errors in Macrocell Based Design of Nanometer Circuits

    Publication Year: 2009 , Page(s): 91 - 96
    Cited by:  Papers (1)

    The rates of transient faults such as soft errors have been significantly impacted by the aggressive scaling trends in the nanometer regime. In the past, several circuit optimization techniques have been proposed for preventing soft errors in logic circuits. These approaches include concurrent error detection circuits on selective nodes, selective gate sizing, dual-VDD assignment and selective node hardening at the transistor level. However, we show in this paper that larger wirelengths for nets can act as larger RC ladders and can effectively filter out the transient glitches due to radiation strikes. Based on this, we propose a simulated annealing based placement algorithm that significantly reduces the SER of logic circuits. We accurately capture the soft error masking effects by using a new metric called the logical observability. The cost function for simulated annealing is modeled as the summation of the logical observability weighted with the netlength for each net, while simultaneously constraining the total area and the total wirelength. The algorithm tries to assign higher wirelengths to nets with low masking probability for higher glitch reduction, while maintaining a low delay and area penalty for the overall circuit. Each placement configuration is represented as a sequence pair, and the moves in the space of sequence pairs are probabilistically accepted depending upon the cost gradient and the iteration count. Higher cost moves have a higher probability of acceptance at initial iterations for better state space exploration, while at later iterations the algorithm greedily tries to minimize the cost. To the best of our knowledge, this is the first time that soft error rate reduction is attempted during the placement stage. The proposed algorithm has been implemented and validated on the ISCAS85 benchmarks. Experiments using the FreePDK 45 nm Process Design Kit and the OSU cell library indicate that our radiation-immune placement algorithm can significantly reduce the SER in logic circuits with very low overheads in delay and area.
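
    The acceptance behaviour described in this abstract (cost-increasing moves accepted readily in early iterations, almost never in late ones) matches the standard Metropolis rule with a decaying temperature. The sketch below shows that generic rule only. The geometric cooling schedule and the parameter names t0 and alpha are assumptions for illustration, not the authors' actual schedule or cost function.

```python
import math


def temperature(iteration, t0=1.0, alpha=0.95):
    """Geometric cooling (assumed schedule): T = t0 * alpha**iteration."""
    return t0 * alpha ** iteration


def acceptance_probability(delta_cost, iteration, t0=1.0, alpha=0.95):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_cost <= 0:
        return 1.0  # the move does not increase the cost
    # Guard against a temperature that has underflowed to zero.
    temp = max(temperature(iteration, t0, alpha), 1e-12)
    return math.exp(-delta_cost / temp)
```

    For the same cost increase, the returned probability shrinks as the iteration count grows, so the search becomes effectively greedy late in the run.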

  • Maximally Redundant High-Radix Signed-Digit Adder: New Algorithm and Implementation

    Publication Year: 2009 , Page(s): 97 - 102
    Cited by:  Papers (3)

    Redundant number systems have been widely used in fast arithmetic circuit design. The signed-digit (SD), or more generally high-radix SD (HRSD), number system is one of the most important redundant number systems. HRSD additions are used as basic operations in many arithmetic functions. Hence, improving the characteristics of addition will improve the performance of almost all arithmetic modules. Several HRSD adders have been introduced in the literature. In this paper a new maximally redundant HRSD adder is proposed. This adder is compared to some of the most efficient HRSD adders previously published. The proposed adder is fabricated using a standard TSMC 65 nm CMOS technology at a 1 V supply voltage. The adder consumes 2.5% less power than the best previously published HRSD design. These implementations are also synthesized with an FPGA flow on Xilinx Virtex2. The experimental results show 5% and 6% decreases in area and delay, respectively.

  • Transition Inversion Based Low Power Data Coding Scheme for Synchronous Serial Communication

    Publication Year: 2009 , Page(s): 103 - 108
    Cited by:  Papers (2)

    Reducing off-chip bus power consumption has become one of the key issues in low-power system design. Although methods have been proposed to reduce the power dissipated in parallel buses, these techniques do not apply to serial communication since they work on consecutive data words. The data line in synchronous serial communication is a major source of power dissipation, apart from the clock line. The clock line cannot be modified due to the requirements of data recovery. This work outlines a novel transition inversion based data coding protocol by which the transitions on the data line can be reduced for synchronous serial buses such as JTAG, SPI, and I2C. Simulation results show up to a 31.9% reduction in transitions, with negligible performance loss. Analysis of the utility of the proposed technique for error detection shows that it can be used instead of the parity bit technique, since both are found to have the same average error detection capability.
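
    The figure of merit in this abstract is the number of transitions on the serial data line. The sketch below shows one way such a metric could be measured for a bit sequence; it does not implement the paper's coding protocol, which the abstract does not specify, and the function names and the example coded stream are hypothetical.

```python
def count_transitions(bits):
    """Count positions where consecutive bits on the serial line differ."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


def reduction_percent(original_bits, coded_bits):
    """Percentage drop in transitions between an original and a coded stream."""
    before = count_transitions(original_bits)
    after = count_transitions(coded_bits)
    if before == 0:
        return 0.0  # nothing to reduce
    return 100.0 * (before - after) / before
```

    For instance, the stream 0, 1, 0, 1 has three transitions, while a stream that holds its level, such as 1, 1, 1, 1, has none.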

  • On-line MPSoC Scheduling Considering Power Gating Induced Power/Ground Noise

    Publication Year: 2009 , Page(s): 109 - 114
    Cited by:  Papers (4)

    Power gating induced power/ground (P/G) noise is a major reliability problem faced by low-power MPSoCs using power gating techniques. Powering a processing unit in an MPSoC on and off induces large P/G noise and can cause timing divergence and even functional errors in surrounding processing units. P/G noise differs from thermal or energy effects, which are cumulative. The noise level should be predicted and victim circuits should be protected before the noise is induced. Hence, the power gating-aware scheduling problem with the consideration of P/G noise should be solved using an on-line method that considers the run-time variation of tasks' execution time. In this paper, we formulate an on-line task scheduling problem with the consideration of P/G noise based on our detailed P/G noise analysis platform for MPSoC. An efficient on-line Greedy Heuristic (GH) algorithm that adapts well to real-time variation is proposed to reduce the noise protection penalty and improve MPSoC performance. Our experiments show that the algorithm can achieve an average 26% performance improvement together with an average 73% noise protection penalty saving compared with the conservative stop-go method. We also compare our technique with a two-step solution that computes a static schedule at compile time and adjusts the schedule according to runtime variations. For benchmarks with a larger number of tasks, the GH method achieves an impressive performance improvement compared with the two-step solution.
