Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 7 • Date July 2012

  • Table of contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • A Low-Power Low-Cost Design of Primary Synchronization Signal Detection

    Page(s): 1161 - 1166

    Synchronization is an important component of a practical communication system. Furthermore, network entry, which includes synchronization, is important. Because detection of the primary synchronization signal (PSS) is the first step of network entry in long term evolution (LTE) systems, it may be a critical path for practical systems. Therefore, the tradeoff between performance and the low power consumption and low cost of PSS detection needs to be made carefully. This paper presents a new synchronization method for low-power and low-cost design. The approach of a 1-bit analog-to-digital converter (ADC) with down-sampling is compared with that of a 10-bit ADC without down-sampling under the multi-path fading conditions defined in the LTE standard for user equipment (UE) performance tests. The simulation results of PSS detection are obtained on several kinds of channels. They show explicitly that the performance of the method with down-sampling for a 1-bit ADC does not degrade even if a frequency offset exists. Based on the simulation results, different implementation architectures and their synthesis reports and analysis are presented. A low-power low-cost design with high performance to detect PSS is derived in this paper.

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • A Fast-Response Pseudo-PWM Buck Converter With PLL-Based Hysteresis Control

    Page(s): 1167 - 1174

    Hysteresis voltage-mode control is a simple and fast control scheme for switched-mode power converters. However, it is well known that the switching frequency of a switched-mode power converter with hysteresis control depends on many factors, such as the loading current and the delay of the controller, which vary from time to time. This results in a wide noise spectrum and makes it difficult to shield electromagnetic interference. In this work, a phase-locked loop (PLL) is utilized to control the hysteresis level of the comparator used in the controller, while not interfering with the intrinsic behavior of the hysteresis controller. Some design techniques are used to solve the integration problem and to improve the settling speed of the PLL. Moreover, different control modes are implemented. A buck converter with the proposed control scheme is fabricated using a commercial 0.35-μm CMOS technology. The chip area is 1900 μm × 2200 μm. The switching frequency is locked to 1 MHz, and the measured frequency deviation is within ±1%. The measured load transient response between 160 and 360 mA is only 5 μs.

  • Power Management of MIMO Network Interfaces on Mobile Systems

    Page(s): 1175 - 1186

    High-speed wireless network interfaces are among the most power-hungry components on mobile systems. This is particularly true for multiple-input-multiple-output (MIMO) network interfaces, which use multiple RF chains simultaneously. In this paper, we present a novel power management solution for MIMO network interfaces on mobile systems, called antenna management. The key idea is to adaptively disable a subset of antennas and their RF chains to reduce circuit power consumption when the capacity improvement from using a large number of antennas is small. Antenna management judiciously determines the number of active antennas to minimize energy per bit while satisfying the data rate requirement. This work provides both a theoretical framework and a system design for antenna management. We first present an algorithm that efficiently solves the problem of minimizing energy per bit and then offer its 802.11n-compliant system designs. We employ both Matlab-based simulation and prototype-based experiments to validate the energy efficiency benefit of antenna management. The results show that antenna management can achieve a 21% one-end energy per bit reduction at the front end of the MIMO network interface, compared to a static MIMO configuration that keeps all antennas active.
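
    As a rough illustration of the selection step this abstract describes, the sketch below picks the number of active antennas that minimizes energy per bit while still meeting a data-rate requirement. The function name, the rate and power tables, and the exhaustive search are all assumptions made for illustration; they are not the paper's algorithm or measured values.

```python
from typing import Dict, Optional

def pick_active_antennas(rate_bps: Dict[int, float],
                         power_w: Dict[int, float],
                         required_bps: float) -> Optional[int]:
    """Return the antenna count with the lowest energy per bit (J/bit)
    among configurations that meet the required rate, or None if none do."""
    best_n, best_epb = None, float("inf")
    for n in sorted(rate_bps):
        if rate_bps[n] < required_bps:
            continue  # this configuration cannot sustain the requested rate
        epb = power_w[n] / rate_bps[n]  # energy per bit = power / data rate
        if epb < best_epb:
            best_n, best_epb = n, epb
    return best_n

# Hypothetical numbers: rate grows sub-linearly with antenna count,
# circuit power grows roughly linearly.
RATES = {1: 50e6, 2: 90e6, 3: 110e6, 4: 120e6}
POWER = {1: 0.5, 2: 0.8, 3: 1.2, 4: 1.6}

print(pick_active_antennas(RATES, POWER, required_bps=40e6))
print(pick_active_antennas(RATES, POWER, required_bps=100e6))
```

    With these made-up tables, adding a third or fourth antenna costs more power than the extra rate is worth, so fewer antennas are chosen unless the rate requirement forces more.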

  • Temperature-Aware Idle Time Distribution for Leakage Energy Optimization

    Page(s): 1187 - 1200

    Large-scale integration with deep sub-micron technologies has led to high power densities and high chip working temperatures. At the same time, leakage energy has become the dominant energy consumption source of circuits due to reduced threshold voltages. Given the close interdependence between temperature and leakage current, temperature has become a major issue to be considered in power-aware system-level design techniques. In this paper, we address the issue of leakage energy optimization through temperature-aware idle time distribution (ITD). We first propose an offline ITD technique to optimize leakage energy consumption, where only static idle time is distributed. To account for dynamic slack, we then propose an online ITD technique where both static and dynamic idle time are considered. To improve the efficiency of our ITD techniques, we also propose an analytical temperature analysis approach that is accurate and yet sufficiently fast to be used inside the energy optimization loop.

  • Independently-Controlled-Gate FinFET Schmitt Trigger Sub-Threshold SRAMs

    Page(s): 1201 - 1210

    In this work, we propose three novel independently-controlled-gate Schmitt Trigger (IG_ST) FinFET SRAM cells for sub-threshold operation. The proposed IG_ST 8T SRAM cells utilize split-gate FinFET devices, with the front-gate devices serving as the stacking devices and the back-gate devices serving as the intermediate node conditioning devices, to provide a built-in feedback mechanism for Schmitt Trigger action, thus reducing the cell transistor count/area and achieving improved static noise margin (SNM) and better tolerance to process variation and random variations. 3-D mixed-mode simulations are used to evaluate the Read static noise margin (RSNM), Write static noise margin (WSNM), hold static noise margin (HSNM), and Standby leakage of the proposed cells, and the results are compared with the standard 6T cells and previously reported 10T Schmitt Trigger sub-threshold SRAM cells. Compared with the conventional tied-gate 6T cell, the proposed IG_ST SRAM cells demonstrate 1.81X and 2.11X higher nominal RSNM at V_CS = 0.4 and 0.15 V, respectively. The cell layouts and areas are assessed based on scaled ground rules from the 32 nm node, and the density advantage over previously reported 10T Schmitt Trigger sub-threshold SRAM cells is illustrated. The cell AC performance (Read access time, Write time, and Read access time versus the number of cells per bit-line considering the worst-case data pattern for bit-line leakage) and temperature dependence are evaluated and shown to be adequate for the intended sub-threshold applications. Compared with previously reported 10T Schmitt Trigger sub-threshold SRAM cells, the proposed cells exhibit comparable or better RSNM, higher density, and lower Standby leakage current. 3-D mixed-mode Monte Carlo simulations are performed to investigate the impacts of process variations (L_eff, EOT, W_fin, and H_fin) and random variations (Gate LER and Fin LER) on RSNM, WSNM, and HSNM. Our results indicate that even at the worst corner, two of the proposed cells can provide a sufficient margin of μ/σ ratio.

  • Nonrandom Device Mismatch Considerations in Nanoscale SRAM

    Page(s): 1211 - 1220

    Competitive density, performance, and functional objectives of the SRAM bit cell require design rules which are much more aggressive than those used in base logic designs. Because soft fail yield in SRAM is dependent on the device threshold and threshold mismatch in the bit cell, much research has been directed toward addressing the random contributors to within-cell device threshold variation. We examine four sources of potential nonrandom threshold mismatch that can arise from the use of aggressive design rules in the bit cell: 1) implanted ion straggle in SiO2; 2) polysilicon inter-diffusion driven counter-doping; 3) lateral ion straggle from the photoresist; and 4) photoresist implant shadowing. Using simulation and hardware measurements, we quantify the device parametric impacts and provide a statistical treatment forming the basis for quantification of the functional margin impacts on the bit cell. We examine two lithography-compliant bit-cell layout topologies and quantify the impact of systematic mismatch on the margin limited yield.

  • Nonlinear Multi-Error Correction Codes for Reliable MLC NAND Flash Memories

    Page(s): 1221 - 1234

    Multi-level cell (MLC) NAND flash memories are popular storage media because of their power efficiency and large storage density. Conventional reliable MLC NAND flash memories based on BCH codes or Reed-Solomon (RS) codes have a large number of undetectable and miscorrected errors. Moreover, standard decoders for BCH and RS codes cannot be easily modified to correct errors beyond their error correcting capability t = ⌊(d − 1)/2⌋, where d is the Hamming distance of the code. In this paper, we propose two general constructions of nonlinear multi-error correcting codes based on concatenations or generalized from Vasil'ev codes. The proposed constructions can generate nonlinear bit-error correcting or digit-error correcting codes with very few or even no errors undetected or miscorrected for all codewords. Moreover, codes generated by the generalized Vasil'ev construction can correct some errors with multiplicities larger than t without any extra overhead in area, latency, and power consumption compared to schemes where only errors with multiplicity up to t are corrected. The design of reliable MLC NAND flash architectures can be based on the proposed nonlinear multi-error correcting codes. The reliability, area overhead, and penalty in latency and power consumption of the architectures based on the proposed codes are compared to architectures based on BCH codes and RS codes. The results show that using the proposed nonlinear error correcting codes for the protection of MLC NAND flash memories can reduce the number of errors undetected or miscorrected for all codewords to almost 0 at the cost of a less than 20% increase in power and area compared to architectures based on BCH codes and RS codes.
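
    The error correcting capability quoted in this abstract, t = ⌊(d − 1)/2⌋, can be checked on a toy code. The sketch below is only an illustration of that bound: the helper names and the five-bit repetition code are assumptions chosen for brevity and have nothing to do with the BCH, RS, or Vasil'ev-based codes discussed in the paper.

```python
from itertools import combinations
from typing import Sequence

def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(codewords: Sequence[str]) -> int:
    """Smallest pairwise Hamming distance over all distinct codeword pairs."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

def correctable_errors(d: int) -> int:
    """Guaranteed error correcting capability t = floor((d - 1) / 2)."""
    return (d - 1) // 2

# Toy example (not from the paper): the 5-bit repetition code.
code = ["00000", "11111"]
d = minimum_distance(code)
print(d, correctable_errors(d))
```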

  • High-Throughput Soft-Output MIMO Detector Based on Path-Preserving Trellis-Search Algorithm

    Page(s): 1235 - 1247

    In this paper, we propose a novel path-preserving trellis-search (PPTS) algorithm and its high-speed VLSI architecture for soft-output multiple-input-multiple-output (MIMO) detection. We represent the search space of the MIMO signal with an unconstrained trellis, where each node in stage k of the trellis maps to a possible complex-valued symbol transmitted by antenna k. Based on the trellis model, we convert the soft-output MIMO detection problem into a multiple shortest paths problem subject to the constraint that every trellis node must be covered in this set of paths. The PPTS detector is guaranteed to have soft information for every possible symbol transmitted on every antenna so that the log-likelihood ratio (LLR) for each transmitted data bit can be more accurately formed. Simulation results show that the PPTS algorithm can achieve near-optimal error performance with a low search complexity. The PPTS algorithm is a hardware-friendly data-parallel algorithm because the search operations are evenly distributed among multiple trellis nodes for parallel processing. As a case study, we have designed and synthesized a fully-parallel systolic-array detector and two folded detectors for a 4 × 4 16-QAM system using a 1.08 V TSMC 65-nm CMOS technology. With a 1.18 mm² core area, the folded detector can achieve a throughput of 2.1 Gbps. With a 3.19 mm² core area, the fully-parallel systolic-array detector can achieve a throughput of 6.4 Gbps.
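
    To show why having a path metric for every candidate symbol matters, the sketch below forms per-bit LLRs with the common max-log approximation: for each bit, take the best metric among symbols where the bit is 1 and subtract the best metric among symbols where it is 0. The function name, the bit-to-symbol-index mapping, the sign convention, and the omission of any noise-variance scaling are assumptions; the abstract does not specify how the PPTS detector computes its LLRs.

```python
from typing import List, Sequence

def max_log_llrs(best_metric: Sequence[float], bits_per_symbol: int) -> List[float]:
    """Max-log LLRs for one antenna.

    best_metric[s] is the smallest path metric found for symbol index s.
    Every symbol index must have an entry, which is what a detector that
    covers every trellis node would provide.
    """
    llrs = []
    for b in range(bits_per_symbol):
        m0 = min(m for s, m in enumerate(best_metric) if not (s >> b) & 1)
        m1 = min(m for s, m in enumerate(best_metric) if (s >> b) & 1)
        llrs.append(m1 - m0)  # positive value: bit b is more likely 0
    return llrs

# Hypothetical metrics for a 4-point constellation (2 bits per symbol).
print(max_log_llrs([0.5, 2.0, 1.0, 3.0], bits_per_symbol=2))
```

    If a search pruned all symbols with a given bit value, one of the two minima would be undefined, which is the situation the path-preserving constraint avoids.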

  • Resource Efficient Implementation of Low Power MB-OFDM PHY Baseband Modem With Highly Parallel Architecture

    Page(s): 1248 - 1261

    The multi-band orthogonal frequency-division multiplexing modem must perform a large amount of computation in a short time to support high data rates, i.e., up to 480 Mbps. In order to satisfy the performance requirement while reducing power consumption, a multi-way parallel architecture has been proposed. However, the use of a highly parallel architecture would increase chip resource usage significantly, so a resource-efficient design is essential. In this paper, we introduce several novel optimization techniques for resource-efficient implementation of a baseband modem with a highly parallel (8-way) architecture, such as new processing structures for a (de)interleaver and a packet synchronizer, and algorithm reconstruction for a carrier frequency offset compensator. We also describe how to efficiently design several other components. The detailed analysis shows that our optimization techniques could reduce the gate count by 27.6% on average, while none of the techniques degraded the overall system performance. With a 0.18-μm CMOS process, the gate count and power consumption of the entire baseband modem were about 785 kgates and less than 381 mW at a 66 MHz clock rate, respectively.

  • Integrated Hardware Architecture for Efficient Computation of the n-Best Bio-Sequence Local Alignments in Embedded Platforms

    Page(s): 1262 - 1275

    A flexible hardware architecture that implements a set of new and efficient techniques to significantly reduce the computational requirements of the commonly used Smith-Waterman sequence alignment algorithm is presented. These techniques use information gathered by the hardware accelerator during the computation of the alignment scores to constrain the size of the subsequence that has to be post-processed in the traceback phase using a general purpose processor (GPP). Moreover, the proposed structure is also capable of computing the n-best local alignments according to the Waterman-Eggert algorithm, becoming the first hardware architecture that is able to simultaneously evaluate the n-best alignments of a given sequence pair, by incorporating a set of ordering units that work in parallel with the systolic array. A complete alignment system was developed and implemented in a Virtex-4 FPGA by integrating the proposed accelerator architecture with a Leon3 GPP. The experimental results demonstrate that the proposed system is flexible and allows the alignment of large sequences in memory-constrained systems. As an example, a speedup of 17 was obtained with the conceived system when compared with a regular implementation of the LALIGN35 program running on an Intel Core2 Duo processor running at a 40× higher frequency.
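
    For readers unfamiliar with the scoring phase mentioned in this abstract, the sketch below computes the Smith-Waterman local alignment score with a linear gap penalty and also records where the best score ends. Recording the end cell is one plausible example of "information gathered during score computation" that could bound a later traceback; the scoring parameters and the choice to track only the end cell are assumptions, not the paper's design.

```python
from typing import Tuple

def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> Tuple[int, int, int]:
    """Return (best_score, end_i, end_j) for the best local alignment.

    Only two rows of the score matrix are kept, so memory is O(len(b)).
    end_i and end_j are 1-based positions in a and b where the best
    alignment ends; (0, 0, 0) means no positive-scoring alignment exists.
    """
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best

print(smith_waterman_score("GATTACA", "GATTACA"))
```

    A traceback that starts from the recorded end cell only needs the prefixes a[:end_i] and b[:end_j], which hints at how score-phase bookkeeping can shrink the post-processing problem.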

  • Hardware Implementation of Nakagami and Weibull Variate Generators

    Page(s): 1276 - 1284

    An efficient implementation of Nakagami-m and Weibull variate generators on a single field-programmable gate array (FPGA) is presented. The hardware model first generates a correlated Rayleigh fading variate sequence and then transforms it into a sequence of Nakagami-m or Weibull fading variates. A biquad processor facilitates the compact implementation of a Rayleigh variate generator with arbitrary autocorrelation properties. A combination of logarithmic and linear domain segmentations along with piece-wise linear approximations is used to accurately implement the nonlinear numerical functions required to transform the correlated Rayleigh fading process into Nakagami-m or Weibull fading processes. When implemented on a Xilinx Virtex-5 5VSX240TFF1738-2 FPGA, the fading simulator uses only 1.6% of the configurable slices, 1.2% of the DSP48E modules and 3 block memories, while operating at 120 MHz, generating 120 million complex variates per second. The throughput can be increased up to 373 MHz with this FPGA if two separate clock sources are utilized.
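
    The Rayleigh-to-Weibull step can be illustrated in software with the standard power transform: if R is Rayleigh distributed, R² is exponential, so R^(2/β) is Weibull with shape β. The sketch below uses floating-point exponentiation and uncorrelated samples; the function names are hypothetical, and the paper's hardware instead uses segmented piece-wise linear approximations on a correlated sequence, which is not reproduced here.

```python
import math
import random

def rayleigh_sample(rng: random.Random, sigma: float = 1.0) -> float:
    """Envelope of a zero-mean complex Gaussian, i.e. a Rayleigh variate."""
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def weibull_from_rayleigh(r: float, beta: float) -> float:
    """Map a Rayleigh variate r to a Weibull variate with shape beta.

    R**2 is exponential for Rayleigh R, so R**(2 / beta) follows a
    Weibull distribution with shape beta and scale (2 * sigma**2)**(1 / beta).
    """
    return r ** (2.0 / beta)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [weibull_from_rayleigh(rayleigh_sample(rng), beta=3.0) for _ in range(5)]
print(samples)
```

    Setting beta to 2 leaves the input unchanged, which matches the fact that a Rayleigh distribution is a Weibull distribution with shape 2.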

  • A 65fJ/b Inter-Chip Inductive-Coupling Data Transceivers Using Charge-Recycling Technique for Low-Power Inter-Chip Communication in 3-D System Integration

    Page(s): 1285 - 1294

    This paper presents a low-power inductive-coupling link in 90-nm CMOS. Our newly proposed transmitter circuit uses a charge-recycling technique for power-aware 3-D system integration. The cross-type daisy chain enables charge recycling and achieves power reduction without sacrificing communication performance such as a high timing margin, low bit error rate, and high bandwidth. There are two design issues in the cross-type daisy chain: pulse amplitude reduction and inter-channel skew. To compensate for these issues, an inductor design and a replica circuit are proposed and investigated. Test chips were designed and fabricated in 90-nm CMOS to verify the validity of the proposed transmitter. Measurements revealed that the proposed cross-type daisy chain transmitter achieved an energy efficiency of 65 fJ/bit without degrading the timing margin, data rate, or bit error rate. In order to investigate the compatibility of the transmitter with technology scaling, a simulation of each technology node was performed. The simulation results indicate that the energy dissipation can potentially be reduced to less than 10 fJ/bit in 22 nm CMOS with the proposed cross-type daisy chain.

  • Optimizing Floating Point Units in Hybrid FPGAs

    Page(s): 1295 - 1303

    This paper introduces a methodology to optimize coarse-grained floating point units (FPUs) in a hybrid field-programmable gate array (FPGA), where the FPU consists of a number of interconnected floating point adders/subtracters (FAs), multipliers (FMs), and wordblocks (WBs). The wordblocks include registers and lookup tables (LUTs) which can implement fixed point operations efficiently. We employ common subgraph extraction to determine the best mix of blocks within an FPU and study the area, speed and utilization tradeoff over a set of floating point benchmark circuits. We then explore the system impact of FPU density and flexibility in terms of area, speed, and routing resources. Finally, we derive an optimized coarse-grained FPU by considering both architectural and system-level issues. This proposed methodology can be used to evaluate a variety of FPU architecture optimizations. The results for the selected FPU architecture optimization show that although high density FPUs are slower, they have the advantages of improved area, area-delay product, and throughput.

  • Dual-Layer Adaptive Error Control for Network-on-Chip Links

    Page(s): 1304 - 1317

    In this work, we present a new error control method to improve the energy efficiency and reliability of network-on-chip (NoC) links. The proposed method combines the error control coding (ECC) capabilities of the NoC's datalink and network layers to dynamically adjust the error control strength in variable noise conditions. Network-layer ECC is used in low noise conditions and error control strength is enhanced by adding datalink-layer ECC in high noise regions. To switch between the two ECC modes at runtime without interrupting normal operation, we propose a dual-layer cooperative error control protocol and its hardware-efficient implementation using the concept of product codes. Theoretical analyses of residual error rate and performance show the proposed method outperforms previous single-layer fixed and adaptive error control schemes. Compared to previous solutions, the proposed method reduces residual packet error rate by up to four orders of magnitude, achieves up to 72% energy reduction and improves average latency by up to 64%. The energy and latency reduction benefits are maintained as the routing path length and packet size increase, at the cost of a moderate increase in area overhead.

  • Novel Interpolation and Polynomial Selection for Low-Complexity Chase Soft-Decision Reed-Solomon Decoding

    Page(s): 1318 - 1322

    Algebraic soft-decision decoding (ASD) of Reed-Solomon (RS) codes can achieve substantial coding gain with polynomial complexity. In particular, the low-complexity Chase (LCC) ASD decoding has a better performance-complexity tradeoff. In LCC decoding, 2^η test vectors need to be interpolated over, and a polynomial selection scheme needs to be employed to select one interpolation output to send to the remaining decoding steps. The interpolation and polynomial selection can account for a significant part of the LCC decoder area, especially in the case of long RS codes and large η. In this paper, simplifications are first proposed for a low-complexity polynomial selection scheme. Then a novel interpolation scheme is developed by making use of the simplified polynomial selection. Instead of interpolating over each vector, our scheme first generates the information necessary for the polynomial selection. Then only the selected vectors are interpolated over. The proposed interpolation and polynomial selection schemes can lead to 162% higher efficiency in terms of throughput-over-area ratio for an example LCC decoder with η = 8 for a (458, 410) RS code over GF(2^10).
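
    The 2^η count comes from giving each of the η least reliable positions two candidate symbols. The sketch below enumerates the resulting test vectors, assuming the usual LCC formulation in which the candidates are the hard decision and the second most likely symbol; the function name and the integer symbols are placeholders, and no interpolation or polynomial selection is attempted.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple

def lcc_test_vectors(hard: Sequence[int], second: Sequence[int],
                     unreliable: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield every test vector obtained by choosing, at each unreliable
    position, either the hard-decision symbol or the second-best symbol.

    With eta = len(unreliable) positions this yields 2**eta vectors.
    """
    for choice in product((0, 1), repeat=len(unreliable)):
        vec = list(hard)
        for pos, pick in zip(unreliable, choice):
            if pick:
                vec[pos] = second[pos]
        yield tuple(vec)

# Placeholder symbols; positions 1 and 3 are treated as least reliable.
for v in lcc_test_vectors([1, 2, 3, 4], [9, 8, 7, 6], unreliable=[1, 3]):
    print(v)
```

    For η = 8, as in the example decoder, the same enumeration produces 256 vectors, which is why avoiding interpolation over every vector is attractive.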

  • A Multi-Resolution Fast Filter Bank for Spectrum Sensing in Military Radio Receivers

    Page(s): 1323 - 1327

    In this paper, we propose a multi-resolution filter bank (MRFB) based on the fast filter bank design for multiple resolution spectrum sensing in military radio receivers. The proposed method overcomes the constraint of fixed sensing resolution in spectrum sensors based on conventional discrete Fourier transform filter banks (DFTFB). The flexibility to realize a multiple-resolution spectrum sensor is achieved by suitably designing the prototype filter and efficiently selecting the varying resolution subbands without hardware re-implementation. Design examples show that the sensing performance of the proposed MRFB is comparable to that of the conventional fixed resolution DFTFB. The complexity comparison shows that the proposed MRFB architecture has a gate count reduction of 36.5% over the DFTFB. The proposed MRFB architecture achieves an average power reduction of 20.8% over the DFTFB.

  • Investigating the Impact of Logic and Circuit Implementation on Full Adder Performance

    Page(s): 1327 - 1331

    This paper presents the design and characterization of 12 full-adder circuits in the IBM 90-nm process. These include three new full-adder circuits using the recently proposed split-path data driven dynamic logic. Based on the logic function realized, the adders were characterized for performance and power consumption when operated under various supply voltages and fan-out loads. The adders were then further deployed in a 32-bit ripple carry adder and an 8×4 multiplier to evaluate the impact of sum and carry propagation delays on the performance and power of these systems. Performance characterization of the adder circuits in the presence of process and voltage variations was also performed through Monte Carlo simulations. Besides analyzing and comparing circuit performance, the possible impact of the choice of logic function has also been underlined in this study.

  • Temperature Characteristics and Analysis of Monolithic Microwave CMOS Distributed Oscillators With G_m-Varied Gain Cells and Folded Coplanar Interconnects

    Page(s): 1332 - 1336

    The performance of a novel Monolithic Microwave CMOS Distributed Oscillator is reported over a temperature range of -25°C to 75°C for the first time, along with an analysis of its design characteristics and its temperature stability. The oscillator is stable over the entire temperature range of 100°C. The monolithic distributed oscillator (DO) is designed and fabricated in an industry standard 0.18 μm CMOS process, using an n-FET-based traveling wave amplifier (TWA), coplanar waveguides (CPW), and a new coplanar interconnect structure called a "folded CPW". The measured loss of the "folded CPW" is 1.259 dB at 10 GHz. The distributed oscillator uses a novel architecture of G_m-varied gain cells and operates at a bias of 1.8 V. The measured oscillation frequency is 11.7 GHz with 6.1 dBm output power and the measured phase noise is -116.02 dBc/Hz at 1 MHz offset, which represent the best reported power and one of the best phase noise results for silicon DOs with temperature stability.

  • A Comparative Study of 20-Gb/s NRZ and Duobinary Signaling Using Statistical Analysis

    Page(s): 1336 - 1341

    A statistical analysis technique for estimating bit-error rate (BER) and eye opening is presented for both non-return-to-zero (NRZ) and duobinary signaling schemes. This method enables fast and accurate BER distribution simulation of a serial link transceiver including channel and circuit imperfections, such as finite pulse rise/fall time, duty cycle variation and both receiver and transmitter forwarded-clock jitter. A comparison between 20-Gb/s NRZ and duobinary transmitters using this simulator shows that while duobinary transmission relaxes the requirements on the receiver equalizer due to the lower Nyquist frequency of the transmitted data, significant eye-opening and BER degradation can arise from clock non-idealities. The proposed statistical analysis is verified against traditional time-domain, transient eye-diagram simulations at 20-Gb/s, transmitted through measured s-parameter channel characteristics.

  • UDSM Trends Comparison: From Technology Roadmap to UltraSparc Niagara2

    Page(s): 1341 - 1346

    Increased leakage, yield inefficiency, and process, power supply, and temperature variations have significant aftereffects on the performance of complex VLSI architectures, especially if mapped on ultra deep sub micrometer (UDSM) technologies. In this paper we assess the technology trend based on three industrial technologies (90, 65, and 45 nm) using a state-of-the-art processor as a benchmark: the UltraSparc Niagara 2 from Sun Microsystems. We analyze frequency, dynamic and static power, and area after synthesis while varying power supply voltage and temperature. We then compare these exhaustive analyses of system-level performance as a function of technology to ITRS device-level estimations. The results suggest that this prediction can be of help when addressing both the technological scaling and the variability scenario of the selected technology. We believe that correctly predicting specific values of performance variations when realistic conditions and technologies are changed could provide valuable information for the architect. Our analysis advises the designer on the effective applicability of the ITRS trends to system performance, but also shows that a reliable system-level prediction should better account for the design complexity.

  • Unified Architecture for Reed-Solomon Decoder Combined With Burst-Error Correction

    Page(s): 1346 - 1350

    Reed-Solomon (RS) codes are widely used as forward error correction (FEC) codes in digital communication and storage systems. Correcting random errors with RS codes has been extensively studied in both academia and industry. However, research on burst-error correction is still quite limited due to its ultra-high computational complexity. In this brief, starting from a recent theoretical work, a low-complexity reformulated inversionless burst-error correcting (RiBC) algorithm is developed for practical applications. Then, based on the proposed algorithm, a unified VLSI architecture that is capable of correcting burst errors, as well as random errors and erasures, is presented for the first time for multi-mode decoding requirements. This new architecture is denoted as the unified hybrid decoding (UHD) architecture. It will be shown that, being the first RS decoder with enhanced burst-error correcting capability, it can achieve significantly better error-correcting capability than traditional hard-decision decoding (HDD) designs.

  • Return Data Interleaving for Multi-Channel Embedded CMPs Systems

    Page(s): 1351 - 1354

    Using multi-channel memory subsystems is an efficient way of satisfying high-volume memory requests from CMPs. At the same time, the imbalance between memory bandwidth and bus performance opens up new possibilities for optimizing return data before it is sent to the bus. This paper presents a new memory controller design for embedded CMP systems that targets the point at which return data from the return buffer is sent back to the bus. Our scheduling policy, called return data interleaving (RDI), interleaves the return data of each request in a round-robin manner. Further, for each request, it sends the critical word first. To evaluate our technique, we model an Intel XScale-based CMP using the M5 simulator for CMP simulation and DRAMsim for memory subsystem simulation, and examine the performance of the MiBench and SPEC2000 benchmarks. Simulation results show that for memory-bound benchmarks running on CMP systems with 6 to 16 cores, RDI can improve the execution time by 11% on average and by up to 16.9%.
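
    The scheduling idea in this abstract (round-robin interleaving across requests, critical word first within each request) can be illustrated with a small sketch. This is not the paper's controller: the function names, the tuple layout of a request, the one-word-per-turn granularity, and the wrap-around order of the words that follow the critical word are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical request layout: (request_id, num_words, critical_word_index).
Request = Tuple[str, int, int]


def critical_word_first(num_words: int, critical: int) -> List[int]:
    """Order the words of one request starting at the critical word.

    The wrap-around order of the remaining words is an assumption;
    the abstract only states that the critical word is sent first.
    """
    return [(critical + i) % num_words for i in range(num_words)]


def interleave_return_data(requests: List[Request]) -> List[Tuple[str, int]]:
    """Emit one word per request per turn, cycling over pending requests."""
    pending: Deque[Tuple[str, Deque[int]]] = deque(
        (rid, deque(critical_word_first(n, c)))
        for rid, n, c in requests
        if n > 0  # skip empty requests so every queue has a word to send
    )
    bus_order: List[Tuple[str, int]] = []
    while pending:
        rid, words = pending.popleft()
        bus_order.append((rid, words.popleft()))
        if words:  # request still has data: move it to the back of the line
            pending.append((rid, words))
    return bus_order


if __name__ == "__main__":
    # Two four-word requests; "A" needs word 2 first, "B" needs word 0 first.
    for rid, word in interleave_return_data([("A", 4, 2), ("B", 4, 0)]):
        print(rid, word)
```

    In this sketch each pending request receives its critical word within the first round of bus transfers, rather than waiting for every earlier request to drain completely.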

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems information for authors

    Page(s): 1355
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)


Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu