Very Large Scale Integration (VLSI) Systems, IEEE Transactions on

Issue 8 • Date Aug. 2012

  • Table of Contents

    Page(s): C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems publication information

    Page(s): C2
    Freely Available from IEEE
  • A Highly-Integrated 3–8 GHz Ultra-Wideband RF Transmitter With Digital-Assisted Carrier Leakage Calibration and Automatic Transmit Power Control

    Page(s): 1357 - 1367

    This paper presents a highly-integrated 3-8 GHz ultra-wideband (UWB) RF transmitter implemented in a 1.2 V 0.13-μm CMOS technology. The transmitter integrates an analog baseband (PGAs and filter), an IQ modulator, a variable gain amplifier (VGA), a differential-to-single-ended amplifier, a power amplifier, and a transmitted signal strength indicator (TSSI). The RF VGA and the TSSI cooperate to perform automatic transmit power control. The IQ modulator and an off-chip digital circuit implemented on an FPGA perform carrier leakage calibration. The measured maximum output power and OP1dB are -5 and +1.5 dBm, respectively. The measured worst carrier leakage suppression is 21 dB (before calibration) at 6.6 GHz. The measured worst sideband suppression is 29.1 dB at 7.6 GHz. The high linearity and accurate IQ modulation lead to an error vector magnitude (EVM) of -28 dB at a data rate of 480 Mb/s in WiMedia Mode 1. The entire transmitter consumes 66 mW from a 1.2 V supply.

  • A Highly-Digital VCO-Based Analog-to-Digital Converter Using Phase Interpolator and Digital Calibration

    Page(s): 1368 - 1372

    A first-order time-based ΔΣ modulator using a voltage-controlled oscillator (VCO) is presented. The proposed modulator employs a phase interpolation technique to enhance the time resolution of the VCO and digital calibration to improve the linearity of the VCO tuning curve. The proposed modulator, implemented in a 0.13 μm CMOS process, achieves 55 dB peak signal-to-noise ratio and 52.5 dB peak signal-to-noise-and-distortion ratio at a 600 MHz sampling frequency for a 20 MHz input bandwidth and consumes 14.3 mW.

  • Jitter Analysis of Polyphase Filter-Based Multiphase Clock in Frequency Multiplier

    Page(s): 1373 - 1382

    This paper presents a random jitter and deterministic jitter analysis of the proposed polyphase filter (PPF)-based multiphase clock in a frequency multiplier, with reference to the benchmark jitter analysis of its multiphase clock counterpart using the conventional delay-locked loop (DLL) approach. The analysis results show that the jitter performance of the PPF-based design is better than that of the DLL-based design. Jitter measurement on the PPF-based multiphase clock chip has been conducted. The overall comparison shows excellent agreement between the prediction results from theory and realistic simulation results obtained from a combination of all the transistor-level circuits in conjunction with the proposed behavioral model. The comparison results confirm the proposed time domain jitter analysis method. The results show that not only does the PPF-based design demonstrate improved jitter performance, but its deterministic jitter performance is also independent of component mismatch. Finally, the practical measurement results of the fabricated chip identify the practical pitfalls of the proposed PPF-based DLL design, suggesting further jitter reduction and demonstrating the potential for low-jitter design using the PPF-based DLL.

  • Fourier Series Approximation for Max Operation in Non-Gaussian and Quadratic Statistical Static Timing Analysis

    Page(s): 1383 - 1391

    The most challenging problem in the current block-based statistical static timing analysis (SSTA) is how to handle the max operation efficiently and accurately. Existing SSTA techniques suffer from limited modeling capability by using a linear delay model with Gaussian distribution, or have scalability problems due to expensive operations involved to handle non-Gaussian variation sources or nonlinear delays. To overcome these limitations, we propose efficient algorithms to handle the max operation in SSTA with both quadratic delay dependency and non-Gaussian variation sources simultaneously. Based on such algorithms, we develop an SSTA flow with quadratic delay model and non-Gaussian variation sources. All the atomic operations, max and add, are calculated efficiently via either closed-form formulas or low dimension (at most 2-D) lookup tables. We prove that the complexity of our algorithm is linear in both variation sources and circuit sizes, hence our algorithm scales well for large designs. Compared to Monte Carlo simulation for non-Gaussian variation sources and nonlinear delay models, our approach predicts the mean, standard deviation and 95% percentile point with less than 2% error, and the skewness with less than 10% error.

  • Exploiting Process Variability in Voltage/Frequency Control

    Page(s): 1392 - 1404

    Fine-grained dynamic voltage/frequency scaling (DVFS) is an important tool in managing the balance between power and performance in chip-multiprocessors. Although manufacturing process variations are giving rise to significant core-to-core variations in power and performance, traditional DVFS controllers are unaware of these variations. Exploiting the different power profiles of the cores can significantly improve energy efficiency. Process variations do not significantly affect dynamic power, so less-leaky processing units are more energy-efficient than their leakier counterparts at a given supply voltage and frequency. Taking advantage of this observation, three existing DVFS control algorithms are modified to shift work from inefficient, leaky processing units to efficient, less leaky ones, maintaining performance while reducing total power consumption. This work-shifting is carried out both between dies in a given speed bin and between voltage/frequency islands on a given die. The gains enabled by incorporating variability-awareness into the three DVFS algorithms are demonstrated on both multithreaded and multiprogrammed workloads. For a baseline 16-core design with per-core voltage/frequency islands (VFIs) and a 4×4 mesh on-chip network, the aggregate power per squared throughput (power/throughput² or P/T²) over all fabricated dies is reduced by 9.2%, 5.7%, and 7.7% for the three controllers. Chip multiprocessor designs using other VFI granularities and network topologies are also examined.

  • Design and Analysis of a Delay Sensor Applicable to Process/Environmental Variations and Aging Measurements

    Page(s): 1405 - 1418

    With technology scaling, the deviation between the path delay predicted by simulation and the actual path delay on silicon increases due to process variation and aging. Hence, on-chip measurement architectures are now widely used due to their higher accuracy and lower cost compared to expensive external measurement devices. In this paper, a novel path-delay measurement architecture called path-based ring oscillator (Path-RO), which takes variations into account, is proposed. Path-RO can perform accurate on-chip path-delay measurement with nearly no impact on the functional data path. At the same time, process variations will not affect the measurement accuracy. The accuracy degradation due to aging is also negligible, which enables Path-RO to monitor path delay throughout the aging process. This delay sensor is perfectly suitable for fast and accurate speed binning as well. By targeting speed paths, the speed of the chip can be binned efficiently even in the presence of clock skew. Various simulation results collected by Path-RO inserted into the b19 circuit demonstrate its high accuracy and efficiency.

  • Resource-Efficient FPGA Architecture and Implementation of Hough Transform

    Page(s): 1419 - 1428

    The Hough transform is widely used for detecting straight lines in an image, but it involves a large amount of computation. For embedded applications, field-programmable gate arrays (FPGAs) are among the most widely used hardware accelerators for achieving real-time implementation of the Hough transform. In this paper, we present a resource-efficient architecture and implementation of the Hough transform on an FPGA. The incrementing property of the Hough transform is described and used to reduce the resource requirement. In order to facilitate parallelism, we divide the image into blocks and apply the incrementing property to pixels within a block and between blocks. Moreover, the locality of the Hough transform is analyzed to reduce memory accesses. The proposed architecture is implemented on an Altera EP2S180F1508C3 device and can operate at a maximum frequency of 200 MHz. It can compute the Hough transform of 512 × 512 test images with 180 orientations in 2.07-3.16 ms without using many FPGA resources (i.e., one could achieve the performance by adopting a low-cost low-end FPGA).

  • Portable, Flexible, and Scalable Soft Vector Processors

    Page(s): 1429 - 1442

    Field-programmable gate arrays (FPGAs) are increasingly used to implement embedded digital systems; however, the hardware design necessary to do so is time-consuming and tedious. The amount of hardware design can be reduced by employing a microprocessor for less-critical computation in the system. Often this microprocessor is implemented using the FPGA reprogrammable fabric as a soft processor; such processors presently have simple architectures and moderate performance. Our goal is to scale the performance of existing soft processors, hence expanding their suitability to more critical computation. To this end, we propose extending soft processors with vector extensions to exploit the abundant data parallelism found in many embedded kernels. Such a soft vector processor can execute these kernels much faster than a single core, hence reducing the need for hardware implementations. We observe this improved execution speed through experimentation with the vector extended soft processor architecture (VESPA), which is designed, implemented, and evaluated on real FPGA hardware. VESPA is shown to effectively scale performance up to 32 lanes, while providing substantial architectural flexibility to create a fine-grained design space. With these characteristics, and portability across FPGA devices, soft vector processors can provide exact-fit architectures which can implement data parallel workloads efficiently and more easily than custom FPGA hardware design.

  • Ground Switching Load Modulation With Ground Isolation for Passive HF RFID Transponders

    Page(s): 1443 - 1452

    This paper presents a ground switching load modulation scheme for passive HF RFID transponders. The proposed modulation scheme allows HF transponders to communicate with the reader in the strong field with a higher modulation index using simple RF clamps, compared with conventional resistive and capacitive load modulation schemes. Also presented is a ground-isolated voltage doubler rectifier to eliminate effects of parasitic diodes in CMOS processes, thereby increasing the communication distance. A 13.56-MHz transponder prototype was implemented in a standard 0.35-μm CMOS process. Measurement results show that the proposed transponder achieves over 35-mV modulation depth for the magnetic field strength from 0.15 to 7.5 A/m.

  • Efficient FPGA Implementations of Point Multiplication on Binary Edwards and Generalized Hessian Curves Using Gaussian Normal Basis

    Page(s): 1453 - 1466

    Efficient implementation of point multiplication is crucial for elliptic curve cryptographic systems. This paper presents the implementation results of an elliptic curve crypto-processor over binary fields GF(2^m) on binary Edwards and generalized Hessian curves using Gaussian normal basis (GNB). We demonstrate how parallelization in higher levels can be performed by full resource utilization of computing point addition and point-doubling formulas for both binary Edwards and generalized Hessian curves. Then, we employ the ω-coordinate differential formulations for computing point multiplication. Using a lookup-table (LUT)-based pipelined and efficient digit-level GNB multiplier, we evaluate the LUT complexity and time-area tradeoffs of the proposed crypto-processor on an FPGA. We also compare the implementation results of point multiplication on these curves with the ones on the traditional binary generic curve. To the best of the authors' knowledge, this is the first FPGA implementation of point multiplication on binary Edwards and generalized Hessian curves represented by ω-coordinates.

  • Mixed FBB/RBB: A Novel Low-Leakage Technique for FinFET Forced Stacks

    Page(s): 1467 - 1472

    In this paper, a novel technique to reduce the leakage current of FinFET forced stacks under a given delay constraint is presented. This technique takes advantage of the unique feature of four-terminal FinFETs allowing different transistors to have separately tunable back bias voltages. In this work, a reverse back bias voltage is applied to one of the two stacked transistors to reduce its leakage at the cost of a delay penalty, whereas a forward back bias voltage is applied to the other one to compensate for this delay degradation. The technique is assessed by means of mixed device-circuit simulations for FinFETs that are representative of 40- and 27-nm technology generations. Results show that a leakage reduction by up to 50× can be achieved as compared with traditional transistor stacks, while keeping the same speed, dynamic energy, and sensitivity to process/voltage/temperature variations.

  • Harvesting-Aware Power Management for Real-Time Systems With Renewable Energy

    Page(s): 1473 - 1486

    In this paper, we propose a harvesting-aware power management algorithm that targets good energy efficiency and system performance in energy harvesting real-time systems. The proposed algorithm utilizes static and adaptive scheduling techniques combined with dynamic voltage and frequency selection to achieve good system performance under timing and energy constraints. In our approach, we simplify the scheduling and optimization problem by separating constraints in timing and energy domains. The proposed algorithm achieves improved system performance by exploiting task slack with dynamic voltage and frequency selection and minimizing the waste of harvested energy. Experimental results show that the proposed algorithm improves the system performance in deadline miss rate and the minimum storage capacity requirement for zero deadline miss rate. Compared to the existing algorithms, the proposed algorithm achieves better performance in terms of the deadline miss rate and the minimum storage capacity under various settings of workloads and harvested energy profiles.

  • Buried Silicon-Germanium pMOSFETs: Experimental Analysis in VLSI Logic Circuits Under Aggressive Voltage Scaling

    Page(s): 1487 - 1495

    In this paper, the potential of Silicon-Germanium (SiGe) technology for VLSI logic applications is investigated from a circuit perspective for the first time. The study is based on experimental measurements on 45-nm SiGe pMOSFETs with a high-κ/metal gate stack, as well as on 45-nm Si pMOSFETs with an identical gate stack for comparison. In the reference SiGe technology, an innovative technological solution is adopted that limits the SiGe material only to the channel region. The resulting SiGe device merges the higher speed of the Ge technology with the lower leakage of the Si technology. Appropriate circuit- and system-level metrics are introduced to identify the advantages offered by SiGe technology in VLSI circuits. Analysis is performed in the context of next-generation VLSI circuits that fully exploit circuit- and system-level techniques to improve the energy efficiency through aggressive voltage scaling, other than low-leakage techniques. Analysis shows that the SiGe technology has more efficient leakage-delay and dynamic energy-delay trade-offs at nominal supply, compared to Si technology. Moreover, it is shown that the traditional analysis performed at nominal supply actually underestimates the benefits of SiGe pMOSFETs, since the speed advantage of SiGe VLSI circuits is further emphasized at low voltages. This demonstrates that SiGe VLSI circuits benefit from aggressive voltage scaling significantly more than Si circuits, thereby making SiGe devices a very promising alternative to Si transistors in next-generation VLSI systems.

  • UNISM: Unified Scheduling and Mapping for General Networks on Chip

    Page(s): 1496 - 1509

    Task scheduling and core mapping have a significant impact on the overall performance of network on chip (NOC). In this paper, a unified task scheduling and core mapping algorithm called UNISM is proposed for different NOC architectures including regular mesh, irregular mesh and custom networks. First, a unified model combining scheduling and mapping is introduced using mixed integer linear programming (MILP). Then, a novel graph model is proposed to consider the network irregularity and estimate communication energy and latency, since the number of network hops is not accurate enough for irregular mesh and custom networks. To make the MILP-based UNISM scalable, a heuristic is employed to speed up our method. Compared with two previous state-of-the-art works, experimental results show that more than 15% and 11.5% improvement on the execution time is achieved with similar energy consumption on average for regular mesh NOC. For irregular and custom NOC, the improvement is 27.3% and 14.5% on the execution time with 24.3% and 18.5% lower energy. Moreover, our method is scalable for large benchmarks in terms of runtime.

  • LP-NUCA: Networks-in-Cache for High-Performance Low-Power Embedded Processors

    Page(s): 1510 - 1523

    High-end embedded processors demand complex on-chip cache hierarchies satisfying several contradicting design requirements such as high-performance operation and low energy consumption. This paper introduces light-power (LP) nonuniform cache architecture (NUCA), a tiled-cache addressing both goals. LP-NUCA places a group of small and low-latency tiles between the L1 and the last level cache (LLC) that adapt better to the application working sets and keep most recently evicted blocks close to L1. LP-NUCA is built around three specialized “networks-in-cache,” each aimed at a separate cache operation. To prove the design feasibility, we have fully implemented LP-NUCA in a 90-nm technology. From the VLSI implementation, we observe that the proposed networks-in-cache incur minimal area, latency, and power overhead. To further reduce the energy consumption, LP-NUCA employs two network-wide techniques (miss wave stopping and sectoring) that together reduce the dynamic cache energy by 35% without degrading performance. Our evaluations also show that LP-NUCA improves performance with respect to cache hierarchies similar to those found in high-end embedded processors. Similar results have been obtained after scaling to a 32-nm technology.

  • A 0.31–1 GHz Fast-Corrected Duty-Cycle Corrector With Successive Approximation Register for DDR DRAM Applications

    Page(s): 1524 - 1528

    This brief presents a duty cycle corrector (DCC) using a binary search algorithm with a successive approximation register (SAR). The proposed DCC consists of a duty-cycle detector, a duty-cycle adjuster, its controller, and an output buffer. In order to achieve fast duty-cycle correction with a small die area, a SAR controller is used as the duty-correction controller. The proposed DCC circuit has been implemented and fabricated in a 0.13-μm CMOS process and occupies 0.048 mm². The measured duty-cycle error for the 50% duty-rate is below 1% (or 10 ps) within 320 ps external input duty-cycle error. The duty cycle of the output signal is corrected in only 14 cycles. This DCC operates from 312.5 MHz to 1 GHz and dissipates 3.2 mW at 1 GHz.

  • ZeROA: Zero Clock Skew Rotary Oscillatory Array

    Page(s): 1528 - 1532

    Resonant rotary clocking is a clocking technology for high frequency clock generation and distribution at a low power dissipation rate. It is commonly conceived that the multiple phases on the rings of the rotary oscillatory array (ROA) necessitate a non-zero clock skew operation. In this paper, the feasibility of zero clock skew synchronization with the rotary clocking technology implemented on the ROA is shown. Design automation experiments are performed to demonstrate that the zero clock skew operation can be achieved with minimal change in the performance of rotary clock operation. In particular, a marginal ±1.5% change in the tapping wirelength and a negligible 0.38% average skew mismatch are reported in experiments on R1-R5 and ISPD 2010 benchmark circuits.

  • Fine-Grain Voltage Tuned Cache Architecture for Yield Management Under Process Variations

    Page(s): 1532 - 1536

    Process variations cause large fluctuations in the performance and power consumption of manufactured chips, which eventually result in yield losses. In this paper, to mitigate access time failures and excessive leakage in caches, we propose a novel selective wordline boosting mechanism combined with SRAM cell array voltage lowering. Based on our evaluation, the proposed approach recovers up to 83.1% of the yield losses.

  • A Best-First Soft/Hard Decision Tree Searching MIMO Decoder for a 4 × 4 64-QAM System

    Page(s): 1537 - 1541

    This paper presents the algorithm and VLSI architecture of a configurable tree-searching approach that combines the features of classical depth-first and breadth-first methods. Based on this approach, techniques to reduce complexity while providing both hard- and soft-output decoding are presented. Furthermore, a single programmable parameter allows the user to trade off throughput versus BER performance. The proposed multiple-input-multiple-output decoder supports a 4 × 4 64-QAM system and was synthesized with 65-nm CMOS technology at 333 MHz clock frequency. For the hard output scheme, the design can achieve an average throughput of 257.8 Mbps at 24 dB signal-to-noise ratio (SNR) with area equivalent to 54.2 Kgates and a power consumption of 7.26 mW. For the soft output scheme, it achieves an average throughput of 83.3 Mbps across the SNR range of interest with an area equivalent to 64 Kgates and a power consumption of 11.5 mW.

  • Area-Time Efficient Scaling-Free CORDIC Using Generalized Micro-Rotation Selection

    Page(s): 1542 - 1546

    This paper presents an area-time efficient CORDIC algorithm that completely eliminates the scale-factor. By suitable selection of the order of approximation of the Taylor series, the proposed CORDIC circuit meets the accuracy requirement and attains the desired range of convergence. In addition, we propose an algorithm to redefine the elementary angles for reducing the number of CORDIC iterations. A generalized micro-rotation selection technique based on high speed most-significant-1-detection obviates the complex search algorithms for identifying the micro-rotations. The proposed CORDIC processor provides the flexibility to manipulate the number of iterations depending on the accuracy, area and latency requirements. Compared to the existing recursive architectures, the proposed one has 17% lower slice-delay product on a Xilinx Spartan XC2S200E device.

  • Low-Swing Differential Conditional Capturing Flip-Flop for LC Resonant Clock Distribution Networks

    Page(s): 1547 - 1551

    In this paper, we introduce a new flip-flop for use in a low-swing LC resonant clocking scheme. The proposed low-swing differential conditional capturing flip-flop (LS-DCCFF) operates with a low-swing sinusoidal clock through the utilization of reduced swing inverters at the clock port. The functionality of the proposed flip-flop was verified at extreme corners through simulations with parasitics extracted from layout. The LS-DCCFF enables 6.5% reduction in power compared to the full-swing flip-flop with 19% area overhead. In addition, a frequency dependent delay associated with driving pulsed flip-flops with a low-swing sinusoidal clock has been characterized. The LS-DCCFF has 870 ps longer data-to-output delay as compared to the full-swing flip-flop at the same setup time for a 100 MHz sinusoidal clock. The functionality of the proposed flip-flop was tested and verified by using the LS-DCCFF in a dual-mode multiply and accumulate (MAC) unit fabricated in TSMC 90-nm CMOS technology. Low-swing resonant clocking achieved around 5.8% reduction in total power with 5.7% area overhead for the MAC.

  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems information for authors

    Page(s): 1552
    Freely Available from IEEE
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems society information

    Page(s): C3
    Freely Available from IEEE

Aims & Scope

Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.

To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers which emphasize the novel system integration aspects of microelectronic systems, including interactions among system design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and system level qualification. Thus, the coverage of this Transactions focuses on VLSI/ULSI microelectronic system integration.

Topics of special interest include, but are not strictly limited to, the following:
  • System Specification, Design and Partitioning
  • System-level Test
  • Reliable VLSI/ULSI Systems
  • High Performance Computing and Communication Systems
  • Wafer Scale Integration and Multichip Modules (MCMs)
  • High-Speed Interconnects in Microelectronic Systems
  • VLSI/ULSI Neural Networks and Their Applications
  • Adaptive Computing Systems with FPGA components
  • Mixed Analog/Digital Systems
  • Cost, Performance Tradeoffs of VLSI/ULSI Systems
  • Adaptive Computing Using Reconfigurable Components (FPGAs)

Meet Our Editors

Editor-in-Chief
Yehea Ismail
CND Director
American University of Cairo and Zewail City of Science and Technology
New Cairo, Egypt
y.ismail@aucegypt.edu