Circuits and Systems, 2006 IEEE North-East Workshop on

Date 18-21 June 2006

Displaying Results 1 - 25 of 87
  • Automatic Link Editor Generation for Embedded CPU Cores

    Publication Year: 2006 , Page(s): 121 - 124

    SoC design space exploration requires code generation for several CPU core alternatives. However, an embedded software code generation toolkit cannot be developed from scratch for every target CPU under exploration. Nor can it always be reused from standard packages, especially when the CPU core is an ASIP. This is why automatically retargetable tools are required. This paper describes a retargetable technique for the automatic generation of a link editor from a formal description of the target CPU core. The implementation of the technique relies on the well-known GNU binutils package. To make it retargetable, the key is to reuse the architecture-independent libraries and automatically generate the architecture-dependent ones. The technique's correctness and robustness were verified for three target CPUs (MIPS, SPARC and PowerPC) running programs from the MiBench benchmark. For experimental validation, we have successfully compared the executable files produced by the generated tools to those produced by conventional tools from the GNU binutils package.

  • Architectures and Design Methodologies for Very Low Power and Power Effective A/D Converters

    Publication Year: 2006 , Page(s): 77 - 80
    Cited by:  Papers (2)

    The main goal of portable applications is obtaining data conversion with very low power consumption while maintaining acceptable resolution and linearity. This paper presents various design methods for achieving a figure of merit of 1 pJ per conversion or less, usable in architectures other than sigma-delta, namely pipeline and successive-approximation converters. Results of state-of-the-art designs are presented.

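    The abstract above quotes a figure of merit without defining it. As a minimal sketch, assuming the widely used Walden definition (power divided by the product of 2^ENOB and the sampling rate), the target of 1 pJ per conversion can be checked as follows. The function name and the converter numbers are hypothetical and are not taken from the paper.

```python
def walden_fom_pj(power_w, enob_bits, sample_rate_hz):
    """Energy per conversion step in picojoules (Walden figure of merit).

    Assumed definition: FoM = P / (2**ENOB * f_s).
    """
    return power_w / (2 ** enob_bits * sample_rate_hz) * 1e12


# Hypothetical converter: 1 mW, 10 effective bits, 1 MS/s.
fom = walden_fom_pj(1e-3, 10, 1e6)
print(f"{fom:.3f} pJ per conversion step")  # prints 0.977 pJ per conversion step
```

    Under this assumed definition, a converter meets the 1 pJ target when the printed value is at most 1.0.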
  • Ultra-Low Voltage Nano-Scale Embedded RAMs

    Publication Year: 2006 , Page(s): 245 - 248
    Cited by:  Papers (2)

    Ultra-low voltage nano-scale embedded RAMs are described, focusing on RAM cells and peripheral circuits. First, challenges and trends of low-voltage RAM cells are discussed in terms of signal charge, signal voltage, and noise. In addition to ECC, power-supply controls to widen the voltage margin of cells, and a fully-depleted SOI to reduce VT variation are also investigated. Then peripheral circuits are explained in terms of leakage reduction and compensation for speed variations. Based on this, it is concluded that ultra-low voltage RAMs cannot be achieved without reducing speed variations caused by variations in VT, thus resulting in a further need for compensation circuits and new devices with reduced VT variation.

  • Efficient Realization of Large Integer Multipliers and Squarers

    Publication Year: 2006 , Page(s): 237 - 240
    Cited by:  Papers (3)

    This paper presents an efficient design methodology and a systematic approach for the implementation of multiplication and squaring functions for large integers using small-size embedded multipliers. A general architecture of the multiplier and squarer is proposed, and a set of equations is derived to aid in the realization. The inputs of the multiplier and squarer are split into several segments, leading to an efficient utilization of the small-size embedded multipliers and a reduced number of required addition operations. Various benchmarks were tested with the number of segments ranging from 2 to 4, targeting a Xilinx Spartan-3 FPGA. The synthesis was performed with the aid of the Xilinx ISE 7.1 XST tool. Our approach was compared with the traditional technique using the same tool. The results illustrate that our design approach is very efficient in terms of both timing and area saving. The combinational delay is reduced by an average of 6.1% for the multiplier and 15.5% for the squarer. The area saving, in terms of number of 4-input LUTs, is about 8.3% for the multiplier and 50% for the squarer. In the case of the multiplier, both approaches use the same number of embedded multipliers. For the squarer, our proposed approach has reduced the number of required embedded multipliers by an average of 30.5% compared to the traditional technique.

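    The segment-splitting idea in the abstract above can be illustrated with a plain schoolbook decomposition. This is a sketch under stated assumptions, not the architecture or the equations of the paper: the helper names are hypothetical, and the 17-bit segment width is an assumption chosen so that unsigned segments fit an 18x18 signed embedded multiplier. It shows why a squarer needs fewer small multiplications than a general multiplier: each cross term appears twice and can be computed once and shifted.

```python
def split(x, k, w):
    """Split non-negative x into k segments of w bits, least significant first."""
    mask = (1 << w) - 1
    return [(x >> (i * w)) & mask for i in range(k)]


def multiply(a, b, k, w):
    """Product built from k*k small multiplications; returns (product, count)."""
    sa, sb = split(a, k, w), split(b, k, w)
    total, mults = 0, 0
    for i in range(k):
        for j in range(k):
            total += (sa[i] * sb[j]) << ((i + j) * w)
            mults += 1
    return total, mults


def square(a, k, w):
    """Square built from k*(k+1)/2 small multiplications; returns (square, count)."""
    sa = split(a, k, w)
    total, mults = 0, 0
    for i in range(k):
        total += (sa[i] * sa[i]) << (2 * i * w)
        mults += 1
        for j in range(i + 1, k):
            # Cross term sa[i]*sa[j] occurs twice; the extra shift by 1 doubles it.
            total += (sa[i] * sa[j]) << ((i + j) * w + 1)
            mults += 1
    return total, mults


a, b = 2**51 - 12345, 2**50 + 6789   # both fit in 3 segments of 17 bits
print(multiply(a, b, 3, 17) == (a * b, 9))
print(square(a, 3, 17) == (a * a, 6))
```

    Both functions assume the operands fit in k*w bits. With three segments, the squarer in this sketch uses 6 small multiplications where the general multiplier uses 9.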
  • Improved Direct Stability Analysis of High-Order Continuous-Time Sigma-Delta Modulators

    Publication Year: 2006 , Page(s): 73 - 76

    Direct stability analysis for continuous-time (CT) sigma-delta modulators (SDM) in the s-domain requires an accurate system model to ensure the reliability of the analysis. This paper presents an extended s-domain stability model for high-order single-bit CT SDMs, including an improved quantizer model and an added delay element in the feedback path to represent the nonzero excess loop delay. An algorithm is also presented to generate the root-locus diagram of CT SDM systems with time delay elements. As an example, the stability of a class of third-order CT SDMs is analyzed and the stability boundary of the loop filter parameters is derived. Extensive time-domain simulations are conducted to verify the method.

  • 5.9 GHz Inductor-Less Low Noise Amplifier

    Publication Year: 2006 , Page(s): 17

    This paper presents a novel way to design an inductor-less low noise amplifier. The LNA uses an artificial tank (AT) built from a damped ring oscillator (RO) to replace the LC-tank normally used in tuned circuits. The measured results show the LNA has an 18 dB voltage gain, a 2.3 dB noise figure, an IP3 of -11 dBm, a 1 dB compression point of -19 dBm, and a power consumption of 4.85 mW. The main advantage of using the AT is to minimize the silicon area and to operate at a high frequency of 5.9 GHz. The prototype, made using CMOS 0.18-micron technology, occupied one-fiftieth of the silicon area generally used by circuits of this type.

  • Improving a 3 Data-Source Diagnostic Method

    Publication Year: 2006 , Page(s): 149 - 152

    In this paper, we present improvements to a diagnosis method for bridging faults combining three different data sources. The first data source is a set of IDDQ measurements used to identify the most probable fault type. The second source is a list of parasitic capacitances extracted from layout and used to create a list of realistic potential bridging fault sites. The third source is logical faults detectable at the primary outputs (including scan flip flops), used to limit the number of suspected gates. Combining these data significantly reduces the number of potential fault sites to consider in the diagnosis process. The modifications proposed in this paper allow the technique to be even more suitable for very large devices. Results obtained with different circuits are provided.

  • A New Architecture of Adiabatic Reversible Logic Gates

    Publication Year: 2006 , Page(s): 233 - 236
    Cited by:  Papers (1)

    This paper presents a new architecture of reversible Toffoli family gates based on the concept of adiabatic circuits. In particular, the applicability of reversible energy recovery logic (RERL) is investigated in detail. Such an approach presents plausible reversible adiabatic logic constructed in CMOS technology, creating an alternative approach to the reduction of power consumption. Simulations indicate that the adiabatic implementation of reversible logic circuits in low-speed operation consumes much less energy than complementary static CMOS circuits.

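    For readers unfamiliar with the gate family named in the abstract above, the following sketch checks the logical reversibility of the standard three-input Toffoli gate (the target bit is flipped when both control bits are 1). It models only the Boolean function, not the adiabatic or RERL circuit described in the paper, and the function name is an illustrative choice.

```python
from itertools import product


def toffoli(a, b, c):
    """Standard Toffoli gate: controls a and b pass through, c is XORed with a AND b."""
    return a, b, c ^ (a & b)


states = list(product((0, 1), repeat=3))
images = [toffoli(*s) for s in states]

# Reversible means the mapping is a bijection on the 8 input states.
is_bijection = sorted(images) == states
# The gate is also its own inverse: applying it twice restores the input.
is_self_inverse = all(toffoli(*toffoli(*s)) == s for s in states)
print(is_bijection, is_self_inverse)  # prints True True
```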
  • On the Design of a Double Precision Logarithmic Number System Arithmetic Unit

    Publication Year: 2006 , Page(s): 241 - 244
    Cited by:  Papers (1)

    This paper investigates the integration of a 64-bit LNS arithmetic unit into a conventional microprocessor. The goals are to devise an LNS unit that can be faster than an FPU for a broad range of applications, and to minimize the added hardware. Two ways of implementing the logarithmic sum and difference functions are studied. One way uses higher-order Taylor series implemented by look-up tables and interpolation, while the other is based on a CORDIC engine. It is shown that a look-up table based implementation is fairly competitive with a floating-point unit in terms of clock rate, overall latency and repeat rate, at the expense of some cache pressure, while the CORDIC-based implementation is fast, has a repeat rate of one clock cycle, and supports complex operations but at the cost of a higher gate count.

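    The "logarithmic sum and difference functions" mentioned in the abstract above are the expensive part of logarithmic number system (LNS) arithmetic: multiplication is a plain addition of logarithms, but addition and subtraction need log2(1 + 2^z) and log2(1 - 2^z). A minimal floating-point sketch of these identities follows. It assumes positive operands and base 2, ignores sign handling and fixed-point encoding, and evaluates the functions directly with `math.log2` rather than by the table-based or CORDIC methods studied in the paper. All function names are hypothetical.

```python
import math


def lns_mul(x, y):
    """log2(a*b) from x = log2(a) and y = log2(b): a single addition."""
    return x + y


def lns_add(x, y):
    """log2(a+b) via the logarithmic sum function log2(1 + 2**z), z <= 0."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def lns_sub(x, y):
    """log2(a-b) via the logarithmic difference function; requires a > b."""
    if x <= y:
        raise ValueError("this sketch only handles a > b")
    return x + math.log2(1.0 - 2.0 ** (y - x))


x, y = math.log2(3.0), math.log2(5.0)
print(2 ** lns_add(x, y), 2 ** lns_mul(x, y), 2 ** lns_sub(y, x))
```

    The printed values approximate 3 + 5, 3 * 5 and 5 - 3 up to floating-point rounding.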
  • Building Heterogeneous Functional Prototypes Using Articulated Interfaces

    Publication Year: 2006 , Page(s): 137 - 140

    A functional prototype is an executable specification of an integrated system that can be used for early validation and verification. This prototype is assembled and verified using a variety of languages at different levels of abstraction. In order to preserve the integrity and executability of this heterogeneous specification, an articulated interface mechanism is presented. The appropriate decoupling of the interface provides an efficient and reusable communication component that can be used for both design refinement and verification. Experimentation shows that different parts of the model can be refined individually and exported to a target platform without compromising the prototype integrity. A verification environment is also proposed to perform unit verification in a heterogeneous environment.

  • SoC memory optimization using loop transformations

    Publication Year: 2006 , Page(s): 189 - 192

    In today's embedded systems, the memory hierarchy is rapidly becoming a major factor in terms of power, performance and area. This is especially true for embedded multimedia applications using temporary multi-dimensional arrays that are typically used to store intermediate results during multimedia processing. In this paper, we introduce a new buffer allocation method to replace these temporary arrays and we combine it with loop fusion and tiling. The simple and effective method we present simultaneously applies tiling with fusion to a set of loop nests. Then, it replaces temporary arrays with smaller buffers containing the useful data. These techniques optimize memory space and reduce the number of cache misses. Our buffer allocation method is implemented in the PIPS compiler and the experiments are run on the StepNP simulator. They show that our technique yields a significant reduction in the number of data cache misses (on average, the data cache miss ratio is decreased by 14.3%).

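    The combination described in the abstract above (fusing loop nests, tiling them, and replacing a full temporary array with a small buffer) can be sketched on a toy one-dimensional pipeline. This is an illustration of the general transformation only, not the allocation method implemented in PIPS. The two-stage computation, the function names and the tile size are all hypothetical.

```python
def two_pass(a):
    """Unoptimized form: the producer fills a full-size temporary array."""
    tmp = [x * x for x in a]                       # temporary array of len(a)
    return [tmp[i] + tmp[i + 1] for i in range(len(a) - 1)]


def fused_tiled(a, tile=4):
    """Fused and tiled form: each tile keeps only tile + 1 producer values."""
    n = len(a)
    out = [0] * max(n - 1, 0)
    for start in range(0, n - 1, tile):
        end = min(start + tile, n - 1)
        # Small buffer with just the producer values this tile consumes.
        # One value per tile boundary is recomputed instead of stored.
        buf = [a[i] * a[i] for i in range(start, end + 1)]
        for i in range(start, end):
            out[i] = buf[i - start] + buf[i - start + 1]
    return out


data = list(range(10))
print(two_pass(data) == fused_tiled(data, tile=4))  # prints True
```

    In this sketch the working set per tile is tile + 1 values instead of the whole temporary array, which is the property that reduces memory space and cache misses in the general technique.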
  • Automatic Generation of Embedded Systems with .NET Framework Based Tools

    Publication Year: 2006 , Page(s): 165 - 168
    Cited by:  Papers (2)

    The typical embedded system design flow has several shortcomings, mainly because of the distinction between the hardware and software flows and the need to manually refine high-level specifications to low-level descriptions to be implemented. We propose a methodology based on the .NET framework, where a high abstraction level specification is automatically refined to a custom hardware block and a software executable. We demonstrate this methodology on an image processing application. It targets an FPGA-based system, composed of a soft-core Microblaze processor for the software, and reconfigurable logic for the hardware.

  • Low Dead Time, Multi-hit FPGA-Based Time-to-Digital Converter

    Publication Year: 2006 , Page(s): 29 - 32
    Cited by:  Papers (3)

    This paper presents improvements on a novel FPGA-based multi-hit time-to-digital converter (TDC) to measure time intervals with a resolution of 100 ps and a variable dynamic range controlled by a binary coarse counter. We use a matrix topology to provide a two-level resolution, aiming to minimize the overall measurement time. The conventional dead time is eliminated by the continuous detection and processing of data by two delay matrices operating in parallel. A back-resetting scheme eliminates the erroneous multi-detection of an event along matrix tap lines. The circuit was tested on a Xilinx Spartan-3 FPGA platform.

  • Real Time ELA De-Interlacing with the Xtensa Reconfigurable Processor

    Publication Year: 2006 , Page(s): 25 - 28
    Cited by:  Papers (1)

    This paper proposes optimization techniques to accelerate the enhanced edge-based line average (ELA) de-interlacing method. ELA is based on edge detection and directional interpolation as well as median filtering. The techniques are first based on low-level software optimizations to accelerate loops and arithmetic operations. Specialized hardware structures and corresponding new instructions are then defined for the Xtensa reconfigurable processor to accelerate ELA-specific operations. The combined software and hardware techniques result in a speed-up of 67x when compared to a base case. This accelerates the processing time from 25 times slower than real time to 2.7 times faster for an NTSC frame rate. A parallel processing version of ELA is also discussed.

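    The directional interpolation at the core of ELA, mentioned in the abstract above, can be sketched for a single missing scan line. This is the basic three-direction form only: it omits the edge-detection refinements and median filtering of the enhanced method, and none of the Xtensa-specific optimizations are modelled. The function name, the tie-breaking rule and the sample pixel values are assumptions made for illustration.

```python
def ela_interpolate_line(upper, lower):
    """Interpolate the missing line between two known lines of equal length.

    For each pixel, compare the three directions through it (vertical and
    both diagonals) and average along the one with the smallest difference.
    """
    n = len(upper)
    out = []
    for x in range(n):
        best_diff, best_val = None, None
        for d in (0, -1, 1):              # vertical first, so ties stay vertical
            xu, xl = x + d, x - d         # matching pixels above and below
            if 0 <= xu < n and 0 <= xl < n:
                diff = abs(upper[xu] - lower[xl])
                if best_diff is None or diff < best_diff:
                    best_diff = diff
                    best_val = (upper[xu] + lower[xl]) // 2
        out.append(best_val)
    return out


# A diagonal edge: plain vertical averaging would blur pixels 1 and 2 to 105.
upper = [10, 10, 10, 200, 200]
lower = [10, 200, 200, 200, 200]
print(ela_interpolate_line(upper, lower))  # prints [10, 10, 200, 200, 200]
```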
  • A 3-10 GHz Low-Noise Amplifier for Ultra-Wideband Applications

    Publication Year: 2006 , Page(s): 9 - 12
    Cited by:  Papers (1)

    This paper presents a new methodology for the design of low noise amplifiers for ultra-wideband applications. Using the unwanted effect of the gate-drain capacitor of transistors on the input impedance to our benefit, the operation of the conventional narrowband LNA is extended to provide a very good input matching from 3 GHz to 10 GHz. Using a triple-resonance circuit as the drain impedance, a relatively flat gain is also achieved over the same operation band. A power gain of 8 dB, with good input and output matching (S11 < -14 dB and S22 < -14 dB), is achieved over a 3 to 10 GHz band in 0.13 µm CMOS technology.

  • Split H-tree Design Method for High-Performance GALS Systems

    Publication Year: 2006 , Page(s): 161 - 164

    Clocking an entire chip through a conventional H-tree clock distribution network in deep sub-micron technologies is becoming increasingly difficult. Various timing constraints, including skew budget due to process variations, put a limit on the fastest speed a chip can work at. In this paper, a design methodology is proposed to relax the timing constraints by a substantial factor, up to 6 times. This performance is achieved by splitting the H-tree into smaller modules and making the different modules communicate via mesochronous or asynchronous interfaces; this type of system is commonly known as globally asynchronous locally synchronous (GALS). A closed-form mathematical model for the length of interconnects, after successive splittings, is formulated. Simulation results support our analytical findings that split H-trees allow clocking chips of a particular die size at a higher frequency. Moreover, a comparison of different asynchronous communication mechanisms is carried out to suggest an optimum design based on the target system performance.

  • Low-Cost Stable a-Si:H AMOLED Display for Portable Applications

    Publication Year: 2006 , Page(s): 97 - 100

    A large sector of the display market comprises portable devices including cell phones, personal organizers, PDAs, portable electronic games, etc. Important design considerations for displays employed in these applications are power consumption and cost. Hydrogenated amorphous silicon (a-Si:H) active matrix organic light emitting diode (AMOLED) displays are a promising technology for these applications. However, the a-Si:H AMOLED backplane suffers from temporal instability. Although several stable driving schemes have been proposed, they suffer from high implementation cost due to extra driving circuitry and high power consumption due to additional operating cycles. This paper presents a new driving scheme that provides a stable AMOLED display despite the aging effects in the a-Si:H thin film transistors and OLED, without increasing the driving complexity.

  • Parasitic-aware Delay Optimization for Multi-GHz Static CMOS Ring Oscillators

    Publication Year: 2006 , Page(s): 101 - 104
    Cited by:  Papers (1)

    A method for optimizing the oscillation frequency of a multi-GHz ring oscillator composed of static CMOS inverters is derived. The derivation is based on the propagation delay of each inverter stage, and it includes the effects of interconnect parasitics and gate resistance. The method was verified with a 90-nm SOI CMOS process through layout extraction and simulation for predicting the underlying factors in delay as the inverter's devices were scaled.

  • Miniaturization of a Piezo-Actuation System Embedded in an Instrumented Autonomous Robot

    Publication Year: 2006 , Page(s): 261 - 264

    The paper introduces a miniaturization approach for the piezo-actuation and related systems embedded in a new version of a miniature autonomous robot named NanoWalker for operations at the molecular scale. Since the throughput of such a platform is determined in great part not only by the number of robots operating simultaneously but also by the density of robots, a higher level of miniaturization would typically translate into a higher number of operations performed per second. First estimations suggest that the initial overall dimensions of 30 mm3 3 of the preceding version of the NanoWalker can be reduced to less than 10 mm3 for the future prototype of the robot.

  • Open-Loop Analysis of Cascode Compensation

    Publication Year: 2006 , Page(s): 81 - 84
    Cited by:  Papers (2)

    This paper presents a new open-loop analysis of the cascode compensation scheme, which is often used in high-speed two-stage CMOS operational amplifiers (opamps). In the proposed analysis, the effect of zeros in the transfer function is also considered. Analytical approaches in this paper show that cascode compensation is more power efficient than conventional Miller compensation, especially for large capacitive loads. Circuit-level simulation is performed to check the accuracy of the derived equations.

  • Polynomial Based Predistortion for Solid State Power Amplifier Nonlinearity Compensation

    Publication Year: 2006 , Page(s): 181 - 184
    Cited by:  Papers (1)

    This paper presents a novel polynomial expansion technique for solid state power amplifier (SSPA) linearization. We present a new nonlinear compensation method for AM/AM and AM/PM conversion characteristics which is applicable to other nonlinear systems. The effect of this linearization is considered in the performance of 64- and 256-level QAM signal transmission in an additive white Gaussian noise (AWGN) channel.

  • Modeling and Evaluation of an Energy-Efficient Hierarchical Ring Interconnect for System-on-Chip Multiprocessors

    Publication Year: 2006 , Page(s): 201 - 204
    Cited by:  Papers (1)

    This paper describes the modeling and optimization of a hierarchical ring interconnect for system-on-chip multiprocessors. We have selected hierarchical rings for study because they exhibit properties which lend themselves to efficient SoC interconnects. Using our model, we are able to tune certain design parameters in order to reduce energy consumption. We also use dynamic clock throttling, which efficiently reduces the energy consumption of the interconnect without adversely affecting system performance.

  • RoC: A Scalable Network on Chip Based on the Token Ring Concept

    Publication Year: 2006 , Page(s): 157
    Cited by:  Papers (1)

    A recent practice in the development of SoCs is the integration of interconnect networks, since integration offers significant bandwidth increases. This allows implementing multiprocessor systems that communicate more effectively than bus-based architectures. This paper proposes a rotator-on-chip (RoC) architecture as a new network-on-chip based on the token ring concept. This scalable network has been integrated into a system level exploration platform for characterization. Increased performance is confirmed and improvements are proposed to decrease packet latency through the network. Results show that the RoC supports a working load of 82%, compared to 58% for the hot potato mesh network and 28% for the SPIN fat tree network.

  • A New Fast Facial Recognition Algorithm Applicable to Large Databases

    Publication Year: 2006 , Page(s): 193 - 196

    In this contribution, a transform domain two-dimensional principal component analysis algorithm employing vector quantization (TD2DPCA/VQ) is presented for facial recognition, particularly for large databases. The algorithm has attractive properties with respect to storage requirements in the training mode and the computational complexity in the testing mode. The experimental results obtained by applying the new algorithm to the ORL database confirmed the significant reduction in the storage and computational requirements while improving the excellent recognition accuracy of the spatial 2DPCA method.

  • A Unified HW/SW Interface Refinement Approach for MPSoC Design

    Publication Year: 2006 , Page(s): 185 - 188

    We introduce the service-based component model as a unifying concept to specify and refine the HW/SW interface in MPSoC designs. The model allows encompassing the intricate dependencies between hardware components and low-level system software in a structured, component-based approach. Based on this model, we propose a method and tools to automate the refinement of abstract HW/SW interfaces using a predefined component library. The main benefit of such a refinement methodology is a seamless HW/SW integration allowing efficient customization of the HW/SW interface. The approach was successfully applied to the design of an MPEG-4 video encoder.
