Publication Year: 2017



Editorial

Publication Year: 2017

Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces

Publication Year: 2017


Many embedded applications process large amounts of data using regular computational kernels, amenable to acceleration by specialized hardware coprocessors. To reduce the significant design effort, the dedicated hardware may be automatically generated, usually starting from the application's source or binary code. This paper presents a modulo-scheduled loop accelerator capable of executing multiple...
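As background for the abstract above: in modulo scheduling, a new loop iteration is launched every II (initiation interval) cycles, and a scheduler usually starts from the lower bound MII = max(ResMII, RecMII). The sketch below computes that bound for a hypothetical loop body. The operation kinds, unit counts, recurrence, and the function names `res_mii`/`rec_mii` are illustrative assumptions, not the paper's accelerator model.

```python
import math
from collections import Counter

def res_mii(ops, units):
    """Resource-constrained bound: for each resource, ceil(uses per iteration / available units)."""
    demand = Counter(ops)
    return max(math.ceil(count / units[kind]) for kind, count in demand.items())

def rec_mii(cycles):
    """Recurrence-constrained bound: for each dependence cycle, ceil(total latency / total iteration distance)."""
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=1)

# Hypothetical loop body: 5 ALU ops, 2 multiplies, 3 memory accesses per iteration.
body = ["alu"] * 5 + ["mul"] * 2 + ["mem"] * 3
fus = {"alu": 2, "mul": 1, "mem": 1}
# Hypothetical recurrence: a 4-cycle dependence chain carried to the next iteration (distance 1).
mii = max(res_mii(body, fus), rec_mii([(4, 1)]))
print(mii)  # ResMII = max(3, 2, 3) = 3, RecMII = 4, so MII = 4
```

In this example the recurrence, not the functional units, limits throughput, so adding ALUs would not lower the II.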

EBSCam: Background Subtraction for Ubiquitous Computing

Publication Year: 2017


Background subtraction (BS) is a crucial machine vision scheme for detecting moving objects in a scene. With the advent of smart cameras, the embedded implementation of BS finds ever-increasing applications. This paper presents a new BS scheme called efficient BS for smart cameras (EBSCam). EBSCam thresholds the change in the estimated background model, which suppresses variance of the estimates, ...

An Efficient Component for Designing Signed Reverse Converters for a Class of RNS Moduli Sets of Composite Form $\{2^{k}, 2^{P}-1\}$

Publication Year: 2017


The application of residue number system (RNS) to digital signal processing lies in the ability to operate on signed numbers. However, the available RNS-to-binary (reverse) converters have been designed for unsigned numbers, which means that they do not produce signed outputs. Usually, some additional circuits are introduced at the output of the reverse converter to map the unsigned generated outp...
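For two-moduli sets such as $\{2^{k}, 2^{P}-1\}$ (`p` in the code), a textbook reverse conversion uses the Chinese remainder theorem in mixed-radix form. Because $2^{P} \equiv 1 \pmod{2^{P}-1}$, the modular inverse of $2^{k}$ it needs is itself a power of two, i.e. a bit rotation in hardware. The sketch below models that unsigned conversion followed by the conventional signed mapping the abstract describes as additional output circuitry. It is not the paper's proposed component, and the function names are hypothetical.

```python
def to_rns(x, k, p):
    """Binary-to-RNS: residues of a (possibly negative) integer modulo 2^k and 2^p - 1."""
    return x % (1 << k), x % ((1 << p) - 1)

def reverse_signed(r1, r2, k, p):
    """RNS-to-binary for {2^k, 2^p - 1}, then map the result to the signed range [-M/2, M/2)."""
    m1, m2 = 1 << k, (1 << p) - 1
    # 2^p = 1 (mod 2^p - 1), so the inverse of 2^k is 2^((-k) mod p):
    # multiplying by it modulo 2^p - 1 is an end-around rotation of a p-bit word.
    inv = pow(2, (-k) % p, m2)
    t = ((r2 - r1) * inv) % m2
    x = r1 + m1 * t            # unsigned result in [0, M), M = 2^k (2^p - 1)
    big_m = m1 * m2
    # Signed mapping: the upper half of [0, M) represents negative numbers.
    return x - big_m if x >= big_m // 2 else x

print(reverse_signed(*to_rns(-37, 4, 3), 4, 3))  # -37
```

The final comparison and subtraction correspond to the "additional circuits" the abstract mentions; the paper's contribution is a component that avoids treating them as a separate stage.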

VLSI Extreme Learning Machine: A Design Space Exploration

Publication Year: 2017


In this paper, we describe a compact low-power high-performance hardware implementation of extreme learning machine for machine learning applications. Mismatches in current mirrors are used to perform the vector-matrix multiplication that forms the first stage of this classifier and is the most computationally intensive. Both regression and classification (on UCI data sets) are demonstrated and a ...

Sign-Magnitude Encoding for Efficient VLSI Realization of Decimal Multiplication

Publication Year: 2017


Decimal X × Y multiplication is a complex operation, where intermediate partial products (IPPs) are commonly selected from a set of precomputed radix-10 X multiples. Some works require only [0, 5] × X via recoding digits of Y to one-hot representation of signed digits in [-5,5]. This reduces the selection logic at the cost of one extra IPP. Two's complement signed-digit (TCSD) encodi...
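The digit recoding mentioned in the abstract can be modeled at the integer level. Each decimal digit of Y, plus an incoming carry, is mapped into [-5, 5], so only the multiples 0×X through 5×X are needed and a negative digit just flips the sign of a selected multiple. The sketch below uses hypothetical function names and one common recoding rule, not necessarily the paper's TCSD encoding. It also shows where the extra IPP comes from: the final carry can add one more digit.

```python
def recode_signed_digits(y):
    """Recode a non-negative integer into decimal signed digits in [-5, 5], least significant first."""
    digits, carry = [], 0
    while y > 0 or carry:
        d = y % 10 + carry
        y //= 10
        if d > 5:
            digits.append(d - 10)  # use a negative digit and carry 1 into the next position
            carry = 1
        else:
            digits.append(d)
            carry = 0
    return digits or [0]

def decimal_multiply(x, y):
    """X * Y using only the precomputed multiples 0*X .. 5*X, selected by the recoded digits of Y."""
    multiples = [m * x for m in range(6)]
    product = 0
    for i, s in enumerate(recode_signed_digits(y)):
        ipp = multiples[abs(s)]                       # select the magnitude
        product += (ipp if s >= 0 else -ipp) * 10**i  # apply the sign and the digit weight
    return product

print(recode_signed_digits(789))  # [-1, -1, -2, 1]: three digits of Y become four IPPs
print(decimal_multiply(1234, 789) == 1234 * 789)  # True
```

A hardware design would operate on BCD digits and a one-hot digit encoding; this integer model only checks the arithmetic of the recoding.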

Efficient Soft Cancelation Decoder Architectures for Polar Codes

Publication Year: 2017


The flooding belief propagation (FO-BP) and the soft-cancelation (SCAN) algorithms are the two most popular soft-output BP algorithms for the decoding of capacity-achieving polar codes. The FO-BP algorithm has high throughput at the cost of performance degradation in high signal-to-noise ratio (SNR) region or with large block length. The SCAN algorithm has much better decoding performance while su...

Hybrid Hardware/Software Floating-Point Implementations for Optimized Area and Throughput Tradeoffs

Publication Year: 2017


Hybrid floating-point (FP) implementations improve software FP performance without incurring the area overhead of full hardware FP units. The proposed implementations are synthesized in 65-nm CMOS and integrated into small fixed-point processors with a RISC-like architecture. Unsigned, shift carry, and leading zero detection (USL) support is added to a processor to augment an existing instruction ...
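To see why leading zero detection matters for software FP: after an addition or subtraction, the result mantissa must be renormalized, and without a dedicated instruction the leading-zero count is found one bit per loop iteration. The sketch below contrasts the two approaches on a 32-bit word. The word width, the helper names, and the use of Python's `int.bit_length` to stand in for a single-instruction LZD are all assumptions for illustration.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

def clz_loop(x):
    """Leading-zero count by repeated shift-and-test, as a core without an LZD instruction must do."""
    if x == 0:
        return WIDTH
    n = 0
    while not (x & (1 << (WIDTH - 1))):
        x = (x << 1) & MASK
        n += 1
    return n

def clz_builtin(x):
    """Leading-zero count in one step, standing in for a hardware LZD instruction."""
    return WIDTH - x.bit_length()

def normalize(mantissa, exponent, clz=clz_builtin):
    """Shift a nonzero mantissa left until its MSB is set, lowering the exponent to keep the value."""
    lz = clz(mantissa)
    return (mantissa << lz) & MASK, exponent - lz

m, e = normalize(0x12345, 10)
print(hex(m), e)  # 0x91a28000 -5
```

`clz_loop` needs up to 31 iterations for a small result, which is the kind of overhead that instruction-set support for leading zero detection is meant to remove.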

A 2.5-ps Bin Size and 6.7-ps Resolution FPGA Time-to-Digital Converter Based on Delay Wrapping and Averaging

Publication Year: 2017


A high-resolution time-to-digital converter (TDC) implemented on a field-programmable gate array (FPGA) and based on delay wrapping and averaging is presented. The fundamental idea is to pass a single clock through a series of delay elements to generate multiple reference clocks with different phases for input time quantization. Due to periodicity, those phases will be equivalently wrapped within one ...
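The wrapping idea can be illustrated numerically. If a clock edge passes through a chain of delay taps, each tap's cumulative delay taken modulo the clock period gives a phase inside one period, and taps from later "laps" fall between those of earlier laps, so the effective bin can be much smaller than a single tap delay. The sketch below uses hypothetical integer delays in picoseconds (a 1000-ps period and uniform 43-ps taps). It models only the wrapping, not the paper's averaging step or real FPGA delay nonuniformity.

```python
def wrapped_bins(period_ps, tap_delays_ps):
    """Fold cumulative tap delays into one clock period; return sorted phases and the bin widths between them."""
    phases, t = [0], 0           # phase 0 is the undelayed reference clock
    for d in tap_delays_ps:
        t += d
        phases.append(t % period_ps)
    phases.sort()
    bins = [b - a for a, b in zip(phases, phases[1:])]
    bins.append(period_ps - phases[-1] + phases[0])  # wrap-around gap back to the first phase
    return phases, bins

# Hypothetical chains of 23, 46, and 93 taps (about one, two, and four laps of the period).
for taps in (23, 46, 93):
    _, bins = wrapped_bins(1000, [43] * taps)
    print(taps, max(bins))  # prints 23 43, then 46 32, then 93 11
```

With uniform taps, each extra lap lands at a fixed offset from the previous one, so the worst-case bin shrinks roughly in proportion to the number of laps.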

Subthreshold Operation of CAAC-IGZO FPGA by Overdriving of Programmable Routing Switch and Programmable Power Switch

Publication Year: 2017


A field-programmable gate array (FPGA) using a crystalline oxide semiconductor of c-axis-aligned crystal indium-gallium-zinc oxide (CAAC-IGZO) has been developed, which is capable of subthreshold operation used for energy harvesting. To achieve subthreshold operation, the CAAC-IGZO FPGA has a structure designed as an extension of a boosting pass gate using a CAAC-IGZO FET and employs overdriving o...

Efficient Designs of Multiported Memory on FPGA

Publication Year: 2017


The utilization of block RAMs (BRAMs) is a critical performance factor for multiported memory designs on field-programmable gate arrays (FPGAs). Not only does the excessive demand on BRAMs block the usage of BRAMs from other parts of a design, but the complex routing between BRAMs and logic also limits the operating frequency. This paper first introduces a brand new perspective and a more efficien...

Floorplanning Automation for Partial-Reconfigurable FPGAs via Feasible Placements Generation

Publication Year: 2017


When dealing with partially reconfigurable designs on field-programmable gate arrays, floorplanning is a critical step that strongly affects the system's performance and reconfiguration overhead. However, current vendor design tools still require the designer to define the floorplan manually. In this paper, we provide a novel floorplanning automation framework, integrated in the Xilinx...

High-Speed and Low-Latency ECC Processor Implementation Over GF($2^{m}$) on FPGA

Publication Year: 2017


In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency resulting in a drastic reduction in t...
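Point multiplication over GF($2^{m}$) ultimately reduces to field multiplications: a carry-less (XOR) polynomial product reduced modulo an irreducible polynomial of degree m. The bit-serial sketch below is a software reference model of that operation, not the paper's segmented pipelined multiplier; it could serve as a golden model when verifying a hardware multiplier. The example values use the AES field GF($2^{8}$) with $x^8+x^4+x^3+x+1$ (the FIPS-197 example $\{57\}\cdot\{83\}=\{c1\}$) and the NIST B-163 reduction polynomial $x^{163}+x^7+x^6+x^3+1$.

```python
import random

def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), assuming both are already reduced (< 2^m).
    Bit i of an int is the coefficient of x^i; poly includes its x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # field addition is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:         # degree reached m: subtract (XOR) the reduction polynomial
            a ^= poly
    return result

AES_POLY = 0x11B                                              # x^8 + x^4 + x^3 + x + 1
B163_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1

print(hex(gf2m_mul(0x57, 0x83, AES_POLY, 8)))  # 0xc1

x, y, z = (random.getrandbits(163) for _ in range(3))
assert gf2m_mul(x, y, B163_POLY, 163) == gf2m_mul(y, x, B163_POLY, 163)  # commutativity check
```

A bit-serial loop like this takes m iterations per product; hardware multipliers such as the one in the abstract process many bits per cycle to cut that latency.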

ENFIRE: A Spatio-Temporal Fine-Grained Reconfigurable Hardware

Publication Year: 2017


Field programmable gate arrays (FPGAs) are well-established as fine-grained reconfigurable computing platforms. However, FPGAs demonstrate poor scalability in advanced technology nodes due to the large negative impact of the elaborate programmable interconnects (PIs). The need for such vast PIs arises from two key factors: 1) fine-grained bit-level data manipulation in the configurable logic block...

Temporarily Fine-Grained Sleep Technique for Near- and Subthreshold Parallel Architectures

Publication Year: 2017


This paper presents a design approach for improving energy-efficiency and throughput of parallel architectures in near- and subthreshold voltage circuits. The focus is to suppress leakage energy dissipation of the idle portions of circuits during active modes, which can allow us to wholly transform the throughput improvement from parallel architectures into energy savings via deep voltage scaling....

OptiFEX: A Framework for Exploring Area-Efficient Floating Point Expressions on FPGAs With Optimized Exponent/Mantissa Widths

Publication Year: 2017
Cited by: Papers (1)


Field-programmable gate arrays (FPGAs) could outperform microprocessors on floating point computations due to massive parallelism, freedom on the selection of exponent/mantissa width, and utilization of simplified adders and multipliers. However, optimized use of resources and accuracy of the final implemented expression are two important issues in the implementation of floating point arithmetic e...
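The accuracy side of the exponent/mantissa trade-off can be explored in software before any hardware is built: evaluate the expression with every intermediate result rounded to a chosen mantissa width and compare against double precision. The sketch below does this for a hypothetical expression a·b + c. It assumes round-to-nearest-even and an unlimited exponent range, so it models mantissa width only and says nothing about OptiFEX's own area or error models.

```python
import math

def quantize(x, mant_bits):
    """Round x to mant_bits fraction bits (plus the hidden leading bit), round-to-nearest-even.
    The exponent range is not limited, so overflow and underflow are not modeled."""
    if x == 0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << (mant_bits + 1)       # mant_bits + 1 significant bits
    return math.ldexp(round(m * scale) / scale, e)  # round() ties to even

def eval_expr(a, b, c, mant_bits):
    """Evaluate a*b + c, rounding the inputs and every intermediate result to mant_bits."""
    def q(v):
        return quantize(v, mant_bits)
    return q(q(q(a) * q(b)) + q(c))

a, b, c = 1.1, 2.3, 0.7   # hypothetical operand values
exact = a * b + c
for w in (8, 12, 16, 23):
    approx = eval_expr(a, b, c, w)
    print(w, approx, abs(approx - exact) / abs(exact))
```

Sweeping `w` per operator, rather than globally as here, is what would let a tool trade area against error operator by operator.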

Logic-Base Interconnect Design for Near Memory Computing in the Smart Memory Cube

Publication Year: 2017


Hybrid memory cube (HMC) has promised to improve bandwidth, power consumption, and density for the next-generation main memory systems. In addition, 3-D integration gives a second shot for revisiting near memory computation to fill the gap between processors and memories. In this paper, we study the required infrastructure inside the HMC to support near memory computation in a modular and flexible...

A Fault Tolerance Technique for Combinational Circuits Based on Selective-Transistor Redundancy

Publication Year: 2017
Cited by: Papers (2)


With fabrication technology reaching nanolevels, systems are becoming more prone to manufacturing defects with higher susceptibility to soft errors. This paper is focused on designing combinational circuits for soft error tolerance with minimal area overhead. The idea is based on analyzing random pattern testability of faults in a circuit and protecting sensitive transistors, whose soft error dete...

Scalable Approach for Power Droop Reduction During Scan-Based Logic BIST

Publication Year: 2017


The generation of significant power droop (PD) during at-speed test performed by Logic Built-In Self Test (LBIST) is a serious concern for modern ICs. In fact, the PD originated during test may delay signal transitions of the circuit under test (CUT): an effect that may be erroneously recognized as delay faults, with consequent erroneous generation of test fails and increase in yield loss. In this...

Soft Error Rate Reduction of Combinational Circuits Using Gate Sizing in the Presence of Process Variations

Publication Year: 2017
Cited by: Papers (1)


Soft errors in combinational logic circuits are emerging as a significant reliability concern for nanoscale VLSI designs. This paper presents a novel sensitivity-based gate sizing methodology to reduce the soft error rate (SER) of combinational circuits in the presence of process variations. The proposed method is based on modeling the statistics of SER of the circuit gates as a random variable to...

Parallel High-Order Envelope-Following Method for Fast Transient Analysis of Highly Oscillatory Circuits

Publication Year: 2017


In this paper, a parallel high-order envelope-following (EF) method is presented. The proposed method exploits the high-order and A-stable Obreshkov formula (ObF) to provide superior accuracy and speedup for the EF technique. Utilizing ObF provides accurate and faster analysis while keeping the same accuracy as the conventional low-order integration methods. In addition, a parallel method that is ...

Lithography Defect Probability and Its Application to Physical Design Optimization

Publication Year: 2017


Modern standard cells contain intercell margins at the left and right ends for better lithography. We introduce defect probability, which is the probability that a lithography defect occurs if the margins between two adjacent cells are missing. Computing the defect probability of all cell pairs is impractical due to lengthy lithography simulations and the huge number of cell pair combinations. Two app...

Modeling Size Limitations of Resistive Crossbar Array With Cell Selectors

Publication Year: 2017


Due to recent developments in emerging memory technologies, resistive crossbar arrays have gained increasing importance. The size of the crossbar arrays is, however, limited due to challenges brought by the interconnect resistance, sneak path currents, and the physical area of the peripheral circuitry. In this paper, three figures of merit that characterize the limitations of resistive crossbar ar...

