• ### Symbiote Coprocessor Unit—A Streaming Coprocessor for Data Stream Acceleration

Publication Year: 2016, Page(s):813 - 826
This paper describes the design and architecture of the symbiote coprocessor unit (SCU), a programmable streaming coprocessor for heterogeneous reconfigurable-logic-assisted data stream management systems (DSMSs) such as Symbiote. The SCU is intended for streaming applications with real-time event and data processing that have strict deadlines and high-bandwidth and high-accuracy requirements. To...

• ### A Novel Quantum-Dot Cellular Automata $X$-bit $\times$ 32-bit SRAM

Publication Year: 2016, Page(s):827 - 836
Quantum-dot cellular automata (QCA) technology is a promising nanoscale alternative to CMOS technology and an interesting technology for building memory. A new QCA-based memory cell structure with minimal delay, area, and complexity is designed and simulated to implement a static random access memory (SRAM). This paper presents ...

• ### Hardware Accelerator for Probabilistic Inference in 65-nm CMOS

Publication Year: 2016, Page(s):837 - 845
A hardware accelerator is presented to compute the probabilistic inference for a Bayesian network (BN) in distributed sensing applications. For energy efficiency, the accelerator is operated at a near-threshold voltage of 0.5 V, while achieving a maximum clock frequency of 33 MHz. The clique-tree message-passing algorithm is used to compute the probabilistic inference. The theoretical maximum siz...
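Clique-tree message passing reduces exact inference to local multiply-and-marginalize steps exchanged between cliques. As a minimal illustration only (not the accelerator's datapath or its BN), the sketch below assumes a hypothetical three-node chain network $A \to B \to C$ with made-up conditional probability tables, builds the two-clique tree with cliques $\{A,B\}$ and $\{B,C\}$ joined by separator $B$, and computes $P(A \mid C = c)$. A brute-force enumeration of the joint distribution is included as a cross-check.

```python
from itertools import product

# Hypothetical CPTs for a chain Bayesian network A -> B -> C (all binary).
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {(0, 0): 0.7, (0, 1): 0.3,   # keys are (a, b)
               (1, 0): 0.2, (1, 1): 0.8}
P_C_given_B = {(0, 0): 0.9, (0, 1): 0.1,   # keys are (b, c)
               (1, 0): 0.4, (1, 1): 0.6}

def posterior_A_message_passing(c_obs):
    """P(A | C=c_obs) on the clique tree {A,B} - {B,C} with separator B."""
    # Clique {B,C} absorbs the evidence and sends a message over separator B:
    # m_21(b) = sum_c P(c|b) * [c == c_obs] = P(C=c_obs | b)
    m21 = {b: P_C_given_B[(b, c_obs)] for b in (0, 1)}
    # Clique {A,B} multiplies its potential P(a)P(b|a) by the incoming message,
    # then marginalizes out B to get an unnormalized belief over A.
    belief = {a: sum(P_A[a] * P_B_given_A[(a, b)] * m21[b] for b in (0, 1))
              for a in (0, 1)}
    z = sum(belief.values())  # z equals P(C=c_obs)
    return {a: belief[a] / z for a in (0, 1)}

def posterior_A_brute_force(c_obs):
    """Reference answer obtained by enumerating the full joint distribution."""
    joint = {a: 0.0 for a in (0, 1)}
    for a, b in product((0, 1), repeat=2):
        joint[a] += P_A[a] * P_B_given_A[(a, b)] * P_C_given_B[(b, c_obs)]
    z = sum(joint.values())
    return {a: joint[a] / z for a in (0, 1)}

print(posterior_A_message_passing(1), posterior_A_brute_force(1))
```

With these made-up tables, observing $C = 1$ raises $P(A = 1)$ from 0.4 to 4/7. On larger networks the same two operations (multiply by incoming messages, sum out non-separator variables) are simply repeated along the edges of a bigger clique tree.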

• ### Input-Based Dynamic Reconfiguration of Approximate Arithmetic Units for Video Encoding

Publication Year: 2016, Page(s):846 - 857
The field of approximate computing has received significant attention from the research community in the past few years, especially in the context of signal processing applications. Image and video compression algorithms such as JPEG and MPEG are particularly attractive candidates for approximate computing, since they are tolerant of computing imprecision due to human imperceptib...
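As one concrete example of an approximate arithmetic unit, the sketch below models a lower-part OR adder (LOA), a well-known design from the approximate computing literature. It is not necessarily one of the units used in this paper, and the paper's input-based reconfiguration policy is not reproduced; the number of approximate low bits `k` is simply exposed as a runtime knob. In hardware, ORing the low bits removes them from the carry chain, at the cost of a bounded error.

```python
def loa_add(x: int, y: int, k: int) -> int:
    """Approximate x + y (non-negative ints) with a lower-part OR adder.

    The k least significant bits are approximated by a bitwise OR; the upper
    bits are added exactly, using the AND of bit k-1 of both operands as the
    carry-in. k = 0 degenerates to an exact adder.
    """
    if k == 0:
        return x + y
    mask = (1 << k) - 1
    low = (x | y) & mask                              # approximate lower part
    carry_in = (x >> (k - 1)) & (y >> (k - 1)) & 1    # guessed carry into bit k
    high = ((x >> k) + (y >> k) + carry_in) << k      # exact upper part
    return high + low

def max_abs_error(k: int, bits: int = 6) -> int:
    """Worst-case |approx - exact| over all pairs of bits-wide unsigned operands."""
    n = 1 << bits
    return max(abs(loa_add(x, y, k) - (x + y)) for x in range(n) for y in range(n))

for k in range(4):
    print(k, max_abs_error(k))
```

For $k \ge 1$ the error equals (carry-in)$\cdot 2^k$ minus the AND of the low parts, so it stays within $2^{k-1}$ in magnitude; the exhaustive search above confirms this for small operands. A reconfigurable unit can therefore trade accuracy for carry-chain length by changing `k` at run time.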

• ### VANUCA: Enabling Near-Threshold Voltage Operation in Large-Capacity Cache

Publication Year: 2016, Page(s):858 - 870
In this paper, we investigate the feasibility of voltage adjustment in a large-capacity cache and propose the voltage-adaptable nonuniform cache access (VANUCA) architecture, which exploits near-threshold computing and multiple voltage domains to approach the limit of $V_{\mathrm{dd}}$ in a low-power cache. However, adopting a near-threshold voltage (NTV) leads to a sharply rising error probability in S...

• ### Write Buffer-Oriented Energy Reduction in the L1 Data Cache for Embedded Systems

Publication Year: 2016, Page(s):871 - 883
In resource-constrained embedded systems, on-chip cache memories play an important role in both performance and energy consumption. Compared with read operations, little attention has been paid to optimizing write operations, even though the energy consumed by write operations in the data cache constitutes a large portion of the total energy consumption. Consequently, this paper proposes a write buffe...

• ### Designing Tunable Subthreshold Logic Circuits Using Adaptive Feedback Equalization

Publication Year: 2016, Page(s):884 - 896
Ultralow-power subthreshold logic circuits are becoming prominent in embedded applications with limited energy budgets. The minimum energy consumption of digital logic circuits can be obtained by operating in the subthreshold regime. However, in this regime, process variations can cause up to an order-of-magnitude variation in $I_{\mathrm{ON}}/I_{\mathrm{OFF}}$ ratios, leading to timing errors, which c...

• ### Error Resilient and Energy Efficient MRF Message-Passing-Based Stereo Matching

Publication Year: 2016, Page(s):897 - 908
Message-passing-based inference algorithms are of great importance in real-world applications. In this paper, the error resiliency of message-passing-based Markov random field (MRF) stereo matching hardware is explored and enhanced through statistical error compensation. Error resiliency is of particular interest for subnanometer and postsilicon devices. The inherent robustness of...

• ### Process Variation Delay and Congestion Aware Routing Algorithm for Asynchronous NoC Design

Publication Year: 2016, Page(s):909 - 919
The effect of process variation (PV) on delay is a major cause of performance degradation in advanced technologies. The performance of different routing algorithms is determined with and without PV for various traffic patterns, using saturation throughput and average message delay as the performance metrics. PV decreases the saturation throughput and increases the av...

• ### DFSB-Based Thermal Management Scheme for 3-D NoC-Bus Architectures

Publication Year: 2016, Page(s):920 - 931
Three-dimensional network-on-chip (NoC)-bus hybrid architectures aim to achieve lower propagation latency and higher bandwidth in the vertical direction by taking advantage of the short interwafer distances in 3-D integrated circuits. However, 3-D integration technology increases the power density of the chip and thus causes thermal problems. Therefore, to ensure that the ...

• ### Low-Cost Multiple Bit Upset Correction in SRAM-Based FPGA Configuration Frames

Publication Year: 2016, Page(s):932 - 943
Radiation-induced multiple bit upsets (MBUs) are a major reliability concern in nanoscale technology nodes. Such errors in the configuration frames of a field-programmable gate array (FPGA) device permanently affect the functionality of the mapped design. Periodic configuration scrubbing combined with a low-cost error correction scheme is an efficient way to avoid such a perman...

• ### Adaptive Write and Shift Current Modulation for Process Variation Tolerance in Domain Wall Caches

Publication Year: 2016, Page(s):944 - 953
Domain wall memory (DWM), also known as racetrack memory, is gaining significant attention for embedded cache applications due to its low standby power, excellent retention, and ability to store multiple bits per cell. It also offers fast access time and good endurance. However, it suffers from high write latency, shift latency, shift power, and write power. In addition, we obse...

• ### Sequoia: A High-Endurance NVM-Based Cache Architecture

Publication Year: 2016, Page(s):954 - 967
Emerging nonvolatile memory technologies, such as spin-transfer torque RAM and resistive RAM, can increase the capacity of the last-level cache (LLC) in a latency- and power-efficient manner. These technologies endure $10^{9}$ to $10^{12}$ writes per cell, giving a nonvolatile cache (NV-cache) a lifetime of dozens of years under ideal working conditions. However, nonuniformity in write...
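The "dozens of years under ideal working conditions" figure follows from a simple wear-out calculation. The rough sketch below assumes perfect wear leveling (every cache line absorbs an equal share of writes, and one line write costs each cell in that line one write cycle); the cache size and write rate are hypothetical illustration values, not numbers from the paper. A second helper shows why write nonuniformity matters: if one hot line receives a fraction of all writes, that line wears out first.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def ideal_lifetime_years(endurance, num_lines, writes_per_sec):
    """Lifetime if writes are spread perfectly evenly over all lines."""
    return endurance * num_lines / writes_per_sec / SECONDS_PER_YEAR

def hot_line_lifetime_years(endurance, hot_fraction, writes_per_sec):
    """Lifetime if one line receives hot_fraction of all writes (it fails first)."""
    return endurance / (hot_fraction * writes_per_sec) / SECONDS_PER_YEAR

# Hypothetical LLC: 8 MB with 64-byte lines = 131072 lines, 1e8 line writes/s.
LINES, RATE = 131072, 1e8
for endurance in (1e9, 1e12):
    print(endurance,
          ideal_lifetime_years(endurance, LINES, RATE),
          hot_line_lifetime_years(endurance, 0.01, RATE))  # hot line gets 1% of writes
```

Under these assumed parameters, $10^{12}$ endurance gives roughly 41.5 years with ideal leveling, while $10^{9}$ gives only about 15 days; concentrating 1% of the writes on a single line shortens either figure by a factor of about 1300.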

• ### A Universal Hardware-Driven PVT and Layout-Aware Predictive Failure Analytics for SRAM

Publication Year: 2016, Page(s):968 - 978
The impact of device variability, temperature, and technology CAD-based layout parasitics on low-voltage static random access memory (SRAM) yield is explored using a novel variability-aware statistical methodology. Threshold voltage ($V_t$) mismatches for planar 22- and 14-nm FinFET SRAM transistors are characterized using unique array-like structures for capturing process, voltage, and t...

• ### An Information Theory Perspective for the Binary STT-MRAM Cell Operation Channel

Publication Year: 2016, Page(s):979 - 991
Spin-transfer torque magnetic random access memory (STT-MRAM) has emerged as a promising nonvolatile memory technology, with advantages such as scalability, speed, endurance, and low power consumption. This paper presents an STT-MRAM cell operation channel model, covering write and read operations, for information theorists and error correction code designers. This model considers the effects of process va...
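From an information-theory viewpoint, a binary memory cell whose write and read errors are asymmetric behaves like a binary asymmetric channel (BAC). The sketch below is a simplified stand-in for the paper's model: it ignores how process variation and other effects produce the error rates and just takes two hypothetical flip probabilities, `p01` (stored 0 read as 1) and `p10` (stored 1 read as 0), then computes the channel capacity by maximizing the mutual information over the input prior. Mutual information is concave in the input distribution, so a ternary search suffices.

```python
from math import log2

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_information(q, p01, p10):
    """I(X;Y) in bits for input prior P(X=1) = q.

    p01 = P(read 1 | stored 0), p10 = P(read 0 | stored 1).
    """
    p_y1 = (1 - q) * p01 + q * (1 - p10)
    return h2(p_y1) - ((1 - q) * h2(p01) + q * h2(p10))

def bac_capacity(p01, p10, iters=200):
    """Return (capacity in bits per cell, optimal P(X=1)) via ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if mutual_information(m1, p01, p10) < mutual_information(m2, p01, p10):
            lo = m1  # maximum lies to the right of m1
        else:
            hi = m2  # maximum lies to the left of m2
    q = (lo + hi) / 2
    return mutual_information(q, p01, p10), q

print(bac_capacity(0.11, 0.11))  # symmetric case: 1 - h2(0.11), at q = 0.5
print(bac_capacity(0.0, 0.5))    # Z-channel: log2(1.25), at q = 0.4
```

For asymmetric error rates the optimal input prior is no longer 0.5, which is one reason a channel model of this kind is useful to error correction code designers.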

• ### Embedding Read-Only Memory in Spin-Transfer Torque MRAM-Based On-Chip Caches

Publication Year: 2016, Page(s):992 - 1002
We propose a design technique for embedding read-only memory (ROM) in spin-transfer torque MRAM (STT-MRAM) arrays by adding an extra bit line to every column of the array. RAM and ROM data, which can differ, are stored in the same bitcell, and the ROM capacity can be as large as the RAM capacity. Furthermore, the proposed ROM-embedding technique is applicable to any resistive memory technolog...

• ### Modeling and Optimization of Memristor and STT-RAM-Based Memory for Low-Power Applications

Publication Year: 2016, Page(s):1003 - 1014
Conventional charge-based memories face major challenges in low-power applications, including leakage current in static random access memory (SRAM) and dynamic random access memory (DRAM), the additional refresh operation of DRAM, and the high programming voltage of Flash. In this paper, two emerging resistive random access memory (ReRAM) technologies are investigated, memri...

• ### All-Digital 90° Phase-Shift DLL With Dithering Jitter Suppression Scheme

Publication Year: 2016, Page(s):1015 - 1024
This paper proposes a 90° phase-shift delay-locked loop (DLL) for generating the data sampling clock in dynamic RAM. The proposed DLL alleviates the process variation issues of previous 90° phase-shift DLLs, which are mainly caused by mismatch between delay-line segments, and reduces area by adopting a multiplying-DLL-based structure. In addition, a novel jitter suppression ...

• ### An All-Digital Approach to Supply Noise Cancellation in Digital Phase-Locked Loop

Publication Year: 2016, Page(s):1025 - 1035
With increasing levels of integration in modern systems-on-chip, supply noise coupling into a phase-locked loop (PLL) has become the dominant source of performance degradation in many systems. In this paper, an all-digital approach to canceling the effects of supply noise is presented. By sensing the supply noise using an analog-to-digital converter (ADC), an observer-controller loop filter jo...

• ### Enhancing Model Order Reduction for Nonlinear Analog Circuit Simulation

Publication Year: 2016, Page(s):1036 - 1049
Traditionally, model order reduction methods have been used to reduce the computational complexity of mathematical models of dynamic systems while preserving their functional characteristics. This technique can also be used to speed up analog circuit simulations without sacrificing their highly nonlinear behavior. In this paper, we present an iterative approach for reducing the computational comple...

• ### Dual-Calibration Technique for Improving Static Linearity of Thermometer DACs for I/O

Publication Year: 2016, Page(s):1050 - 1058
In this paper, we propose a dual-calibration technique to improve the matching accuracy of digital-to-analog converter (DAC) elements and reduce nonlinearity-induced static errors in a current-steering thermometer DAC. The novelty of the proposed dual-calibration scheme lies in using redundancy to obtain the best samples from the error distribution for improved matching, followed by adaptively reorder...
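The two ideas named in the abstract, redundancy-based selection followed by reordering, can be illustrated generically. The sketch below is not the paper's algorithm: it assumes a hypothetical 6-bit thermometer DAC (63 unit elements) with 16 redundant elements and 2% Gaussian mismatch, keeps the 63 elements closest to the mean, and then greedily orders them so that each newly switched-on element best cancels the accumulated deviation from the ideal ramp. Integral nonlinearity (INL) is measured with an endpoint fit.

```python
import random

def inl_lsb(values):
    """Endpoint-fit INL, in LSB, when unit elements switch on in list order."""
    n = len(values)
    lsb = sum(values) / n                 # mean element value defines 1 LSB
    running, inl = 0.0, []
    for k, v in enumerate(values, start=1):
        running += v
        inl.append((running - k * lsb) / lsb)
    return inl

def select_best(values, n):
    """Use redundancy: keep the n elements closest to the mean of all fabricated ones."""
    mean = sum(values) / len(values)
    return sorted(values, key=lambda v: abs(v - mean))[:n]

def greedy_reorder(values):
    """Order elements so each new one best cancels the accumulated deviation."""
    lsb = sum(values) / len(values)
    remaining = list(values)
    ordered, dev = [], 0.0
    while remaining:
        best = min(remaining, key=lambda v: abs(dev + (v - lsb)))
        remaining.remove(best)
        ordered.append(best)
        dev += best - lsb
    return ordered

# Hypothetical example: 63 needed + 16 redundant elements, 2 % mismatch.
rng = random.Random(1)
fabricated = [1.0 + rng.gauss(0.0, 0.02) for _ in range(63 + 16)]
naive = fabricated[:63]                          # no calibration, fixed order
calibrated = greedy_reorder(select_best(fabricated, 63))
print(max(map(abs, inl_lsb(naive))), max(map(abs, inl_lsb(calibrated))))
```

Because the remaining deviations always sum to the negative of the accumulated one, the greedy choice keeps the running error within the largest single-element deviation, whereas a fixed order lets it wander like a random walk. The selection step then shrinks that largest deviation by discarding outliers.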

• ### DScanPUF: A Delay-Based Physical Unclonable Function Built Into Scan Chain

Publication Year: 2016, Page(s):1059 - 1070
Physical unclonable functions (PUFs) have emerged as an attractive primitive for addressing diverse hardware security issues in integrated circuits, such as authentication and cryptographic key generation. Most existing PUFs rely on dedicated circuit structures to generate random signatures, which raises concerns about extra design effort and hardware overhead. Moreover, the hardware comple...

• ### Toward Solving Multichannel RF-SoC Integration Issues Through Digital Fractional Division

Publication Year: 2016, Page(s):1071 - 1082
In modern RF systems-on-chip (SoCs), the digital content consumes up to 85% of the IC chip area. The recent push to integrate multiple RF-SoC cores meets heavy resistance from the remaining RF/analog circuitry, which creates numerous strong aggressors and weak victims, leading to RF performance degradation. One key mechanism of this kind is injection pulling through parasitic coupling between various LC-...
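The abstract is cut off before it describes the divider itself. For background only, the sketch below shows the textbook first-order accumulator (dual-modulus) fractional divider, which realizes an average division ratio of $N + F/M$ by switching between $N$ and $N+1$ on accumulator overflow. Whether the paper's digital fractional divider works this way is not stated in the visible text, and the function and parameter names are hypothetical.

```python
def fractional_divide_sequence(n, f, m, cycles):
    """Per-cycle integer ratios of a first-order accumulator fractional divider.

    Assumes 0 <= f < m. Each output cycle adds f to a modulo-m accumulator;
    an overflow selects divide-by-(n+1), otherwise divide-by-n, so the
    long-run average ratio is n + f/m.
    """
    acc, ratios = 0, []
    for _ in range(cycles):
        acc += f
        if acc >= m:          # overflow: stretch this cycle by one input period
            acc -= m
            ratios.append(n + 1)
        else:
            ratios.append(n)
    return ratios

seq = fractional_divide_sequence(4, 3, 8, 8)
print(seq, sum(seq) / len(seq))   # average ratio 4 + 3/8
```

Over every $M$ consecutive cycles exactly $F$ overflows occur, so the ratio averages to exactly $N + F/M$; the periodic switching pattern is what produces the fractional spurs that higher-order or dithered schemes are designed to suppress.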

