• ### Guest Editors’ Introduction: Special Section on Computer Arithmetic

Publication Year: 2014


• ### Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator

Publication Year: 2014



This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating point units and their surrounding data-paths to...

• ### Fast and Efficient Circuit Topologies for Finding the Maximum of n k-Bit Numbers

Publication Year: 2014



Finding the value and/or index of the maximum (or minimum) element of a set of n numbers (each with k-bits) is a fundamental arithmetic operation and is needed in many applications. This paper proposes several maximum-finder (or minimum-finder) circuit topologies, which are parallel. We wrote circuit generators at hardware description language level for our topologies and previous works. Then we s...
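The operation described above can be sketched in software. The following is a minimal illustration of one classic approach (scanning bit positions from most to least significant and discarding candidates that lose at each position); it is an assumption chosen for illustration and is not necessarily any of the topologies proposed in the paper. The function name `find_max` is hypothetical.

```python
def find_max(values, k):
    """Return (max_value, index) among unsigned k-bit integers.

    Illustrative MSB-first elimination: at each bit position, if any
    remaining candidate has a 1, candidates with a 0 are discarded.
    """
    if not values:
        raise ValueError("values must be non-empty")
    candidates = list(range(len(values)))
    for bit in range(k - 1, -1, -1):
        with_one = [i for i in candidates if (values[i] >> bit) & 1]
        if with_one:  # at least one candidate has this bit set
            candidates = with_one
    winner = candidates[0]  # ties resolve to the lowest index
    return values[winner], winner


value, index = find_max([5, 12, 7, 12], k=4)
```

In hardware, each bit position would correspond to a stage of logic rather than a loop iteration; the loop here only mirrors the data flow.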

• ### Energy-Efficient Pixel-Arithmetic

Publication Year: 2014



With the advent of pervasive computing, the performance requirements of visual applications have increased significantly. On the other hand, the energy budget for future devices may decrease due to reduced form factors and thus smaller battery sizes. It is thus imperative to improve energy efficiency of visual applications to meet their stringent demands in energy-constrained devices. This paper p...

• ### Division-Free Binary-to-Decimal Conversion

Publication Year: 2014


This article presents algorithms that convert multiple precision integer or floating-point numbers from radix 2 to radix 10 (or to any radix b > 2). Those algorithms, based on the “scaled remainder tree” technique, use multiplications instead of divisions in their critical part. Both quadratic and subquadratic algorithms are detailed, with proofs of correctness. Experimental resul...
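To give a feel for how digits can be produced with multiplications rather than repeated divisions, here is a minimal quadratic sketch. It is an illustration under stated assumptions, not the paper's algorithm: the input is scaled once into a fixed-point fraction (this sketch uses a single integer division for that step, purely for simplicity), and each decimal digit is then extracted by multiplying by 10. The function name `to_decimal_digits` and the choice of precision `p` are hypothetical.

```python
def to_decimal_digits(a: int, n: int) -> str:
    """Return the n-digit decimal string of a, where 0 <= a < 10**n.

    Digits are peeled off the top of a fixed-point fraction by
    multiplying by 10; no division occurs inside the loop.
    """
    if not 0 <= a < 10**n:
        raise ValueError("a must satisfy 0 <= a < 10**n")
    p = (10**n).bit_length()          # guarantees 2**p > 10**n
    # One-time scaling: y / 2**p slightly overestimates a / 10**n,
    # by at most 2**-p, which is too small to disturb n digits.
    y = (a << p) // 10**n + 1
    mask = (1 << p) - 1
    digits = []
    for _ in range(n):
        y *= 10                       # shift the next digit into the integer part
        digits.append(str(y >> p))    # integer part is the digit
        y &= mask                     # keep only the fractional part
    return "".join(digits)


text = to_decimal_digits(42, 5)
```

The small upward bias added to `y` is what keeps the extracted digits exact; with a plain floor, a truncated fraction could yield the digits of `a - 1` instead.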

• ### Fast Radix-10 Multiplication Using Redundant BCD Codes

Publication Year: 2014



We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speedup its computation: the redundant BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In addition, new techniques are developed to reduce significantly the latency and area of previous representative high-performance implementations. ...

• ### Numerical Reproducibility and Parallel Computations: Issues for Interval Algorithms

Publication Year: 2014



What is called numerical reproducibility is the problem of getting the same result when the scientific computation is run several times, either on the same machine or on different machines, with different types and numbers of processing units, execution environments, computational loads, etc. This problem is especially stringent for HPC numerical simulations. In what follows, we identify the probl...
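A small illustration of why results can differ between runs: floating-point addition is not associative, so the order in which a parallel reduction combines partial sums can change the answer. The values below are chosen for illustration and are not taken from the paper; `math.fsum` is shown only as one order-independent alternative from the Python standard library.

```python
import math

values = [1e16, 1.0, -1e16]

# Same three numbers, two summation orders, IEEE 754 double precision.
left_to_right = (values[0] + values[1]) + values[2]  # the 1.0 is absorbed by 1e16
reordered = (values[0] + values[2]) + values[1]      # cancellation happens first

# fsum tracks exact partial sums, so its result does not depend on order.
exact = math.fsum(values)
exact_reversed = math.fsum(reversed(values))

print(left_to_right, reordered, exact)
```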

• ### A DFA with Extended Character-Set for Fast Deep Packet Inspection

Publication Year: 2014



Deep packet inspection (DPI), based on regular expressions, is expressive, compact, and efficient in specifying attack signatures. We focus on their implementations based on general-purpose processors that are cost-effective and flexible to update. In this paper, we propose a novel solution, called deterministic finite automata with extended character-set (DFA/EC), which can significantly decrease...

• ### Applying Network Coding to Peer-to-Peer File Sharing

Publication Year: 2014



Network coding is a promising enhancement of routing to improve network throughput and provide high reliability. It allows a node to generate output messages by encoding its received messages. Peer-to-peer networks are a perfect place to apply network coding for two reasons: the topology of a peer-to-peer network is constructed arbitrarily, thus it is easy to tailor the topology to facilitate n...
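The idea that a node forwards an encoding of its received messages can be illustrated with the simplest possible code, a bytewise XOR of two blocks. This is a generic sketch, not the scheme proposed in the paper; the helper `xor_bytes` and the sample blocks are hypothetical.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(x) != len(y):
        raise ValueError("blocks must have equal length")
    return bytes(a ^ b for a, b in zip(x, y))


block_a = b"hello"
block_b = b"world"

# An intermediate peer sends one coded block instead of choosing a or b.
coded = xor_bytes(block_a, block_b)

# A peer that already holds block_a recovers block_b, and vice versa.
recovered_b = xor_bytes(coded, block_a)
recovered_a = xor_bytes(coded, block_b)
```

The coded block is useful to any peer that holds either original, which is the property that makes coding attractive when peers hold different subsets of a file.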

• ### Arbitrary-State Attribute-Based Encryption with Dynamic Membership

Publication Year: 2014



Attribute-based encryption (ABE) is an advanced encryption technology where the privacy of receivers is protected by a set of attributes. An encryptor can ensure that only the receivers who match the restrictions on predefined attribute values associated with the ciphertext can decrypt the ciphertext. However, maintaining the correctness of all users' attributes will incur a huge cost because it is n...

• ### C-Lock: Energy Efficient Synchronization for Embedded Multicore Systems

Publication Year: 2014



Data synchronization among multiple cores has been one of the critical issues which must be resolved in order to optimize the parallelism of multicore architectures. Data synchronization schemes can be classified as lock-based methods (“pessimistic”) and lock-free methods (“optimistic”). However, none of these methods consider the nature of embedded systems which have d...

• ### LACS: A Locality-Aware Cost-Sensitive Cache Replacement Algorithm

Publication Year: 2014



The design of an effective last-level cache (LLC) in general, and an effective cache replacement/partitioning algorithm in particular, is critical to the overall system performance. The processor's ability to hide the LLC miss penalty differs widely from one miss to another. The more instructions the processor manages to issue during the miss, the better it is capable of hiding the miss penalty and ...

• ### On the Multicast Lifetime of WANETs with Multibeam Antennas: Formulation, Algorithms, and Analysis

Publication Year: 2014



We explore the multicast lifetime capacity of energy-limited wireless ad hoc networks using directional multibeam antennas by formulating and solving the corresponding optimization problem. In such networks, each node is equipped with a practical smart antenna array that can be configured to support multiple beams with adjustable orientation and beamwidth. The special case of this optimization pro...

• ### Opportunistic Sensing in Wireless Sensor Networks: Theory and Application

Publication Year: 2014



In the real world, wireless heterogeneous sensor network (HSN) design and information integration are necessary in different applications. Traditionally, information integration in wireless sensor networks is set up to passively fuse all received data. Such an approach is computationally challenging and operationally ineffective because improvements in information accuracy are not guaranteed. Opportunist...

• ### Reliable Multicast in Data Center Networks

Publication Year: 2014



Multicast benefits data center group communication in both saving network traffic and improving application throughput. Reliable packet delivery is required in data center multicast for data-intensive computations. However, existing reliable multicast solutions for the Internet are not suitable for the data center environment, especially with regard to keeping multicast throughput from degrading u...

• ### Scaling Power and Performance via Processor Composability

Publication Year: 2014



Power dissipation trends are leading high-performance processors to a regime in which all chip elements cannot be operated simultaneously at maximum frequency. Consequently, energy-efficiency will increase even more in importance, and performance must be achieved within strict power budgets. Current designs employ techniques such as dynamic voltage and frequency scaling (DVFS) to provide power-per...

• ### Scheduling to Optimize Cache Utilization for Non-Volatile Main Memories

Publication Year: 2014



In power and size sensitive embedded systems, non-volatile memories (NVMs) are replacing DRAM as the main memory since they have higher density, lower static power consumption, and lower costs. Unfortunately, these technologies are limited by their endurance and long write latencies. To minimize the main memory access time and extend the lifetime of the NVM, we optimally schedule tasks by an ILP f...

• ### System-Wide Cooperative Optimization for NAND Flash-Based Mobile Systems

Publication Year: 2014


NAND flash memory has become an essential storage medium for various mobile devices, but it has some idiosyncrasies, such as out-of-place updates and bulk erase operations, which impair the I/O performance of those devices. In particular, the random write performance is strongly influenced by the overhead of a Flash Translation Layer (FTL) that hides the idiosyncrasies of NAND flash memory. To red...

• ### Truthful Mechanisms for Allocating a Single Processor to Sporadic Tasks in Competitive Real-Time Environments

Publication Year: 2014



In a non-competitive environment, sporadic real-time task scheduling on a single processor is well understood. In this paper, we consider a competitive environment comprising several real-time tasks vying for execution upon a shared single processor. Each task obtains a value if the processor successfully schedules all its jobs. Our objective is to select a feasible subset of these tasks to maximi...

• ### Throughput Enhancement for Phase Change Memories

Publication Year: 2014



Phase Change Memory (PCM) has emerged as a promising candidate for future memories. PCM has high cell density, zero cell leakage, and high stability in deep sub-micron technologies. Although PCM has limited endurance, recent endeavors have shown that its lifetime can be improved by orders of magnitude. However, a major hurdle for PCM is the long write latency and high write power. For this reason,...

• ### An Efficient Multiple Cell Upsets Tolerant Content-Addressable Memory

Publication Year: 2014



Multiple cell upsets (MCUs) become more and more problematic as the size of technology reaches or goes below 65 nm. The percentage of MCUs is reported to be significantly larger than that of single cell upsets (SCUs) in 20 nm technology. In SRAM and DRAM, MCUs are tackled by incorporating single-error correcting double-error detecting (SEC-DED) code and interleaved data columns. However, in content-addr...

• ### Novel RNS Parameter Selection for Fast Modular Multiplication

Publication Year: 2014



The parameter selection of Residue Number Systems (RNS) has a great impact on its computational efficiency. This paper shows that a base extension, the most costly operation in RNS Montgomery multiplication, can be more efficient when the intervals between the RNS moduli are small. We propose a systematic RNS parameter selection procedure and two methods to select RNS moduli that lead to a reduced...
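For readers unfamiliar with RNS, the sketch below shows the basic mechanics: a number is held as residues modulo pairwise coprime moduli, multiplication runs independently per channel, and the Chinese Remainder Theorem recovers the integer. The moduli are chosen here only for illustration (closely spaced, as the abstract suggests is helpful) and are not the paper's parameters; the sketch shows plain RNS multiplication, not RNS Montgomery multiplication or base extension. All function names are hypothetical.

```python
from math import prod

# Illustrative, pairwise coprime moduli: 251 is prime, 253 = 11 * 23,
# 255 = 3 * 5 * 17, 256 = 2**8.
MODULI = (251, 253, 255, 256)


def to_rns(x: int) -> tuple:
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_mul(a: tuple, b: tuple) -> tuple:
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues: tuple) -> int:
    """Reconstruct the integer in [0, prod(MODULI)) via the CRT."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


product = from_rns(rns_mul(to_rns(12345), to_rns(6789)))
```

The result is exact only while the true product stays below the product of the moduli, which is about 4.1 × 10^9 for this set.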

• ### On Newton–Raphson Iteration for Multiplicative Inverses Modulo Prime Powers

Publication Year: 2014



We study algorithms for the fast computation of modular inverses. Newton-Raphson iteration over p-adic numbers gives a recurrence relation computing the modular inverse modulo $p^m$ that is logarithmic in m. We solve the recurrence to obtain an explicit formula for the inverse. Then, we study different implementation variants of this iteration and show that our explicit formula is interesting...
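The recurrence mentioned above is commonly written as x ← x(2 − ax): if ax ≡ 1 modulo p^e, then a·x(2 − ax) = 1 − (1 − ax)^2 ≡ 1 modulo p^(2e), so the precision doubles at each step. The following is a minimal sketch of that standard iteration, not the explicit formula or the implementation variants studied in the paper; the function name is hypothetical.

```python
def inverse_mod_prime_power(a: int, p: int, m: int) -> int:
    """Return x with (a * x) % p**m == 1, for prime p not dividing a.

    Starts from the inverse modulo p and doubles the exponent each
    iteration, so about log2(m) iterations are needed.
    """
    if m < 1 or a % p == 0:
        raise ValueError("need m >= 1 and a not divisible by p")
    x = pow(a, -1, p)      # seed: inverse modulo p (Python 3.8+)
    e = 1                  # x is currently correct modulo p**e
    while e < m:
        e = min(2 * e, m)
        x = (x * (2 - a * x)) % p**e
    return x


inv = inverse_mod_prime_power(3, 7, 5)
```

Capping the exponent at `m` in the last step avoids computing more digits than requested.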

