<![CDATA[ IEEE Transactions on Computers - new TOC ]]>
http://ieeexplore.ieee.org
TOC Alert for Publication# 12 2017 July 27<![CDATA[Arb: Efficient Arbitrary-Precision Midpoint-Radius Interval Arithmetic]]>66812811292274<![CDATA[A Built-Off Self-Repair Scheme for Channel-Based 3D Memories]]>built-off self-repair (BOSR) scheme at the controller level for channel-based 3D memory to enhance final product yield after the bonding of a memory cube to its corresponding logic die. The logic die contains the Channel controller, in which the BOSR circuit resides. Experimental results show that the repair rate is high with higher cluster failure ratio due to the flexible algorithm we choose. The area overhead is low and it decreases significantly when the memory size or channel count increases. The performance penalty is also low due to the parallel execution of address comparison and repair. Moreover, the manufacture cost is lower than conventional DRAM architecture due to allocator-based redundancies. Finally, the proposed scheme can easily be applied to other channel-based 3D memories.]]>668129313011405<![CDATA[Adaptive Coherence Granularity for Multi-Socket Systems]]>668130213122870<![CDATA[Area-Time Efficient Computation of Niederreiter Encryption on QC-MDPC Codes for Embedded Hardware]]>$\mu s$ on a Xilinx Virtex-6 FPGA with 3371 slices. Our iterative decrypting engine can decrypt one ciphertext in 114.64 $\mu s$ with 5271 slices and our faster non-iterative decrypting engine can decrypt in 65.76 $\mu s$ with 8781 slices.]]>668131313251223<![CDATA[ASSER: An Efficient, Reliable, and Cost-Effective Storage Scheme for Object-Based Cloud Storage Systems]]>ASSER, an ASSembling chain of Erasure coding and Replication. ASSER stores each object in two parts: a full copy and a certain amount of erasure-coded segments. We establish dedicated read/write protocols for ASSER leveraging the unique structural advantages. 
On the basis of elementary protocols, we implement sequential and PRAM (Pipeline-RAM) consistency to make ASSER feasible for various services with different performance/consistency requirements. Evaluation results demonstrate that under the same fault tolerance and consistency level, ASSER outperforms N-way replication and pure erasure coding in I/O throughput under diverse system and workload configurations with superior performance stability. More importantly, ASSER delivers stably efficient I/O performance at much lower storage cost than the other comparatives.]]>668132613402076<![CDATA[Cross-Platform Resource Scheduling for Spark and MapReduce on YARN]]>iKayak, which aims to improve the resource utilization and application performance in multi-tenant Spark-on-YARN clusters. iKayak relies on three key mechanisms: reservation-aware executor placement to avoid long waiting for resource reservation, dependency-aware resource adjustment to exploit under-utilized resource occupied by reduce tasks, and cross-platform locality-aware task assignment to coordinate locality competition between Spark and MapReduce applications. We implement iKayak in YARN. Experimental results on a testbed show that iKayak can achieve 50 percent performance improvement for Spark applications and 19 percent performance improvement for MapReduce applications, compared to two popular Spark-on-YARN deployment models, i.e., YARN-client model and YARN-cluster model.]]>66813411353999<![CDATA[Efficient Composited de Bruijn Sequence Generators]]>$2^n$ is a sequence in which every tuple of $n$ bits occurs exactly once. De Bruijn sequence generators have randomness properties that make them attractive for pseudorandom number generators and as building blocks for stream ciphers. Unfortunately, it is very difficult to find de Bruijn sequence generators with long periods (e.g., $2^{128}$) and most known de Bruijn sequence generators are computationally quite expensive. 
In this article, we present “OcDeb-$k$-$n$” and the first hardware implementation of de Bruijn sequence generators. OcDeb-$k$-$n$ efficiently computes a composited de Bruijn sequence where $k$ levels of composition
are added to a de Bruijn sequence of period $2^n$. Numerically, OcDeb reduces the bit operations used for computing the feedback function significantly from ${\Theta}(k^2+nk)$ to ${\Theta}(k\;\log {k} + \log {n})$. Furthermore, it enables efficient parallelization and hardware retiming. Comprehensive result analysis is conducted for 65 nm ASIC technology. For example, OcDeb-32-32 has an area of 643 GE with 1.45 Gbps performance, and with parallelization it generates up to 25.4 Gbps at the cost of 4,787 GE. The area of OcDeb-512-32 generating a de Bruijn sequence of period $2^{544}$ is 7,304 GE and the performance is 1.25 Gbps.]]>668135413681199<![CDATA[Evaluation of Large Integer Multiplication Methods on Hardware]]>668136913821360<![CDATA[Hardware Design of Low-Power High-Throughput Sorting Unit]]>668138313951500<![CDATA[New Block Recombination for Subquadratic Space Complexity Polynomial Multiplication Based on Overlap-Free Approach]]>66813961406660<![CDATA[Wait-Free Programming for General Purpose Computations on Graphics Processors]]>$t$-resilient read-modify-write objects for a general model of GPU architectures without hardware synchronization primitives such as test-and-set and compare-and-swap. Accesses to the wait-free objects have time complexity $O(N)$, where $N$ is the number of processes. The wait-free objects have the optimal space complexity $O(N^2)$. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without strong synchronization primitives in hardware and that
wait-free programming is possible for such GPUs.]]>66814071420383<![CDATA[A Secure Phase-Encrypted IEEE 802.15.4 Transceiver Design]]>66814211427732<![CDATA[Bank-Group Level Parallelism]]>668142814341244<![CDATA[Design of Approximate Radix-4 Booth Multipliers for Error-Tolerant Computing]]>66814351441471<![CDATA[DFT Computation Using Gauss-Eisenstein Basis: FFT Algorithms and VLSI Architectures]]>$3.62\times 10^9$ coefficients/s. The FPGA verified digital designs were synthesized, mapped, placed and finally routed for $0.18\mu$m CMOS technology assuming a 1.8 V DC supply employing Austria Micro Systems (AMS) standard-cell library (hitkit version 4.11). The routed ASIC is predicted to operate at a maximum frequency of 505 MHz for the expansion factor FRS with potential real-time throughput of $6.06\times 10^9$ coefficients/s.]]>66814421448601<![CDATA[Dynamic Checkpointing Policy in Heterogeneous Real-Time Standby Systems]]>N standby computing systems with a dynamic checkpointing policy. The system performs a real-time mission task that has to be accomplished within an allowed mission time. During the mission, to facilitate an effective failure recovery the system undergoes checkpointing procedures according to a policy that dynamically determines a checkpointing frequency based on the activated element and remaining work for completing the mission. System elements are heterogeneous; they can follow different, arbitrary types of time-to-failure distributions, have different performance and wait in different standby modes before their activation. A new numerical algorithm based on state space event transitions is first proposed to evaluate mission success probability of the real-time standby systems considered in this work. 
Additional new contributions are made by formulating and solving optimal dynamic checkpointing policy problems, as well as an integrated optimization problem that finds the optimal combination of checkpointing policy and element activation sequence maximizing mission success probability. Advantages of using the dynamic checkpointing policy over fixed even checkpoints are demonstrated through examples. Examples and results are also provided to illustrate effects of different mission and element parameters on mission success probability as well as on the optimal dynamic checkpointing policy.]]>66814491456421<![CDATA[Exploiting Write Heterogeneity of Morphable MLC/SLC SSDs in Datacenters with Service-Level Objectives]]>668145714631208