IEEE Transactions on Computers

Issue 3 • March 2012

  • [Front cover]

    Publication Year: 2012 , Page(s): c1
    PDF (111 KB)
    Freely Available from IEEE
  • [Cover 2]

    Publication Year: 2012 , Page(s): c2
    PDF (169 KB)
    Freely Available from IEEE
  • On the Computation of Correctly Rounded Sums

    Publication Year: 2012 , Page(s): 289 - 298
    Cited by:  Papers (3)
    PDF (327 KB) | HTML

    This paper presents a study of some basic blocks needed in the design of floating-point summation algorithms. In particular, in radix-2 floating-point arithmetic, we show that among the algorithms that perform only floating-point additions/subtractions and use no comparisons, the 2Sum algorithm introduced by Knuth is minimal, both in the number of operations and in the depth of the dependency graph. We investigate the possible use of another algorithm, Dekker's Fast2Sum algorithm, in radix-10 arithmetic. We give methods for computing, in radix 10, the floating-point number nearest the average value of two floating-point numbers. We also prove that, under reasonable conditions, an algorithm performing only round-to-nearest additions/subtractions cannot compute the round-to-nearest sum of at least three floating-point numbers. Starting from an algorithm due to Boldo and Melquiond, we also present new results about the computation of the correctly rounded sum of three floating-point numbers. For a few of our algorithms, we assume that the new operations defined by the IEEE 754-2008 Standard are available.
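
    As background, Knuth's 2Sum is short enough to state in full. Below is a minimal Python sketch; Python floats are radix-2 binary64, so the error-free transformation applies as described:

        def two_sum(a, b):
            # Knuth's 2Sum: six floating-point operations, no comparisons.
            # Returns (s, t) with s = RN(a + b) and s + t == a + b exactly.
            s = a + b
            a_approx = s - b
            b_approx = s - a_approx
            da = a - a_approx            # rounding error attributed to a
            db = b - b_approx            # rounding error attributed to b
            return s, da + db

        # Example: 2**-60 is lost when rounded into 1.0, but t recovers it.
        print(two_sum(1.0, 2**-60))      # (1.0, 8.673617379884035e-19)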

  • Minimal-Memory Requirements for Pearl-Necklace Encoders of Quantum Convolutional Codes

    Publication Year: 2012 , Page(s): 299 - 312
    Cited by:  Papers (2)
    PDF (724 KB)

    One of the major goals in quantum information processing is to reduce the overhead associated with the practical implementation of quantum protocols, and often, routines for quantum error correction account for most of this overhead. A particular technique for quantum error correction that may be useful for protecting a stream of quantum information is quantum convolutional coding. The encoder for a quantum convolutional code has a representation as a convolutional encoder or as a "pearl-necklace" encoder. In the pearl-necklace representation, it has not been particularly clear in the research literature how much quantum memory such an encoder would require for implementation. Here, we offer an algorithm that answers this question. The algorithm first constructs a weighted, directed acyclic graph where each vertex of the graph corresponds to a gate string in the pearl-necklace encoder, and each path through the graph represents a path through noncommuting gates in the encoder. We show that the weight of the longest path through the graph is equal to the minimal amount of memory needed to implement the encoder. A dynamic programming search through this graph determines the longest path. The running time for the construction of the graph and search through it is quadratic in the number of gate strings in the pearl-necklace encoder.
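
    The dynamic-programming step, finding the weight of the longest path in a weighted directed acyclic graph, can be sketched generically; the vertex numbering and edge weights below are illustrative, while the paper builds the actual graph from the encoder's gate strings:

        from collections import defaultdict, deque

        def longest_path_weight(n, edges):
            # edges: (u, v, w) triples of a DAG on vertices 0..n-1.
            adj = defaultdict(list)
            indeg = [0] * n
            for u, v, w in edges:
                adj[u].append((v, w))
                indeg[v] += 1
            queue = deque(v for v in range(n) if indeg[v] == 0)
            dist = [0] * n                  # best path weight ending at v
            while queue:                    # Kahn's topological order
                u = queue.popleft()
                for v, w in adj[u]:
                    dist[v] = max(dist[v], dist[u] + w)
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        queue.append(v)
            return max(dist)                # = minimal memory requirement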

  • Soft Error Sensitivity Evaluation of Microprocessors by Multilevel Emulation-Based Fault Injection

    Publication Year: 2012 , Page(s): 313 - 322
    Cited by:  Papers (20)
    PDF (1125 KB) | HTML

    Estimation of soft error sensitivity is crucial in order to devise optimal mitigation solutions that can satisfy reliability requirements with reduced impact on area, performance, and power consumption. In particular, the estimation of Single Event Transient (SET) effects for complex systems that include a microprocessor is challenging, due to the huge number of potential faults and effects that must be considered, and the delay-dependent nature of SET effects. In this paper, we propose a multilevel FPGA emulation-based fault injection approach for the evaluation of SET effects, called AMUSE (Autonomous MUltilevel emulation system for Soft Error evaluation). This approach integrates Gate level and Register-Transfer level models of the circuit under test in an FPGA and is able to switch to the appropriate model as needed during emulation. Fault injection is performed at the Gate level, which provides delay accuracy, while fault propagation across clock cycles is performed at the Register-Transfer level for higher performance. Experimental results demonstrate that AMUSE can emulate soft error effects for complex circuits, including microprocessors and memories, considering the real delays of an ASIC technology, and can support massive fault-injection campaigns, on the order of tens of millions of faults, within acceptable time.
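
    The flavor of emulation-based fault injection can be conveyed by a toy software model: run a golden (fault-free) execution, re-run it with one bit flipped in one register at one cycle, and compare the outcomes. The model and classification below are placeholder assumptions, not AMUSE's Gate/Register-Transfer level machinery:

        import random

        def run(step, state, cycles, fault=None):
            # step: next-state function of a cycle-accurate model.
            # fault: (cycle, reg, bit) -- one SET-like bit flip.
            for c in range(cycles):
                if fault and fault[0] == c:
                    _, reg, bit = fault
                    state[reg] ^= 1 << bit
                state = step(state)
            return state

        def campaign(step, init, cycles, regs, width, n_faults):
            golden = run(step, dict(init), cycles)
            outcome = {"silent": 0, "failure": 0}
            for _ in range(n_faults):
                f = (random.randrange(cycles), random.choice(regs),
                     random.randrange(width))
                final = run(step, dict(init), cycles, fault=f)
                outcome["silent" if final == golden else "failure"] += 1
            return outcome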

  • Soft N-Modular Redundancy

    Publication Year: 2012 , Page(s): 323 - 336
    Cited by:  Papers (7)
    PDF (1240 KB) | HTML

    Achieving robustness and energy efficiency in nanoscale CMOS process technologies is challenging due to process, temperature, and voltage variations. Traditional fault-tolerance techniques such as N-modular redundancy (NMR) employ deterministic error detection and correction, e.g., a majority voter, and tend to be power hungry. This paper proposes soft NMR, which nontrivially extends NMR by consciously exploiting the error statistics caused by nanoscale artifacts in order to design robust and energy-efficient systems. In contrast to conventional NMR, soft NMR employs Bayesian detection techniques in the voter. Soft voter algorithms are obtained through optimization of appropriate application-aware cost functions. Analysis indicates that, on average, soft NMR outperforms conventional NMR. Furthermore, unlike NMR, in many cases soft NMR is able to generate a correct output even when all N replicas are in error. This increase in robustness is then traded off through voltage scaling to achieve energy efficiency. The design of a discrete cosine transform (DCT) image coder is employed to demonstrate the benefits of the proposed technique. Simulations in a commercial 45 nm, 1.2 V, CMOS process show that soft NMR provides up to 10× improvement in robustness and 35 percent power savings over conventional NMR.
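
    The contrast between the two voters can be sketched as follows: a majority voter returns the most frequent replica output, while a Bayesian (soft) voter returns the candidate maximizing the posterior under an error model, and so can be correct even when every replica is wrong. The candidate set, prior, and likelihood below are illustrative assumptions; the paper derives its voters from application-aware cost functions:

        from collections import Counter
        import math

        def majority_vote(replicas):
            # Conventional NMR: the most frequent replica output wins.
            return Counter(replicas).most_common(1)[0][0]

        def soft_vote(replicas, candidates, prior, likelihood):
            # Bayesian detection: argmax_x P(x) * prod_i P(r_i | x).
            # Assumes prior and likelihood return positive probabilities.
            def log_post(x):
                return math.log(prior(x)) + sum(
                    math.log(likelihood(r, x)) for r in replicas)
            return max(candidates, key=log_post)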

  • Multithreaded Reactive Programming—the Kiel Esterel Processor

    Publication Year: 2012 , Page(s): 337 - 349
    Cited by:  Papers (1)
    PDF (1590 KB)

    The Kiel Esterel Processor (KEP) is a multithreaded reactive processor designed for the execution of programs written in the synchronous language Esterel. Design goals were timing predictability, minimal resource usage, and compliance with full Esterel V5. The KEP directly supports Esterel's reactive control flow operators, notably concurrency and various types of preemption, through dedicated control units. Esterel allows arbitrary combinations and nesting of these operators, which poses particular implementation challenges that are addressed here. Other notable features of the KEP are a refined instruction set architecture, which allows us to trade off generality against resource usage, and a Tick Manager that minimizes reaction time jitter and can detect timing overruns.

  • Load-Balancing Multipath Switching System with Flow Slice

    Publication Year: 2012 , Page(s): 350 - 365
    Multimedia
    PDF (2305 KB) | HTML

    Multipath Switching systems (MPS) are intensely used in state-of-the-art core routers to provide terabit or even petabit switching capacity. One of the most intractable issues in designing an MPS is how to load balance traffic across its multiple paths without disturbing intraflow packet order. Previous packet-based solutions either suffer from delay penalties or lead to O(N^2) hardware complexity, and hence do not scale. Flow-based hashing algorithms also perform badly due to the heavy-tailed flow-size distribution. In this paper, we develop a novel scheme, namely Flow Slice (FS), which cuts each flow into flow slices at every intraflow interval larger than a slicing threshold and balances the load at this finer granularity. Based on studies of tens of real Internet traces, we show that, with a slicing threshold of 1-4 ms, the FS scheme achieves load-balancing performance comparable to the optimal scheme. It also limits the probability of out-of-order packets to a negligible level (10^-6) on three popular MPSes, at the cost of little hardware complexity and an internal speedup of up to two. These results are proven by theoretical analyses and also validated through trace-driven prototype simulations.
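
    The slicing rule itself is simple: if the gap since a flow's previous packet exceeds the threshold, earlier packets of the flow have in all likelihood drained from the switch, so the flow may be re-hashed to a new path without reordering. A sketch under assumed field names:

        def assign_path(flow_id, now, state, n_paths, threshold=0.002):
            # state maps flow_id -> (last_arrival_time, current_path).
            last, path = state.get(flow_id, (None, None))
            if last is None or now - last > threshold:
                # New flow slice: safe to rebalance.
                path = hash((flow_id, now)) % n_paths
            state[flow_id] = (now, path)
            return path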

  • A High Performance and Memory Efficient LU Decomposer on FPGAs

    Publication Year: 2012 , Page(s): 366 - 378
    Cited by:  Papers (4)
    PDF (1037 KB) | HTML

    LU decomposition for dense matrices is an important linear algebra kernel that is widely used in both scientific and engineering applications. To efficiently perform large matrix LU decomposition on FPGAs with limited local memory, a block LU decomposition algorithm on FPGAs applicable to arbitrary matrix size is proposed. Our algorithm applies a series of transformations, including loop blocking and space-time mapping, onto sequential nonblocking LU decomposition. We also introduce a high performance and memory efficient hardware architecture, which mainly consists of a linear array of processing elements (PEs), to implement our block LU decomposition algorithm. Our design can achieve optimum performance under various hardware resource constraints. Furthermore, our algorithm and design can be easily extended to the multi-FPGA platform by using a block-cyclic data distribution and inter-FPGA communication scheme. A total of 36 PEs can be integrated into a Xilinx Virtex-5 XC5VLX330 FPGA on our self-designed PCI-Express card, reaching a sustained performance of 8.50 GFLOPS at 133 MHz for a matrix size of 16,384, which outperforms several general-purpose processors. For a Xilinx Virtex-6 XC6VLX760, a newer FPGA, we predict that a total of 180 PEs can be integrated, reaching 70.66 GFLOPS at 200 MHz. Compared to the previous work, our design can integrate twice the number of PEs into the same FPGA and has significantly higher performance.
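
    The blocking transformation can be sketched in software: factor a diagonal block, use triangular solves for the panels, then apply a rank-b update to the trailing submatrix. The numpy sketch below (right-looking, no pivoting) shows the block structure only; the paper's contribution is mapping these steps onto a linear PE array:

        import numpy as np

        def block_lu(A, b=64):
            # On return, the strict lower triangle holds L (unit diagonal
            # implied) and the upper triangle holds U, with A = L @ U.
            A = A.copy()
            n = A.shape[0]
            for k in range(0, n, b):
                e = min(k + b, n)
                for j in range(k, e):      # factor the diagonal block
                    A[j+1:e, j] /= A[j, j]
                    A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
                if e < n:
                    L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
                    U11 = np.triu(A[k:e, k:e])
                    A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])        # U12
                    A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T  # L21
                    A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]   # trailing update
            return A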

  • An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis

    Publication Year: 2012 , Page(s): 379 - 394
    Cited by:  Papers (2)
    PDF (1467 KB) | HTML

    Plagiarism is a growing problem in academia. Academics often use plagiarism detection tools to detect similar source-code files. Once similar files are detected, the academic proceeds with an investigation process that involves identifying the similar source-code fragments within them that could be used as evidence for proving plagiarism. This paper describes PlaGate, a novel tool that can be integrated with existing plagiarism detection tools to improve plagiarism detection performance. The tool also implements a new approach for investigating the similarity between source-code files, with a view to gathering evidence for proving plagiarism. Graphical evidence is presented that allows for the investigation of source-code fragments with regard to their contribution toward evidence for proving plagiarism. The graphical evidence indicates the relative importance of the given source-code fragments across the files in a corpus. This is done by using the Latent Semantic Analysis information retrieval technique to detect how important the fragments are within the specific files under investigation, relative to the other files in the corpus.
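
    The LSA step itself reduces a term-by-file matrix with a truncated SVD and compares items in the resulting latent space. A minimal sketch follows; the tokenization and weighting here are simplistic placeholders for PlaGate's actual processing:

        import numpy as np

        def lsa_doc_vectors(docs, k=2):
            # docs: list of token lists (e.g., identifiers per source file).
            vocab = sorted({t for d in docs for t in d})
            index = {t: i for i, t in enumerate(vocab)}
            M = np.zeros((len(vocab), len(docs)))   # term-by-document counts
            for j, d in enumerate(docs):
                for t in d:
                    M[index[t], j] += 1
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return (np.diag(s[:k]) @ Vt[:k]).T      # one latent vector per doc

        def cosine(u, v):
            return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))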

  • Approximating Rate-Distortion Graphs of Individual Data: Experiments in Lossy Compression and Denoising

    Publication Year: 2012 , Page(s): 395 - 407
    Cited by:  Papers (3)
    Multimedia
    PDF (1893 KB) | HTML

    Classical rate-distortion theory requires specifying a source distribution. Instead, we analyze rate-distortion properties of individual objects using the recently developed algorithmic rate-distortion theory. The latter is based on the noncomputable notion of Kolmogorov complexity. To apply the theory we approximate the Kolmogorov complexity by standard data compression techniques, and perform a number of experiments with lossy compression and denoising of objects from different domains. We also introduce a natural generalization to lossy compression with side information. To maintain full generality we need to address a difficult searching problem. While our solutions are therefore not time efficient, we do observe good denoising and compression performance.
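
    The approximation step replaces the noncomputable Kolmogorov complexity K(y) with the output length of a real compressor; a rate-distortion point for an object x is then found by searching over candidate representations y. A toy sketch with zlib and Hamming distortion (the paper's search problem is far harder than this enumeration):

        import zlib

        def rate(y: bytes) -> int:
            # Computable stand-in for Kolmogorov complexity: bits after zlib.
            return 8 * len(zlib.compress(y, 9))

        def hamming(x: bytes, y: bytes) -> int:
            return sum(bin(a ^ b).count("1") for a, b in zip(x, y))

        def best_at_budget(x: bytes, candidates, budget: int):
            # Least-distorted candidate whose rate fits the budget.
            feasible = [y for y in candidates if rate(y) <= budget]
            return min(feasible, key=lambda y: hamming(x, y)) if feasible else None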

  • Distributed Minimum Spanning Tree Maintenance for Transient Node Failures

    Publication Year: 2012 , Page(s): 408 - 414
    PDF (891 KB) | HTML

    In many network applications, the computation takes place on the minimum-cost spanning tree (MST) of the network G; unfortunately, a single link or node failure disconnects the tree. The ALL NODES REPLACEMENT (ANR) problem is the problem of precomputing, for each node u in G, the new MST should u fail. This problem has been extensively investigated for serial and parallel settings, and efficient solutions have been designed for those environments. The situation is surprisingly different in distributed settings: in fact, no distributed solution exists to date that performs better than the brute-force repeated application of MST construction. In this paper, we consider for the first time the problem of computing all the replacement minimum-cost spanning trees distributively. We design a solution protocol and prove that the total number of communication exchanges taking place is O(n), each exchange using at most O(n) data items. Hence, the total number of data items communicated during the computation (the data complexity) is O(n^2). We also show how the simpler ALL EDGES REPLACEMENT (AER) problem, which deals with single edge failures, can be solved with the same costs using some existing techniques. For the AER problem too, efficient solutions exist in the serial and parallel settings but, prior to this work, no distributed solution other than brute force was known.
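
    For reference, the brute-force baseline that the protocol improves on recomputes one MST per node, e.g., with Kruskal's algorithm; a centralized sketch:

        def kruskal(nodes, edges):
            # edges: (w, u, v) triples; returns the MST weight,
            # or None if the remaining graph is disconnected.
            parent = {v: v for v in nodes}
            def find(v):
                while parent[v] != v:
                    parent[v] = parent[parent[v]]   # path halving
                    v = parent[v]
                return v
            total, used = 0, 0
            for w, u, v in sorted(edges):
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    total, used = total + w, used + 1
            return total if used == len(nodes) - 1 else None

        def all_nodes_replacement(nodes, edges):
            # Brute-force ANR: for each u, the MST of G with u removed.
            return {u: kruskal([v for v in nodes if v != u],
                               [e for e in edges if u not in e[1:]])
                    for u in nodes}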

  • Efficient Montgomery-Based Semi-Systolic Multiplier for Even-Type GNB of GF(2^m)

    Publication Year: 2012 , Page(s): 415 - 419
    Cited by:  Papers (2)
    PDF (544 KB) | HTML

    Efficient finite field multiplication is crucial for implementing public-key cryptosystems. To achieve this, multipliers using the Gaussian normal basis have been widely explored in previous works. In this paper, based on the proposed Gaussian normal basis Montgomery (GNBM) representation, a semi-systolic even-type GNBM multiplier is developed. Analysis shows that the proposed architecture saves about 57 percent in space complexity and 50 percent in time complexity when compared with the only existing semi-systolic even-type GNB multiplier. Moreover, due to its regularity and modularity, the proposed multiplier is very suitable for VLSI implementation.
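
    For context only: multiplication in GF(2^m) in a polynomial basis (not the Gaussian normal basis Montgomery representation developed in the paper) is a carry-less shift-and-add followed by reduction modulo the field polynomial:

        def gf2m_mul(a: int, b: int, poly: int, m: int) -> int:
            # poly encodes the irreducible polynomial, x^m term included.
            r = 0
            for _ in range(m):
                if b & 1:
                    r ^= a
                b >>= 1
                a <<= 1
                if a >> m:        # degree reached m: reduce
                    a ^= poly
            return r

        # GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1:
        assert gf2m_mul(0x53, 0xCA, 0x11B, 8) == 0x01   # known inverse pair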

  • Maxterm Covering for Satisfiability

    Publication Year: 2012 , Page(s): 420 - 426
    Cited by:  Papers (3)
    PDF (561 KB)

    This paper presents a novel efficient satisfiability (SAT) algorithm based on maxterm covering. The satisfiability of a clause set is determined in terms of the number of relative maxterms of the empty clause with respect to the clause set. If the number of relative maxterms is zero, the clause set is unsatisfiable; otherwise, it is satisfiable. A set of synergistic heuristic strategies is presented and elaborated. We conduct a number of experiments on 3-SAT and k-SAT problems at the phase transition region, which have been cited as the hardest group of SAT problems. Our experimental results on public benchmarks attest that, by incorporating our proposed heuristic strategies, our enhanced algorithm runs several orders of magnitude faster than the extension rule algorithm, and also runs faster than zChaff and MiniSAT on most k-SAT (k≥3) instances.
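
    The underlying criterion can be checked directly on small instances: each maxterm left uncovered by the clauses' extensions corresponds to a satisfying assignment, so the clause set is unsatisfiable exactly when the count is zero. A brute-force sketch of that count (the paper's heuristics exist precisely to avoid this enumeration):

        from itertools import product

        def relative_maxterm_count(clauses, n):
            # clauses: iterable of literal lists; literal +i / -i sets
            # variable i true / false.
            count = 0
            for bits in product((False, True), repeat=n):
                if all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
                       for clause in clauses):
                    count += 1   # uncovered maxterm = satisfying assignment
            return count

        # (x1 v x2), (~x1), (~x2): count is 0, hence unsatisfiable.
        print(relative_maxterm_count([[1, 2], [-1], [-2]], 2))   # 0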

  • Modified Redundant Representation for Designing Arithmetic Circuits with Small Complexity

    Publication Year: 2012 , Page(s): 427 - 432
    PDF (805 KB) | HTML

    We give a modified redundant representation for designing arithmetic circuits with small complexity. Using this representation, we significantly improve many of the known complexity values. Our method works for any finite field. We also give some applications in cryptography.

  • [Cover 3]

    Publication Year: 2012 , Page(s): c3
    PDF (169 KB)
    Freely Available from IEEE
  • [Cover 4]

    Publication Year: 2012 , Page(s): c4
    PDF (111 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org