IEEE Transactions on Computers

Issue 2 • Date Feb. 2011

  • [Front cover]

    Publication Year: 2011 , Page(s): c1
  • [Inside front cover]

    Publication Year: 2011 , Page(s): c2
  • Guest Editors' Introduction: Special Section on Computer Arithmetic

    Publication Year: 2011 , Page(s): 145 - 147
  • Reducing the Computation Time in (Short Bit-Width) Two's Complement Multipliers

    Publication Year: 2011 , Page(s): 148 - 156

    Two's complement multipliers are important for a wide range of applications. In this paper, we present a technique that reduces by one row the maximum height of the partial product array generated by a radix-4 Modified Booth Encoded multiplier, without increasing the delay of the partial product generation stage. This reduction may allow faster compression of the partial product array and more regular layouts. The technique can benefit any multiplier design, but it is especially useful in short bit-width two's complement multipliers for high-performance embedded cores. The proposed method is general and can be extended to higher radix encodings, as well as to square multipliers of any size and to m × n rectangular multipliers. We evaluated the proposed approach by comparing it with other possible solutions; the results, based on a rough theoretical analysis and on logic synthesis, show its efficiency in terms of both area and delay.
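
    For context, here is a minimal Python sketch of conventional radix-4 Modified Booth recoding, not the paper's reduced-height scheme. It recodes an n-bit two's complement multiplier y into n/2 digits in {-2, ..., 2}, each of which selects one partial product d·x. In a conventional layout, a negative digit is usually handled by inverting x and adding a 1 at the row's least significant position, and the correction bit of the last row is what makes the array n/2 + 1 rows high at its tallest column. The function names are illustrative.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode the n-bit two's complement pattern y (n even) into n // 2 radix-4 digits.

    Digit j is -2*y[2j+1] + y[2j] + y[2j-1], with y[-1] = 0, so every digit is in {-2, ..., 2}.
    Digits are returned least significant first.
    """
    if n % 2:
        raise ValueError("n must be even")
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0  # y[-1] = 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]  # becomes y[2j-1] for the next digit
    return digits


def to_signed(y: int, n: int) -> int:
    """Interpret the low n bits of y as a two's complement integer."""
    y &= (1 << n) - 1
    return y - (1 << n) if y >> (n - 1) else y


def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the n // 2 partial products d_j * x * 4**j."""
    return sum(d * x * 4**j for j, d in enumerate(booth_radix4_digits(y, n)))


# The 8-bit pattern 0b10110101 encodes -75.
print(booth_radix4_digits(0b10110101, 8))  # [1, 1, -1, -1]
print(booth_multiply(13, 0b10110101, 8))   # -975
```

    Only four partial products are needed for an 8-bit multiplier, half as many as with plain binary partial products.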

  • Exact and Approximated Error of the FMA

    Publication Year: 2011 , Page(s): 157 - 164

    The fused multiply-add (FMA) instruction, specified by the IEEE 754-2008 Standard for Floating-Point Arithmetic, eases some calculations and is already available on some current processors, such as the PowerPC or the Itanium. We first extend earlier work on computing the exact error of an FMA, giving more general conditions and providing a formal proof. Then, we present a new algorithm that computes an approximation to the error of an FMA, and we provide error bounds and a formal proof for that algorithm.
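
    To make "the error of an FMA" concrete, the Python sketch below computes it exactly with rational arithmetic: the error is (a·x + b) - RN(a·x + b), where RN is binary64 round-to-nearest-even. This is a slow reference, not the paper's algorithm, which uses floating-point operations only; it could serve as an oracle when testing such an algorithm. It assumes finite inputs and relies on CPython converting a Fraction to float with correct rounding.

```python
import math
from fractions import Fraction


def fma_rn(a: float, x: float, b: float) -> float:
    """a*x + b rounded once to binary64, emulated through an exact rational value."""
    return float(Fraction(a) * Fraction(x) + Fraction(b))


def fma_error(a: float, x: float, b: float) -> Fraction:
    """Exact error (a*x + b) - RN(a*x + b) of the FMA, as a rational number."""
    exact = Fraction(a) * Fraction(x) + Fraction(b)
    return exact - Fraction(float(exact))


def within_half_ulp(a: float, x: float, b: float) -> bool:
    """Sanity check: with round-to-nearest, |error| <= ulp(result) / 2."""
    return abs(fma_error(a, x, b)) <= Fraction(math.ulp(fma_rn(a, x, b))) / 2


# A separate multiply and add round twice; the FMA rounds only once.
print(0.1 * 10.0 - 1.0)                     # 0.0
print(fma_rn(0.1, 10.0, -1.0) == 2.0**-54)  # True
```

    The example shows why the single rounding matters: 10·fl(0.1) is exactly 1 + 2^-54, which a separate multiplication rounds to 1.0 before the subtraction.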

  • Improved Division by Invariant Integers

    Publication Year: 2011 , Page(s): 165 - 175

    This paper considers the problem of dividing a two-word integer by a single-word integer, together with a few extensions and applications. Because current processors lack efficient division instructions, the division is performed as a multiplication by a precomputed single-word approximation of the reciprocal of the divisor, followed by a couple of adjustment steps. There are three common types of unsigned multiplication instructions: full-word multiplication (umul), which produces the two-word product of two single-word integers; low multiplication (umullo), which produces only the least significant word of the product; and high multiplication (umulhi), which produces only the most significant word. We describe an algorithm that produces a quotient and remainder using one umul and one umullo. This improves on earlier methods because the new method uses cheaper multiplication operations, and its simpler adjustment conditions yield some additional savings. The algorithm has been implemented in version 4.3 of the GMP library. When applied to the problem of dividing a large integer by a single word, the new algorithm gives a speedup of roughly 30 percent, benchmarked on AMD and Intel processors in the x86_64 family.
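
    The two-word-by-one-word step can be sketched in Python with a 64-bit word, using masked Python integers to emulate word arithmetic. The sketch assumes a normalized divisor d (most significant bit set) and u1 < d, and uses the reciprocal v = floor((β² - 1)/d) - β with β = 2^64. It follows our reading of the paper's method and is not GMP's actual code; the function names are illustrative.

```python
W = 64                  # word size in bits
BETA = 1 << W           # beta = 2**64
MASK = BETA - 1


def reciprocal(d: int) -> int:
    """v = floor((beta**2 - 1) / d) - beta, for a normalized single-word divisor d."""
    if not ((BETA >> 1) <= d < BETA):
        raise ValueError("d must be a word with its most significant bit set")
    return (BETA * BETA - 1) // d - BETA


def div_2by1(u1: int, u0: int, d: int, v: int) -> tuple[int, int]:
    """Divide the two-word integer (u1, u0) by d, given v = reciprocal(d) and u1 < d."""
    # umul: full two-word product v * u1, then two-word addition of (u1, u0)
    p = v * u1
    q0 = (p & MASK) + u0
    q1 = (p >> W) + u1 + (q0 >> W)  # propagate the carry out of the low word
    q0 &= MASK
    q1 = (q1 + 1) & MASK            # candidate quotient
    # umullo: only the low word of q1 * d is needed for the candidate remainder
    r = (u0 - ((q1 * d) & MASK)) & MASK
    if r > q0:                      # candidate quotient was one too large
        q1 = (q1 - 1) & MASK
        r = (r + d) & MASK
    if r >= d:                      # rare case: candidate quotient was one too small
        q1 += 1
        r -= d
    return q1, r


d = (1 << 63) | 12345
v = reciprocal(d)  # computed once per divisor, then reused
print(div_2by1(d - 1, MASK, d, v) == divmod((d - 1) * BETA + MASK, d))  # True
```

    Since v depends only on d, its cost is amortized when many two-word values are divided by the same divisor, as in dividing a large integer by a single word.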

  • Simulation-Based Verification of Floating-Point Division

    Publication Year: 2011 , Page(s): 176 - 188

    Floating-point division is known to exhibit an exceptionally wide array of corner cases, making its verification a difficult challenge. Despite remarkable advances in formal methods, the intricacies of this operation and its implementations often render them inapplicable, so simulation-based methods remain the primary means of verifying division. FPgen is a test generation framework targeted at the floating-point datapath; it has been used successfully in the simulation-based verification of a variety of hardware designs. FPgen comprises a comprehensive test plan and a powerful test generator, and handling the difficulties posed by division is a major part of its capabilities. We present an overview of the relevant verification tasks supplied with FPgen and of the underlying algorithms used to target them.

  • Area-Efficient Multipliers Based on Multiple-Radix Representations

    Publication Year: 2011 , Page(s): 189 - 201

    In this paper, we introduce several new algorithms for integer multiplication that are based on a specific multiple-radix representation of one of the multiplicands. We provide an extensive theoretical analysis and experimental results for multipliers based on the new representations in 0.18 μm CMOS technology. These results give a clear picture of the advantages of the new method in 64-bit hardware implementations compared with a classical array-based multiplier and a radix-8 multiplier: the proposed multipliers have lower area and power consumption than the reference multipliers.

  • A Real/Complex Logarithmic Number System ALU

    Publication Year: 2011 , Page(s): 202 - 213

    The real Logarithmic Number System (LNS) offers fast multiplication but more expensive addition. Cotransformation and higher order table methods allow real LNS ALUs with reasonable precision on Field-Programmable Gate Arrays (FPGAs). The Complex LNS (CLNS) is a generalization of LNS that represents complex values in log-polar form. CLNS is a more compact representation than traditional rectangular methods, reducing bus and memory cost in the FFT; however, prior CLNS implementations were either slow CORDIC-based or expensive 2D-table-based approaches. Instead, we reuse real LNS hardware for CLNS, adding specialized hardware (including a novel log-sin unit that overcomes singularity problems) that is smaller than the real-valued LNS ALU to which it is attached. All units were derived from the Floating-Point Cores (FloPoCo) library. FPGA synthesis shows that our CLNS ALU is smaller than prior fast CLNS units. We also compare the accuracy of prior and proposed CLNS implementations. The most accurate of the proposed methods increases the error in radix-two FFTs by less than half a bit, and a more economical FloPoCo-based implementation increases the error by only one bit.
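
    The basic real LNS operations can be sketched in Python for positive values stored as base-2 logarithms: multiplication becomes an addition of logarithms, while addition needs the function sb(z) = log2(1 + 2^(-z)). The names are illustrative. Hardware evaluates sb with tables or interpolation rather than math.log2, and the sketch omits signs and subtraction, whose function log2|1 - 2^(-z)| is singular at z = 0.

```python
import math


def to_lns(value: float) -> float:
    """Encode a positive real as its base-2 logarithm."""
    return math.log2(value)


def from_lns(e: float) -> float:
    """Decode a base-2 logarithm back to a real."""
    return 2.0 ** e


def sb(z: float) -> float:
    """Addition function log2(1 + 2**-z), for z >= 0."""
    return math.log2(1.0 + 2.0 ** -z)


def lns_mul(x: float, y: float) -> float:
    """log2(X * Y) = log2 X + log2 Y: multiplication is a single addition."""
    return x + y


def lns_add(x: float, y: float) -> float:
    """log2(X + Y) = max(x, y) + sb(|x - y|)."""
    return max(x, y) + sb(abs(x - y))


x, y = to_lns(3.0), to_lns(5.0)
print(from_lns(lns_mul(x, y)))  # approximately 15.0
print(from_lns(lns_add(x, y)))  # approximately 8.0
```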

  • Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation

    Publication Year: 2011 , Page(s): 214 - 227

    In this paper, we show how to reduce the computation of correctly rounded square roots of binary floating-point data to the fixed-point evaluation of some particular integer polynomials in two variables. By designing parallel and accurate evaluation schemes for such bivariate polynomials, we further show that this approach exposes a high degree of instruction-level parallelism (ILP), and thus potentially allows low-latency implementations. Then, as an illustration, we detail a C implementation of our method for IEEE 754-2008 binary32 floating-point data (formerly called single precision in the 1985 version of the IEEE 754 standard). This software implementation, which assumes 32-bit unsigned integer arithmetic only, is almost complete: it supports special operands, subnormal numbers, and all rounding-direction attributes, but not exception handling (that is, status flags are not set). Finally, we carried out experiments with this implementation on the ST231, an integer processor from STMicroelectronics' ST200 family, using the ST200 family VLIW compiler. The results demonstrate the practical interest of our approach in that context: for all rounding-direction attributes, the generated assembly code is optimally scheduled and has low latency (23 cycles).

  • Midpoints and Exact Points of Some Algebraic Functions in Floating-Point Arithmetic

    Publication Year: 2011 , Page(s): 228 - 241

    When implementing a function f in floating-point arithmetic, if we want correct rounding and good performance, it is important to know whether there are input floating-point values x such that f(x) is either the middle of two consecutive floating-point numbers (relevant to round-to-nearest arithmetic) or a floating-point number (relevant to rounding toward ±∞ or toward 0). In the first case, we say that f(x) is a midpoint, and in the second case, that f(x) is an exact point. For some common algebraic functions and various floating-point formats, we prove whether or not midpoints or exact points exist. When they do, we characterize them or list all of them (if there are not too many). The results and techniques presented in this paper can be used, in particular, to deal with both the binary and the decimal formats defined in the IEEE 754-2008 standard for floating-point arithmetic.
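
    For binary64 and operations whose exact result is rational (such as products and quotients), both definitions can be checked directly. The Python sketch below classifies an exact rational value as an exact point, a midpoint, or neither. It is an illustrative checker, not one of the paper's proofs, and it assumes the value lies within the finite binary64 range.

```python
import math
from fractions import Fraction


def classify(v: Fraction) -> str:
    """Classify an exact value v relative to the binary64 floating-point numbers."""
    r = float(v)  # nearest double, ties to even
    if Fraction(r) == v:
        return "exact point"
    # v lies strictly between r and r's neighbour on v's side.
    other = math.nextafter(r, math.inf if v > r else -math.inf)
    if Fraction(r) + Fraction(other) == 2 * v:
        return "midpoint"
    return "neither"


x = 1.0 + 2.0**-52
print(classify(Fraction(x) * Fraction(1.5)))    # midpoint
print(classify(Fraction(1.5) * Fraction(2.0)))  # exact point
print(classify(Fraction(1.0) / Fraction(3.0)))  # neither
```

    The first product is 1.5 + 2^-52 + 2^-53, exactly halfway between two consecutive doubles, which is the kind of case where rounding to nearest needs a tie-breaking rule.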

  • Certifying the Floating-Point Implementation of an Elementary Function Using Gappa

    Publication Year: 2011 , Page(s): 242 - 253

    High confidence in floating-point programs requires proving numerical properties of final and intermediate values. One may need to guarantee that a value stays within some range, or that the error relative to some ideal value is well bounded. This certification may require a time-consuming proof for each line of code, and the proof is usually broken by the smallest change to the code, e.g., for maintenance or optimization purposes. Certifying floating-point programs by hand is, therefore, very tedious and error-prone. The Gappa proof assistant is designed to make this task both easier and more secure, thanks to the following novel features: it automates the evaluation and propagation of rounding errors using interval arithmetic; its input format is very close to the actual code to validate; it can be used incrementally to prove complex mathematical properties pertaining to the code; and it generates a formal proof of the results, which can be checked independently by a lower level proof assistant like Coq. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wide community. This paper demonstrates the practical use of this tool for a widely used class of floating-point programs: implementations of elementary functions in a mathematical library.
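
    The core idea of propagating rounding errors with interval arithmetic can be illustrated in a few lines of Python; this is not Gappa's input language. The sketch encloses the binary64 result of evaluating x·x + x for every x in [1, 2], using exact rational endpoints and the standard bound |RN(t) - t| <= 2^-53 |t|, which assumes no underflow or overflow. Gappa goes much further: it also bounds errors relative to the ideal value and emits a formal proof.

```python
from fractions import Fraction

U = Fraction(1, 2**53)  # unit roundoff of binary64 round-to-nearest


class Interval:
    """Closed interval [lo, hi] with exact rational endpoints."""

    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def contains(self, value) -> bool:
        return self.lo <= Fraction(value) <= self.hi


def rounded(iv: Interval) -> Interval:
    """Enclose RN(t) for every t in iv, using |RN(t) - t| <= U * |t| (normal range only)."""
    m = max(abs(iv.lo), abs(iv.hi))
    return Interval(iv.lo - U * m, iv.hi + U * m)


X = Interval(1, 2)
S = rounded(rounded(X * X) + X)  # encloses RN(RN(x*x) + x) for all x in X
print(float(S.lo), float(S.hi))
```

    Each operation in the program gets one `rounded` step, so the enclosure mirrors the code line by line, which is the property the abstract stresses for Gappa's input format.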

  • Hybrid Binary-Ternary Number System for Elliptic Curve Cryptosystems

    Publication Year: 2011 , Page(s): 254 - 265

    Single and double scalar multiplications are the most computationally intensive operations in elliptic-curve-based cryptosystems. Their performance is generally improved by means of integer recoding techniques, which aim at minimizing the density of nonzero digits in the scalars. The hybrid binary-ternary number system provides both short representations and low density. In this paper, we present three novel algorithms for both single and double scalar multiplication. We present a detailed theoretical analysis, together with timings and fair comparisons over both tripling-oriented Doche-Icart-Kohel curves and generic Weierstrass curves. Our experiments show that our algorithms are almost always faster than their widely used counterparts.
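
    A simple greedy hybrid binary-ternary recoding can be sketched in Python. This signed-digit variant is assumed for illustration and is not one of the paper's three algorithms: even scalars are halved, odd multiples of 3 are divided by 3, and any other odd scalar gets a digit of +1 or -1 so that the remainder becomes divisible by 6. In a scalar multiplication, each radix-2 step corresponds to a point doubling, each radix-3 step to a tripling, and each nonzero digit to an addition or subtraction of the base point.

```python
def hbt_recode(n: int) -> list[tuple[int, int]]:
    """Greedy hybrid binary-ternary recoding of n > 0 as (radix, digit) steps, least significant first."""
    steps = []
    while n > 0:
        if n % 2 == 0:
            steps.append((2, 0))
            n //= 2
        elif n % 3 == 0:
            steps.append((3, 0))
            n //= 3
        else:
            d = 1 if n % 6 == 1 else -1  # n is 1 or 5 mod 6, so n - d is divisible by 6
            steps.append((2, d))
            n = (n - d) // 2
    return steps


def hbt_value(steps: list[tuple[int, int]]) -> int:
    """Rebuild the integer Horner-style, most significant step first (acc = acc * radix + digit)."""
    acc = 0
    for radix, digit in reversed(steps):
        acc = acc * radix + digit
    return acc


print(hbt_recode(5))                # [(2, -1), (3, 0), (2, 1)]
print(hbt_value(hbt_recode(5)))     # 5
```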

  • Fast Architectures for the ηT Pairing over Small-Characteristic Supersingular Elliptic Curves

    Publication Year: 2011 , Page(s): 266 - 281

    This paper is devoted to the design of fast parallel accelerators for the cryptographic ηT pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies that we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of coprocessors for the ηT pairing over F(2^m) and F(3^m). Thanks to a careful choice of algorithms for the tower field arithmetic associated with the ηT pairing, we manage to keep the pipelined multiplier at the heart of each coprocessor busy. A final exponentiation is still required to obtain a unique value, which is desirable in most cryptographic protocols. We supplement our pairing accelerators with a coprocessor responsible for this task; an improved exponentiation algorithm allows us to save hardware resources. According to our place-and-route results on Xilinx FPGAs, our designs improve both the computation time and the area-time trade-off compared with previously published coprocessors.
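
    The Karatsuba idea behind such a multiplier can be sketched in Python for characteristic two, with polynomials over GF(2) stored as integer bit masks (bit i is the coefficient of x^i). Three half-size products replace four, and in characteristic two every addition or subtraction is an XOR. This software sketch omits reduction modulo the field polynomial, the characteristic-three case, and the pipelining that the hardware relies on; the base-case threshold of 8 is an arbitrary choice.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product of GF(2)[x] polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, n: int) -> int:
    """Product of two GF(2)[x] polynomials of degree < n, via recursive Karatsuba splitting."""
    if n <= 8:  # small operands: fall back to the schoolbook product
        return clmul(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h  # a = a0 + a1 * x**h
    b0, b1 = b & mask, b >> h
    low = karatsuba_gf2(a0, b0, h)
    high = karatsuba_gf2(a1, b1, n - h)
    # (a0 + a1)(b0 + b1) - low - high gives the middle term; all of it is XOR here
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, n - h) ^ low ^ high
    return low ^ (mid << h) ^ (high << (2 * h))


print(bin(clmul(0b11, 0b11)))  # 0b101, since (x + 1)**2 = x**2 + 1 over GF(2)
```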

  • Performing Arithmetic Operations on Round-to-Nearest Representations

    Publication Year: 2011 , Page(s): 282 - 291

    During any composite computation, intermediate results constantly need to be rounded before they can participate in further processing. Recently, a class of number representations denoted RN-codings was introduced, allowing unbiased rounding-to-nearest by simple truncation, with the property that problems with double rounding are avoided. In this paper, we first investigate a particular encoding of the binary representation. This encoding is generalized to any radix and digit set; however, radix complement representations for even values of the radix turn out to be particularly feasible. The encoding is essentially an ordinary radix complement representation with an appended round bit, but it still allows rounding-to-nearest by truncation, and thus avoids problems with double rounding. Conversions from radix complement to these round-to-nearest representations can be performed in constant time, whereas conversion in the other direction in general takes at least logarithmic time. Both rounding-to-nearest and sign inversion are constant-time operations, whereas on ordinary two's complement representations both are at best log-time operations. Addition and multiplication on such fixed-point representations are first analyzed and defined in such a way that rounding information can be carried along in a meaningful way, at minimal cost. The analysis is carried through for a compact (canonical) encoding using two's complement representation, supplied with a round bit. Based on the fixed-point encoding, it is shown to be possible to define floating-point representations, and a sketch of the implementation of an FPU is presented.
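
    The canonical binary encoding, as we read it, can be sketched in Python. A pair (b, r), where b is an n-bit two's complement pattern and r the appended round bit, represents TC(b) + r in units of the last place of b; this matches the RN digit string d_i = b_(i-1) - b_i with b_(-1) = r, whose nonzero digits alternate in sign. The sketch shows why both operations are constant-time: rounding away k positions keeps the high bits and takes bit k - 1 as the new round bit, and negation is a bitwise complement with r replaced by 1 - r. The function names are illustrative.

```python
def rn_value(bits: int, r: int, n: int) -> int:
    """Value of (bits, r): two's complement value of the n-bit pattern plus the round bit."""
    tc = bits - (1 << n) if (bits >> (n - 1)) & 1 else bits
    return tc + r


def rn_negate(bits: int, r: int, n: int) -> tuple[int, int]:
    """Constant-time negation: -(TC(b) + r) = TC(~b) + (1 - r), since TC(~b) = -TC(b) - 1."""
    return ~bits & ((1 << n) - 1), 1 - r


def rn_round(bits: int, k: int) -> tuple[int, int]:
    """Round to nearest at position k (1 <= k < n) by truncation; the result has n - k bits.

    The old round bit is simply dropped and bit k - 1 becomes the new round bit.
    """
    return bits >> k, (bits >> (k - 1)) & 1


def rn_digits(bits: int, r: int, n: int) -> list[int]:
    """RN digit string d_i = b_(i-1) - b_i, least significant first, with b_(-1) = r."""
    b = [r] + [(bits >> i) & 1 for i in range(n)]  # b[j] holds bit b_(j-1)
    return [b[i] - b[i + 1] for i in range(n)]


print(rn_value(0b0110, 1, 4))                     # 7
print(rn_value(*rn_round(0b0110, 2), 2) * 4)      # 8: 7 rounded to a multiple of 4
print(rn_value(*rn_negate(0b0110, 1, 4), 4))      # -7
```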

  • ISRA-Based Grouping: A Disk Reorganization Approach for Disk Energy Conservation and Disk Performance Enhancement

    Publication Year: 2011 , Page(s): 292 - 304

    Reducing disk energy consumption and improving disk performance in high-performance computer systems are increasingly pressing issues for reasons of disk economy and efficiency. To achieve these goals, we define the concept of Immediate Successor Relationship Amount (ISRA) to represent the successor relationship of data blocks, and we propose an ISRA-based grouping algorithm for disk reorganization, based on an undirected graph. We group data blocks that experience frequent successive accesses, then sort them using a merge-sort-like algorithm to determine the position of every group as well as the new position of every block within its group. We evaluate our approach in terms of disk seek time and disk energy consumption, using DiskSim and the log energy model. The results show that both disk seek time and energy consumption can be reduced by about 50 percent.
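
    The abstract does not spell out the exact ISRA definition or grouping criterion, so the Python sketch below is only one plausible reading, with hypothetical names and a hypothetical threshold: it counts how often two blocks are accessed back to back in a trace (the weight of an undirected edge) and merges blocks joined by edges at or above the threshold. The paper's subsequent merge-sort-like placement of groups on disk is not modeled.

```python
from collections import Counter


def successor_counts(trace):
    """Weight of each undirected edge {a, b}: how often a and b are accessed back to back."""
    counts = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            counts[frozenset((a, b))] += 1
    return counts


def group_blocks(trace, threshold):
    """Merge blocks joined by an edge of weight >= threshold, using union-find."""
    parent = {block: block for block in trace}

    def find(block):
        while parent[block] != block:
            parent[block] = parent[parent[block]]  # path halving
            block = parent[block]
        return block

    for edge, weight in successor_counts(trace).items():
        if weight >= threshold:
            a, b = tuple(edge)
            parent[find(a)] = find(b)

    groups = {}
    for block in parent:
        groups.setdefault(find(block), []).append(block)
    return sorted(sorted(g) for g in groups.values())


trace = [1, 2, 3, 1, 2, 3, 7, 1, 2, 9]   # hypothetical block access trace
print(group_blocks(trace, threshold=2))  # [[1, 2, 3], [7], [9]]
```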

  • TC Information for authors

    Publication Year: 2011 , Page(s): c3
  • [Back cover]

    Publication Year: 2011 , Page(s): c4

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org