IEEE Transactions on Computers

Issue 4 • April 1994

  • Comments on "Area-time optimal adder design"

    Page(s): 507 - 512

    A previous paper by Wei and Thompson (1990) defined a family of adders based on a modular design and presented an excellent systematic method of implementing a VLSI parallel adder using three types of component cells designed in static CMOS. Their approach to the adder design was based on the optimization of a formulated dynamic programming problem with respect to area and time. The authors first demonstrate explicitly that the optimal 32-bit fast carry generator described by Wei and Thompson is incorrect. With suitable corrections, a correct 32-bit fast carry generator design is then presented. Next, BiCMOS technology is applied to implement the subcircuit of the fast carry generator in order to accelerate the critical path. The authors show that the critical path delays of the 16-bit, 32-bit, and 66-bit adders are shortened to 83.89%, 86.89%, and 90.62%, respectively, after the BiCMOS drivers are introduced.
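
    The logic such a carry generator implements is the classic generate/propagate recurrence c[i+1] = g[i] OR (p[i] AND c[i]). A minimal sketch of that recurrence, checked against ordinary integer addition (illustrative only; it does not reproduce the Wei-Thompson cell structure or the BiCMOS drivers):

    ```python
    # Minimal sketch of the generate/propagate recurrence behind a fast
    # carry generator (illustrative; not the Wei-Thompson cell design).

    def carries(a_bits, b_bits, c_in=0):
        """Return the carry into each bit position plus the final carry,
        using c[i+1] = g[i] | (p[i] & c[i])."""
        g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
        p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
        c = [c_in]
        for gi, pi in zip(g, p):
            c.append(gi | (pi & c[-1]))
        return c

    # Cross-check against ordinary integer addition for a 32-bit example.
    a, b = 0xDEADBEEF, 0x12345678
    a_bits = [(a >> i) & 1 for i in range(32)]
    b_bits = [(b >> i) & 1 for i in range(32)]
    c = carries(a_bits, b_bits)
    s = [a_bits[i] ^ b_bits[i] ^ c[i] for i in range(32)]
    assert sum(bit << i for i, bit in enumerate(s)) == (a + b) & 0xFFFFFFFF
    ```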

  • A state assignment approach to asynchronous CMOS circuit design

    Page(s): 460 - 469

    Presents a new algorithm for state assignment in asynchronous circuits such that, for each circuit state transition, only one (secondary) state variable switches. No intermediate unstable states are used. The resulting circuits operate at optimum speed in terms of the number of transitions made and use only static CMOS gates. By reducing the number of switching events per state transition, switching noise is reduced and dynamic power dissipation may also be reduced. The approach is suitable for asynchronous sequential circuits designed from flow tables or state transition diagrams. It may also be useful for designing synchronous circuits, but an investigation of clock power would be necessary to determine its usefulness there.
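
    The property being enforced is that the codes of any two states joined by a transition differ in exactly one state variable, i.e. they are at Hamming distance 1. A small sketch that checks this for a hypothetical four-state machine with a Gray-code-like assignment (it verifies the property the algorithm guarantees, not the assignment algorithm itself):

    ```python
    # Sketch: check that a state assignment lets each transition switch
    # exactly one (secondary) state variable, i.e. adjacent state codes
    # are at Hamming distance 1.  The tiny machine below is hypothetical.

    transitions = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    assignment = {"A": 0b00, "B": 0b01, "C": 0b11, "D": 0b10}   # Gray-code-like

    def single_variable_switch(transitions, assignment):
        return all(bin(assignment[s] ^ assignment[t]).count("1") == 1
                   for s, t in transitions)

    print(single_variable_switch(transitions, assignment))      # True
    ```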

  • Testing iterative logic arrays for sequential faults with a constant number of patterns

    Page(s): 495 - 501

    Shows that a constant number of test vectors is sufficient for fully testing a k-dimensional iterative logic array (ILA) for sequential faults if the cell function is bijective. The authors then present an efficient algorithm for obtaining such a test sequence. By extending the concepts of C-testability and M-testability to sequential faults, a constant-length test sequence can be obtained. A pipelined array multiplier is shown to be C-testable with only 53 test vectors for exhaustive testing of sequential faults.

  • Concurrent process monitoring with no reference signatures

    Page(s): 475 - 480

    A simple, inexpensive, and time/space-efficient signature technique for process monitoring is presented. In this technique, a known signature function is applied to the instruction stream at compilation time, and when the accumulated signature forms an m-out-of-n code, the corresponding instructions are tagged. Error checking is done at run time by monitoring the signatures accumulated at the tagged locations to determine whether they form m-out-of-n codes. This approach to signature checking does not require reference signatures to be embedded at compilation, leading to savings in memory as well as in execution time. The m-out-of-n code approach offers high error coverage and controllable latency. Results of experiments conducted to verify the controllability of the latency are discussed. A distinguishing feature of the proposed scheme is the elimination of reference signatures, which are the main source of memory and time overhead in existing techniques.
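
    An m-out-of-n code word is an n-bit word containing exactly m ones, so the run-time check at a tagged location reduces to a population count. A minimal sketch, with illustrative values of m and n (not those used in the paper):

    ```python
    # Sketch: run-time check that an accumulated signature forms an
    # m-out-of-n code, i.e. exactly m of its n bits are ones.
    # The values of M and N below are illustrative only.

    M, N = 8, 16

    def is_m_out_of_n(signature, m=M, n=N):
        sig = signature & ((1 << n) - 1)      # keep the n signature bits
        return bin(sig).count("1") == m

    print(is_m_out_of_n(0b1111000011110000))  # True: 8 ones out of 16 bits
    print(is_m_out_of_n(0b1111000011110001))  # False: 9 ones -> error flagged
    ```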

  • Analysis of asynchronous binary arbitration on digital transmission-line busses

    Page(s): 484 - 489

    A common misconception is that asynchronous binary arbitration settles in at most four units of bus-propagation delay, regardless of the number of arbitration bus lines. The author disproves this conjecture by presenting an arrangement of modules on m bus lines for which binary arbitration requires [m/2] units of bus-propagation delay to settle. He also proves that, for any arrangement of modules on m bus lines, binary arbitration settles in at most [m/2]+2 units of bus-propagation delay.

  • Innovative structures for CMOS combinational gates synthesis

    Page(s): 385 - 399

    The design of multiple-output CMOS combinational gates is studied. Two techniques for the minimization of multiple-output functions at the switching level are introduced. These techniques are based on innovative transistor-interconnection structures named Delta and Lambda networks, and the two can be combined to obtain further area reductions. Different synthesis algorithms are discussed, ranging from exhaustive enumeration to branch and bound to heuristic techniques that speed up the synthesis process. Simulation results are presented to compare the different algorithms, and design examples are provided. Electrical simulations show that the dynamic behavior of these structures is comparable to that of traditional static or domino implementations (the new and traditional structures obviously have the same static behavior).

  • Uniform parity group distribution in disk arrays with multiple failures

    Page(s): 501 - 506

    Several new disk arrays have recently been proposed in which the parity groupings are uniformly distributed throughout the array, so that the extra workload created by a disk failure is evenly shared by all surviving disks, yielding the best possible degraded-mode performance. Many arrays now also include multiple spare disks so that expensive service calls can be deferred. Furthermore, in a new sparing scheme called distributed sparing, the spare space is itself distributed throughout the array, which means that after a rebuild the new array is logically different from the original array. The authors present an algorithm for constructing and maintaining arrays with distributed sparing so that uniform parity group distribution is achieved again after each successive failure.
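
    Distributed sparing rotates the spare block across the disks stripe by stripe, just as RAID-5-style layouts rotate the parity block, so spare capacity is spread over all drives. A rough layout sketch of that idea (a generic rotated layout, not the authors' construction and maintenance algorithm):

    ```python
    # Sketch: a stripe layout with a rotated parity block (P) and a
    # rotated distributed spare block (S); remaining slots hold data (D).
    # This is a generic rotated-layout picture, not the paper's algorithm.

    def layout(num_disks, num_stripes):
        rows = []
        for stripe in range(num_stripes):
            parity = (num_disks - 1 - stripe) % num_disks   # parity rotates
            spare = (parity - 1) % num_disks                # spare rotates with it
            rows.append(["P" if d == parity else "S" if d == spare else "D"
                         for d in range(num_disks)])
        return rows

    for stripe, row in enumerate(layout(num_disks=5, num_stripes=5)):
        print(f"stripe {stripe}: {' '.join(row)}")
    # Over 5 stripes every disk holds exactly one P and one S block, so a
    # failure's extra workload is shared evenly by the surviving disks.
    ```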

  • Computational arrays with flexible redundancy

    Page(s): 413 - 430

    Different multiple-redundancy schemes for fault detection and correction in computational arrays are proposed and analyzed. The basic idea is to embed a logical array of nodes onto a processor/switch array such that d processors, 1⩽d⩽4, are dedicated to the computation associated with each node. The input to a node is directed to the d processors constituting that node, and the output of the node is computed by taking a majority vote among the outputs of the d processors. The proposed processor/switch array (PSVA) is versatile in the sense that it may be configured as a nonredundant system or as a system that supports double, triple, or quadruple redundancy. It also allows spares to be distributed in the PSVA in a way that permits spare sharing among nodes, thus enhancing overall system reliability. In addition to offering a choice of the degree of redundancy, the flexibility of the PSVA architecture allows redundant arrays to be embedded onto defective PSVAs and supports run-time reconfiguration to avoid faulty processors and switches. Different embedding and reconfiguration algorithms are presented and analyzed using Markov chain techniques, probability arguments, and simulation.
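
    The node-level output is a plurality vote over the d replicated processor outputs, with ties flagged rather than corrected. A minimal sketch of such a voter (function and variable names are illustrative):

    ```python
    # Sketch: majority voting over the outputs of the d processors that
    # make up one logical node (1 <= d <= 4).  Names are illustrative.

    from collections import Counter

    def node_output(processor_outputs):
        """Return the strict-majority value, or None if no strict majority
        exists (e.g. a 2-2 split when d = 4)."""
        value, votes = Counter(processor_outputs).most_common(1)[0]
        return value if votes > len(processor_outputs) / 2 else None

    print(node_output([7, 7, 7]))       # 7    (triple redundancy, agreement)
    print(node_output([7, 7, 9]))       # 7    (one faulty processor outvoted)
    print(node_output([7, 7, 9, 9]))    # None (2-2 split: detected, not corrected)
    ```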

  • A totally self-checking checker for a parallel unordered coding scheme

    Page(s): 490 - 495

    Bose has developed a parallel unordered coding scheme that uses only r check bits for 2^r information bits. The code can detect all unidirectional errors, requires simple parallel encoding/decoding, and keeps the information symbols separate from the check symbols. However, the information symbols containing all zeros and all ones must be transformed into two other information symbols; this is what allows the number of check bits to be reduced by one relative to the Berger code. Since information symbols containing a power-of-two number of bits are quite common, this coding scheme should become quite popular. The authors describe a modular, economical, and easily testable totally self-checking (TSC) checker design for this code. The TSC concept is well known for providing concurrent error detection of transient as well as permanent faults. The design is self-testing with at most 2r+16 codeword tests, which means that if k is the number of information bits, the size of the codeword test set is only O(log₂ k). This is the first known TSC checker design for this code.
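
    For comparison, a Berger code appends the count of zeros in the information word and therefore needs ⌈log₂(k+1)⌉ check bits, which is r+1 when k = 2^r, one more than Bose's scheme. A small sketch of that check-bit comparison and of the Berger check symbol (the Bose encoder and the TSC checker themselves are not reproduced here):

    ```python
    # Sketch: Berger-code check bits versus the r check bits of Bose's
    # unordered code for k = 2**r information bits.  The Bose encoding
    # and the TSC checker are not reproduced here.

    import math

    def berger_check_bits(k):
        """Check bits a Berger code needs for k information bits."""
        return math.ceil(math.log2(k + 1))

    def berger_check_symbol(info_bits):
        """Berger check symbol: the number of zeros in the information word."""
        return info_bits.count(0)

    for r in (3, 4, 5, 6):
        k = 2 ** r
        print(f"k = {k:3d}: Berger needs {berger_check_bits(k)} check bits, "
              f"Bose's code needs {r}")

    print(berger_check_symbol([1, 0, 1, 1, 0, 0, 1, 0]))   # 4 zeros in 8 bits
    ```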

  • Reliable floating-point arithmetic algorithms for error-coded operands

    Page(s): 400 - 412

    Reliable floating-point arithmetic is vital for dependable computing systems and is also important for future high-density VLSI realizations, which are vulnerable to soft errors. However, the direct checking of floating-point arithmetic is still an open problem. The author presents a set of reliable floating-point arithmetic algorithms for low-cost residue encoded and Berger encoded operands, respectively. Closed-form equations are derived for floating-point addition, subtraction, multiplication, and division. For standard IEEE floating-point numbers, the proposed reliable floating-point multiplication algorithms for low-cost residue encoded operands are extremely low cost: they require less than 8% hardware redundancy in all cases. For reliable floating-point addition and subtraction, the author finds that the hardware redundancy ratio of the low-cost residue code is about the same as that of the Berger code: less than 40% hardware redundancy for single-precision numbers and about 16% for double-precision numbers. For reliable floating-point division, Berger encoded operands yield hardware cost-effectiveness of about 45% for single-precision numbers and about 36% for double-precision numbers.
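
    A low-cost residue code works modulo a small modulus of the form 2^c - 1, and for multiplication the check is that the residue of the product equals the product of the operand residues (modulo the same modulus). A minimal sketch of that check on plain integers standing in for significands (illustrative; the paper's algorithms handle the full floating-point formats):

    ```python
    # Sketch: low-cost residue check of a multiplication, using the
    # modulus 2**4 - 1 = 15.  Plain integers stand in for significands;
    # the paper's algorithms handle the full floating-point formats.

    MODULUS = 2**4 - 1

    def residue(x, m=MODULUS):
        return x % m

    def checked_multiply(a, b):
        product = a * b                                  # functional result
        predicted = residue(residue(a) * residue(b))     # carried as the code
        if residue(product) != predicted:
            raise RuntimeError("residue check failed: possible fault")
        return product

    print(checked_multiply(12345, 6789))                 # passes the check

    # A fault in the product (here a single flipped bit) is caught:
    faulty = (12345 * 6789) ^ (1 << 7)
    print(residue(faulty) == residue(residue(12345) * residue(6789)))  # False
    ```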

  • An optimal channel access protocol with multiple reception capacity

    Page(s): 480 - 484

    A multiple-access packet communication model is analyzed in which users can receive packets on more than one common channel. For this type of system, a new channel access protocol is presented, and the authors prove that under heavy homogeneous load the protocol guarantees the maximum achievable throughput among all possible protocols. The general model can be applied to different systems, according to the various realizations of the logical channels; for example, in packet radio networks the channels can be realized by different carrier frequencies (FDMA) or by different codes (CDMA). The simplicity and optimality of the protocol make it attractive for practical applications.

  • Instruction window size trade-offs and characterization of program parallelism

    Page(s): 431 - 442

    Detecting independent operations is a prime objective for computers that can issue and execute multiple operations simultaneously. The number of instructions that are simultaneously examined in order to detect independent ones is the scope of concurrency detection. The authors present an analytical model for predicting the performance impact of varying the scope of concurrency detection as a function of available resources, such as the number of pipelines in a superscalar architecture. The model can show where a performance bottleneck lies: insufficient resources to exploit the discovered parallelism, insufficient instruction-stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative execution is examined via a set of probability distributions that characterize the inherent parallelism in the instruction stream. These results were derived using traces from the Multiflow TRACE SCHEDULING compacting FORTRAN 77 and C compilers. The experiments provide misprediction delay estimates for 11 common application-level benchmarks under scope constraints, assuming speculative, out-of-order execution and run-time scheduling. The throughput prediction of the analytical model is shown to be close to the measured static throughput of the compiler output.
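
    The scope of concurrency detection can be pictured as a window over the next few unissued instructions, within which mutually independent operations may issue together. A rough sketch that counts issue cycles for a toy register trace under different scopes (the trace format and issue rules are hypothetical simplifications, not the authors' analytical model or the Multiflow toolchain):

    ```python
    # Sketch: each "cycle" examines the next `scope` unissued instructions
    # and issues those with no register dependence (RAW/WAR/WAW) on an
    # earlier instruction in the same window.  Hypothetical trace format;
    # single-cycle result latency is assumed.

    def cycles_needed(trace, scope):
        pending, cycles = list(trace), 0
        while pending:
            written, read, issued = set(), set(), []
            for i, (dst, srcs) in enumerate(pending[:scope]):
                dependent = (dst in written or dst in read
                             or any(s in written for s in srcs))
                if not dependent:
                    issued.append(i)
                written.add(dst)     # later instructions must respect this one,
                read.update(srcs)    # whether or not it issued this cycle
            pending = [ins for i, ins in enumerate(pending) if i not in issued]
            cycles += 1
        return cycles

    # Toy trace: (destination register, source registers).
    trace = [("r1", ["r2", "r3"]), ("r4", ["r1"]), ("r5", ["r6"]),
             ("r7", ["r5"]), ("r8", ["r2"]), ("r9", ["r8"])]

    for scope in (1, 2, 4, 6):
        c = cycles_needed(trace, scope)
        print(f"scope {scope}: {c} cycles, {len(trace) / c:.2f} instructions/cycle")
    ```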

  • An analysis of edge fault tolerance in recursively decomposable regular networks

    Page(s): 470 - 475

    Fault tolerance of interconnection networks is a major consideration in evaluating the reliability of large-scale multiprocessor systems. In this paper, the reliability of a family of regular networks with respect to edge failures is investigated using four different fault-tolerance measures. Two probabilistic measures, resilience and restricted resilience, are developed, used to evaluate disconnection likelihoods under two different failure models, and compared with the corresponding deterministic measures. The network topologies chosen for the study all have the recursive decomposition property, whereby larger networks can be decomposed into copies of smaller networks of the same topology. This family of graphs includes the k-ary n-cube, star, and cube-connected-cycles graphs, which have optimal deterministic connectivities. The probabilistic fault-tolerance measures, however, are found to depend on topological properties such as network size and degree.
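
    A probabilistic measure of this kind can also be approximated by sampling: fail each edge independently with some probability and test whether the surviving graph remains connected. A rough Monte Carlo sketch for a small binary hypercube (illustrative only; the paper derives its resilience measures analytically under its own failure models):

    ```python
    # Sketch: Monte Carlo estimate of the probability that a binary k-cube
    # becomes disconnected when each edge fails independently with
    # probability p.  Illustrative; not the paper's analytical measures.

    import random
    from collections import deque

    def hypercube_edges(k):
        """All edges of the binary k-cube on nodes 0 .. 2**k - 1."""
        return [(u, u ^ (1 << b))
                for u in range(2 ** k) for b in range(k)
                if u < (u ^ (1 << b))]

    def connected(num_nodes, edges):
        adj = {v: [] for v in range(num_nodes)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, queue = {0}, deque([0])
        while queue:
            for w in adj[queue.popleft()]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return len(seen) == num_nodes

    def disconnection_probability(k, p, trials=2000, seed=1):
        rng = random.Random(seed)
        edges = hypercube_edges(k)
        failures = sum(not connected(2 ** k, [e for e in edges if rng.random() >= p])
                       for _ in range(trials))
        return failures / trials

    for p in (0.05, 0.1, 0.2):
        print(f"edge failure prob {p}: "
              f"estimated disconnection prob {disconnection_probability(4, p):.3f}")
    ```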

  • Optimal parallel and pipelined processing through a new class of matrices with application to generalized spectral analysis

    Page(s): 443 - 459

    A new class of general-base matrices, named sampling matrices, is proposed; they are meant to bridge the gap between algorithmic description and computer architecture. “Poles,” “zeros,” “pointers,” and “spans” are among the terms introduced to characterize properties of this class of matrices. A formalism for the decomposition of a general matrix in terms of general-base sampling matrices is proposed. “Span” matrices are introduced to measure the dependence of a matrix span on algorithm parameters, and, among other properties, the interaction between this class of matrices and the previously introduced general-base perfect shuffle permutation matrix is examined. A classification of general-base parallel “recirculant” and parallel pipelined processors based on memory topology, access uniformity, and shuffle complexity is proposed. The matrix formalism is then used to guide the search for algorithm factorizations leading to optimal parallel and pipelined processor architectures.
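
    The general-base perfect shuffle referred to here is, in matrix-free terms, a stride permutation: read a length-N vector (N = p·q) as a q-by-p array and transpose it. A minimal sketch of that permutation, assuming the standard stride-permutation definition (it illustrates only the shuffle, not the sampling-matrix formalism):

    ```python
    # Sketch: the base-p perfect shuffle (stride permutation) of a
    # length-N vector, N = p*q: the element at position a*q + b moves to
    # position b*p + a.  This illustrates only the permutation, not the
    # sampling-matrix formalism of the paper.

    def perfect_shuffle(vector, base):
        n = len(vector)
        assert n % base == 0
        q = n // base
        out = [None] * n
        for a in range(base):
            for b in range(q):
                out[b * base + a] = vector[a * q + b]
        return out

    print(perfect_shuffle(list(range(8)), base=2))   # [0, 4, 1, 5, 2, 6, 3, 7]
    print(perfect_shuffle(list(range(9)), base=3))   # [0, 3, 6, 1, 4, 7, 2, 5, 8]
    ```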


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au