Computers, IEEE Transactions on

Issue 6 • Date Jun 1988

  • Systolic super summation

    Publication Year: 1988 , Page(s): 657 - 677
    Cited by:  Papers (6)

    A principal limitation in accuracy for scientific computation performed with floating-point arithmetic is due to the computation of repeated sums, such as those that arise in inner products. A systolic super summer of cellular design is proposed for the high-throughput performance of repeated sums of floating-point numbers. The apparatus receives pipelined inputs of streams of summands from one or many sources. The floating-point summands are converted into a fixed-point form by a sieve-like pipelined cellular packet-switching device with signal combining. The emerging fixed-point numbers are then summed in a corresponding network of extremely long accumulators (i.e., super accumulators). At the cell level, the design uses a synchronous model of VLSI. The amount of time the apparatus needs to compute an entire sum depends on the values of summands; at this architectural level, the design is asynchronous. The throughput per unit area of hardware approaches that of a tree network, but without the long wire and signal propagation delay that are intrinsic to tree networks.
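
The idea of converting floating-point summands to fixed point and adding them in a very long accumulator can be illustrated in software. The sketch below is only an analogy, not the systolic hardware of the paper: it assumes IEEE 754 doubles, uses a Python integer as the "super accumulator", and the name `exact_sum` is hypothetical.

```python
def exact_sum(values):
    """Sum floats exactly in one wide fixed-point accumulator, rounding once."""
    scale_bits = 1074  # 2**-1074 is the smallest positive double
    acc = 0            # Python int acts as an arbitrarily long accumulator
    for x in values:
        # x == num / den exactly, and den is a power of two for finite floats
        num, den = float(x).as_integer_ratio()
        acc += num * ((1 << scale_bits) // den)  # align to the fixed-point grid
    return acc / (1 << scale_bits)  # single, correctly rounded conversion


data = [1e16, 1.0, -1e16]

naive = 0.0
for x in data:
    naive += x  # ordinary left-to-right floating-point accumulation

exact = exact_sum(data)
```

In this example the naive loop loses the summand 1.0 to rounding, while the fixed-point accumulator keeps it, because rounding happens only once at the end.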

  • Theory of clocking for maximum execution overlap of high-speed digital systems

    Publication Year: 1988 , Page(s): 678 - 690
    Cited by:  Papers (6)  |  Patents (2)

    The effect of clocking schemes on overlapped execution performance in a digital system is described and quantified. Effects of branching, data dependencies, and resource conflicts between consecutive tasks are considered. Some problems of clocking scheme synthesis for the design of digital systems with maximum execution overlap are examined. Effects on performance of the choice of clocking scheme, partitioning of functions into the time steps, the number of clock phases, the length of each phase (i.e., how to pipeline), and the assignment of functions to clock phases are treated.

  • A simple method for determining Hadamard sequency vectors

    Publication Year: 1988 , Page(s): 743 - 745

    A simple method for determining the sequency ordering of any row in any Hadamard matrix directly from its binary representation is developed. This proposed method is proved to be much simpler than the well-known bit-reverse inverse Gray code method.
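
For orientation, the well-known bit-reverse inverse Gray code method that the abstract uses as its baseline can be sketched as follows. This is not the paper's simpler method, which the abstract does not spell out. The sketch assumes a natural-order (Sylvester) Hadamard matrix of size 2^n, and all function names are hypothetical.

```python
def hadamard_row(k, n_bits):
    """Row k of the natural-order (Sylvester) Hadamard matrix of size 2**n_bits."""
    size = 1 << n_bits
    # Entry (k, j) is (-1) raised to the number of common 1 bits of k and j.
    return [-1 if bin(k & j).count("1") % 2 else 1 for j in range(size)]


def sign_changes(row):
    """Sequency measured directly: number of sign changes along the row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def bit_reverse(k, n_bits):
    """Reverse the n_bits-bit binary representation of k."""
    out = 0
    for _ in range(n_bits):
        out = (out << 1) | (k & 1)
        k >>= 1
    return out


def gray_to_binary(g):
    """Inverse Gray code: XOR together all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def sequency(k, n_bits):
    """Sequency of row k via bit reversal followed by inverse Gray code."""
    return gray_to_binary(bit_reverse(k, n_bits))


order_8 = [sequency(k, 3) for k in range(8)]
```

The result of `sequency` can be checked against `sign_changes(hadamard_row(k, n_bits))`, which counts the sign changes of the row directly.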

  • Definition and design of strongly language disjoint checkers

    Publication Year: 1988 , Page(s): 745 - 748
    Cited by:  Papers (2)

    Strongly language-disjoint (SLD) checkers are to sequential systems what strongly code-disjoint checkers are to combinatorial systems. SLD checkers are the largest class of checkers with which a functional system can achieve the totally self-checking goal. Self-checking sequential systems are first addressed, and formal definitions of SLD checkers are given. The design of SLD checkers based on regular combinatorial self-checking components is then considered.

  • A synthesis algorithm for reconfigurable interconnection networks

    Publication Year: 1988 , Page(s): 691 - 699
    Cited by:  Papers (7)

    The performance of a parallel algorithm depends in part on the interconnection topology of the target parallel system. An interconnection network is called reconfigurable if its topology can be changed between different algorithm executions. Since communication patterns vary from one parallel algorithm to another, a reconfigurable network can effectively support algorithms with different communication requirements. It is shown how to generate a network topology that is optimized with respect to the communication patterns of a given task. The algorithm presented takes as input a task graph and generates as output a topology that closely matches the given input graph. The topologies generated by the algorithm are analyzed with respect to optimum interconnection topologies for the best, worst, and average cases. Simulation results verify the average-case performance prediction and confirm that, on the average, the optimum topologies are generated.

  • On two-dimensional via assignment for single-row routing

    Publication Year: 1988 , Page(s): 721 - 727
    Cited by:  Papers (1)

    The authors study the via assignment problem when vias are allowed to appear rowwise as well as columnwise. Previously they proved that the problem belongs to the class of NP-hard problems and therefore it is unlikely that polynomial-time algorithms exist for solving the problem. Two heuristics (HEU1 and HEU2) to solve the problem were proposed. HEU1 splits the nets before any routing is done while HEU2 assigns the nets alternately to via rows and via columns. Here they modify HEU2 so that the side of the board to which the nets are assigned first for connection is selected according to a desired ratio of board width to height.

  • Minimum complexity FIR filters and sparse systolic arrays

    Publication Year: 1988 , Page(s): 760 - 764
    Cited by:  Patents (5)

    The properties of B-spline approximation and the integral/derivative properties of convolution lead to efficient algorithms for the implementation of multidimensional FIR filters. The implementations are of minimum time complexity under the Nyquist criterion. The algorithm can easily be implemented using a sparse systolic array architecture. The resulting B-spline convolvers have much lower circuit complexity than systolic architectures based on conventional convolution algorithms. A two-dimensional hardware implementation based on simplifications of current architectures is presented.

  • Cache operations by MRU change

    Publication Year: 1988 , Page(s): 700 - 709
    Cited by:  Papers (37)  |  Patents (9)

    The performance of set associative caches is analyzed. The method used is to group the cache lines into regions according to their positions in the replacement stacks of a cache, and then to observe how the memory access of a CPU is distributed over these regions. Results from the preserved CPU traces show that the memory accesses are heavily concentrated on the most recently used (MRU) region in the cache. The concept of MRU change is introduced; the idea is to use the event that the CPU accesses a non-MRU line to approximate the time the CPU is changing its working set. The concept is shown to be useful in many aspects of cache design and performance evaluation, such as comparison of various replacement algorithms, improvement of prefetch algorithms, and speedup of cache simulation.
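
The measurement described above, counting how accesses are distributed over positions in each set's replacement stack, can be sketched with a small simulator. This is an illustrative sketch only: it assumes LRU replacement, a 64-byte line, and a made-up address trace, and the function name `lru_position_histogram` is hypothetical.

```python
from collections import Counter


def lru_position_histogram(addresses, num_sets, ways, line_size=64):
    """Count hits by LRU stack position (0 = MRU) in a set-associative cache."""
    stacks = [[] for _ in range(num_sets)]  # one stack per set, MRU line first
    hits = Counter()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        stack = stacks[line % num_sets]
        if line in stack:
            hits[stack.index(line)] += 1  # record the position that was hit
            stack.remove(line)
        else:
            misses += 1
            if len(stack) == ways:
                stack.pop()               # evict the least recently used line
        stack.insert(0, line)             # the accessed line becomes MRU
    return hits, misses


# Hypothetical trace: three passes over four lines, four accesses per line.
trace = [base + off
         for _ in range(3)
         for base in (0, 64, 128, 4096)
         for off in (0, 8, 16, 24)]
hits, misses = lru_position_histogram(trace, num_sets=1, ways=4)
```

A hit at a position other than 0 is the event the abstract calls an MRU change. In this toy trace most hits land on position 0, and the remaining hits occur when the loop moves on to a different line.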

  • Benchmark synthesis using the LRU cache hit function

    Publication Year: 1988 , Page(s): 637 - 645
    Cited by:  Papers (11)

    The LRU cache hit function is used as a general characterization of locality of reference to address the synthesis question of whether benchmarks can be created that have a required locality of reference. Several results are given that show circumstances under which this synthesis can or cannot be achieved. An additional characterization called the warm-start cache hit function is introduced and shown to be efficiently computable. The operations of repetition and replication are used to form new programs, and their characteristics are derived. Using these operations, a general benchmark synthesis technique is obtained and demonstrated with an example.
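
As a rough illustration of an LRU cache hit function, the sketch below computes the hit ratio of a fully associative LRU cache for every size up to a limit in one pass, using stack distances. It assumes a cold (initially empty) cache and a made-up reference string. It does not cover the warm-start variant or the synthesis technique of the paper, and the name `lru_hit_function` is hypothetical.

```python
def lru_hit_function(trace, max_size):
    """Return [h(1), ..., h(max_size)]: LRU hit ratio for each cache size."""
    stack = []                         # most recently used item first
    dist_count = [0] * (max_size + 1)  # dist_count[d]: references at distance d
    for item in trace:
        if item in stack:
            depth = stack.index(item) + 1  # stack distance, 1 = MRU
            if depth <= max_size:
                dist_count[depth] += 1
            stack.remove(item)
        stack.insert(0, item)
    hits, curve = 0, []
    for size in range(1, max_size + 1):
        hits += dist_count[size]       # a reference hits iff distance <= size
        curve.append(hits / len(trace))
    return curve


# Hypothetical reference string over three distinct items.
curve = lru_hit_function(list("abcabcaabb"), max_size=3)
```

Because LRU has the inclusion property, the resulting curve is non-decreasing in the cache size.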

  • A generalized message-passing mechanism for communicating sequential processes

    Publication Year: 1988 , Page(s): 646 - 651
    Cited by:  Papers (1)

    Bidirectional message-passing (bi-io), a novel symmetric communication mechanism for concurrent processes, is introduced and developed. The mechanism is symmetric in the sense that, in one atomic action, a message is transmitted in each direction between two processes. For some applications (tree structure, systolic arrays) this method is shown to have several advantages over conventional synchronization and communication primitives (mainly conciseness of programs, absence of certain types of deadlock). The mechanism is rigorously defined with a CSP-like syntax and a weakest-precondition semantics. Two systolic arrays are developed using bidirectional message-passing: a matrix-vector multiplier and a palindrome recognizer.

  • Systolic tree implementation of data structures

    Publication Year: 1988 , Page(s): 727 - 735
    Cited by:  Papers (5)

    Systolic tree architectures are presented for data structures such as stacks, queues, dequeues, priority queues, and dictionary machines. The stack, queue, and dequeue have a unit response time and a unit pipeline interval. The priority queue also has a unit response time, but the pipeline interval is 2. The response time and pipeline interval for the dictionary machine are O(log n) and O(1), respectively, where n is the number of data elements currently residing in the tree. In each node of the tree, the mechanism for controlling the transmission and distribution of data is finite state. This feature makes the designs presented here suitable for VLSI. If there are n data elements in the data structure, the depth of the tree is O(log n).

  • Approximate analysis of fork/join synchronization in parallel queues

    Publication Year: 1988 , Page(s): 739 - 743
    Cited by:  Papers (54)

    An approximation technique, called scaling approximation, is introduced and applied to the analysis of homogeneous fork/join queuing systems consisting of K⩾2 servers. The development of the scaling approximation technique is guided by both experimental and theoretical considerations. The approximation is based on the observation that there exist upper and lower bounds on the mean response time that grow at the same rate as a function of K. Simple, closed-form approximate expressions for the mean response time are derived and compared to simulation results. The relative error in the approximation is less than 5% for K⩽32.

  • Strongly code disjoint checkers

    Publication Year: 1988 , Page(s): 751 - 756
    Cited by:  Papers (93)

    Strongly code-disjoint (SCD) checkers are defined and shown to include totally self-checking (TSC) code-disjoint checkers. This type of checker is the natural companion of strongly fault-secure (SFS) networks. SCD checkers are the largest class of checkers with which a combinational system may achieve the TSC goal. Some examples are given to illustrate the design of SCD checkers.

  • Abstract specification of synchronous data types for VLSI and proving the correctness of systolic network implementations

    Publication Year: 1988 , Page(s): 710 - 720
    Cited by:  Papers (5)

    A combined methodology is presented for specifying abstract synchronous data types and proving the correctness of systolic network implementations. It is shown that an extension of the Parnas trace method of specifying software modules containing distinct access programs yields a natural method of specifying abstract synchronous data types that possess distinct access operators and are intended for implementation in VLSI. Associated systematic proof techniques are presented, and the correctness of several novel systolic network implementations of familiar data types is established. The methodology appears to be naturally suited to systolic network implementations with their associated rippling of control flow and data flow. The important distinction between systolic control-flow networks and systolic data-flow networks is presented.

  • Functional test generation based on unate function theory

    Publication Year: 1988 , Page(s): 756 - 760
    Cited by:  Papers (2)

    The generation of a universal test set (UTS) for unate functions is used as a starting point. This test set is complete and minimal for the set of all unateness-preserving faults. However, for functions that are not unate in any variable, the UTS generated by this algorithm is the exhaustive set. An algorithm is presented that computes a good functional test set (GFTS) of reasonable size even for such functions. The algorithm does this by breaking up functions into more unate components, recursively computing GFTS for them, and combining the test sets in an appropriate way. The GFTS generated by the algorithm is compared to random test sets of the same size for gate-level fault coverage in typical implementations.

  • Continuous models for communication density constraints on multiprocessor performance

    Publication Year: 1988 , Page(s): 652 - 656

    Fundamental limits on the communication capabilities of massively parallel multiprocessors are investigated. It is shown that in the limit of machines of infinite extent in which the number of processors per unit volume is constant and in which the communication bandwidth from each processor to its neighbors depends only on their separation distance, interprocessor communication must fall off faster than the fourth power of distance. For machines of finite size, communication energy density is used as a metric to compare various machine sizes and packaging densities. For instance, for machines with spherical symmetry and uniform communication requirements, the peak density depends on the number of processors to the 4/3 power and the number of processors per unit volume to the 2/3 power.

  • A comparison of VLSI architecture of finite field multipliers using dual, normal, or standard bases

    Publication Year: 1988 , Page(s): 735 - 739
    Cited by:  Papers (36)  |  Patents (9)

    Three different finite-field multipliers are presented: (1) a dual-basis multiplier due to E.R. Berlekamp (1982); (2) the Massey-Omura normal basis multiplier; and (3) the Scott-Tavares-Peppard standard basis multiplier. These algorithms are chosen because each has its own distinct features that apply most suitably in particular areas. They are implemented on silicon chips with NMOS technology so that the multiplier most desirable for VLSI implementation can readily be ascertained.
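
For readers unfamiliar with finite-field multiplication, the sketch below shows a generic bit-serial shift-and-add multiplication in GF(2^m) using the standard (polynomial) basis. It is not any of the three architectures compared in the paper. The field GF(2^4), the irreducible polynomial x^4 + x + 1, and the function name `gf2m_multiply` are assumptions chosen for illustration.

```python
def gf2m_multiply(a, b, m, poly):
    """Multiply a and b in GF(2^m), standard (polynomial) basis.

    Elements are integers whose bits are polynomial coefficients.
    poly is the irreducible polynomial including its x^m term,
    for example 0b10011 for x^4 + x + 1.
    """
    result = 0
    for _ in range(m):      # process one bit of b per step
        if b & 1:
            result ^= a     # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1             # multiply a by x
        if a >> m:          # degree reached m: reduce modulo poly
            a ^= poly
    return result


# In GF(2^4) with x^4 + x + 1: x * x^3 = x^4, which reduces to x + 1.
product = gf2m_multiply(0b0010, 0b1000, m=4, poly=0b10011)
```

Each loop iteration consumes one bit of `b`, which loosely mirrors how a bit-serial multiplier processes one operand bit per clock cycle.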

  • A new bit-serial systolic multiplier over GF(2^m)

    Publication Year: 1988 , Page(s): 749 - 751
    Cited by:  Papers (14)

    A bit-serial systolic array has been developed to compute multiplications over GF(2^m). In contrast to a previously designed systolic multiplier, this algorithm allows the input elements to enter a linear systolic array in the same order, and the system only requires one control signal.


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org