Computers, IEEE Transactions on

Issue 3 • Date Mar 1990

  • Executing a program on the MIT tagged-token dataflow architecture

    Page(s): 300 - 318

    The MIT Tagged-Token Dataflow Project takes an unconventional but integrated approach to general-purpose high-performance parallel computing. Rather than extending conventional sequential languages, Id, a high-level language with fine-grained parallelism and determinacy implicit in its operational semantics, is used. Id programs are compiled to dynamic dataflow graphs, which constitute a parallel machine language. Dataflow graphs are directly executed on the MIT tagged-token dataflow architecture (TTDA), a multiprocessor architecture. An overview of current thinking on dataflow architecture is provided by describing example Id programs, their compilation to dataflow graphs, and their execution on the TTDA. Related work and the status of the project are described.

  • Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions

    Page(s): 400 - 404

    An algorithm with worst-case time complexity $O(\log^2 N)$ in two dimensions and $O(m^{1/2} \log N)$ in three dimensions, with $N$ input points and $m$ as the number of tetrahedra in the triangulation, is given. Its $AT^2$ VLSI complexity on Thompson's logarithmic delay model (1983) is $O(N^2 \log^6 N)$ in two dimensions and $O(m^2 N \log^4 N)$ in three dimensions.

  • A note on the linear transformation method for systolic array design

    Page(s): 393 - 399

    The use of the linear transformation method to systolize the Warshall algorithm for computing the transitive closure of a graph on a mesh-connected array (without wraparound connections) is discussed. The technique is extended to design linear systolic arrays. The advantage of this approach is easy verification of correctness, as well as synthesis of a family of arrays with tradeoffs between I/O bandwidth, number of processing elements, and local storage. The technique can be further refined to cope with problems that entail nonconstant dependency vectors.

  • Utilizing bandwidth sharing in the slotted ring

    Page(s): 289 - 299

    A slotted-ring protocol that performs well across the full range of message length distributions is presented. The relative performance of the protocol is best at low to medium ring utilization, which is the most usual operating condition for local area computer networks. The protocol is not subject to the normal requirement for repeating source and destination addresses in each slot of a multiple-slot message. This reduced overhead feature is a main reason for the performance gains that are achieved. The protocol does not depend on any central control station for assigning slot usage to individual stations. However, it does require each ring station to keep track of the current status and source station usage of each slot on the ring. Implementation of the protocol would require significantly more complex logic circuits than are normally needed in either token rings or conventional slotted rings, and error recovery would be more difficult. Hence, its main value is that it serves as an indicator of the maximum achievable performance of the slotted format for local computer network rings operating at low to medium utilization levels under fully distributed access control.

  • An improved hardware implementation of the fault-tolerant clock synchronization algorithm for large multiprocessor systems

    Page(s): 404 - 407

    An improved implementation of clock synchronization of multiprocessor systems in the presence of malicious faults is proposed. The proposed hardware implementation for the reference clock selection has a lower gate complexity, smaller time delay, and greater flexibility than the previously published implementation. The improvement is achieved by replacing the sorter with a counting encoder and comparators and by introducing threshold generation logic with programmable registers. The scheme has a gate complexity of O(n) and a delay of O(log n), where n is the total number of inputs to a particular clock, and is programmable for different values of n and m, the maximum number of faults.

  • Parallel communication in a large distributed environment

    Page(s): 328 - 348

    The evolution of MultiRPC, a parallel remote procedure call mechanism implemented in Unix, is described. Parallelism is obtained from the concurrency of processing on servers and from the overlapping of retransmissions and timeouts. Each of the parallel calls retains the semantics and functionality of the standard remote procedure calls. The underlying communication medium need not support multicast or broadcast transmissions. An analytic model of the system is derived and validated. The experimental observations demonstrate the feasibility of using MultiRPC to contact up to 100 servers in parallel.

  • An efficient TSC 1-out-of-3 code checker

    Page(s): 407 - 411

    A design method for a combinational totally self-checking (TSC) 1-out-of-3 code checker is presented. This method is not only simpler and more efficient than others, but is also successful in the case where more than one 1-out-of-3 code exists in a TSC system.

  • Design of high-speed and cost-effective self-testing checkers for low-cost arithmetic codes

    Page(s): 360 - 374

    Methods for designing self-testing checkers (STCs) for arithmetic error-detecting codes are presented. First, general rules for the design of minimal-level STCs for any error-detecting code are given. The design is illustrated with STCs for 3N+B codes, 0 ⩽ B ⩽ 2. Then the recursive structure of both 3N+B codes and residue/inverse-residue codes with check base A=3 is revealed. The resulting design of STCs is very flexible and universal, in the sense that an iterative, cost-effective, or high-speed version of the checker can be designed for either code. The design approach, unlike previous approaches for arithmetic codes, gives a unified treatment to STCs for nonseparate (3N+B) and separate (residue and inverse residue) codes. The speed and the complexity of the STC for a code from either class with n bits are about the same. Both high-speed checkers (which have up to three gate levels) and cost-effective checkers are faster and require less hardware than analogous checkers proposed for 3N codes and for residue codes with A=3.

  • On the design of combinational totally self-checking 1-out-of-3 code checkers

    Page(s): 387 - 393

    The authors present the design of an 11-transistor combinational NMOS 1-out-of-3 code checker. The checker is totally self-checking (TSC) with respect to 36 faults out of a total of 58 faults defined at the NMOS switch and layout geometrical levels, and achieves the TSC goal of a checker for most of the fault sequences. The minimum fault sequences under which the TSC goal is lost are composed of at least three faults. This might be considered as a sufficient level of safety for some implementations.

  • The design of a testable parallel multiplier

    Page(s): 411 - 416

    A scheme for an easily testable multiplier and the corresponding test generation procedures are presented. To provide 100% controllability of the summand-counter, the summand-generator is modified. The modified summand-generator can be implemented with little hardware overhead. Since the summands are 100% controllable, the summand-counter can be constructed with the minimum number of adder cells. The multiplier is not C-testable, but can be tested with a small number of test vectors, i.e., 3n+60 vectors. It requires only one extra input, whereas C-testable multipliers usually require at least four or five extra inputs and more adder cells along with extra circuitry. Using the modified summand-generator, other types of multipliers can be easily constructed to be testable with only one extra input. Test sets for these multipliers can be obtained using the same test generation approach.

  • Analysis and implementation of branch-and-bound algorithms on a hypercube multicomputer

    Page(s): 384 - 387

    The feasibility of implementing best-first (best-bound) branch-and-bound algorithms on hypercube multicomputers is discussed. The computationally intensive nature of these algorithms might lead a casual observer to believe that their parallelization is trivial. However, as the number of processors grows, two goals must be satisfied to some degree in order to maintain a reasonable level of efficiency. First, processors must be kept busy doing productive work (i.e., exploring worthwhile subproblems). Second, the number of interprocessor communications must be minimized along the critical path in the state-space tree from the original problem to the subproblem yielding a solution. It is difficult to improve performance in one of these areas without degrading performance in the other. Analytical models for the execution time of loosely synchronous and asynchronous parallel branch-and-bound algorithms are presented, and the models are validated with data from the execution of five algorithms that solve the traveling salesperson problem.

  • A buffer-based method for storage allocation in an object-oriented system

    Page(s): 375 - 383

    As some object-oriented computing systems support the object type buffer by hardware, it is reasonable to base their free storage management on a set of buffers, each containing free storage blocks of a specific length. Then memory space can normally be allocated and deallocated by simple buffer read and write operations. More complex routines have to be executed only when a buffer-full or buffer-empty exception is raised. Their task is to clear a buffer position or to insert a storage descriptor into the buffer, respectively. An algorithm that handles such exceptions by splitting and recombination of free blocks and relocation of objects is described. The algorithm is tuned by a set of parameters that specify the number of descriptors the buffers may hold immediately after exception handling. As a consequence, each buffer receives a moderate number of descriptors, so that the probability of further exceptions is reduced. Moreover, the parameters control the tradeoff between relocation costs and resulting storage fragmentation. The performance of the algorithm is evaluated by an analytical and a simulation model, and methods to find optimal parameter values are described.

  • Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers

    Page(s): 349 - 359

    The problems of data dependency resolution and precise interrupt implementation in pipelined processors are combined. A design for a hardware mechanism that resolves dependencies dynamically and, at the same time, guarantees precise interrupts is presented. Simulation studies show that by resolving dependencies the proposed mechanism is able to obtain a significant speedup over a simple instruction issue mechanism as well as implement precise interrupts.

  • Performance analysis of multibuffered packet-switching networks in multiprocessor systems

    Page(s): 319 - 327

    An analytic model and analytic results for the performance of multibuffered packet-switching interconnection networks in multiprocessor systems are presented. The performance of single-buffered delta networks is first modeled using the state transition diagram of a buffer. The model is then extended to account for multiple buffers. The analytic results for multibuffered delta networks are compared to simulation results. The performance of multibuffered data manipulator networks is analyzed to demonstrate the generality of the model.

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au