Computers, IEEE Transactions on

Issue 10 • Date Oct. 1985

  • Preface

    Publication Year: 1985 , Page(s): 873
    Freely Available from IEEE
  • Iterative solution of large, sparse linear systems on a static data flow architecture: Performance studies

    Publication Year: 1985 , Page(s): 874 - 880

    The applicability of static data flow architectures to the iterative solution of sparse linear systems of equations is investigated. An analytic performance model of a static data flow computation is developed. This model includes both spatial parallelism (concurrent execution in multiple PEs) and pipelining (the streaming of data from array memories through the PEs). The performance model is used to analyze a row-partitioned iterative algorithm for solving sparse linear systems of algebraic equations. On the basis of this analysis, design parameters for the static data flow architecture as a function of matrix sparsity and dimension are proposed.

  • Distributed execution of functional programs using serial combinators

    Publication Year: 1985 , Page(s): 881 - 891
    Cited by:  Papers (6)

    A general strategy for automatically decomposing and dynamically distributing a functional program is discussed. The strategy is suitable for parallel execution on multiprocessor architectures with no shared memory. It borrows ideas from data flow and reduction machine research on the one hand, and from conventional compiler technology for sequential machines on the other. One of the more troublesome issues in such a system is choosing the right granularity for the parallel tasks. As a solution, the authors describe a program transformation technique based on serial combinators that offers in some sense just the right granularity for this style of computing, and that can be fine-tuned for particular multiprocessor architectures. Simulation demonstrates the success of this approach.

  • Fat-trees: Universal networks for hardware-efficient supercomputing

    Publication Year: 1985 , Page(s): 892 - 901
    Cited by:  Papers (188)  |  Patents (4)

    The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

  • Fault location techniques for distributed control interconnection networks

    Publication Year: 1985 , Page(s): 902 - 910
    Cited by:  Papers (14)

    One class of networks suitable for use in parallel processing systems is the multistage cube network. The authors focus on fault location procedures suitable for use in networks that use distributed routing control through the use of routing tags and message transmission protocols. Faults occurring in the data lines can corrupt message routing tags transmitted over them and thereby cause misrouting of messages. Protocol lines (used in handshaking between network sources and destinations), if faulty, can prevent a message path from being established or can cause the path to "lock up" once transmission of data has begun. These faults have more pronounced effects on the network performance than faults previously considered for centralized routing control systems. The single-fault location procedures presented form a logical superset to those of the centralized control systems (where message routing is dictated by the actions of a global control unit) and can be adapted for use in both circuit and packet switching networks.

  • A comparative study of unification algorithms for OR-parallel execution of logic languages

    Publication Year: 1985 , Page(s): 911 - 917
    Cited by:  Papers (9)

    As a step toward designing a computer architecture suitable for executing parallel logic languages, the author has studied some memory management techniques proposed for creating multiple binding environments, which are required with OR-parallelism. Three algorithms have been implemented using a Prolog-like interpreter and have been tried on some logic programs to compare their relative performance. The author describes these algorithms and their implementation and discusses the results of the performance analysis. The study attempts to compare the algorithms, although accurate comparisons are difficult to make since some aspects of the algorithms are architecture-dependent.

  • Bandwidth availability of multiple-bus multiprocessors

    Publication Year: 1985 , Page(s): 918 - 926
    Cited by:  Papers (36)

    The effect of failures on the performance of multiple-bus multiprocessors is considered. Bandwidth expressions for this architecture are derived for uniform and nonuniform memory references. Mathematical models are developed to compute the reliability and the performance-related bandwidth availability (BA). The results obtained for the multiple-bus interconnection are compared with those of a crossbar. The models are also extended to analyze the partial bus structure, where the memories are divided into groups and each group is connected to a subset of buses. The reliability and the BA of the multiple-bus and partial bus architectures are compared.
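
    As a rough illustration of what a bandwidth expression for uniform references can look like, the sketch below uses a common textbook approximation; it is not claimed to be the paper's own derivation. The assumptions are ours: each of `n_proc` processors issues a request with probability `r` per cycle, requests are spread uniformly over `n_mem` memory modules, modules are treated as independently requested, and at most `n_bus` requests are served per cycle. All function and parameter names are hypothetical.

```python
from math import comb


def multiple_bus_bandwidth(n_proc, n_mem, n_bus, r=1.0):
    """Approximate expected number of memory requests served per cycle.

    Assumes uniform references and treats the events "module j is requested"
    as independent, which is an approximation.
    """
    # Probability that a given memory module receives at least one request.
    q = 1.0 - (1.0 - r / n_mem) ** n_proc
    # If i modules are requested, only min(i, n_bus) of them can be served.
    return sum(
        min(i, n_bus) * comb(n_mem, i) * q**i * (1.0 - q) ** (n_mem - i)
        for i in range(n_mem + 1)
    )


def crossbar_bandwidth(n_proc, n_mem, r=1.0):
    """A crossbar behaves like a multiple-bus system with one bus per memory."""
    return multiple_bus_bandwidth(n_proc, n_mem, n_mem, r)


if __name__ == "__main__":
    # Bandwidth as buses fail one by one in a hypothetical 8x8 system with 4 buses.
    for working_buses in range(4, 0, -1):
        print(working_buses, multiple_bus_bandwidth(8, 8, working_buses))
```

    Evaluating the function for a decreasing number of working buses gives a feel for how bus failures degrade bandwidth, which is the kind of quantity a bandwidth-availability measure would weight by failure probabilities.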

  • An empirical study of automatic restructuring of nonnumerical programs for parallel processors

    Publication Year: 1985 , Page(s): 927 - 933
    Cited by:  Papers (4)

    The feasibility of automatic restructuring of nonnumerical programs for parallel processing is studied through experiments using Parafrase, an automatic restructurer at the University of Illinois, Urbana-Champaign. Parallel processing speedup results due to automatic restructuring for several basic nonnumerical problems are presented. The loops encountered are classified at a low level. On the basis of the speedup results and the analyses of the loop types, the difficulty and the effectiveness of automatic restructuring are discussed. The experiments suggest that automatic restructuring can be a useful tool for exploiting parallelism in the sequential form of nonnumerical programs.

  • A semi-Markov model for the performance of multiple-bus systems

    Publication Year: 1985 , Page(s): 934 - 942
    Cited by:  Papers (26)

    A discrete-time model is presented of memory interference in multiprocessor systems using multiple-bus interconnection networks. It differs from earlier models in its ability to model variable connection time and arbitrary inter-request time. The model describes each processing element's behavior by means of a semi-Markov process, taking as input the number of processing elements, the number of memory modules, the number of buses, the mean think time of the processing elements, and the first and second moments of the connection time between processing elements and memories. The model produces as output the memory bandwidth, processing element utilization, memory module utilization, average queue length at a memory, and average waiting time experienced by a processing element while waiting to access a memory. Using the model, it is possible to analyze the interaction of the input parameters on the system performance without using a complex Markov chain; a four-state semi-Markov process is sufficient regardless of the think and connection time distributions. The accuracy and capability of the model are illustrated.

  • “Hot spot” contention and combining in multistage interconnection networks

    Publication Year: 1985 , Page(s): 943 - 948
    Cited by:  Papers (160)

    The combining of messages within a multistage switching network has been proposed to reduce memory contention in highly parallel shared-memory multiprocessors, especially for shared lock and synchronization data. A quantitative investigation of the performance impact of such contention and the effectiveness of combining in reducing this impact is reported. The effect of a nonuniform traffic pattern consisting of a single hot spot of higher access rate superimposed on a background of uniform traffic was investigated. The potential degradation due to even moderate hot spot traffic was found to be very significant, severely degrading all memory access, not just access to shared lock locations, due to an effect the authors call tree saturation. The technique of message combining was found to be an effective means of eliminating this problem if it arises due to lock or synchronization contention.
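
    A back-of-the-envelope saturation argument shows why even a small hot-spot fraction can cap total throughput. The sketch below is an illustration under our own simplifying assumptions, not a reproduction of the paper's analysis: there are `n_proc` processors and equally many memory modules, each processor issues `r` requests per cycle, a fraction `h` of them go to one hot module and the rest are spread uniformly over all modules, and a module accepts at most one request per cycle. The hot module then sees `r * (1 + h * (n_proc - 1))` requests per cycle, which must not exceed 1. The function name is hypothetical.

```python
def hot_spot_bandwidth_limit(n_proc, h):
    """Upper bound on total accepted requests per cycle under the stated assumptions.

    The hot module receives r*h*n_proc hot requests plus r*(1 - h) of the
    uniform background, i.e. r*(1 + h*(n_proc - 1)) <= 1, so the total
    rate r*n_proc is at most n_proc / (1 + h*(n_proc - 1)).
    """
    return n_proc / (1.0 + h * (n_proc - 1))


if __name__ == "__main__":
    for h in (0.0, 0.001, 0.01, 0.1):
        print(h, hot_spot_bandwidth_limit(1000, h))
```

    With `h = 0` the bound equals the number of processors, and as the processor count grows the bound approaches `1 / h`, so the cap depends on the hot fraction rather than on machine size.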

  • On the effective bandwidth of interleaved memories in vector processor systems

    Publication Year: 1985 , Page(s): 949 - 957
    Cited by:  Papers (34)

    Memory interleaving and multiple access ports are the key to a high memory bandwidth in vector processor systems. Each of the active ports supports an independent access stream to memory among which access conflicts may arise. Such conflicts lead to a decrease in memory bandwidth. The authors present some analytical results for the calculation of the resulting effective bandwidth for one and two access streams to a memory system in a vector processor. In particular, conditions for conflict-free access are given together with some conflicting cases that should be avoided. Finally, examples of measurements on a Cray X-MP and corresponding simulations are presented.
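
    For a single access stream, the standard conflict-free condition can be sketched in a few lines. This is a generic illustration, not the paper's model, and the names and parameter values are hypothetical: with `m` banks interleaved on the low-order address bits, a constant stride, one request issued per clock, and each bank busy for `busy` clocks after an access, the stream visits `m / gcd(m, stride)` distinct banks before returning to the first one, so it is conflict-free when that count is at least `busy`.

```python
from math import gcd


def distinct_banks(m, stride):
    """Number of different banks a constant-stride stream touches."""
    return m // gcd(m, stride)


def effective_bandwidth(m, stride, busy):
    """Steady-state words per clock for one in-order stream (at most 1.0)."""
    return min(1.0, distinct_banks(m, stride) / busy)


def simulate(m, stride, busy, n_access=10000):
    """Issue accesses in order, stalling while the target bank is still busy."""
    free_at = [0] * m          # clock at which each bank becomes free again
    clock = 0                  # earliest clock at which the next access may issue
    for i in range(n_access):
        bank = (i * stride) % m
        start = max(clock, free_at[bank])
        free_at[bank] = start + busy
        clock = start + 1
    return n_access / clock


if __name__ == "__main__":
    for stride in (1, 2, 3, 8, 16):
        print(stride, effective_bandwidth(16, stride, 4), simulate(16, stride, 4))
```

    The small simulation is included only as a cross-check of the closed-form expression; two concurrent streams interact in more complicated ways that this sketch does not cover.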

  • On testing isomorphism of permutation networks

    Publication Year: 1985 , Page(s): 958 - 962
    Cited by:  Papers (7)

    The problem of constructing equivalence maps between two multistage permutation networks is considered. A branch-and-bound algorithm is given to test whether two such networks are equivalent in polynomial time. Whenever they are, the algorithm also determines a map that conjugates one network onto the other.

  • Performance of parallel branch-and-bound algorithms

    Publication Year: 1985 , Page(s): 962 - 964
    Cited by:  Papers (7)

    Consideration is given to the performance of parallel best-bound-first branch-and-bound algorithms in which several nodes with least lower bounds are expanded simultaneously. It is well known that anomalies may occur in the execution of a parallel branch-and-bound algorithm. The authors show the conditions under which anomalies are guaranteed not to occur when the number of processors is doubled, or not even doubled.

  • The power of parallel prefix

    Publication Year: 1985 , Page(s): 965 - 968
    Cited by:  Papers (38)

    The prefix computation problem is to compute all $n$ initial products $a_1 * \cdots * a_i$, $i = 1, \ldots, n$, of a set of $n$ elements, where $*$ is an associative operation. An $O\bigl(\tfrac{\log n}{\log(2n/p)} \times \tfrac{n}{p}\bigr)$ time deterministic parallel algorithm using $p \le n$ processors is presented to solve the prefix computation problem, when the order of the elements is specified by a linked list. For $p \le O(n^{1-\varepsilon})$ ($\varepsilon > 0$ any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
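
    To make the problem statement concrete, the sketch below computes all initial products of a sequence under an associative operation in two ways: sequentially, and by a simple doubling scheme that needs only about log2(n) rounds. It is an array-based illustration with hypothetical function names, not the linked-list algorithm of the paper; the inner loop runs sequentially here and merely stands in for one parallel step.

```python
import operator
from itertools import accumulate


def prefix_sequential(items, op):
    """All initial products a_1, a_1*a_2, ..., a_1*...*a_n, computed left to right."""
    return list(accumulate(items, op))


def prefix_doubling(items, op):
    """Same result in ceil(log2(n)) rounds; each round could run in parallel."""
    result = list(items)
    n = len(result)
    step = 1
    while step < n:
        # Snapshot so every update in this round reads the previous round's values.
        prev = result[:]
        for i in range(step, n):
            # Left operand first, so non-commutative operations keep their order.
            result[i] = op(prev[i - step], prev[i])
        step *= 2
    return result


if __name__ == "__main__":
    print(prefix_doubling([3, 1, 4, 1, 5], operator.add))
    print(prefix_doubling(list("abcde"), operator.add))
```

    The doubling scheme performs more total operations than the sequential loop, which is one reason work-efficient parallel prefix algorithms are of interest.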

  • A cache-based multiprocessor with high efficiency

    Publication Year: 1985 , Page(s): 968 - 972

    Shared-memory multiprocessors to support concurrent languages for general-purpose multitasked systems are analyzed. To solve the traditional performance problems caused by memory access latency and conflicts, extensive caching of instructions and data is performed in each processor node. Caches are private to each processor, and coherence is maintained in hardware between the caches. To maintain good efficiency, several contexts are resident in each processor. On a miss in the cache, a microswitch to another resident context is performed by changing the program counter and a pointer in the register memory. The instruction set of each processor is RISC-like, so that a microswitch should waste few machine cycles. The proposed system has high efficiency, even when the number of processors increases and when the coherence overhead and conflicts are high. Models are developed to evaluate throughput and efficiency.


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org