
IEEE Transactions on Computers

Issue 12 • December 1988


Displaying Results 1 - 18 of 18
  • Pairwise reduction for the direct, parallel solution of sparse, unsymmetric sets of linear equations

    Publication Year: 1988 , Page(s): 1648 - 1654
    Cited by:  Papers (2)

    A paradigm for concurrent computing is explored in which a group of autonomous, asynchronous processes shares a common memory space and cooperates to solve a single problem. The processes synchronize with only a few others at a time; barrier synchronization is not permitted except at the beginning and end of the computation. The paradigm maps directly to a shared-memory multiprocessor with efficient synchronization primitives and is applied to the solution of a large, sparse system of linear equations. The algorithm, called pairwise solve (or PSolve), is presented with several variants to address some of the limitations of previous algorithms. On the Alliant FX/8, PSolve is faster than Gaussian elimination and two common sparse matrix algorithms.
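
    The pairwise style of cooperation described above can be illustrated with a small shared-memory sketch: worker threads repeatedly withdraw two items from a common pool, combine them, and return the result, so synchronization involves only the pool and never a global barrier. This is only an illustration of the pattern, not the published PSolve algorithm; the pool, the combine step, and all names are invented for the example.

      /* Pairwise reduction pattern: threads repeatedly pair up two work
       * items and combine them; synchronization is only around the shared
       * pool, with no global barrier until one item remains.
       * Illustrative sketch only; the "combine" step is a stand-in. */
      #include <pthread.h>
      #include <stdio.h>

      #define ITEMS   8
      #define THREADS 4
      #define LEN     4

      static double pool[ITEMS][LEN];               /* work items (e.g. rows) */
      static int    avail[ITEMS], navail = ITEMS;
      static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Pop two items from the pool; return 0 if fewer than two remain. */
      static int take_pair(int *a, int *b)
      {
          int ok = 0;
          pthread_mutex_lock(&pool_lock);
          if (navail >= 2) { *a = avail[--navail]; *b = avail[--navail]; ok = 1; }
          pthread_mutex_unlock(&pool_lock);
          return ok;
      }

      /* Put a combined item back into the pool. */
      static void put_back(int a)
      {
          pthread_mutex_lock(&pool_lock);
          avail[navail++] = a;
          pthread_mutex_unlock(&pool_lock);
      }

      static void *worker(void *arg)
      {
          int a, b, k;
          (void)arg;
          while (take_pair(&a, &b)) {
              for (k = 0; k < LEN; k++)             /* stand-in for a pairwise */
                  pool[a][k] += pool[b][k];         /* elimination/combination */
              put_back(a);                          /* item b is consumed      */
          }
          return NULL;
      }

      int main(void)
      {
          pthread_t t[THREADS];
          int i, k;
          for (i = 0; i < ITEMS; i++) {
              avail[i] = i;
              for (k = 0; k < LEN; k++) pool[i][k] = 1.0;
          }
          for (i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, worker, NULL);
          for (i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
          printf("combined item: %g %g %g %g\n",
                 pool[avail[0]][0], pool[avail[0]][1],
                 pool[avail[0]][2], pool[avail[0]][3]);
          return 0;
      }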

  • Synthesizing linear array algorithms from nested FOR loop algorithms

    Publication Year: 1988 , Page(s): 1578 - 1598
    Cited by:  Papers (40)

    The mapping of algorithms structured as depth-p nested FOR loops into special-purpose systolic VLSI linear arrays is addressed. The mappings are done by using linear functions to transform the original sequential algorithms into a form suitable for parallel execution on linear arrays. A feasible mapping is derived by identifying formal criteria to be satisfied by both the original sequential algorithm and the proposed transformation function. The methodology is illustrated by synthesizing algorithms for matrix multiplication and a version of the Warshall-Floyd transitive closure algorithm.
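
    As a concrete illustration of such a mapping (not necessarily the transformation derived in the paper; the symbols below are for illustration only), consider a depth-3 nest with iteration vector $v=(i,j,k)^T$. A linear mapping assigns each iteration a time step $t(v)=\lambda^T v$ and an array cell $p(v)=\sigma^T v$, and is feasible when $\lambda^T d>0$ for every dependence vector $d$ and no two iterations share the same pair $(t,p)$. For $N\times N$ matrix multiplication ($C_{ij} \mathrel{+}= A_{ik}B_{kj}$, $0\le i,j,k<N$) one feasible choice is

        $\sigma = (0,0,1)^T, \qquad \lambda = (1,N,1)^T,$

    i.e. iteration $(i,j,k)$ executes at time $i+Nj+k$ on cell $k$: the dependence vectors $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$ all receive positive delays, and two iterations on the same cell collide only if $\Delta i + N\Delta j = 0$, which the index bounds rule out.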

  • Commutativity-based concurrency control for abstract data types

    Publication Year: 1988 , Page(s): 1488 - 1505
    Cited by:  Papers (52)

    Two novel concurrency-control algorithms for abstract data types are presented that ensure serializability of transactions. It is proved that both algorithms ensure a local atomicity property called dynamic atomicity. The algorithms are quite general, permitting operations to be both partial and nondeterministic. The results returned by operations can be used in determining conflicts, thus allowing higher levels of concurrency than otherwise possible. The descriptions and proofs encompass recovery as well as concurrency control. The two algorithms use different recovery methods: one uses intentions lists, and the other uses undo logs. It is shown that conflict relations that work with one recovery method do not necessarily work with the other. A general correctness condition that must be satisfied by the combination of a recovery method and a conflict relation is identified.
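
    The flavor of such a conflict relation can be sketched on a standard bank-account example (the operation set, the table, and all names below are illustrative, not taken from the paper): two operations conflict exactly when they fail to commute, with arguments and returned results taken into account.

      /* Illustrative conflict test for commutativity-based locking on a
       * bank-account ADT. Two completed operations commute when either
       * order yields the same results and the same final account state;
       * a lock manager admits an operation concurrently with an
       * uncommitted one exactly when they commute. */
      #include <stdbool.h>
      #include <stdio.h>

      enum op { DEPOSIT, WITHDRAW_OK, WITHDRAW_FAIL, BALANCE };

      struct invocation { enum op op; long amount; long result; };

      static bool commutes(struct invocation a, struct invocation b)
      {
          /* Deposits add a constant: order never matters. */
          if (a.op == DEPOSIT && b.op == DEPOSIT) return true;
          /* Two balance inquiries read the same value in either order. */
          if (a.op == BALANCE && b.op == BALANCE) return true;
          /* A deposit changes the value a balance inquiry returns. */
          if ((a.op == DEPOSIT && b.op == BALANCE) ||
              (a.op == BALANCE && b.op == DEPOSIT)) return false;
          /* Two successful withdrawals may not both succeed if reordered. */
          if (a.op == WITHDRAW_OK && b.op == WITHDRAW_OK) return false;
          /* Remaining cases depend on the exact commutativity notion,
           * which in turn depends on the recovery method (the point the
           * abstract makes); be conservative here. */
          return false;
      }

      int main(void)
      {
          struct invocation dep = { DEPOSIT, 100, 0 };
          struct invocation bal = { BALANCE, 0, 250 };
          printf("deposit/deposit commute: %d\n", commutes(dep, dep));
          printf("deposit/balance commute: %d\n", commutes(dep, bal));
          return 0;
      }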

  • Concurrent access of priority queues

    Publication Year: 1988 , Page(s): 1657 - 1665
    Cited by:  Papers (20)  |  Patents (1)

    Contention for the shared heap limits the obtainable speedup in parallel algorithms using this data structure as a priority queue. An approach that allows concurrent insertions and deletions on the heap in a shared-memory multiprocessor is presented. The scheme retains the strict priority ordering of the serial-access heap algorithms, i.e. a delete operation returns the best key of all keys that have been inserted or are being inserted at the time delete is started. Experimental results on the BBN Butterfly parallel processor demonstrate that the use of concurrent-heap algorithms in parallel branch-and-bound improves its performance substantially.
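
    For reference, the serial-access arrangement whose contention the abstract refers to is simply a binary heap behind a single lock, as in the sketch below; this is only the baseline, not the paper's concurrent algorithm, whose contribution is to replace the global lock with finer-grained synchronization on the heap while preserving the same delete-returns-best-key semantics. All names are illustrative.

      /* Serial-access baseline: a binary min-heap behind one lock, so every
       * insert/delete-min is serialized. Capacity checks omitted. */
      #include <pthread.h>
      #include <stdio.h>

      #define CAP 1024

      static int heap[CAP];
      static int hsize;
      static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

      static void heap_insert(int key)
      {
          pthread_mutex_lock(&heap_lock);
          int i = hsize++;
          heap[i] = key;
          while (i > 0 && heap[(i - 1) / 2] > heap[i]) {   /* sift up */
              int p = (i - 1) / 2, t = heap[p];
              heap[p] = heap[i]; heap[i] = t;
              i = p;
          }
          pthread_mutex_unlock(&heap_lock);
      }

      static int heap_delete_min(void)   /* returns best (smallest) key, or -1 */
      {
          pthread_mutex_lock(&heap_lock);
          if (hsize == 0) { pthread_mutex_unlock(&heap_lock); return -1; }
          int best = heap[0];
          heap[0] = heap[--hsize];
          int i = 0;
          for (;;) {                                        /* sift down */
              int l = 2 * i + 1, r = l + 1, m = i;
              if (l < hsize && heap[l] < heap[m]) m = l;
              if (r < hsize && heap[r] < heap[m]) m = r;
              if (m == i) break;
              int t = heap[m]; heap[m] = heap[i]; heap[i] = t;
              i = m;
          }
          pthread_mutex_unlock(&heap_lock);
          return best;
      }

      int main(void)
      {
          heap_insert(5); heap_insert(2); heap_insert(9);
          printf("%d %d %d\n", heap_delete_min(), heap_delete_min(), heap_delete_min());
          return 0;
      }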

  • Reliable broadcast in hypercube multicomputers

    Publication Year: 1988 , Page(s): 1654 - 1657
    Cited by:  Papers (51)

    A simple algorithm for broadcasting in a hypercube multicomputer containing faulty nodes/links is proposed. The algorithm delivers multiple copies of the broadcast message through disjoint paths to all the nodes in the system. Its salient feature is that the delivery of the multiple copies is transparent to the processes receiving the message and does not require the processes to know the identity of the faulty processors. The processes on nonfaulty nodes that receive the message identify the original message from the multiple copies using some scheme appropriate for the fault model used. The algorithm completes in n+1 steps if each node can simultaneously use all of its outgoing links. If each node cannot use more than one outgoing link at a time, then the algorithm requires 2n steps.
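
    The hypercube addressing that underlies such algorithms is easy to sketch: node labels are n-bit strings and the neighbor across dimension d is obtained by flipping bit d. The sketch below simulates the basic n-step dimension-ordered broadcast under that addressing; it does not attempt the disjoint-path, fault-tolerant delivery that is the paper's contribution, and all names are illustrative.

      /* Dimension-ordered broadcast in an n-cube, simulated sequentially.
       * The neighbor of node id across dimension d is id ^ (1 << d). */
      #include <stdio.h>

      #define N 3                          /* cube dimension */

      int main(void)
      {
          int nodes = 1 << N;
          int has_msg[1 << N] = {0};
          has_msg[0] = 1;                  /* source is node 0 */

          for (int d = 0; d < N; d++) {    /* one dimension per step */
              for (int id = 0; id < nodes; id++) {
                  int nb = id ^ (1 << d);  /* flip bit d */
                  if (has_msg[id] && !has_msg[nb]) {
                      has_msg[nb] = 1;
                      printf("step %d: %d -> %d\n", d, id, nb);
                  }
              }
          }
          return 0;
      }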

  • A linear algebraic model of algorithm-based fault tolerance

    Publication Year: 1988 , Page(s): 1599 - 1604
    Cited by:  Papers (45)  |  Patents (1)

    A linear algebraic interpretation is developed for previously proposed algorithm-based fault tolerance schemes. The concepts of distance, code space, and the definitions of detection and correction in the vector space R^n are explained. The number of errors that can be detected or corrected for a distance-(d+1) code is derived. It is shown why the correction scheme does not work for general weight vectors, and a novel fast-correction algorithm for a weighted distance-5 code is derived.
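
    The checksum construction these schemes build on can be stated briefly; the encoding and the weight matrix $W$ below are illustrative, and the detection/correction counts follow the standard coding-theoretic argument the abstract refers to.

        A data vector $x \in \mathbb{R}^n$ is encoded by appending weighted checksums,
            $x_c = \begin{pmatrix} x \\ Wx \end{pmatrix}, \qquad W \in \mathbb{R}^{k \times n},$
        so the code space is the subspace $\{(x, Wx) : x \in \mathbb{R}^n\}$ of $\mathbb{R}^{n+k}$,
        and the distance of the code is the minimum number of nonzero components in a nonzero
        codeword. A distance-$(d{+}1)$ code then detects up to $d$ erroneous components and
        corrects up to $\lfloor d/2 \rfloor$. The single unweighted checksum row $W=(1,\dots,1)$
        gives distance 2 (single-error detection); suitably chosen weighted rows raise the
        distance, and a weighted distance-5 code corrects two errors.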

  • A novel technique for efficient parallel implementation of a classical logic/fault simulation problem

    Publication Year: 1988 , Page(s): 1569 - 1577
    Cited by:  Papers (3)

    A technique is presented for formulating the logic/fault simulation of VLSI array logic in terms of standard vector and matrix operation primitives that are well supported on all scientific supercomputers, high-end mainframes, and minisupercomputers that provide vector parallel hardware and software. The overall computing environment is assumed to be a scientific/engineering one, with Fortran as the primary coding medium and the hardware biased toward numerically intensive applications.
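
    The underlying idea, evaluating array logic over many test patterns at once with machine-level vector and bitwise primitives, can be sketched as follows; the PLA personality, the 32-pattern packing, and all names are illustrative, and the paper's own formulation uses Fortran vector and matrix primitives rather than this C sketch.

      /* Evaluate a small AND-OR (PLA-style) array over 32 test patterns at
       * once: bit t of each word holds that signal's value in pattern t. */
      #include <stdint.h>
      #include <stdio.h>

      #define INPUTS 3
      #define TERMS  2

      /* Product terms: term_mask says which inputs participate,
       * term_pol gives the required polarity (1 = true literal). */
      static const uint8_t term_mask[TERMS] = { 0x3 /* x0,x1 */, 0x6 /* x1,x2 */ };
      static const uint8_t term_pol [TERMS] = { 0x3 /* x0 x1 */, 0x2 /* x1 !x2 */ };

      /* out = OR over terms of (AND over selected literals), 32 patterns/call */
      static uint32_t eval_pla(const uint32_t in[INPUTS])
      {
          uint32_t out = 0;
          for (int t = 0; t < TERMS; t++) {
              uint32_t prod = 0xFFFFFFFFu;
              for (int i = 0; i < INPUTS; i++) {
                  if (!((term_mask[t] >> i) & 1)) continue;
                  uint32_t lit = ((term_pol[t] >> i) & 1) ? in[i] : ~in[i];
                  prod &= lit;                 /* AND plane, 32 patterns wide */
              }
              out |= prod;                     /* OR plane */
          }
          return out;
      }

      int main(void)
      {
          /* Bit t of in[i] is the value of input i in test pattern t. */
          uint32_t in[INPUTS] = { 0xAAAAAAAAu, 0xCCCCCCCCu, 0xF0F0F0F0u };
          printf("outputs for 32 patterns: %08x\n", eval_pla(in));
          return 0;
      }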

  • The formal specification and design of a distributed electronic funds-transfer system

    Publication Year: 1988 , Page(s): 1515 - 1528
    Cited by:  Papers (6)  |  Patents (9)

    The design of an electronic funds-transfer (EFT) system, using the UNITY parallel programming methodology, is presented. The process begins with a high-level specification that captures the essence of transaction processing in the system. In a series of refinement steps, this specification is transformed into one that leads directly to a program suitable for execution on the distributed architecture of the EFT system. Each refinement step involves replacing a data structure by a distributed version that can be implemented efficiently on the target architecture. By defining a correspondence between the replaced data structure and its distributed counterpart, it can be demonstrated formally that each refinement step preserves the intent of the original specification.

  • Constructing two-writer atomic registers

    Publication Year: 1988 , Page(s): 1506 - 1514
    Cited by:  Papers (1)

    A two-writer, n-reader atomic memory register is constructed from two one-writer, (n+1)-reader atomic memory registers. There are no restrictions on the size of the constructed register. The simulation requires only a single extra bit per real register and can survive the failure of any set of readers and writers. A complete proof of correctness is given. Several obvious ways of trying to extend this algorithm to more than two writers are suggested, none of which works. As an example, it is shown how a natural extension of the two-writer protocol fails.

  • Partitioning techniques for large-grained parallelism

    Publication Year: 1988 , Page(s): 1627 - 1634
    Cited by:  Papers (13)  |  Patents (1)

    A model is presented for parallel processing in loosely coupled multiprocessing environments, such as networks of computer workstations, that are amenable to large-grained parallelism. The model takes into account the overhead involved in data communication to and from a remote processor and can be used to optimally partition a large class of computations: those that can be organized as a one-level tree and are homogeneous and separable. The optimal partition can be determined for a given number of processors, and, if required, the optimal number of processors to use can also be derived. Experimental results validate the model and demonstrate its effectiveness.
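
    An illustrative cost model of the kind the abstract describes (not the paper's exact formulation; $N$, $s$, $w$, and $k$ are invented symbols):

        Let a homogeneous, separable computation of total size $N$ be split among $k$ remote
        processors, each receiving $N/k$ units of work, where dispatching work to (and collecting
        results from) each processor costs a fixed overhead $s$ and one unit of work takes $w$
        seconds. A simple completion-time model is
            $T(k) = k\,s + \frac{N w}{k},$
        which is minimized at $k^{*} = \sqrt{N w / s}$: beyond $k^{*}$ the communication overhead
        dominates, which is why both the optimal partition for a given number of processors and
        the optimal number of processors itself can be read off such a model.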

  • Simulating essential pyramids

    Publication Year: 1988 , Page(s): 1642 - 1648
    Cited by:  Papers (4)  |  Patents (1)

    Pyramid computers, and more generally pyramid algorithms, for image processing have the advantage of providing regular structure with a base naturally identified with an input image and a logarithmic height that permits rapid reduction of information. It is shown that it is possible to simulate systematically the effect of having a separate, so-called "essential" pyramid over each object, greatly simplifying algorithm development since algorithms can be written assuming that there is only a single object. This approach can yield optimal or nearly optimal algorithms for the pyramid computer and can also be used on nonpyramid architectures such as the hypercube, mesh-of-trees, mesh, mesh with row and column buses, mesh with reconfigurable buses, and PRAM (parallel random-access machine). For several of these architectures, the simulated essential pyramids can simultaneously execute an algorithm nearly as fast as a pyramid computer over a single object.

  • A randomized parallel backtracking algorithm

    Publication Year: 1988 , Page(s): 1665 - 1676
    Cited by:  Papers (7)  |  Patents (1)

    A technique for parallel backtracking using randomization is proposed. Its main advantage is that good speedups are possible with little or no interprocessor communication. The speedup obtainable is problem-dependent. In those cases where the problem size becomes very large, randomization is extremely successful, achieving good speedups. The technique also ensures high reliability, flexibility, and fault tolerance.
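
    The scheme lends itself to a very small sketch: every processor runs an ordinary depth-first backtrack search but visits the children of each node in an independently random order, with no communication, and whichever processor finds a solution first wins. The search problem, the pruning rule, and all names below are invented for illustration.

      /* Randomized backtracking: independently seeded searches over the
       * same tree, each choosing branch order at random. */
      #include <stdio.h>

      #define N 16

      static unsigned next_bit(unsigned *seed)       /* tiny LCG, illustrative */
      {
          *seed = *seed * 1103515245u + 12345u;
          return (*seed >> 16) & 1u;
      }

      /* Hypothetical feasibility test for a partial 0/1 assignment. */
      static int feasible(const int x[], int depth)
      {
          int ones = 0;
          for (int i = 0; i < depth; i++) ones += x[i];
          return ones <= N / 2;                      /* stand-in pruning rule */
      }

      /* Backtrack from `depth`, choosing the branch order at random. */
      static int backtrack(int x[], int depth, unsigned *seed)
      {
          if (!feasible(x, depth)) return 0;
          if (depth == N) return 1;                  /* complete assignment */
          int first = (int)next_bit(seed);           /* randomized child order */
          for (int c = 0; c < 2; c++) {
              x[depth] = first ^ c;
              if (backtrack(x, depth + 1, seed)) return 1;
          }
          return 0;
      }

      int main(void)
      {
          /* Simulate p independent "processors" as independently seeded runs. */
          for (unsigned p = 0; p < 4; p++) {
              int x[N]; unsigned seed = 1234u + p;
              printf("processor %u: %s\n", p,
                     backtrack(x, 0, &seed) ? "found a solution" : "failed");
          }
          return 0;
      }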

  • Iterative algorithms for solution of large sparse systems of linear equations on hypercubes

    Publication Year: 1988 , Page(s): 1554 - 1568
    Cited by:  Papers (18)

    Finite-element discretization produces linear equations in the form Ax=b, where A is large, sparse, and banded with proper ordering of the variables x. The solution of such equations on distributed-memory message-passing multiprocessors implementing the hypercube topology is addressed. Iterative algorithms based on the conjugate gradient method are developed for hypercubes designed for coarse-grained parallelism. The communication requirements of different schemes for mapping finite-element meshes onto the processors of a hypercube are analyzed with respect to the effect of communication parameters of the architecture. Experimental results for a 16-node Intel 80386-based iPSC/2 hypercube are presented and discussed.
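
    For reference, the unpreconditioned conjugate gradient iteration whose kernels (sparse matrix-vector products, dot products, and vector updates) are what such hypercube implementations distribute is sketched below sequentially; the small dense test system is illustrative only.

      /* Plain conjugate gradient for A x = b with A symmetric positive
       * definite. A is dense here only for brevity. */
      #include <math.h>
      #include <stdio.h>

      #define N 4

      static void matvec(const double A[N][N], const double x[N], double y[N])
      {
          for (int i = 0; i < N; i++) {
              y[i] = 0.0;
              for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
          }
      }

      static double dot(const double a[N], const double b[N])
      {
          double s = 0.0;
          for (int i = 0; i < N; i++) s += a[i] * b[i];
          return s;
      }

      int main(void)
      {
          /* A small SPD test system. */
          double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
          double b[N] = {1,2,3,4}, x[N] = {0,0,0,0};
          double r[N], p[N], Ap[N];

          matvec(A, x, Ap);                       /* r = b - A x, p = r */
          for (int i = 0; i < N; i++) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
          double rr = dot(r, r);

          for (int it = 0; it < N && sqrt(rr) > 1e-12; it++) {
              matvec(A, p, Ap);
              double alpha = rr / dot(p, Ap);
              for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
              double rr_new = dot(r, r);
              double beta = rr_new / rr;
              for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
              rr = rr_new;
          }
          printf("x = %g %g %g %g\n", x[0], x[1], x[2], x[3]);
          return 0;
      }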

  • A benchmark parallel sort for shared memory multiprocessors

    Publication Year: 1988 , Page(s): 1619 - 1626
    Cited by:  Papers (10)  |  Patents (2)

    The first parallel sort algorithm for shared-memory MIMD (multiple-instruction, multiple-data-stream) multiprocessors that has a theoretical and measured speedup near linear is exhibited. It is based on a novel asynchronous parallel merge that evenly partitions the data to be merged among any number of processors. A benchmark sorting algorithm is proposed that uses this merge to remove the linear-time bottleneck inherent in previous multiprocessor sorts. This sort, when applied to a data set on p processors, has a time complexity of O((n log n)/p) + O((n log p)/p) and a space complexity of 2n, where n is the number of keys being sorted. Evaluations of the merge and benchmark sort algorithms on a 12-processor Sequent Balance 21000 system demonstrate near-linear speedup when compared to sequential Quicksort.
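
    The even partitioning of a two-way merge that the abstract describes can be sketched with the standard co-ranking (merge-path) binary search: worker k is assigned output positions k(m+n)/p through (k+1)(m+n)/p, and a binary search finds how much of each sorted input falls before each boundary. This shows the general idea under that standard construction; the authors' exact merge may differ, and all names below are illustrative.

      /* Even partition of a two-way merge among p workers; each block of
       * the output can then be merged independently (hence in parallel). */
      #include <stdio.h>

      static void split(const int *A, int m, const int *B, int n, int d,
                        int *ia, int *ib)
      {
          int lo = d > n ? d - n : 0;          /* smallest legal take from A */
          int hi = d < m ? d : m;              /* largest  legal take from A */
          while (lo < hi) {
              int i = (lo + hi) / 2;           /* candidate: i from A, d-i from B */
              if (A[i] < B[d - i - 1]) lo = i + 1;
              else                     hi = i;
          }
          *ia = lo;
          *ib = d - lo;
      }

      int main(void)
      {
          int A[] = {1, 3, 5, 7, 9, 11}, B[] = {2, 4, 6, 8, 10, 12};
          int m = 6, n = 6, p = 3, out[12];

          for (int k = 0; k < p; k++) {        /* each block is independent */
              int a0, b0, a1, b1;
              split(A, m, B, n,  k      * (m + n) / p, &a0, &b0);
              split(A, m, B, n, (k + 1) * (m + n) / p, &a1, &b1);
              int d = k * (m + n) / p;
              while (a0 < a1 && b0 < b1) out[d++] = A[a0] <= B[b0] ? A[a0++] : B[b0++];
              while (a0 < a1) out[d++] = A[a0++];
              while (b0 < b1) out[d++] = B[b0++];
          }
          for (int i = 0; i < m + n; i++) printf("%d ", out[i]);
          printf("\n");
          return 0;
      }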

  • A compiler that increases the fault tolerance of asynchronous protocols

    Publication Year: 1988 , Page(s): 1541 - 1553
    Cited by:  Papers (1)  |  Patents (1)

    A compiler that increases the fault tolerance of certain asynchronous protocols is presented. Specifically, it transforms a source protocol that is resilient to crash faults into an object protocol that is resilient to Byzantine faults. The compiler simplifies the design of protocols for the Byzantine fault model because it allows the design process to be broken into two steps. The first step is to design a protocol for the crash fault model. The second step, which is completely mechanical, is to compile the protocol into one for the Byzantine fault model. The compiler is used to produce an asynchronous approximate agreement protocol that operates in the Byzantine fault model and improves in several respects on the performance of the asynchronous approximate agreement protocol of D. Dolev et al. (1986).

  • Efficient parallel convex hull algorithms

    Publication Year: 1988 , Page(s): 1605 - 1618
    Cited by:  Papers (31)  |  Patents (3)

    Parallel algorithms are presented to identify (i.e. detect and enumerate) the extreme points of the convex hull of a set of planar points using a hypercube, pyramid, tree, mesh-of-trees, mesh with reconfigurable bus, exclusive-read-exclusive-write parallel random-access machine (EREW PRAM), and modified AKS network. It is shown that the problem of identifying the convex hull of an arbitrarily given set of planar points cannot be solved faster than sorting.

  • Space-efficient and fault-tolerant message routing in outerplanar networks

    Publication Year: 1988 , Page(s): 1529 - 1540
    Cited by:  Papers (5)

    The problem of designing space- and communication-efficient routing schemes for networks that experience faults is addressed. For any outerplanar network containing t faults, a succinct routing scheme is presented that uses O(αtn) space and communication to generate routings that are less than ((α+1)/(α-1))t times longer than optimal, where α>1 is an odd-valued integer parameter. Thus, the routings can be tuned as desired, using a suitable amount of information. Efficient sequential and distributed algorithms are presented for setting up the routing schemes.

  • Circuit simulation on shared-memory multiprocessors

    Publication Year: 1988 , Page(s): 1634 - 1642
    Cited by:  Papers (25)  |  Patents (1)

    The parallelization, on a shared-memory vector multiprocessor, of the computationally intensive components of a circuit simulator is reported: matrix assembly (including device model evaluation) and the unstructured sparse linear-system solution. A theoretical model is used to predict the performance of the lock-synchronized parallel matrix assembly, and the results are compared to experimental measurements. Alternate approaches to efficient sparse matrix solution are contrasted, highlighting the impact of the matrix representation/access strategy on achievable performance, and a medium-grained approach with superior performance is introduced. The techniques developed have been incorporated into a prototype parallel implementation of the production circuit simulator ADVICE on the Alliant FX/8 multiprocessor.
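
    Lock-synchronized matrix assembly of the kind the abstract models can be sketched as worker threads evaluating device stamps independently and adding them into the shared system matrix under per-row locks; the two-terminal conductance stamps, the dense matrix, and all names below are placeholders, not the ADVICE data structures.

      /* Lock-synchronized parallel matrix assembly sketch: each thread
       * evaluates a subset of devices and stamps them into the shared
       * matrix, with one lock per row guarding the updates. */
      #include <pthread.h>
      #include <stdio.h>

      #define NODES   4
      #define DEVICES 8
      #define THREADS 2

      static double G[NODES][NODES];                /* shared system matrix */
      static pthread_mutex_t row_lock[NODES] =
          { PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
            PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER };

      struct device { int a, b; double g; };        /* two-terminal conductance */
      static struct device dev[DEVICES] = {
          {0,1,1.0},{1,2,2.0},{2,3,1.0},{0,3,0.5},
          {1,3,1.5},{0,2,0.5},{2,3,2.0},{0,1,1.0},
      };

      static void stamp(int r, int c, double v)     /* G[r][c] += v, locked */
      {
          pthread_mutex_lock(&row_lock[r]);
          G[r][c] += v;
          pthread_mutex_unlock(&row_lock[r]);
      }

      static void *assemble(void *arg)
      {
          long id = (long)arg;
          for (int k = (int)id; k < DEVICES; k += THREADS) {  /* static split */
              struct device d = dev[k];             /* "model evaluation" */
              stamp(d.a, d.a,  d.g);  stamp(d.b, d.b,  d.g);
              stamp(d.a, d.b, -d.g);  stamp(d.b, d.a, -d.g);
          }
          return NULL;
      }

      int main(void)
      {
          pthread_t t[THREADS];
          for (long i = 0; i < THREADS; i++)
              pthread_create(&t[i], NULL, assemble, (void *)i);
          for (long i = 0; i < THREADS; i++)
              pthread_join(t[i], NULL);
          for (int i = 0; i < NODES; i++, printf("\n"))
              for (int j = 0; j < NODES; j++) printf("%6.2f ", G[i][j]);
          return 0;
      }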


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org