Parallel and Distributed Systems, IEEE Transactions on

Issue 8 • Date Aug 1998

  • Fast and processor efficient parallel matrix multiplication algorithms on a linear array with a reconfigurable pipelined bus system

    Publication Year: 1998 , Page(s): 705 - 720
    Cited by:  Papers (27)  |  Patents (1)

    We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems can support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and, thus, can support parallel implementations of divide-and-conquer computations like Strassen's algorithm. The main contributions of the paper are as follows. We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model; namely, MM1, MM2, MM3, and compound algorithms C1(ε) and C2(δ). Algorithm C1(ε) has adjustable time complexity at the sublinear level. Algorithm C2(δ) implies that it is feasible to achieve sublogarithmic time using o(N^3) processors for matrix multiplication on a realistic system. Algorithms MM3, C1(ε), and C2(δ) all have o(N^3) cost and, hence, are very processor efficient. Algorithms MM1, MM3, and C1(ε) are general-purpose matrix multiplication algorithms, where the array elements are in any ring. Algorithms MM2 and C2(δ) are applicable to array elements that are integers of bounded magnitude, floating-point values of bounded precision and magnitude, or Boolean values. Extensions of algorithms MM2 and C2(δ) to unbounded integers and reals are also discussed.

  • A parallel system for text inference using marker propagations

    Publication Year: 1998 , Page(s): 729 - 747
    Cited by:  Papers (4)

    This paper presents a possible solution for the text inference problem: extracting information that is implied by a text but not stated in it. Text inference is central to natural language applications such as information extraction and dissemination, text understanding, summarization, and translation. Our solution takes advantage of a semantic English dictionary available in electronic form that provides the basis for the development of a large linguistic knowledge base. The inference algorithm consists of a set of highly parallel search methods that, when applied to the knowledge base, find contexts in which sentences are interpreted. These contexts reveal information relevant to the text. Implementation, results, and parallelism analysis are discussed.

  • Scalable s-to-p broadcasting on message-passing MPPs

    Publication Year: 1998 , Page(s): 758 - 768
    Cited by:  Papers (2)

    In s-to-p broadcasting, s processors in a p-processor machine contain a message to be broadcast to all the processors, 1 ⩽ s ⩽ p. We present a number of different broadcasting algorithms that handle all ranges of s. We show how the performance of each algorithm is influenced by the distribution of the s source processors and by the relationships between the distribution and the characteristics of the interconnection network. For the Intel Paragon, we show that for each algorithm and machine dimension there exist ideal distributions and distributions on which the performance degrades. For the Cray T3D, we also demonstrate dependencies between distributions and machine sizes. To reduce the dependence of the performance on the distribution of sources, we propose a repositioning approach. In this approach, the initial distribution is turned into an ideal distribution of the target broadcasting algorithm. We report experimental results for the Intel Paragon and Cray T3D and discuss scalability and performance.

  • Improved compressions of cube-connected cycles networks

    Publication Year: 1998 , Page(s): 803 - 812
    Cited by:  Papers (3)

    We present a new technique for the embedding of large cube-connected cycles networks (CCC) into smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. Using the new embedding strategy, we show that the CCC of dimension l can be embedded into the CCC of dimension k with dilation 1 and optimum load for any k, l ∈ N, k ⩾ 8, such that 5/3 + c_k < l/k ⩽ 2, where c_k = (4k + 3)/(3·2^(2k/3)), thus improving known results. Our embedding technique also leads to improved dilation-1 embeddings in the case 3/2 < l/k ⩽ 5/3 + c_k.

  • Optimized broadcasting and multicasting protocols in cut-through routed networks

    Publication Year: 1998 , Page(s): 788 - 802
    Cited by:  Papers (3)

    This paper addresses the one-to-all broadcasting problem and the one-to-many broadcasting problem, usually simply called broadcasting and multicasting, respectively. Broadcasting is the information dissemination problem in which a node of a network sends the same piece of information to all the other nodes. Multicasting is a partial broadcasting in the sense that only a subset of nodes forms the destination set. Both operations have many applications in parallel and distributed computing. In this paper, we study these problems in both the line model and the cut-through model. The former assumes long distance calls between nonneighboring processors. The latter strengthens the line model by taking into account the use of a routing function. Long distance calls are possible in circuit-switched and wormhole-routed networks, and also in many networks supporting optical facilities. In the line model, it is well known that one can compute in polynomial time a ⌈log2 n⌉-round broadcast or multicast protocol for any arbitrary network. Unfortunately, such a protocol is often inefficient from a practical point of view because it does not use the resources of the network in a balanced way. In this paper, we present a new algorithm to compute broadcast or multicast protocols. This algorithm applies under both the line and cut-through models. Moreover, it returns protocols that efficiently use the bandwidth of the network. From a complexity point of view, we also show that most of the optimization problems related to maximizing the efficiency of broadcast or multicast protocols in terms of switching time or vertex load are NP-complete. We have, however, derived efficient polynomial-time solutions for tree networks.
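
    The ⌈log2 n⌉-round bound mentioned in the abstract can be illustrated on the simplest possible network. The sketch below is not the paper's algorithm: it assumes a path of n nodes numbered 0 to n-1, and the function name `path_broadcast_schedule` is hypothetical. In each round every informed node calls one node in the other half of the segment it is responsible for, so the calls of a round use edge-disjoint paths and the number of informed nodes roughly doubles.

```python
def path_broadcast_schedule(n, source=0):
    """Return a list of rounds; each round is a list of (caller, callee) calls.

    Assumption: the network is a path 0 - 1 - ... - (n-1) and a call may
    span several edges (line model). Each informed node owns a contiguous
    segment and informs one node in the other half of that segment.
    """
    # Each entry is (lo, hi, owner): owner is the informed node in [lo, hi].
    segments = [(0, n - 1, source)]
    rounds = []
    while any(lo < hi for lo, hi, _ in segments):
        calls = []
        next_segments = []
        for lo, hi, owner in segments:
            if lo == hi:
                # Segment already fully informed.
                next_segments.append((lo, hi, owner))
                continue
            size = hi - lo + 1
            mid = lo + (size + 1) // 2  # first node of the right half
            if owner < mid:
                callee = mid            # owner is in the left half
                next_segments.append((lo, mid - 1, owner))
                next_segments.append((mid, hi, callee))
            else:
                callee = mid - 1        # owner is in the right half
                next_segments.append((lo, mid - 1, callee))
                next_segments.append((mid, hi, owner))
            calls.append((owner, callee))
        rounds.append(calls)
        segments = next_segments
    return rounds


if __name__ == "__main__":
    for number, calls in enumerate(path_broadcast_schedule(8), start=1):
        print(f"round {number}: {calls}")
```

    Because every call stays inside its caller's segment and segments never share an edge, no two calls of the same round compete for a link. The sketch says nothing about balancing switching time or vertex load, which is the concern the abstract raises.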

  • A compiler optimization algorithm for shared-memory multiprocessors

    Publication Year: 1998 , Page(s): 769 - 787
    Cited by:  Papers (4)  |  Patents (3)

    This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense matrices. The programs initially were hand-coded to target a variety of parallel machines using loop parallelism. We ignore the user's parallel loop directives, and use known and implemented dependence and interprocedural analysis to find parallelism. We then apply our new optimization algorithm to the resulting program. We compare the original parallel program to the hand-optimized program, and show that our algorithm improves three programs, matches four programs, and degrades one program in our test suite on a shared-memory, bus-based parallel machine with local caches. This experiment suggests that existing dependence and interprocedural array analysis can automatically detect user parallelism, and demonstrates that user-parallelized codes often benefit from our compiler optimizations, providing evidence that we need both parallel algorithms and compiler optimizations to effectively utilize parallel machines.

  • Deterministic voting in distributed systems using error-correcting codes

    Publication Year: 1998 , Page(s): 813 - 824
    Cited by:  Papers (4)

    Distributed voting is an important problem in reliable computing. In an N Modular Redundant (NMR) system, the N computational modules execute identical tasks and need to vote periodically on their current states. In this paper, we propose a deterministic majority voting algorithm for NMR systems. Our voting algorithm uses error-correcting codes to drastically reduce the average case communication complexity. In particular, we show that the efficiency of our voting algorithm can be improved by choosing the parameters of the error-correcting code to match the probability of the computational faults. For example, consider an NMR system with 31 modules, each with a state of m bits, where each module has an independent computational error probability of 10^-3. In this NMR system, our algorithm can reduce the average case communication complexity to approximately 1.0825m, compared with the communication complexity of 31m of the naive algorithm in which every module broadcasts its local result to all other modules. We have also implemented the voting algorithm over a network of workstations. The experimental performance results match the theoretical predictions well.
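
    The naive baseline in this abstract is easy to make concrete. The sketch below covers only that baseline, not the error-correcting-code scheme the paper proposes, whose details are not given here. The function names are hypothetical, and the cost model (each module's m-bit broadcast counted once, giving N·m bits) is an assumption chosen to match the 31m figure quoted above.

```python
from collections import Counter


def majority_vote(states):
    """Return the state held by a strict majority of modules, or None."""
    value, count = Counter(states).most_common(1)[0]
    return value if count > len(states) // 2 else None


def naive_broadcast_bits(num_modules, state_bits):
    """Bits sent when every module broadcasts its full state once.

    Assumption: a broadcast of an m-bit state is counted as m bits.
    """
    return num_modules * state_bits


def prob_all_correct(num_modules, error_prob):
    """Probability that no module suffers an independent computational fault."""
    return (1.0 - error_prob) ** num_modules


if __name__ == "__main__":
    states = [b"state-A"] * 30 + [b"state-B"]   # one faulty module out of 31
    print(majority_vote(states))
    print(naive_broadcast_bits(31, 1024))
    print(prob_all_correct(31, 1e-3))
```

    With 31 modules and an error probability of 10^-3, the fault-free case is by far the most common one, which gives some intuition for why an algorithm tuned to the fault probability can have an average cost far below 31m.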

  • A new algorithm based on Givens rotations for solving linear equations on fault-tolerant mesh-connected processors

    Publication Year: 1998 , Page(s): 825 - 832
    Cited by:  Papers (2)

    In this paper, we propose a new parallel algorithm, based on Givens rotations and free of I/O overhead, for solving a system of linear equations. The algorithm uses a new technique called two-sided elimination and requires an N×(N+1) mesh-connected processor array to solve N linear equations in (5N - log N - 4) time steps. The array is well suited for VLSI implementation, as it requires only identical processors with a simple and regular interconnection pattern. We also describe a fault-tolerant scheme based on an algorithm-based fault tolerance (ABFT) approach. This scheme has small hardware and time overhead and can tolerate up to N processor failures.
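
    For readers unfamiliar with Givens rotations, the following is a minimal sequential sketch of solving a linear system with them. It uses conventional one-sided elimination followed by back substitution, not the two-sided elimination or the mesh mapping described in the abstract. The function name `solve_givens` is hypothetical, and the matrix is assumed to be nonsingular. The N×(N+1) augmented matrix used here has the same shape as the processor array mentioned above.

```python
import math


def solve_givens(a, b):
    """Solve a x = b for a nonsingular square matrix a using Givens rotations."""
    n = len(a)
    # Augmented N x (N+1) matrix [a | b], copied as floats.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    # Zero every entry below the diagonal with a plane rotation.
    for col in range(n):
        for row in range(col + 1, n):
            p, q = m[col][col], m[row][col]
            r = math.hypot(p, q)
            if r == 0.0:
                continue  # both entries already zero, nothing to rotate
            c, s = p / r, q / r
            for k in range(col, n + 1):
                top, bottom = m[col][k], m[row][k]
                m[col][k] = c * top + s * bottom
                m[row][k] = -s * top + c * bottom

    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / m[i][i]
    return x


if __name__ == "__main__":
    print(solve_givens([[2, 1], [1, 3]], [3, 5]))
```

    Each rotation touches only two rows, which is what makes this family of methods attractive for regular processor arrays.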

  • A distributed graph algorithm for the detection of local cycles and knots

    Publication Year: 1998 , Page(s): 748 - 757
    Cited by:  Papers (7)

    In this paper, a distributed cycle/knot detection algorithm for general graphs is presented. The algorithm distinguishes between cycles and knots and is, to our knowledge, the first algorithm to do so. It is especially relevant to an application such as parallel simulation, in which 1) cycles and knots can arise frequently, 2) the size of the graph is very large, and 3) it is necessary to know if a given node is in a cycle or a knot. It requires less communication than previous algorithms: 2m vs. (at least) 4m for the Chandy and Misra algorithm, where m is the number of links in the graph. It requires O(n log n) bits of memory, where n is the number of nodes. The algorithm differs from the classical diffusing computation methods through its use of incomplete search messages to speed up the computation. We introduce a marking scheme in order to identify strongly connected subcomponents of the graph which cannot reach the initiator of the algorithm. This allows us to distinguish between the case in which the initiator is in a cycle (only) and the case in which it is in a knot.
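
    The distinction between a cycle and a knot can be stated in a few lines of sequential code. The sketch below is a centralized illustration of the definitions only, not the distributed algorithm of the paper, and the names `reachable` and `classify` are hypothetical. It assumes the usual definitions: a node is in a cycle if it can reach itself, and it is in a knot if, in addition, every node it can reach can also reach it back.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of nodes reachable from start via one or more edges."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def classify(graph, node):
    """Return 'knot', 'cycle', or 'none' for the given node.

    graph maps each node to an iterable of its successors.
    """
    reach = reachable(graph, node)
    if node not in reach:
        return "none"       # node cannot get back to itself
    if all(node in reachable(graph, other) for other in reach):
        return "knot"       # nothing reachable from node can escape
    return "cycle"          # on a cycle, but some reachable node never returns


if __name__ == "__main__":
    g = {"a": ["b"], "b": ["a", "c"], "c": []}
    print({v: classify(g, v) for v in g})
```

    This brute-force check repeats a full search for every reachable node, which is exactly the kind of cost a message-efficient distributed algorithm is designed to avoid.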

  • Recognizing nondominated coteries and wr-coteries by availability

    Publication Year: 1998 , Page(s): 721 - 728
    Cited by:  Papers (2)

    The coterie is a widely accepted concept for solving the mutual exclusion problem. Nondominated coteries are an important class of coteries which have better performance than dominated coteries. The performance of a coterie is usually measured by availability. A coterie with higher availability has a greater ability to tolerate node or communication link failures. In this paper, we demonstrate a way to recognize nondominated coteries using availability. By evaluating the availability of a coterie instead of using a formal proof, one can determine whether the coterie is nondominated. Moreover, with regard to the wr-coterie, a concept for solving the replica control problem, we also present a similar result for recognizing nondominated wr-coteries. Finally, we apply our results to some well-known coteries and wr-coteries.
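
    Availability is straightforward to compute by brute force for small systems. The sketch below assumes the standard definitions: a coterie is a set of nonempty quorums in which any two quorums intersect and none contains another, and availability is the probability that all nodes of at least one quorum are up when each node is up independently with probability p (links are assumed reliable). The function names are hypothetical, and the abstract does not state the paper's exact recognition criterion.

```python
from itertools import combinations, product


def is_coterie(quorums):
    """Check the intersection and minimality properties of a quorum set."""
    qs = [frozenset(q) for q in quorums]
    if any(not q for q in qs):
        return False
    for a, b in combinations(qs, 2):
        if not (a & b):          # two quorums must share a node
            return False
        if a <= b or b <= a:     # no quorum may contain another
            return False
    return True


def availability(quorums, nodes, p):
    """Probability that some quorum is fully up; enumerates all 2^n states."""
    qs = [frozenset(q) for q in quorums]
    nodes = list(nodes)
    total = 0.0
    for state in product([True, False], repeat=len(nodes)):
        up = {n for n, alive in zip(nodes, state) if alive}
        if any(q <= up for q in qs):
            total += p ** len(up) * (1 - p) ** (len(nodes) - len(up))
    return total


if __name__ == "__main__":
    nodes = [1, 2, 3]
    majority = [{1, 2}, {1, 3}, {2, 3}]
    single = [{1, 2}]
    for name, c in (("majority", majority), ("single", single)):
        print(name, is_coterie(c), availability(c, nodes, 0.5))
```

    A commonly cited characterization, assumed here rather than taken from the abstract, is that under this failure model a coterie is nondominated exactly when its availability at p = 0.5 equals 0.5. The three-node majority coterie meets that value, while the single quorum {1, 2}, which is dominated by {{1}}, falls below it.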


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology