IEEE Transactions on Parallel and Distributed Systems

Issue 4, April 1996

  • Parallel computing in networks of workstations with Paralex

    Page(s): 371 - 384

    Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that must be addressed in exploiting the parallelism inherent in a distributed system include heterogeneity, high-latency communication, fault tolerance, and dynamic load balancing. Current software systems for parallel programming provide little or no automatic support for these issues and require users to be experts in fault-tolerant distributed computing. The Paralex system explores the extent to which the parallel application programmer can be liberated from the complexities of distributed systems. Paralex is a complete programming environment that makes extensive use of graphics to define, edit, execute, and debug parallel scientific applications. All of the code needed to distribute the computation across a network and to replicate it for fault tolerance and dynamic load balancing is generated automatically by the system. In this paper we give an overview of Paralex and present our experiences with a prototype implementation.

  • Using finite state automata to produce self-optimization and self-control

    Page(s): 439 - 448

    A simple game provides a framework within which agents can spontaneously self-organize. In this paper, we present this game and develop the basic theory underlying a robust method for distributed coordination based on it. The method uses finite state automata, one associated with each agent, to guide the agents. We give a new, general method for analyzing these systems, which previously had been studied only in limited cases. We also provide a physical example, which should hint at the types of problems that can be resolved with this method.

  • Asynchronous analysis of parallel dynamic programming algorithms

    Page(s): 425 - 438

    We examine a very simple asynchronous model of parallel computation that assumes the time to compute a task is random, following some probability distribution. The model is meant to capture the effects of unpredictable processor delays caused, for example, by communication delays or cache misses. Using techniques from queueing theory and occupancy problems, we use this model to analyze two parallel dynamic programming algorithms. We show that the model is simple to analyze and correctly predicts which algorithm will perform better in practice. The algorithms we consider are a pipeline algorithm, in which each processor i computes, in order, the entries of rows i, i+p, and so on, where p is the number of processors; and a diagonal algorithm, in which the entries along each diagonal extending from the left to the top of the table are computed in turn. The techniques used here are likely to be useful in analyzing other algorithms that use barriers or pipelining.

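The two evaluation orders can be made concrete with a short Python sketch. It assumes 0-based processor and row indices and a recurrence in which entry (i, j) depends on its left, top, and top-left neighbours (as in edit distance); the function names are hypothetical, and the sketch illustrates only the orderings, not the paper's probabilistic analysis.

```python
def pipeline_rows(proc: int, n_rows: int, p: int) -> list[int]:
    """Rows computed, in order, by processor `proc` under the pipeline scheme:
    proc, proc + p, proc + 2p, ... (0-based indices assumed)."""
    return list(range(proc, n_rows, p))


def diagonal_order(n_rows: int, n_cols: int) -> list[list[tuple[int, int]]]:
    """Group table entries (i, j) by anti-diagonal d = i + j; each such
    diagonal stretches between the left column and the top row of the table.
    With the assumed dependence on (i-1, j), (i, j-1), and (i-1, j-1), every
    entry on diagonal d depends only on diagonals d-1 and d-2, so a whole
    diagonal can be computed in parallel between two barriers."""
    return [
        [(i, d - i) for i in range(n_rows) if 0 <= d - i < n_cols]
        for d in range(n_rows + n_cols - 1)
    ]


if __name__ == "__main__":
    p = 3
    for proc in range(p):
        print(f"processor {proc}: rows {pipeline_rows(proc, 8, p)}")
    for d, entries in enumerate(diagonal_order(3, 4)):
        print(f"diagonal {d}: {entries}")
```

In the pipeline order, processor i can begin its row as soon as processor i-1 has produced the entries it needs, so rows overlap in time; the diagonal order instead synchronizes all processors at the end of each diagonal, which is where barriers enter.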
  • Evaluation of hardware-based stride and sequential prefetching in shared-memory multiprocessors

    Page(s): 385 - 398

    We study the efficiency of previously proposed stride and sequential prefetching, two promising hardware-based prefetching schemes for reducing read-miss penalties in shared-memory multiprocessors. Although stride accesses dominate in four of the six applications we study, we find that sequential prefetching performs as well as, and in some cases better than, stride prefetching for five applications. This is because 1) most strides are shorter than the block size (we assume 32-byte blocks), so sequential prefetching is just as effective for these stride accesses, and 2) sequential prefetching also exploits the locality of read misses with nonstride accesses. However, since stride prefetching generally results in fewer useless prefetches, it has the extra advantage of consuming less memory-system bandwidth.

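The first reason can be checked with a deliberately simplified Python model: an infinite cache with no timing, 32-byte blocks as in the study, and a one-block-lookahead sequential prefetcher that, on every access to block b, prefetches block b + 1 if it is not already present. The function name and the model are assumptions for illustration, not the simulator used in the paper.

```python
BLOCK_SIZE = 32  # bytes per cache block, matching the block size assumed in the study


def simulate_sequential_prefetch(addresses, block_size=BLOCK_SIZE):
    """Simplified model (assumption): infinite cache, no timing. On every
    access to block b, block b + 1 is prefetched if not already cached.
    Returns (demand_misses, useless_prefetches)."""
    cached = set()
    prefetched = set()   # blocks brought in by the prefetcher
    referenced = set()   # blocks the program actually touched
    misses = 0
    for addr in addresses:
        block = addr // block_size
        referenced.add(block)
        if block not in cached:
            misses += 1
            cached.add(block)
        nxt = block + 1
        if nxt not in cached:
            cached.add(nxt)
            prefetched.add(nxt)
    useless = len(prefetched - referenced)
    return misses, useless


if __name__ == "__main__":
    print(simulate_sequential_prefetch(range(0, 1024, 8)))   # stride 8 < block size
    print(simulate_sequential_prefetch(range(0, 1024, 64)))  # stride 64 > block size
```

With an 8-byte stride the stream walks through consecutive blocks, so only the first access misses and the single useless prefetch is the block just past the array. With a 64-byte stride, longer than a block, every next-block prefetch is wasted and every access still misses; this is the case in which a stride prefetcher's fewer useless prefetches pay off.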
  • Randomized routing with shorter paths

    Page(s): 356 - 362

    This paper studies the use of randomized routing in multistage networks. While log N additional randomizing stages are needed to break “spatial locality” within each permutation, only log log N additional randomizing stages are needed to break “temporal locality” among successive permutations. Thus, log N bits of initial randomization per input, followed by log log N bits of randomization per packet, suffice to ensure that t permutations are delivered in time t + log N. We present simulation results that validate this analysis.

  • Two ranking schemes for efficient computation on the star interconnection network

    Page(s): 321 - 327

    A node ranking scheme provides the structural view needed to develop algorithms on a network. We present two ranking schemes for the star interconnection network, both of which allow constant-time order-preserving communication. The first scheme is based on a hierarchical view of the star network and enables efficient implementation of the order-preserving ASCEND/DESCEND class of algorithms. This class includes several important algorithms, such as the Fast Fourier Transform (FFT) and matrix multiplication. The second ranking scheme gives a flexible pipelined view of the star interconnection network and provides a suitable framework for implementing pipelined algorithms.

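A hierarchical view of this kind rests on the standard fact that the n-star S_n, whose n! nodes are the permutations of n symbols and whose edges swap the first symbol with the symbol in position i (2 ≤ i ≤ n), splits into n copies of S_(n-1), one per value of the last symbol. The Python sketch gives one ranking with that property, so each sub-star occupies a contiguous block of (n-1)! ranks. It is a generic illustration with hypothetical function names and symbols 0..n-1 assumed, not either of the paper's schemes, and it makes no claim about constant-time order-preserving communication.

```python
from itertools import permutations
from math import factorial


def star_neighbors(node: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Neighbours in the star graph: swap position 0 with position i."""
    result = []
    for i in range(1, len(node)):
        nbr = list(node)
        nbr[0], nbr[i] = nbr[i], nbr[0]
        result.append(tuple(nbr))
    return result


def hierarchical_rank(node: tuple[int, ...]) -> int:
    """Rank a permutation of 0..n-1 in [0, n!). Nodes sharing the same last
    symbol (one copy of S_(n-1)) receive a contiguous block of (n-1)! ranks."""
    n = len(node)
    if n <= 1:
        return 0
    last = node[-1]
    # Relabel the remaining symbols to 0..n-2, preserving their relative order,
    # so the prefix is itself a node of S_(n-1).
    rest = tuple(s if s < last else s - 1 for s in node[:-1])
    return last * factorial(n - 1) + hierarchical_rank(rest)


if __name__ == "__main__":
    n = 4
    ranks = {hierarchical_rank(v) for v in permutations(range(n))}
    print(ranks == set(range(factorial(n))))  # True: the ranking is a bijection
```

Swaps with positions 1 through n-2 leave the last symbol unchanged and so stay inside the same block of ranks; only the swap with the last position crosses to a different sub-star, which is what makes the decomposition useful for structuring algorithms.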
  • Valid transformations: a new class of loop transformations for high-level synthesis and pipelined scheduling applications

    Page(s): 399 - 410

    In this paper we present a new class of loop-optimizing transformations, called valid transformations, which are suitable for fine-grain parallelization applications such as high-level synthesis of VLSI designs or compilers for superscalar or VLIW machines. This class of transformations differs from existing ones in that valid transformations can be illegal. Nevertheless, if a transformation is valid, the transformed loop has a feasible pipeline schedule. We present an example valid transformation, called loop expansion, which can help produce cost-performance-efficient designs and explore a larger design space for a satisfactory design. Several examples demonstrate the efficacy of the proposed technique.

  • Resource allocation in cube network systems based on the covering radius

    Page(s): 328 - 342

    When multiple copies of a certain resource exist in a cube network system, it is desirable that every nonresource node can reach a copy of the resource within a given number of hops. In this paper, we introduce systematic approaches to resource allocation in a cube system such that each nonresource node is connected to a specified number of resource copies and the allocation performance measure of interest is optimized. The methodology is based on the covering radius results of known codes. These codes aid in constructing the desired linear codes, whose codewords address the nodes where resource copies are placed. The resource allocation problem is translated into an integer nonlinear program whose best possible solution can be identified quickly by exploiting basic properties derived from the known codes, yielding an optimal or near-optimal allocation. These properties reduce the time complexity drastically (by up to several orders of magnitude), particularly for large system sizes. Our approaches apply to any cube size and often yield more efficient allocations than those attainable with prior schemes.

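A classical instance of the covering-radius idea is the [7,4] Hamming code, which is perfect with covering radius 1: placing one resource copy at each of its 16 codewords in a 7-cube (128 nodes) leaves every nonresource node exactly one hop from a copy. The Python check verifies this by brute force. It illustrates the principle only, with hypothetical names; the paper's methods (a specified number of copies per node, optimization via an integer nonlinear program) go well beyond it.

```python
from itertools import product

N_DIM = 7  # dimension of the cube: nodes are 7-bit addresses

# Parity-check matrix of the [7,4] Hamming code: column j (j = 1..7) is the
# 3-bit binary representation of j.
H = [[(j >> bit) & 1 for j in range(1, N_DIM + 1)] for bit in range(3)]


def is_codeword(word: tuple[int, ...]) -> bool:
    """A word is a codeword iff its syndrome H * word (mod 2) is zero."""
    return all(sum(h * w for h, w in zip(row, word)) % 2 == 0 for row in H)


def hamming_distance(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of hops between two cube nodes."""
    return sum(x != y for x, y in zip(a, b))


nodes = list(product((0, 1), repeat=N_DIM))
resource_nodes = [v for v in nodes if is_codeword(v)]

# Covering radius: worst-case hops from any node to its nearest resource copy.
covering_radius = max(
    min(hamming_distance(v, c) for c in resource_nodes) for v in nodes
)

if __name__ == "__main__":
    print(len(nodes), len(resource_nodes), covering_radius)  # 128 16 1
```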
  • Matrix partitioning on a virtual shared memory parallel machine

    Page(s): 343 - 355

    The general problem considered in this paper is partitioning a matrix operation among the processors of a parallel system in an optimally load-balanced way without potential memory contention. The parallel system considered is defined by several features, chief among them the availability of a virtual shared memory divided into segments. If a partitioning of a matrix operation causes parallel access to the same memory segment, with at least one processor writing data to that segment, contention between processors arises and degrades performance. To eliminate this situation, the class of possible partitionings is restricted so that no two processors write data to the same segment. On the resulting class of contention-free partitionings, an optimum load-balanced partitioning is defined as one satisfying independent minimax criteria. The main result of the paper is an algorithm that finds the optimum partitioning by solving the respective minimax problems analytically. The paper also discusses implementation and performance issues related to the algorithm, based on experience at Kendall Square Research Corporation, where the partitioning algorithm was used to create high-performance parallel matrix libraries.

  • An efficient optimal reconfiguration algorithm for FDDI-based networks

    Page(s): 411 - 424

    We study a new network architecture based on standard FDDI networks. This network, called the FDDI-based reconfigurable network (FBRN), is constructed from multiple FDDI token rings and can reconfigure itself in the event of extensive damage to the network. Thus, an FBRN has the potential to provide high available bandwidth even in the presence of numerous faults. Realizing this potential depends crucially on the reconfiguration algorithm that guides the reconfiguration process. We design and analyze a reconfiguration algorithm for FBRNs. Our algorithm is optimal in the sense that it always produces a configuration with the maximum available bandwidth for a given fault pattern, and it has polynomial time complexity. We also show that our reconfiguration algorithm dramatically improves the available bandwidth of an FBRN.

  • On general results for all-to-all broadcast

    Page(s): 363 - 370

    All-to-all broadcast is the process by which every node broadcasts its own piece of information to all other nodes in the system. In this paper, we develop all-to-all broadcast schemes by dealing with two classes of schemes. A prior scheme based on the generation of minimal complete sets is first described, and then a new scheme based on the propagation of experts is developed. The former always completes the broadcast in the minimum number of steps, while the latter is designed to minimize the number of messages. The performance of these two classes of schemes is analyzed comparatively. The desired all-to-all broadcast scheme can be derived by combining the advantages of both classes.


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology