Parallel and Distributed Systems, IEEE Transactions on

Issue 5 • Date May 1998

  • On supernode transformation with minimized total running time

    Page(s): 417 - 428

    With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three parameters of supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and number of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order n polynomial whose real positive roots include the optimal supernode size. For two special cases, 1) two-dimensional algorithm problems and 2) n-dimensional algorithm problems, where the communication cost is dominated by the startup penalty and, therefore, can be approximated by a constant, we give a closed form expression for the optimal supernode size, which is independent of the supernode relative side lengths and cutting hyperplanes. For the case where the algorithm iteration index space and the supernodes are hyperrectangular, we give closed form expressions for the optimal supernode relative side lengths. Our experiment shows a good match of the closed form expressions with experimental data.

  • Alleviating consumption channel bottleneck in wormhole routed k-ary n-cube systems

    Page(s): 481 - 496

    This paper identifies performance degradation in wormhole routed k-ary n-cube networks due to the limited number of router-to-processor consumption channels at each node. Much recent research in wormhole routing has advocated adaptive routing and virtual channel flow control schemes to deliver better network performance. This paper indicates that the advantages associated with these schemes cannot be realized with limited consumption capacity. To alleviate such performance bottlenecks, a new network interface design using multiple consumption channels is proposed. To match virtual multiplexing on network channels, we also propose that each consumption channel support multiple virtual consumption channels. The impact of message arrival rate at a node on the required number of consumption channels is studied analytically. It is shown that wormhole networks with higher routing adaptivity, dimensionality, degree of hot-spot traffic, and number of virtual lanes have to take advantage of multiple consumption channels to deliver better performance. The interplay between system topology, routing algorithm, number of virtual lanes, messaging overheads, and communication traffic is studied through simulation to derive the effective number of consumption channels required in a system. Based on the ongoing technological trend, it is shown that wormhole-routed systems can use two to four consumption channels per node to deliver better system performance.

  • Fault-tolerant real-time communication in distributed computing systems

    Page(s): 470 - 480

    The delivery delay in a point-to-point packet switching network is difficult to control due to the contention among randomly arriving packets at each node and the multiple hops a packet must travel between its source and destination. Despite this difficulty, there is an increasing number of applications that require packets to be delivered reliably within prespecified delay bounds. This paper shows how this can be achieved by using real-time channels which make “soft” reservations of network resources to ensure the timely delivery of real-time packets. We first present theoretical results and detailed procedures for the establishment of real-time channels and then show how the basic real-time channels can be enhanced to be fault-tolerant using the multiple disjoint paths between a pair of communicating nodes. The contribution of the former is a tighter schedulability condition which makes more efficient use of network resources than any other existing approach, and that of the latter is a significant improvement in fault tolerance over the basic real-time channel, which is inherently susceptible to component failures.

  • Arachne: a portable threads system supporting migrant threads on heterogeneous network farms

    Page(s): 459 - 469

    We present the design and implementation of Arachne, a threads system that can be interfaced with a communications library for multithreaded distributed computations. In particular, Arachne supports thread migration between heterogeneous platforms, dynamic stack size management, and recursive thread functions. Arachne is efficient, flexible, and portable: it is based entirely on C and C++. To facilitate heterogeneous thread operations, we have added three keywords to the C++ language. The Arachne preprocessor takes as input code written in that language and outputs C++ code suitable for compilation with a conventional C++ compiler. The Arachne runtime system manages all threads during program execution. We present some performance measurements on the costs of basic thread operations and thread migration in Arachne and compare these to costs in other threads systems.

  • All-to-all communication with minimum start-up costs in 2D/3D tori and meshes

    Page(s): 442 - 458

    All-to-all communication patterns occur in many important parallel algorithms. This paper presents new algorithms for all-to-all communication patterns (all-to-all broadcast and all-to-all personalized exchange) for wormhole switched 2D/3D torus- and mesh-connected multiprocessors. The algorithms use message combining to minimize message start-ups at the expense of larger message sizes. The unique feature of these algorithms is that they are the first algorithms that we know of that operate in a bottom-up fashion rather than a recursive, top-down manner. For a 2^d × 2^d torus or mesh, the algorithms for all-to-all personalized exchange have time complexity of O(2^(3d)). An important property of the algorithms is the O(d) time due to message start-ups, compared with O(2^d) for current algorithms. This is particularly important for modern parallel architectures where the start-up cost of message transmissions still dominates, except for very large block sizes. Finally, the 2D algorithms for all-to-all personalized exchange are extended to O(2^(4d)) algorithms in a 2^d × 2^d × 2^d 3D torus or mesh. These algorithms also retain the important property of O(d) time due to message start-ups.

  • Collection-aware optimum sequencing of operations and closed-form solutions for the distribution of a divisible load on arbitrary processor trees

    Page(s): 429 - 441

    The problem of optimally distributing a divisible load to the nodes of an arbitrary processor tree is tackled in this paper. The rigorous mathematical foundation presented allows the derivation of the sequence of operations that is necessary to obtain the minimum processing time, along with closed-form expressions that yield the solution in time O(NP), where P is the number of tree nodes and N their maximum degree. The main contributions of this work are: (1) both load distribution and result collection overheads are considered, thus providing better resource utilization, and (2) arbitrary processor trees are examined, in contrast with previous approaches that examined either complete homogeneous trees or single-level trees. Additionally, approximate algorithms for specifying the optimum subset of active processors for a given load are presented and evaluated.

  • A spanning multichannel linked hypercube: a gradually scalable optical interconnection network for massively parallel computing

    Page(s): 497 - 512

    A new, scalable interconnection topology called the Spanning Multichannel Linked Hypercube (SMLH) is proposed. The proposed network is well suited to massively parallel systems and is highly amenable to optical implementation. The SMLH uses the hypercube topology as a basic building block and connects such building blocks using two-dimensional multichannel links (similar to spanning buses). In doing so, the SMLH combines positive features of both the hypercube (small diameter, high connectivity, symmetry, simple routing, and fault tolerance) and the spanning bus hypercube (SBH) (constant node degree, scalability, and ease of physical implementation), while at the same time circumventing their disadvantages. The SMLH topology supports many communication patterns found in different classes of computation, such as bus-based, mesh-based, and tree-based problems, as well as hypercube-based problems. A very attractive feature of the SMLH network is its ability to support a large number of processors with the possibility of maintaining a constant degree and a constant diameter. Other positive features include symmetry, incremental scalability, and fault tolerance. It is shown that the SMLH network provides better average message distance, average traffic density, and queuing delay than many similar networks, including the binary hypercube and the SBH. Additionally, the SMLH has comparable performance to other high-performance hypercubic networks, including the Generalized Hypercube and the Hypermesh. An optical implementation methodology is proposed for the SMLH. The implementation methodology combines the advantages of free-space optics with those of wavelength division multiplexing techniques. A detailed analysis of the feasibility of the proposed network is also presented.

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology