Parallel and Distributed Systems, IEEE Transactions on

Issue 12 • Date Dec. 1999

  • Author index

    Publication Year: 1999 , Page(s): 1333 - 1337
    Freely Available from IEEE
  • Subject index

    Publication Year: 1999 , Page(s): 1337 - 1344
    Freely Available from IEEE
  • Time-optimal gossip of large packets in noncombining 2D tori and meshes

    Publication Year: 1999 , Page(s): 1252 - 1261
    Cited by:  Papers (8)

    The main results of this paper are algorithms for time-optimal gossip of large packets in noncombining full-duplex all-port 2-D tori and meshes of any size m×n. The gossip algorithms define the structure of broadcast trees and lock-step scheduling schemes for packets that make the broadcast trees time-arc-disjoint. The gossip algorithm for tori is also buffer-optimal: it requires routers with auxiliary buffers for at most three packets. The gossip algorithm for meshes requires routers with auxiliary buffers for O(m+n) packets.
  • Efficient algorithms for block-cyclic array redistribution between processor sets

    Publication Year: 1999 , Page(s): 1217 - 1240
    Cited by:  Papers (12)

    Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes O(max(P, Q)) time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation.
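
    To make the cyclic(x) to cyclic(Kx) setting concrete, the following Python sketch maps global indices to owners under a block-cyclic distribution and counts how many elements each source processor must send to each destination processor. It is only an illustration of the problem; it does not implement the circulant-matrix schedule from the paper. The function names and the example sizes (48 elements, x = 2, P = 2, K = 2, Q = 3) are assumptions chosen for illustration.

```python
from collections import Counter


def owner(i, block, nprocs):
    """Processor that owns global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global index i inside its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def transfer_counts(n, x, p, k, q):
    """Count elements moving from each source to each destination processor
    when n elements go from cyclic(x) on p processors to cyclic(k*x) on q."""
    counts = Counter()
    for i in range(n):
        src = owner(i, x, p)        # owner before redistribution
        dst = owner(i, k * x, q)    # owner after redistribution
        counts[(src, dst)] += 1
    return counts


# Hypothetical example sizes, not taken from the paper.
counts = transfer_counts(n=48, x=2, p=2, k=2, q=3)
for (src, dst), c in sorted(counts.items()):
    print(f"source {src} -> destination {dst}: {c} elements")
```

    In this small example every source/destination pair exchanges the same number of elements (8), which is the kind of regularity a contention-free schedule with equal-sized messages can exploit.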
  • Broadcast-efficient protocols for mobile radio networks

    Publication Year: 1999 , Page(s): 1276 - 1289
    Cited by:  Papers (12)

    The main contribution of this work is to present elegant broadcast-efficient protocols for permutation routing, ranking, and sorting on single-hop Mobile Radio Networks with p stations and k radio channels, denoted by MRN(p,k). Clearly, any protocol performing these tasks on n items must perform at least n/k broadcast rounds because each item must be broadcast at least once. We begin by presenting an optimal off-line permutation routing protocol using n/k broadcast rounds for arbitrary k, p, and n. Further, we show that optimal on-line routing can be performed in n/k broadcast rounds, provided that either k=1 or p=n. We then go on to develop an on-line routing protocol that takes 2n/k+k-1 broadcast rounds on the MRN(p,k), whenever k⩽√p/2. Using these routing protocols as basic building blocks, we develop a ranking protocol that takes 2n/k+o(n/k) broadcast rounds as well as a sorting protocol that takes 3n/k+o(n/k) broadcast rounds, provided that k ∈ o(√n) and p=n. Finally, we develop a ranking protocol that takes 3n/k+o(n/k) broadcast rounds, as well as a sorting protocol that takes 4n/k+o(n/k) broadcast rounds on the MRN(p,k), provided that k⩽√p/2 and p ∈ o(n). Featuring very low proportionality constants, our protocols offer a vast improvement over the state of the art.
  • Isomorphism of degree four Cayley graph and wrapped butterfly and their optimal permutation routing algorithm

    Publication Year: 1999 , Page(s): 1290 - 1298
    Cited by:  Papers (6)

    In this paper, we first show that the degree four Cayley graph proposed in a paper appearing in the January 1996 issue of IEEE Transactions on Parallel and Distributed Systems is indeed isomorphic to the wrapped butterfly. The isomorphism was first reported by Muga and Wei in the proceedings of PDPTA '96. The isomorphism is shown by using an edge-preserving bijective mapping. Due to the isomorphism, algorithms for the degree four Cayley graph can be easily developed in terms of wrapped butterfly and topological properties of one network can be easily derived in terms of the other. Next, we present the first optimal oblivious one-to-one permutation routing scheme for these networks in terms of the wrapped butterfly. Our algorithm runs in time O(√N), where N is the network size.
  • Tight bounds for prefetching and buffer management algorithms for parallel I/O systems

    Publication Year: 1999 , Page(s): 1262 - 1275
    Cited by:  Papers (8)

    The I/O performance of applications in multiple-disk systems can be improved by overlapping disk accesses. This requires the use of appropriate prefetching and buffer management algorithms that ensure the most useful blocks are accessed and retained in the buffer. In this paper, we answer several fundamental questions on prefetching and buffer management for distributed-buffer parallel I/O systems. First, we derive and prove the optimality of an algorithm, P-min, that minimizes the number of parallel I/Os. Second, we analyze P-con, an algorithm that always matches its replacement decisions with those of the well-known demand-paged MIN algorithm. We show that P-con can become fully sequential in the worst case. Third, we investigate the behavior of on-line algorithms for multiple-disk prefetching and buffer management. We define and analyze P-lru, a parallel version of the traditional LRU buffer management algorithm. Unexpectedly, we find that the competitive ratio of P-lru is independent of the number of disks. Finally, we present the practical performance of these algorithms on randomly generated reference strings. These results confirm the conclusions derived from the analysis on worst-case inputs.
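
    For readers unfamiliar with the two sequential baselines named in this abstract, the sketch below counts buffer misses for the classic demand-paged MIN policy (evict the block whose next use is farthest in the future) and for LRU on a single reference string. It is a single-disk illustration only; it does not model parallel prefetching and is not P-min, P-con, or P-lru. The function names and the reference string are assumptions chosen for illustration.

```python
from collections import OrderedDict


def lru_faults(refs, capacity):
    """Count misses for demand-paged LRU with a buffer of `capacity` blocks."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            faults += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return faults


def min_faults(refs, capacity):
    """Count misses for demand-paged MIN (needs the whole reference list)."""
    cache = set()
    faults = 0
    for pos, block in enumerate(refs):
        if block in cache:
            continue
        faults += 1
        if len(cache) >= capacity:
            def next_use(b):
                # Index of the next reference to b, or infinity if none.
                try:
                    return refs.index(b, pos + 1)
                except ValueError:
                    return float("inf")
            # Evict the block whose next use lies farthest in the future.
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return faults


# Hypothetical reference string, not taken from the paper.
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print("LRU misses:", lru_faults(refs, 3))
print("MIN misses:", min_faults(refs, 3))
```

    On this reference string with a three-block buffer, LRU incurs 10 misses and MIN incurs 7, showing the gap between an on-line policy and the off-line optimum that competitive analysis quantifies.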
  • A new self-routing multicast network

    Publication Year: 1999 , Page(s): 1299 - 1316
    Cited by:  Papers (16)  |  Patents (1)

    In this paper, we propose a design for a new self-routing multicast network which can realize arbitrary multicast assignments between its inputs and outputs without any blocking. The network design uses a recursive decomposition approach and is based on the binary radix sorting concept. All functional components of the network are reverse banyan networks. Specifically, the new multicast network is recursively constructed by cascading a binary splitting network and two half-size multicast networks. The binary splitting network, in turn, consists of two recursively constructed reverse banyan networks. The first reverse banyan network serves as a scatter network and the second reverse banyan network serves as a quasisorting network. The advantage of this approach is to provide a way to self-route multicast assignments through the network and a possibility to reuse part of the network to reduce the network cost. The new multicast network we design compares favorably with the previously proposed multicast networks. It uses O(n log² n) logic gates, and has O(log² n) depth and O(log² n) routing time where the unit of time is a gate delay. By reusing part of the network, the feedback implementation of the network can further reduce the network cost to O(n log n).
  • Algorithmic redistribution methods for block-cyclic decompositions

    Publication Year: 1999 , Page(s): 1201 - 1216
    Cited by:  Papers (8)

    This article presents various data redistribution methods for block-partitioned linear algebra algorithms operating on dense matrices that are distributed in a block-cyclic fashion. Because the algorithmic partitioning unit and the distribution blocking factor are most often chosen to be equal, severe alignment restrictions are induced on the operands, and optimal values with respect to performance are architecture dependent. The techniques presented in this paper redistribute data “on the fly,” so that the user's data distribution blocking factor becomes independent from the architecture-dependent algorithmic partitioning. These techniques are applied to the matrix-matrix multiplication operation. A performance analysis along with experimental results shows that alignment restrictions can then be removed and that high performance can be maintained across platforms independently from the user's data distribution blocking factor.
  • Parallel parsing algorithms for static dictionary compression

    Publication Year: 1999 , Page(s): 1241 - 1251
    Cited by:  Patents (2)

    Data compression based on dictionary techniques works by replacing phrases in the input string with indexes into some dictionary. The dictionary can be static or dynamic. In static dictionary compression, the dictionary contains a predetermined fixed set of entries. In dynamic dictionary compression, the dictionary changes its entries during compression. We present parallel algorithms for two parsing strategies for static dictionary compression. One is the optimal parsing strategy with dictionaries that have the prefix property, for which our algorithm requires O(L+log n) time and O(n) processors, where n is the number of symbols in the input string, and L is the maximum length of the dictionary entries, while previous results run in O(L+log n) time using O(n²) processors or in O(L+log² n) time using O(n) processors. The other is the longest fragment first (LFF) parsing strategy, for which our algorithm requires O(L+log n) time and O(n log L) processors, while a previous result obtained an O(L log n) time performance on O(n/log n) processors. For both strategies, we derive our parallel algorithms by modifying the on-line algorithms using a pointer doubling technique.
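
    As a sequential point of reference for static dictionary parsing, the sketch below contrasts a greedy longest-match parse with a dynamic-programming parse that minimizes the number of phrases. It assumes that "optimal" means fewest phrases, and it is a plain sequential baseline, not the parallel pointer-doubling algorithms of the paper and not the LFF strategy. The function names, the dictionary, and the input string are assumptions chosen for illustration; the dictionary has the prefix property (every prefix of an entry is also an entry).

```python
def greedy_parse(text, dictionary):
    """Parse left to right, always taking the longest matching entry."""
    max_len = max(len(entry) for entry in dictionary)
    phrases, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                phrases.append(text[i:i + length])
                i += length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {i}")
    return phrases


def optimal_parse(text, dictionary):
    """Parse into the fewest dictionary phrases via dynamic programming."""
    max_len = max(len(entry) for entry in dictionary)
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: fewest phrases covering text[i:]
    choice = [0] * (n + 1)   # choice[i]: length of the first phrase at i
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for length in range(1, min(max_len, n - i) + 1):
            if text[i:i + length] in dictionary and best[i + length] + 1 < best[i]:
                best[i] = best[i + length] + 1
                choice[i] = length
    if best[0] == inf:
        raise ValueError("text cannot be parsed with this dictionary")
    phrases, i = [], 0
    while i < n:
        phrases.append(text[i:i + choice[i]])
        i += choice[i]
    return phrases


# Hypothetical dictionary with the prefix property.
dictionary = {"a", "b", "c", "d", "ab", "bc", "bcd"}
print(greedy_parse("abcd", dictionary))   # three phrases
print(optimal_parse("abcd", dictionary))  # two phrases
```

    Here the greedy parse commits to "ab" and then needs "c" and "d", whereas the optimal parse takes the shorter "a" first so that "bcd" can cover the rest.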
  • Universal constructions for large objects

    Publication Year: 1999 , Page(s): 1317 - 1332
    Cited by:  Papers (6)  |  Patents (31)

    We present lock-free and wait-free universal constructions for implementing large shared objects. Most previous universal constructions require processes to copy the entire object state, which is impractical for large objects. Previous attempts to address this problem require programmers to explicitly fragment large objects into smaller, more manageable pieces, paying particular attention to how such pieces are copied. In contrast, our constructions are designed to largely shield programmers from this fragmentation. Furthermore, for many objects, our constructions result in lower copying overhead than previous ones. Fragmentation is achieved in our constructions through the use of load-linked, store-conditional, and validate operations on a “large” multiword shared variable. Before presenting our constructions, we show how these operations can be efficiently implemented from similar one-word primitives.

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology