
IEEE Transactions on Parallel and Distributed Systems

Issue 2 • February 1996

  • A trip-based multicasting model in wormhole-routed networks with virtual channels

    Publication Year: 1996, Pages: 138-150
    Cited by: Papers (23)

    This paper focuses on efficient multicasting in wormhole-routed networks. A trip-based model is proposed to support adaptive, distributed, and deadlock-free multiple multicast on any network with arbitrary topology, using at most two virtual channels per physical channel. This model significantly generalizes the previously proposed path-based model, which works only for Hamiltonian networks and cannot be applied to networks whose arbitrary topology results from system faults. Fundamentals of the trip-based model are investigated, including the necessary and sufficient condition for deadlock freedom and the appropriate number of virtual channels needed to avoid deadlock. The potential of this model is illustrated by applying it to hypercubes with faulty nodes. Simulation results indicate that the proposed model can implement multiple multicast on faulty hypercubes with negligible performance degradation.

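The path-based model that the trip-based model generalizes can be illustrated on a 2D mesh, which is Hamiltonian. The sketch below assumes the dual-path formulation, one well-known path-based scheme: nodes are labeled along a snake (boustrophedon) Hamiltonian path, destinations with labels higher than the source's are served in ascending label order, and those with lower labels in descending order. The function names and labeling are illustrative assumptions; only the destination partition and ordering are shown, not the trip-based model or hop-by-hop routing.

```python
# Hypothetical sketch of dual-path destination ordering on a mesh of the given width.
# Assumption: a row-by-row snake labeling serves as the Hamiltonian path.

def snake_label(x, y, width):
    """Position of node (x, y) along a row-by-row snake Hamiltonian path."""
    return y * width + (x if y % 2 == 0 else width - 1 - x)

def dual_path_split(src, dests, width):
    """Split destinations into an ascending 'high' list and a descending 'low' list."""
    src_label = snake_label(*src, width)

    def label(node):
        return snake_label(*node, width)

    high = sorted((d for d in dests if label(d) > src_label), key=label)
    low = sorted((d for d in dests if label(d) < src_label), key=label, reverse=True)
    return high, low

high, low = dual_path_split((1, 1), [(0, 0), (3, 0), (0, 2), (3, 3)], width=4)
print(high)  # [(0, 2), (3, 3)]
print(low)   # [(3, 0), (0, 0)]
```

Because each message visits labels in strictly monotone order, it only ever moves in one direction along the path, which is the intuition behind deadlock freedom in this scheme; it also shows why such schemes need a Hamiltonian path in the first place.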
  • Constant time BSR solutions to parenthesis matching, tree decoding, and tree reconstruction from its traversals

    Publication Year: 1996, Pages: 218-224
    Cited by: Papers (12)

    Recently, Akl et al. introduced a new model of parallel computation, called BSR (broadcasting with selective reduction), and showed that it is more powerful than any CRCW PRAM yet requires no more resources for implementation than even an EREW PRAM. The model allows constant-time solutions to sorting, parallel prefix, and other problems. In this paper, we describe constant-time BSR solutions to parenthesis matching, decoding binary trees in bitstring representation, generating the next tree shape in B-order, and reconstructing binary trees from their traversals. These are the first constant-time solutions to these problems on any model of computation. For each problem, the number of processors used equals the input size. A new algorithm for sorting integers is also presented.

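The BSR primitive behind constant-time prefix computations can be simulated sequentially. The sketch assumes the usual description of one BSR step: each processor broadcasts a tag and a datum; each memory location holds a limit and receives the reduction, under an associative operator, of all data whose tag satisfies a selection comparison against that limit. One such step computes nesting depths, a standard building block for parenthesis matching. The final matching loop is an ordinary sequential search added for illustration, not the paper's constant-time algorithm; all function names are hypothetical.

```python
import operator
from functools import reduce

def bsr_step(tags, data, limits, selector, op, identity):
    """Sequential simulation of one BSR step: location j receives the op-reduction
    of every data[i] whose tags[i] satisfies selector(tags[i], limits[j])."""
    return [
        reduce(op, (d for t, d in zip(tags, data) if selector(t, lim)), identity)
        for lim in limits
    ]

def nesting_depths(s):
    """Depth after each character: an inclusive prefix sum done as one BSR step."""
    idx = list(range(len(s)))
    steps = [1 if c == "(" else -1 for c in s]
    return bsr_step(idx, steps, idx, operator.le, operator.add, 0)

def match_parentheses(s):
    """Sequential stand-in for the matching step: '(' at i matches the first ')'
    after i whose depth-after equals the depth just before i."""
    depth = nesting_depths(s)
    matches = {}
    for i, c in enumerate(s):
        if c == "(":
            before = depth[i] - 1
            matches[i] = next(
                j for j in range(i + 1, len(s)) if s[j] == ")" and depth[j] == before
            )
    return matches

print(nesting_depths("(()())"))     # [1, 2, 1, 2, 1, 0]
print(match_parentheses("(()())"))  # {0: 5, 1: 2, 3: 4}
```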
  • Circuit-switched broadcasting in torus and mesh networks

    Publication Year: 1996, Pages: 184-190
    Cited by: Papers (12)

    We consider the problem of broadcasting on torus and mesh networks using circuit-switched, half-duplex, and link-bound communication. In this paper, we obtain an optimal broadcasting algorithm that uses pd time steps for a d-dimensional torus with (2d+1)^p nodes on each side. Using this algorithm, we show that broadcasting on a d-dimensional mesh of the same size can be done in pd + p + d - 1 time steps.

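One way to see why pd steps is optimal for this torus size is a counting argument. Assuming that in one link-bound step each informed node can open a circuit on each of its 2d links and inform at most one new node per circuit, the number of informed nodes grows by at most a factor of 2d+1 per step. A torus with (2d+1)^p nodes per side has (2d+1)^(pd) nodes in total, so at least pd steps are needed. The arithmetic below checks this for small cases; it illustrates the bound only and does not implement the broadcasting algorithm.

```python
# Arithmetic check of the counting lower bound for torus broadcasting.
# Assumption: per step, the set of informed nodes grows by at most a factor of 2d + 1.

def torus_nodes(d, p):
    """Total nodes in a d-dimensional torus with (2d+1)^p nodes on each side."""
    return ((2 * d + 1) ** p) ** d

def counting_lower_bound(d, n_nodes):
    """Smallest t with (2d+1)^t >= n_nodes (integer loop, no floating-point logs)."""
    t, reach = 0, 1
    while reach < n_nodes:
        reach *= 2 * d + 1
        t += 1
    return t

def mesh_broadcast_steps(d, p):
    """Step count stated in the abstract for a d-dimensional mesh of the same size."""
    return p * d + p + d - 1

for d, p in [(1, 2), (2, 1), (2, 2), (3, 2)]:
    bound = counting_lower_bound(d, torus_nodes(d, p))
    print(f"d={d} p={p}: bound {bound}, torus pd={p * d}, mesh {mesh_broadcast_steps(d, p)}")
```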
  • Efficient LRU-based buffering in a LAN remote caching architecture

    Publication Year: 1996, Pages: 191-206
    Cited by: Papers (8) | Patents (1)

    Fast access to the main memory of remote sites has been advanced as a potential performance improvement in distributed systems. Even if a page is not available in local memory, a site need not perform a disk access; instead, it can use efficient mechanisms that support rapid request/response exchanges to access pages currently buffered at a remote site. Hardware and software support for such a remote caching architecture must also include algorithms that determine which pages should be buffered at which sites. When each site uses the classic LRU replacement algorithm, performance can be much worse than optimal in many system configurations: because sites do not coordinate their individual decisions, the resulting system-wide buffering and caching decisions yield very inefficient global configurations. This paper proposes an easily implementable modification of the LRU replacement algorithm for LAN environments that reduces replication. The algorithm substantially improves hit ratios, and thus performance, over a wide range of parameters. Because the LAN topology is relatively simple, much less state information is needed for good replacement decisions than in general network topologies. The implications of two variations of the algorithm are also explored. In an environment where the network is not a performance bottleneck and performance is memory-limited, the proposed replacement algorithm is shown to perform close to optimal.

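The abstract does not spell out the modified replacement rule, so the following is only a hypothetical illustration of how an LRU buffer could be biased against replication: on eviction, it discards the least recently used page that is also buffered at some other site, and falls back to plain LRU when no such page exists. The `is_replicated` callback is an assumption standing in for whatever state information a site has about other sites' buffers.

```python
from collections import OrderedDict

class ReplicationAwareLRU:
    """Hypothetical sketch: an LRU buffer that prefers to evict pages replicated elsewhere."""

    def __init__(self, capacity, is_replicated):
        self.capacity = capacity
        self.is_replicated = is_replicated  # assumed callback: page -> True if another site buffers it
        self.pages = OrderedDict()          # least recently used first

    def access(self, page):
        """Reference a page; return True on a local hit, False on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        if len(self.pages) >= self.capacity:
            self._evict()
        self.pages[page] = None
        return False

    def _evict(self):
        # Oldest replicated page first; plain LRU victim if nothing is replicated.
        for page in self.pages:
            if self.is_replicated(page):
                victim = page
                break
        else:
            victim = next(iter(self.pages))
        del self.pages[victim]

remote = {1}                                  # page 1 is also buffered at another site
cache = ReplicationAwareLRU(2, lambda p: p in remote)
for page in (2, 1, 3):
    cache.access(page)
print(list(cache.pages))  # [2, 3]; plain LRU would have kept [1, 3]
```

Evicting a replicated copy costs little under the LAN assumption, since a later miss on that page can still be served from a remote site's memory rather than from disk.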
  • Automatic data structure selection and transformation for sparse matrix computations

    Publication Year: 1996, Pages: 109-126
    Cited by: Papers (14) | Patents (2)

    Compiler optimization of sparse codes is a well-known problem for which no satisfactory solution has yet been found. A major obstacle is that sparse programs explicitly manipulate the particular data structures selected for storing sparse matrices. This explicit data structure handling obscures the functionality of a code to such a degree that optimization is prevented, for instance by the indirect addressing it introduces. The method presented in this paper delays data structure selection until the compile phase, allowing the compiler to combine code optimization with explicit data structure selection. This method enables the compiler to generate efficient code for sparse computations. Moreover, the programmer's task is greatly simplified.

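To illustrate the indirect addressing that explicit sparse data structures introduce, compare a dense matrix-vector product with the same product over a compressed sparse row (CSR) representation. CSR is chosen here only as a familiar example format; the sketch does not model the paper's compiler method, which starts from the dense form and selects the data structure automatically.

```python
# Dense specification versus a CSR implementation of y = A x.

def dense_matvec(a, x):
    """What the programmer means: a plain dense loop nest."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def to_csr(a):
    """Compressed sparse row storage: nonzero values, their column indices, row start offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Same product over CSR; x is reached only through col_idx (indirect addressing)."""
    y = []
    for i in range(len(row_ptr) - 1):
        total = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_idx[k]]
        y.append(total)
    return y

a = [[4, 0, 0], [0, 0, 2], [1, 3, 0]]
x = [1, 2, 3]
print(dense_matvec(a, x), csr_matvec(*to_csr(a), x))  # [4, 6, 7] [4, 6, 7]
```

In the dense loop the column index j is a plain loop variable, while in the CSR loop the compiler sees only x[col_idx[k]], whose access pattern depends on run-time data.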
  • Folded Petersen cube networks: new competitors for the hypercubes

    Publication Year: 1996, Pages: 151-168
    Cited by: Papers (22)

    We introduce and analyze a new interconnection topology, called the k-dimensional folded Petersen (FP_k) network, which is constructed by iteratively applying the Cartesian product operation to the well-known Petersen graph. Since the number of nodes in FP_k is restricted to a power of ten, for better scalability we propose a generalization, the folded Petersen cube network FPQ_{n,k} = Q_n × FP_k, which is the product of the n-dimensional binary hypercube (Q_n) and FP_k. The FPQ_{n,k} topology provides regularity, node- and edge-symmetry, optimal connectivity (and therefore maximal fault tolerance), logarithmic diameter, and modularity, and it permits simple self-routing and broadcasting algorithms. With the same node degree and connectivity, FPQ_{n,k} has a smaller diameter and accommodates more nodes than Q_{n+3k}, and its packing density is higher than that of several other product networks. This paper also emphasizes the versatility of folded Petersen cube networks as a multicomputer interconnection topology by providing embeddings of many computationally important structures, such as rings, multidimensional meshes, hypercubes, complete binary trees, tree machines, meshes of trees, and pyramids. The dilation and edge congestion of all such embeddings are at most two.

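The comparison with Q_{n+3k} can be checked by brute force for small cases. The sketch assumes the standard Petersen graph (10 nodes, degree 3, diameter 2) and builds FPQ_{n,k} as the Cartesian product of n copies of K_2 and k copies of the Petersen graph, measuring diameters by breadth-first search. Because Cartesian products multiply node counts and add degrees and diameters, FPQ_{n,k} has 2^n · 10^k nodes, degree n+3k, and diameter n+2k, whereas Q_{n+3k} has 2^n · 8^k nodes and diameter n+3k. The names below are illustrative.

```python
from collections import deque
from itertools import product

# Petersen graph: outer 5-cycle 0..4, inner pentagram 5..9, spokes i -- i+5.
PETERSEN = {v: set() for v in range(10)}
for i in range(5):
    for a, b in ((i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5)):
        PETERSEN[a].add(b)
        PETERSEN[b].add(a)

K2 = {0: {1}, 1: {0}}  # one hypercube dimension; Q_n is the product of n copies

def cartesian_product(factors):
    """Neighbors differ in exactly one coordinate, along an edge of that factor."""
    adj = {}
    for v in product(*(list(f) for f in factors)):
        adj[v] = [v[:i] + (u,) + v[i + 1:] for i, f in enumerate(factors) for u in f[v[i]]]
    return adj

def diameter(adj):
    """Largest BFS eccentricity over all nodes (graphs here are connected)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def stats(adj):
    """(node count, set of node degrees, diameter)."""
    return len(adj), {len(nbrs) for nbrs in adj.values()}, diameter(adj)

def fpq(n, k):
    """FPQ_{n,k} = Q_n x FP_k as a product of n K_2 factors and k Petersen factors."""
    return cartesian_product([K2] * n + [PETERSEN] * k)

print(stats(fpq(1, 1)), stats(cartesian_product([K2] * 4)))  # (20, {4}, 3) (16, {4}, 4)
```

With the same degree 4, FPQ_{1,1} holds 20 nodes at diameter 3, while Q_4 holds 16 nodes at diameter 4.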
  • A framework for designing deadlock-free wormhole routing algorithms

    Publication Year: 1996, Pages: 169-183
    Cited by: Papers (20) | Patents (2)

    This paper presents a framework for designing fully adaptive, deadlock-free wormhole routing algorithms for a variety of network topologies. The main theoretical contributions are: (a) the design of new wormhole algorithms from store-and-forward algorithms, (b) a sufficient condition for deadlock-free routing by the wormhole algorithms so designed, and (c) a sufficient condition for deadlock-free routing by these wormhole algorithms with centralized flit buffers shared among multiple channels. To illustrate the theory, several wormhole algorithms based on store-and-forward hop schemes are designed. The hop-based wormhole algorithms can be applied to a variety of networks, including torus, mesh, de Bruijn, and a class of Cayley networks, with the best known bounds on virtual channels for minimal routing on the last two classes of networks. The resource requirements and performance of one proposed algorithm, the negative-hop algorithm, are analyzed and compared with those of some previously proposed algorithms for torus and mesh networks.

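The negative-hop idea can be sketched in its store-and-forward form, which the framework uses as a starting point. Assuming the usual formulation, nodes receive a proper coloring, a hop from a higher-colored node to a lower-colored node counts as negative, and a packet that has taken i negative hops occupies buffer class i. Classes never decrease along a route, and within one class a packet only moves to higher colors, so no cyclic wait can form. The checkerboard coloring and dimension-order routes are illustrative assumptions; how the paper maps these classes onto wormhole virtual channels is not modeled.

```python
# Store-and-forward negative-hop buffer classes on a 2D mesh (illustrative assumptions:
# checkerboard coloring, dimension-order routes).

def color(node):
    """Checkerboard coloring: a proper 2-coloring of a 2D mesh (neighbors differ)."""
    x, y = node
    return (x + y) % 2

def dimension_order_path(src, dst):
    """Route along x first, then along y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def buffer_classes(path):
    """Buffer class at each node = number of negative hops taken to reach it."""
    classes = [0]
    for a, b in zip(path, path[1:]):
        negative = color(b) < color(a)     # hop from higher to lower color
        classes.append(classes[-1] + int(negative))
    return classes

route = dimension_order_path((0, 0), (3, 2))
print(route)                  # [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(buffer_classes(route))  # [0, 0, 1, 1, 2, 2]
```

On a 2-colored mesh, colors alternate along any path, so a route of length L takes at most ceil(L/2) negative hops and needs at most ceil(L/2) + 1 buffer classes.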
  • Algorithms for search trees on message passing architectures

    Publication Year: 1996, Pages: 97-108
    Cited by: Papers (1) | Patents (2)

    In this paper we describe a new algorithm for maintaining a balanced search tree on a message-passing MIMD architecture; the algorithm is particularly well suited for implementation on a small number of processors. We introduce a (2B-2, 2B) search tree that uses a bidirectional ring of O(log n) processors to store n entries. Update operations use a bottom-up node-splitting scheme, which performs significantly better than top-down search tree algorithms: it requires far fewer messages and causes less blocking due to synchronization. Additionally, for a given ratio of computation cost to communication cost, the value of B may be varied to maximize performance. Implementations on a parallel-architecture simulator are described.

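Bottom-up node splitting can be illustrated with an ordinary sequential B-tree: insertion descends to a leaf without restructuring anything, inserts the key, and only then splits overflowing nodes on the way back up, so higher nodes are touched only when a split actually reaches them. This is a single-process sketch with a hypothetical `max_keys` capacity; it does not model the paper's (2B-2, 2B) tree, its distribution over a processor ring, or its message protocol.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # empty => leaf

    def is_leaf(self):
        return not self.children

def insert(root, key, max_keys):
    """Descend to a leaf, insert, then split overflowing nodes bottom-up. Returns the root."""
    path = []                                # (ancestor, child index) pairs
    node = root
    while not node.is_leaf():
        i = bisect_right(node.keys, key)
        path.append((node, i))
        node = node.children[i]
    insort(node.keys, key)
    while len(node.keys) > max_keys:         # split only while a node overflows
        mid = len(node.keys) // 2
        sep = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        if not path:                         # the root itself split: grow the tree upward
            return Node([sep], [node, right])
        parent, i = path.pop()
        parent.keys.insert(i, sep)           # separator moves up into the parent
        parent.children.insert(i + 1, right)
        node = parent
    return root

def inorder(node):
    """Keys in sorted order, for checking the tree."""
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

root = Node()
for k in [5, 1, 9, 3, 7, 2, 8, 6, 4, 0]:
    root = insert(root, k, max_keys=3)
print(root.keys, inorder(root))  # [3, 5, 8] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```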
  • A parallel distributive join algorithm for cube-connected multiprocessors

    Publication Year: 1996, Pages: 127-137
    Cited by: Papers (8)

    This paper presents a parallel distributive join algorithm for cube-connected multiprocessors. Performance analysis shows that the proposed algorithm achieves almost linear speedup over the sequential distributive join algorithm as the number of processors increases, and that its performance is comparable to that of the parallel hybrid-hash join algorithm. A major advantage of the proposed algorithm over hash-based join algorithms is that it does not suffer from the bucket overflow problem caused by nonuniform hashing of the smaller operand relation. Moreover, the proposed algorithm easily supports nonequijoins, which are very hard to implement with hash-based join algorithms.

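The difficulty hash-based methods have with nonequijoins can be seen in a small example: hashing scatters join values without regard to order, so a predicate such as r < s cannot be answered by probing a single bucket, whereas an order-preserving layout turns it into a range lookup. The sketch below is a generic sort-based less-than join written for illustration; it is not the distributive join algorithm, and the example values are arbitrary.

```python
from bisect import bisect_right

def less_than_join(r, s):
    """All pairs (a, b) with a from r, b from s and a < b, using one sort of s."""
    s_sorted = sorted(s)
    pairs = []
    for a in r:
        start = bisect_right(s_sorted, a)   # first b strictly greater than a
        pairs.extend((a, b) for b in s_sorted[start:])
    return pairs

print(less_than_join([3, 1], [2, 4, 1]))  # [(3, 4), (1, 2), (1, 4)]
```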
  • MAD kernels: an experimental testbed to study multiprocessor memory system behavior

    Publication Year: 1996, Pages: 207-217

    On large-scale multiprocessors, access to common memory is one of the key performance-limiting factors. Shared-memory performance depends not only on the characteristics of the memory hierarchy itself but also on the characteristics of the memory address streams and the interaction between the two. We present a technique for multiprocessor workload construction and a family of artificial kernels, called MAD kernels, for systematically investigating the behavior of the memory hierarchy. The measured performance is independent of any particular application or algorithm. The proposed methodology is demonstrated on two commercial shared-memory systems.

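The point that measured performance depends on the interaction between the address stream and the memory hierarchy can be illustrated with a toy model: a hypothetical synthetic kernel that sweeps a working set with a fixed stride, fed into a simulated direct-mapped cache. Both the kernel generator and the cache parameters are assumptions for illustration; they are not the MAD kernels and do not model a real multiprocessor.

```python
# Toy model: synthetic strided address stream fed into a direct-mapped cache.
# Assumptions: byte addresses, a single cache level, no other processors.

def strided_stream(n_refs, working_set_bytes, stride_bytes):
    """Hypothetical kernel: sweep a working set with a fixed stride, wrapping around."""
    return [(i * stride_bytes) % working_set_bytes for i in range(n_refs)]

def direct_mapped_hit_ratio(addresses, cache_bytes, line_bytes):
    """Fraction of references that hit in a direct-mapped cache."""
    n_slots = cache_bytes // line_bytes
    slots = [None] * n_slots          # line number currently held in each slot
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        slot = line % n_slots
        if slots[slot] == line:
            hits += 1
        else:
            slots[slot] = line        # miss: fetch the line, evicting the old one
    return hits / len(addresses)

friendly = direct_mapped_hit_ratio(strided_stream(1024, 512, 8), 1024, 64)
thrashing = direct_mapped_hit_ratio(strided_stream(1024, 2048, 64), 1024, 64)
print(friendly, thrashing)  # 0.9921875 0.0
```

With a 1 KB cache of 64-byte lines, a 512-byte working set read at an 8-byte stride misses only on the first touch of each of its 8 lines, while a 2 KB working set read at a 64-byte stride maps two lines to every cache slot and misses on every reference.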

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology