Parallel and Distributed Systems, IEEE Transactions on

Issue 9 • Date Sep 1995

  • Generating and approximating nondominated coteries

    Publication Year: 1995 , Page(s): 905 - 914
    Cited by:  Papers (8)

    A coterie, which is used to realize mutual exclusion in a distributed system, is a family C of incomparable subsets such that every pair of subsets in C has at least one element in common. Associate with a family of subsets C a positive (i.e., monotone) Boolean function f_C such that f_C(x)=1 if the Boolean vector x is equal to or greater than the characteristic vector of some subset in C, and 0 otherwise. It is known that C is a coterie if and only if f_C is dual-minor, and is a nondominated (ND) coterie if and only if f_C is self-dual. We introduce an operator ρ, which transforms a positive self-dual function into another positive self-dual function, and the concept of almost-self-duality, which is a close approximation to self-duality and can be checked in polynomial time (the complexity of checking positive self-duality is currently unknown). After proving several interesting properties of the operator and of almost-self-duality, we propose a simple algorithm to check whether a given positive function is self-dual or not. Although this is not a polynomial algorithm, it is practically efficient in most cases. Finally, we present an incrementally polynomial algorithm that generates all positive self-dual functions (ND coteries) by repeatedly applying ρ operations. Based on this algorithm, all ND coteries of up to seven variables are computed.
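The definitions in this abstract can be illustrated with a small brute-force sketch. It is not the paper's algorithm: the function names (`is_coterie`, `f_C`, `is_self_dual`) are hypothetical, and the self-duality test enumerates all 2^n vectors, so it is only usable for tiny universes.

```python
from itertools import combinations, product

def is_coterie(family):
    """True if `family` is a coterie: nonempty subsets that pairwise
    intersect and are pairwise incomparable under inclusion."""
    quorums = [frozenset(q) for q in family]
    if not quorums or any(not q for q in quorums):
        return False
    for a, b in combinations(quorums, 2):
        if not (a & b):            # intersection property violated
            return False
        if a <= b or b <= a:       # one subset contains the other
            return False
    return True

def f_C(family, x):
    """Positive Boolean function of the family: true iff the set x
    (the 1-positions of the vector) contains some subset in the family."""
    return any(frozenset(q) <= x for q in family)

def is_self_dual(family, universe):
    """Exhaustive check that f(x) differs from f(complement of x) for all x.
    Exponential in len(universe); for illustration only."""
    universe = frozenset(universe)
    elems = sorted(universe)
    for bits in product([0, 1], repeat=len(elems)):
        x = frozenset(e for e, b in zip(elems, bits) if b)
        if f_C(family, x) == f_C(family, universe - x):
            return False
    return True

majority = [{1, 2}, {2, 3}, {1, 3}]
print(is_coterie(majority), is_self_dual(majority, {1, 2, 3}))
```

Under these definitions the three-element majority family is a coterie whose function is self-dual, so it is nondominated, while the single-quorum family `[{1, 2}]` over the same universe is a coterie that fails the self-duality test.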

  • Products of networks with logarithmic diameter and fixed degree

    Publication Year: 1995 , Page(s): 963 - 975
    Cited by:  Papers (22)

    Analyzes some general properties of product networks that are pertinent to parallel architectures and then focuses on three case studies. These are products of complete binary trees, shuffle-exchange and de Bruijn networks. It is shown that all of these are powerful architectures for parallel computation, as evidenced by their ability to efficiently emulate numerous other architectures. In particular, r-dimensional grids and r-dimensional meshes of trees can be embedded efficiently in products of these graphs, i.e. either as a subgraph or with small constant dilation and congestion. In addition, the shuffle-exchange network can be embedded in an r-dimensional product of shuffle-exchange networks with dilation cost 2r and congestion cost 2. Similarly, the de Bruijn network can be embedded in an r-dimensional product of de Bruijn networks with dilation cost r and congestion cost 4. Moreover, it is well known that shuffle-exchange and de Bruijn graphs can emulate the hypercube with a small constant slowdown for “normal” algorithms. This means that their product versions can also emulate these hypercube algorithms with constant slowdown. Conclusions include a discussion of many open research areas.
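As a rough illustration of why products of fixed-degree, low-diameter graphs keep both properties bounded, the sketch below builds a two-dimensional product of complete binary trees and measures it by brute force. It assumes "product" means the Cartesian graph product, which the abstract does not state explicitly; all function names are hypothetical.

```python
from collections import deque
from itertools import product

def complete_binary_tree(levels):
    """Adjacency dict of a complete binary tree with 2**levels - 1 nodes,
    numbered 1..n so that the parent of v is v // 2."""
    n = 2 ** levels - 1
    adj = {v: set() for v in range(1, n + 1)}
    for v in range(2, n + 1):
        adj[v].add(v // 2)
        adj[v // 2].add(v)
    return adj

def cartesian_product(g, h):
    """Cartesian product: (u, v) is adjacent to (u2, v) when u-u2 is an
    edge of g, and to (u, v2) when v-v2 is an edge of h."""
    adj = {(u, v): set() for u, v in product(g, h)}
    for (u, v) in adj:
        adj[(u, v)].update((u2, v) for u2 in g[u])
        adj[(u, v)].update((u, v2) for v2 in h[v])
    return adj

def diameter(adj):
    """Largest BFS eccentricity over all nodes (graph assumed connected)."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        return max(dist.values())
    return max(eccentricity(s) for s in adj)

def max_degree(adj):
    return max(len(neighbors) for neighbors in adj.values())

tree = complete_binary_tree(3)
prod2 = cartesian_product(tree, tree)
print(len(prod2), max_degree(prod2), diameter(prod2))
```

For the Cartesian product, maximum degree and diameter are each the sum of the factors' values, so an r-dimensional product of a fixed-degree factor has degree proportional to r.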

  • Stack evaluation of arbitrary set-associative multiprocessor caches

    Publication Year: 1995 , Page(s): 930 - 942
    Cited by:  Papers (7)  |  Patents (2)

    We propose a simple solution to the problem of efficient stack evaluation of LRU multiprocessor cache memories with arbitrary set-associative mapping. It is an extension of the existing stack evaluation techniques for all set-associative LRU uniprocessor caches. Special marker entries are used in the stack to represent data blocks (or lines) deleted by an invalidation-based cache coherence protocol. A method of marker-splitting is employed when a data block below a marker in the stack is accessed. Using this technique, a single pass over a memory reference trace yields hit ratios for all cache sizes and set-associative mappings of multiprocessor caches. Simulation experiments on some multiprocessor trace data show an order-of-magnitude speed-up in simulation time using this one-pass technique.
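The one-pass idea can be seen in the classic uniprocessor case that the paper extends. The sketch below handles only a single fully associative LRU cache; it does not model set-associative mappings, invalidations, marker entries or marker-splitting, and the name `lru_hit_ratios` is hypothetical.

```python
def lru_hit_ratios(trace, max_size):
    """Single pass over `trace`; returns {size: hit ratio} for fully
    associative LRU caches holding 1..max_size blocks."""
    stack = []                       # LRU stack, most recently used first
    hist = [0] * (max_size + 1)      # hist[d] = references at stack distance d
    for block in trace:
        if block in stack:
            distance = stack.index(block) + 1   # 1-based stack distance
            if distance <= max_size:
                hist[distance] += 1
            stack.remove(block)
        stack.insert(0, block)       # referenced block moves to the top
    total = len(trace)
    ratios, hits = {}, 0
    for size in range(1, max_size + 1):
        hits += hist[size]           # a cache of `size` hits all distances <= size
        ratios[size] = hits / total if total else 0.0
    return ratios

print(lru_hit_ratios(["a", "b", "a", "c", "b", "a"], 3))
```

A reference at stack distance d hits in every LRU cache of at least d blocks, which is why one histogram of distances gives the hit ratio for every size at once.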

  • A theory of deadlock-free adaptive multicast routing in wormhole networks

    Publication Year: 1995 , Page(s): 976 - 987
    Cited by:  Papers (20)  |  Patents (5)

    A theory for the design of deadlock-free adaptive routing algorithms for wormhole networks, proposed by the author (1991, 1993), supplies sufficient conditions for an adaptive routing algorithm to be deadlock-free, even when there are cyclic dependencies between channels. Also, two design methodologies were proposed. Multicast communication refers to the delivery of the same message from one source node to an arbitrary number of destination nodes. A tree-like routing scheme is not suitable for hardware-supported multicast in wormhole networks because it produces many headers for each message, drastically increasing the probability of a message being blocked. A path-based multicast routing model was proposed by Lin and Ni (1991) for multicomputers with 2D-mesh and hypercube topologies. In this model, messages are not replicated at intermediate nodes. This paper develops the theoretical background for the design of deadlock-free adaptive multicast routing algorithms. This theory is valid for wormhole networks using the path-based routing model. It is also valid when messages with a single destination and multiple destinations are mixed together. The new channel dependencies produced by messages with several destinations are studied. Also, two theorems are proposed, developing conditions to verify that an adaptive multicast routing algorithm is deadlock-free, even when there are cyclic dependencies between channels. As an example, the multicast routing algorithms of Lin and Ni are extended, so that they can take advantage of the alternative paths offered by the network.

  • A trace-driven simulator for performance evaluation of cache-based multiprocessor systems

    Publication Year: 1995 , Page(s): 915 - 929
    Cited by:  Papers (17)  |  Patents (2)

    We describe a simulator that emulates the activity of a shared-memory, common-bus multiprocessor system with private caches. Both kernel and user program activities are considered, thus allowing an accurate analysis and evaluation of coherence protocol performance. The simulator can generate synthetic traces, based on a wide set of input parameters which specify processor, kernel and workload features. Other parameters allow us to detail the multiprocessor architecture for which the analysis has to be carried out. An actual-trace-driven simulation is possible, too, in order to evaluate the performance of a specific multiprocessor with respect to a given workload, if traces concerning this workload are available. In a separate section, we describe how actual traces can also be used to extract a set of input parameters for synthetic trace generation. Finally, we show how the simulator may be successfully employed to carry out a detailed performance analysis of a specific coherence protocol.

  • Ring-connected networks and their relationship to cubical ring connected cycles and dynamic redundancy networks

    Publication Year: 1995 , Page(s): 988 - 996
    Cited by:  Papers (4)  |  Patents (1)

    Reviews a 1-fault-tolerant (1-ft) hypercube model with degree 2r: the ring-connected network (RCN), which has the lowest degree among all 1-ft, one-spare node, r-dimensional hypercube architectures yet discovered. Then, we propose a constant-time reconfiguration algorithm via an add-and-modulo automorphism. Furthermore, by introducing the equivalence from hypercubes to cube-connected cycles (CCCs) and to butterflies (BFs), we find that there is also a corresponding equivalence from RCNs to cubical ring-connected cycles (CRCCs) and to dynamic redundancy networks (DRNs). From this fact, we find that once a symmetric fault-tolerant structure has been discovered for one of the three models, then it can be applied directly to the other hypercubic networks. Applying the technique, we find a degree-6, 1-ft Benes network. We think that more attention should be paid to the strong relationship between hypercubes, CCCs and BFs. Finally, from this equivalence relationship we propose three new bounded-degree k-ft models: k-ft CCCs, k-ft BFs and k-ft Benes networks.

  • Automatic partitioning of parallel loops and data arrays for distributed shared-memory multiprocessors

    Publication Year: 1995 , Page(s): 943 - 962
    Cited by:  Papers (33)  |  Patents (3)

    Presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. While several previous papers have looked at hyperplane partitioning of iteration spaces to reduce communication traffic, the problem of deriving the optimal tiling parameters for minimal communication in loops with general affine index expressions has remained open. Our paper solves this open problem by presenting a method for deriving an optimal hyperparallelepiped tiling of iteration spaces for minimal communication in multiprocessors with caches. We show that the same theoretical framework can also be used to determine optimal tiling parameters for both data and loop partitioning in distributed memory multicomputers. Our framework uses matrices to represent iteration and data space mappings and the notion of uniformly intersecting references to capture temporal locality in array references. We introduce the notion of data footprints to estimate the communication traffic between processors and use linear algebraic methods and lattice theory to compute precisely the size of data footprints. We have implemented this framework in a compiler for Alewife, a distributed shared-memory multiprocessor.
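To make the notion of a data footprint concrete, the sketch below counts by enumeration the distinct array elements that one loop tile touches through affine references of the form G·i + a. The paper derives footprint sizes analytically; this brute-force count, the restriction to rectangular tiles anchored at the origin, and the name `footprint` are assumptions made only for illustration.

```python
from itertools import product

def footprint(tile_shape, references):
    """Number of distinct array elements touched by a rectangular tile.

    tile_shape: iteration counts per loop dimension, e.g. (4, 4).
    references: list of (G, a) pairs; at iteration vector i the reference
                accesses element G @ i + a, with G given as a list of rows
                and a as a list of offsets (one per array dimension).
    """
    touched = set()
    for it in product(*(range(n) for n in tile_shape)):
        for G, a in references:
            element = tuple(
                sum(g * x for g, x in zip(row, it)) + offset
                for row, offset in zip(G, a)
            )
            touched.add(element)
    return len(touched)

identity = [[1, 0], [0, 1]]
# Two references to the same array, B[i][j] and B[i+1][j], share most elements.
pair = [(identity, [0, 0]), (identity, [1, 0])]
for shape in [(4, 4), (2, 8), (8, 2)]:
    print(shape, footprint(shape, pair))
```

All three tile shapes contain 16 iterations, yet their footprints differ, which is the effect the choice of tiling parameters is meant to exploit.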


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology