Parallel and Distributed Systems, IEEE Transactions on

Issue 2 • Date Feb 1997

  • Uniform and self-stabilizing token rings allowing unfair daemon

    Publication Year: 1997 , Page(s): 154 - 163
    Cited by:  Papers (6)

    A distributed system consists of a set of processes and a set of communication links, each connecting a pair of processes. A distributed system is said to be self-stabilizing if it converges to a correct system state no matter which system state it starts with. A self-stabilizing system is considered to be an ideal fault tolerant system, since it tolerates any kind and any finite number of transient failures. In this paper, we investigate uniform randomized self-stabilizing mutual exclusion systems on unidirectional rings. As far as deterministic systems are concerned, it is well-known that there is no such system when the number n of processes (i.e., ring size) is composite, even if a fair central-daemon (c-daemon) is assumed. A fair daemon guarantees that every process will be selected for activation infinitely many times. As for randomized systems, regardless of the ring size, we can design a self-stabilizing system even for a distributed-daemon (d-daemon). However, every system proposed so far assumes a daemon to be fair, and effectively relies on this assumption. This paper tackles the problem of designing a self-stabilizing system, without assuming the fairness of a daemon. As a result, we present a randomized self-stabilizing mutual exclusion system for any size n (including composite size) of a unidirectional ring. The number of process states of the system is 2(n-1).
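
    To make the notion of self-stabilization concrete, here is a minimal Python sketch of Dijkstra's classic K-state token ring. It is not the system proposed in this paper: it is deterministic and non-uniform (process 0 runs a different rule), whereas the paper studies uniform randomized systems. The function names, the choice K = n, and the random central daemon are assumptions made for illustration.

```python
import random

def privileged(x):
    """Indices of processes that currently hold a privilege (token)."""
    n = len(x)
    out = [0] if x[0] == x[n - 1] else []
    out += [i for i in range(1, n) if x[i] != x[i - 1]]
    return out

def step(x, i, k):
    """Let privileged process i make its move (Dijkstra's K-state rules)."""
    if i == 0:
        x[0] = (x[0] + 1) % k      # distinguished process increments
    else:
        x[i] = x[i - 1]            # other processes copy their predecessor

def run(n, k, steps, seed):
    """Start from an arbitrary state; a central daemon picks one move per step."""
    rng = random.Random(seed)
    x = [rng.randrange(k) for _ in range(n)]
    for _ in range(steps):
        step(x, rng.choice(privileged(x)), k)
    return x
```

    Starting from any initial state, the ring reaches a configuration with exactly one privileged process and keeps that property afterwards, which is the mutual exclusion guarantee the abstract refers to.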

  • An empirical evaluation of performance-memory trade-offs in time warp

    Publication Year: 1997 , Page(s): 210 - 224
    Cited by:  Papers (3)

    The performance of the Time Warp mechanism is experimentally evaluated when only a limited amount of memory is available to the parallel computation. An implementation of the cancelback protocol is used for memory management on a shared memory architecture, viz., KSR, to evaluate the performance vs. memory tradeoff. The implementation of the cancelback protocol supports canceling back more than one memory object when memory has been exhausted (the precise number is referred to as the salvage parameter) and incorporates a non-work-conserving processor scheduling technique to prevent starvation. Several synthetic and benchmark programs are used that provide interesting stress cases for evaluating the limited memory behavior. The experiments are extensively monitored to determine the extent to which various factors may affect performance. Several observations are made by analyzing the behavior of Time Warp under limited memory: (1) Depending on the available memory and asymmetry in the workload, canceling back several memory objects at one time (i.e., a salvage parameter value of more than one) improves performance significantly, by reducing certain overheads. However, performance is relatively insensitive to the salvage parameter except at extreme values. (2) The speedup vs. memory curve for Time Warp programs has a well-defined knee before which speedup increases very rapidly with memory and beyond which there is little performance gain with increased memory. (3) A performance nearly equivalent to that with large amounts of memory can be achieved with only a modest amount of additional memory beyond that required for sequential execution, if memory management overheads are small compared to the event granularity. These results indicate that, contrary to common belief, memory usage by Time Warp can be controlled within reasonable limits without any significant loss of performance.

  • On runtime parallel scheduling for processor load balancing

    Publication Year: 1997 , Page(s): 173 - 186
    Cited by:  Papers (15)

    Parallel scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate to schedule work. Parallel scheduling is able to accurately balance the load by using global load information at compile-time or runtime. It provides high-quality load balancing. This paper presents an overview of the parallel scheduling technique. Scheduling algorithms for tree, hypercube, and mesh networks are presented. These algorithms can fully balance the load and maximize locality at runtime. Communication costs are significantly reduced compared to other existing algorithms.

  • Graceful degradation in algorithm-based fault tolerant multiprocessor systems

    Publication Year: 1997 , Page(s): 137 - 153
    Cited by:  Papers (8)

    Algorithm-based fault tolerance (ABFT) is a technique which improves the reliability of a multiprocessor system by providing it with concurrent error detection and fault location capability. It encodes data at the system level and modifies the algorithm to operate on the encoded data in order to expose both transient and permanent faults in any processor. Work to date in this area addresses only the fault detection and location part of the problem. However, if spare processors are not available, then after a faulty processor has been located, the work initially assigned to it has to be mapped to some nonfaulty processors in the system in such a way that the fault tolerance capability of the system is still maintained with as small a degradation in performance as possible. In this paper, we propose an integrated deterministic solution to the above problem which combines concurrent error detection and fault location with graceful degradation. There exists no previous deterministic ABFT method for the design of general t-fault locating systems, even for the case of t=1. We propose a general method for designing one-fault locating/s-fault detecting systems. We use an extended model for representing ABFT systems. This model considers the processors computing the checks to be a part of the ABFT system, so that faults in the check computing processors can also be detected and located using a simple diagnosis algorithm, and the checks can be mapped to other nonfaulty processors in the system.
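
    As a concrete picture of "encoding data and operating on the encoded data", the following Python sketch uses the well-known checksum-matrix scheme for matrix multiplication. It is a generic ABFT illustration, not the design method of this paper; the function names and the single-error assumption are choices made for the example.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def with_column_checksum(a):
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append a column holding the sum of each row."""
    return [row + [sum(row)] for row in b]

def locate_error(c):
    """Return (row, col) of a single corrupted data element.

    Returns None when the checksums are consistent or when the
    mismatch pattern does not point to exactly one data element.
    """
    n, m = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c[i][j] for i in range(n)) != c[n][j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of the encoded operands carries both row and column checksums.
c = matmul(with_column_checksum(a), with_row_checksum(b))
```

    Corrupting one data element of `c` breaks exactly one row checksum and one column checksum, and their intersection locates the fault.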

  • Fusion of loops for parallelism and locality

    Publication Year: 1997 , Page(s): 193 - 209
    Cited by:  Papers (16)  |  Patents (6)

    Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-carried dependences which prevent parallelism. In addition, performance losses result from cache conflicts in fused loops. In this paper, we present new techniques to: (1) allow fusion of loop nests in the presence of fusion-preventing dependences, (2) maintain parallelism and allow the parallel execution of fused loops with minimal synchronization, and (3) eliminate cache conflicts in fused loops. We describe algorithms for implementing these techniques in compilers. The techniques are evaluated on a 56-processor KSR2 multiprocessor and on an 18-processor Convex SPP-1000 multiprocessor. The results demonstrate performance improvements for both kernels and complete applications. The results also indicate that careful evaluation of the profitability of fusion is necessary as more processors are used.
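
    The Python sketch below shows why fusion is "not always legal" and one generic way around it. The second loop reads `b[i + 1]`, so fusing the loops naively reads a value that has not been computed yet; shifting the second statement by one iteration restores correctness. The kernel and function names are hypothetical, and the shift shown here is only an illustration of the idea, not necessarily the algorithm the paper describes.

```python
def separate_loops(a):
    """Reference version: two loops, the second reads b[i] and b[i + 1]."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * (n - 1)
    for i in range(n):
        b[i] = 2.0 * a[i]
    for i in range(n - 1):
        c[i] = b[i] + b[i + 1]
    return b, c

def naive_fusion(a):
    """Illegal fusion: b[i + 1] is still 0.0 when c[i] is computed."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * (n - 1)
    for i in range(n):
        b[i] = 2.0 * a[i]
        if i < n - 1:
            c[i] = b[i] + b[i + 1]
    return b, c

def shifted_fusion(a):
    """Legal fusion: iteration i of the fused loop computes c[i - 1]."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * (n - 1)
    for i in range(n):
        b[i] = 2.0 * a[i]
        if i >= 1:
            c[i - 1] = b[i - 1] + b[i]
    return b, c
```

    The shifted version touches `b[i]` right after producing it, which is the locality benefit of fusion, while still matching the output of the separate loops.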

  • A general method for maximizing the error-detecting ability of distributed algorithms

    Publication Year: 1997 , Page(s): 164 - 172
    Cited by:  Papers (1)

    The bound on component failures and their spatial distribution govern the fault tolerance of any candidate error-detecting algorithm. For distributed memory multiprocessors, the specific algorithm and the topology of the processor interconnection network define these bounds. This paper introduces the maximal fault index, derived from the system topology and local communication patterns, to demonstrate how a maximal number of simultaneous component failures can be tolerated for a particular interconnection network and error-detecting algorithm. The index is used to design a mapping of processes to processor groups such that the error-detecting ability of the algorithm is preserved for certain multiple simultaneous processor failures.

  • The cross product of interconnection networks

    Publication Year: 1997 , Page(s): 109 - 118
    Cited by:  Papers (20)  |  Patents (1)

    We study the cross product as a method for generating and analyzing interconnection network topologies for multiprocessor systems. Consider two interconnection graphs G1 and G2 each with some established properties such as symmetry, low degree and diameter, scalability, simple optimal routing, recursive structure (partitionability), fault tolerance, existence of node-disjoint paths, low cost embedding, and efficient broadcasting. We investigate and evaluate the corresponding properties for the cross product of G1 and G2 based on the properties of G1 and those of G2. We also give a mathematical characterization of product families of graphs which are closed under the cross product operation. This investigation is useful in two ways. On one hand, it gives a new tool for further studying some of the known interconnection topologies, such as the hypercube and the mesh, which can be defined using the cross product operation. On the other hand, it can be used in defining and evaluating new interconnection graphs using the cross product operation on known topologies.
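
    The Python sketch below builds a hypercube and a mesh from smaller graphs. It assumes that the cross product is the usual Cartesian graph product, in which (u, v) is adjacent to (u', v) when u is adjacent to u' in G1 and to (u, v') when v is adjacent to v' in G2; the abstract does not spell out the definition. The adjacency-dictionary representation and the helper names are also assumptions.

```python
from itertools import product

def cartesian_product(g1, g2):
    """Cartesian product of two graphs given as {node: set_of_neighbours}."""
    g = {}
    for u, v in product(g1, g2):
        g[(u, v)] = ({(u2, v) for u2 in g1[u]} |
                     {(u, v2) for v2 in g2[v]})
    return g

def path(n):
    """Path graph on nodes 0..n-1 (a linear array)."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def edge_count(g):
    """Number of undirected edges."""
    return sum(len(nbrs) for nbrs in g.values()) // 2

k2 = path(2)                                           # a single edge
q3 = cartesian_product(cartesian_product(k2, k2), k2)  # 3-dimensional hypercube
mesh_2x3 = cartesian_product(path(2), path(3))         # 2 x 3 mesh
```

    Under this definition the node degree of the product is the sum of the factor degrees, which is why every node of `q3` has degree 3.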

  • Optimal task assignment in homogeneous networks

    Publication Year: 1997 , Page(s): 119 - 129
    Cited by:  Papers (21)

    This paper considers the problem of assigning the tasks of a distributed application to the processors of a distributed system such that the sum of execution and communication costs is minimized. Previous work has shown this problem to be tractable for a system of two processors or a linear array of N processors, and for distributed programs of serial parallel structures. Here we focus on the assignment problem on a homogeneous network, which is composed of N functionally-identical processors, each with its own memory. Some processors in the network may have unique resources, such as data files or certain peripheral devices. Certain tasks may have to use these unique resources; they are called attached tasks. The tasks of a distributed program should therefore be assigned so as to make use of specific resources located at certain processors in the network while minimizing the amount of interprocessor communication. The assignment problem in such a homogeneous network is known to be NP-hard even for N=3, thus making it intractable for a network with a medium to large number of processors. We therefore focus on task assignment in general array networks, such as linear arrays, meshes, hypercubes, and trees. We first develop a modeling technique that transforms the assignment problem in an array or tree into a minimum-cut maximum-flow problem. The assignment problem is then solved for a general array or tree network in polynomial time.
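
    To make the objective concrete, the Python sketch below evaluates the sum of execution and communication costs for an assignment and finds the minimum by exhaustive search. The cost model is an assumption made for illustration (a per-task, per-processor execution cost table and communication cost equal to traffic volume times processor distance), and the exhaustive search is exponential in the number of tasks; it is not the polynomial-time minimum-cut method of the paper.

```python
from itertools import product

def assignment_cost(assign, exec_cost, comm, dist):
    """Execution cost of each task plus distance-weighted communication."""
    total = sum(exec_cost[t][p] for t, p in enumerate(assign))
    total += sum(w * dist(assign[s], assign[t]) for (s, t), w in comm.items())
    return total

def brute_force_assign(exec_cost, comm, n_procs, dist, attached=None):
    """Try every assignment; attached tasks are pinned to their processor."""
    attached = attached or {}
    best = None
    for assign in product(range(n_procs), repeat=len(exec_cost)):
        if any(assign[t] != p for t, p in attached.items()):
            continue
        cost = assignment_cost(assign, exec_cost, comm, dist)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best

# Hypothetical instance: 3 tasks on a 2-processor linear array.
exec_cost = [[1, 4], [3, 1], [2, 2]]     # exec_cost[task][processor]
comm = {(0, 1): 5, (1, 2): 1}            # traffic between task pairs
best = brute_force_assign(exec_cost, comm, 2,
                          dist=lambda p, q: abs(p - q),
                          attached={0: 0})
```

    Here task 1 would run faster on processor 1, but the heavy traffic with the attached task 0 keeps all three tasks on processor 0.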

  • Algorithmic mapping of feedforward neural networks onto multiple bus systems

    Publication Year: 1997 , Page(s): 130 - 136
    Cited by:  Papers (3)

    This paper addresses the problem of mapping a feedforward ANN onto a multiple bus system, MBS, with p processors and b buses so as to minimize the total execution time. We present an algorithm which assigns the nodes of a given computational layer (c-layer) to processors such that the computation lower bound $[N_l/p]\,t_p^l$ and the communication lower bound $[N_l/b]\,t_c$ are achieved simultaneously, where $N_l$ is the number of nodes in the mapped c-layer $l$ and $t_p^l$ and $t_c$ are the computation and communication times, respectively, associated with a node in the layer. When computation and communication are not overlapped, we show that the optimal number of processors needed is either 1 or p, depending on the ratio $t_p^l/t_c$. When computation and communication are overlapped, we show that the optimal number of processors needed is either 1 or $([t_p^l/t_c])\,b$. We show that there is a unique arrangement of interfaces such that the total number of interfaces is minimum and the optimal time is reached. Finally, we compare the relative merits of the MBS simulating ANNs over the recently introduced checkerboarding scheme.
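
    A small Python sketch of the two per-layer lower bounds follows. It assumes that the square brackets in the bounds denote the ceiling function, which the listing does not state explicitly; the function and parameter names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def layer_lower_bounds(n_l, p, b, t_p, t_c):
    """Return (computation bound, communication bound) for one c-layer.

    n_l: nodes in the layer, p: processors, b: buses,
    t_p: computation time per node, t_c: communication time per node.
    Assumes the bracketed ratios in the bounds are ceilings.
    """
    return ceil_div(n_l, p) * t_p, ceil_div(n_l, b) * t_c

# Example: 10 nodes, 4 processors, 2 buses.
comp, comm = layer_lower_bounds(10, 4, 2, t_p=3.0, t_c=1.0)
```

    With 10 nodes on 4 processors, at least one processor handles 3 nodes, so computation takes at least 3 node-times; with 2 buses, at least 5 transfers share one bus.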

  • Multiple-edge-fault tolerance with respect to hypercubes

    Publication Year: 1997 , Page(s): 187 - 192
    Cited by:  Papers (4)

    Previous works on edge-fault tolerance with respect to hypercubes $Q_n$ are mainly focused on 1-edge fault and 2- or 3-edge fault with limited size of n. We give a construction scheme for 2-EFT($Q_n$) graphs and 3-EFT($Q_n$) graphs, where n is arbitrarily large. In our constructions, approximately log n extra degree is added to the vertices of $Q_n$ for 2-edge-fault tolerance, and one more degree for 3-edge-fault tolerance.

  • Reducing communication latency with path multiplexing in optically interconnected multiprocessor systems

    Publication Year: 1997 , Page(s): 97 - 108
    Cited by:  Papers (15)

    Reducing communication latency, which is a performance bottleneck in optically interconnected multiprocessor systems, is of prominent importance. A conventional approach for establishing connections in multiplexed networks uses a set of independent time slots (or virtual channels) along a path for each connection. This approach requires the use of switching devices capable of interchanging time slots, and thus introduces latency in addition to hardware and control complexity. We propose an approach to all-optical time division multiplexed (TDM) communications in multiprocessor systems. The idea is to establish a connection along a path using a set of time slots (or virtual channels) that are dependent on each other, so that no time slot interchanging is required. We compare the proposed approach with the conventional one in terms of the overall communication latency. We found that, despite the possibility that establishing a connection may take a longer time, the proposed approach will result in lower overall communication latency as it eliminates the delays introduced by the time slot interchanging switching devices.
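
    The difference between the two slot-selection policies can be sketched in a few lines of Python. The sketch assumes the simplest form of dependence between slots, namely that a connection must use the same slot index on every link of its path; the abstract does not specify the exact relation. The link names and data layout are hypothetical.

```python
def independent_slots(free, path):
    """Conventional policy: pick any free slot on each link separately.

    Returns one slot per link, or None if some link has no free slot.
    Different slots on consecutive links imply time slot interchanging.
    """
    choice = []
    for link in path:
        if not free[link]:
            return None
        choice.append(min(free[link]))
    return choice

def common_slot(free, path):
    """Dependent policy (assumed): one slot that is free on every link."""
    common = set.intersection(*(free[link] for link in path))
    return min(common) if common else None

# free[link] is the set of unused slot indices on that link.
free = {"a-b": {0, 2}, "b-c": {1, 2}, "c-d": {2, 3}}
route = ["a-b", "b-c", "c-d"]
```

    On this example the conventional policy selects slots 0, 1, 2 and needs interchanging at each hop, while the dependent policy uses slot 2 end to end. When no common slot exists the dependent policy has to wait or retry, which matches the remark that establishing a connection may take longer.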


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology