Parallel and Distributed Systems, IEEE Transactions on

Issue 7 • Date Jul 1996

Displaying Results 1 - 8 of 8
  • Traffic analysis and simulation performance of incomplete hypercubes

    Page(s): 740 - 754

    The incomplete hypercube with arbitrary nodes provides far better incremental flexibility than the complete hypercube, whose size is restricted to exactly a power of 2. After faults arise in a complete hypercube system, it is desirable to reconfigure the system so as to retain as many healthy nodes as possible, often leading to an incomplete hypercube of arbitrary size. In this paper, the highest traffic density over links in an incomplete hypercube under uniform message distribution is shown to be bounded by 2 (messages per link per cycle), independent of its size and despite its structural nonhomogeneity. As a result, it is easy to construct an incomplete hypercube with sufficient link communication capability where any potential points of congestion are avoided, ensuring high performance. Simulation results for the incomplete hypercube reveal that mean latency for delivering messages is roughly the same in an incomplete hypercube as in a compatible complete hypercube under both packet-switching and wormhole routing. The incomplete hypercube thus appears to be an attractive and practical architecture, since it shares every advantage of complete hypercubes while eliminating the restriction on system size.
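
As a rough illustration of the structure described in this abstract, the sketch below counts link loads in an incomplete hypercube whose N nodes are labeled 0..N-1, with one message sent between every ordered pair of distinct nodes. The routing rule (clear the differing 1 bits first, then set the differing 0 bits) is an assumption chosen here because every intermediate label stays below N; it is not taken from the paper, and the names `route` and `link_loads` are hypothetical. The raw counts are not normalized to the paper's messages-per-link-per-cycle measure.

```python
from collections import Counter


def route(src: int, dst: int) -> list:
    """Path from src to dst in an incomplete hypercube with nodes 0..N-1.

    Assumed rule (not from the paper): first clear bits that are 1 in src
    but 0 in dst, then set bits that are 0 in src but 1 in dst. Clearing
    bits only lowers the label, and every label reached while setting bits
    is a bitwise subset of dst, so no hop exceeds max(src, dst).
    """
    path = [src]
    cur = src
    to_clear = src & ~dst
    to_set = dst & ~src
    for mask in (to_clear, to_set):
        bit = 0
        while mask >> bit:
            if (mask >> bit) & 1:
                cur ^= 1 << bit      # traverse the link in dimension `bit`
                path.append(cur)
            bit += 1
    return path


def link_loads(n_nodes: int) -> Counter:
    """Count messages per directed link, one message per ordered node pair."""
    loads = Counter()
    for src in range(n_nodes):
        for dst in range(n_nodes):
            if src != dst:
                hops = route(src, dst)
                loads.update(zip(hops, hops[1:]))
    return loads


if __name__ == "__main__":
    loads = link_loads(11)           # 11 nodes: not a power of 2
    busiest = max(loads.values())
    print(f"{len(loads)} directed links used, busiest carries {busiest} messages")
```

Dividing each count by the number of messages injected per cycle would be one way to compare loads across sizes, though the exact normalization used in the paper is not stated in the abstract.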

  • A subsystem-oriented performance analysis methodology for shared-bus multiprocessors

    Page(s): 755 - 767

    A methodology, called Subsystem Access Time (SAT) modeling, is proposed for the performance modeling and analysis of shared-bus multiprocessors. The methodology is subsystem-oriented because it is based on a Subsystem Access Time Per Instruction (SATPI) concept, in which we treat major components other than processors (e.g., off-chip cache, bus, memory, I/O) as subsystems and model for each of them the mean access time per instruction from each processor. The SAT modeling methodology is derived from the Customized Mean Value Analysis (CMVA) technique, which is request-oriented in the sense that it models the weighted total mean delay for each type of request processed in the subsystems. The subsystem-oriented view of the proposed methodology facilitates divide-and-conquer modeling and bottleneck analysis, which has rarely been addressed previously. These distinguishing features lead to a simple, general, and systematic approach to the analytical modeling and analysis of complex multiprocessor systems. To illustrate the key ideas and features that are different from CMVA, an example performance model of a particular shared-bus multiprocessor architecture is presented. The model is used to conduct performance evaluation for throughput prediction. Thereby, the SATPIs of the subsystems are directly utilized to identify the bottleneck subsystem and find the requests or subsystem components that cause the bottleneck. Furthermore, the SATPIs of the subsystems are employed to explore the impact of several performance-influencing factors, including memory latency, number of processors, data bus width, and DMA transfer.

  • A unified framework for optimizing communication in data-parallel programs

    Page(s): 689 - 704

    This paper presents a framework, based on global array data-flow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. We introduce the available section descriptor, a novel representation of communication involving array sections. This representation allows us to apply techniques for partial redundancy elimination to obtain powerful communication optimizations. With a single framework, we are able to capture optimizations like (1) vectorizing communication, (2) eliminating communication that is redundant on any control flow path, (3) reducing the amount of data being communicated, (4) reducing the number of processors to which data must be communicated, and (5) moving communication earlier to hide latency and to subsume previous communication. We show that the bidirectional problem of eliminating partial redundancies can be decomposed into simpler unidirectional problems even in the context of an array section representation, which makes the analysis procedure more efficient. We present results from a preliminary implementation of this framework, which are extremely encouraging, and demonstrate the effectiveness of this analysis in improving the performance of programs.

  • Evaluation of load sharing in HARTS with consideration of its communication activities

    Page(s): 724 - 739

    We rigorously analyze load sharing (LS) in a distributed real-time system, called HARTS (Hexagonal Architecture for Real-Time Systems), while considering LS-related communication activities, such as task transfers and state-change broadcasts. First, we give an overview of the general distributed real-time LS approach described previously, and then adapt it to HARTS by exploiting the topological properties of HARTS. Second, we model task arrival/completion/transfer activities in HARTS as a continuous-time Markov chain from which we derive the distribution of queue length and the rates of generating LS-related traffic: the task transfer-out rate and the state-region change broadcast rate. Third, we derive the distribution of packet delivery time as a function of LS-related traffic rates by characterizing the hexagonal mesh topology and the virtual cut-through capability of HARTS. Finally, we derive the distribution of task waiting time (the time a task is queued for execution plus the time it would spend if the task is to be transferred), from which the probability of a task failing to complete in time, called the probability of dynamic failure, can be computed. The results obtained from our analytic models are verified through event-driven simulations, and can be used to study the effects of varying design parameters on the performance of LS while considering the details of LS-related communication activities.

  • Parallel asynchronous team algorithms: convergence and performance analysis

    Page(s): 677 - 688

    This paper formalizes a general technique to combine different methods in the solution of large systems of nonlinear equations using parallel asynchronous implementations on distributed-memory multiprocessor systems. Such combinations of methods, referred to as team algorithms, are evaluated as a way of obtaining desirable properties of different methods, and a sufficient condition for their convergence is derived. The load flow problem of electrical power networks is presented as an example problem that, under certain conditions, has the characteristics to make a team algorithm an appealing choice for its solution. Experimental results of an implementation on an Intel iPSC/860 Hypercube are reported, showing that considerable speedup and robustness can be obtained using team algorithms.

  • Localizing failures in distributed synchronization

    Page(s): 705 - 716

    The fault tolerance of distributed algorithms is investigated in asynchronous message-passing systems with undetectable process failures. Two specific synchronization problems are considered: the dining philosophers problem and the binary committee coordination problem. The abstraction of a bounded doorway is introduced as a general mechanism for achieving individual progress and good failure locality. Using it as a building block, optimal fault-tolerant algorithms are constructed for the two problems.

  • Balanced spanning trees in complete and incomplete star graphs

    Page(s): 717 - 723

    Efficiently solving the personalized broadcast problem in an interconnection network typically relies on finding an appropriate spanning tree in the network. In this paper, we show how to construct in a complete star graph an asymptotically balanced spanning tree, and in an incomplete star graph a near-balanced spanning tree. In both cases, the tree is shown to have the minimum height. In the literature, this problem has only been considered for the complete star graph, and the constructed tree is about 4/3 times taller than the one proposed in this paper.

  • Enhancing distributed event predicate detection algorithms

    Page(s): 673 - 676

    Recently published algorithms for matching concurrent sets of events have the problem of unbounded message queue growth if events arrive in an undesirable order. This paper presents some algorithms that mitigate this problem by examining events waiting to be processed and removing those that cannot be part of a concurrent set.
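
The pruning idea in this abstract can be pictured with vector timestamps. The sketch below is a generic head-elimination loop over per-process event queues, assuming each event carries a vector clock and that a concurrent set needs one event from every process. It is not the algorithm from the paper, and `happened_before` and `find_concurrent_heads` are hypothetical names. A queued event that happened before the head of another queue also precedes everything later in that queue, so it cannot belong to a concurrent set formed from the remaining events and is discarded.

```python
from collections import deque
from typing import Optional


def happened_before(a: tuple, b: tuple) -> bool:
    """True if vector clock a causally precedes vector clock b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def find_concurrent_heads(queues: list) -> Optional[list]:
    """Discard queue heads that cannot be in a concurrent set.

    `queues` holds one deque of vector clocks per process, oldest first.
    Returns the pairwise-concurrent heads once found, or None if some
    queue runs empty first (more events would be needed).
    The queues are modified in place.
    """
    while all(queues):
        heads = [q[0] for q in queues]
        # A head is stale if it causally precedes any other current head.
        stale = {
            i
            for i, a in enumerate(heads)
            for b in heads
            if happened_before(a, b)
        }
        if not stale:
            return heads
        for i in stale:
            queues[i].popleft()
    return None


if __name__ == "__main__":
    queues = [
        deque([(1, 0), (2, 0), (3, 0)]),   # events of process 0
        deque([(2, 1)]),                   # event of process 1
    ]
    print(find_concurrent_heads(queues))
```

Dropping stale heads as soon as they are identified keeps each queue from holding events that can never match, which is the kind of queue growth the abstract is concerned with.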


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology