
IEEE Transactions on Parallel and Distributed Systems

Issue 3 • March 2004

  • An experimental evaluation of data dependence analysis techniques

    Page(s): 196 - 213

    Optimizing compilers rely upon program analysis techniques to detect data dependences between program statements. Data dependence information captures the essential ordering constraints of the statements in a program that need to be preserved in order to produce valid optimized and parallel code. Data dependence testing is very important for automatic parallelization, vectorization, and any other code transformation. In this paper, we examine the impact of data dependence analysis in practice. A number of data dependence tests have been proposed in the literature. Each test makes different trade-offs between accuracy and efficiency. We present an experimental evaluation of several data dependence tests, including the Banerjee test, the I-Test, and the Omega test. We compare these tests in terms of data dependence accuracy, compilation efficiency, effectiveness in parallelization, and program execution performance. We analyze the reasons why a data dependence test can be inexact and we explain how the examined tests handle such cases. We run various experiments using the Perfect Club Benchmarks and the scientific library Lapack. We present the measured accuracy of each test and the reasons for any approximation. We compare these tests in terms of efficiency and we analyze the trade-offs between accuracy and efficiency. We also determine the impact of each data dependence test on the total compilation time. Finally, we measure the number of loops parallelized by each test and we compare the execution performance of each benchmark on a multiprocessor. Our results indicate that the Omega test is more accurate, but also very inefficient in the cases where the other two tests are inaccurate. In general, the cost of the Omega test is high, and it accounts for a significant percentage of the total compilation time. Furthermore, the additional accuracy of the Omega test over the Banerjee test and the I-Test does not improve parallelization or program execution performance.

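    The dependence tests named above are standard in the compiler literature. As a rough illustration only (not the authors' implementation), the following sketch applies the textbook GCD and Banerjee tests to a single pair of one-dimensional affine references A[a*i + a0] (write) and A[b*i + b0] (read) in a loop with bounds L..U; the Omega test, which needs integer-programming machinery, is omitted.

        from math import gcd

        def positive_part(x):  # x+ = max(x, 0)
            return max(x, 0)

        def negative_part(x):  # x- = max(-x, 0)
            return max(-x, 0)

        def gcd_test(a, b, c):
            """Necessary condition for integer solutions of a*i1 - b*i2 = c."""
            g = gcd(abs(a), abs(b))
            if g == 0:
                return c == 0
            return c % g == 0

        def banerjee_test(a, b, c, L, U):
            """Necessary condition: c must lie between the real-valued extremes
            of a*i1 - b*i2 over the rectangle L <= i1, i2 <= U."""
            lo = (positive_part(a) * L - negative_part(a) * U
                  - positive_part(b) * U + negative_part(b) * L)
            hi = (positive_part(a) * U - negative_part(a) * L
                  - positive_part(b) * L + negative_part(b) * U)
            return lo <= c <= hi

        def may_depend(a, a0, b, b0, L, U):
            """Do A[a*i + a0] and A[b*i + b0] possibly touch the same element
            for iterations i1, i2 in [L, U]?  Conservative: True means 'maybe'."""
            c = b0 - a0
            return gcd_test(a, b, c) and banerjee_test(a, b, c, L, U)

        # Example: for i in 1..100: A[2*i] = ... A[2*i + 1] ...
        # 2*i1 = 2*i2 + 1 has no integer solution, so there is no dependence.
        print(may_depend(2, 0, 2, 1, 1, 100))   # False (GCD test disproves it)
        print(may_depend(1, 0, 1, -1, 1, 100))  # True  (a flow dependence may exist)

    Both tests are conservative: a True answer only means a dependence may exist, which is why a more exact (and more expensive) test such as the Omega test can still disprove it.
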
  • The minimal cost distribution tree problem for recursive expiration-based consistency management

    Page(s): 214 - 227

    The expiration-based scheme is widely used to manage the consistency of cached and replicated contents such as Web objects. In this approach, each replica is associated with an expiration time beyond which the replica has to be validated. While the expiration-based scheme has been investigated in the context of a single replica, not much work has been done on its behavior with respect to multiple replicas. To allow for efficient consistency management, it is desirable to organize the replicas into a distribution tree in which a lower-level replica seeks validation from a higher-level replica when its lifetime expires. This paper investigates the construction of a distribution tree for a given set of replicas with the objective of minimizing the total communication cost of consistency management. The problem is formulated as an optimization problem and proven to be NP-complete. The optimal distribution tree is identified in some special cases, and several heuristic algorithms are proposed for the general problem. The performance of the heuristic algorithms is experimentally evaluated against two classical graph-theoretic tree constructions: the shortest-paths tree and the minimum spanning tree.

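    The paper's own heuristics and its recursive expiration cost model are not reproduced here. As context, the sketch below builds the two classical baseline trees mentioned in the abstract, a shortest-paths tree (Dijkstra) and a minimum spanning tree (Prim), over a toy replica network with assumed per-link communication costs.

        import heapq

        def shortest_paths_tree(graph, root):
            """Dijkstra: parent pointers of a shortest-paths tree rooted at `root`.
            `graph` maps node -> {neighbor: edge_cost}."""
            dist, parent = {root: 0}, {root: None}
            heap = [(0, root)]
            while heap:
                d, u = heapq.heappop(heap)
                if d > dist[u]:
                    continue
                for v, w in graph[u].items():
                    nd = d + w
                    if nd < dist.get(v, float("inf")):
                        dist[v], parent[v] = nd, u
                        heapq.heappush(heap, (nd, v))
            return parent

        def minimum_spanning_tree(graph, root):
            """Prim: parent pointers of a minimum spanning tree grown from `root`."""
            parent, best = {}, {root: 0}
            heap, in_tree = [(0, root, None)], set()
            while heap:
                w, u, p = heapq.heappop(heap)
                if u in in_tree:
                    continue
                in_tree.add(u)
                parent[u] = p
                for v, c in graph[u].items():
                    if v not in in_tree and c < best.get(v, float("inf")):
                        best[v] = c
                        heapq.heappush(heap, (c, v, u))
            return parent

        # Toy network: origin server 'S' and three replicas (costs are assumed).
        g = {
            "S": {"A": 4, "B": 1},
            "A": {"S": 4, "B": 2, "C": 5},
            "B": {"S": 1, "A": 2, "C": 6},
            "C": {"A": 5, "B": 6},
        }
        print(shortest_paths_tree(g, "S"))   # validate along cheapest paths to S
        print(minimum_spanning_tree(g, "S")) # minimize total tree edge cost
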
  • A class of multistage conference switching networks for group communication

    Page(s): 228 - 243

    There is a growing demand for network support for group applications, in which messages from one or more senders are delivered to a large number of receivers. Here, we propose a network architecture for supporting a fundamental type of group communication, conferencing. A conference refers to a group of members in a network who communicate with each other within the group. We consider adopting a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, composed of switch modules with fan-in and fan-out capability, as a conference network that supports multiple disjoint conferences. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, which is the maximum number of conflicting parties competing for a single interstage link when multiple disjoint conferences are simultaneously present in the network. Our results show that, for a network of size n × n, the multiplicities of routing conflicts are small constants (between 2 and 4) for an omega network or an indirect binary cube network, while it can be as large as √n/q + 1 for a baseline network, where q is the minimum allowable conference size. Thus, our design for conference networks is based on an omega network or an indirect binary cube network. We also develop fast self-routing algorithms for setting up routing paths in the newly designed conference networks. Such an n × n conference network has O(log n) routing time and communication delay and O(n log n) hardware cost. The conference networks are superior to existing designs in terms of routing complexity, communication delay, and hardware cost. The proposed conference network is rearrangeably nonblocking in general, and strictly nonblocking under a certain conference service policy. It can be used in applications that require efficient or real-time group communication.

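    The paper's conference (group) routing algorithms are not reproduced here. As background for the self-routing idea, the sketch below traces ordinary point-to-point destination-tag routing through an omega network of plain 2 × 2 switches, where each stage consumes one destination bit after a perfect-shuffle interconnection.

        def omega_route(src, dst, n_bits):
            """Destination-tag self-routing through an n_bits-stage omega network
            (N = 2**n_bits ports).  Returns the link labels the packet occupies
            after each stage, ending at `dst`."""
            N = 1 << n_bits
            assert 0 <= src < N and 0 <= dst < N
            label, path = src, [src]
            for stage in range(n_bits):
                # Perfect shuffle between stages: rotate the label left by one bit.
                label = ((label << 1) | (label >> (n_bits - 1))) & (N - 1)
                # The 2x2 switch forwards to its upper or lower output according to
                # the next destination bit (MSB first), which overwrites the low bit.
                d_bit = (dst >> (n_bits - 1 - stage)) & 1
                label = (label & ~1) | d_bit
                path.append(label)
            return path

        # 8 x 8 omega network: route from input 3 to output 6 in 3 stages.
        print(omega_route(3, 6, 3))   # [3, 7, 7, 6] -> arrives at port 6
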
  • Exploiting global knowledge to achieve self-tuned congestion control for k-ary n-cube networks

    Page(s): 257 - 272

    Network performance in tightly coupled multiprocessors typically degrades rapidly beyond network saturation. Consequently, designers must keep a network below its saturation point by reducing the load on the network. Congestion control via source throttling, a common technique to reduce the network load, prevents new packets from entering the network in the presence of congestion. Unfortunately, prior schemes for source throttling either lack vital global information about the network needed to make the correct decision (whether or not to throttle) or depend on specific network parameters or communication patterns. This paper presents a global-knowledge-based, self-tuned congestion control technique that prevents saturation at high loads across different communication patterns for k-ary n-cube networks. Our design is composed of two key components. First, we use global information about the network to obtain a timely estimate of network congestion. We compare this estimate to a threshold value to determine when to throttle packet injection. The second component is a self-tuning mechanism that automatically determines appropriate threshold values based on throughput feedback. The combination of these two techniques provides high performance under heavy load, does not penalize performance under light load, and gracefully adapts to changes in communication patterns.

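    The paper's actual congestion estimator and tuning rule are not given in this abstract. The sketch below only illustrates the two components it describes, using assumed signals: a global buffer-occupancy estimate compared against a threshold at injection time, and a simple hill-climbing rule that adjusts the threshold from throughput feedback.

        class SelfTunedThrottle:
            """Illustrative source-throttling controller: block new packet injection
            when a global congestion estimate exceeds a threshold, and hill-climb
            the threshold using delivered-throughput feedback."""

            def __init__(self, threshold=0.5, step=0.05):
                self.threshold = threshold     # fraction of full network buffers
                self.step = step               # tuning step size
                self.best_throughput = 0.0
                self.direction = +1            # current search direction

            def allow_injection(self, buffer_occupancy):
                """Called per injection attempt with the current global estimate
                (0.0 = empty network, 1.0 = all buffers full); both are assumed
                to come from the simulator or hardware counters."""
                return buffer_occupancy <= self.threshold

            def tune(self, throughput):
                """Called once per sampling window with the measured throughput.
                If throughput dropped, reverse the search direction."""
                if throughput < self.best_throughput:
                    self.direction = -self.direction
                self.best_throughput = max(self.best_throughput, throughput)
                self.threshold = min(1.0, max(0.05,
                                     self.threshold + self.direction * self.step))

        # Sketch of use inside a node model (signal values here are made up):
        ctl = SelfTunedThrottle()
        if ctl.allow_injection(buffer_occupancy=0.62):
            pass  # inject the packet; otherwise hold it at the source
        ctl.tune(throughput=0.48)
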
  • Scheduling divisible loads on heterogeneous linear daisy chain networks with arbitrary processor release times

    Page(s): 273 - 288

    The problem of distributing and processing a divisible load in a heterogeneous linear network of processors with arbitrary processor release times is considered. A divisible load is very large in size and has computationally intensive CPU requirements. Further, it has the property that it can be partitioned arbitrarily into any number of portions that can be scheduled onto processors independently for computation. The load is assumed to arrive at one of the farthest-end processors, referred to as boundary processors, for processing. The processors in the network are assumed to have nonzero release times, i.e., time instants from which they become available for processing the divisible load. Our objective is to design a load distribution strategy that takes the release times of the processors into account in such a way that the total processing time of the load is minimized. We consider two generic cases: one in which all processors have identical release times and one in which the release times are arbitrary. We adopt both the single-installment and multi-installment strategies proposed in the divisible load scheduling literature in our design of load distribution strategies, wherever necessary, to achieve a minimum processing time. Finally, when optimal strategies cannot be realized, we propose two heuristic strategies, one for the identical-release-times case and the other for the nonidentical case. Several conditions are derived to determine whether or not an optimal load distribution exists, and illustrative examples are provided for ease of understanding.

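    The paper models communication delays along the chain links and multi-installment delivery, none of which is reproduced here. The sketch below only illustrates the release-time idea under the simplifying assumption of negligible communication time: split the load so that every participating processor, each available only from its release time, finishes at the same instant, and drop processors that would be released too late to help.

        def equal_finish_partition(load, speeds, release):
            """Illustrative divisible-load split ignoring communication delays.
            Processor i needs speeds[i] time units per unit load and becomes
            available at release[i].  Solving alpha_i = (T - release[i]) / speeds[i]
            with sum(alpha_i) = load gives
                T = (load + sum(release[i]/speeds[i])) / sum(1/speeds[i]).
            Processors whose release time is not below T are dropped and T is
            recomputed (the earliest-released processor is never dropped)."""
            used = list(range(len(speeds)))
            while True:
                inv = sum(1.0 / speeds[i] for i in used)
                T = (load + sum(release[i] / speeds[i] for i in used)) / inv
                late = [i for i in used if release[i] >= T]
                if not late:
                    break
                used = [i for i in used if i not in late]
            alpha = [0.0] * len(speeds)
            for i in used:
                alpha[i] = (T - release[i]) / speeds[i]
            return alpha, T

        # Three processors, unit load, different speeds and release times (made up).
        alpha, T = equal_finish_partition(1.0, speeds=[1.0, 2.0, 4.0],
                                          release=[0.0, 0.1, 0.3])
        print(alpha, T)   # fractions sum to 1.0; all used processors finish at T
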
  • Eligibility-based Round Robin for fair and efficient packet scheduling in wormhole switching networks

    Page(s): 244 - 256

    Interconnection networks of parallel systems are used for servicing traffic generated by different applications, often belonging to different users. When multiple users contend for channel bandwidth, fairness in bandwidth sharing becomes a key requirement. Enforcing a fair sharing of channel bandwidth improves flow isolation, thus preventing misbehaving flows from affecting the performance of other flows. We present a novel packet scheduling algorithm, called eligibility-based round robin (EBRR), devised to provide fair queueing in interconnection networks. EBRR meets the constraints imposed by wormhole switching, the most popular switching technique in interconnection networks of parallel systems. It can also be applied to packet-switched wide-area networks (WANs), such as IP and ATM networks. We show that EBRR has O(1) complexity and better delay and fairness properties than existing algorithms of comparable complexity. We also investigate the means for assessing the fairness of a scheduler: we show that using the relative fairness bound as a fairness measure may lead to erroneous results. We then propose an alternative measure, called the generalized relative fairness bound, that allows fairness to be assessed more precisely.

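    The EBRR algorithm itself is not specified in this abstract, so the sketch below is not EBRR. It shows classic deficit round robin (DRR), a well-known O(1) round-robin fair scheduler, only to make concrete what constant-time, per-flow fair bandwidth sharing looks like in this setting.

        from collections import deque

        class DeficitRoundRobin:
            """Illustrative only -- classic deficit round robin, NOT the paper's
            EBRR.  Each backlogged flow is visited in round-robin order, earns
            `quantum` bytes of credit per visit, and sends packets while they fit;
            leftover credit carries over to the next round."""

            def __init__(self, quantum=512):
                self.quantum = quantum   # bytes of credit granted per visit
                self.queues = {}         # flow id -> deque of packet sizes (bytes)
                self.deficit = {}        # flow id -> unused credit (bytes)
                self.credited = {}       # flow id -> already credited this visit?
                self.active = deque()    # round-robin ring of backlogged flows

            def enqueue(self, flow, size):
                queue = self.queues.setdefault(flow, deque())
                self.deficit.setdefault(flow, 0)
                queue.append(size)
                if flow not in self.active:
                    self.active.append(flow)

            def dequeue(self):
                """Return (flow, packet_size) of the next packet to transmit,
                or None if no flow is backlogged."""
                while self.active:
                    flow = self.active[0]
                    queue = self.queues[flow]
                    if not self.credited.get(flow, False):
                        self.deficit[flow] += self.quantum
                        self.credited[flow] = True
                    if queue[0] <= self.deficit[flow]:
                        size = queue.popleft()
                        self.deficit[flow] -= size
                        if not queue:               # flow drained: leave the ring
                            self.deficit[flow] = 0
                            self.credited[flow] = False
                            self.active.popleft()
                        return flow, size
                    # Head packet does not fit: end this visit, keep the deficit,
                    # and move the flow to the back of the ring.
                    self.credited[flow] = False
                    self.active.rotate(-1)
                return None

        # Two flows sharing a link: the 400-byte flow is not starved by the 900s.
        sched = DeficitRoundRobin(quantum=1000)
        for size in (900, 900):
            sched.enqueue("A", size)
        sched.enqueue("B", 400)
        print(sched.dequeue(), sched.dequeue(), sched.dequeue())
        # -> ('A', 900) ('B', 400) ('A', 900)
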
  • IEEE Transactions on Parallel and Distributed Systems - Table of contents

    Page(s): 01
  • Submission of manuscripts for review

    Page(s): 02
  • Editor's note

    Page(s): 193 - 195
  • Information for authors

    Page(s): 289
  • [Front cover]

    Page(s): 290

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology