Parallel and Distributed Systems, IEEE Transactions on

Issue 3 • Date March 2003

  • On the parallel execution time of tiled loops

    Page(s): 307 - 321

    Many computationally intensive programs, such as those for differential equations, spatial interpolation, and dynamic programming, spend a large portion of their execution time in multiply-nested loops that have a regular stencil of data dependences. Tiling is a well-known compiler optimization that improves performance on such loops, particularly for computers with a multilevel hierarchy of parallelism and memory. Most previous work on tiling is limited in at least one of the following ways: it handles only nested loops of depth two, orthogonal tiling, or rectangular tiles. In our work, we tile loop nests of arbitrary depth using polyhedral tiles. We derive a prediction formula for the execution time of such tiled loops, which can be used by a compiler to automatically determine the tiling parameters that minimize the execution time. We also explain the notion of rise, a measure of the relationship between the shape of the tiles and the shape of the iteration space generated by the loop nest. The rise is a powerful tool in predicting the execution time of a tiled loop. It allows us to reason about how the tiling affects the length of the longest path of dependent tiles, which is a measure of the execution time of a tiling. We use a model of the tiled iteration space that allows us to determine the length of the longest path of dependent tiles using linear programming. Using the rise, we derive a simple formula for the length of the longest path of dependent tiles in rectilinear iteration spaces, a subclass of the convex iteration spaces, and show how to choose the optimal tile shape.
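
    As a rough illustration of the "longest path of dependent tiles" idea, the sketch below handles only the simplest case, which is much narrower than the polyhedral tiles the abstract describes: a two-dimensional rectangular iteration space, rectangular tiles, and the two unit dependences (1,0) and (0,1). The function names and the choice of dependences are assumptions made for this example. It is not the paper's prediction formula.

```python
def tile_grid(n1, n2, b1, b2):
    """Number of tiles along each dimension for an n1 x n2 iteration
    space cut into b1 x b2 rectangular tiles (integer ceiling division)."""
    return -(-n1 // b1), -(-n2 // b2)


def longest_tile_path(n1, n2, b1, b2):
    """Length, in tiles, of the longest chain of dependent tiles.

    Assumes every iteration (i, j) depends on (i-1, j) and (i, j-1),
    so tile (r, c) depends on tiles (r-1, c) and (r, c-1).
    """
    rows, cols = tile_grid(n1, n2, b1, b2)
    longest = [[1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            preds = []
            if r > 0:
                preds.append(longest[r - 1][c])
            if c > 0:
                preds.append(longest[r][c - 1])
            if preds:
                longest[r][c] = 1 + max(preds)
    return longest[-1][-1]


if __name__ == "__main__":
    # For this restricted case the result equals rows + cols - 1.
    print(tile_grid(100, 60, 10, 20))
    print(longest_tile_path(100, 60, 10, 20))
```

    Under these assumptions the dynamic program simply recovers the wavefront count rows + cols - 1. Larger tiles shorten the chain but leave less parallelism per wavefront, which is the kind of trade-off a tile-size model has to weigh.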

  • Constructing edge-disjoint spanning trees in product networks

    Page(s): 213 - 221

    A Cartesian product network is obtained by applying the cross operation on two graphs. We study the problem of constructing the maximum number of edge-disjoint spanning trees (abbreviated to EDSTs) in Cartesian product networks. Let G=(VG, EG) be a graph having n1 EDSTs and F=(VF, EF) be a graph having n2 EDSTs. Two methods are proposed for constructing EDSTs in the Cartesian product of G and F, denoted by G×F. The graph G has t1=|EG|-n1(|VG|-1) more edges than are necessary for constructing n1 EDSTs in it, and the graph F has t2=|EF|-n2(|VF|-1) more edges than are necessary for constructing n2 EDSTs in it. By assuming that t1≥n1 and t2≥n2, our first construction shows that n1+n2 EDSTs can be constructed in G×F. Our second construction does not need any assumption and it constructs n1+n2-1 EDSTs in G×F. By applying the proposed methods, it is easy to construct the maximum numbers of EDSTs in many important Cartesian product networks, such as hypercubes, tori, generalized hypercubes, mesh connected trees, and hyper Petersen networks.
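
    The counting argument in this abstract can be checked with a few lines of arithmetic. The sketch below does not build any trees. It only computes the surplus edge counts t = |E| - n(|V| - 1) and reports how many EDSTs the two stated constructions would yield in G×F, next to the trivial upper bound floor(|E| / (|V| - 1)). The tuple layout and function names are assumptions made for this example.

```python
def surplus_edges(num_vertices, num_edges, num_trees):
    """Edges left over after reserving num_trees spanning trees,
    each of which uses num_vertices - 1 edges."""
    return num_edges - num_trees * (num_vertices - 1)


def product_size(g, f):
    """(vertices, edges) of the Cartesian product of g and f.
    Each graph is given as (num_vertices, num_edges, num_trees)."""
    vg, eg, _ = g
    vf, ef, _ = f
    return vg * vf, vg * ef + vf * eg


def edsts_in_product(g, f):
    """EDST count guaranteed by the abstract's two constructions."""
    n1, n2 = g[2], f[2]
    t1, t2 = surplus_edges(*g), surplus_edges(*f)
    if t1 < 0 or t2 < 0:
        raise ValueError("a graph cannot hold that many spanning trees")
    if t1 >= n1 and t2 >= n2:
        return n1 + n2      # first construction
    return n1 + n2 - 1      # second construction, no assumption needed


def edge_count_upper_bound(num_vertices, num_edges):
    """No graph can have more EDSTs than floor(|E| / (|V| - 1))."""
    return num_edges // (num_vertices - 1)


if __name__ == "__main__":
    cycle3 = (3, 3, 1)   # triangle: one spanning tree, one spare edge
    k2 = (2, 1, 1)       # single edge: one spanning tree, no spare edge
    for g, f in [(cycle3, cycle3), (k2, k2)]:
        v, e = product_size(g, f)
        print(edsts_in_product(g, f), edge_count_upper_bound(v, e))
```

    In both small cases (a 3×3 torus and a 4-cycle) the constructed count meets the edge-count upper bound, so it is the maximum.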

  • Channel assignment with separation for interference avoidance in wireless networks

    Page(s): 222 - 235

    Given an integer σ>1, a vector (δ1, δ2,..., δσ-1) of nonnegative integers, and an undirected graph G=(V, E), an L(δ1, δ2,..., δσ-1)-coloring of G is a function f from the vertex set V to a set of nonnegative integers, such that |f(u)-f(v)|≥δi, if d(u,v)=i, for 1≤i≤σ-1, where d(u,v) is the distance between the vertices u and v. An optimal L(δ1, δ2,..., δσ-1)-coloring for G is one using the smallest range λ of integers over all such colorings. This problem has relevant applications in channel assignment for interference avoidance in wireless networks, where channels (i.e., colors) assigned to interfering stations (i.e., vertices) at distance i must be at least δi apart, while the same channel can be reused in vertices whose distance is at least σ. In particular, two versions of the coloring problem, L(2, 1, 1) and L(δ1, 1,..., 1), are considered. Since these versions of the problem are NP-hard for general graphs, efficient algorithms for finding optimal colorings are provided for specific graphs modeling realistic wireless networks, including rings, bidimensional grids, and cellular grids.
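
    The separation constraint is easy to test mechanically: for every pair of vertices at distance i < σ, the assigned channels must differ by at least δi. The sketch below is a brute-force checker, not one of the paper's coloring algorithms. The adjacency-dictionary representation and the function names are assumptions made for this example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_valid_coloring(adj, deltas, f):
    """True if f is an L(deltas)-coloring of the graph adj.

    deltas[i - 1] is the minimum channel separation required for
    vertices at distance i; pairs farther apart than len(deltas)
    are unconstrained and may reuse a channel.
    """
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u and d <= len(deltas) and abs(f[u] - f[v]) < deltas[d - 1]:
                return False
    return True


def ring(n):
    """Adjacency dictionary of a ring with n >= 3 vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    good = [0, 2, 4, 1, 3, 5]   # channels listed in ring order
    bad = [0, 1, 2, 3, 4, 5]    # neighbours differ by only 1
    print(is_valid_coloring(ring(6), (2, 1, 1), good))
    print(is_valid_coloring(ring(6), (2, 1, 1), bad))
```

    The checker runs one breadth-first search per vertex, which is fine for small examples but is not meant as an efficient way to find colorings.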

  • Analysis on a mobile agent-based algorithm for network routing and management

    Page(s): 193 - 202

    Ant routing is a method for network routing in agent technology. Although its effectiveness and efficiency have been demonstrated and reported in the literature, its properties have not yet been well studied. This paper presents some preliminary analysis of an ant algorithm with regard to its population growth property and jumping behavior. The results show that as long as the value max, {iΩj|} is known, the practitioner is able to design the algorithm parameters, such as the number of agents being created for each request, k, and the maximum allowable number of jumps of an agent, in order to meet the network constraint.

  • An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration

    Page(s): 236 - 247

    Effective scheduling strategies to improve response times, throughput, and utilization are an important consideration in large supercomputing environments. Parallel machines in these environments have traditionally used space-sharing strategies to accommodate multiple jobs at the same time by dedicating the nodes to a single job until it completes. This approach, however, can result in low system utilization and large job wait times. This paper discusses three techniques that can be used beyond simple space-sharing to improve the performance of large parallel systems. The first technique we analyze is backfilling, the second is gang-scheduling, and the third is migration. The main contribution of this paper is an analysis of the effects of combining the above techniques. Using extensive simulations based on detailed models of realistic workloads, the benefits of combining the various techniques are shown over a spectrum of performance criteria.

  • Probabilistic reliable dissemination in large-scale systems

    Page(s): 248 - 258

    The growth of the Internet raises new challenges for the design of distributed systems and applications. In the context of group communication protocols, gossip-based schemes have attracted interest as they are scalable, easy to deploy, and resilient to network and process failures. However, traditional gossip-based protocols have two major drawbacks: 1) they rely on each peer having knowledge of the global membership; and 2) being oblivious to the network topology, they can impose a high load on network links when applied to wide-area settings. In this paper, we provide a theoretical analysis of gossip-based protocols which relates their reliability to key system parameters (the system size, failure rates, and number of gossip targets). The results provide guidelines for the design of practical protocols. In particular, they show how reliability can be maintained while alleviating drawback 1) by providing each peer with only a small subset of the total membership information, and drawback 2) by organizing members into a hierarchical structure that reflects their proximity according to some network-related metric. We validate the analytical results by simulations and verify that the hierarchical gossip protocol considerably reduces the load on the network compared to the original, non-hierarchical protocol.

  • Parallel computation of the Euclidean distance transform on a three-dimensional image array

    Page(s): 203 - 212

    In a two- or three-dimensional image array, the computation of the Euclidean distance transform (EDT) is an important task. With the increasing application of 3D voxel images, it is useful to consider the distance transform of a 3D digital image array. Because the EDT computation is a global operation, it is prohibitively time-consuming in image processing. To make the transform computation efficient, parallelism is employed. We first derive several important geometric relations and properties among parallel planes. We then develop a parallel algorithm for the three-dimensional Euclidean distance transform (3D-EDT) on the EREW PRAM computation model. The time complexity of our parallel algorithm is O(log^2 N) for an N×N×N image array, and this is currently the best known result. A generalized parallel algorithm for the 3D-EDT is also proposed. We implement the proposed algorithms sequentially, and their performance exceeds that of the existing algorithms (proposed by Yamada, 1984). Finally, we develop the corresponding parallel programs on both the emulated EREW PRAM model computer and the IBM SP2 to verify the speed-up properties of the proposed algorithms.

  • Statistical real-time communication over Ethernet

    Page(s): 322 - 335

    In order to realize real-time communication over Ethernet or fast Ethernet, one must be able to bound the medium access time within an acceptable limit. The multiple access nature of Ethernet makes it impossible to guarantee a deterministic medium access time (hence, packet-delivery deadlines) to individual stations. However, one can bound the medium access time statistically by limiting the packet-arrival rate at the medium access control (MAC) layer. While considering automated manufacturing systems as the main target application, this paper addresses the connection admission control (CAC) problem for statistically bounding the medium access time of Ethernet. Specifically, a packet is guaranteed to have a medium access time smaller than a predefined bound with a certain probability if the instantaneous packet-arrival rate is kept below a certain threshold. Through a mathematical analysis, we first derived such a threshold. In order to keep the packet-arrival rate under the given threshold, we developed and installed middleware which 1) resides between the transport layer and the Ethernet datalink layer, and 2) smooths packet streams between them. The implementation of this middleware requires only a minimal change in the OS kernel, without modification to the current standard Ethernet MAC protocol or the TCP or UDP/IP stack. In order to solve the CAC problem, we derived the probability of transmitting a packet successfully upon each trial by modeling the MAC protocol, 1-persistent CSMA/CD, and the collision resolution protocol, binary exponential backoff, of Ethernet. Our in-depth simulation results have shown this analytic model to provide a reasonably accurate estimate of packet-loss (or deadline-miss) ratio over fast Ethernet. Finally, we implemented the middleware on the Linux OS, experimentally demonstrating the effectiveness of our approach in providing real-time communication over Ethernet.
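
    For readers unfamiliar with the collision resolution rule mentioned above, the sketch below shows the truncated binary exponential backoff used by standard Ethernet: after the n-th consecutive collision a station waits a random number of slot times drawn uniformly from 0 to 2^min(n, 10) - 1, and the frame is dropped after 16 failed attempts. The function and constant names are assumptions made for this example. The code illustrates only the backoff rule, not the paper's analytic model or its middleware.

```python
import random

BACKOFF_LIMIT = 10   # the exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16   # the frame is discarded after 16 failed attempts


def max_backoff_slots(collision_count):
    """Largest wait, in slot times, after the given number of collisions."""
    return 2 ** min(collision_count, BACKOFF_LIMIT) - 1


def backoff_slots(collision_count, rng=random):
    """Random wait, in slot times, before the next transmission attempt.

    rng is any object with a randrange method, for example the random
    module itself or a seeded random.Random instance.
    """
    if not 1 <= collision_count < ATTEMPT_LIMIT:
        raise ValueError("frame is dropped once the attempt limit is reached")
    return rng.randrange(max_backoff_slots(collision_count) + 1)


if __name__ == "__main__":
    rng = random.Random(2003)  # seeded so that repeated runs are reproducible
    for n in (1, 2, 3, 10, 15):
        print(n, max_backoff_slots(n), backoff_slots(n, rng))
```

    Because the wait is random and its range doubles with each collision, the access delay of an individual packet has no hard upper bound short of the drop limit, which is why the abstract speaks of statistical rather than deterministic guarantees.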

  • Interagent communication and synchronization support in the DaAgent mobile agent-based computing system

    Page(s): 290 - 306

    This paper describes the design, implementation, and evaluation of interagent communication and synchronization models in the DaAgent mobile agent-based computing system. Based on the requirements of some sample Internet computing applications, eight system-level models of interagent communication and synchronization are proposed. A new synchronization mechanism called location synchronization that is relevant for interacting mobile agents is also proposed. This paper evaluates the eight models based on their utility, performance, level of communication and synchronization support, and applicability in the Internet computing environment. A prototype implementation and detailed performance evaluation of these models based on two interacting, multiagent applications are also presented.

  • Processor allocation in mesh multiprocessors using the leapfrog method

    Page(s): 276 - 289

    The mesh-connected multiprocessor has become popular because of its simple and regular structure. We first propose a new data structure, the R-array, to represent the mesh. Each element in the R-array stores statistical information about the occupancy of the mesh. This statistical information can direct the allocation process to jump to the processors that can serve as a base of a free submesh. Based on a simple and reasonable assumption, we develop a stochastic process to analyze the behavior of the proposed scheme. The proposed scheme is the first whose probabilities of locating free submeshes under different workloads are precisely computed. These results can be applied to each full-recognition scheme. In addition, the execution costs of the proposed scheme can also be accurately calculated. Finally, simulations are performed which show that the proposed schemes are faster than most other schemes.

  • A progressive approach to handling message-dependent deadlock in parallel computer systems

    Page(s): 259 - 275

    Handling deadlocks is essential for providing reliable communication paths between processing nodes in parallel computer systems. The existence of multiple message types and associated inter-message dependencies may cause message-dependent deadlocks in networks that are designed to be free of routing deadlock. Most methods currently used for dealing with message-dependent deadlocks require more system resources than are necessary and/or do not use system resources efficiently. This may have an adverse effect on system performance if resources are scarce. In this paper, we characterize the frequency of message-dependent deadlocks in multiprocessor/multicomputer systems. We also propose a handling technique for message-dependent deadlocks based on progressive deadlock recovery and compare its performance with that of other approaches. Results show that message-dependent deadlocks occur very infrequently under typical circumstances, thus rendering approaches based on avoiding them overly restrictive in the common case. The proposed technique relaxes restrictions considerably, allowing the routing of packets and the handling of message-dependent deadlocks to be much more efficient, particularly when network resources are scarce.


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology