IEEE Transactions on Parallel and Distributed Systems

Issue 3 • March 2008

  • [Front cover]

    Page(s): c1
  • [Inside front cover]

    Page(s): c2
  • Offloading Data Distribution Management to Network Processors in HLA-Based Distributed Simulations

    Page(s): 289 - 298

    The high-level architecture (HLA) standard, developed by the United States Department of Defense, is a key technology for performing distributed simulation. Within the HLA framework, many different simulators (termed federates) may be interconnected to create a single, more complex simulator (a federation). Data distribution management (DDM) is an optional subset of services that controls which federates should receive notification of state modifications made by other federates. A simple DDM implementation usually generates much more traffic than needed, whereas a complex one might introduce too much overhead. In this work, we describe an approach to DDM that delegates a portion of the DDM computation to a processor on the network card, providing more CPU time for other federate and Runtime Infrastructure (RTI) computations while still exploiting the benefits of a complex DDM implementation to reduce the amount of information exchanged.
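
    At its core, any DDM service performs a region-matching test: an update is delivered only to the federates whose subscription regions overlap the update region. The sketch below illustrates that matching step over a one-dimensional routing space; it is a minimal, hypothetical illustration (the names and data layout are ours), not the paper's network-processor implementation.

        def overlaps(a, b):
            """Two intervals (lo, hi) overlap iff neither lies wholly before the other."""
            return a[0] <= b[1] and b[0] <= a[1]

        def ddm_match(update_region, subscriptions):
            """Return the federates that should be notified of this state update."""
            return [fed for fed, region in subscriptions.items()
                    if overlaps(update_region, region)]

        subscriptions = {"fed_a": (0, 10), "fed_b": (5, 20), "fed_c": (30, 40)}
        print(ddm_match((8, 12), subscriptions))  # ['fed_a', 'fed_b']

    Offloading in the paper's spirit would run this filtering on the network interface's processor, so the host CPU sees only the updates that pass the test.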

  • Parallel Implementation of the 2D Discrete Wavelet Transform on Graphics Processing Units: Filter Bank versus Lifting

    Page(s): 299 - 310

    The widespread use of the discrete wavelet transform (DWT) has motivated the development of fast DWT algorithms and their tuning on all sorts of computer systems. Several studies have compared the performance of the two most popular schemes, the filter bank scheme (FBS) and the lifting scheme (LS), and have consistently concluded that LS is the more efficient option. However, no such study exists for streaming processors such as modern Graphics Processing Units (GPUs). Current trends have transformed these devices into powerful stream processors with enough flexibility to perform intensive and complex floating-point calculations. The opportunities opened up by these platforms, together with the growing popularity of the DWT within the computer graphics field, make a new performance comparison of great practical interest. Our study indicates that FBS outperforms LS on current-generation GPUs. In our experiments, the actual FBS gains range between 10 percent and 140 percent, depending on the problem size and the type and length of the wavelet filter. Moreover, design trends suggest higher gains in future-generation GPUs.
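
    For intuition, the two schemes can be contrasted on the simplest wavelet. The NumPy sketch below computes one Haar decomposition level both ways and checks that they agree; it illustrates the two algorithmic structures only and says nothing about the GPU kernels the authors actually benchmarked.

        import numpy as np

        def haar_filter_bank(x):
            """FBS: apply the analysis filter pair, then downsample by two."""
            even, odd = x[0::2], x[1::2]
            approx = (even + odd) / np.sqrt(2.0)   # low-pass branch
            detail = (odd - even) / np.sqrt(2.0)   # high-pass branch
            return approx, detail

        def haar_lifting(x):
            """LS: in-place predict and update steps, then rescaling."""
            even, odd = x[0::2].copy(), x[1::2].copy()
            odd -= even            # predict: detail = odd - even
            even += odd / 2.0      # update: approx = (even + odd) / 2
            return even * np.sqrt(2.0), odd / np.sqrt(2.0)

        x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
        fbs, ls = haar_filter_bank(x), haar_lifting(x)
        assert all(np.allclose(a, b) for a, b in zip(fbs, ls))

    LS performs fewer arithmetic operations, which is why the prior CPU studies favored it; a plausible reading of the paper's result is that the sequential dependence between the predict and update steps maps onto streaming hardware less well than the independent convolutions of FBS.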

  • TROP: A Novel Approximate Link-State Dissemination Framework for Dynamic Survivable Routing in MPLS Networks

    Page(s): 311 - 322

    In this paper, a novel approximate link-state dissemination framework, called TROP, is proposed for shared backup path protection (SBPP) in multiprotocol label switching (MPLS) networks. When performing dynamic explicit survivable routing in a distributed environment, link-state dissemination may cause nontrivial signaling overhead in the process of exploring spare-resource sharing among individual backup label switched paths (LSPs). Several previously reported studies have tackled this problem by striking a compromise between the amount of dissemination and the achievable extent of resource sharing. The paper first summarizes the previously reported schemes into a compact and general link-state dissemination framework by way of singular value decomposition (SVD). To improve the accuracy of the matrix reconstruction and to eliminate overestimation of the sharable spare capacity along each link, a novel SVD approach based on the min-plus algebra (also called tropical semirings) is introduced. Simulation results show that the proposed schemes achieve a lower blocking probability than all the counterpart schemes with the same link-state dissemination complexity. This advantage is gained at the expense of a longer computation time for solving a linear program (LP) in each dissemination cycle at the core nodes. We also consider the stale link-state phenomenon, in which the delay in periodic or event-driven link-state update advertisements causes imprecision in the routing information at the ingress nodes.
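
    The min-plus (tropical) semiring replaces addition with minimum and multiplication with addition, which lets matrix products compose shortest-path-like quantities without overestimation. Below is a minimal sketch of the tropical matrix product that underlies such a decomposition; the actual TROP reconstruction and routing logic are beyond this illustration.

        import numpy as np

        def tropical_matmul(A, B):
            """Min-plus product: C[i, j] = min over k of (A[i, k] + B[k, j])."""
            n, m = A.shape[0], B.shape[1]
            C = np.empty((n, m))
            for i in range(n):
                for j in range(m):
                    C[i, j] = np.min(A[i, :] + B[:, j])
            return C

        A = np.array([[0.0, 3.0], [2.0, 0.0]])
        B = np.array([[0.0, 1.0], [4.0, 0.0]])
        print(tropical_matmul(A, B))  # [[0. 1.], [2. 0.]]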

  • NFS-CD: Write-Enabled Cooperative Caching in NFS

    Page(s): 323 - 333

    We present the network file system with cluster delegation (NFS-CD), an enhancement to NFSv4 that reduces server load and increases the scalability of distributed file systems in computing clusters. The cluster delegation feature of NFS-CD allows data sharing among clients by extending the NFSv4 delegation model so that multiple clients manage a single file without interacting with the server. Based on cluster delegation, we implement a fast-commit primitive, cooperative caching, and the ability to recover the uncommitted updates of a failed computer. NFS-CD supports both read and write operations in the cooperative cache without weakening the consistency model of NFSv4. We have implemented NFS-CD by modifying the Linux NFSv4 client only. Because the server remains unchanged, NFS-CD preserves the simple administration model of NFSv4 and interoperates with standard NFS clients. NFS-CD offers improved performance over NFSv4 at the expense of slightly weaker reliability guarantees. An experimental evaluation of our implementation, using industry-standard benchmarks and application workloads, reveals that NFS-CD reduces server load by more than half. It also demonstrates that, under most workloads, file systems must support writes to the cooperative cache to achieve scalability.
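
    The following toy model conveys the delegation idea: a designated client absorbs reads and writes for a file on behalf of its cluster and flushes them to the server in a single commit. All class and method names here are hypothetical; the real system modifies the Linux NFSv4 client, not a Python layer.

        class Server:
            def __init__(self):
                self.disk = {}
            def read(self, path):
                return self.disk.get(path)
            def write(self, path, data):
                self.disk[path] = data

        class DelegateClient:
            """Holds the cluster delegation for files its peers share."""
            def __init__(self, server):
                self.server = server
                self.pending = {}                  # uncommitted updates
            def write(self, path, data):
                self.pending[path] = data          # absorbed locally, no RPC
            def read(self, path):
                if path in self.pending:           # serve peers from the cache
                    return self.pending[path]
                return self.server.read(path)
            def commit(self, path):
                self.server.write(path, self.pending.pop(path))  # one fast commit

        server = Server()
        client = DelegateClient(server)
        client.write("/scratch/out.dat", b"partial results")  # no server traffic
        client.commit("/scratch/out.dat")                     # single round trip

    Because uncommitted updates live only in client memory, a crash before commit risks losing them; this is the "slightly weaker reliability" trade-off the abstract notes, which NFS-CD mitigates by recovering the uncommitted updates of a failed computer.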

  • On Embedding Hamiltonian Cycles in Crossed Cubes

    Page(s): 334 - 346

    We study the embedding of Hamiltonian cycles in the crossed cube, a prominent variant of the classical hypercube obtained by crossing some straight links of a hypercube, which has attracted much research interest since its proposal. We show that, due to the loss of link-topology regularity, generating Hamiltonian cycles in a crossed cube is a more complicated procedure than in its original counterpart. The paper studies how the crossed links affect an otherwise succinct process for generating a host of well-structured Hamiltonian cycles traversing all nodes. A condition for generating these Hamiltonian cycles in a crossed cube is proposed, and an algorithm is presented that works out a Hamiltonian cycle for a given link permutation. The properties revealed and the algorithm proposed in this paper can help system designers evaluate a candidate network's competence and suitability, balancing regularity against other performance criteria, when choosing an interconnection network.
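
    The "otherwise succinct process" in the ordinary hypercube is worth seeing concretely: a reflected Gray code visits all 2^n vertices so that consecutive labels differ in exactly one bit, which is precisely a Hamiltonian cycle. A minimal sketch of that baseline follows; the paper's contribution is the considerably harder analogue for crossed cubes, which this sketch does not attempt.

        def gray_cycle(n):
            """Hamiltonian cycle of the n-cube as a list of 2**n vertex labels."""
            return [i ^ (i >> 1) for i in range(2 ** n)]

        cycle = gray_cycle(3)
        print(cycle)  # [0, 1, 3, 2, 6, 7, 5, 4]
        # every consecutive pair, wrapping around, differs in exactly one bit
        assert all(bin(cycle[i] ^ cycle[(i + 1) % len(cycle)]).count("1") == 1
                   for i in range(len(cycle)))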

  • Distributed Hashing for Scalable Multicast in Wireless Ad Hoc Networks

    Page(s): 347 - 362

    Several multicast protocols for mobile ad hoc networks have been proposed that build multicast trees using location information available from the Global Positioning System (GPS) or localization algorithms and use geographic forwarding to forward packets down the multicast trees. These stateless multicast protocols carry encoded membership, location, and tree information in each packet and are more efficient and robust than stateful protocols (for example, ADMR and ODMRP), as they avoid the difficulty of maintaining distributed state in the presence of frequent topology changes. However, current stateless multicast protocols do not scale to large groups because of the per-packet encoding overhead and the centralized group membership and location management. We present the hierarchical rendezvous point multicast (HRPM) protocol, which significantly improves the scalability of stateless multicast with respect to group size. HRPM rests on two key design ideas: 1) hierarchical decomposition of a large group into a hierarchy of recursively organized, manageable-sized subgroups and 2) the use of distributed geographic hashing to construct and maintain such a hierarchy at virtually no cost. Our detailed simulations demonstrate that HRPM achieves significantly enhanced scalability and performance owing to its hierarchical organization and distributed hashing.
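
    The distributed geographic hashing idea can be sketched in a few lines: a group identifier hashes deterministically to a point in the deployment area, and the node currently closest to that point serves as the group's rendezvous point, so no directory service is needed. The grid dimensions and key format below are illustrative assumptions, and HRPM's recursive cell decomposition is omitted.

        import hashlib

        def hash_to_point(group_id, width=1000, height=1000):
            """Map a group ID to a deterministic (x, y) rendezvous location."""
            digest = hashlib.sha1(group_id.encode()).digest()
            return (int.from_bytes(digest[:4], "big") % width,
                    int.from_bytes(digest[4:8], "big") % height)

        def rendezvous_node(group_id, node_positions):
            """The node geographically closest to the hashed point serves the group."""
            px, py = hash_to_point(group_id)
            return min(node_positions,
                       key=lambda n: (n[0] - px) ** 2 + (n[1] - py) ** 2)

        nodes = [(120, 80), (640, 430), (900, 910)]
        print(rendezvous_node("group-42", nodes))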

  • DCMP: A Distributed Cycle Minimization Protocol for Peer-to-Peer Networks

    Page(s): 363 - 377

    Broadcast-based peer-to-peer (P2P) networks, including flat (for example, Gnutella) and two-layer superpeer implementations (for example, Kazaa), are extremely popular owing to their simplicity, ease of deployment, and versatility. The unstructured network topology, however, contains many cyclic paths, which introduce numerous duplicate messages into the system. Although such messages can be identified and ignored, they still consume a large proportion of the bandwidth and other resources, causing bottlenecks in the entire network. In this paper, we describe the distributed cycle minimization protocol (DCMP), a dynamic, fully decentralized protocol that significantly reduces duplicate messages by eliminating unnecessary cycles. As queries are transmitted through the peers, DCMP identifies the problematic paths and attempts to break the cycles while maintaining the connectivity of the network. In order to preserve the fault resilience and load-balancing properties of unstructured P2P systems, DCMP avoids creating a hierarchical organization. Instead, it applies cycle elimination symmetrically around some powerful peers to keep the average path length small. The overall structure is constructed quickly and with very low overhead. With the information collected during this process, distributed maintenance is performed efficiently even if peers quit the system without notification. Experimental results from our simulator and from a prototype implementation on PlanetLab confirm that DCMP significantly improves the scalability of unstructured P2P systems without sacrificing their desirable properties. Moreover, owing to its simplicity, DCMP can be easily implemented in various existing P2P systems and is orthogonal to the search algorithms.
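
    The detection half of the idea is simple to sketch: each peer remembers the IDs of flooded queries it has seen and counts, per incoming link, how many redundant copies arrive; links that keep delivering duplicates lie on cycles and are candidates for elimination. This toy fragment models only that bookkeeping, not DCMP's cycle-breaking negotiation.

        from collections import Counter

        class Peer:
            def __init__(self):
                self.seen = set()
                self.duplicates = Counter()       # incoming link -> duplicate count
            def receive(self, msg_id, from_link):
                if msg_id in self.seen:
                    self.duplicates[from_link] += 1   # a cycle delivered this copy
                    return False                      # drop; do not re-forward
                self.seen.add(msg_id)
                return True                           # first copy: forward onward

        p = Peer()
        p.receive("query-1", "link-a")
        p.receive("query-1", "link-b")      # duplicate arrives over a cyclic path
        print(p.duplicates.most_common(1))  # [('link-b', 1)]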

  • Overhead Analysis of Scientific Workflows in Grid Environments

    Page(s): 378 - 393

    Scientific workflows are a topic of great interest in the grid community, which sees in the workflow model an attractive paradigm for programming distributed wide-area grid infrastructures. Traditionally, grid workflow execution is approached as a pure best-effort scheduling problem that maps activities onto grid processors using appropriate optimization or local matchmaking heuristics such that the overall execution time is minimized. Even though such heuristics often deliver effective results, execution in dynamic and unpredictable grid environments is prone to severe performance losses that must be understood in order to minimize completion time and to use high-performance resources efficiently. In this paper, we propose a new systematic approach to help scientists and middleware developers understand the most severe sources of performance loss that occur when executing scientific workflows in dynamic grid environments. We introduce an ideal model of the lowest execution time a workflow can achieve and explain the difference from the real measured grid execution time through a hierarchy of performance overheads for grid computing. We describe how to systematically measure and compute the overheads, from individual activities up to larger workflow regions, and adjust well-known parallel processing metrics, including speedup and efficiency, to the scope of grid computing. We present a distributed online tool for computing and analyzing the performance overheads in real time based on event correlation techniques and introduce several performance contracts as quality-of-service parameters to be enforced during workflow execution beyond traditional best-effort practices. We illustrate our method through postmortem and online performance analysis of two real-world workflow applications executed in the Austrian grid environment.
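
    The accounting at the heart of this approach can be stated in one formula: the measured makespan equals the ideal execution time plus the sum of attributed overheads, from which speedup and efficiency follow. The overhead categories in the sketch below are illustrative placeholders, not the paper's full hierarchy.

        def analyze(t_sequential, t_ideal, overheads, n_processors):
            """Decompose a measured grid makespan into ideal time plus overheads."""
            t_real = t_ideal + sum(overheads.values())
            return {
                "measured_makespan": t_real,
                "loss_vs_ideal": t_real - t_ideal,
                "speedup": t_sequential / t_real,
                "efficiency": t_sequential / (t_real * n_processors),
            }

        overheads = {"scheduling": 14.0, "data_transfer": 22.5, "queueing": 31.0}
        print(analyze(t_sequential=900.0, t_ideal=120.0,
                      overheads=overheads, n_processors=8))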

  • Reducing Queue Oscillation at a Congested Link

    Page(s): 394 - 407

    Queue length oscillation at a congested link causes many undesirable effects, such as large delay jitter, link underutilization, and bursty packet drops. The main reason for this oscillation is that most queue management schemes determine the drop probability based on the current traffic, without considering the impact of that drop probability on future traffic. In this paper, we propose a new active queue management (AQM) scheme to reduce queue oscillation and achieve a stable queue length. The proposed scheme measures the current arrival and drop rates and uses them to estimate the next arrival rate. Based on this estimate, the scheme calculates the drop probability expected to yield a stable queue length. We present extensive simulations with various topologies and offered traffic to evaluate the performance of the proposed scheme. The results show that it remarkably reduces queue length oscillation compared to other well-known AQM schemes. It is also shown that the proposed scheme improves fairness among TCP flows, owing to the stable drop probability, and maintains high utilization with a small queue length.
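
    A toy controller in the spirit of the scheme: estimate the next arrival rate from the current arrival and drop rates (dropped TCP traffic backs off), compute the traffic budget that would steer the queue toward a target length, and drop just enough of the excess. The specific update rule here is an illustrative guess, not the paper's estimator.

        def drop_probability(arrival_rate, drop_rate, service_rate,
                             queue_len, target_len, interval):
            # sources that saw drops back off, so discount the dropped traffic
            est_arrival = arrival_rate - drop_rate
            # what the link can absorb next interval: service plus queue correction
            budget = service_rate + (target_len - queue_len) / interval
            if est_arrival <= budget:
                return 0.0
            return min(1.0, (est_arrival - budget) / est_arrival)

        p = drop_probability(arrival_rate=1200.0, drop_rate=100.0,
                             service_rate=1000.0, queue_len=80.0,
                             target_len=50.0, interval=1.0)
        print(round(p, 3))  # 0.118: drop ~12% to pull the queue back to target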

  • Scalability Analysis of the Hierarchical Architecture for Distributed Virtual Environments

    Page(s): 408 - 417

    A distributed virtual environment (DVE) is a shared virtual environment in which multiple users at their workstations interact with one another over a network. Some of these systems may support a large number of users, for example, multiplayer online games. An important issue is how well the system scales as the number of users increases. In terms of scalability, a promising system architecture is a two-level hierarchical architecture. At the lower level, multiple servers are deployed, each interacting with its assigned users. At the higher level, the servers ensure that their copies of the virtual environment are kept as consistent as possible. Although the two-level architecture is believed to have good scalability properties, not much is known about its performance characteristics. In this paper, we develop a performance model for the two-level architecture and obtain analytic results on the workload experienced by each server. Our results provide valuable insights into the scalability of the architecture. We also investigate the issue of consistency and develop a novel technique to achieve weak consistency among the copies of the virtual environment at the various servers. Simulation results on the consistency/scalability trade-off are presented.
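
    A back-of-the-envelope sketch of the scalability intuition: with N users spread over S servers, each server handles the pairwise interactions of roughly N/S local users plus synchronization traffic with the other S - 1 servers. Both cost terms and their constants below are hypothetical stand-ins, not the paper's analytic model.

        def server_workload(n_users, n_servers, c_user=1.0, c_sync=5.0):
            local = (n_users / n_servers) ** 2 * c_user   # intra-server interactions
            sync = (n_servers - 1) * c_sync               # inter-server consistency
            return local + sync

        for s in (1, 2, 4, 8):
            print(s, server_workload(n_users=400, n_servers=s))
        # adding servers shrinks the quadratic local term but grows the sync term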

  • Hierarchical Scheduling for Symmetric Multiprocessors

    Page(s): 418 - 431

    Hierarchical scheduling has been proposed as a technique for achieving aggregate resource partitioning among related groups of threads and applications in uniprocessor and packet scheduling environments. Existing hierarchical schedulers are not easily extensible to multiprocessor environments because 1) they do not account for the inherent parallelism of a multiprocessor system when partitioning resources and 2) they can result in unbounded unfairness or starvation if applied to a multiprocessor system in a naive manner. In this paper, we present hierarchical multiprocessor scheduling (H-SMP), a novel hierarchical CPU scheduling algorithm designed for symmetric multiprocessor (SMP) platforms. The novelty of this algorithm lies in its combination of space and time multiplexing to achieve the desired bandwidth partition among the nodes of the hierarchical scheduling tree. The algorithm can also incorporate existing proportional-share algorithms as auxiliary schedulers to achieve efficient hierarchical CPU partitioning. In addition, we present a generalized weight feasibility constraint that specifies the limit on the achievable CPU bandwidth partitioning in a multiprocessor hierarchical framework and propose a hierarchical weight readjustment algorithm designed to transparently satisfy this feasibility constraint. We evaluate the properties of H-SMP using hierarchical surplus fair scheduling (H-SFS), an instantiation of H-SMP that employs surplus fair scheduling (SFS) as the auxiliary algorithm. This evaluation, carried out through a simulation study, shows that H-SFS provides better fairness properties in multiprocessor environments than existing algorithms and their naive extensions.
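
    The weight feasibility constraint has a crisp form: on P CPUs, a single node can consume at most one processor's worth of bandwidth, so each weight share must satisfy w_i / sum(w) <= 1/P, and the condition applies recursively to the heaviest remaining nodes. Below is a minimal clamping sketch in the spirit of, but not identical to, the paper's hierarchical readjustment algorithm.

        def readjust(weights, p):
            """Clamp weights, largest first, so no share exceeds one CPU."""
            w = sorted(weights, reverse=True)
            for i in range(len(w)):
                rest = sum(w[i + 1:])
                # require w[i] / (w[i] + rest) <= 1 / (p - i), i.e.
                # w[i] <= rest / (p - i - 1), for the p - i CPUs still at stake
                if p - i > 1 and w[i] * (p - i - 1) > rest:
                    w[i] = rest / (p - i - 1)
            return w

        print(readjust([10.0, 1.0, 1.0], p=2))  # [2.0, 1.0, 1.0]: top share is 1/2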

  • IEEE Computer Society Digital Library [advertisement]

    Page(s): 432
  • TPDS Information for authors

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology