
15th IEEE International Symposium on High Performance Distributed Computing (2006)

Date: 19-23 June 2006

Displaying Results 1 - 25 of 70
  • Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing

    Page(s): i - xxii
  • Keynote: The Renaissance of Decentralized Systems

    Page(s): 1 - 4

    Provides an abstract of the keynote presentation and a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.

  • Peer-to-Peer Systems and Overlay Networks

    Page(s): 5 - 6


  • Peer to peer size estimation in large and dynamic networks: A comparative study

    Page(s): 7 - 17

    As the size of distributed systems keeps growing, the peer-to-peer communication paradigm has been identified as the key to scalability. Peer-to-peer overlay networks are characterized by their self-organizing capabilities, resilience to failure, and fully decentralized control. In a peer-to-peer overlay, no entity has global knowledge of the system. As much as this property is essential to ensure scalability, monitoring the system under such circumstances is a complex task. Yet, estimating the size of the system is a core functionality for many distributed applications, for parameter setting or monitoring purposes. In this paper, we propose a comparative study of three algorithms that estimate, in a fully decentralized way, the size of a peer-to-peer overlay. The candidate approaches are generally applicable irrespective of the underlying structure of the overlay. The paper reports a head-to-head comparison of these system size estimation algorithms. The simulations were conducted using the same simulation framework and inputs, and highlight the differences in cost and accuracy of the estimation between the algorithms in both static and dynamic settings.

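    The entry above compares fully decentralized algorithms for estimating overlay size. As a rough, generic sketch of the underlying idea only (gossip-based averaging over random peers; an illustration, not necessarily one of the paper's three candidates), each node repeatedly averages a local value with a random peer; because one node starts at 1 and the rest at 0, every value converges to 1/N:

```python
import random

def estimate_size(num_nodes: int, rounds: int = 30) -> float:
    values = [0.0] * num_nodes
    values[0] = 1.0                       # a single initiator holds mass 1
    for _ in range(rounds):
        for i in range(num_nodes):        # every node gossips with a random peer
            j = random.randrange(num_nodes)
            avg = (values[i] + values[j]) / 2.0
            values[i] = values[j] = avg   # pairwise averaging preserves total mass
    return 1.0 / values[0]                # every value converges to 1/N

if __name__ == "__main__":
    print(round(estimate_size(500)))      # typically close to 500
```
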
  • IQ-Paths: Predictably High Performance Data Streams across Dynamic Network Overlays

    Page(s): 18 - 29

    Overlay networks are a key vehicle for delivering network and processing resources to high performance applications. For shared networks, however, to consistently deliver such resources at desired levels of performance, overlays must be managed at runtime, based on the continuous assessment and prediction of available distributed resources. Data-intensive applications, for example, must assess, predict, and judiciously use available network paths, dynamically choosing alternate paths or exploiting concurrent ones. Otherwise, they cannot sustain the consistent levels of performance required by tasks like remote data visualization, online program steering, and remote access to high-end devices. The multiplicity of data streams occurring in complex scientific workflows or in large-scale distributed collaborations exacerbates this problem, particularly when different streams have different performance requirements. This paper presents IQ-Paths, a set of techniques and their middleware realization that implement self-regulating overlay streams for data-intensive distributed applications. Self-regulation is based on (1) the dynamic and continuous assessment of the quality of each overlay path, (2) the use of online network monitoring and statistical analyses that provide probabilistic guarantees about available path bandwidth, loss rate, and RTT, and (3) self-management, via an efficient packet routing and scheduling algorithm that dynamically schedules data packets to different overlay paths in accordance with their available bandwidths. IQ-Paths offers probabilistic guarantees for application-level specifications of stream utility, based on statistical predictions of available network bandwidth. This affords applications the ability, for instance, to send control or steering data across overlay paths that offer strong guarantees for future bandwidth rather than across less guaranteed paths. Experimental results presented in this paper use IQ-Paths to better handle the different kinds of data produced by two high performance applications: (1) a data-driven or interactive high performance code with user-defined utility requirements and (2) an adaptive overlay version of the popular GridFTP application.

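    A minimal sketch of the general mechanism the entry above describes, under the assumption of per-path bandwidth predictions (the path names and numbers are invented; this is not the actual IQ-Paths scheduler): each packet is dispatched to whichever overlay path is furthest below its predicted share.

```python
predicted_bw = {"path_A": 40.0, "path_B": 10.0}    # invented Mb/s predictions
bytes_sent = {path: 0 for path in predicted_bw}

def pick_path() -> str:
    # Choose the path with the smallest load normalized by predicted bandwidth,
    # which spreads traffic roughly in proportion to each path's prediction.
    return min(predicted_bw, key=lambda p: bytes_sent[p] / predicted_bw[p])

for _ in range(1000):                              # dispatch 1000 equal packets
    bytes_sent[pick_path()] += 1500                # 1500-byte packet
print(bytes_sent)                                  # roughly a 4:1 split
```
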
  • WOW: Self-Organizing Wide Area Overlay Networks of Virtual Workstations

    Page(s): 30 - 42

    This paper describes WOW, a distributed system that combines virtual machine, overlay networking, and peer-to-peer techniques to create scalable wide-area networks of virtual workstations for high-throughput computing. The system is architected to facilitate the addition of nodes to a pool of resources through the use of system virtual machines (VMs) and self-organizing virtual network links; to maintain IP connectivity even if VMs migrate across network domains; and to present to end users and applications an environment that is functionally identical to a local-area network or cluster of workstations. We describe a novel, extensible, user-level decentralized technique to discover, establish, and maintain overlay links to tunnel IP packets over different transports (including UDP and TCP) and across firewalls. We also report on several experiments conducted on a testbed WOW deployment with 118 P2P router nodes over PlanetLab and 33 VMware-based VM nodes distributed across six firewalled domains. Experiments show that the latency in joining a WOW network is on the order of seconds: in a set of 300 trials, 90% of the nodes self-configured P2P routes within 10 seconds, and more than 99% established direct connections to other nodes within 200 seconds. Experiments also show that the testbed delivers good performance for two unmodified, representative benchmarks drawn from the life-sciences domain. The testbed WOW achieves an overall throughput of 53 jobs/minute for PBS-scheduled executions of the MEME application (with an average single-job sequential running time of 24.1s) and a parallel speedup of 13.5 for the PVM-based fastDNAml application. Experiments also demonstrate that the system is capable of seamlessly maintaining connectivity at the virtual IP layer for typical client/server applications (NFS, SSH, PBS) when VMs migrate across a WAN.

  • Applications [breaker page]

    Page(s): 43 - 44
  • A Case Study Using Automatic Performance Tuning for Large-Scale Scientific Programs

    Page(s): 45 - 56

    Active Harmony is an automated runtime performance tuning system. In this paper we describe several case studies of using Active Harmony to improve the performance of scientific libraries and applications. We improved the tuning mechanism so that it can work iteratively with benchmarking runs. By tuning the computation and data distribution, Active Harmony helps applications that utilize the PETSc library to achieve better load balance and to reduce the execution time by up to 18%. For the climate simulation application POP using 480 processors, the tuning results show that by changing the block size and parameter values, the execution time is reduced by up to 16.7%. Active Harmony is able to make GS2, a plasma physics code, up to 5.1 times faster. The experimental results show that the Active Harmony system is a feasible and useful tool for automated performance tuning of scientific libraries and applications.

  • Path Grammar Guided Trace Compression and Trace Approximation

    Page(s): 57 - 68

    Trace-driven simulation is an important technique used in the evaluation of computer architecture innovations. However, using it for studying parallel computers and applications is at best very challenging. Acquiring, representing, and storing the traces are among the major issues. In this paper, we introduce path grammar guided trace compression (PGGTC) and effective address trace approximation (TA) to speed up compression and reduce trace sizes. PGGTC relies on static analysis to build rules and determine actions that guide online trace compression. Combined with gzip, PGGTC can compress control-flow traces to over 330 times smaller than gzip alone. Compared to the widely popular Sequitur algorithm alone, PGGTC with gzip is on average 40 times faster, while the traces are only 3 times bigger. PGGTC can also be used with Sequitur to double the compression ratios of Sequitur by itself, and to do so 14 times faster. Address traces of parallel applications with significant randomness are often impossibly large even after being compressed with any lossless scheme, including PGGTC. For effective address trace reduction, we introduce trace approximation (TA). Performance-wise similar effective addresses are generated from very compact summaries of how memory is accessed during each structure instance, instead of compressing the addresses themselves. We demonstrate two approaches, selective dumping and memory signatures, to summarize the properties of effective address sequences. Both approaches are validated by feeding the generated approximate traces to cache simulators of 25 different configurations. The simulated results are very close to the simulation results based on full effective address traces, while the selectively dumped addresses or memory signatures require several orders of magnitude less disk space to store. In summary, we move trace-driven simulation into the realm of the feasible for larger parallel machines and applications.

  • Filecules in High-Energy Physics: Characteristics and Impact on Resource Management

    Page(s): 69 - 80

    Grid computing has reached the stage where deployments are mature and many collaborations run in production mode. Mature grid deployments offer the opportunity for revisiting, and perhaps updating, traditional beliefs related to workload models, which in turn leads to the re-evaluation of traditional resource management techniques. This paper analyzes usage patterns in a typical grid community, a large-scale data-intensive scientific collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. Our observations led us to propose a new abstraction for resource management in scientific data analysis applications: we define a filecule as a group of files that is always used together. We show that filecules exist and present their characteristics. The existence of filecules suggests a new granularity for data management which, if incorporated into design, can significantly outperform the traditional solutions for data caching, replication, and placement based on single-file granularity. We reason about the impact of filecules on resource management and show compelling evidence for using this abstraction when designing data management services.

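    The filecule definition above lends itself to a very small illustration: given a hypothetical log of which jobs accessed which files, files with identical access sets form one filecule.

```python
from collections import defaultdict

# Hypothetical access log: file -> set of jobs that read it.
accesses = {
    "f1": {"job1", "job3"},
    "f2": {"job1", "job3"},
    "f3": {"job2"},
}

filecules = defaultdict(list)
for filename, jobs in accesses.items():
    filecules[frozenset(jobs)].append(filename)   # same access set => same filecule

print(list(filecules.values()))                   # [['f1', 'f2'], ['f3']]
```
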
  • Fault Tolerance and Reliability

    Page(s): 81 - 82


  • Fault Tolerance of Tornado Codes for Archival Storage

    Page(s): 83 - 92

    This paper examines a class of low-density parity-check (LDPC) erasure codes called Tornado codes for applications in archival storage systems. The fault tolerance of Tornado code graphs is analyzed, and it is shown that worst-case failure scenarios in small (96-node) graphs can be identified and mitigated through the use of simulations that find and eliminate critical node sets that can cause Tornado codes to fail even when almost all blocks are present. The graph construction procedure resulting from this analysis is then used to construct a 96-device Tornado code storage system with capacity overhead equivalent to RAID 10 that tolerates any 4 device failures. This system is demonstrated to be superior to other parity-based RAID systems. Finally, it is described how a geographically distributed data stewarding system can be enhanced by using cooperatively selected Tornado code graphs to obtain fault tolerance exceeding that of its constituent storage sites or site replication strategies.

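    Tornado codes are far more sophisticated, but the erasure-coding principle the entry above relies on can be shown with a single XOR parity block (a deliberately simplified stand-in, not an LDPC construction):

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR equal-sized byte blocks together.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"AAAA", b"BBBB", b"CCCC"]                  # data blocks
parity = xor_blocks(data)                           # stored alongside the data

# If block 1 is lost, XOR of the survivors and the parity reconstructs it.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```
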
  • Resource Availability Prediction in Fine-Grained Cycle Sharing Systems

    Page(s): 93 - 104

    Fine-grained cycle sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize their CPU cycles if the jobs do not significantly impact the local users of a host. A characteristic of such resources is that they are generally provided voluntarily and their availability fluctuates highly. Guest jobs may fail because of unexpected resource unavailability. Providing fault tolerance to guest jobs without adding significant computational overhead requires predicting future resource availability. This paper presents a method for resource availability prediction in FGCS systems. It applies a semi-Markov process and is based on a novel resource availability model that combines generic hardware-software failures with domain-specific resource behavior in FGCS. We describe the prediction framework and its implementation in a production FGCS system named iShare. Through experiments on an iShare testbed, we demonstrate that the prediction achieves accuracy above 86% on average and outperforms linear time series models, while its computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource unavailability.

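    The paper uses a semi-Markov process over a richer availability model; as a deliberately simplified illustration, even a two-state discrete-time Markov chain with invented transition probabilities shows how availability k intervals ahead could be predicted:

```python
import numpy as np

P = np.array([[0.95, 0.05],    # available   -> {available, unavailable}
              [0.30, 0.70]])   # unavailable -> {available, unavailable}

def prob_available(k: int, start_available: bool = True) -> float:
    dist = np.array([1.0, 0.0]) if start_available else np.array([0.0, 1.0])
    dist = dist @ np.linalg.matrix_power(P, k)
    return float(dist[0])      # probability of being available k intervals ahead

print(round(prob_available(10), 3))
```
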
  • Replicating Nondeterministic Services on Grid Environments

    Page(s): 105 - 116

    Replication is a technique commonly used to increase the availability of services in distributed systems, including grid and Web services. While replication is relatively easy for services with fully deterministic behavior, grid and Web services often include nondeterministic operations. The traditional way to replicate such nondeterministic services is to use the primary-backup approach. While this is straightforward in synchronous systems with perfect failure detection, typical grid environments are not usually considered to be synchronous systems. This paper addresses the problem of replicating nondeterministic services by designing a protocol based on Paxos and proposing two performance optimizations suitable for replicated grid services. The first improves performance in the case where some service operations do not change the service state, while the second optimizes grid service requests that use transactions. Evaluations done both on a local cluster and on PlanetLab demonstrate that these optimizations significantly reduce the service response time and increase the throughput of replicated services.

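    A hypothetical sketch of the first optimization mentioned above: operations known not to change service state can be answered by a single replica, while state-changing operations go through a consensus round (stubbed out here; this is not the paper's Paxos-based protocol):

```python
from typing import Callable

class ReplicatedService:
    def __init__(self, replicas: list[dict]):
        self.replicas = replicas                 # each replica holds a state copy

    def invoke(self, op: Callable[[dict], object], read_only: bool) -> object:
        if read_only:
            return op(self.replicas[0])          # no consensus round needed
        return self._ordered_write(op)           # writes must be ordered

    def _ordered_write(self, op: Callable[[dict], object]) -> object:
        # Stand-in for a Paxos round: apply the operation to every replica in
        # the same order and return one result.
        return [op(state) for state in self.replicas][0]

svc = ReplicatedService([{"x": 0}, {"x": 0}, {"x": 0}])
svc.invoke(lambda s: s.update(x=s["x"] + 1), read_only=False)
print(svc.invoke(lambda s: s["x"], read_only=True))   # -> 1
```
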
  • Resource management [breaker page]

    Page(s): 117 - 118
  • Service contracts and aggregate utility functions

    Page(s): 119 - 131

    Utility functions are used by clients of a service to communicate the value of a piece of work and other QoS aspects such as its timely completion. However, utility functions on individual work items do not capture how important it is to complete all or part of a batch of items; for this purpose, a higher-level construct is required. We propose a multi-job aggregate-utility function, and show how a service provider that executes jobs on rented resources can use it to drive admission control and job scheduling decisions. Using a profit-seeking approach to its policies, we find that the service provider can cope gracefully with client overload and varying resource availability. The result is significantly greater value delivered to clients, and higher profit (net value) generated for the service provider.

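    One hypothetical shape such a multi-job aggregate-utility function might take (the numbers and the bonus structure are illustrative assumptions, not the paper's formulation): per-job value plus a bonus that is only earned if the entire batch completes on time.

```python
def aggregate_utility(completed: int, batch_size: int, on_time: bool,
                      per_job_value: float = 1.0,
                      full_batch_bonus: float = 5.0) -> float:
    value = completed * per_job_value
    if completed == batch_size and on_time:
        value += full_batch_bonus                 # extra value for the whole batch
    return value

print(aggregate_utility(10, 10, on_time=True))    # 15.0: full batch finished
print(aggregate_utility(7, 10, on_time=True))     # 7.0: partial batch, no bonus
```
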
  • Market-Based Resource Allocation using Price Prediction in a High Performance Computing Grid for Scientific Applications

    Page(s): 132 - 143

    We present the implementation and analysis of a market-based resource allocation system for computational grids. Although grids provide a way to share resources and take advantage of statistical multiplexing, a variety of challenges remain. One is the economically efficient allocation of resources to users from disparate organizations who have their own and sometimes conflicting requirements for both the quantity and quality of services. Another is secure and scalable authorization despite rapidly changing allocations. Our solution to both of these challenges is to use a market-based resource allocation system. This system allows users to express diverse quantity- and quality-of-service requirements, yet prevents them from denying service to other users. It does this by providing tools that let the user predict and trade off risk and expected return in the computational market. In addition, the system enables secure and scalable authorization by using signed money-transfer tokens instead of identity-based authorization. This removes the overhead of maintaining and updating access control lists, while restricting usage based on the amount of money transferred. We examine the performance of the system by running a bioinformatics application on a fully operational implementation of an integrated grid market.

  • Optimal Bandwidth Sharing in Grid Environments

    Page(s): 144 - 155

    We consider the problem of bulk data transfers and bandwidth sharing in the context of grid infrastructures. Grid computing empowers high-performance computing in a large-scale distributed environment. Network bandwidth, which makes the expensive computational and storage resources work in concert, plays an active role in carrying grid application traffic. Due to specific traffic patterns and application scenarios, grid network resource management encounters new challenges. From the bandwidth sharing perspective, this article looks at network bandwidth shared among computing and storage elements. Grid data requests, which are short-lived and characterized by a transmission window and volume, are scheduled in the network. By manipulating the transmission window, the request accept rate and network resource utilization are optimized. The formulated optimization problem is proven NP-complete. Simulations with the proposed heuristics are carried out to illustrate the pros and cons of each bandwidth sharing strategy and its application scenarios. A tuning factor that allows the performance objective to be adapted is introduced to adjust to the network infrastructure and workload.

  • A Tool for Prioritizing DAGMan Jobs and Its Evaluation

    Page(s): 156 - 168

    It is often difficult to efficiently execute a collection of jobs with complex job dependencies due to the temporal unpredictability of the grid. One way to mitigate the unpredictability is to schedule job execution in a manner that constantly maximizes the number of jobs that can be sent to workers. A recently developed scheduling theory provides a basis for meeting that optimization goal. Intuitively, when the number of such jobs is always large, high parallelism can be maintained, even if the number of workers changes over time in an unpredictable manner. In this paper we present the design, implementation, and evaluation of a practical scheduling tool inspired by the theory. Given a DAGMan input file with interdependent jobs, the tool prioritizes the jobs. The resulting schedule significantly outperforms currently used schedules under a wide range of system parameters, as shown by simulation studies. For example, a scientific data analysis application, AIRSN, was executed at least 13% faster with 95% confidence. An implementation of the tool was integrated with the Condor high-throughput computing system.

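    As a hedged illustration of prioritizing jobs in a dependency DAG (one plausible heuristic, not the scheduling theory the paper builds on): rank each job by how many jobs it transitively unblocks.

```python
from functools import lru_cache

# Hypothetical DAG: job -> jobs that depend on it.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

@lru_cache(maxsize=None)
def descendants(job: str) -> frozenset:
    out = set()
    for child in dag[job]:
        out.add(child)
        out |= descendants(child)
    return frozenset(out)

priority = {job: len(descendants(job)) for job in dag}   # jobs unblocked downstream
print(sorted(dag, key=priority.get, reverse=True))       # ['A', 'B', 'C', 'D']
```
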
  • Software Environments

    Page(s): 169 - 170


  • Motor: A Virtual Machine for High Performance Computing

    Page(s): 171 - 182

    High performance application development remains challenging, particularly for scientists making the transition to a grid environment. In general areas of computing, virtual environments such as Java and .Net have proved successful in fostering application development. Unfortunately, these existing virtual environments do not provide the necessary high performance computing abstractions required by e-scientists. In response, we propose and demonstrate a new approach to the development of a high performance virtual infrastructure: Motor is a virtual machine developed by integrating a high performance message passing library directly within a virtual infrastructure. Motor provides high performance application developers with a common runtime, garbage collection and system libraries, including high performance message passing, whilst retaining strong message passing performance.

  • Runtime Support for Memory Adaptation in Scientific Applications via Local Disk and Remote Memory

    Page(s): 183 - 194

    The ever-increasing memory demands of many scientific applications and the complexity of today's shared computational resources still require the occasional use of virtual memory, network memory, or even out-of-core implementations, with well-known drawbacks in performance and usability. In this paper, we present a general framework, based on our earlier MMLIB prototype, that enables fully customizable memory malleability in a wide variety of scientific applications. We provide several necessary enhancements to the environment sensing capabilities of MMLIB and introduce a remote memory capability, based on MPI communication of cached memory blocks between "compute nodes" and designated memory servers. We show experimental results from three important scientific applications that require the general MMLIB framework. Under constant memory pressure, we observe execution time improvements of factors between three and five over relying solely on the virtual memory system. With remote memory employed, these factors are even larger and significantly better than other, system-level remote memory implementations.

  • Building a Generic SOAP Framework over Binary XML

    Page(s): 195 - 204

    The prevailing binding of SOAP to HTTP specifies that SOAP messages be encoded as an XML 1.0 document which is then sent between client and server. XML processing, however, can be slow and memory-intensive, especially for scientific data, and consequently SOAP has been regarded as an inappropriate protocol for scientific data. Efficiency considerations thus lead to the prevailing practice of separating data from the SOAP control channel: instead, data is stored in specialized binary formats and transmitted either via attachments or indirectly via a file sharing mechanism, such as GridFTP or HTTP. This separation invariably complicates development due to the multiple libraries and type systems that must be handled; furthermore, it suffers from performance issues, especially when handling small binary data. As an alternative solution, binary XML provides a highly efficient encoding scheme for binary data in XML and SOAP messages, and with it we can gain high performance as well as a unified development environment without unduly impacting the Web service protocol stack. In this paper we present our implementation of a generic SOAP engine that supports both textual XML and binary XML as the encoding scheme of the message. We also present our binary XML data model and encoding scheme. Our experiments show that for scientific applications, binary XML together with the generic SOAP implementation not only eases development, but also provides better performance and is more widely applicable than the commonly used separated schemes.

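    A rough illustration of the motivation above, comparing the same array of doubles as raw binary, as base64 inside XML, and as XML text (sizes will vary; this is not the paper's binary XML encoding):

```python
import base64
import struct

values = [i * 0.1 for i in range(1000)]
raw = struct.pack(f"<{len(values)}d", *values)              # 8 bytes per double

as_text = "<data>" + " ".join(repr(v) for v in values) + "</data>"
as_b64 = "<data>" + base64.b64encode(raw).decode() + "</data>"

print(len(raw), len(as_b64), len(as_text))   # raw binary is the most compact
```
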
  • I/O

    Page(s): 205 - 206


  • Improving I/O Performance of Clustered Storage Systems by Adaptive Request Distribution

    Page(s): 207 - 217

    We develop an adaptive load distribution protocol for logical volume I/O workload in clustered storage systems. It exploits data redundancy among decentralized storage servers to dynamically route I/O workload on a per-request basis, offering short-term load balancing and improved I/O performance. Our protocol builds on tunable hashing techniques and is based purely on client logic. Therefore, it does not limit system scalability and requires no change to the existing infrastructure. It distributes the I/O requests of a client to storage servers selected adaptively by a decentralized tunable hashing scheme, and applies different policies to read and write requests. It also makes no assumption about inter-server communication latency and thus is robust to different network configurations. It supports both replication and erasure coding data redundancy schemes. Experimental results show that our protocol performs close to a centralized load-balancing algorithm and verify the robustness of our protocol.

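    The paper's tunable hashing scheme is not spelled out in the abstract above; as an analogy only, a standard weighted rendezvous hash shows how a client can pick a storage server per request from a hash plus a per-server load weight:

```python
import hashlib
import math

def pick_server(request_id: str, weights: dict[str, float]) -> str:
    # weights: server -> weight; lightly loaded servers get larger weights.
    def score(server: str) -> float:
        digest = hashlib.sha256(f"{server}:{request_id}".encode()).hexdigest()
        u = (int(digest, 16) % (2**53) + 1) / (2**53 + 1)   # uniform in (0, 1)
        return -weights[server] / math.log(u)               # rendezvous score
    return max(weights, key=score)

servers = {"s1": 1.0, "s2": 1.0, "s3": 0.5}                 # s3 carries more load
counts = {s: 0 for s in servers}
for i in range(10000):
    counts[pick_server(f"req-{i}", servers)] += 1
print(counts)                                                # roughly 2 : 2 : 1
```
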