Proceedings of the Eighth International Symposium on High Performance Distributed Computing, 1999

Date: 3-6 Aug. 1999

Displaying Results 1 - 25 of 51
  • Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469)

  • Toward a common component architecture for high-performance scientific computing

    Page(s): 115 - 124

    Describes work in progress to develop a standard for interoperability among high-performance scientific components. This research stems from the growing recognition that the scientific community needs to better manage the complexity of multidisciplinary simulations and better address scalable performance issues on parallel and distributed architectures. The driving force for this is the need for fast connections among components that perform numerically intensive work and for parallel collective interactions among components that use multiple processes or threads. This paper focuses on the areas we believe are most crucial in this context, namely an interface definition language that supports scientific abstractions for specifying component interfaces and a port connection model for specifying component interactions.

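    The port-and-interface idea summarized above can be illustrated with a small Python sketch. Everything below is a hypothetical analogy: the port name, the JacobiSolver component, and the connect() call are invented for illustration and are not the CCA specification or its scientific interface definition language.

```python
from abc import ABC, abstractmethod

# Hypothetical "provides" port: an interface a component promises to implement.
class LinearSolverPort(ABC):
    @abstractmethod
    def solve(self, matrix, rhs):
        """Return x such that matrix @ x approximately equals rhs."""

# A component that provides the port.
class JacobiSolver(LinearSolverPort):
    def solve(self, matrix, rhs, iterations=50):
        n = len(rhs)
        x = [0.0] * n
        for _ in range(iterations):   # plain Jacobi iteration on the old x
            x = [(rhs[i] - sum(matrix[i][j] * x[j] for j in range(n) if j != i))
                 / matrix[i][i] for i in range(n)]
        return x

# A component that *uses* the port; a framework wires the connection.
class SimulationDriver:
    def __init__(self):
        self.solver = None            # uses-port, filled in at connection time

    def connect(self, port_name, provider):
        if port_name == "solver":
            self.solver = provider

    def run(self):
        a = [[4.0, 1.0], [1.0, 3.0]]
        b = [1.0, 2.0]
        return self.solver.solve(a, b)

driver = SimulationDriver()
driver.connect("solver", JacobiSolver())   # the "port connection" step
print(driver.run())                        # ~[0.0909, 0.6364]
```
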
  • Author index

    Page(s): 355 - 356
  • Ninth IEEE International Symposium on High Performance Distributed Computing

    Page(s): 357 - 359
  • Data management for large-scale scientific computations in high performance distributed systems

    Page(s): 263 - 272

    With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require a deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are the user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely the application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.

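    As a purely illustrative sketch of the directive-driven idea above, an application might register dataset metadata and an access-pattern hint with a metadata manager that then picks the I/O technique. The class and method names are invented and are not the MDMS interface described in the paper.

```python
# Hypothetical sketch of directive-driven I/O selection; all names are invented.
class MetadataManager:
    def __init__(self):
        self.catalog = {}

    def register_dataset(self, name, shape, element_size, access_hint):
        """Store user-supplied metadata and directives for a dataset."""
        self.catalog[name] = {
            "shape": shape,
            "element_size": element_size,   # bytes per element
            "access_hint": access_hint,     # e.g. "sequential", "strided", "collective"
        }

    def choose_io_strategy(self, name):
        """Map the stored hint to an I/O technique on the user's behalf."""
        hint = self.catalog[name]["access_hint"]
        if hint == "collective":
            return "collective I/O"
        if hint == "strided":
            return "data sieving"
        return "plain sequential reads"

mdms = MetadataManager()
mdms.register_dataset("temperature", shape=(1024, 1024), element_size=8,
                      access_hint="collective")
print(mdms.choose_io_strategy("temperature"))
```
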
  • Remote application scheduling on metacomputing systems

    Page(s): 349 - 350

    Efficient and robust metacomputing requires the decomposition of complex jobs into tasks that must be scheduled on distributed processing nodes. There are various ways of creating a schedule and implementing it efficiently, depending upon global system state knowledge. Many computations may be structured as process networks, where data is either pushed from the source node to the target node where it will be used, or is pulled from source to target at the instigation of the target. We have developed a metacomputing infrastructure to investigate this idea, which employs the concept of a rich data pointer, the DISCWorld Remote Access Mechanism (DRAM), which can point to either data or services, and can be traded in a client/multiple-server model. We present an extension of the DRAM concept and implementation to represent and describe data that has not yet been created, the “DRAM Future” (DRAMF). We show how the use of the DRAMF facilitates efficient metacomputing scheduling and runtime optimisation on high performance distributed systems. We present a recursive algorithm for determining the optimal placement of a job's components in the presence of partial system state information. This algorithm uses only a selected subset of all available processing nodes, and we implement it using DRAMFs. There are many research issues to consider when designing a robust and general algorithm for scheduling and process placement on distributed systems. We address some of these issues in our Distributed Information Systems Control World project [6], as do other research projects [2].

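    A rich pointer that can stand for data not yet produced behaves much like a future. The Python sketch below uses the standard concurrent.futures module to mimic that behaviour; it is an analogy only, not the DISCWorld DRAM/DRAMF implementation, and the DataPointer class is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a "rich data pointer": it can refer to data that
# already exists or to a service invocation whose result does not exist yet.
class DataPointer:
    def __init__(self, value=None, future=None):
        self._value = value
        self._future = future

    def resolve(self):
        """Block until the referenced data exists, then return it."""
        if self._future is not None:
            self._value = self._future.result()
            self._future = None
        return self._value

def expensive_service(n):
    return sum(i * i for i in range(n))     # placeholder for a remote computation

with ThreadPoolExecutor() as pool:
    concrete = DataPointer(value=[1, 2, 3])                               # data that exists
    pending = DataPointer(future=pool.submit(expensive_service, 10_000))  # a "future" pointer
    # A scheduler could already hand `pending` to downstream tasks at this point.
    print(concrete.resolve(), pending.resolve())
```
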
  • Overview of a performance evaluation system for global computing scheduling algorithms

    Page(s): 97 - 104

    While there have been several proposals for high-performance global computing systems, scheduling schemes for such systems have not been well investigated. The reason is the difficulty of evaluation using large-scale benchmarks with reproducible results. Our Bricks performance evaluation system allows the analysis and comparison of various scheduling schemes in a typical high-performance global computing setting. Bricks can simulate various behaviors of global computing systems, especially the behavior of networks and resource scheduling algorithms. Moreover, Bricks is partitioned into components such that not only can its constituents be replaced to simulate various different system algorithms, but it also allows the incorporation of existing global computing components via its foreign interface. To test the validity of the latter characteristic, we incorporated the NWS (Network Weather Service) system, which monitors and forecasts global computing system behavior. Experiments were conducted by running NWS under a real environment versus a Bricks-simulated environment, given the observed parameters of the real environment. We observed that Bricks behaved in the same manner as the real environment, and that NWS behaved similarly, producing comparable forecasts in both environments.

  • An approach for MPI based metacomputing

    Page(s): 333 - 334

    Coupling computer resources to work on a given problem has been a successful strategy for years. With the availability of high-bandwidth WANs, it has become possible to couple big machines into clusters that outperform the most powerful existing single computers. The article discusses such a powerful cluster, a metacomputer. While the message-passing model for MPPs was standardized four years ago (1995), a standard for interoperable MPI has only recently been published. We briefly present an approach for MPI-based metacomputing called PACX-MPI. We also describe other approaches and give an overview of the technical concept.

  • A methodology for supporting collaborative exploratory analysis of massive data sets in tele-immersive environments

    Page(s): 62 - 69

    This paper proposes a methodology for employing collaborative, immersive virtual environments as a high-end visualization interface for massive data sets. The methodology employs feature detection, partitioning, summarization and decimation to significantly cull massive data sets. These reduced data sets are then distributed to the remote CAVEs, ImmersaDesks and desktop workstations for viewing. The paper also discusses novel techniques for collaborative visualization and meta-data creation.

  • Accurately measuring MPI broadcasts in a computational grid

    Page(s): 29 - 37

    An MPI library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance, even in a challenging grid environment. Measuring broadcast performance is not easy. Simply sending one broadcast after another allows them to proceed through the network concurrently, thus resulting in inaccurate per-broadcast timings. Existing methods either fail to eliminate this pipelining effect or eliminate it by introducing overheads that are as difficult to measure as the performance of the broadcast itself. This problem becomes even more challenging in grid environments. Latencies along different links can vary significantly. Thus, an algorithm's performance is difficult to predict from its communication pattern. Even when accurate prediction is possible, the pattern is often unknown. Our method introduces a measurable overhead to eliminate the pipelining effect, regardless of variations in link latencies.

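    One common way to keep back-to-back broadcasts from pipelining is to have the root wait for an acknowledgement after every broadcast and then subtract the separately measured acknowledgement cost. The mpi4py sketch below shows that general idea only; it is not the measurement protocol proposed in the paper, and the choice of acknowledging rank and message size are arbitrary assumptions.

```python
# Illustrative only: serialize broadcasts with an acknowledgement and subtract
# the separately measured acknowledgement cost. Run e.g.: mpiexec -n 4 python bcast_timing.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ack_rank = comm.Get_size() - 1        # assumption: the last rank acknowledges
msg = np.zeros(1 << 16, dtype="b")    # 64 KB payload
reps = 50

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    comm.Bcast(msg, root=0)           # the broadcast being measured
    if rank == ack_rank:
        comm.send(None, dest=0, tag=1)
    if rank == 0:
        comm.recv(source=ack_rank, tag=1)
total = MPI.Wtime() - t0

comm.Barrier()                        # measure the acknowledgement path alone
t1 = MPI.Wtime()
for _ in range(reps):
    if rank == ack_rank:
        comm.send(None, dest=0, tag=2)
    if rank == 0:
        comm.recv(source=ack_rank, tag=2)
ack = MPI.Wtime() - t1

if rank == 0:
    print("per-broadcast estimate: %.6f s" % ((total - ack) / reps))
```
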
  • Baglets: adding hierarchical scheduling to Aglets

    Page(s): 229 - 235

    A significant number of new Java-based technologies for mobile code (aka agents) have recently emerged. The 'Aglets' system, from IBM's research labs, provides an elegant mechanism for creating mobile code, but lacks a native scheduling mechanism for determining where code should be executed. We present the results of an investigation into adding sophisticated scheduling capabilities to Aglets (which we refer to as brilliant Aglets, or Baglets) from the H-SWEB project, which provides hierarchical scheduling across sets of WWW server clusters. H-SWEB uses scheduling techniques in monitoring and adapting to workload variation at distributed server clusters for supporting distributed computation. We show how the two systems can be integrated, and present several algorithms indicating that a major advantage (over 350%) can be achieved through the use of dynamic scheduling information. We provide a detailed discussion of our system architecture and implementation, and briefly summarize the experimental results which have been achieved.

  • Using Gateway system to provide a desktop access to high performance computational resources

    Page(s): 294 - 298

    In this paper, we discuss the use of Gateway for seamless desktop access to high performance resources. We illustrate our ideas with two Gateway applications that require access to remote resources: the Landscape Management System (LMS) and Quantum Simulations (QS). For LMS we use Gateway to retrieve data from many different sources as well as to allocate remote computational resources needed to solve the problem at hand. Gateway transparently controls the necessary data transfer between hosts for the user. Quantum Simulations requires access to HPCC resources, and therefore we layered Gateway on top of the Globus metacomputing toolkit. In this way, Gateway plays the role of a job broker for Globus.

  • Spatially decomposed multigrain MOM problems on NOWs

    Page(s): 149 - 155

    Integral equations solved by the method of moments (MOM) are an important and computationally intense class of problems in antenna design and other areas of electromagnetics. Particularly when structures become electrically large, MOM solutions become intractable as they lead to large, densely-filled, complex matrices, the solution of which is numerically and computationally intensive. Several numerical techniques have been applied to make these solutions more tractable, notably spatial decomposition and a multigrain method known as reduced current fidelity. This paper investigates the impact of these techniques on parallel computing approaches to solving this class of problem, using networks of workstations (NOWs). The resulting effects on the efficiency and communication patterns of the parallel programs are explored.

  • An evaluation of linear models for host load prediction

    Page(s): 87 - 96

    Evaluates linear models for predicting the Digital Unix five-second host load average from 1 to 30 seconds into the future. A detailed statistical study of a large number of long, fine-grain load traces from a variety of real machines leads to consideration of the Box-Jenkins (1994) models (AR, MA, ARMA, ARIMA), and the ARFIMA (autoregressive fractionally integrated moving average) models (due to self-similarity). These models, as well as a simple windowed-mean scheme, are then rigorously evaluated by running a large number of randomized test cases on the load traces and by data-mining their results. The main conclusions are that the load is consistently predictable to a very useful degree, and that the simpler models, such as AR, are sufficient for performing this prediction.

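    As a hedged illustration of AR-style host-load prediction (not the authors' traces, model selection, or code), the sketch below fits AR(p) coefficients to a synthetic load trace by least squares and makes a one-step-ahead prediction; the order p and the synthetic trace are assumptions.

```python
# Minimal AR(p) fit by least squares on a synthetic host-load trace.
# Illustrative only; the paper's traces and model selection are not reproduced.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 8                        # trace length and AR order (assumptions)
load = np.empty(n)
load[0] = 0.5
for t in range(1, n):                 # synthetic, mildly autocorrelated load
    load[t] = 0.9 * load[t - 1] + 0.05 + 0.05 * rng.standard_normal()

# Design matrix: predict load[t] from the previous p values load[t-1] ... load[t-p].
X = np.column_stack([load[p - k - 1:n - k - 1] for k in range(p)])
y = load[p:]
coeffs, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(len(y))]), y, rcond=None)

recent = load[-1:-p - 1:-1]           # most recent p values, newest first
prediction = recent @ coeffs[:p] + coeffs[p]
print("one-step-ahead load prediction: %.3f" % prediction)
```
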
  • A performance broker for CORBA

    Page(s): 19 - 26

    CORBA applications can transparently use service instances running on the client's machine, on the local-area network, or across the Internet. Standard CORBA services help the application locate service instances, but do not provide a mechanism to identify service instances that will give good performance. The PerformanceBroker executes performance test suites on application service instances and selects service instances that will give superior application performance. The Broker weights performance test results according to client-specified criteria to choose the service instances that will provide the best application performance, and allocates those service instances to clients. Tests with a distributed ray tracing application show that service instances chosen by the PerformanceBroker give better performance than service instances chosen by round-robin or random selection in local-area network and Internet environments.

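    The weighted-selection step can be sketched in a few lines; the metric names, weights, and scores below are invented for illustration and do not reflect the PerformanceBroker's actual test suites or interface.

```python
# Illustrative weighted selection of a service instance from benchmark results.
# Metric names, weights, and scores are invented for the example.
def pick_instance(results, weights):
    """results: {instance: {metric: measured value, lower is better}}
       weights: {metric: relative importance}"""
    def score(metrics):
        return sum(weights[m] * metrics[m] for m in weights)
    return min(results, key=lambda inst: score(results[inst]))

benchmarks = {
    "render.local":  {"latency_ms": 2.0,  "throughput_penalty": 5.0},
    "render.lan":    {"latency_ms": 9.0,  "throughput_penalty": 1.5},
    "render.remote": {"latency_ms": 80.0, "throughput_penalty": 1.0},
}
client_weights = {"latency_ms": 0.3, "throughput_penalty": 10.0}

print(pick_instance(benchmarks, client_weights))   # instance with the best weighted score
```
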
  • MATmarks: a shared memory environment for MATLAB programming

    Page(s): 341 - 342

    MATmarks is an extension of the MATLAB tool that enables shared memory programming on a network of workstations by adding a small set of commands. The authors present a high-level overview of the MATmarks system, the commands they added to MATLAB, and the performance gains they achieved as a result.

  • MPI code encapsulating using parallel CORBA object

    Page(s): 3 - 10

    This paper describes a technique that allows an MPI code to be encapsulated into a component. Our technique is based on an extension to the Common Object Request Broker Architecture (CORBA) from the OMG (Object Management Group). The proposed extensions do not modify the CORBA core infrastructure (the Object Request Broker), so that they can fully co-exist with existing CORBA applications. An MPI code is seen as a new kind of CORBA object that hides most of the cumbersome problems encountered when dealing with parallelism. Such a technique can be used to connect MPI codes to existing CORBA software infrastructures, which are now being developed in the framework of several research and development projects such as JACO3, JULIUS or TENT from DLR. To illustrate the concept of a parallel CORBA object, we present a virtual reality application built by coupling a light simulation application (radiosity) with a visualisation tool using VRML and Java.

  • Using embedded network processors to implement global memory management in a workstation cluster

    Page(s): 319 - 328

    Advances in network technology continue to improve the communication performance of workstation and PC clusters, making high-performance workstation-cluster computing increasingly viable. These hardware advances, however, are taxing traditional host-software network protocols to the breaking point. A modern gigabit network can swamp a host's I/O bus and processor, limiting communication performance and slowing computation unacceptably. Fortunately, the host-programmable network processors used by these networks present a potential solution. Offloading selected host processing to these embedded network processors lowers host overhead and improves latency. This paper examines the use of embedded network processors to improve the performance of workstation-cluster global memory management. We have implemented a revised version of the GMS global memory system that reduces host overhead by as much as 29% on active nodes and improves page fault latency by as much as 39%.

  • Fault tolerant computing on the grid: what are my options?

    Page(s): 351 - 352

    High-performance distributed computing across wide-area networks has become an active topic of research. Achieving large-scale distributed computing in a seamless manner introduces a number of difficult problems. This paper examines one of the most critical problems, fault tolerance. We have examined fault tolerance options for a common class of high-performance parallel applications, single-program-multiple-data (SPMD). Performance models for two fault tolerance methods, checkpoint-recovery (CR) and wide-area replication (WR), have been developed. These models enable quantitative comparisons of the two methods as applied to SPMD applications.

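    For context only, a classical first-order model of checkpoint-recovery cost can be written down directly; this is the textbook Young approximation for the checkpoint interval, not one of the performance models developed in the paper, and the numbers are invented.

```python
import math

# Classical first-order checkpoint-recovery model (illustrative, not the paper's
# model). Young's approximation: optimal interval ~ sqrt(2 * checkpoint_cost * MTBF).
def optimal_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

def expected_overhead_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """Rough fraction of time lost to checkpointing plus expected rework."""
    checkpointing = checkpoint_cost_s / interval_s
    rework = (interval_s / 2.0) / mtbf_s   # expect to lose half an interval per failure
    return checkpointing + rework

tau = optimal_checkpoint_interval(checkpoint_cost_s=60.0, mtbf_s=24 * 3600.0)
print("interval ~ %.0f s, overhead ~ %.1f%%"
      % (tau, 100 * expected_overhead_fraction(tau, 60.0, 24 * 3600.0)))
```
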
  • High performance phylogenetic inference

    Page(s): 335 - 336

    Phylogenetic analysis is an integral part of many biological research programs. In essence, it is the study of gene genealogy: of gene mutation and generational relationships. Phylogenetic analysis is being used in many diverse areas such as human epidemiology, viral transmission, biogeography, and systematics. Researchers are now commonly generating many DNA sequences from many individuals, thus creating very large data sets. However, our ability to analyze the data has not kept pace with data generation, and phylogenetics has now reached a crossroads where we cannot effectively analyze the data we generate. The chief challenge of phylogenetic systematics in the next century will be to develop algorithms and search strategies to effectively analyze large data sets. The crux of the computational problem is that the landscape of possible topologies can be extraordinarily difficult to evaluate with large data sets. The parsimony ratchet is a family of iterative tree search methods that use a statistical approach to sampling tree islands and ultimately finding the most parsimonious trees for a data set. Each iteration of the parsimony ratchet may run in parallel, as there is no direct dependency between iterations. The authors' implementation of the parallel ratchet is master-worker based. A master process is launched in the DOGMA system, which then launches worker tasks on available nodes. Each worker task is simply wrapper code that is used to interact with the newest release version of NONA.

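    The master-worker structure described above can be mimicked with a short, hedged sketch: a master farms out independent ratchet iterations and keeps the best score seen so far. The scoring function is a placeholder, not NONA, DOGMA, or the authors' wrapper code.

```python
# Hedged master-worker sketch: independent "ratchet iterations" are farmed out to
# workers and the best (lowest) score is kept. The scoring function is a placeholder,
# not a real parsimony tree search.
from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def ratchet_iteration(seed):
    """Placeholder for one perturb-and-search iteration; returns (score, seed)."""
    rng = random.Random(seed)
    return 1000 + rng.randint(0, 200) - rng.randint(0, 150), seed

if __name__ == "__main__":
    best = (float("inf"), None)
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(ratchet_iteration, seed) for seed in range(32)]
        for fut in as_completed(futures):   # iterations are mutually independent
            score, seed = fut.result()
            if score < best[0]:
                best = (score, seed)
    print("best score %d from iteration seeded %d" % best)
```
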
  • Direct queries for discovering network resource properties in a distributed environment

    Page(s): 38 - 46

    The development and performance of network-aware applications depend on the availability of accurate predictions of network resource properties. Obtaining this information directly from the network is a scalable solution that provides the accurate performance predictions and topology information needed for planning and adapting application behavior across a variety of networks. The performance predictions obtained directly from the network are as accurate as application-level benchmarks, but the network-based technique provides the added advantages of scalability and topology discovery. We describe how to determine network properties directly from the network using SNMP. We provide an overview of SNMP and describe the features it provides that make it possible to extract both available bandwidth and network topology information from network devices. The available bandwidth predictions based on network queries using SNMP are compared with traditional predictions based on application history to demonstrate that they are equally useful. To demonstrate the feasibility of topology discovery, we present results for a large Ethernet at CMU.

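    A hedged sketch of the arithmetic behind an SNMP-based available-bandwidth estimate: given two readings of the standard MIB-II interface octet counter (ifInOctets, OID 1.3.6.1.2.1.2.2.1.10) and the interface speed (ifSpeed, OID 1.3.6.1.2.1.2.2.1.5), utilization and headroom follow directly. How the counters are polled, and whether this matches the paper's prediction method, is outside the sketch; the sample values are invented.

```python
# Illustrative arithmetic for an SNMP-based bandwidth estimate using standard
# MIB-II counters: ifInOctets (OID 1.3.6.1.2.1.2.2.1.10) and ifSpeed
# (OID 1.3.6.1.2.1.2.2.1.5). The sample values below are made up.
def available_bandwidth_bps(octets_t0, octets_t1, seconds, if_speed_bps,
                            counter_bits=32):
    delta = (octets_t1 - octets_t0) % (1 << counter_bits)   # tolerate counter wrap
    used_bps = delta * 8 / seconds
    return max(if_speed_bps - used_bps, 0.0)

# Two hypothetical polls of a 100 Mb/s interface taken 30 s apart.
headroom = available_bandwidth_bps(octets_t0=1_200_000_000,
                                   octets_t1=1_275_000_000,
                                   seconds=30.0,
                                   if_speed_bps=100_000_000)
print("estimated available bandwidth: %.1f Mb/s" % (headroom / 1e6))
```
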
  • Job-length estimation and performance in backfilling schedulers

    Page(s): 236 - 243

    Backfilling is a simple and effective way of improving the utilization of space-sharing schedulers. Simple first-come-first-served approaches are ineffective because large jobs can fragment the available resources. Backfilling schedulers address this problem by allowing jobs to move ahead in the queue, provided that they will not delay subsequent jobs. Previous research has shown that inaccurate estimates of execution times can lead to better backfilling schedules. We characterize this effect on several workloads, and show that average slowdowns can be effectively reduced by systematically lengthening estimated execution times. Further, we show that the average job slowdown metric can be addressed directly by sorting jobs by increasing execution time. Finally, we modify our sorting scheduler to ensure that incoming jobs can be given hard guarantees. The resulting scheduler guarantees to avoid starvation, and performs significantly better than previous backfilling schedulers.

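    A hedged sketch of backfill-candidate selection in the spirit described above: among waiting jobs, pick those with the shortest estimated runtimes that fit on the idle processors and finish before the queue-head job's reserved start time. The job tuples and numbers are invented, and this is a simplification, not the authors' scheduler.

```python
# Hedged sketch: choose backfill candidates in order of increasing estimated
# runtime without delaying the reserved start of the job at the head of the queue.
def backfill_candidates(waiting, free_cpus, now, shadow_time):
    """waiting: list of (job_id, est_runtime_s, cpus); returns job ids to start now."""
    chosen = []
    for job_id, est_runtime, cpus in sorted(waiting, key=lambda j: j[1]):
        fits = cpus <= free_cpus
        finishes_in_time = now + est_runtime <= shadow_time
        if fits and finishes_in_time:
            chosen.append(job_id)
            free_cpus -= cpus
    return chosen

queue = [("A", 3600, 16), ("B", 300, 4), ("C", 900, 8), ("D", 120, 2)]
# Ten CPUs are idle until the head job's reservation begins 1000 s from now.
print(backfill_candidates(queue, free_cpus=10, now=0, shadow_time=1000))  # ['D', 'B']
```
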
  • An approach to immersive performance visualization of parallel and wide-area distributed applications

    Page(s): 247 - 254

    Complex, distributed applications pose new challenges for performance analysis and optimization. This paper outlines an online approach to performance analysis where developers are active participants, using integrated measurement and immersive performance visualization to tune parallel and distributed applications.

  • Dynamic task migration in home-based software DSM systems

    Page(s): 339 - 340

    Dynamic task migration is an effective strategy for maximizing performance and resource utilization in metacomputing environments. Traditionally, however, a “task” means only the code of the computation; the data related to that computation is usually neglected. As such, when a task is migrated from processor A to processor B, the data required by the task remains on processor A. Processor B then has to perform remote communication when it executes the task, eliminating the advantage of task migration or even degrading performance further. Hence, the traditional definition of a “task” should be revisited. We define a task as follows: Task = computation subtask + data subtask. The computation subtask is the program code to be executed, while the data subtask comprises the operations that access the related data in memory. In fact, as the speed gap between processors and memory grows, the importance of the data subtask becomes ever more obvious. Therefore, we argue that both subtasks should be migrated to the new processor during task migration. Based on this observation, we propose a dynamic loop-level task migration scheme. The scheme is implemented within the context of the JIAJIA software DSM system (W. Hu et al., 1999). The evaluation results show that the task migration scheme improves the performance of our benchmark applications by 36% to 50% compared with static task allocation schemes, and the new scheme performs an average of 30% better than other computation-only migration schemes.

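    The definition Task = computation subtask + data subtask can be illustrated with a tiny hedged sketch in which migrating a task moves the code reference together with its working data; the class below is invented for illustration and is unrelated to the JIAJIA implementation.

```python
# Tiny illustration of "Task = computation subtask + data subtask": migration
# moves the code reference *and* the data it touches. Invented for illustration.
from dataclasses import dataclass, field

@dataclass
class Task:
    compute: callable                           # computation subtask
    data: list = field(default_factory=list)    # data subtask (working set)
    home: str = "processor-A"

    def migrate(self, target):
        """Move both subtasks so no remote data accesses remain afterwards."""
        self.home = target       # a real system would also copy self.data across
        return self

    def run(self):
        return self.compute(self.data)

task = Task(compute=sum, data=[1, 2, 3, 4])
task.migrate("processor-B")
print(task.home, task.run())    # -> processor-B 10
```
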
  • Grids as production computing environments: the engineering aspects of NASA's Information Power Grid

    Page(s): 197 - 204

    Information Power Grid (IPG) is the name of NASA's project to build a fully distributed computing and data management environment, a Grid. The IPG project has near-, medium-, and long-term goals that represent a continuum of engineering, development, and research topics. The overall goal is to provide the NASA scientific and engineering communities a substantial increase in their ability to solve problems that depend on the use of large-scale and/or dispersed resources: aggregated computing, diverse data archives, laboratory instruments and engineering test facilities, and human collaborators. The approach involves infrastructure and services that can locate, aggregate, integrate, and manage resources from across the NASA enterprise. An important aspect of IPG is to produce a common view of these resources, and at the same time provide for distributed management and local control. In addition to addressing the overall goal of enhanced science and engineering, there is a potentially important side effect. With a large collection of resources that have common use interfaces and a common management approach, the potential exists for a considerable pool of computing capability that could relatively easily, for example, be called on in extraordinary situations such as crisis response.
