
Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing, 1997

Date: 5-8 August 1997


Displaying Results 1 - 25 of 39
  • Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)

  • Author index

    Page(s): 377
  • Replaying distributed programs without message logging

    Page(s): 137 - 147

Debugging long program runs can be difficult because of the delays required to repeatedly re-run the execution. Even a moderately long run of five minutes can incur aggravating delays. To address this problem, techniques exist that allow re-executing a distributed program from intermediate points by using combinations of checkpointing and message logging. In this paper we explore another idea: how to support replay without logging the contents of any message. When no messages are logged, the set of global states from which replay is possible is constrained, and it has been unknown how to compute this set without exhaustively searching the space of all global states, whose size is exponential in the number of processes. We present a simple and efficient hybrid on-the-fly/post-mortem algorithm for detecting the necessary and sufficient conditions under which parts of the execution can be replayed without message logs. A small amount of trace (two vectors) is recorded at each checkpoint, and a fast post-mortem algorithm computes global states from which replay can begin. This algorithm is independent of the checkpointing technique used.

  • Packing messages as a tool for boosting the performance of total ordering protocols

    Page(s): 233 - 242

This paper compares the throughput and latency of four protocols that provide total ordering. Two of these protocols are measured with and without message packing. We used a technique that buffers application messages for a short period of time before sending them, so more messages are packed together. The main conclusion of this comparison is that message packing influences the performance of total ordering protocols under high load far more than any other optimization examined in this paper, in terms of both throughput and latency. This improved performance is attributed to the fact that packing messages reduces the header overhead for messages, the contention on the network, and the load on the receiving CPUs.
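The buffering technique the abstract describes, delaying application messages briefly so that several can be packed into one network send, can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the class name and parameters are invented for the example:

```python
import time

class PackingSender:
    """Buffers small messages briefly and sends them as one packed batch."""

    def __init__(self, transport, max_delay=0.005, max_batch=64):
        self.transport = transport      # callable that takes a list of messages
        self.max_delay = max_delay      # seconds to wait for more messages
        self.max_batch = max_batch      # flush when this many are buffered
        self.buffer = []
        self.first_enqueue = None

    def send(self, msg):
        if not self.buffer:
            self.first_enqueue = time.monotonic()
        self.buffer.append(msg)
        # Flush when the batch is full or the oldest message has waited long enough.
        if (len(self.buffer) >= self.max_batch or
                time.monotonic() - self.first_enqueue >= self.max_delay):
            self.flush()

    def flush(self):
        if self.buffer:
            self.transport(self.buffer)  # one packed network send
            self.buffer = []
```

The trade-off is the one the paper measures: a small added delay per message buys fewer packets, lower per-message header overhead, and less load on receivers.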

  • Predicting slowdown for networked workstations

    Page(s): 92 - 101

Most applications share the resources of networked workstations with other applications. Since system load can vary dramatically, allocation strategies that assume that resources have a constant availability and/or capability are unlikely to promote performance-efficient allocations in practice. In order to best allocate application tasks to machines, it is critical to provide a realistic model of the effects of contention on application performance. In this paper, we present a model that provides an estimate of the slowdown imposed by competing load on applications targeted to high-performance clusters and networks of workstations. The model provides a basis for predicting realistic communication and computation costs and is shown to achieve good accuracy for a set of scientific benchmarks commonly found in high-performance applications.
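The general shape of such a slowdown estimate can be illustrated with a toy model. This is not the paper's model; the function and its parameters are invented here purely to show the idea of scaling dedicated-machine costs by contention factors:

```python
def predict_runtime(comp_time, comm_time, cpu_load, bw_avail_frac):
    """Estimate the slowed-down runtime of a task on a loaded workstation.

    comp_time:      computation time on a dedicated machine (seconds)
    comm_time:      communication time on an unloaded network (seconds)
    cpu_load:       number of competing CPU-bound processes
    bw_avail_frac:  fraction of network bandwidth currently available (0-1]
    """
    comp_slowdown = 1.0 + cpu_load       # round-robin CPU sharing
    comm_slowdown = 1.0 / bw_avail_frac  # less bandwidth -> longer transfers
    return comp_time * comp_slowdown + comm_time * comm_slowdown
```

A scheduler can evaluate such an estimate per candidate machine and assign each task where the predicted runtime is lowest.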

  • Collective buffering: Improving parallel I/O performance

    Page(s): 148 - 157

“Parallel I/O” is the support of a single parallel application run on many nodes; application data is distributed among the nodes, and is read or written to a single logical file, itself spread across nodes and disks. Parallel I/O is a mapping problem from the data layout in node memory to the file layout on disks. Since the mapping can be quite complicated and involve significant data movement, optimizing the mapping is critical for performance. We discuss our general model of the problem, describe four Collective Buffering algorithms we designed, and report experiments testing their performance on an Intel Paragon and an IBM SP2, both housed at NASA Ames Research Center. Our experiments show improvements of up to two orders of magnitude over standard techniques and the potential to deliver peak performance with minimal hardware support.
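The core idea, reorganizing distributed data into file order in memory so one large sequential write can replace many small strided writes, can be illustrated for a simple block-cyclic layout. This is a toy sketch under that assumed layout, not one of the paper's four algorithms:

```python
def collective_buffer(node_blocks, block_size):
    """Reassemble per-node data into file order in a single buffer.

    Assumes file block i lives on node i % nnodes, at slot i // nnodes
    (a round-robin / block-cyclic distribution).  The returned buffer can
    then be written with one large sequential write.
    """
    nnodes = len(node_blocks)
    out = bytearray()
    i = 0
    while True:
        node, slot = i % nnodes, i // nnodes
        start = slot * block_size
        chunk = node_blocks[node][start:start + block_size]
        if not chunk:          # ran past the end of every node's data
            break
        out += chunk
        i += 1
    return bytes(out)
```

In a real system the reassembly is done collectively across I/O aggregator nodes rather than in one address space, but the memory-to-file permutation is the same.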

  • Flexible general purpose communication primitives for distributed systems

    Page(s): 201 - 210

This paper presents the slotted-FIFO communication mode that supports communication primitives for the entire spectrum of reliability and ordering requirements of distributed applications: FIFO as well as non-FIFO, and reliable as well as unreliable communication. Hence, the slotted-FIFO communication mode is suitable for multimedia applications as well as non-real-time distributed applications. As FIFO ordering is not required for all messages, message buffering requirements are considerably reduced. Also, message latencies are lower. We quantify these advantages by means of a simulation study. A low-overhead protocol implementing slotted-FIFO communication is also presented. The protocol incurs a small resequencing cost.

  • PARDIS: A parallel approach to CORBA

    Page(s): 31 - 39

This paper describes PARDIS, a system containing explicit support for interoperability of PARallel DIStributed applications. PARDIS is based on the Common Object Request Broker Architecture (CORBA). Like CORBA, it provides interoperability between heterogeneous components by specifying their interfaces in a meta-language, the CORBA IDL, which can be translated into the languages of the interacting components. However, PARDIS extends the CORBA object model by introducing SPMD objects representing data-parallel computations. SPMD objects allow the request broker to interact directly with the distributed resources of a parallel application. This capability ensures request delivery to all the computing threads of a parallel application and allows the request broker to transfer distributed arguments directly between the computing threads of the client and the server. To support this kind of argument transfer, PARDIS defines a distributed argument type, the distributed sequence, which generalizes the CORBA sequence to represent the distributed data structures of parallel applications. In this paper we give a brief description of basic component interaction in PARDIS and give an account of the rationale and support for SPMD objects and distributed sequences. We then describe two ways of implementing argument transfer in invocations on SPMD objects and evaluate and compare their performance.

  • A directory service for configuring high-performance distributed computations

    Page(s): 365 - 375

High-performance execution in distributed computing environments often requires careful selection and configuration not only of computers, networks, and other resources but also of the protocols and algorithms used by applications. Selection and configuration in turn require access to accurate, up-to-date information on the structure and state of available resources. Unfortunately, no standard mechanism exists for organizing or accessing such information. Consequently, different tools and applications adopt ad hoc mechanisms, or they compromise their portability and performance by using default configurations. We propose a Metacomputing Directory Service that provides efficient and scalable access to diverse, dynamic, and distributed information about resource structure and state. We define an extensible data model to represent required information and present a scalable, high-performance, distributed implementation. The data representation and application programming interface are adopted from the Lightweight Directory Access Protocol; the data model and implementation are new. We use the Globus distributed computing toolkit to illustrate how this directory service enables the development of more flexible and efficient distributed computing services and applications.

  • The software architecture of a virtual distributed computing environment

    Page(s): 40 - 49

The requirements of grand challenge problems and the deployment of gigabit networks make the network computing framework an attractive and cost-effective computing environment with which to interconnect geographically distributed processing and storage resources. Our project, Virtual Distributed Computing Environment (VDCE), provides a problem-solving environment for high-performance distributed computing over wide area networks. VDCE delivers well-defined library functions that relieve end-users of tedious task implementations and also support reusability. In this paper we present the conceptual design of the VDCE software architecture, which is defined in three modules: (a) the Application Editor, a user-friendly application development environment that generates the Application Flow Graph (AFG) of an application; (b) the Application Scheduler, which provides an efficient task-to-resource mapping of the AFG; and (c) the VDCE Runtime System, which is responsible for running and managing application execution and for monitoring the VDCE resources.

  • Distributed service paradigm for remote video retrieval request

    Page(s): 191 - 200

Per-service cost has been a serious impediment to the widespread use of on-line digital continuous media services, especially in the entertainment arena. Although handling continuous media may be achievable thanks to the technology advances of the past few years, its competitiveness in the market with existing service types such as video rental is still in question. In this paper, we propose a service paradigm for continuous media delivery in a distributed infrastructure, in an effort to reduce the resources required to support a set of service requests. The storage and network resources needed to support a set of requests should be properly quantified in a uniform metric to measure the efficiency of the service schedule. We developed a cost model which maps a given service schedule to a quantity; this cost model captures the amortized resource requirement of the schedule and thus measures its efficiency. We also develop a scheduling algorithm which strategically replicates the requested continuous media files at various intermediate storage sites.

  • Performance aspects of switched SCI systems

    Page(s): 223 - 231

The Scalable Coherent Interface (SCI) defines a high-speed interconnect that provides a coherent distributed shared memory system. With the use of switches, separate rings can be connected to form large topology-independent configurations. It has been realized that congestion in SCI systems generates additional retry traffic, which reduces the available communication bandwidth. This paper investigates additional flow control mechanisms for overloaded switches. They are based on a supplementary retry delay and show a significant throughput gain. Furthermore, two different management schemes for the output buffers are investigated. Computer simulations are used to compare the models and to determine system parameters.

  • Channel allocation methods for data dissemination in mobile computing environments

    Page(s): 274 - 281

We discuss several channel allocation methods for data dissemination in mobile computing systems. We suggest that broadcast and on-demand channels have different access performance under different system parameters and that a mobile cell should use a combination of both to obtain optimal access time for a given workload and system parameters. We study the data access efficiency of three channel configurations: all channels used as on-demand channels (exclusive on-demand); all channels used for broadcast (exclusive broadcast); and some channels on-demand and some broadcast (hybrid). Simulations to obtain the optimal channel allocation under lightly, moderately, and heavily loaded conditions are conducted, and the results show that an optimal channel allocation significantly improves system performance.
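The intuition behind the hybrid split can be seen in a toy model of the two pure configurations. These formulas are illustrative assumptions for the example, not the paper's analysis: broadcast wait is taken as half a broadcast cycle, and the pooled on-demand channels are modeled as a single M/M/1 queue:

```python
def broadcast_wait(n_channels, n_items, item_time):
    """Exclusive broadcast: items cycle round-robin across the channels;
    a client tuning in waits half a broadcast cycle on average."""
    cycle = n_items * item_time / n_channels
    return cycle / 2.0

def on_demand_wait(n_channels, item_time, req_rate):
    """Exclusive on-demand, modeled as an M/M/1 queue with the channels
    pooled into one fast server (a coarse simplification)."""
    mu = n_channels / item_time       # aggregate service rate
    if req_rate >= mu:
        return float("inf")           # overloaded: queue grows without bound
    return 1.0 / (mu - req_rate)
```

Broadcast wait is independent of load while on-demand wait grows with the request rate, so light load favors on-demand channels and heavy load favors broadcast, which is what motivates a hybrid allocation.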

  • Forecasting network performance to support dynamic scheduling using the network weather service

    Page(s): 316 - 325

The Network Weather Service is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments. In this paper, we outline its design and detail the predictive performance of the forecasts it generates. While the forecasting methods are general, we focus on their ability to predict the TCP/IP end-to-end throughput and latency that is attainable by an application using systems located at different sites. Such network forecasts are needed both to support scheduling, and by the metacomputing software infrastructure to develop quality-of-service guarantees.
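The forecasting style used by systems of this kind, maintaining several cheap predictors and forecasting with whichever has had the lowest recent error on the measurement history, can be sketched as follows. The specific predictors and error metric here are illustrative choices, not the Network Weather Service's actual set:

```python
class Forecaster:
    """Forecasts a time series by tracking several simple predictors and
    using the one with the lowest accumulated absolute error so far."""

    def __init__(self):
        self.history = []
        self.errors = {"last": 0.0, "mean": 0.0, "median": 0.0}

    def _predictions(self):
        h = self.history
        return {
            "last": h[-1],                     # last observed value
            "mean": sum(h) / len(h),           # running mean
            "median": sorted(h)[len(h) // 2],  # running median
        }

    def update(self, measurement):
        if self.history:
            # Score each predictor against the new measurement before adding it.
            for name, pred in self._predictions().items():
                self.errors[name] += abs(pred - measurement)
        self.history.append(measurement)

    def forecast(self):
        preds = self._predictions()
        best = min(self.errors, key=self.errors.get)
        return preds[best]
```

Feeding the forecaster periodic throughput or latency probes yields the per-link predictions a dynamic scheduler needs.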

  • Speed up your database client with adaptable multithreaded prefetching

    Page(s): 102 - 111

In many client/server object database applications, performance is limited by the delay in transferring pages from the server to the client. We present a prefetching technique that can avoid this delay, especially where there are several database servers. Part of the novelty of this approach lies in the way that multithreading on the client workstation is exploited, in particular for activities such as prefetching and flushing dirty pages to the server. Using our own complex object benchmark, we analyze the performance of the prefetching technique with multiple clients, multiple servers, and different buffer pool sizes.

  • Optimizing layered communication protocols

    Page(s): 169 - 177

Layering of communication protocols offers many well-known advantages but typically leads to performance inefficiencies. We present a model for layering, and point out where the performance problems occur in stacks of layers using this model. We then investigate the common execution paths in these stacks and how to identify them. These paths are optimized using three techniques: optimizing the computation, compressing protocol headers, and delaying processing. All of the optimizations can be automated in a compiler with the help of minor annotations by the protocol designer. We describe the performance that we obtain after implementing the optimizations by hand on a full-scale system.

  • Design patterns for parallel computing using a network of processors

    Page(s): 293 - 304

The high complexity of building parallel applications is often cited as one of the major impediments to the mainstream adoption of parallel computing. To deal with the complexity of software development, abstractions such as macros, functions, abstract data types, and objects are commonly employed by sequential as well as parallel programming models. This paper describes the concept of a design pattern for the development of parallel applications. A design pattern in our case describes a recurring parallel programming problem and a reusable solution to that problem. A design pattern is implemented as a reusable code skeleton for quick and reliable development of parallel applications. A parallel programming system, called DPnDP (Design Patterns and Distributed Processes), that employs such design patterns is described. In the past, parallel programming systems have allowed fast prototyping of parallel applications based on commonly occurring communication and synchronization structures. The uniqueness of our approach is in the use of a standard structure and interface for a design pattern. This has several important implications. First, design patterns can be defined and added to the system's library in an incremental manner without requiring any major modification of the system (extensibility). Second, customization of a parallel application is possible by mixing design patterns with low-level parallel code, resulting in a flexible and efficient parallel programming tool (flexibility). Third, a parallel design pattern can be parameterized to provide some variations in terms of structure and behavior.

  • Parallel FFT on ATM-based networks of workstations

    Page(s): 2 - 11

In this paper, we first evaluate the performance degradation caused by unequal bandwidths on the execution of conventional parallel algorithms such as the fast Fourier transform on an ATM-based Network of Workstations. We then present a strategy based on dynamic redistribution of data points to reduce the bottlenecks caused by unequal bandwidths. We also extend this strategy to deal with processor heterogeneity. Using analysis and simulation we show that there is a considerable reduction in the runtime if the proposed redistribution strategy is adopted. The basic idea presented in this paper can also be used to improve the runtimes of parallel applications in connection-oriented environments.

  • Distributed-thread scheduling methods for reducing page-thrashing

    Page(s): 356 - 364

Although distributed threads on distributed shared memory (DSM) provide an easy programming model for distributed computer systems, it is not easy to build a high performance system with them, because a software DSM system is prone to page-thrashing. One way to reduce page-thrashing is to utilize thread migration, which leads to changes in page access patterns on DSM. In this paper, we propose thread scheduling methods based upon page access information and discuss an analytical model for evaluating this information. Then, we describe our implementation of distributed threads, PARSEC (Parallel software environment for workstation cluster). Using user-level threads, PARSEC implements thread migration and thread scheduling based upon the page access information. We also measure the performance of some applications with these thread scheduling methods. These measurements indicate that the thread scheduling methods greatly reduce page-thrashing and improve total system performance.

  • A secure communications infrastructure for high-performance distributed computing

    Page(s): 125 - 136

Applications that use high-speed networks to connect geographically distributed supercomputers, databases, and scientific instruments may operate over open networks and access valuable resources. Hence, they can require mechanisms for ensuring integrity and confidentiality of communications and for authenticating both users and resources. Security solutions developed for traditional client-server applications do not provide direct support for the program structures, programming tools, and performance requirements encountered in these applications. We address these requirements via a security-enhanced version of the Nexus communication library, which we use to provide secure versions of parallel libraries and languages, including the Message Passing Interface. These tools permit a fine degree of control over what, where, and when security mechanisms are applied. In particular, a single application can mix secure and nonsecure communication, allowing the programmer to make fine-grained security/performance tradeoffs. We present performance results that quantify the performance of our infrastructure.

  • ASCI applications


In discussions of ASCI, the high-profile procurements of large computers frequently figure prominently. However, from the outset of the ASCI program, applications have been recognized as the driver. These applications feature complex, multi-physics simulations of natural phenomena that generate massive data sets as output. As we have moved from computing systems dominated by parallel vector processing to massively parallel processing, we have designed new applications from the ground up to take advantage of the new capabilities. Early payoffs from this effort include running problems that are one to two orders of magnitude larger than any we have been able to run in the past. With these larger problems, we are beginning the computational exploration of domains in physics, chemistry, and engineering that were previously closed. As we write these codes, issues associated with languages, debuggers, and visualization tools have quickly risen to the surface. The process of running large problems has strained the computational infrastructure almost to the breaking point but indicates the direction for future work.

  • The Argonne Voyager multimedia server

    Page(s): 71 - 80

With the growing presence of multimedia-enabled systems, we will see an integration of collaborative computing concepts into future scientific and technical workplaces. Desktop teleconferencing is common today, while more complex teleconferencing technology that relies on the availability of multipoint-enabled tools is starting to become available on PCs. A critical problem when using these collaborative tools is archiving multistream, multipoint meetings and making the content available to others. Ideally, one would like the ability to capture, record, play back, index, annotate, and distribute multimedia stream data as easily as we currently handle text or still-image data. The Argonne Voyager project is exploring and developing media server technology needed to provide such a flexible, virtual multipoint recording/playback capability. In this article we describe the motivating requirements, architecture, implementation, operation, performance, and related work.

  • Cut-through delivery in Trapeze: An exercise in low-latency messaging

    Page(s): 243 - 252

New network technology continues to improve both the latency and bandwidth of communication in computer clusters. The fastest high-speed networks approach or exceed the I/O bus bandwidths of “gigabit-ready” hosts. These advances introduce new considerations for the design of network interfaces and messaging systems for low-latency communication. This paper investigates cut-through delivery, a technique for overlapping host I/O DMA transfers with network traversal. Cut-through delivery significantly reduces end-to-end latency of large messages, which are often critical for application performance. We have implemented cut-through delivery in Trapeze, a new messaging substrate for network memory and other distributed operating system services. Our current Trapeze prototype is capable of demand-fetching 8K virtual memory pages in 200 μs across a Myrinet cluster of DEC AlphaStations.

  • Utilizing heterogeneous networks in distributed parallel computing systems

    Page(s): 336 - 345

Heterogeneity is becoming quite common in distributed parallel computing systems, both in processor architectures and in communication networks. Different types of networks have different performance characteristics, while different types of messages may have different communication requirements. In this work, we analyze two techniques for exploiting these heterogeneous characteristics and requirements to reduce the communication overhead of parallel application programs executed on distributed computing systems. The performance-based path selection (PBPS) technique selects the best (lowest latency) network among several for each individual message, while the second technique aggregates multiple networks into a single virtual network. We present a general approach for applying and evaluating these techniques in a distributed computing system with multiple interprocessor communication networks. We also generate performance curves for a cluster of IBM workstations interconnected with Ethernet, ATM, and Fibre Channel networks. As we show with several of the NAS benchmarks, these curves can be used to estimate the potential improvement in communication performance that can be obtained with these techniques, given some simple communication characteristics of an application program.
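The per-message selection idea can be sketched with a simple startup-plus-transfer latency model. This is an illustrative sketch of performance-based path selection, not the paper's implementation; the network parameters below are invented example figures:

```python
def select_network(msg_size, networks):
    """Pick the network with the lowest estimated latency for this message.

    networks: list of (name, startup_latency_s, bandwidth_bytes_per_s).
    Latency is modeled as startup + size / bandwidth.
    """
    def latency(net):
        _, startup, bandwidth = net
        return startup + msg_size / bandwidth
    return min(networks, key=latency)[0]
```

The model captures why selection pays off: a network with low startup cost wins for small messages even if its bandwidth is modest, while a high-bandwidth network wins once the transfer term dominates.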

  • A distributed load balancing algorithm for the hot cell problem in cellular mobile networks

    Page(s): 254 - 263

We propose a novel channel management algorithm, called distributed load balancing with selective borrowing (D-LBSB), for cellular mobile networks. As an underlying approach, we start with a fixed channel assignment scheme where each cell is initially allocated a set of local channels, each to be assigned on demand to a user in that cell. The novelty of our D-LBSB scheme lies in its handling of the hot cell problem: it migrates unused channels from suitable cold cells to hot ones through a distributed channel borrowing algorithm. With the help of a Markov model, the probability of a cell being hot and the call blocking probability in a cell are derived. Detailed simulation experiments are carried out in order to evaluate the proposed methodology. Performance comparison reveals that the D-LBSB scheme performs better than a centralized version in an overloaded system, and significantly better than several other existing schemes in terms of call blocking probability under moderate and heavy loads.
