Distributed Computing Systems, 1999. Proceedings. 19th IEEE International Conference on

Date 5-5 June 1999

  • Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003)

    Freely Available from IEEE
  • Author index

    Page(s): 553 - 554
    Freely Available from IEEE
  • Mobile agent programming in Ajanta

    Page(s): 190 - 197

    The paper gives an overview of Ajanta, a Java-based system for mobile agent programming. We outline the Ajanta architecture and discuss the basic elements that make up an agent-based application. Ajanta's programming environment is defined in terms of a set of primitive operations for agent creation, dispatch, migration and remote control. Agents can access server resources using a proxy-based access control mechanism. We describe a scheme for agent migration based on the composition of some basic migration patterns that incorporate exception handling mechanisms. Finally, we present two agent-based distributed applications implemented using the Ajanta system: middleware that supports file sharing over the Internet, and a distributed calendar manager.

  • Stateful group communication services

    Page(s): 82 - 89

    Reliable group multicasts provide a convenient abstraction for communicating data reliably among group members and have been used for a variety of applications. In this paper we present Corona, a group communication service for building collaboration tools and reliable data dissemination services in Web-based environments, where clients connect independently of other clients and are not necessarily connected to the group multicast services all the time. The key features of Corona are: (1) the shared state of a group consists of a set of objects shared collectively among group members; (2) Corona supports multiple state transfer policies to accommodate clients with different needs and resources; (3) the communication service provides the current group state or state updates to new clients even when other clients are not available; (4) the service supports persistent groups that tolerate client failures and leaves. We show that the overhead incurred by the multicast service in managing each group's shared state has little impact on the latency seen by the clients or on the server throughput. We also show that the multicast service does not have to be aware of the client-specific semantics of the objects in the group's state.

  • A dynamic object replication and migration protocol for an Internet hosting service

    Page(s): 101 - 113

    The paper proposes a protocol suite for dynamic replication and migration of Internet objects. It consists of an algorithm for deciding on the number and location of object replicas and an algorithm for distributing requests among currently available replicas. Our approach attempts to place replicas in the vicinity of a majority of requests, while ensuring that no servers are overloaded. The request distribution algorithm uses the same simple mechanism to take into account both server proximity and load, without actually knowing the latter. The replica placement algorithm executes autonomously on each node, without knowledge of other object replicas in the system. The proposed algorithms rely on the information available in databases maintained by Internet routers. A simulation study using synthetic workloads and the network backbone of UUNET, one of the largest Internet service providers, shows that the proposed protocol is effective in eliminating hot spots and achieves a significant reduction in backbone traffic and server response time at the expense of creating only a small number of extra replicas.

  • Using hysteresis to reduce adaptation cost of a dynamic quorum assignment

    Page(s): 114 - 124

    One classic technique for coordinating distributed computations is to require each processor to get permission for certain actions from some quorum of processors, such that every processor's quorum overlaps every other processor's quorum. A dynamic quorum assignment allows the processor-to-quorum mapping to adapt during system execution, to improve performance and availability when processors fail or change load. This work considers how the choice of quorum mapping function affects the run-time cost of quorum adaptation. An effective quorum mapping function exhibits not only desirable quorum size and load properties, but also a type of hysteresis that minimizes the changes made to the processor-to-quorum mapping whenever the mapping is recomputed. A new quorum mapping function called MEMRING is given that exhibits hysteresis by identifying quorums that are similar to previously chosen quorums. This behavior reduces the number of modifications needed to dynamic quorum assignment data structures, and can consequently reduce the amount of interprocessor communication needed for distributed control of quorum adaptation. The expected cost of distributed quorum adaptation using MEMRING is shown to be less than the expected cost of using other quorum mapping functions that have similar quorum size and load properties.
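
    The overlap requirement stated at the start of this abstract can be checked mechanically. The following is a minimal Python sketch, not MEMRING: `ring_majority_quorums` is a hypothetical mapping (majority-sized windows on a ring) chosen only because any two majorities must share a member.

```python
from itertools import combinations

def all_quorums_intersect(quorums):
    """Return True if every pair of quorums shares at least one processor."""
    return all(set(a) & set(b) for a, b in combinations(quorums.values(), 2))

def ring_majority_quorums(n):
    """Hypothetical mapping (an assumption, not from the paper): processor p
    uses the majority-sized window of consecutive ring positions starting at p."""
    size = n // 2 + 1
    return {p: {(p + i) % n for i in range(size)} for p in range(n)}

quorums = ring_majority_quorums(5)
print(all_quorums_intersect(quorums))                     # True
print(all_quorums_intersect({0: {0, 1}, 1: {2, 3}}))      # False
```

    A dynamic assignment would recompute such a mapping as processors fail or change load; the hysteresis property described above concerns how little the mapping changes between recomputations, which this sketch does not model.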

  • Trust vs. threats: recovery and survival in electronic commerce

    Page(s): 126 - 133

    The paper analyzes threats and attacks in the Internet commerce world and suggests schemes to detect attacks when they occur, prevent further loss once an attack is detected, and provide corrective actions that enable victims of commerce-related attacks to resume conducting business transactions. Some commerce-based transaction recovery mechanisms are suggested to recover from losses these attacks may have caused. Suitable cryptographic primitives and protocols that realize an anonymous complaint and an anonymous receipt are developed to provide trust and security against e-commerce-related attacks.

  • PASS-a service for efficient large scale dissemination of time varying data using CORBA

    Page(s): 496 - 506

    A common class of wide-area distributed applications remotely collect time-varying data and send it to consumers around the network. Examples include network management, stock ticker data and event logs. The environment in which these applications must operate often dictates the schemes for disseminating the data between the writers and the readers. If the transport channel can be optimized to match the application's behavior patterns and the network resource constraints, sufficient improvements in application-level quality of service (QoS) can be achieved. The PASS (Piecewise Asynchronous Sample Service) system addresses this problem by using a flexible system of interconnected servers. PASS servers are distributed geographically around the network and are connected to readers and writers using the CORBA protocol. The forwarding policies used by the servers and the server interconnections can be customized for each application. Thus, PASS acts like an application-level multicast service with variable forwarding policies. PASS has been used to disseminate the up/down status of a large number of devices to a network management system. The PASS forwarding policy used very little network bandwidth while responding to failures in half a network round-trip time.

  • Redirection algorithms for load sharing in distributed Web-server systems

    Page(s): 528 - 535

    Replication of information among multiple World Wide Web servers is necessary to support high request rates to popular Web sites. A clustered Web server organization is preferable to multiple independent mirrored servers because it maintains a single interface to the users and has the potential to be more scalable, fault-tolerant and better load-balanced. In this paper, we propose a Web cluster architecture in which the Domain Name System (DNS) server, which dispatches user requests among the servers by mapping the URL name to an IP address, is integrated with a request redirection mechanism based on HTTP. This should alleviate the side-effect of caching the IP address mapping at intermediate name servers. We compare many alternative mechanisms, including synchronous vs. asynchronous activation and centralized vs. distributed decisions on redirection. Moreover, we analyze the reassignment of entire domains or individual client requests, different types of status information and different server selection policies for redirecting requests. Our results show that the combination of centralized and distributed dispatching policies allows the Web server cluster to handle high load skews in the WWW environment.

  • On classes of problems in asynchronous distributed systems with process crashes

    Page(s): 470 - 477

    This paper studies classes of problems encountered in asynchronous distributed systems in which processes can crash but links are reliable. The hardness of a problem is defined with respect to the difficulty of solving it despite failures: a problem is easy if it can be solved in the presence of failures, otherwise it is hard. Three classes of problems are defined: F, NF and NFC. F is the class of easy problems, namely, those that can be solved in the presence of failures (e.g., reliable broadcast). The class NF includes harder problems, namely, the ones that can be solved in a non-faulty system (e.g., consensus). The class NFC (NF-complete) is a subset of NF that includes the problems that are the most difficult to solve in the presence of failures. It is shown that the terminating reliable broadcast problem, the non-blocking atomic commitment problem and the construction of a perfect failure detector (problem P) are equivalent problems and belong to NFC. Moreover, the consensus problem is not in NFC. The paper presents a general reduction protocol that reduces any problem of NF to P. This shows that P is a problem that lies at the core of distributed fault-tolerance.

  • An Advanced Communication Toolkit for implementing the Broker pattern

    Page(s): 458 - 467

    The Broker pattern is a powerful solution when building middleware communication systems. Existing toolkits, such as BAST, GTS, and ACE, although useful, are insufficient to implement the Broker pattern architecture. These systems concentrate on wrappers for communication protocols, and on implementing auxiliary communication patterns, but address only some aspects of object communication. In this work we demonstrate how the Broker pattern can be easily implemented by using an Advanced Communication Toolkit (ACT). The ACT model defines four layers according to the increasing degree of abstraction of exchanged information. The resulting systems are highly customizable, extensible, and portable, and can communicate at any of the four layers independently. ACT supports various high-level communication protocols (e.g., HTTP, IIOP, SMTP) and can be used to implement Broker-based systems such as OMG CORBA, Java RMI, or Microsoft DCOM.

  • Causally ordered multicast: the conservative approach

    Page(s): 36 - 44

    Process group toolkits provide methods to structure a system as a set of groups of cooperating processes, to detect process failures, and to order events (by ordering messages). Such tools have a performance cost for applications, particularly when a system is built using a large number of overlapping groups. We built an event-driven simulation to study the performance of causally ordered message delivery in large systems composed of overlapping groups. Our studies, the first ever of multiple group systems, reveal some conditions under which the delays can be very large: two orders of magnitude greater than when delays are not required. Further, in a large system these delays can lead to increased system burstiness, which limits system scalability. These results suggest that a system supporting multiple overlapping groups needs to be carefully designed and that the system should often provide users with control over when to apply ordering guarantees.
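
    To make concrete why causal ordering can delay delivery, here is a small Python sketch of the textbook vector-clock delivery rule for a single group. It is an illustrative assumption, not the protocol simulated in the paper, and the function name `deliverable` is hypothetical.

```python
def deliverable(msg_clock, sender, delivered):
    """Textbook causal-delivery test for one group.

    msg_clock: vector timestamp carried by the message.
    sender:    index of the sending process.
    delivered: per-process count of messages already delivered locally.
    The message may be delivered only if it is the next one from `sender`
    and every message it causally depends on has already been delivered.
    """
    return all(
        (msg_clock[k] == delivered[k] + 1) if k == sender
        else (msg_clock[k] <= delivered[k])
        for k in range(len(delivered))
    )

delivered = [0, 0, 0]
m1 = ([1, 0, 0], 0)   # first message from process 0
m2 = ([1, 1, 0], 1)   # sent by process 1 after it delivered m1

print(deliverable(*m2, delivered))   # False: m2 must wait for m1
print(deliverable(*m1, delivered))   # True
delivered[0] += 1                    # m1 delivered
print(deliverable(*m2, delivered))   # True
```

    A message that fails the test is buffered until its predecessors arrive; that buffering is the kind of ordering delay the abstract refers to.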

  • Reducing message overhead in TMR systems

    Page(s): 45 - 54

    Traditional TMR protocols assume either single, reliable voters for each triple-modular redundant unit (TMRU) or triplicated voters (one for each processor) for each TMRU. In the first case a voter is a single point of failure for the system. In the second case, many physical messages must be sent across the communication network for each logical data item. We examine some protocols which attempt to maintain the functionality of the triplicated voter TMR protocol while reducing the number of physical messages required by one third. We examine possible solutions to the many issues that result from this reduction in communication. Three different reduced-communication triple-modular redundant (RTMR) protocols are considered, each of which makes different assumptions about the nature of the underlying computation.
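
    For readers unfamiliar with TMR, the voter's job can be sketched in a few lines of Python. This shows only generic majority voting over three replica outputs; it does not reproduce the RTMR protocols, and `vote` is a hypothetical name.

```python
from collections import Counter

def vote(replies):
    """Return the value reported by at least two of the three replicas,
    or None if all three disagree."""
    if len(replies) != 3:
        raise ValueError("a TMR voter expects exactly three replies")
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= 2 else None

print(vote([7, 7, 9]))   # 7: the faulty replica is outvoted
print(vote([1, 2, 3]))   # None: no majority
```

    With triplicated voters, each of the three voters needs all three replies, which is the source of the message overhead discussed above.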

  • Load balancing and hot spot relief for hash routing among a collection of proxy caches

    Page(s): 536 - 543

    Hash routing partitions the entire URL space among a collection of cooperating proxy caches. Each partition is assigned to a cache server. Duplication of cache contents is eliminated. Client requests to a cache server for non-assigned partition objects are forwarded to proper sibling caches. As a result, the load level of the cache servers can be quite unbalanced. We examine an adaptable controlled replication (ACR) of non-assigned partition objects in each cache server to reduce the load imbalance and relieve the problem of hot-spot references. Trace-driven simulations are conducted to study the effectiveness of ACR. The results show that: (1) access skew exists, and the load of the cache servers tends to be unbalanced in hash routing; (2) with a relatively small amount of ACR, say 10% of the cache size, significant improvements in load balance can be achieved; and (3) ACR provides a very effective remedy for load imbalance due to hot-spot references.
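
    The basic hash routing step, and the hot-spot imbalance it can cause, can be sketched as follows. This Python example is an assumption-laden illustration: MD5 is used only as a convenient stable hash, the cache names and URLs are made up, and ACR itself is not modelled.

```python
import hashlib
from collections import Counter

def home_cache(url, caches):
    """Map a URL to the single cache that owns its partition of the URL space."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return caches[int.from_bytes(digest[:8], "big") % len(caches)]

caches = ["cache-a", "cache-b", "cache-c"]          # hypothetical cache names
requests = ["http://example.com/hot"] * 8 + [
    "http://example.com/p1",
    "http://example.com/p2",
]

# Every request for the hot URL lands on the same cache, so load is skewed.
load = Counter(home_cache(u, caches) for u in requests)
print(load)
```

    Because one popular URL always maps to one cache, that cache absorbs all of its traffic; this is the imbalance that controlled replication of non-assigned objects is meant to relieve.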

  • Search space reduction in QoS routing

    Page(s): 142 - 149

    To provide real-time service, integrated networks require the underlying routing algorithm to be able to find low-cost paths that satisfy given Quality of Service (QoS) constraints. The problem of constrained shortest (least-cost) path routing is known to be NP-hard, and some heuristics have been proposed to find a near-optimal solution. However, these heuristics either impose relationships among the link metrics to reduce the complexity of the problem, which may limit the general applicability of the heuristic, or are too costly in terms of execution time to be applicable to large networks. We focus on solving the delay-constrained minimum-cost path problem, and present a fast algorithm to find a near-optimal solution. This algorithm, called DCCR (Delay-Cost-Constrained Routing), is a variant of the k-shortest path algorithm. DCCR uses a new adaptive path weight function, together with an additional constraint imposed on the path cost, to restrict the search space. Thus, DCCR can return a near-optimal solution in a very short time. Furthermore, we use the method proposed by D. Blokh and G. Gutin (1995) to further reduce the search space by using a tighter bound on path cost. This makes our algorithm more accurate and even faster. We call this improved algorithm SSR+DCCR (Search Space Reduction+DCCR). Through extensive simulations, we confirm that SSR+DCCR performs very well compared to the optimal but very expensive solution.
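
    The delay-constrained minimum-cost path problem itself can be stated in code. The Python sketch below is an exact label-setting search suitable only for small graphs; it is neither DCCR nor SSR+DCCR, and the graph, names and bounds are illustrative assumptions.

```python
import heapq

def min_cost_path_within_delay(graph, src, dst, max_delay):
    """Cheapest path from src to dst whose total delay is at most max_delay.

    graph: {u: [(v, cost, delay), ...]} with non-negative costs and delays.
    Returns (cost, delay, path), or None if no feasible path exists.
    """
    heap = [(0, 0, src, [src])]
    best_delay = {}  # smallest delay among labels already expanded per node
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, delay, path
        if best_delay.get(node, float("inf")) <= delay:
            continue  # dominated: an earlier label was no costlier and no slower
        best_delay[node] = delay
        for nxt, c, d in graph.get(node, []):
            if delay + d <= max_delay:  # prune paths that violate the delay bound
                heapq.heappush(heap, (cost + c, delay + d, nxt, path + [nxt]))
    return None

graph = {
    "s": [("a", 1, 5), ("b", 4, 1)],
    "a": [("t", 1, 5)],
    "b": [("t", 4, 1)],
}
print(min_cost_path_within_delay(graph, "s", "t", 10))  # (2, 10, ['s', 'a', 't'])
print(min_cost_path_within_delay(graph, "s", "t", 5))   # (8, 2, ['s', 'b', 't'])
print(min_cost_path_within_delay(graph, "s", "t", 1))   # None
```

    Tightening the delay bound forces the search onto the more expensive but faster route. The number of labels such an exact search may expand grows quickly with network size, which is the motivation for heuristics that restrict the search space.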

  • Interpreting stale load information

    Page(s): 285 - 296

    In this paper we examine the problem of balancing load in a large-scale distributed system when information about server loads may be stale. It is well known that sending each request to the machine with the apparent lowest load can behave badly in such systems, yet this technique is common in practice. Other systems use round-robin or random selection algorithms that entirely ignore load information or that only use a small subset of the load information. Rather than risk extremely bad performance on one hand or ignore the chance to use load information to improve performance on the other, we develop strategies that interpret load information based on its age. Through simulation, we examine several simple algorithms that use such load interpretation strategies under a range of workloads. Our experiments suggest that by properly interpreting load information, systems can (1) match the performance of the most aggressive algorithms when load information is fresh relative to the job arrival rate, (2) outperform the best of the other algorithms we examine by as much as 60% when information is moderately old, (3) significantly outperform random load distribution when information is older still, and (4) avoid pathological behavior even when information is extremely old.

  • A property-based clustering approach for the CORBA Trading Service

    Page(s): 517 - 525

    The CORBA Trading Service is an object service advertiser for heterogeneous distributed computing environments. Current approaches for the design and implementation of such a CORBA service do not deal with some of the major problems of searching for service offers in large-scale distributed systems, namely performance and scalability. This paper proposes an approach for clustering service offers based on the service properties, in order to enhance the efficiency of the trading service. The proposed approach clusters service offers within a hierarchy of contexts by specialisation of property sets. Performance results of the proposed clustering approach are discussed, and the benefits of including clustering of properties with the CORBA Trading Service are shown.

  • Mockingbird: flexible stub compilation from pairs of declarations

    Page(s): 393 - 402

    Mockingbird is a prototype tool for developing interlanguage and distributed applications. It compiles stubs from pairs of interface declarations, allowing existing data types to be reused on both sides of every interface. Other multilanguage stub compilers impose data types on the application, complicating development. Mockingbird supports C/C++, Java, and CORBA IDL, and can be extended to other languages. Mockingbird can generate stubs that convert between types whose structural equivalence would be missed by other tools. We show that this kind of tool improves programming productivity, and describe, in detail, Mockingbird's design and implementation.

  • Design and performance evaluation of a Java-based multicast browser tool

    Page(s): 314 - 322

    This paper presents a case study in the use of reliable multicasting in Web-based multi-party applications. To carry out this study, we have designed and implemented WEBCLASS, a multicast browser tool written in Java. In WEBCLASS, all the actions of a “master” Web browser are mimicked on a set of client browsers. Monitoring of the master browser is performed by a set of threads, which use a reliable multicast protocol to disseminate state information and Web resources to programs that control the client browsers. The architecture and operation of the main components of WEBCLASS are described, and experimental results of a performance study are presented.

  • Fast and fair mutual exclusion for shared memory systems

    Page(s): 224 - 231

    Two fast mutual exclusion algorithms using read-modify-write and atomic read/write registers are presented. The first uses both compare&swap and fetch&store; the second uses only fetch&store, which is more commonly available than compare&swap. It is impossible to obtain better algorithms if “time” is measured by counting remote memory references. We were able to maintain the same level of performance with or without the support of compare&swap. However, without that support, fairness is degraded from 1-bounded bypass to lockout freedom.

  • Initial synchronization of TDMA communication in distributed real-time systems

    Page(s): 370 - 379

    This paper discusses the startup phase of a TDMA protocol intended for safety-critical real-time systems using a broadcast bus. Each message contains the sender ID, and nodes send messages of equal size in a fixed order. A single-channel medium is used, so data and synchronization information must share the same channel. Synchronization is challenging, since clocks must be synchronized to guarantee successful transmissions, and successful transmissions rely on synchronized clocks. We describe three different startup algorithms and evaluate them with respect to complexity, resynchronization time and performance in the presence of transient faults.

  • Optimal dynamic location update for PCS networks

    Page(s): 134 - 141

    The movement-based dynamic location update scheme is studied. An analytical model is applied to formulate the costs of location update and paging in the movement-based location update scheme. The problem of minimizing the total cost is formulated as an optimization problem that finds the optimal threshold in the movement-based location update scheme. We prove that the total cost function is a convex function of the threshold. Based on the structure of the optimal solution, an efficient algorithm is proposed to find the optimal threshold directly. Furthermore, the proposed algorithm is applied to study numerically the effects of changing important parameters of mobility and calling patterns.
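
    Convexity of the total cost in the threshold is what makes a direct search possible: the first threshold at which the cost stops decreasing is a global minimum. The Python sketch below illustrates that idea only. The cost function `toy_total_cost` and its parameters are placeholders invented for this example, not the paper's analytical model, and the search is a generic one rather than the proposed algorithm.

```python
def optimal_threshold(total_cost, max_threshold):
    """Find an integer threshold minimizing a convex cost over 1..max_threshold.

    For a convex function, the first point where the cost stops decreasing
    is a global minimum, so a single forward scan suffices.
    """
    d = 1
    while d < max_threshold and total_cost(d + 1) < total_cost(d):
        d += 1
    return d

def toy_total_cost(d, update_cost=16.0, paging_cost=1.0):
    """Placeholder cost (an assumption): updates become rarer as the
    threshold d grows, while paging becomes more expensive."""
    return update_cost / d + paging_cost * d

print(optimal_threshold(toy_total_cost, max_threshold=50))  # 4
```

    Raising `update_cost` relative to `paging_cost` in the toy model pushes the optimal threshold upward, which mirrors the kind of parameter study mentioned at the end of the abstract.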

  • ETE: a customizable approach to measuring end-to-end response times and their components in distributed systems

    Page(s): 152 - 162

    Detecting and resolving performance problems in distributed systems often requires measurements of end-to-end (“finger tip to eyeball”) response times. Existing approaches embed transaction definitions in instrumentation code. As a result, service providers (e.g., ISPs) cannot tailor transaction definitions to the usage patterns of their customers. We propose a new approach, ETE (end-to-end), in which transaction definitions are externalized so that they can be customized. This is accomplished by having instrumentation generate events (not transactions) and employing a separate component, the transaction generator, that uses external definitions of transactions to construct response time measurements from event streams. ETE provides measurements of both end-to-end response times and their components. The latter reflect delays for services within distributed systems (e.g., name resolution service). We have used ETE to measure response times for Web transactions, terminal emulators, and Lotus Notes.
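
    The separation between event-producing instrumentation and externally defined transactions can be illustrated with a small Python sketch. Everything here is hypothetical: the event names, the shape of the definitions, and the function `response_times` are assumptions for illustration and do not describe ETE's actual interfaces.

```python
def response_times(events, definitions):
    """Build response-time measurements from an event stream.

    events:      iterable of (timestamp_ms, event_type), in time order.
    definitions: {transaction_name: (start_type, end_type)}, supplied
                 separately from the instrumentation that emits events.
    Returns a list of (transaction_name, elapsed_ms).
    """
    open_starts = {}
    results = []
    for ts, kind in events:
        for name, (start, end) in definitions.items():
            if kind == start:
                open_starts[name] = ts
            elif kind == end and name in open_starts:
                results.append((name, ts - open_starts.pop(name)))
    return results

events = [(0, "click"), (200, "dns_start"), (500, "dns_done"), (1500, "page_rendered")]
definitions = {
    "page_load": ("click", "page_rendered"),        # end-to-end transaction
    "name_resolution": ("dns_start", "dns_done"),   # one component of it
}
print(response_times(events, definitions))
# [('name_resolution', 300), ('page_load', 1500)]
```

    Changing `definitions` changes what counts as a transaction without touching the code that emits events, which is the customization the abstract describes.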

  • Progressive construction of consistent global checkpoints

    Page(s): 55 - 62

    A checkpoint pattern is an abstraction of the computation performed by a distributed application. A progressive view of this abstraction is formed by a sequence of consistent global checkpoints that may have occurred in this order during the execution of the application. Considering pairs of checkpoints, we have determined that a checkpoint must be observed before another in a progressive view if the former Z-precedes the latter. Based on the Z-precedence and characteristics of the checkpoint pattern, we propose original algorithms for the progressive construction of consistent global checkpoints. We demonstrate that the Z-precedence between a pair of checkpoints is a much simpler way to express the existence of a zigzag path connecting them, and we discuss other advantages of our relation.

  • Processing transactions over optimistic atomic broadcast protocols

    Page(s): 424 - 431

    Atomic broadcast primitives allow fault-tolerant cooperation between sites in a distributed system. Unfortunately, the delay incurred before a message can be delivered makes it difficult to implement high-performance, scalable applications on top of atomic broadcast primitives. A new approach has been proposed which, based on optimistic assumptions about the communication system, reduces the average delay for message delivery. We develop this idea further and present a replicated database architecture that employs the new atomic broadcast primitive in such a way that the coordination phase of the atomic broadcast is fully overlapped with the execution of transactions, providing high performance without relaxing transaction correctness.
