
Reliable Distributed Systems, 1996. Proceedings., 15th Symposium on

Date 23-25 Oct. 1996

  • Proceedings 15th Symposium on Reliable Distributed Systems [front matter]

    Publication Year: 1996
    Freely Available from IEEE
  • Author index

    Publication Year: 1996, Page(s): 229
    Freely Available from IEEE
  • Fail-aware failure detectors

    Publication Year: 1996, Page(s):200 - 209
    Cited by:  Papers (9)

    In existing asynchronous distributed systems it is impossible to implement failure detectors which are perfect, i.e. they only suspect crashed processes and eventually suspect all crashed processes. Some recent research has however proposed that any “reasonable” failure detector for solving the election problem must be perfect. We address this problem by introducing two new classes of ...

  • A fault-tolerant CORBA name server

    Publication Year: 1996, Page(s):188 - 197
    Cited by:  Papers (7)  |  Patents (4)

    OMG CORBA applications require a distributed naming service in order to install and to retrieve object references. High availability of the naming service is important since most CORBA applications need to access it at least once during their lifetime. Unfortunately, the OMG standards do not deal with availability issues; the naming services of many of the commercially available CORBA object reque...

  • Primary copy method and its modifications for database replication in distributed mobile computing environment

    Publication Year: 1996, Page(s):178 - 187
    Cited by:  Papers (2)

    Rapidly expanding cellular communication technology, wireless LANs and satellite services have made it possible for mobile users to access information anywhere and at any time. In a mobile computing environment replication might be considered as an essential technique providing reliability, throughput increase and data availability. This paper addresses the replica control protocols with an emphas...

  • Implementation and performance of a stable-storage service in Unix

    Publication Year: 1996, Page(s):86 - 95
    Cited by:  Papers (4)

    This paper describes the design, implementation, and performance of a stable-storage service that has been implemented on top of the Unix operating system. This service allows servers to create, access, and delete persistent memory that survives server crashes. We describe its functionality and exported operations, discuss the experiences and performance of its implementation, and offer concrete e...

  • Improving the performance of coordinated checkpointers on networks of workstations using RAID techniques

    Publication Year: 1996, Page(s):76 - 85
    Cited by:  Papers (18)

    Coordinated checkpointing systems are popular and general-purpose tools for implementing process migration, coarse-grained job swapping, and fault-tolerance on networks of workstations. Though simple in concept, there are several design decisions concerning the placement of checkpoint files that can impact the performance and functionality of coordinated checkpointers. Although several such checkp...

  • A proposal for ensuring high availability of distributed multimedia applications

    Publication Year: 1996, Page(s):220 - 227
    Cited by:  Papers (3)

    Recent advances in computing, like high-speed networks and data-compression, make extensible distributed multimedia applications a challenging application-domain of distributed systems. Such applications like VoD (Video on Demand) or real-time conferencing are characterized by QoS (quality of service) requirements which depend on the quality of video and sound transmitted to the client and on the ...

  • Dynamic fault tolerance in DCMA-a dynamically configurable multicomputer architecture

    Publication Year: 1996, Page(s):22 - 31

    This paper introduces a new architecture for a fault-tolerant computer system which connects high-end PCs or workstations by a high-speed network. To achieve platform independence, coupling is based on the widely used PCI-bus. In contrast to commercially available fault-tolerant systems we strongly emphasize mechanisms for tolerating transient and intermittent faults. To keep hardware costs low th...

  • Minimizing timestamp size for completely asynchronous optimistic recovery with minimal rollback

    Publication Year: 1996, Page(s):66 - 75
    Cited by:  Papers (3)

    Basing rollback recovery on optimistic message logging and replay avoids the need for synchronization between processes during failure-free execution. Some previous research has also attempted to reduce the need for synchronization during recovery, but these protocols have suffered from three problems: not eliminating all synchronization during recovery, not minimizing rollback, or providing these...

  • Locating more corruptions in a replicated file

    Publication Year: 1996, Page(s):168 - 177
    Cited by:  Papers (1)

    When a data file is replicated at more than one site, we are interested in detecting corruption by comparing the multiple copies. In order to reduce the amount of messaging for large files, techniques based on page signatures and combined signatures have been explored. However, for 3 or more sites, the known methods assume the number of corrupted page copies to be at most [M/2]-1, where M is ...

  • Analysis of a multistage interconnection network using binary decision diagrams (BDD)

    Publication Year: 1996, Page(s):34 - 43

    The authors use the BDD to help derive a closed-form solution for the reliability of a multistage interconnection network with n stages. The BDD reveals repeated structures, the reliability of which can be encoded in a recursive formula. An exact solution of a network with an arbitrary number of stages can be computed in time proportional to the number of stages. They also provide results which in...

  • On-line testing for application software of widely distributed system

    Publication Year: 1996, Page(s):54 - 63
    Cited by:  Papers (4)

    Widely distributed systems are constructed step-by-step over a long time. These systems must permit on-line testing. On-line testing verifies newly added application software by receiving the real data in the real environment without disrupting the operating subsystems. To enable testing during system operation, an on-line test technique based on autonomous decentralized system structure was propo...

  • Developing reliable applications on cluster systems

    Publication Year: 1996, Page(s):165 - 166

    A cluster is a group of computers which are loosely connected together to provide fast and reliable services. There have been many applications built on cluster systems such as distributed/parallel database applications, telecommunication systems and, recently, internet/intranet servers. Cluster systems can deliver similar or better performance and reliability than traditional mainframes, supercom...

  • Hierarchical adaptive distributed system-level diagnosis applied for SNMP-based network fault management

    Publication Year: 1996, Page(s):98 - 107
    Cited by:  Papers (1)  |  Patents (1)

    Fault management is a key functional area of network management systems, but currently deployed applications often implement rudimentary diagnosis mechanisms. This paper presents a new hierarchical adaptive distributed system-level diagnosis (Hi-ADSD) algorithm and its implementation based on SNMP (simple network management protocol). Hi-ADSD is a fully distributed algorithm that has diagnosis lat...

  • Exploiting data-flow for fault-tolerance in a wide-area parallel system

    Publication Year: 1996, Page(s):2 - 11
    Cited by:  Papers (7)

    Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an exte...

  • Analyzing dynamic voting using Petri nets

    Publication Year: 1996, Page(s):44 - 53
    Cited by:  Papers (18)

    Dynamic voting is considered a promising technique for achieving high availability in distributed systems with data replication. To date, stochastic analysis of dynamic voting algorithms is restricted to either site or link Markov models, but not both, possibly because of the difficulty in specifying the state-space which grows exponentially as the number of sites increases. Furthermore, to reduce...

  • Ongoing fault diagnosis

    Publication Year: 1996, Page(s):108 - 117
    Cited by:  Papers (1)

    We consider a dynamic fault diagnosis problem: there are n processors, to be tested in a series of rounds. In every testing round we use a directed matching to have some processors report on the status (good or faulty) of other processors. Also, in each round up to t processors may break down, and we may direct that up to t processors are repaired. We show that it is possible to limit the number o...

  • A transparent light-weight group service

    Publication Year: 1996, Page(s):130 - 139
    Cited by:  Papers (3)

    The virtual synchrony model for group communication has proven to be a powerful paradigm for building distributed applications. Implementations of virtual synchrony usually require the use of failure detectors and failure recovery protocols. In applications that require the use of a large number of groups, significant performance gains can be attained if these groups share the resources required t...

  • A causal message ordering scheme for distributed embedded real-time systems

    Publication Year: 1996, Page(s):210 - 219
    Cited by:  Papers (1)

    In any distributed system, messages must be ordered according to their cause-and-effect relation to ensure correct behavior of the system. Causal ordering is also essential for services like atomic multicast and replication. In distributed real-time systems, not only must proper causal ordering be ensured, but message deadlines must be met as well. Previous algorithms which ensure such behavior in...

  • Specialized N-modular redundant processors in large-scale distributed systems

    Publication Year: 1996, Page(s):12 - 21
    Cited by:  Papers (2)

    Computers are being used to achieve increasingly sophisticated control for large and complex systems. Many of these systems require a large shared state-space or database. Thus, handling real-time concurrent accesses to a shared database is an essential feature for modern fault-tolerant systems. Many fault-tolerant systems have been implemented for uniformly tolerating various types of failures, s...

  • Strong and weak virtual synchrony in Horus

    Publication Year: 1996, Page(s):140 - 149
    Cited by:  Papers (21)

    This paper presents two variants of virtual synchrony, which are supported by Horus. The first variant, called strong virtual synchrony, includes the property that every message is delivered within the view in which it is sent. This property is very useful in developing applications, since it helps in minimizing the amount of context information that needs to be sent on messages, and the amount of...

  • Diagnosing crosstalk-faulty switches in photonic networks

    Publication Year: 1996, Page(s):118 - 127

    A procedure for diagnosing crosstalk and crosstalk-faulty switches in photonic dilated Benes networks (DBNs) is presented. It obtains the crosstalk ratios of each and every switch in an N×N DBN in 4N tests, along with O(N·log₂N) calculations. One of its applications is to identify single or multiple switches in the DBN which are generating excessive crosstalk, or crosstalk-...

  • The design of a CORBA group communication service

    Publication Year: 1996, Page(s):150 - 159
    Cited by:  Papers (43)  |  Patents (4)

    The common object request broker architecture (CORBA) is becoming a standard for distributed application middleware, and there are increasing needs for enriching the basic functionalities of CORBA. While mechanisms for persistence, transactions, event channels, etc. have been designed and specified for CORBA, no standard support is provided to handle object replication. In this paper we discuss th...

  • Observations from 16 years at a fault-tolerant computer company

    Publication Year: 1996, Page(s):162 - 164

    Observations acquired from 16 years of experience working for a vendor of fault-tolerant computer systems are presented, along with two “war stories” that illustrate some of the principles.
