Proceedings of the Eighth Symposium on Reliable Distributed Systems, 1989

Date 10-12 Oct. 1989

  • Proceedings of the Eighth Symposium on Reliable Distributed Systems (Cat. No.89CH2807-6)

    Publication Year: 1989
  • Implementing fault-tolerant replicated objects using Psync

    Publication Year: 1989, Page(s):42 - 52
    Cited by:  Papers (15)  |  Patents (1)

    Psync is an IPC protocol that explicitly preserves the partial order of messages exchanged among a set of processes. A description is given of how Psync can be used to implement replicated objects in the presence of network and host failures. Unlike conventional algorithms that depend on an underlying mechanism that totally orders messages for implementing replicated objects, the authors' approach...

  • Recovery-management in the RelaX distributed transaction layer

    Publication Year: 1989, Page(s):21 - 28
    Cited by:  Papers (9)  |  Patents (6)

    Transactions are especially valuable in distributed systems, since they isolate the programmer from the effects of both concurrency and failures. In implementing transactions at the system level, flexibility has to be introduced into the transaction concept. Specifically, the premature release of data objects has to be addressed. To assure recoverability, resulting dependencies between transaction...

  • Bounded approximate reliability models for distributed systems

    Publication Year: 1989, Page(s):137 - 147
    Cited by:  Papers (7)

    A study is made of several methods for reducing complex fault tree models of fault-tolerant distributed systems. For each method the authors provide bounds on the estimate of unreliability that is obtained from the reduced model. They discuss methods for truncating the solution of a model expressed as a fault tree and then develop techniques that apply to the construction of the fault tree model. ...

  • Limits on scalability in gracefully degradable large-scale systems

    Publication Year: 1989, Page(s):148 - 157

    The authors present an analysis of the scalability of large-scale degradable homogeneous multiprocessors by assessing the limitations imposed by reliability considerations on the number of processors. They demonstrate that graceful degradation in large-scale systems is not scalable. An increase in the number of processors must be matched by a significant increase in the coverage factor in order to...

  • An efficient kernel-level dependable multicast protocol for distributed systems

    Publication Year: 1989, Page(s):94 - 101
    Cited by:  Papers (1)  |  Patents (1)

    Multicast communication in a distributed system connected by a local area network can increase parallelism, and it can also provide a greater functionality than one-to-one communication. In the authors' multicast protocol, the sender directs a message to a named group of receivers, which can be specified by function without requiring the sender to know the specific members of the group. Each host'...

  • Decentralized loopback reconfiguration of a bidirectional ring LAN

    Publication Year: 1989, Page(s):72 - 78

    A decentralized reconfiguration method for local networks of bidirectional ring architecture is presented. The method relies on a suitably designed two-mode medium access control protocol that is used uniformly in all network configurations. The reconfiguration transitions are performed autonomously by individual nodes, based purely on the local status of medium access control; no specialized topo...

  • A low overhead checkpointing and rollback recovery scheme for distributed systems

    Publication Year: 1989, Page(s):12 - 20
    Cited by:  Papers (14)  |  Patents (1)

    A major obstacle in implementing a rollback recovery scheme for fault tolerance in a concurrent distributed system is the domino effect. A low overhead checkpointing scheme is proposed to prevent this effect. Each process saves its state periodically. The state-save synchronization among processes is implemented by bounding clock drifts. A communication protocol that assures that all saved states ...

  • Consistent replicated transactions: a highly reliable program execution environment

    Publication Year: 1989, Page(s):30 - 41
    Cited by:  Papers (1)

    A highly reliable program execution environment which enables user programs to tolerate underlying hardware failures is presented. The approach is to run multiple copies of the user programs at the same time. As long as one copy survives, the user program can be completed successfully. In the meantime, the user interacts with the replicated program as if it were a normal program. The authors call ...

  • Distributed processing test simulator (DPTS)

    Publication Year: 1989, Page(s):167 - 175

    DPTS (distributed processing test simulator) is a protocol and associated system-level software that result in a system in which the software can efficiently interact across processor boundaries, provide some measure of processor and/or communications fault tolerance, enhance cost effectiveness and reduce life-cycle cost, permit decreased dependency on system complexity, and eliminate concern over...

  • Testing reliable distributed applications through simulated events

    Publication Year: 1989, Page(s):160 - 166

    There are many distributed applications that incorporate application-specific reliability algorithms that operate on top of general-purpose networking, operating system, and programming language facilities. The authors present a framework for application-level reliability testing suitable for a wide range of distributed applications using low-level events and the automatic generation of series of ...

  • Ring network reliability-the probability that all operative nodes can communicate

    Publication Year: 1989, Page(s):64 - 71
    Cited by:  Papers (3)

    The authors consider the reliability of fail-soft ring networks, using as a measure the probability P[C] that all operative nodes can communicate. Although computing P[C] is in general an NP-hard problem, it is shown that, for ring networks, closed-form expressions can be derived in terms of the failure probabilities of nodes, links, and switches (the configuration components used...

  • Using checkpoints to localize the effects of faults in distributed systems

    Publication Year: 1989, Page(s):2 - 11
    Cited by:  Papers (7)  |  Patents (2)

    A checkpointing scheme can be used to ensure forward progress of a computation (program) even when failures occur. In a distributed system, many autonomous programs can execute concurrently and obtain services from a set of shared servers. In such a system, it is desirable to restrict a checkpoint or rollback operation to a single program to localize the effects of failures, even when processes...

  • Availability analysis of the primary site approach for fault tolerance

    Publication Year: 1989, Page(s):130 - 136
    Cited by:  Papers (1)

    The primary site approach is often used to support fault tolerance against node failures. The authors present an analytic model to evaluate the availability of a system using the primary site approach. The effects of the number of replicas and of the checkpoint interval were studied using the model. The authors found that the optimal checkpoint interval is proportional to the square root of the checkp...

  • Recovering from process failures in the time warp mechanism

    Publication Year: 1989, Page(s):53 - 61

    A recovery procedure for distributed systems using the time warp control mechanism is described. Time warp is an optimistic execution technique in which synchronization is achieved using rollback. The recovery procedure is a protocol that exploits the redundancy already available to implement process rollback in the time warp mechanism. Thus, the recovery protocol has little additional bookkeeping...

  • A token-based protocol for reliable, ordered multicast communication

    Publication Year: 1989, Page(s):84 - 93
    Cited by:  Papers (24)  |  Patents (3)

    A description is given of the token-passing multicast (TPM) protocol, a token-based protocol that provides reliable, ordered multicast communication for distributed process groups in the presence of failures and network partitions. The TPM protocol combines several positive features of other reliable multicast schemes into a single protocol, yet maintains a relatively simple structure and requires...

  • Workload analysis for performance study of distributed deadlock detection algorithms

    Publication Year: 1989, Page(s):104 - 111

    The authors present an approach to distributed workload analysis which can be used as a basis for the performance study of distributed deadlock detection algorithms. In particular, the expected number of times a deadlock detection algorithm is locally initiated and the subsequent number of remote invocations are derived. Simulation work was done to validate the approach.

  • Self-stabilization of the alternating-bit protocol

    Publication Year: 1989, Page(s):80 - 83
    Cited by:  Papers (8)

    The alternating-bit protocol is a fundamental protocol for transmitting data across an unreliable transmission medium. The reliability of the protocol depends on its initial state. The authors present a self-stabilizing version of the alternating-bit protocol, i.e., the system converges to a state that guarantees reliable data transmission regardless of its initial state. Applications of the protoc...

  • Implementing atomic rendezvous within a transactional framework

    Publication Year: 1989, Page(s):119 - 128
    Cited by:  Papers (1)  |  Patents (1)

    The authors address the problem of implementing the CSP (communicating sequential processes) rendezvous within a transactional framework. Instead of implementing a fair nondeterministic choice and assuming the correct functioning of processors and communication media, the authors propose an efficient transactional implementation of the atomic rendezvous in the presence of processor failures in a m...

  • Performance comparison of concurrency control protocols for transaction processing systems with regional locality

    Publication Year: 1989, Page(s):112 - 118

    An examination is made of a system structure and protocols to improve the performance and availability of a distributed transaction processing (TP) system when there is some regional locality of data reference. Several TP applications, such as reservation systems, insurance, and banking, belong to this category. While maintaining a distributed system at each region, a central system is introduced ...

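The Psync abstract above centers on preserving the partial order of messages instead of imposing a total order. The sketch below is a minimal one and assumes that each message carries the ids of the messages its sender had already seen. The `Message` record and the `ContextGraph` class are hypothetical illustrations, not Psync's actual interface. Under that assumption, partial-order delivery can be written as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message record: an id plus the ids of the messages that
    # the sender had already received or sent when it sent this one.
    msg_id: str
    predecessors: frozenset = frozenset()


class ContextGraph:
    """Delivers each message only after all of its predecessors (partial order)."""

    def __init__(self):
        self.delivered = []   # local delivery order, consistent with the partial order
        self._seen = set()    # ids already delivered
        self._pending = []    # received but still waiting for a predecessor

    def receive(self, msg):
        self._pending.append(msg)
        progress = True
        while progress:   # one delivery may unblock other pending messages
            progress = False
            for m in list(self._pending):
                if m.predecessors <= self._seen:
                    self._pending.remove(m)
                    self._seen.add(m.msg_id)
                    self.delivered.append(m.msg_id)
                    progress = True
        return list(self.delivered)
```

Suppose `b` (which depends on `a`) arrives first. Its delivery is deferred: `receive` returns `[]`, and then returns `['a', 'b']` once `a` arrives. Messages with no predecessor relation between them may be delivered in different orders at different processes. A total-order mechanism would rule that out.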
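The ring-reliability abstract above notes that closed-form expressions for P[C] exist for ring networks. The simplest case can be checked directly. Assume nodes and switches never fail and each of the n links fails independently with probability q. This is an illustrative simplification, not the authors' fail-soft model. Under it, all nodes can communicate exactly when at most one link is down, so P[C] = (1-q)^n + nq(1-q)^(n-1). The hypothetical helpers below compute that formula and compare it with brute-force enumeration:

```python
from itertools import product


def ring_pc_closed_form(n, q):
    """P[C] for an n-node ring in which only links fail, independently with
    probability q (nodes and switches assumed perfect). A cycle stays
    connected as long as at most one of its n links has failed."""
    p = 1.0 - q
    return p ** n + n * q * p ** (n - 1)


def _connected(n, link_up):
    # Link i joins node i and node (i + 1) % n; depth-first search from node 0.
    adj = {v: [] for v in range(n)}
    for i, up in enumerate(link_up):
        if up:
            a, b = i, (i + 1) % n
            adj[a].append(b)
            adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def ring_pc_brute_force(n, q):
    """Same quantity by enumerating all 2**n link states (small n only)."""
    total = 0.0
    for link_up in product((True, False), repeat=n):
        failed = link_up.count(False)
        if _connected(n, link_up):
            total += (1.0 - q) ** (n - failed) * q ** failed
    return total
```

For example, `ring_pc_closed_form(5, 0.1)` gives 0.91854 up to floating-point rounding, and the enumeration agrees. The enumeration grows as 2^n, so it serves only as a check for small rings.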
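The time warp abstract above relies on optimistic execution with rollback, and the recovery procedure it describes exploits the state already kept so that processes can roll back. The sketch below shows state saving and rollback for a single process. It assumes the process sends no messages, so no anti-messages are needed, and it uses a hypothetical order-sensitive event handler (`state * 10 + value`). It is not the paper's recovery protocol:

```python
class LogicalProcess:
    """Optimistic (time warp style) event processing with state saving and
    rollback. Simplified: one process, no outgoing messages, no anti-messages."""

    def __init__(self, initial_state=0):
        self.state = initial_state
        self.lvt = 0                            # local virtual time
        self.processed = []                     # (timestamp, value) in processing order
        self.snapshots = [(0, initial_state)]   # (lvt, state) saved after each event
        self.rollbacks = 0

    def _apply(self, ts, value):
        # Hypothetical order-sensitive handler, so processing order matters.
        self.state = self.state * 10 + value
        self.lvt = ts
        self.processed.append((ts, value))
        self.snapshots.append((ts, self.state))

    def receive(self, ts, value):
        if ts >= self.lvt:          # optimistic case: process immediately
            self._apply(ts, value)
            return
        # Straggler from the past: undo every event later than ts.
        self.rollbacks += 1
        undone = []
        while self.processed and self.processed[-1][0] > ts:
            undone.append(self.processed.pop())
            self.snapshots.pop()
        self.lvt, self.state = self.snapshots[-1]   # restore saved state
        # Re-execute the straggler and the undone events in timestamp order.
        for event in sorted([(ts, value)] + undone):
            self._apply(*event)
```

Suppose events arrive at times 1 and 3, followed by a straggler at time 2. The process rolls back once, to the snapshot saved at time 1. Its final state is 123, the same as processing times 1, 2 and 3 in order.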
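The TPM abstract above describes reliable, ordered multicast built around a token. One common way a token yields a total order is for the token to carry the next global sequence number. Each holder assigns consecutive numbers to its queued messages, and receivers deliver strictly in sequence order. The sketch below illustrates that idea under strong assumptions: no failures, no partitions and no retransmission. `Member` and `pass_token` are hypothetical names, and this is not the TPM protocol itself:

```python
import heapq


class Member:
    """Group member that delivers multicasts in global sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.outbox = []          # payloads waiting for this member to hold the token
        self.next_to_deliver = 0  # next global sequence number to deliver
        self._buffer = []         # min-heap of (seq, message) received out of order
        self.delivered = []

    def receive(self, seq, message):
        heapq.heappush(self._buffer, (seq, message))
        while self._buffer:
            head_seq, head_msg = self._buffer[0]
            if head_seq < self.next_to_deliver:     # duplicate of a delivered message
                heapq.heappop(self._buffer)
            elif head_seq == self.next_to_deliver:  # next in order: deliver it
                heapq.heappop(self._buffer)
                self.delivered.append(head_msg)
                self.next_to_deliver += 1
            else:                                   # gap: wait for the missing seq
                break


def pass_token(members, next_seq=0):
    """One token rotation: each holder stamps its queued payloads with
    consecutive sequence numbers carried by the token. Returns the stamped
    messages and the sequence number the token carries onward."""
    stamped = []
    for holder in members:
        while holder.outbox:
            stamped.append((next_seq, f"{holder.name}:{holder.outbox.pop(0)}"))
            next_seq += 1
    return stamped, next_seq
```

Suppose A queued `x` and B queued `y` and `z` before one token rotation. A then receives the stamped messages in reverse order, B receives them in order, and C receives every message twice. All three still deliver `['A:x', 'B:y', 'B:z']`.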
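The self-stabilization abstract above observes that the reliability of the alternating-bit protocol depends on its initial state. The classic protocol, not the self-stabilizing version, can be simulated in a few lines. The model is an assumption: at most one frame is in flight, frames and acks are lost independently, and a lost message stands in for a timeout followed by retransmission. The function name and parameters are hypothetical:

```python
import random


def alternating_bit_transfer(items, loss_rate=0.3, seed=1,
                             receiver_start_bit=0, max_rounds=10_000):
    """Simulate the classic alternating-bit protocol with at most one frame in
    flight. Each data frame and each ack is dropped independently with
    probability loss_rate; a drop stands in for a sender timeout followed by
    retransmission. Returns the items the receiver delivered, in order."""
    rng = random.Random(seed)
    send_bit = 0                     # sender's sequence bit for items[i]
    expect_bit = receiver_start_bit  # bit the receiver expects next
    delivered = []
    i = 0
    rounds = 0
    while i < len(items):
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("gave up after max_rounds retransmission rounds")
        if rng.random() < loss_rate:
            continue                 # data frame lost: retransmit after timeout
        frame_bit = send_bit
        if frame_bit == expect_bit:  # new frame: deliver it and flip expectation
            delivered.append(items[i])
            expect_bit ^= 1
        ack_bit = frame_bit          # receiver acks the bit it actually received
        if rng.random() < loss_rate:
            continue                 # ack lost: sender will retransmit items[i]
        if ack_bit == send_bit:      # matching ack: move on to the next item
            send_bit ^= 1
            i += 1
    return delivered
```

When the initial bits match, `alternating_bit_transfer(list('hello'))` delivers every item exactly once despite losses. If the receiver starts with the wrong bit (`receiver_start_bit=1`), it silently discards the first item, so `list('abc')` yields `['b', 'c']`. A self-stabilizing version has to converge to reliable transmission even from a bad initial state like this one.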