Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems

15-16 Dec. 1997

  • Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems

    Publication Year: 1997
  • Index of authors

    Publication Year: 1997, Page(s): 243
  • Checkpointing Message-Passing Interface (MPI) parallel programs

    Publication Year: 1997, Page(s):147 - 152
    Cited by:  Papers (3)

    Many scientific problems can be distributed across a large number of processes to take advantage of low-cost workstations. In a parallel system, a failure on any processor can halt the computation and require restarting all applications. Checkpointing is a simple technique to recover the failed execution. Message Passing Interface (MPI) is a standard proposed for writing portable message-passing par...
  • Fault-tolerant object on network-wide distributed object-oriented systems for future telecommunications applications

    Publication Year: 1997, Page(s):139 - 146
    Cited by:  Papers (3)

    This paper describes a fault-tolerant object using replication of objects in network-wide distributed object-oriented communications systems, and a mechanism for managing multiple objects that execute the target functions in the systems. This mechanism is located in the distributed processing platform that controls the execution of objects. The replication management mechanism combines fault detec...
  • The adaptable distributed recovery block scheme and a modular implementation model

    Publication Year: 1997, Page(s):131 - 138
    Cited by:  Papers (2)

    The purpose of adaptive fault-tolerance (AFT) is to meet the dynamically and widely changing fault-tolerance requirement by efficiently and adaptively utilizing a limited and dynamically changing amount of available redundant processing resources. In this paper we present one concrete AFT scheme, named the adaptable distributed recovery block (ADRB) scheme, which is an extension of the Distributed...
  • Formal verification of a TDMA protocol start-up mechanism

    Publication Year: 1997, Page(s):235 - 242
    Cited by:  Papers (18)  |  Patents (2)

    This paper presents a formal verification of the start-up algorithm of the DACAPO protocol. The protocol uses TDMA (Time Division Multiple Access) bus arbitration. It was verified that an ensemble of four communicating stations becomes synchronized and operational within a bounded time from an arbitrary initial state. The system model included a clock drift corresponding to ±10^-3...
  • Increasing software reliability through rollback and on-line fault repair

    Publication Year: 1997, Page(s):208 - 213
    Cited by:  Papers (1)

    We propose a new paradigm for increasing the reliability of a software system by combining reactive and proactive approaches. The proposed approach employs rollback and restart for masking transient failures, and employs on-line software version change to remove faults from the software. A model for reliability analysis of a system employing the proposed approach is presented. The analysis shows that...
  • Performance analysis of a reliable real-time token-ring protocol

    Publication Year: 1997, Page(s):180 - 185

    This paper presents an analytical model using stochastic Petri nets for the performance of a priority-based real-time protocol that uses data-link layer message logging for fast recovery in the event of station crashes. The advantage of using message logging at the data-link layer over traditional higher-layer recovery mechanisms is demonstrated.
  • High performance fault tolerant computer and its fault recovery

    Publication Year: 1997, Page(s):2 - 6
    Cited by:  Papers (1)  |  Patents (5)

    The authors proposed a new architecture for an FTC called QPR (Quad Processor Redundancy) in which duplicated CPUs operate under a hardware lock step, and duplicated I/Os are managed by software. A dual system bus combines two duplicated areas. After recovery from a fault, it is necessary to resynchronize the system, so the contents of the main memory must be copied from the normal CPU to the othe...
  • A cache error propagation model

    Publication Year: 1997, Page(s):15 - 21
    Cited by:  Papers (8)  |  Patents (1)

    Cache memory is a small, fast memory system that holds frequently used data. With increasing processor speed, aggressive design practices increase the probability of fault occurrence and the presence of latent errors as the processor allows a short duration for read and write. The fault may corrupt the cache memory system or lead to an erroneous internal CPU state. The authors investigate error p...
  • Hierarchical modeling and dependability evaluation of distributed systems

    Publication Year: 1997, Page(s):91 - 96

    We propose a new dependability evaluation method for a distributed system where programs and data files are replicated and allocated to computing nodes of the system in a redundant manner. The proposed method consists of two-level hierarchical procedures. At the lower level, the behavior of system components is analyzed using Markov models, while at the upper level, the whole system is modeled by ...
  • Reliability modeling of structured systems: exploring symmetry in state-space generation

    Publication Year: 1997, Page(s):78 - 84
    Cited by:  Papers (3)

    A large number of systems are implemented using regular interconnected topologies. Markov analysis of such systems results in large state spaces. We explore symmetry, in particular rotational and permutational, of such systems to achieve a significant reduction in the size of the state space required to analyze them. The resulting much smaller state spaces allow analyses of very large systems. We ...
  • Optimized authenticated self-synchronizing Byzantine agreement protocols

    Publication Year: 1997, Page(s):122 - 129
    Cited by:  Papers (2)

    In order to make a dependable distributed computer system resilient to arbitrary failures of its processors, deterministic Byzantine agreement protocols (BAPs) can be applied. Many BAPs found in the literature require that communication takes place in synchronized rounds of information exchange and require that all correct processors know the start of the BAP and start the protocol simultaneously. It i...
  • Adaptive system-level diagnosis and its application

    Publication Year: 1997, Page(s):66 - 71
    Cited by:  Papers (2)

    System-level diagnosis has been known for decades, and many solid results have been provided for this problem. This paper presents a new adaptive diagnosis strategy which requires far fewer tests than previous methods. Since tests can be applied concurrently in this method, the total time span for fault location can be further reduced. Although the practical implications of system-level diagnosis are few...
  • An implementation of the FTAG model in concurrent ML

    Publication Year: 1997, Page(s):229 - 234

    Non-imperative programming models can simplify the development of fault-tolerant software, in part because of their potential for automatically generating concurrent implementations. This paper describes the design of a concurrent implementation of FTAG, a previously-described functional model for writing fault-tolerant software based on attribute grammars. The implementation involves translating ...
  • Time-lag duplexing-a fault tolerance technique for online transaction processing systems

    Publication Year: 1997, Page(s):202 - 207
    Cited by:  Papers (2)

    In this paper the concept of time-lag duplexing is proposed to achieve fault tolerance. Time-lag duplexing incorporates time and component redundancy to provide both easy recovery from transient errors and tolerance against errors in common irredundant components. As a result, minimum performance and cost penalties are incurred. In this paper the fault detection and recovery algorithm using t...
  • An extended binary tree quorum strategy for K-mutual exclusion in distributed systems

    Publication Year: 1997, Page(s):110 - 115
    Cited by:  Papers (2)

    In this paper, we propose two strategies called generalized binary tree quorum and extended binary tree quorum for k-mutual exclusion, which impose a logical structure on the network. Both of the proposed strategies are based on a logical binary tree structure. The quorum size constructed from both strategies is [log2 n/k] in the best case and is [(n+k)/2k] in the worst case, where n is th...
  • Behavior of a computer based interlocking system under transient hardware faults

    Publication Year: 1997, Page(s):174 - 179
    Cited by:  Papers (2)

    The paper addresses the safety analysis and evaluation of a hard real-time, interlocking, railway control system. The major objective is to demonstrate an efficient methodology capable of capturing crucial system dependability characteristics while allowing meaningful results to be obtained within a reasonable time. The evaluation is done by simulating the execution of the control software under t...
  • A system solution to reducing frequency of memory repairs

    Publication Year: 1997, Page(s):53 - 58
    Cited by:  Papers (1)  |  Patents (1)

    Single symbol error correcting and double symbol error detecting (SSC-DSD) codes have been used in m-bit-per-chip computer memories for fault-tolerance and for savings in repair costs. In this paper, we present a solution to the reduction of memory repair actions for memories designed with SSC-DSD codes. We present a scheme that extends the basic SSC-DSD scheme to the data recovery of double symbo...
  • Fault handling mechanisms in the RETHER protocol

    Publication Year: 1997, Page(s):153 - 159
    Cited by:  Papers (3)

    RETHER is a software-driven token-passing protocol designed to provide bandwidth guarantees for real-time multimedia applications over off-the-shelf Ethernet hardware. To our knowledge, it is the first all-software and fully-implemented real-time protocol on top of commodity Ethernet hardware. Because token passing is used to regulate network accesses, node crashes and/or packet corruption may lead...
  • Fault tolerant constructive algorithm for feedforward neural networks

    Publication Year: 1997, Page(s):215 - 220
    Cited by:  Papers (10)

    In this paper, a constructive algorithm for fault tolerant feedforward neural networks, called FTCA, is proposed. The algorithm starts with a network with a single hidden neuron, and a new hidden unit is added to the network whenever it fails to converge. Before inserting the new hidden neuron into the network, only the weights connecting the new hidden neuron to the other neurons are trained (i.e....
  • Checkpointing in CosMiC: a user-level process migration environment

    Publication Year: 1997, Page(s):187 - 193
    Cited by:  Papers (9)  |  Patents (2)

    The CosMiC system is a user-level process migration environment. Process migration is defined as the mechanism to checkpoint the state of an unfinished process, transfer the state from one machine to another and resume process execution on the new machine. The main purposes of process migration are: (1) to utilize the CPU power and balance load on all machines in an environment; (2) to provide fau...
  • A fault-tolerant embedded microcontroller testbed

    Publication Year: 1997, Page(s):7 - 14
    Cited by:  Papers (6)

    The paper presents a design approach for implementing a fault-tolerant embedded computing node based on the use of low-cost commodity microcontrollers. A combination of software and relatively simple external logic is used to implement fault-tolerance in a redundant set of microcontrollers. A node can be protected with different amounts of redundancy (duplex, triplex, hybrid) depending upon the ne...
  • An embedded fail-safe interlocking system

    Publication Year: 1997, Page(s):22 - 27
    Cited by:  Papers (1)  |  Patents (1)

    The paper presents a fail-safe railway interlocking system embedded in an Area Control Center (ACC) system. The host of the system is a TANDEM NONSTOP HIMALAYA K200 computer. The fault tolerant computer aims at high safety, reliability and availability. In addition, the dispatcher management system, device supervision system, and train control system are integrated in the host computer to ensure h...
  • Fault-tolerant real-time scheduling using passive replicas

    Publication Year: 1997, Page(s):98 - 103
    Cited by:  Papers (4)

    In hard real-time multiprocessor systems, it is necessary to have a fault-tolerant task execution scheme to meet the tasks' deadlines even in the presence of a processor failure. In this paper, we propose a delayed scheduling algorithm using a passive replica method. This scheme has a relatively small overhead for backup processes. For the purpose of high schedulability, we allow the primary copy ...