Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems

15-16 Dec. 1997

Displaying Results 1 - 25 of 38
  • Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems

    Publication Year: 1997
    Freely Available from IEEE
  • Index of authors

    Publication Year: 1997, Page(s): 243
    Freely Available from IEEE
  • Checkpointing in CosMiC: a user-level process migration environment

    Publication Year: 1997, Page(s):187 - 193
    Cited by:  Papers (9)  |  Patents (2)

    The CosMiC system is a user-level process migration environment. Process migration is defined as the mechanism to checkpoint the state of an unfinished process, transfer the state from one machine to another and resume process execution on the new machine. The main purposes of process migration are: (1) to utilize the CPU power and balance load on all machines in an environment; (2) to provide fau...
  • Formal verification of a TDMA protocol start-up mechanism

    Publication Year: 1997, Page(s):235 - 242
    Cited by:  Papers (19)  |  Patents (2)

    This paper presents a formal verification of the start-up algorithm of the DACAPO protocol. The protocol uses TDMA (Time Division Multiple Access) bus arbitration. It was verified that an ensemble of four communicating stations becomes synchronized and operational within a bounded time from an arbitrary initial state. The system model included a clock drift corresponding to ±10⁻³...
  • Performance analysis of a reliable real-time token-ring protocol

    Publication Year: 1997, Page(s):180 - 185

    This paper presents an analytical model using stochastic Petri nets for the performance of a priority-based real-time protocol that uses data-link layer message logging for fast recovery in the event of station crashes. The advantage of using message logging at the data-link layer over traditional higher-layer recovery mechanisms is demonstrated.
  • An implementation of the FTAG model in concurrent ML

    Publication Year: 1997, Page(s):229 - 234

    Non-imperative programming models can simplify the development of fault-tolerant software, in part because of their potential for automatically generating concurrent implementations. This paper describes the design of a concurrent implementation of FTAG, a previously-described functional model for writing fault-tolerant software based on attribute grammars. The implementation involves translating ...
  • An extended binary tree quorum strategy for K-mutual exclusion in distributed systems

    Publication Year: 1997, Page(s):110 - 115
    Cited by:  Papers (2)

    In this paper, we propose two strategies called generalized binary tree quorum and extended binary tree quorum for k-mutual exclusion, which impose a logical structure on the network. Both of the proposed strategies are based on a logical binary tree structure. The quorum size constructed from both strategies is ⌈log₂(n/k)⌉ in the best case and is ⌈(n+k)/(2k)⌉ in the worst case, where n is th...
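As a rough worked example of the quorum-size bounds quoted in this abstract, the sketch below evaluates them for a few network sizes. It assumes the bracketed expressions in the listing are ceilings, reading the best case as ceil(log2(n/k)) and the worst case as ceil((n+k)/(2k)). The function names are hypothetical and nothing here comes from the paper beyond the two expressions.

```python
import math


def best_case_quorum_size(n: int, k: int) -> int:
    """Assumed reading of the best-case bound: ceil(log2(n / k))."""
    return math.ceil(math.log2(n / k))


def worst_case_quorum_size(n: int, k: int) -> int:
    """Assumed reading of the worst-case bound: ceil((n + k) / (2k))."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-(n + k) // (2 * k))


if __name__ == "__main__":
    for n, k in [(15, 1), (30, 2), (63, 3)]:
        print(n, k, best_case_quorum_size(n, k), worst_case_quorum_size(n, k))
```

Under this reading, the best case grows logarithmically with n/k while the worst case grows linearly, which is the usual trade-off for tree-structured quorums.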
  • Behavior of a computer based interlocking system under transient hardware faults

    Publication Year: 1997, Page(s):174 - 179
    Cited by:  Papers (2)

    The paper addresses the safety analysis and evaluation of a hard real-time, interlocking, railway control system. The major objective is to demonstrate an efficient methodology capable of capturing crucial system dependability characteristics while allowing meaningful results to be obtained within a reasonable time. The evaluation is done by simulating the execution of the control software under t...
  • The use of neurons with higher functionality to enhance the fault tolerance of neural networks

    Publication Year: 1997, Page(s):221 - 228

    So far we have proposed fault-tolerant design techniques for neural networks based on the properties of conventional neurons (the component elements of neural networks). If each neuron offers higher functionality than a simple weighted sum of the inputs, we can design neural networks which tolerate a mixture of stuck-at-1 and stuck-at-0 faults without exploiting the triplication scheme.
  • On transaction liveness in replicated databases

    Publication Year: 1997, Page(s):104 - 109
    Cited by:  Patents (2)

    This paper makes a first attempt to give a precise characterisation of liveness in replicated database systems. We introduce the notion of liveness degrees, which express the expectation a database user might have about the termination of its transactions, despite concurrency and failures. Our liveness degrees are complementary to the traditional transactional safety degrees (e.g., serializability...
  • A system solution to reducing frequency of memory repairs

    Publication Year: 1997, Page(s):53 - 58
    Cited by:  Papers (1)  |  Patents (1)

    Single symbol error correcting and double symbol error detecting (SSC-DSD) codes have been used in m-bit-per-chip computer memories for fault-tolerance and for savings in repair costs. In this paper, we present a solution to the reduction of memory repair actions for memories designed with SSC-DSD codes. We present a scheme that extends the basic SSC-DSD scheme to the data recovery of double symbo...
  • Reliability simulation of fault-tolerant software and systems

    Publication Year: 1997, Page(s):167 - 173
    Cited by:  Papers (8)

    Fault tolerance is a survival attribute of complex computer systems and software in their ability to deliver continuous service to their users in the presence of faults. Formulating an analytic model for dependability and performance evaluation of hardware/software fault tolerant architectures can be quite cumbersome. Also, in practice, isolating the effect of various parameters on a system, while...
  • Fault tolerant constructive algorithm for feedforward neural networks

    Publication Year: 1997, Page(s):215 - 220
    Cited by:  Papers (10)

    In this paper, a constructive algorithm for fault-tolerant feedforward neural networks, called FTCA, is proposed. The algorithm starts with a network with a single hidden neuron, and a new hidden unit is added to the network whenever it fails to converge. Before inserting the new hidden neuron into the network, only the weights connecting the new hidden neuron to the other neurons are trained (i.e....
  • Fault-tolerant real-time scheduling using passive replicas

    Publication Year: 1997, Page(s):98 - 103
    Cited by:  Papers (4)

    In hard real-time multiprocessor systems, it is necessary to have a fault-tolerant task execution scheme to meet the tasks' deadlines even in the presence of a processor failure. In this paper, we propose a delayed scheduling algorithm using a passive replica method. This scheme has a relatively small overhead for backup processes. For the purpose of high schedulability, we allow the primary copy ...
  • Double and triple error detecting capability of Internet checksum and estimation of probability of undetectable error

    Publication Year: 1997, Page(s):47 - 52
    Cited by:  Papers (3)

    The Internet checksum is calculated by 16-bit one's complement arithmetic. The occurrence of two or more errors, however, may not be detected. In this manuscript, we formulate the checksum procedure as a nonlinear code. Part of the distance distribution of the nonlinear code is calculated. By using the results, we derive lower and upper bounds on the probability of an undetectable error when the nonl...
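To make the claim about undetected multiple errors concrete, here is a minimal sketch of the 16-bit one's complement checksum together with one double-bit error that leaves it unchanged. The data words are arbitrary illustrative values, not taken from the paper, and the sketch says nothing about the paper's distance-distribution analysis.

```python
def internet_checksum(words):
    """Complement of the 16-bit one's complement sum of the given 16-bit words."""
    total = 0
    for word in words:
        total += word & 0xFFFF
        # Fold any carry out of bit 15 back into the low bits (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Arbitrary example data (hypothetical values).
original = [0x4500, 0x0034, 0x1C46, 0x4000]

# Single-bit error: bit 8 of word 0 flips from 1 to 0.
single_error = list(original)
single_error[0] ^= 0x0100

# Double-bit error: the same bit position flips 1 -> 0 in word 0 and 0 -> 1 in
# word 1, so the one's complement sum, and hence the checksum, is unchanged.
double_error = list(original)
double_error[0] ^= 0x0100
double_error[1] ^= 0x0100

if __name__ == "__main__":
    print(hex(internet_checksum(original)))
    print(hex(internet_checksum(single_error)))
    print(hex(internet_checksum(double_error)))
```

The single flip changes the checksum, while the compensating pair of flips does not, which is the kind of undetectable error pattern whose probability the abstract sets out to bound.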
  • Design of a fault-tolerant microprocessor: a simulation approach

    Publication Year: 1997, Page(s):161 - 166
    Cited by:  Papers (1)

    This paper presents an approach for assessing the merits and the cost of incorporating processor-level error detection and recovery mechanisms. The approach is exemplified by implementing several fault-tolerant mechanisms into a 32-bit, MIPS R3000-compatible, RISC microprocessor and conducting simulation-based fault injection experiments. The mechanisms are triple modular redundancy (TMR), retry o...
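For readers unfamiliar with triple modular redundancy, the sketch below shows the generic idea of a bitwise majority voter over three replica outputs. It is a textbook illustration under assumed 32-bit values, not the voter or fault model used in the paper, and the function name is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical 32-bit result produced by three redundant units.
good = 0x1234ABCD
faulty = good ^ 0x00000400  # one replica suffers a single bit flip

# One faulty replica is outvoted by the two correct ones.
voted = tmr_vote(good, faulty, good)

if __name__ == "__main__":
    print(hex(voted))
```

The voter masks any error confined to one replica. If two replicas fail in the same bit position, the majority is wrong, which is one reason such mechanisms are usually evaluated with fault injection experiments.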
  • Increasing software reliability through rollback and on-line fault repair

    Publication Year: 1997, Page(s):208 - 213
    Cited by:  Papers (1)

    We propose a new paradigm for increasing the reliability of a software system by combining reactive and proactive approaches. The proposed approach employs rollback and restart for masking transient failures, and employs on-line software version change to remove faults from the software. A model for reliability analysis of a system employing the proposed approach is presented. The analysis shows that...
  • Fault coverage estimation model for partially testable multichip modules

    Publication Year: 1997, Page(s):72 - 77

    This paper proposes a simple and efficient model for designers to estimate fault coverage for partially testable MCMs. This model relates fault coverage, test methodology, and the ratio and distribution of DFT dies (dies with design for testability features) in an MCM. Experimental results show that our model can efficiently predict the fault coverage of a partially testable MCM with less than 5% ...
  • Hierarchical modeling and dependability evaluation of distributed systems

    Publication Year: 1997, Page(s):91 - 96

    We propose a new dependability evaluation method for a distributed system where programs and data files are replicated and allocated to computing nodes of the system in a redundant manner. The proposed method consists of two-level hierarchical procedures. At the lower level, the behavior of system components is analyzed using Markov models, while at the upper level, the whole system is modeled by ...
  • A new class of t-error correcting/d-error detecting (d>t) and all unidirectional error detecting codes

    Publication Year: 1997, Page(s):41 - 46

    In this paper, a new class of t-error correcting/d-error detecting and all unidirectional error detecting (t-EC/d-ED/AUED) codes is proposed. Compared to the published results, this scheme, in general, needs fewer or an equal number of check bits. Further, both the encoding and decoding algorithms for this class of codes can be implemented with faster and simpler hardware. In case of ROM implementat...
  • Fault handling mechanisms in the RETHER protocol

    Publication Year: 1997, Page(s):153 - 159
    Cited by:  Papers (3)

    RETHER is a software-driven token-passing protocol designed to provide bandwidth guarantees for real-time multimedia applications over off-the-shelf Ethernet hardware. To our knowledge, it is the first all-software and fully-implemented real-time protocol on top of commodity Ethernet hardware. Because token passing is used to regulate network accesses, node crashes and/or packet corruption may lead...
  • A fault-tolerant embedded microcontroller testbed

    Publication Year: 1997, Page(s):7 - 14
    Cited by:  Papers (6)

    The paper presents a design approach for implementing a fault-tolerant embedded computing node based on the use of low-cost commodity microcontrollers. A combination of software and relatively simple external logic is used to implement fault-tolerance in a redundant set of microcontrollers. A node can be protected with different amounts of redundancy (duplex, triplex, hybrid) depending upon the ne...
  • Time-lag duplexing: a fault tolerance technique for online transaction processing systems

    Publication Year: 1997, Page(s):202 - 207
    Cited by:  Papers (2)

    In this paper the concept of time-lag duplexing is proposed to achieve fault tolerance. Time-lag duplexing incorporates time and component redundancy to provide for transient errors both easy error recovery and tolerance against errors in common irredundant components. As a result, minimum performance and cost penalties are incurred. In this paper the fault detection and recovery algorithm using t...
  • Adaptive system-level diagnosis and its application

    Publication Year: 1997, Page(s):66 - 71
    Cited by:  Papers (2)

    System-level diagnosis has been known for decades, and many solid results have been provided for this problem. This paper presents a new adaptive diagnosis strategy which requires far fewer tests than previous methods. Since tests can be applied concurrently in this method, the total time span for fault location can be further reduced. Although the practical implications of system-level diagnosis are few...
  • Engineering oriented dependability evaluation: MEADEP and its applications

    Publication Year: 1997, Page(s):85 - 90
    Cited by:  Papers (3)

    MEADEP is a user-friendly dependability evaluation tool for measurement-based analysis of computing systems. Features of MEADEP include: a data processor for converting data in various formats to the MEADEP format, a statistical analysis module for graphical data presentation and parameter estimation, a graphical modeling interface for building reliability block diagrams and Markov chains, a libr...