Fault-Tolerant Computing, 1990. FTCS-20. Digest of Papers., 20th International Symposium

Date 26-28 June 1990

  • Polynomial time solvable fault detection problems

    Publication Year: 1990, Page(s):56 - 63
    Cited by:  Papers (5)

    A class of combinational circuits, called (k,K)-circuits, is presented, and a polynomial-time algorithm to detect any single or multiple stuck-at fault in such circuits is introduced. The (k,K)-circuits are a generalization of H. Fujiwara's (1988) K-bounded circuits. The fault detection problem is formulated as an energy minimization problem using the bidirectional neural net model proposed earlier...
  • Three-valued neural networks for test generation

    Publication Year: 1990, Page(s):64 - 71
    Cited by:  Papers (8)  |  Patents (1)

    A three-valued (0, 1, and 1/2) neural network, which is an extension of the binary Hopfield model, is proposed, and it is shown that the test generation problem can be solved by the three-valued model more effectively than by the binary model. In the three-valued model, the energy function of networks, hyperplanes of neurons, and update rules of neuron states are extended so that the third value, ...
  • CATCH - compiler-assisted techniques for checkpointing

    Publication Year: 1990, Page(s):74 - 81
    Cited by:  Papers (39)  |  Patents (3)

    A compiler-based approach to generating efficient checkpoints for process recovery is described. The presented approach to checkpointing is programmer, operating system, and hardware transparent. Compile-time information is exploited to maintain the desired checkpoint interval and to reduce the size of checkpoints. Compiler-generated sparse potential checkpoint code is used to maintain the desired...
  • Cache-aided rollback error recovery (CARER) algorithm for shared-memory multiprocessor systems

    Publication Year: 1990, Page(s):82 - 88
    Cited by:  Papers (28)  |  Patents (23)

    Three cache-aided error-recovery algorithms for use in shared-memory multiprocessor systems are presented. They rely on hardware and specially designed cache memory for all their soft error management operations and can be easily incorporated into existing cache-coherence protocols. An example illustrating their use in a multiprocessor system employing Dragon as its cache-coherence protocol is giv...
  • Cache management in a tightly coupled fault tolerant multiprocessor

    Publication Year: 1990, Page(s):89 - 96
    Cited by:  Papers (16)  |  Patents (10)

    Some aspects of a fault-tolerant tightly coupled multiprocessor architecture are presented. The originality of this architecture resides in the use of a stable transactional memory shared by all processors. To ensure fault tolerance, each update of a memory block is included in an atomic transaction managed by the stable transactional memory. All the blocks that are part of a transaction are wri...
  • Checkpointing and rollback-recovery in distributed object based systems

    Publication Year: 1990, Page(s):97 - 104
    Cited by:  Papers (12)  |  Patents (1)

    Checkpointing and rollback-recovery algorithms in distributed object-based systems are presented. By utilizing the structure of objects and operation invocations, the authors have derived efficient algorithms that involve fewer participants than when invocations are treated as messages and existing algorithms for message-based systems are used. It is planned to implement these algorithms and evalu...
  • Design and analysis of test schemes for algorithm-based fault tolerance

    Publication Year: 1990, Page(s):106 - 113
    Cited by:  Papers (18)

    The design and analysis of test schemes for algorithm-based fault tolerance (ABFT) are examined. The problem is studied under the assumption that no bound is imposed on the size of a test. Upper and lower bounds are established on the number of tests needed to detect a given number of errors. These bounds are sharply different from those previously established under the bounded test size model. Th...
  • A fault-tolerant strategy for hierarchical control in distributed computing systems

    Publication Year: 1990, Page(s):290 - 297
    Cited by:  Patents (1)

    The authors describe a practical method for realizing fault-tolerant global control of resources in distributed computing systems. The method is particularly suitable for systems that are based on a centralized arbiter for making control decisions. Many applications in LAN-based computing, online transactions, and telecommunication systems fall into this category. The method exploits the inherent ...
  • Digest of Papers. Fault-Tolerant Computing: 20th International Symposium [Front Cover]

    Publication Year: 1990
    Freely Available from IEEE
  • Static allocation of process replicas in fault tolerant computing systems

    Publication Year: 1990, Page(s):298 - 306
    Cited by:  Papers (5)

    It is proved that there exist allocations that are optimal with respect to reliability. A simple transformation rule that derives an optimal allocation of replicated systems from an allocation of a given nonreplicated system is presented. This transformation preserves performance optimizing properties of the original allocation. Generally, replication gives a large number of processor links. A sec...
  • A novel concurrent error detection scheme for FFT networks

    Publication Year: 1990, Page(s):114 - 121
    Cited by:  Papers (28)

    A novel algorithm-based fault tolerance scheme is proposed for fast Fourier transform (FFT) networks. It is shown that the proposed scheme achieves 100% fault coverage theoretically. An accurate measure of the fault coverage for FFT networks is provided by taking the roundoff error into account. It is shown that the proposed scheme maintains the low hardware overhead and high throughput of J.Y. Jo...
  • A dependence graph-based approach to the design of algorithm-based fault tolerant systems

    Publication Year: 1990, Page(s):122 - 129
    Cited by:  Papers (20)

    A two-stage approach to the design of algorithm-based fault-tolerant (ABFT) systems is proposed. In the first stage a code is chosen to encode the data used in the algorithm. In the second stage the optimal architecture for implementing the scheme is chosen through the use of dependence graphs. Dependence graphs are a graph-theoretic form of algorithm representation. It is demonstrated that not al...
  • Hierarchical design and analysis of fault-tolerant multiprocessor systems using concurrent error detection

    Publication Year: 1990, Page(s):130 - 137
    Cited by:  Papers (19)

    A composition technique for building large fault-tolerant systems hierarchically using the concept of checks at different levels in the hierarchy is described. A small system of known fault detectability and locatability is replicated several times, and new checks are added at the next higher level. Such checks at different levels can be introduced into most of the existing multiprocessor systems....
  • Loss-tolerance for electronic wallets

    Publication Year: 1990, Page(s):140 - 147
    Cited by:  Patents (10)

    Assuming the existence of tamper-resistant devices with computational power and storage capacity similar to those of PCs and secure cryptosystems, the authors present loss tolerance schemes that leave the security, autonomy, and untraceability of the basic payment system that uses electronic wallets almost unchanged. These schemes are the distributed account list protocol and the marked standard v...
  • A formalism for monitoring real-time constraints at run-time

    Publication Year: 1990, Page(s):148 - 155
    Cited by:  Papers (12)

    A formalism is presented for specification and analysis of real-time constraints of systems at run time. Real-time logic (RTL) is employed to illustrate how timing properties can be specified elegantly in the form of annotation added to a program (or to a design specification). The algorithms for detecting a violation of a timing property at run time, expressed in RTL, are presented.
  • Impact of reconfiguration logic on the optimization of defect-tolerant integrated circuits

    Publication Year: 1990, Page(s):158 - 165
    Cited by:  Papers (1)

    Two aspects of the impact of reconfiguration logic on the optimization of defect-tolerant integrated circuits (ICs) are analyzed. An important consequence to design decisions of neglecting reconfiguration logic is presented. Expressions are developed to predict the number of transistors necessary to implement the reconfiguration logic of a simple defect-tolerance strategy using CMOS technology. Th...
  • Fault covers in reconfigurable PLAs

    Publication Year: 1990, Page(s):166 - 173
    Cited by:  Papers (5)

    Three kinds of faults are considered: stuck-at faults, bridging faults, and crosspoint faults. A new way of repairing bridging faults is introduced. It is shown that the problem of finding a minimum cover is NP-complete but that a special case of this problem can be formulated as a 2-SAT problem, which can be solved in polynomial time. The problem of finding a feasible cover for RPLAs (reconfigura...
  • Availability evaluation of MIN-connected multiprocessors using decomposition technique

    Publication Year: 1990, Page(s):176 - 183

    An analytical technique for the availability evaluation of multiprocessors using a multistage interconnection network (MIN) is presented. The MIN represents a Butterfly-type connection with a 4×4 switching element (SE). The novelty of this approach is that the complexity of constructing a single-level exact Markov chain (MC) is not required. By use of structural decomposition, the system is divide...
  • Estimates of MTTF and optimal number of spares of fault-tolerant processor arrays

    Publication Year: 1990, Page(s):184 - 191
    Cited by:  Papers (2)

    Reliability and mean-time-to-failure (MTTF) models of different fault-tolerant processor arrays (FTPAs) are introduced. On the basis of these models, approaches which allow for the analytical estimate of the necessary number of spares (NNS) and the optimal number of spares (ONS) are proposed. Knowledge of the NNS is suited to FTPAs where nonredundant hardware (hardware for which no redundancy is p...
  • Fault-tolerance in the Advanced Automation System

    Publication Year: 1990, Page(s):6 - 17
    Cited by:  Papers (40)  |  Patents (3)

    The Advanced Automation System (AAS), a distributed real-time system intended to replace the present en-route and terminal approach US air traffic control computer systems over the next decade, is discussed. High availability of air traffic control services is an essential requirement of the system. The authors discuss the general approach to fault tolerance adopted in the AAS by reviewing some of...
  • Anomaly detection for diagnosis

    Publication Year: 1990, Page(s):20 - 27
    Cited by:  Papers (21)

    The author presents a method for detecting anomalous events in communication networks and other similarly characterized environments in which performance anomalies are indicative of failure. The methodology, based on automatically learning the difference between normal and abnormal behavior, has been implemented as part of an automated diagnosis system from which performance results are drawn and ...
  • A software fault tolerance experiment for space applications

    Publication Year: 1990, Page(s):28 - 35
    Cited by:  Papers (2)

    The aim of the experiment described was to implement and assess fault-tolerant software within an industrial framework. Another significant aspect was to adapt the classical software engineering life cycle to this type of project. Two complementary techniques are considered: fault avoidance through the use of a higher-level language and a strict development process; and fault tolerance by using techni...
  • An experience of a critical software development

    Publication Year: 1990, Page(s):36 - 45
    Cited by:  Papers (4)

    Some data about the design and validation of a piece of safety-critical software, the ESIN application software, are presented. The ESIN application software is integrated within an instrumentation system designed for experimental nuclear reactors. Its main function is to generate the emergency shutdown of the reactor. The development of this software has been based on a fault-avoidance approach: use of a ...
  • Identifying the cause of detected errors

    Publication Year: 1990, Page(s):48 - 55
    Cited by:  Papers (5)

    The author presents an approach to the consistent diagnosis of error monitoring observations in a distributed fault-tolerant computing system, even when the faulty source produces arbitrary errors. He describes the online algorithm used in the multicomputer architecture for fault tolerance (MAFT) to diagnose faulty system elements. By the use of syndrome information which categorizes detected erro...
  • An analysis of a reconfigurable binary tree architecture based on multiple-level redundancy

    Publication Year: 1990, Page(s):192 - 199
    Cited by:  Papers (8)

    The analysis of a multiple-level redundant tree (MLRT) structure is presented for the design of a reconfigurable tree architecture. The MLRT scheme tolerates the catastrophic failure of several locally redundant modules in the corresponding locally redundant modular tree (LRMT) structure. This analysis and experimental study establishes the advantages of the MLRT structure over the LRMT structure....