FTCS-18: Eighteenth International Symposium on Fault-Tolerant Computing, Digest of Papers (1988)

Date: 27-30 June 1988

Displaying Results 1 - 25 of 58
  • PODS revisited-a study of software failure behaviour

    Page(s): 2 - 8

    A description is given of an empirical study of the failure characteristics of software defects detected in the programs developed in the Project on Diverse Software (PODS). The results are interpreted in the context of a state machine model of software failure. The results of the empirical study cast doubt on the general validity of the assumption of constant software failure probability and of the assumption that all defects have similar failure rates. In addition, an analysis of failure dependency lends support to the use of diversity as a means of minimizing the impact of design-level faults. Here, nonidentical faults exhibited coincident failure characteristics approximately in accord with the independence assumption, and some of the observed positive and negative correlation effects could be explained by failure masking effects, which can be removed by suitable design.

  • A large scale second generation experiment in multi-version software: description and early results

    Page(s): 9 - 14

    The second-generation experiment is a large-scale empirical study of the development and operation of multiversion software systems that has engaged researchers at five universities and three research institutes. The authors present the history and current status of this experiment. The primary objective of the second-generation experiment is an examination of multiple-version reliability improvement. Experimentation concerns have been focused on the development of multiversion software (MVS) systems, primarily design and testing issues, and the modeling and analysis of these systems. A preliminary analysis of the multiple software versions has been performed and is reported.

  • In search of effective diversity: a six-language study of fault-tolerant flight control software

    Page(s): 15 - 22

    Multiversion software systems achieve fault tolerance through software redundancy and diversity. The authors investigated multiversion software systems using six different programming languages to create six versions of software for an automatic landing program. The rationale, preparation, execution, and evaluation of this investigation are reported.

  • A sequential circuit test generation using threshold-value simulation

    Page(s): 24 - 29

    A simulation-based directed search approach for generating test vectors for combinational circuits has been proposed. In this method, the search for a test vector is guided by a cost function computed by the simulator. Event-driven simulation deals with circuit delays in a very natural manner. Signal controllability information required for the cost function was incorporated in the form of a logic model called the threshold-value model. These concepts are now extended to meet the needs of sequential circuit test generation. Such extensions include handling of unknown values, analysis of feedback loops, and analysis of race conditions in the threshold-value model. A threshold-value sequential test generation program, TVSET, is implemented. It automatically initializes the circuit and generates race-free tests for synchronous and asynchronous circuits.

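The directed-search idea behind this family of test generators can be illustrated with a toy example. The Python sketch below (written for this digest, not taken from the paper) looks for a test vector for a single stuck-at-0 fault by flipping one input bit at a time whenever the flip lowers a simulator-computed cost; the circuit, the fault site, and the crude three-level cost function are all invented for illustration, and the paper's threshold-value cost model is far more informative.

```python
import random

# Toy directed search for a test vector, driven by a simulator-computed
# cost. Circuit: y = (a AND b) OR (c AND d); target fault: stuck-at-0 on
# the output of the first AND gate (g1). Everything here, including the
# three-level cost function, is invented for illustration only.

def simulate(v, inject_fault=False):
    a, b, c, d = v
    g1 = a & b
    if inject_fault:            # force g1 stuck-at-0
        g1 = 0
    y = g1 | (c & d)
    return g1, y

def cost(v):
    g1_good, y_good = simulate(v)
    _, y_faulty = simulate(v, inject_fault=True)
    if y_good != y_faulty:
        return 0                # fault effect visible at the output: a test
    if g1_good == 0:
        return 2                # fault not even activated (need g1 = 1)
    return 1                    # activated, but masked by the other AND gate

def directed_search(n_inputs=4, restarts=20):
    for _ in range(restarts):
        v = [random.randint(0, 1) for _ in range(n_inputs)]
        improved = True
        while improved and cost(v) > 0:
            improved = False
            for i in range(n_inputs):
                w = v[:]
                w[i] ^= 1       # try flipping one input bit
                if cost(w) < cost(v):
                    v, improved = w, True
                    break
        if cost(v) == 0:
            return v
    return None                 # search failed within the restart budget

print(directed_search())        # a vector with a = b = 1 and c & d == 0
```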
  • Advanced automatic test pattern generation and redundancy identification techniques

    Page(s): 30 - 35

    Based on the sophisticated strategies used in the automatic test pattern generation system SOCRATES, the authors present several concepts aiming at a further improvement and acceleration of the deterministic test pattern generation and redundancy identification process. In particular, they describe an improved implication procedure and an improved unique sensitization procedure. Both procedures significantly advance deterministic test pattern generation and redundancy identification, especially for those faults for which it is difficult to generate a test pattern or to prove redundancy. As a result of the application of the proposed techniques, SOCRATES is capable of successfully generating a test pattern for all testable faults in a set of combinational benchmark circuits, and of identifying all redundant faults, with fewer than 10 backtrackings.

  • Generating pattern sequences for the pseudo-exhaustive test of MOS-circuits

    Page(s): 36 - 41

    A method based on linear feedback shift registers over finite fields is presented to generate, for a natural number n, a pattern sequence of minimal length that detects all m-multiple stuck-open faults for m …

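As a point of reference for LFSR-based pattern generation, the following Python sketch (not the paper's finite-field construction) shows how a maximal-length binary LFSR enumerates all nonzero n-bit patterns in a fixed order; the ordering matters because CMOS stuck-open faults need two-pattern tests, so consecutive patterns act as initialisation/test pairs. The tap positions are assumed to come from a primitive polynomial.

```python
# Minimal sketch of a pattern generator built from a binary LFSR. This is
# not the paper's construction; it only shows how a maximal-length LFSR
# enumerates all 2**n - 1 nonzero n-bit patterns in a fixed order, so that
# consecutive patterns can serve as the two-pattern (initialisation, test)
# pairs that CMOS stuck-open faults require.

def lfsr_patterns(n, taps, seed=1):
    """Yield the 2**n - 1 nonzero states of a Fibonacci LFSR.

    taps -- bit positions (0 = LSB) XORed to form the feedback bit; they
            are assumed to correspond to a primitive polynomial, which is
            what guarantees the maximal-length sequence (not checked here).
    seed -- any nonzero initial state.
    """
    state = seed
    for _ in range((1 << n) - 1):
        yield [(state >> i) & 1 for i in range(n)]
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & ((1 << n) - 1)

# Example: with taps (3, 0) the 4-bit register steps through all 15
# nonzero patterns before repeating.
for pattern in lfsr_patterns(4, taps=(3, 0)):
    print(pattern)
```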
  • Volatile logging in n-fault-tolerant distributed systems

    Page(s): 44 - 49

    The authors introduce two enhancements to optimistic recovery which allow messages to be logged without performing any I/O to stable storage. The first permits messages to be instantaneously logged in volatile storage, as in the sender-based message logging technique of D.B. Johnson and W. Zwaenepoel (1987), but without their restriction of single-fault-tolerance. The second permits message data and/or message arrival orders not to be logged in circumstances where this information can be reconstructed in other ways. They show that the combination of these two optimizations yields a transparent n-fault-tolerant system which logs to stable storage only those messages received from the outside world and a very small number of additional messages.

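To make the logging policy concrete, here is a minimal Python sketch of the idea that ordinary messages are recorded only in the sender's volatile memory while input from the outside world, which cannot be regenerated by replay, is forced to stable storage. The class and method names are hypothetical; a real optimistic-recovery protocol also tracks dependency and recovery state that is omitted here.

```python
import json

# Toy sketch of the logging policy: outgoing messages go only to a
# volatile in-memory log, while external input is written synchronously
# to stable storage. Names are invented for illustration.

class LoggingProcess:
    def __init__(self, name, stable_log_path):
        self.name = name
        self.volatile_log = []                  # lost if *this* process crashes
        self.stable_log_path = stable_log_path  # used only for external input
        self.seq = 0

    def send(self, dest, payload):
        """Log the outgoing message in volatile memory only (no disk I/O)."""
        self.seq += 1
        record = {"from": self.name, "to": dest, "seq": self.seq, "data": payload}
        self.volatile_log.append(record)
        return record                           # would be handed to the transport

    def receive_external(self, payload):
        """Input from the outside world is forced to stable storage."""
        with open(self.stable_log_path, "a") as f:
            f.write(json.dumps({"to": self.name, "data": payload}) + "\n")

p = LoggingProcess("p1", "p1_external.log")
p.send("p2", {"op": "update", "value": 42})
p.receive_external({"sensor_reading": 7})
```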
  • Approaches to implementation of a repairable distributed recovery block scheme

    Page(s): 50 - 55

    The authors previously proposed (1984) the basic concept of the distributed recovery block (DRB) scheme as an approach to uniform treatment of hardware and software faults in real-time applications. Design issues that arise in implementing the DRB scheme are discussed together with some promising approaches. Issues in extending the DRB scheme with the capability of reincorporating a repaired node without disrupting the real-time computing service are also discussed. An experimental implementation of the repairable DRB scheme into a real-time distributed computer system (DCS) testbed and subsequent measurement of the system performance demonstrated the fast forward recovery capability and the logical soundness of the scheme.

  • Fault tolerant concurrent C: a tool for writing fault tolerant distributed programs

    Page(s): 56 - 61

    Concurrent C is a superset of C that provides parallel programming facilities. The authors' local area network (LAN) multiprocessor implementation has led them to explore the design and implementation of a fault-tolerant version of Concurrent C called FT Concurrent C. FT Concurrent C allows the programmer to replicate critical processes. A program continues to operate with full functionality as long as at least one of the copies of a replicated process is operational and accessible. As far as the user is concerned, interacting with a replicated process is the same as interacting with an ordinary process. FT Concurrent C also provides facilities for notification upon process termination, detecting processor failure during process interaction, and automatically terminating orphan processes. The authors discuss the different approaches to fault tolerance, describe the considerations in the design of FT Concurrent C, and present a programming example.

  • Computational complexity of controllability/observability problems for combinational circuits

    Page(s): 64 - 69

    The computational complexity of fault detection problems and of various controllability and observability problems for combinational logic circuits is analyzed. It is shown that the fault detection problem is still NP-complete even for monotone circuits limited in fanout, i.e. the number of signal lines which fan out from a signal line is limited to three. It is also shown that the observability problem for unate circuits is NP-complete, but that the controllability problem for unate circuits can be solved in time complexity O(m), where m is the number of lines in a circuit. Furthermore, two classes of circuits, called k-binate-bounded circuits and k-bounded circuits, are introduced. For k-binate-bounded circuits, the controllability problem is solvable in polynomial time, and for k-bounded circuits, the fault detection problem is solvable in polynomial time, when k …

  • An iterative technique for calculating aliasing probability of linear feedback signature registers

    Page(s): 70 - 75

    An iterative technique for computing the exact probability of aliasing for any linear feedback signature register (i.e. characterized by any feedback polynomial, for any constant probability of error, and for any test length) is proposed. The technique is also applicable to a more general model of the aliasing problem wherein the probability of error may vary with each output bit. The complexity of the technique enables registers of lengths of interest in practice, e.g. 16, to be analyzed readily.

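One straightforward way to compute an exact aliasing probability is to iterate the probability distribution over the register's states, one clock at a time, which remains feasible for register lengths around 16. The Python sketch below does that for a simple single-input signature-register model; it is a generic state-distribution iteration written for this digest, not necessarily the authors' technique, and it uses the common definition of aliasing as "the error stream is nonzero but the final signature equals the fault-free one". The feedback polynomial and error probabilities in the example are hypothetical.

```python
# Sketch: exact aliasing probability of a k-bit single-input signature
# register by iterating the probability distribution over its 2**k states.
# The register is modelled as a polynomial-division LFSR; `poly` holds the
# low k coefficients of the feedback polynomial. Generic illustration only.

def lfsr_step(state, bit, poly, k):
    """Clock the signature register once with `bit` on its serial input."""
    msb = (state >> (k - 1)) & 1
    state = ((state << 1) | bit) & ((1 << k) - 1)
    if msb:
        state ^= poly
    return state

def aliasing_probability(k, poly, error_probs):
    """error_probs -- per-clock probability that the circuit output is wrong
       (a constant error probability p over L patterns is just [p] * L).
       Returns P(error stream nonzero, yet final signature equals the
       fault-free signature), i.e. the probability of aliasing.
    """
    size = 1 << k
    dist = [0.0] * size
    dist[0] = 1.0                        # error register starts at all-zero
    for p in error_probs:
        nxt = [0.0] * size
        for s, pr in enumerate(dist):
            if pr:
                nxt[lfsr_step(s, 0, poly, k)] += pr * (1.0 - p)
                nxt[lfsr_step(s, 1, poly, k)] += pr * p
        dist = nxt
    p_error_free = 1.0
    for p in error_probs:
        p_error_free *= (1.0 - p)
    return dist[0] - p_error_free        # zero signature minus "no error at all"

# Example (hypothetical numbers): 8-bit register, polynomial
# x^8 + x^4 + x^3 + x^2 + 1, constant error probability 0.1, 200 patterns.
print(aliasing_probability(8, 0b00011101, [0.1] * 200))
```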
  • RIDDLE: a foundation for test generation on a high level design description

    Page(s): 76 - 81

    A formal approach is presented to the analysis of a VLSI design described at the high level, which produces information conducive to the acceleration of test-generation algorithms. This analysis yields information which can be used to reduce the amount of effort expended during backtracking, by guiding this process towards decisions (assignments) less likely to cause conflicts and minimizing the amount of work between backtracks. RIDDLE, an algorithm that performs this analysis in time that is linear in the number of signals, is introduced. Experimental results for the special case of combinatorial gate-level designs are also given.

  • Analysis of workload influence on dependability

    Page(s): 84 - 89

    The authors consider a general, analytic approach to the study of workload effects on computer system dependability, where the faults considered are transient and the dependability measure in question is the time to failure, T_f. Under these conditions, workload plays two roles with opposing effects: it can help detect/correct a correctable fault, or it can cause the system to fail by activating an uncorrectable fault. As a consequence, the overall influence of workload on T_f is difficult to evaluate intuitively. To examine this in more formal terms, the authors establish a Markov renewal process model that represents the interaction between workload and fault accumulation in systems for which fault tolerance can be characterized by fault margins. Using this model, they consider some specific examples and show how the probabilistic nature of T_f can be formulated directly in terms of parameters regarding workload, fault arrivals, and fault margins.

  • Reliability analysis of non repairable systems using stochastic Petri nets

    Page(s): 90 - 95

    The algorithm presented is designed to compute the reliability of large degradable systems. These systems are modeled using nonrepetitive stochastic Petri nets. The reliability parameters are computed during the generation of the reachability graph. A criterion is given that allows the computation to be stopped as soon as a given precision is obtained.

  • An open layered architecture for dependability analysis and its application

    Page(s): 96 - 101

    The author presents a proposal for an open layered architecture for dependability analysis, corresponding to the respective levels of abstraction. The motivation for this reference architecture is to support structuring, reusability, and variability of methods and tools. Each of the seven layers is discussed in detail, and the correspondence with currently available tools for dependability analysis is shown by examples. To demonstrate the feasibility of the approach, the layered architecture is used as a basis for design and implementation of MARPLE, a tool for dependability analysis of distributed systems. MARPLE mainly concentrates on the application layer and the model-generation layer. It is embedded in a system-design environment and bridges the gap between the design tool and dependability analysis.

  • Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers [Front Cover]

  • FIAT-fault injection based automated testing environment

    Page(s): 102 - 107

    An automated real-time distributed accelerated fault injection environment (FIAT) is presented as an attempt to provide suitable tools for the validation process. The authors present the concepts and design, as well as the implementation and evaluation, of the FIAT environment. The system has been built, evaluated, and is currently in use; examples of fault-tolerance techniques such as checkpointing and duplicate-and-match are used to show its usefulness.

  • On simulating faults in parallel

    Page(s): 110 - 115

    Hardware engines (e.g. YSE and EVE) have been built to perform functional simulation of large designs over many patterns. The authors present a method of simulating faults in parallel that is applicable to these hardware simulation engines (and to software simulators with similar characteristics). A notion of independence between faults is used to determine the faults that can be simulated in parallel. An efficient algorithm is developed to determine the independent subsets of faults. Results of applying the algorithm to large examples are presented and shown to be very good by comparing them with theoretical lower bounds. This technique makes it feasible to fault simulate large networks using these hardware simulation engines.

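The key enabler described above is deciding which faults may share one simulation pass. As a rough illustration, the Python sketch below greedily groups faults whose output cones are disjoint, a deliberately conservative stand-in for the paper's notion of independence; the data and the grouping heuristic are invented for this digest and are not the authors' algorithm.

```python
# Sketch of grouping faults into "independent" subsets so one simulation
# pass can carry several injected faults at once. Independence is taken
# here, very conservatively, as "the faults reach disjoint sets of primary
# outputs"; the grouping is a simple greedy heuristic.

def group_independent(faults, output_cone):
    """faults       -- iterable of fault identifiers
       output_cone  -- dict: fault -> set of primary outputs it can affect
       returns      -- list of groups; faults in a group touch disjoint outputs
    """
    groups = []          # each entry: (set_of_outputs_used, [faults])
    for f in faults:
        cone = output_cone[f]
        for used, members in groups:
            if used.isdisjoint(cone):
                used |= cone          # reserve these outputs for the group
                members.append(f)
                break
        else:
            groups.append((set(cone), [f]))
    return [members for _, members in groups]

# Hypothetical example: four faults, two primary outputs o1 and o2.
cones = {"f1": {"o1"}, "f2": {"o2"}, "f3": {"o1", "o2"}, "f4": {"o2"}}
print(group_independent(["f1", "f2", "f3", "f4"], cones))
# -> [['f1', 'f2'], ['f3'], ['f4']]   (three simulation passes instead of four)
```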
  • Accelerated fault simulation by propagating disjoint fault-sets

    Page(s): 116 - 121

    The authors propose a novel fault simulation method (the compressive method), which extends the idea of fault propagation on which the deductive and concurrent methods are based. A fault set is used as a unit of fault propagation; it is a set of faults which cause the same effect on the primary outputs for a given input pattern. Thus, faults in the set are propagated in a lump, just like an individual fault in the concurrent method, and fault propagation is accelerated in proportion to the number of elements in a fault set. The compressive method introduces a union operation on fault sets. The operation dynamically gathers faults into a fault set so that they are propagated as a unit. Fault simulation using this method provides better performance than the concurrent method; simulation time is shortened by 50-83% and memory storage is reduced by 50-80% in simulating a combinational circuit.

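Set-based fault propagation is easiest to see on a single gate. The Python sketch below shows the classical deductive fault-simulation rule for a 2-input AND gate, the kind of set manipulation the compressive method builds on; it is textbook deductive simulation, not the paper's compressive extension, and the fault names in the example are made up.

```python
# Deductive fault-set propagation through a 2-input AND gate: given the
# fault-free input values and the sets of faults observable at each input,
# compute the set of faults observable at the output. (Classical deductive
# rule, shown only to illustrate set-based propagation.)

def and_gate_fault_set(a, b, La, Lb, out_name):
    """a, b      -- fault-free logic values of the two inputs (0/1)
       La, Lb    -- sets of faults whose effect flips input a / input b
       out_name  -- name of the output line, used for its own stuck-at fault
    """
    good = a & b
    if a == 1 and b == 1:
        Lout = La | Lb            # flipping either input flips the output
    elif a == 0 and b == 0:
        Lout = La & Lb            # both inputs must be flipped
    elif a == 0:                  # a is the controlling 0
        Lout = La - Lb
    else:                         # b is the controlling 0
        Lout = Lb - La
    # the output's own stuck-at fault (opposite to its good value) is added
    Lout = Lout | {(out_name, 1 - good)}
    return good, Lout

good, faults = and_gate_fault_set(1, 0,
                                  La={("x", 0)}, Lb={("y", 1)},
                                  out_name="g1")
print(good, faults)    # 0 {('y', 1), ('g1', 1)}
```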
  • A reconvergent fanout analysis for efficient exact fault simulation of combinational circuits

    Page(s): 122 - 127

    An exact fault simulation can be achieved by simulating only the faults on reconvergent fanout stems, while determining the detectability of faults on other lines by critical path tracing within fanout-free regions. The authors have delimited, for every reconvergent fanout stem, a region of the circuit outside of which the stem fault does not have to be simulated. Lines on the boundary of such a stem region, called exit lines, have the following property: if the stem fault is detected at the line, and the line is critical with respect to a primary output, then the stem fault is detected at the primary output. Any fault-simulation technique can be used to simulate the stem fault within its stem region. The fault simulation complexity of a circuit is shown to be directly related to the number and size of stem regions in the circuit. Results obtained for the well-known benchmark circuits are presented.

  • Fault simulation in a multilevel environment: the MOZART approach

    Page(s): 128 - 133

    MOZART is a concurrent fault simulator for large circuits described at the RT, functional, gate, and switch levels. Performance is gained by means of techniques aimed at the reduction of unnecessary activity. Two such techniques are levelized two-pass simulation, which minimizes the number of events and evaluations, and list event scheduling, which allows optimized processing of simultaneous (fraternal) events for concurrent machines. Both analytical and experimental evidence is provided for the effectiveness of the solutions adopted in MOZART. A performance metric is introduced for fault simulation that is based on comparison with the serial algorithm and is more accurate than those used up till now. Possible tradeoffs between the speeds of fault-free and fault simulations are discussed.

  • Modelling the influence of unreliable software in distributed computer systems

    Page(s): 136 - 141

    The author develops a system dependability model accounting for the influence of software faults. When one software system resides on a set of distributed computers or processing units, it is necessary to comprehend the logical linkage between system units. The model is based on the concept that a logical fault in a system will cause an inconsistent (erroneous) internal state in the processing unit where the fault is activated, and that this error can cause a failure of the processing unit, be corrected without causing a failure, or propagate to other cooperating processing units. The model also accounts for the logical hardware faults and the transient faults, which have a similar pattern of manifestations. Analysis of the stationary dependability characteristics of distributed systems is discussed. An approximate procedure which allows evaluation of fairly large systems is outlined. A brief example is presented which demonstrates that even a low error propagation may have large performance consequences.

  • Dependability evaluation of software fault-tolerance

    Page(s): 142 - 147

    The authors present a detailed reliability and safety analysis of the two major software fault-tolerance approaches, recovery blocks (RB) and n-version programming (NVP). The methodology used for the modeling is based on the identification of the possible types of faults introduced during the specification and the implementation, and on the analysis of the behavior following fault activation. The main outcome of the evaluation concerns the derivation of analytical results for identifying the improvement that can result from the use of RB and NVP and for revealing the most critical types of related faults. The study of nested RBs shows that the proposed analysis approach can be applied to such realistic software structures; when an alternate is itself an RB, the results are analogous to the case of the addition of a third alternate. The reliability analysis showed that an improvement is to be expected, but that this improvement would be very low. The study of the discarding of a failed version in NVP shows that this strategy is always worthwhile for safety, whereas, for reliability, it is only worthwhile when independent faults dominate.

  • Experimental evaluation of software reliability growth models

    Page(s): 148 - 153

    An experimental evaluation is presented of SRGMs (software reliability growth models). The experimental data sets were collected from compiler construction projects completed by five university students. The SRGMs studied are the exponential model, the hyperexponential model, and S-shaped models. It is shown that the S-shaped models are superior to the exponential model in both the accuracy of estimation and the goodness of fit (as determined by the Kolmogorov-Smirnov test). It is also shown that it is possible to estimate residual faults accurately from a subset of the test results. An estimation method is proposed for the hyperexponential model. It is based on the observation that the start time for testing is different for different program modules. It is shown that this method improves the goodness of fit significantly.

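For readers unfamiliar with the model families named above, the exponential (Goel-Okumoto) and delayed S-shaped growth models have the well-known mean value functions m(t) = a(1 - e^{-bt}) and m(t) = a(1 - (1 + bt)e^{-bt}), where a is the expected total number of faults and b a detection-rate parameter. The short Python sketch below evaluates both; the parameter values are purely illustrative and are not taken from the paper.

```python
import math

# Mean value functions of two standard software reliability growth models:
#   exponential (Goel-Okumoto):  m(t) = a * (1 - exp(-b*t))
#   delayed S-shaped:            m(t) = a * (1 - (1 + b*t) * exp(-b*t))
# a = expected total number of faults, b = fault detection rate.

def m_exponential(t, a, b):
    return a * (1.0 - math.exp(-b * t))

def m_s_shaped(t, a, b):
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))

a, b = 100.0, 0.05      # hypothetical: 100 total faults, rate 0.05 per unit time
for t in (10, 40, 80, 160):
    print(t, round(m_exponential(t, a, b), 1), round(m_s_shaped(t, a, b), 1))

# Residual faults at time t can then be estimated as a - m(t).
```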
  • A unified built-in-test scheme: UBIST

    Page(s): 157 - 163

    An original BIST (built-in self-test) scheme is proposed to cover some shortcomings of self-checking circuits and to ensure all tests needed for integrated circuits. In this scheme, self-checking techniques and built-in self-test techniques are combined in an original way, each taking advantage of the other. This results in a unified BIST scheme (UBIST), allowing a high fault coverage for all tests needed for integrated circuits, e.g. offline test (design verification, manufacturing test, and maintenance test) and online concurrent error detection.
