
Twenty-Second International Symposium on Fault-Tolerant Computing (FTCS-22), 1992: Digest of Papers

Date: 8-10 July 1992


Displaying results 1-25 of 61
  • Failure mode assumptions and assumption coverage

    Page(s): 386 - 395

    A method is proposed for the formal analysis of failure mode assumptions and for the evaluation of the dependability of systems whose design correctness is conditioned on the validity of such assumptions. Formal definitions are given for the types of errors that can affect items of service delivered by a system or component. Failure mode assumptions are then formalized as assertions on the types of errors that a component may induce in its enclosing system. The concept of assumption coverage is introduced to relate the notion of partially-ordered assumption assertions to the quantification of system dependability. Assumption coverage is shown to be extremely important in systems requiring very high dependability. It is also shown that the need to increase system redundancy to accommodate more severe modes of component failure can sometimes result in a decrease in dependability.

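    A hedged illustration of how assumption coverage can enter a dependability calculation (a sketch, not the paper's formulation): writing p(A) for the probability that failure mode assumption A holds in operation, and bounding the failure probability under a violated assumption by 1, the law of total probability gives the bound below.

```latex
% Illustrative only; p(A) plays the role of the assumption coverage.
% Conditioning on whether assumption A holds, and bounding Pr[fail | not A] by 1:
\Pr[\text{fail}] \;\le\; \Pr[\text{fail} \mid A]\,p(A) + \bigl(1 - p(A)\bigr)
```

    For very high dependability targets, the 1 - p(A) term dominates unless the coverage is itself correspondingly close to 1, which is one way to read the abstract's claim that assumption coverage matters most in highly dependable systems.
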
  • Wafer testing with pairwise comparisons

    Page(s): 374 - 383

    A novel diagnosis scheme is proposed for wafer testing, in which the test access port of each die is utilized to perform comparison tests on its neighbors. A probabilistic diagnosis algorithm is presented, which correctly identifies almost all dies, even when the probability of failure of a die is larger than 0.5. The algorithm is shown to be particularly suitable for constant degree structures, such as rectangular and octagonal grids. The algorithm is designed for wafer scale structures, where the boundary dies do not have a complete regular structure. The algorithm also allows for the fault coverage of the tests to be imperfect. In addition, diagnosis is done locally. Both the test time and the diagnosis time are invariant with respect to the number of dies on the wafer. The algorithm can also tolerate some systematic errors. The dies are tested in parallel with this approach.

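    The local, neighbor-comparison flavor of the scheme can be sketched as follows (a hypothetical toy, not the paper's probabilistic algorithm: it assumes perfect test coverage, so a comparison matches exactly when both dies are good).

```python
import random

ROWS, COLS, P_FAULTY = 8, 8, 0.6          # assumed wafer size and fault rate

# Ground truth: True means the die is good.
good = [[random.random() > P_FAULTY for _ in range(COLS)] for _ in range(ROWS)]

def neighbors(r, c):
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < ROWS and 0 <= c + dc < COLS:
            yield r + dr, c + dc

def match(a, b):
    # Perfect-coverage comparison: outcomes agree only if both dies are good.
    return good[a[0]][a[1]] and good[b[0]][b[1]]

# Local diagnosis: declare a die good iff it matches at least one neighbor.
# Diagnosis time per die is constant, independent of the number of dies.
diagnosis = [[any(match((r, c), n) for n in neighbors(r, c))
              for c in range(COLS)] for r in range(ROWS)]

errors = sum(diagnosis[r][c] != good[r][c]
             for r in range(ROWS) for c in range(COLS))
print(f"misdiagnosed dies: {errors} of {ROWS * COLS}")
```

    Faulty dies never match anything and are always caught; a good die is misdiagnosed only when all of its neighbors are faulty, which is what limits this naive rule. The paper's probabilistic algorithm does considerably better under the same conditions.
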
  • Chip test optimization using defect clustering information

    Page(s): 366 - 373

    The authors recently proposed an adaptive wafer-probe testing approach which used defect clustering information on the wafer to optimize die test cost and quality. They show that this information can also be captured and used to optimize the testing of packaged chips. The effectiveness of the proposed approach is analyzed for negative binomial defect statistics. The results show that a three- to five-fold improvement in defect levels can be easily obtained for the same test costs. It is also possible to separate out chips with defect levels over an order of magnitude better than the best possible method without using defect clustering information.

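    Negative binomial defect statistics are the standard way to model clustering; a minimal sketch of the familiar yield formula (parameter values assumed) shows why the clustering parameter matters:

```python
import math

def yield_negative_binomial(area_cm2, defects_per_cm2, alpha):
    """Stapper's negative binomial yield model; small alpha = heavy clustering."""
    return (1.0 + area_cm2 * defects_per_cm2 / alpha) ** (-alpha)

A, D = 0.5, 1.0              # assumed die area (cm^2) and defect density
for alpha in (0.5, 2.0, 10.0):
    print(f"alpha = {alpha:4}: yield = {yield_negative_binomial(A, D, alpha):.3f}")
print(f"Poisson limit (no clustering): yield = {math.exp(-A * D):.3f}")
```

    Clustering raises yield for a given defect density and, more importantly for the paper, makes a chip's pass/fail outcome informative about its neighbors, which is the information the adaptive test strategy exploits.
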
  • A new statistical approach for fault-tolerant VLSI systems

    Page(s): 356 - 365

    A novel approach to the statistics of fault-tolerant VLSI systems is presented by compounding binomial distributions with a beta distribution. This technique was discovered in the analysis of fault-tolerant dynamic random-access memory (DRAM) chips. Manufacturing data supporting this method are shown and the application of the approach to standard fault-tolerance schemes is described. Special forms of these statistics for computer calculations are also discussed and examples are given.

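    Compounding a binomial with a beta distribution yields the beta-binomial distribution; a minimal sketch (with assumed parameters) of the resulting fault-count statistics:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    # P(K = k) = C(n, k) * B(k + a, n - k + b) / B(a, b)
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

n, a, b = 64, 0.2, 8.0       # cells per block and beta parameters (assumed)
pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
print(f"P(fault-free block)         = {pmf[0]:.4f}")
print(f"P(repairable with 2 spares) = {sum(pmf[:3]):.4f}")
```

    The heavier right tail of the beta-binomial, relative to a plain binomial with the same mean, is exactly the clustering effect that redundancy schemes such as spare rows and columns must be dimensioned against.
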
  • Fault injection for the formal testing of fault tolerance

    Page(s): 345 - 354

    The authors address the use of fault injection for explicitly removing design/implementation faults in fault tolerance algorithms and mechanisms. A formalism is introduced that represents the fault tolerance algorithms and mechanisms by means of a set of assertions. This formalism enables an execution tree to be represented, where each path from the root to a leaf of the tree is a well-defined formula. It provides a framework for the generation of a functional deterministic test for programs implementing complex fault tolerance algorithms and mechanisms. This methodology has been used to extend a debugging tool aimed at testing fault tolerance protocols developed by BULL France. It has been successfully applied to the injection of faults in the inter-replica protocol supporting the application-level fault tolerance features of the architecture of the ESPRIT-funded Delta-4 project. The results of these experiments are discussed and analyzed.

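    The execution-tree idea can be sketched with a toy structure (entirely hypothetical labels; the paper's formalism uses assertions over the protocol's behavior): every root-to-leaf path of the tree is one deterministic test, pairing fault injections with the assertions that must hold.

```python
# Each node maps a label (an injection or an assertion) to its subtree;
# an empty dict marks a leaf, completing one test case.
tree = {
    "protocol start": {
        "no fault injected": {"assert: normal delivery": {}},
        "inject: replica crash": {
            "assert: failover completes": {},
            "assert: no duplicate delivery": {},
        },
        "inject: message loss": {"assert: retransmission observed": {}},
    },
}

def paths(node, prefix=()):
    if not node:                          # leaf: one complete test case
        yield prefix
    for label, child in node.items():
        yield from paths(child, prefix + (label,))

for test in paths(tree):
    print(" -> ".join(test))
```
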
  • FERRARI: a tool for the validation of system dependability properties

    Page(s): 336 - 344

    The authors present FERRARI, a fault and error automatic real-time injector, which can evaluate complex systems by emulating most hardware faults in software. The current version of FERRARI runs on SPARC workstations in an X Window environment. The motivation, methodology, design, implementation, and evaluation of FERRARI are presented. The techniques used to emulate permanent faults and transient errors in software are described in detail. Experimental results are presented for several error detection techniques. They demonstrate the effectiveness of FERRARI in its role as a fault and error injector.

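    In the spirit of software-implemented fault injection (a minimal sketch; the actual tool targets registers, memory, and instructions of a running SPARC process), a transient error can be emulated by flipping bits in program state and checking whether a detector notices:

```python
import random

def checksum(buf):
    # A deliberately weak additive detector, standing in for whatever
    # error detection technique is under evaluation.
    return sum(buf) & 0xFF

data = bytearray(b"workload state under test")
reference = checksum(data)

# Inject one or two random single-bit flips (a crude transient-error model).
for _ in range(random.choice((1, 2))):
    i = random.randrange(len(data))
    data[i] ^= 1 << random.randrange(8)

# A single flip always disturbs this checksum; two flips can cancel each
# other out, so measured coverage stays below 100%, as with real detectors.
print("error detected:", checksum(data) != reference)
```
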
  • Two software techniques for on-line error detection

    Page(s): 328 - 335

    Two software-based techniques for online detection of control flow errors were evaluated by fault injection. One technique, called block signature self-checking (BSSC), checks the control flow between program blocks. The other, called error capturing instructions (ECIs), inserts ECIs in the program area, the data area, and the unused area of the memory. To demonstrate these techniques, a program has been developed which modifies the executable code for the MC6809E 8-bit microprocessor. The error detection techniques were evaluated using two fault injection techniques: heavy-ion radiation from a californium-252 source and power supply disturbances. Combinations of the two error detection techniques were tested for three different workloads. A combination of BSSC, ECIs, and a watchdog timer was also evaluated.

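    A toy rendition of the block-signature idea (the instruction encodings and compaction function are invented here): a signature computed over each basic block at preprocessing time is compared against a recomputation at the block boundary, so a control flow error that enters or leaves a block at the wrong place is flagged.

```python
def signature(block):
    sig = 0
    for word in block:
        sig = ((sig << 1) ^ word) & 0xFFFF      # toy compaction function
    return sig

blocks = [[0x1A2B, 0x3C4D], [0x5E6F, 0x7081, 0x92A3]]   # "instruction words"
embedded = [signature(b) for b in blocks]               # stored with the code

def run(blocks, embedded, corrupt=None):
    for i, block in enumerate(blocks):
        executed = block[:-1] if corrupt == i else block  # early exit = CF error
        if signature(executed) != embedded[i]:
            return f"control flow error detected in block {i}"
    return "clean run"

print(run(blocks, embedded))
print(run(blocks, embedded, corrupt=1))
```
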
  • A study of the effects of transient fault injection into a 32-bit RISC with built-in watchdog

    Page(s): 316 - 325

    An error-detecting 32-bit reduced instruction set computer (RISC) designed in a 1.2-μm CMOS technology with an on-chip watchdog using embedded signature monitoring is presented. It was evaluated through simulation-based fault injection, using a register-level model written in VHDL (VHSIC hardware description language). The watchdog caused a chip area increase of 4.7%. Two application programs were executed to study workload dependencies. The insertion of watchdog instructions resulted in a memory overhead of between 13% and 25% as well as a performance overhead of between 9% and 19%. A total of 2779 faults were injected into the processor during execution of the application programs. Only 23% of these resulted in effective errors. A minimum detection coverage of 95% was reached for effective errors classified as control flow errors, with a median latency of one clock cycle. Detection coverage for effective data errors was lower, between 22% and 50%.

  • Direct methods for synthesis of self-monitoring state machines

    Page(s): 306 - 315

    The authors consider synthesis methods that yield state assignments with checking invariants amenable to signature monitoring. They describe an automated synthesis approach based on a novel, generalized monitor architecture and prove that, for any given finite-state machine (FSM), a special, methodology-consistent state assignment exists. The state assignment permits each state's reference signature to be extracted directly from the state code. This eliminates the need for explicit reference-signature storage and yields continuous monitoring with near-zero error-detection latency at each state. A practical tool that implements these synthesis algorithms can, in 37 seconds, generate state assignments for all 41 MCNC synthesis benchmark FSMs. Layout overhead comparisons obtained with an FSM macro-cell CAD system show that this technique can require as little as 52.3% of the layout area of traditional duplication.

  • On combining off-line BIST and on-line control flow checking

    Page(s): 298 - 305

    When offline testing is complemented by online checks, some of the test hardware is generally used only for online checking or only for offline testing. A novel target structure for self-testable and self-monitoring controllers is presented, and a formal framework for the synthesis of self-monitoring controllers is established. It combines a method of monitoring the control flow with a self-test structure in such a way that the self-test hardware is utilized to facilitate control flow checking. The corresponding design procedure considers this target structure while synthesizing the controller from a behavioral description, and thus minimizes hardware overhead. Existing approaches for designing concurrently checked controllers can be represented as special cases of the formal framework established. Experimental results are summarized.

  • A structural technique for fault-protection in asynchronous interfaces

    Page(s): 288 - 295

    Asynchronous VLSI circuits are reactive to input stimuli and can be vulnerable to transient faults at their inputs. The ability to tolerate such faults is crucial for interface circuits, or transducers. The author proposes a structural protection technique based on: (1) synthesizing a correct transducer circuit from its original specification made under the correct-environment assumption, and (2) augmenting it with a structurally separate wrapping of special protection logic. This logic consists of the implementation of a perfect environment image and a special adjudicator component. The perfect environment model provides aliasing for the signals coming from the real, failure-prone environment. The function of the adjudicator, in its minimal case, is to enable synchronization between the inputs coming from the real environment and their aliases generated inside the transducer. This approach greatly simplifies synthesis compared to the approach emerging directly from the conformance relation of D.L. Dill (1988).

  • Synthesis of multi-level combinational circuits for complete robust path delay fault testability

    Page(s): 280 - 287

    Several synthesis rules for obtaining a multilevel multioutput logic circuit with 100% hazard-free robust testability of path delay faults are explored. In the simplest of these rules, an irredundant two-level implementation of the logic function, which is not robustly testable, is modified into a three-level or four-level completely robustly testable implementation. Algebraic factorization is applied to the modified implementation to obtain a completely robustly testable multilevel circuit at a relatively low area overhead. This rule was found to make most of the considered Berkeley programmable logic array (PLA) benchmark realizations completely testable. For the small number of cases where this synthesis rule is only partially applicable, another rule is presented which can guarantee complete testability, but at a slightly higher area overhead. Both synthesis rules also ensure testability of all multiple stuck-at faults with an easily derivable test set. Four other heuristic synthesis rules that can aid in obtaining a completely robustly testable circuit at a lower area overhead in some cases are also presented.

  • Incorporating testability considerations in high-level synthesis

    Page(s): 272 - 279

    The authors propose an algorithm for module and register binding which generates register-transfer-level (RTL) designs having low testability cost. They also present an algorithm for altering the register and module binding to reduce testability overhead in the final design. These algorithms were coded, and several experiments were conducted to check their performance. The results of these experiments are described. The study shows that the designs produced by the method have reduced testability overhead in almost all cases.

  • Removal of redundancy in logic circuits under classification of undetectable faults

    Page(s): 263 - 270

    The authors describe a method for removing redundant elements using test pattern generation. Redundancy in combinational circuits can be identified from the existence of undetectable stuck-at faults. By classifying undetectable faults into three categories according to properties obtained in the test pattern generation process, some redundant elements can be removed concurrently. The method produces an irredundant circuit efficiently by using these properties. An improved procedure for redundancy removal is outlined to reduce the number of repetitions of test generation performed in the process. Experimental results for ISCAS-85 benchmark circuits are also shown.

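    The classical rule underlying redundancy removal: if a stuck-at-v fault on a line is proven undetectable, the line can be tied to the constant v and the circuit simplified without changing its function. A minimal sketch (the netlist and the choice of undetectable fault are assumed):

```python
# gates: name -> (op, inputs); "a".."d" are primary inputs.
gates = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR",  ["n1", "c"]),
    "y":  ("AND", ["n2", "d"]),
}

def propagate_constant(gates, net, value):
    """Tie `net` to `value` and propagate constants through the netlist."""
    consts, changed = {net: value}, True
    while changed:
        changed = False
        for name, (op, ins) in gates.items():
            if name in consts:
                continue
            vals = [consts.get(i) for i in ins]
            out = None
            if op == "AND" and False in vals:
                out = False
            elif op == "OR" and True in vals:
                out = True
            elif None not in vals:
                out = all(vals) if op == "AND" else any(vals)
            if out is not None:
                consts[name], changed = out, True
    return consts

# Suppose test generation proved n1 stuck-at-1 undetectable: tie n1 to 1.
# Then n2 = OR(1, c) collapses to constant 1, the OR gate disappears, and
# y = AND(1, d) reduces to the wire d.
print(propagate_constant(gates, "n1", True))    # {'n1': True, 'n2': True}
```
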
  • Testing with correlated test vectors

    Page(s): 254 - 262

    The authors present a new built-in self-test (BIST) strategy based on correlated test vectors produced by a weighted random test-pattern generator. It is demonstrated that the use of correlated test vectors greatly reduces the hardware complexity of the generator without causing significant degradation in the test outcome. Performance evaluation of this BIST technique is carried out quantitatively via probabilistic methods, and experimentally through deterministic fault simulation. Correlated weighted test patterns using the proposed scheme were applied to the ISCAS-85 benchmark circuits, and comparisons are made to test results obtained using such techniques as WARP and GSCAN.

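    One way correlation between consecutive vectors can arise (an illustration only, not the paper's generator): re-randomize only a few bit positions per step, drawing each re-randomized bit with its target weight, so the per-step randomness is small while long-run bit weights stay on target.

```python
import random

WIDTH, FLIPS_PER_STEP = 16, 3
weights = [0.5 if i % 2 else 0.25 for i in range(WIDTH)]   # assumed weights

def first_vector():
    return [int(random.random() < w) for w in weights]

def next_vector(prev):
    v = prev[:]                             # consecutive vectors share most
    for pos in random.sample(range(WIDTH), FLIPS_PER_STEP):   # positions
        v[pos] = int(random.random() < weights[pos])
    return v

v = first_vector()
for _ in range(5):
    print("".join(map(str, v)))
    v = next_vector(v)
```
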
  • An efficient test generation algorithm based on search state dominance

    Page(s): 246 - 253

    J. Giraldi and M.L. Bushnell (1990) proposed a test generation method, called the EST (equivalent state hashing) algorithm, which can reduce the search space for test generation by using binary decision diagram fragments to detect previously encountered search states. The authors extend the concept of search state equivalence to that of search state dominance, and propose an extended method, the DST (dominant state hashing) algorithm, based on search state dominance. The DST algorithm can prune the search space more effectively than the EST algorithm. The benefits of DST are illustrated through examples of decision trees.

  • Finite state machine testing based on growth and disappearance faults

    Page(s): 238 - 245

    An approach to generating functional test sequences for synchronous sequential nonscan circuits is presented. The method is applicable when the functional description of the circuit can be obtained in cubical form or as a personality matrix. The faults are modeled as growth and disappearance faults in the cubical description of the irredundant combinational function of the finite state machine (FSM). Considering the combinational logic alone, test vectors for these faults are efficiently derived using a cube-based method developed for programmable logic arrays. Experimental results on MCNC synthesis benchmark FSMs and some ISCAS89 sequential circuits show that the approach can efficiently obtain functional test sequences which give very high coverage of stuck faults in specific implementations. The functional test sequences are implementation independent and can be obtained even when details of a specific implementation are unavailable.

  • A divide-and-conquer approach to test generation for large synchronous sequential circuits

    Page(s): 230 - 237

    A method for generating tests for synchronous sequential circuits that does not use the complete circuit description is proposed, to allow true divide-and-conquer to be applied to test generation. The method generates primary input sequences such that, for as many of the sequences produced by the adjacent subcircuits as possible, a test sequence results for the subcircuit under consideration. Repeated application of test sequences is proposed as a means of increasing the probability of fault detection. Experimental results are presented to show that the method is applicable to subcircuits of large sequential circuits.

  • A posteriori agreement for fault-tolerant clock synchronization on broadcast networks

    Page(s): 527 - 536

    The authors present a clock synchronization algorithm, a posteriori agreement, based on a new variant of the well-known convergence nonaveraging technique. By exploiting a characteristic of broadcast networks, the algorithm largely reduces the effect of message delivery delay variance. As a consequence, the precision achieved by the algorithm is drastically improved, and accuracy preservation is close to optimal. The solution does not require dedicated hardware.

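    The broadcast property the algorithm exploits can be simulated in a few lines (all figures assumed): one physical broadcast reaches every node at nearly the same real time, so clocks adjusted on reception of the same message end up within the delivery-delay variance of each other, regardless of the absolute delay or the previous clock offsets.

```python
import random

N_NODES = 4
DELAY, JITTER = 2.000, 0.050            # mean delay and per-node spread (ms)
old_offsets = [random.uniform(-5.0, 5.0) for _ in range(N_NODES)]

t_send = 100.0                          # real time of the broadcast (ms)
arrival = [t_send + DELAY + random.uniform(0.0, JITTER) for _ in range(N_NODES)]

# Each node sets its clock to an agreed value at its own reception instant,
# so the residual disagreement is bounded by the arrival spread, not DELAY.
print(f"precision before: {max(old_offsets) - min(old_offsets):.3f} ms")
print(f"precision after:  {max(arrival) - min(arrival):.3f} ms")
```
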
  • Scheduling with rollback constraints in high-level synthesis of self-recovering ASICs

    Page(s): 519 - 526

    The authors develop software mechanisms for incorporating on-chip self-recovery, using checkpointing and rollback, during high-level synthesis. They propose an algorithm for rollback point insertion that minimizes rollback overhead. It identifies good rollback points by successively eliminating clock cycle boundaries that are either expensive or violate the recovery time constraint. Only the minimum number of rollback points is inserted. A flexible synthesis methodology is presented in which rollback point insertion can precede, follow, or be intertwined with scheduling. A novel edge-based scheduling algorithm is described that schedules edges to clock cycle boundaries, in addition to scheduling nodes to clock cycles. The system has been used to schedule flow graphs from the literature, and experimental results are presented.

  • Improved construction methods for error correcting constant weight codes

    Page(s): 510 - 517

    Two construction methods for t-error-correcting constant weight codes are developed. Both methods improve on existing codes. One construction is recursive, based on the observation that a 2t-error-correcting code can be built by concatenating two t-error-correcting codes; this reduces the codeword length for higher values of t. The other construction also builds a 2t-error-correcting code, by appending codes of smaller error-correcting capability as check symbols. Existing construction methods are used to build these check symbols so that the codeword length is optimized.

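    A hedged sketch of why the recursive construction can double the correction capability, under the assumption that codewords of the two t-error-correcting codes are paired one-to-one rather than combined freely:

```latex
% If C_1 and C_2 each have minimum distance >= 2t+1 (t-error-correcting) and
% the combined code pairs codewords by index, C = { (u_i, v_i) }, then for i != j:
%   d((u_i,v_i), (u_j,v_j)) = d(u_i,u_j) + d(v_i,v_j) >= 2(2t+1) = 4t+2 > 2(2t)+1,
% so C corrects 2t errors; constant weight is preserved since wt(u,v) = wt(u)+wt(v).
d_{\min}(C) \;\ge\; 2(2t+1) \;=\; 4t+2
```
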
  • Efficient multiple unidirectional byte error-detecting codes for computer memory systems

    Page(s): 502 - 509

    A novel method for constructing more efficient codes that can detect t unidirectional byte errors or all unidirectional bit errors (t-UbED/AUED) is presented. In this construction, the authors generalize and improve the t-UbED/AUED codes proposed by L.A. Dunning et al. (1990) in such a way that two weight syndromes need not be protected from unidirectional errors when t > 2. Thus, the construction is more efficient and can be applied to all multiple unidirectional byte error-detecting codes.

  • Single b-bit byte error correcting and double bit error detecting codes for high-speed memory systems

    Page(s): 494 - 501

    The authors propose a novel design method for a single b-bit byte error correcting and double bit error detecting code, called SbEC-DED code, suitable for high-speed memory systems using byte-organized RAM chips. This type of byte error control code is practical in that it has less redundancy and stronger error control capability than existing codes. A code design method using elements from a coset of a subfield under addition gives a practical SbEC-DED code with 64 information bits and 4-bit byte length which has the same check-bit length, 12 bits, as the single byte error correcting code. The code also has very high detection capability for random double byte errors and random triple bit errors.

  • Unordered error-correcting codes and their applications

    Page(s): 486 - 493

    The authors give efficient constructions for error-correcting unordered codes, i.e., codes in which any two codewords are at least a given minimum distance apart and are at the same time unordered. They present three constructions, together with tables that compare their efficiencies. These codes are used for detecting a predetermined number of symmetric errors and for detecting all unidirectional errors. An application to parallel asynchronous communications is included.

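    The classical route to unorderedness, which constructions of this kind build on, is a Berger-style check: append the number of 0s in the information part. If one codeword covered another, the covering word would have more 1s in the information part but a smaller check value, and bitwise coverage of the check field would then be impossible. A small self-checking sketch:

```python
def berger_encode(info_bits, check_len=3):
    zeros = info_bits.count("0")
    return info_bits + format(zeros, f"0{check_len}b")

def covers(x, y):
    # x covers y if x has a 1 in every position where y does (and x != y).
    return x != y and all(a == "1" or b == "0" for a, b in zip(x, y))

words = [berger_encode(format(i, "04b")) for i in range(16)]
assert not any(covers(a, b) for a in words for b in words)
print("all", len(words), "codewords are pairwise unordered")
```

    The paper's contribution is to add error correction on top of this unorderedness property; the sketch above shows only the unordered part.
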
  • A comparison of software defects in database management systems and operating systems

    Page(s): 475 - 484

    An analysis of software defects reported at customer sites in two large IBM database management products, DB2 and IMS, is presented. The analysis considers several different error classification systems and compares the results to those of an earlier study of field defects in IBM's MVS operating system. The authors compare the error type, defect type, and error trigger distributions of the DB2, IMS, and MVS products; show that the error type distribution may exhibit asymptotic behavior as a function of defect type; and discuss the undefined-state errors that dominate the error type distribution.
