Fault-Tolerant Computing, 1991. FTCS-21. Digest of Papers., Twenty-First International Symposium

Date 25-27 June 1991

Displaying Results 1 - 25 of 61
  • Software defects and their impact on system availability - a study of field failures in operating systems

    Publication Year: 1991 , Page(s): 2 - 9
    Cited by:  Papers (88)  |  Patents (6)

    Defects reported between 1986 and 1989 in the MVS operating system are studied in order to gain the insight needed to provide a clear strategy for avoiding or tolerating them. Typical defects (regular) are compared to those that corrupt a program's memory (overlay), given that overlays are considered by field services to be particularly hard to find and fix. It is shown that the impact of an overl...

  • Error/failure analysis using event logs from fault tolerant systems

    Publication Year: 1991 , Page(s): 10 - 17
    Cited by:  Papers (9)  |  Patents (3)

    A methodology for the analysis of automatically generated event logs from fault tolerant systems is presented. The methodology is illustrated using event log data from three Tandem systems. Two are experimental systems, with nonstandard hardware and software components causing accelerated stresses and failures. Errors are identified on the basis of knowledge of the architectural and operational ch...

  • Fault tolerance testing in the Advanced Automation System

    Publication Year: 1991 , Page(s): 18 - 25
    Cited by:  Papers (11)

    Fault tolerance testing of the US Federal Aviation Administration's Advanced Automation System (AAS) is discussed. The relationship to previous work is examined, and a high-level description of AAS and its fault tolerance architecture is given. The techniques and tools used to enable effective fault tolerance testing are presented. The results obtained to date from this testing effort are summariz...

  • Fault-tolerance experiments of the 'Hiten' onboard space computer

    Publication Year: 1991 , Page(s): 26 - 33
    Cited by:  Papers (8)

    An interim report on the experimental fault-tolerance verification of the onboard space computer loaded on the artificial satellite Hiten is presented. The Hiten mission and the fault-tolerance technique, stepwise negotiating voting (SNV), on which the computer is based, are described. The intentional fault injection study, field data collection method, and observed faults and results computer beh...

  • TSUNAMI: a path oriented scheme for algebraic test generation

    Publication Year: 1991 , Page(s): 36 - 43
    Cited by:  Papers (15)  |  Patents (4)

    An algorithm is presented for generating tests for single stuck line faults using a combination of algebraic processing and conventional path oriented search. Unlike conventional test generation algorithms, this algorithm uses algebraic methods to determine the complete set of input assignments that will propagate an error signal through a gate in a path to a primary output. The algorithm uses ord...

  • An architectural level test generator for a hierarchical design environment

    Publication Year: 1991 , Page(s): 44 - 51
    Cited by:  Papers (10)

    Most state-of-the-art automatic test pattern generators (ATPGs) require a detailed gate level representation for the circuits under test, information that either does not exist or may not be available to the test engineers in a hierarchical design environment. An ATPG methodology working at an architectural level is proposed to exploit the hierarchy of the design and relieve the dependence on the ...

  • Test generation for synchronous sequential circuits using multiple observation times

    Publication Year: 1991 , Page(s): 52 - 59
    Cited by:  Papers (33)

    The test generation problem for synchronous sequential circuits is considered in the case where hardware reset is not available. The observations which form the motivation for the work are given. On the basis of the observations, the use of multiple fault free responses as well as multiple time units for fault detection is suggested. Application to gate level synchronous sequential circuits is the...

  • Functional test generation for pipelined computer implementations

    Publication Year: 1991 , Page(s): 60 - 67
    Cited by:  Papers (4)  |  Patents (3)

    An implementation-dependent functional testing methodology is developed for pipelined CPU implementations. The magnitude of pipeline design errors is established through the study of the design log of a commercial computer system. A model for determining the correctness of the execution of a machine language program is developed. The basis for functional pipeline test generation, the dependency gr...

  • Design decisions for the FTM: a general purpose fault tolerant machine

    Publication Year: 1991 , Page(s): 71 - 78
    Cited by:  Papers (8)

    The main aspects of the FTM (fault tolerant machine) architecture, which has been built by combining stable transactional memory boards with processors of a standard machine, are reviewed, and the design principles are presented. The FTM design is based on GOTHIC, a fault-tolerant distributed system that relies on stable storage technology. A fast stable transactional memory (STM) board, which off...

  • The Stratus architecture

    Publication Year: 1991 , Page(s): 79 - 85
    Cited by:  Papers (12)

    An overview is given of the architecture of the Stratus fault-tolerant computer systems, which were the first to use hardware alone to provide fault tolerance in the commercial marketplace. The power subsystem, system boards, and off-board I/O interface buses are examined in some detail. Recovery scenarios and the Stratus service approach are described.

  • Signature analysis with modified linear feedback shift registers (M-LFSRs)

    Publication Year: 1991 , Page(s): 88 - 95
    Cited by:  Papers (3)  |  Patents (1)

    A signature analysis technique that uses modified linear feedback shift registers (LFSRs) is presented. It is demonstrated that the modified-LFSR-based analyzers can be designed with significantly lower aliasing probability compared to conventional LFSRs of the same size. The methodology for their design is described. Analytic expressions for their aliasing probability, hardware requirements, and ...

  • Signature analysis and test scheduling for self-testable circuits

    Publication Year: 1991 , Page(s): 96 - 103
    Cited by:  Papers (4)

    In complex circuits the test execution is usually divided into a number of subtasks, each producing a signature in a self-test register. These signatures influence one another. A model that can be used as a basis for test scheduling procedures is presented, and it is shown how test schedules can be constructed, in order to minimize the number of signatures to be evaluated. The error masking probab...

  • Bounds on signature analysis aliasing for random testing

    Publication Year: 1991 , Page(s): 104 - 111
    Cited by:  Papers (5)

    Simple bounds on the aliasing probability for serial signature analysis are presented. To motivate the study, it is shown that calculation of exact aliasing is NP-hard and that coding theory does not necessarily help. It is shown that the aliasing probability is bounded above by 2/(L+2) for test lengths L less than the period, L_c, of the signature polynomials; for test lengths L that are mul...

  • Construction and analysis of fault-secure multiprocessor schedules

    Publication Year: 1991 , Page(s): 120 - 127
    Cited by:  Papers (10)

    Issues involved in the design and analysis of fault-secure schedules for multiprocessor systems are investigated. A formal characterization of fault-security for a single fault is developed and generalized for multiple faults. The single fault characterization is used in the construction of fault-secure schedules for several classes of computation trees. The schemes produce schedules that are eith...

  • A fault-tolerant FFT processor

    Publication Year: 1991 , Page(s): 128 - 135
    Cited by:  Papers (10)

    A scheme for concurrent fault detection by recomputing and a fault-tolerant FFT processor using the scheme are proposed. An FFT processor with perfect shuffle is considered. The realization of the processor is based on a linear cellular automaton (LCA) model having the constant-weight and equidistance properties. When a fault occurs in the processor, the fault is detected concurrently and the proc...

  • Concurrent error detection and fault-tolerance in linear digital state variable systems

    Publication Year: 1991 , Page(s): 136 - 143
    Cited by:  Papers (4)

    The problem of error detection and correction (both transient and permanent) in linear digital state variable systems, a very large class of circuits used in digital signal processing and control, is considered. The case of single faulty modules (adders, multipliers, shifters, etc.) is studied, and general circuit data flow graphs (with and without fanout) that realize linear digital state variabl...

  • The IBM S/390 Sysplex Timer

    Publication Year: 1991 , Page(s): 144 - 151
    Cited by:  Patents (2)

    The IBM S/390 Sysplex Timer, a centralized fault-tolerant time reference used in maintaining time-of-day synchronism between multiple closely coupled IBM S/390's, is presented. The basic Sysplex Timer organization is quad redundant, and its packaging is duplex. A fully duplicated star interconnect topology provides redundant timer transmissions to every S/390 client system using dedicated fiber op...

  • Bridging, transition, and stuck-open faults in self-testing CMOS checkers

    Publication Year: 1991 , Page(s): 154 - 161
    Cited by:  Papers (8)

    The consequences of bridging, transition, and stuck-open faults in self-testing checkers designed only for single stuck-at faults are examined. A methodology for design that guarantees that the checkers will be self-testing in the presence of bridging, transition and stuck-open faults is established. This methodology is applied to several implementations of self-testing checkers. Simulations confi...

  • Multiple fault analysis using a fault dropping technique

    Publication Year: 1991 , Page(s): 162 - 169
    Cited by:  Papers (4)

    A method for analyzing multiple faults in gate-level combinational circuits that does not explicitly enumerate all the multiple stuck-at faults that may be present in a circuit is presented. First, a fault collapsing phase is applied to the network, so that equivalent faults are eliminated. During the analysis, frontier faults where there is at least a normal path from each faulty line to a primar...

  • VLSI implementation of a self-checking self-exercising memory system

    Publication Year: 1991 , Page(s): 170 - 177
    Cited by:  Papers (4)  |  Patents (1)

    A VLSI implementation of a design concept for a self-checking self-exercising (SCSE) memory system described by D. Rennels and S. Chau (see Proc. 16th Int. Symp. on Fault-Tolerant Computing p.358-63 (1986)) is presented. The design, which provides a way of detecting faults and correcting errors in RAMs within milliseconds while concurrently performing normal execution of programs, is reviewed. The...

  • The UCLA mirror processor: a building block for self-checking self-repairing computing nodes

    Publication Year: 1991 , Page(s): 178 - 185
    Cited by:  Papers (12)

    The design and implementation of a RISC microprocessor, called the UCLA mirror processor, which is capable of micro rollback, are reported. Two mirror processors operating in lock step achieve concurrent error detection by comparing external signals and a signature of internal signals every clock cycle. A mismatch causes both processors to roll back to the beginning of the cycle in which the error...

  • Load sharing in hypercube multicomputers in the presence of node failures

    Publication Year: 1991 , Page(s): 188 - 195
    Cited by:  Papers (3)

    Two important issues associated with load sharing (LS) in hypercube multicomputers are discussed and analysed: (i) ordering fault-free nodes as preferred receivers of overflow tasks and (ii) developing an LS mechanism to handle node failures. The authors previously (1989) proposed to order the nodes in each node's proximity into its preferred list of receivers for the purpose of LS in distributed ...

  • Cost effectiveness analysis of different fault tolerance strategies for hypercube systems

    Publication Year: 1991 , Page(s): 196 - 203
    Cited by:  Papers (3)

    A general model of a multicomputer system is presented, and an index is calculated for measuring its performance. Since the index depends on the average internode distance, recursive expressions for the evaluation of this distance are derived. A cost-effectiveness figure of merit is defined on the basis of a performability model that uses the performance index. The figure of merit is used to compa...

  • An evaluation of fault-tolerant hypercube architectures for onboard computing

    Publication Year: 1991 , Page(s): 204 - 211
    Cited by:  Papers (2)

    Four hypercube architectures that are designed to use hardware resources more efficiently and that produce computers with high throughput and high reliability are evaluated. Spare nodes in three of the architectures are configured so that the entire computer has the topology of an incomplete hypercube. Here, the nodes of an incomplete hypercube are capable of providing different levels of fault de...

  • An optimal algorithm for distributed system level diagnosis

    Publication Year: 1991 , Page(s): 214 - 221
    Cited by:  Papers (26)

    A system consisting of n identical processors connected by links in which some processors could be faulty is considered. Initially each unit knows only its own i.d. and the i.d.'s of its immediate neighbors; no unit has any global knowledge about the system. An optimal algorithm for system level diagnosis in such a system that is based on the transmission of packets by fault-free units is presente...
