Reliability, IEEE Transactions on

Issue 2 • Date June 1987

Displaying Results 1 - 25 of 43
  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • IEEE Reliability Society

    Page(s): nil1
    Freely Available from IEEE
  • [Breaker page]

    Freely Available from IEEE
  • On Fault-Tolerance and Fault-Avoidance

    Page(s): 161
    Freely Available from IEEE
  • Fault-Tolerant Computing Guest Editor's Preface

    Page(s): 162 - 163
    Freely Available from IEEE
  • Characterization of Fault Recovery through Fault Injection on FTMP

    Page(s): 164 - 170

    The statistical methods used to collect and analyze fault-recovery data directly affect the credibility of reliability estimation. To provide data on which to base the development of sampling methods and parameter estimation techniques, pin-level fault-injection was conducted on the FTMP computer. Detection time was chosen for statistical analysis because it accounted for most of the variation in total recovery time. Stuck-at-zero, stuck-at-one, and inverted faults were injected on each of six pins, yielding 18 data sets. The data sets fell into groups of detection behavior; however, none of the factors that were varied in the experiment (fault type, pin, chip, or board) accounted for the groupings. While no single distribution was shown to be the best fit to all the data sets, of greater importance is that the exponential distribution was a bad fit to all data sets. This refutes a common assumption of reliability modeling that detection times are exponentially distributed. These results suggest that stratified random sampling methods and statistically robust parameter estimation techniques are required to characterize fault detection time. Further experimentation is planned to discover the sources of the variation in detection time.
  • Correlated Failures in Fault-Tolerant Computers

    Page(s): 171 - 175

    In two repairable ground-based fault-tolerant computer systems in which constraints on switchover time permitted manual switching as a back-up, correlated failures were an important cause of system outage. In one of the systems a distinction could be made between outages that occurred when one computer was undergoing scheduled maintenance and outages that occurred while one computer was being repaired. The failure rate of the active computer was at least four times higher in the latter case. Several possible causes are described but could not be confirmed from the available data. In some situations, correlated failures call for a reliability model different from the commonly described models for imperfect coverage.
  • Analysis of Typical Fault-Tolerant Architectures using HARP

    Page(s): 176 - 185

    HARP (the Hybrid Automated Reliability Predictor) is a software package that implements advanced reliability modeling techniques. We present an overview of some of the problems that arise in modeling highly reliable fault-tolerant systems; the overview is loosely divided into model construction and model solution problems. We then describe the HARP approach to these difficulties, which is facilitated by a technique called behavioral decomposition. The bulk of this paper presents examples of the dependability evaluation of some typical fault-tolerant systems, including a local-area network, two well-known fault-tolerant computer systems (C.mmp and SIFT), and an example of a flight control system. HARP has been used to solve very large models. A system consisting of 20 components distributed among 7 stages produced a Markov chain with 24 533 states and over 335 000 transitions (without coverage). Depending on the system used to run this example, the run took anywhere from 4 to 8 hours. HARP is undergoing beta testing at approximately 20 sites. It is written in standard FORTRAN 77, consists of nearly 30 000 lines of code and comments, and has been tested under several operating systems. The graphics interface (written in C) runs on an IBM PC AT, and produces text files that can be used to solve the system on the PC (for very small systems), or can be uploaded to a larger machine. HARP is accompanied by an Introduction and Guide for Users. For information on obtaining a copy of HARP, contact one of the authors.
  • Reliability Modeling Using SHARPE

    Page(s): 186 - 193

    Combinatorial models such as fault trees and reliability block diagrams are efficient for model specification and often efficient in their evaluation. But it is difficult, if not impossible, to allow for dependencies (such as repair dependency and near-coincident-fault type dependency), transient and intermittent faults, standby systems with warm spares, and so on. Markov models can capture such important system behavior, but the size of a Markov model can grow exponentially with the number of components in the system. This paper presents an approach for avoiding the large state space problem. The approach uses a hierarchical modeling technique for analyzing complex reliability models. It allows the flexibility of Markov models where necessary and retains the efficiency of combinatorial solution where possible. Based on this approach a computer program called SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) has been written. The hierarchical modeling technique provides a very flexible mechanism for using decomposition and aggregation to model large systems; it allows for both combinatorial and Markov or semi-Markov submodels, and can analyze each model to produce a distribution function. The choice of the number of levels of models and the model types at each level is left up to the modeler. Component distribution functions can be any exponential polynomial whose range is between zero and one. Examples show how combinations of models can be used to evaluate the reliability and availability of large systems using SHARPE.
  • Workshop: R&M in Computer-Aided Engineering

    Page(s): 193
    Freely Available from IEEE
  • Calculation of Coverage Parameter

    Page(s): 194 - 198

    Programs for calculating the reliability of fault-tolerant systems do not explicitly take into account the effect of faults in the hardware recovery mechanism. This paper shows via an example how to incorporate these failures into the fault-handling (coverage) model of CARE III. A simple fault-tolerant system is described. The required coverage parameters are determined and the reliability is calculated using the models in CARE III.
  • Free proceedings

    Page(s): 198

    Members, and only members, of the Reliability Society of IEEE and of the Electronics Division of ASQC can receive the following publications free of extra charge. Just write to the place indicated for that group and publication; you MUST state that YOU are a member of the group to which you are writing. Quantities are limited, and are available (ONLY to the above members) on a first-come first-served basis. If you are not a member of either group and would like to join, see the inside front and rear covers for more information on the two groups.
  • A Multiple-Disk System for Both Fault Tolerance and Improved Performance

    Page(s): 199 - 201

    Many applications of computer disk systems require extremely stringent levels of reliability, typically achieved through some form of redundancy. This redundancy usually greatly increases the cost of the system. For a certain form of redundancy, the extra expense can be justified in that, not only is system reliability improved, but system performance is improved as well.
  • Reliability of Systems with Limited Repairs

    Page(s): 202 - 207

    Reliability is the probability that a system functions according to specifications over a given period of time. During this period, system specifications may allow failures and repairs to occur. This paper considers systems with specifications which limit the repair process. Such systems place a limitation on either the repair duration or the number of repairs. For example, a system controlling a real-time process may go down, be repaired, and continue proper control as long as the repair duration does not exceed a specified bound. Otherwise, the system fails. We model and analyze systems with three different types of limited repairs: 1) Bounded repair time, 2) Bounded cumulative repair time, and 3) Bounded number of repairs. Examples of such models exist in real-time process control, shock models, transaction processing, and maintenance models. For each of the three types of systems with limited repairs, we derive the distributions and the mean values of the system lifetime, the cumulative operational time, and the largest continuous operational time before a complete system failure. We also consider the execution of a task on such systems. The task is preempted upon the occurrence of a failure, and is resumed or repeated after repair. The probability of completion of a task with a given work requirement in the three limited downtime scenarios is derived. We study the effect of preemptive-resume versus preemptive-repeat failures on the probability of task completion.
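    As a rough illustration of the first case in the abstract above (bounded repair time), the sketch below computes the mean cumulative operational time under assumptions that are not taken from the paper: failures arrive at a constant rate `lam`, repair times are exponential with rate `mu`, and the system fails permanently the first time a repair runs longer than the bound `tau`. The function name and parameters are hypothetical, and the paper's own model and distributions may differ.

```python
import math


def mean_operational_time_bounded_repair(lam: float, mu: float, tau: float) -> float:
    """Mean cumulative up time before a repair exceeds the bound tau.

    Assumptions (illustrative only, not from the paper):
      - up periods are exponential with failure rate lam
      - repair durations are exponential with rate mu
      - the system fails for good when a single repair lasts longer than tau
    """
    if lam <= 0 or mu <= 0 or tau < 0:
        raise ValueError("lam and mu must be positive, tau non-negative")
    # Probability that one repair finishes within the bound.
    q = 1.0 - math.exp(-mu * tau)
    # The number of up periods is 1 + (successful repairs), which is
    # geometric with mean 1 / (1 - q).
    expected_up_periods = 1.0 / (1.0 - q)
    # Each up period lasts 1 / lam on average.
    return expected_up_periods / lam
```

    With `tau = 0` no repair can succeed, so the result reduces to the mean time to first failure, `1 / lam`.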
  • Correspondence Items

    Page(s): 207
    Freely Available from IEEE
  • Effect of Maintenance on the Dependability and Performance of Multiprocessor Systems

    Page(s): 208 - 215

    This paper presents analytic models for dependability and performance evaluation of multiprocessor systems with both on-line and off-line maintenance. Markov models are developed to compute the system reliability and performance availability incorporating the reliability of the maintenance processor. The maintenance processor failure is considered separately in order to emphasize its effect on system performance and dependability. The reliability of the maintenance processor cannot be ignored for degradable multiprocessors. Probabilistic models are presented to compute the system downtime and the service cost for three off-line maintenance policies: scheduled maintenance (SM), unscheduled maintenance (UM), and scheduled & unscheduled maintenance (SUM). The SUM policy, which combines both SM and UM, can be used to give a compromise between cost and downtime.
  • On Optimal Redundancy of Multivalue-Output Systems

    Page(s): 216 - 221

    This paper considers improving the reliability of multivalue-output systems by the use of n-redundant systems, in which n copies of the system are used redundantly and the output is determined from the outputs of those copies by the voter. A k-out-of-n redundant system minimizes the mean loss caused by the occurrence of output errors under the condition that the voter can be composed of only two kinds of operators, logical sum and logical product. The optimal k depends on the probability and loss matrices, but it can be specified in some special cases. The mean loss of multivalue-output systems with multichannels can be minimized by adopting k-out-of-n redundancy for each channel. The results provide a powerful guide to the improvement of fail-safe characteristics of many systems and the design of fault-tolerant systems.
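    For readers unfamiliar with k-out-of-n redundancy, the sketch below shows the standard textbook calculation for the simpler binary case: the probability that at least k of n independent, identical units work. It does not reproduce the paper's mean-loss minimization over probability and loss matrices; the function name and the independence assumption are illustrative.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n units work.

    Assumes the n units are independent and each works with probability r
    (a simplifying assumption, not taken from the paper).
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")
    # Sum the binomial probabilities of exactly i working units, i = k..n.
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))
```

    Setting `k = 1` gives a purely parallel arrangement and `k = n` a purely series one, which is why the choice of k trades off different kinds of output error.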
  • Seminar Proceedings

    Page(s): 221
  • Endurance of EEPROMs with On-Chip Error Correction

    Page(s): 222 - 223

    This paper presents an endurance model for EEPROMs utilizing an on-chip error-correction code (ECC). This is necessary to determine the effect that ECC schemes have upon endurance (and therefore, reliability) of EEPROMs. EEPROM technology is briefly discussed.
  • Fault-Tolerant ICs: The Reliability of TMR Yield-Enhanced ICs

    Page(s): 224 - 226

    The use of triple modular redundancy (TMR) for reliability enhancement is well known. This paper presents a simple method for predicting the reliability of integrated circuits (ICs) which use TMR for yield enhancement. A simple yield-model is included as it is necessary to factor in the effect of consumption of redundancy paths due to wafer fabrication defects. TMR implementation is briefly discussed as well.
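    As background for the abstract above, the classic TMR reliability expression can be sketched as follows. This is the textbook formula for three independent, identical modules with a perfect majority voter, not the paper's yield-enhanced model, which additionally accounts for redundancy consumed by fabrication defects. The function name is hypothetical.

```python
def tmr_reliability(r: float) -> float:
    """Reliability of a TMR arrangement with a perfect voter.

    Assumes three independent modules, each with reliability r; the system
    works when at least two modules work:
        R_TMR = 3 r^2 (1 - r) + r^3 = 3 r^2 - 2 r^3
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability")
    return 3.0 * r**2 - 2.0 * r**3
```

    Under these assumptions TMR improves on a single module only when `r` exceeds 0.5; below that point the triplicated system is less reliable than one module alone.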
  • Practical Papers

    Page(s): 226
    Freely Available from IEEE
  • Fault-Tolerant System Using 3-Value Logic Circuits

    Page(s): 227 - 231

    A ternary decision circuit (TDC) implemented in CMOS technology is proposed. It can be used in a duplex binary fault-tolerant system to replace both the matcher and the switch circuit. The resultant system is simpler than the conventional one. The reliable design of the ternary decision circuit is discussed in detail. A duplex 2-of-3-value fault-tolerant system can be formed by two 2-of-3-value processors and a TDC. This system is more powerful than a duplex binary system since it can provide an automatic error-correcting function for certain faults. All single faults can be divided into self-checked faults and secure faults. For any self-checked fault, the TDC is self-testing, strongly fault secure, and totally self-checking. For any secure fault, the TDC is strongly fault secure.
  • Annual Reliability & Maintainability Symposium Proceedings Price List for 1987

    Page(s): 231
    Freely Available from IEEE
  • From the Editor

    Page(s): 232
    Freely Available from IEEE
  • Our New Associate Editors

    Page(s): 233
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Reliability is concerned with the problems involved in attaining reliability, maintaining it through the life of the system or device, and measuring it.

Meet Our Editors

Editor-in-Chief
Way Kuo
City University of Hong Kong