IEEE Transactions on Reliability

Issue 1 • March 2002

Displaying Results 1 - 16 of 16
  • Editorial - fault-trees and cause-consequence charts

    Publication Year: 2002 , Page(s): 1
  • Editorial - belling the cat, revisited

    Publication Year: 2002 , Page(s): 2
  • Analysis of stratified testing for multichip module systems

    Publication Year: 2002 , Page(s): 100 - 110
    Cited by:  Papers (2)

    A stratified technique is proposed for testing multichip module systems. Stratification in multichip modules, arising from the different nature and procurement of the chips, is exploited to achieve a high quality-level while saving a significant number of tests during assembly. Unlike conventional random testing, the proposed approach (referred to as lowest yield-stratum first-testing) takes into account the uneven known-good-yield. In this approach, the effect of the uneven known-good-yield between strata is analyzed with respect to the variance of the known-good-yield and the sample size. The lowest yield-stratum first-testing approach significantly outperforms conventional random testing and random stratified testing. By greedily testing first the chips in the stratum with the lowest known-good-yield, the method is competitive even with conventional exhaustive testing, at a very small loss in quality-level. A Markov-chain model is developed to analyze these testing approaches under the assumption of physically independent failure of chips in multichip module systems.

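    The ordering idea described above lends itself to a short illustration. The sketch below is only a hypothetical rendition of "lowest yield-stratum first" test allocation, not the paper's analysis: the strata, known-good-yield values, test budget, and the expected-escape count used as a crude quality proxy are all assumptions.

      # Hypothetical sketch: allocate a fixed test budget to strata in ascending
      # order of known-good-yield ("lowest yield-stratum first").
      def lowest_yield_stratum_first(strata, test_budget):
          """strata: list of (name, known_good_yield, num_chips); returns tests per stratum."""
          allocation = {name: 0 for name, _, _ in strata}
          remaining = test_budget
          for name, kgy, n in sorted(strata, key=lambda s: s[1]):  # lowest yield first
              tested = min(n, remaining)
              allocation[name] = tested
              remaining -= tested
              if remaining == 0:
                  break
          return allocation

      def expected_escapes(strata, allocation):
          """Expected number of defective chips left untested (a crude quality proxy)."""
          return sum((n - allocation[name]) * (1.0 - kgy) for name, kgy, n in strata)

      if __name__ == "__main__":
          strata = [("memory", 0.90, 40), ("logic", 0.99, 40), ("analog", 0.95, 20)]
          alloc = lowest_yield_stratum_first(strata, test_budget=50)
          print(alloc, expected_escapes(strata, alloc))
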
  • Distinguishing between lognormal and Weibull distributions [time-to-failure data]

    Publication Year: 2002 , Page(s): 32 - 38
    Cited by:  Papers (2)  |  Patents (1)

    A previous method for deciding if a set of time-to-fail data follows a lognormal distribution or a Weibull distribution is expanded upon. Pearson's s-correlation coefficient is calculated for lognormal and Weibull probability plots of the time-to-fail data. The test statistic is the ratio of the two s-correlation coefficients. When "standardized", the lognormal and Weibull variables map into 1 of 2 gamma distributions with no dependence on the shape or scaling factors, confirming earlier observations. Using a set of Monte Carlo simulations, the test statistic was found to be s-normally distributed to good approximation. Formulas for estimating the mean and standard deviation of the test statistic were derived, allowing for an estimate of the probability of hypothesis test errors. As anticipated, the test capability increases with increasing sample size, but only if a substantial fraction of the parts actually fail. If less than 10% of the parts are stressed to failure, then it is almost impossible to distinguish between lognormal and Weibull distributions. If all parts are stressed to failure, the probability of making a correct choice is fair for sample sizes as small as 10, and becomes quite good if the sample size is at least 50. The statistical technique for distinguishing lognormal from Weibull distributions is presented. Its theoretical foundation is given at a qualitative level, and the range of useful application is explored. An approximate form for the distribution of the test statistic is inferred from Monte Carlo simulation.

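    The test statistic described above is easy to sketch: compute Pearson's correlation coefficient on a lognormal probability plot and on a Weibull probability plot of the same time-to-fail data, then take their ratio. The plotting positions, the simulated sample, and the scipy-based code below are illustrative assumptions, not the author's implementation.

      # Minimal sketch of the probability-plot correlation-ratio statistic.
      import numpy as np
      from scipy import stats

      def lognormal_vs_weibull_statistic(times):
          """Ratio of probability-plot correlation coefficients (lognormal / Weibull)."""
          t = np.sort(np.asarray(times, dtype=float))
          n = t.size
          p = (np.arange(1, n + 1) - 0.5) / n        # median-type plotting positions
          y = np.log(t)                              # log failure times
          x_lognormal = stats.norm.ppf(p)            # normal quantiles
          x_weibull = np.log(-np.log(1.0 - p))       # smallest-extreme-value quantiles
          r_lognormal = stats.pearsonr(x_lognormal, y)[0]
          r_weibull = stats.pearsonr(x_weibull, y)[0]
          return r_lognormal / r_weibull             # values above 1 favor the lognormal fit

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          sample = rng.lognormal(mean=3.0, sigma=0.5, size=50)   # all parts run to failure
          print(lognormal_vs_weibull_statistic(sample))
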
  • Parametric inference for step-stress models

    Publication Year: 2002 , Page(s): 27 - 31
    Cited by:  Papers (8)

    Applications of the additive accumulation of damages (AAD), or accelerated failure time (AFT), model and of the proportional hazards (PH) model in accelerated life testing with step-stresses are discussed. A new model that includes the AAD and PH models is proposed; it is more reasonable than the PH model and wider than the AAD model. Construction of the maximum likelihood function is discussed.

  • Effectiveness-analysis of real-time data acquisition and processing multichannel systems

    Publication Year: 2002 , Page(s): 91 - 99
    Cited by:  Papers (2)

    Consider a real-time data acquisition and processing multiserver (e.g., unmanned air vehicles and machine controllers) and multichannel (e.g., surveillance regions, communication channels, and assembly lines) system involving maintenance. In this kind of system, jobs are executed immediately upon arrival, conditional on system availability. The part of a job that is not served immediately is lost forever and cannot be processed later; thus, queuing of jobs in such systems is impossible. The effectiveness analysis of real-time systems in a multichannel environment is important. Several definitions of a performance effectiveness index for the real-time system under consideration are suggested. The real-time system, with exponentially distributed time-to-failure, maintenance, interarrival, and duration times, is treated as a Markov chain in order to compute its steady-state probabilities and performance effectiveness index via analytic and numerical methods. Some interesting analytic results are presented concerning a worst-case analysis, which is most typical in high-performance data acquisition and control real-time systems.

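    As a rough, generic illustration of the modeling step (not the paper's system), the sketch below builds a small continuous-time Markov chain from exponential failure and repair rates, solves for its steady-state probabilities, and forms a simple effectiveness index. The two-server structure, the rate values, and the weighting are assumptions.

      # Steady-state probabilities of a small CTMC and a toy effectiveness index.
      import numpy as np

      def steady_state(Q):
          """Solve pi @ Q = 0 with sum(pi) = 1 for a CTMC generator matrix Q."""
          n = Q.shape[0]
          A = np.vstack([Q.T, np.ones(n)])   # append the normalization equation
          b = np.zeros(n + 1)
          b[-1] = 1.0
          pi, *_ = np.linalg.lstsq(A, b, rcond=None)
          return pi

      if __name__ == "__main__":
          lam, mu = 0.1, 2.0                 # assumed failure and repair rates (per hour)
          # States: number of operational servers out of 2 (birth-death structure).
          Q = np.array([
              [-mu,          mu,        0.0],
              [lam, -(lam + mu),         mu],
              [0.0,     2 * lam,   -2 * lam],
          ])
          pi = steady_state(Q)               # pi[k] = P{k servers operational}
          effectiveness = 0.5 * pi[1] + 1.0 * pi[2]   # fraction of demand served
          print(pi, effectiveness)
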
  • Reliability inferences of modulated power-law process #i

    Publication Year: 2002 , Page(s): 23 - 26
    Cited by:  Papers (2)

    The PLP (power-law process) is often used to model failure data from repairable systems, when both renewal-type behavior and time trends are present. This paper considers the reliability inferences of modulated-PLP models. The maximum likelihood estimate and the uniformly minimum variance unbiased estimate of Modulated PLP #i are derived and computed numerically. Two examples are given.

  • Error detection by duplicated instructions in super-scalar processors

    Publication Year: 2002 , Page(s): 63 - 75
    Cited by:  Papers (145)  |  Patents (3)

    This paper proposes a pure software technique, "error detection by duplicated instructions" (EDDI), for detecting errors during usual system operation. Compared with other error-detection techniques that use hardware redundancy, EDDI requires no hardware modifications to add error-detection capability to the original system. EDDI duplicates instructions during compilation and uses different registers and variables for the new instructions. In particular, for faults in the code segment of memory, formulas are derived to estimate the error-detection coverage of EDDI using probabilistic methods. These formulas use statistics of the program that are collected during compilation. EDDI was applied to eight benchmark programs and the error-detection coverage was estimated. The estimates were then verified by simulation, in which a fault injector forced a bit-flip in the code segment of the executable machine code. The simulation results validated the estimated fault coverage and showed that approximately 1.5% of injected faults produced incorrect results in the eight benchmark programs with EDDI, while on average 20% of injected faults produced undetected incorrect results in the programs without EDDI. Based on the theoretical estimates and actual fault-injection experiments, EDDI can provide over 98% fault coverage without any extra hardware for error detection. This pure software technique is especially useful when designers cannot change the hardware but need dependability in the computer system. To reduce the performance overhead, EDDI schedules the instructions that are added for detecting errors such that "instruction-level parallelism" (ILP) is maximized. Performance overhead can be reduced by increasing ILP within a single super-scalar processor. The execution-time overhead in a 4-way super-scalar processor is less than the execution-time overhead in processors that can issue two instructions in one cycle.

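    EDDI itself is a compiler-level transformation; the sketch below is only a source-level analogue in Python of the duplicate-and-compare idea, with the computation repeated on a shadow copy and checked before the result is used. The function and data are illustrative.

      # Source-level analogue of instruction duplication with a comparison check.
      def checked_sum(values):
          total = 0          # primary copy
          total_dup = 0      # duplicated copy (in EDDI, a different register/variable)
          for v in values:
              total += v         # original instruction
              total_dup += v     # duplicated instruction
          if total != total_dup:     # comparison instruction: mismatch => error detected
              raise RuntimeError("EDDI-style check failed: copies disagree")
          return total

      if __name__ == "__main__":
          print(checked_sum(range(10)))   # 45 when no fault is injected
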
  • Reliability analysis of systems with operation-time management

    Publication Year: 2002 , Page(s): 39 - 48
    Cited by:  Papers (2)

    This paper considers the reliability analysis of systems whose operation time is managed with regard to the criticality of faults. Several analysis approaches are proposed for the numerical probability analysis of one such operational scenario: FOTM (flight operation time management). The equations derived for the Markov-chain-based method are novel, and their implementation in the tool SPNP (Stochastic Petri Net Package) could help in assessing the risk reduction due to FOTM. The analysis methods are quite general and can be applied to other similar situations where system-failure risk is decreased by operation-time management.

  • The EFTOS approach to dependability in embedded supercomputing

    Publication Year: 2002 , Page(s): 76 - 90
    Cited by:  Papers (4)  |  Patents (2)

    Industrial embedded supercomputing applications benefit from a systematic approach to fault tolerance. The EFTOS (embedded fault-tolerant supercomputing) framework provides a flexible and adaptable set of fault-tolerance tools from which the application developer can choose to make an embedded application on a parallel or distributed system more dependable. A high-level description (recovery language) helps the developer specify the fault-tolerance strategies of the application as a second application layer; this separates functional from fault-tolerance aspects of an application, thus shortening the development cycle and improving maintainability. The framework incorporates a backbone (to hook a set of fault-tolerance tools onto, and to coordinate the fault-tolerance actions) and a presentation layer (to monitor and test the fault-tolerance behavior). A practical implementation is described with its performance evaluation, using an industrial case study from the energy-transport area, as well as an analytic deduction of the appropriateness of fault-tolerance techniques for various application profiles.

  • A linear-time algorithm for computing K-terminal reliability on proper interval graphs

    Publication Year: 2002 , Page(s): 58 - 62
    Cited by:  Papers (3)

    Consider a probabilistic graph in which the edges are perfectly reliable, but vertices can fail with some known probabilities. The K-terminal reliability of this graph is the probability that a given set of vertices K is connected. This reliability problem is #P-complete for general graphs, and remains #P-complete for chordal graphs and comparability graphs. This paper presents a linear-time algorithm for computing K-terminal reliability on proper interval graphs. A graph G = (V, E) is a proper interval graph if there exists a mapping from V to a class of intervals I of the real line such that two vertices in G are adjacent if and only if their corresponding intervals overlap, and no interval in I properly contains another. This algorithm can be implemented in O(|V| + |E|) time.

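    The abstract does not reproduce the linear-time algorithm itself; as a point of reference, the sketch below is just a brute-force definition of K-terminal reliability with perfectly reliable edges and failing vertices (exponential in |V|, usable only for small graphs). The example graph and probabilities are assumptions.

      # Brute-force reference for K-terminal reliability with vertex failures.
      from itertools import product

      def k_terminal_reliability(vertices, edges, p_up, K):
          """Probability that all vertices of K lie in one connected component of the
          subgraph induced by the surviving vertices (edges never fail)."""
          total = 0.0
          for states in product([True, False], repeat=len(vertices)):
              up = {v for v, s in zip(vertices, states) if s}
              if not K <= up:
                  continue                     # a failed terminal cannot be connected
              prob = 1.0
              for v, s in zip(vertices, states):
                  prob *= p_up[v] if s else 1.0 - p_up[v]
              start = next(iter(K))            # BFS through surviving vertices only
              seen, frontier = {start}, [start]
              while frontier:
                  u = frontier.pop()
                  for a, b in edges:
                      w = b if a == u else a if b == u else None
                      if w in up and w not in seen:
                          seen.add(w)
                          frontier.append(w)
              if K <= seen:
                  total += prob
          return total

      if __name__ == "__main__":
          vertices = ["a", "b", "c", "d"]
          edges = [("a", "b"), ("b", "c"), ("c", "d")]   # a path is a proper interval graph
          p_up = {"a": 0.9, "b": 0.95, "c": 0.95, "d": 0.9}
          print(k_terminal_reliability(vertices, edges, p_up, K={"a", "d"}))  # 0.731025
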
  • An on-line BIST RAM architecture with self-repair capabilities

    Publication Year: 2002 , Page(s): 123 - 128
    Cited by:  Papers (25)

    The emerging field of self-repair computing is expected to have a major impact on deployable systems for space missions and defense applications, where high reliability, availability, and serviceability are needed. In this context, RAMs (random access memories) are among the most critical components. This paper proposes a built-in self-repair (BISR) approach for RAM cores. The proposed design, introducing minimal and technology-dependent overheads, can detect and repair a wide range of memory faults, including stuck-at, coupling, and address faults. The test and repair capabilities are used on-line and are completely transparent to the external user, who can use the memory without any change in the memory-access protocol. Using a fault-injection environment that can emulate the occurrence of faults inside the module, the effectiveness of the proposed architecture in terms of both fault-detection and repair capability was verified. Memories of various sizes have been considered to evaluate the area overhead introduced by the proposed architecture.

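    The proposed BISR architecture is hardware; purely to illustrate the transparent-repair idea in the abstract, the sketch below models a memory whose faulty addresses are remapped to spare words so that reads and writes keep the ordinary access protocol. The class, spare count, and injected fault are assumptions, not the paper's design.

      # Generic software model of transparent self-repair via address remapping.
      class SelfRepairingRAM:
          def __init__(self, size, spare_words=4):
              self.data = [0] * size
              self.spares = [0] * spare_words
              self.remap = {}                  # faulty address -> spare index

          def mark_faulty(self, address):
              """Called by the (modeled) built-in self-test when a fault is detected."""
              if address not in self.remap:
                  if len(self.remap) >= len(self.spares):
                      raise RuntimeError("no spare words left: unrepairable")
                  self.remap[address] = len(self.remap)

          def read(self, address):
              i = self.remap.get(address)
              return self.spares[i] if i is not None else self.data[address]

          def write(self, address, value):
              i = self.remap.get(address)
              if i is not None:
                  self.spares[i] = value
              else:
                  self.data[address] = value

      if __name__ == "__main__":
          ram = SelfRepairingRAM(size=16)
          ram.mark_faulty(5)                   # self-test flags a fault at address 5
          ram.write(5, 42)
          print(ram.read(5))                   # 42, served transparently from a spare word
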
  • A reliability test-plan for series systems with components having stochastic failure rates

    Publication Year: 2002 , Page(s): 17 - 22
    Cited by:  Papers (2)

    This paper proposes a reliability test plan for a series system, by considering the parameter λj of the exponential distribution to be a random variable having uniform distribution over [0, θj], j = 1, 2,..., n. Explicit expressions are obtained for the optimal values of the tj when the number of components in the system is 2. The general solution, albeit implicit, has also been obtained when the number of components in a given system is ⩾3. Mathematical programming is used to find the optimal solution and to illustrate it with numerical results.

  • Improved reliability-prediction and field-reliability-data analysis for field-replaceable units

    Publication Year: 2002 , Page(s): 8 - 16
    Cited by:  Papers (4)

    This paper presents a method for analyzing field-replaceable unit (FRU) field data to obtain accurate field-reliability estimates. The estimates are used to improve reliability prediction and to prioritize the FRUs that potentially need design improvement. Application of the method to a large telecommunication project reveals important differences between the age-dependent reliability estimates and the predicted constant estimate. The observed difference is used to measure the statistical bias due to special causes, and a correction term is added to obtain the improved reliability-prediction model. The results of applying the priority scheme to FRU data are encouraging and show the importance of including quality-of-information in decision-making.

  • Control-flow checking by software signatures

    Publication Year: 2002 , Page(s): 111 - 122
    Cited by:  Papers (149)  |  Patents (11)

    This paper presents a new signature monitoring technique, CFCSS (control flow checking by software signatures); CFCSS is a pure software method that checks the control flow of a program using assigned signatures. An algorithm assigns a unique signature to each node in the program graph and adds instructions for error detection. Signatures are embedded in the program at compile time using the constant field of the instructions and are compared with run-time signatures when the program is executed. Another algorithm reduces the code size and execution-time overhead caused by the checking instructions in CFCSS. A "branching fault injection experiment" was performed with benchmark programs. Without CFCSS, an average of 33.7% of the injected branching faults produced undetected incorrect outputs; however, with CFCSS, only 3.1% of branching faults produced undetected incorrect outputs. Thus it is possible to increase error-detection coverage for control-flow errors by an order of magnitude using CFCSS. The distinctive advantage of CFCSS over previous signature monitoring techniques is that CFCSS is a pure software method, i.e., it needs no dedicated hardware such as a watchdog processor for control-flow checking. A watchdog task in a multitasking environment also needs no extra hardware, but the advantage of CFCSS over a watchdog task is that CFCSS can be used even when the operating system does not support multitasking.

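    The run-time part of signature monitoring can be sketched for a simple path of basic blocks: each node carries a compile-time signature difference, and a run-time signature register is XORed with it and compared against the node's signature, so an illegal branch leaves a mismatch. The node names and signature values below are assumptions, and the paper's handling of branch-fan-in nodes (an extra adjusting signature) is omitted.

      # Toy rendition of run-time signature checking on a straight-line path.
      signatures = {"entry": 0b0001, "body": 0b0110, "exit": 0b1011}   # unique per node
      predecessor = {"body": "entry", "exit": "body"}

      # Signature differences are fixed at compile time: d = s_pred XOR s_node.
      diff = {n: signatures[predecessor[n]] ^ signatures[n] for n in predecessor}

      def run(path):
          G = signatures[path[0]]              # run-time signature register
          for node in path[1:]:
              G ^= diff[node]                  # checking instruction embedded in the node
              if G != signatures[node]:
                  raise RuntimeError(f"control-flow error detected at {node}")
          return "ok"

      if __name__ == "__main__":
          print(run(["entry", "body", "exit"]))    # legal path passes
          try:
              run(["entry", "exit"])               # illegal branch that skips "body"
          except RuntimeError as e:
              print(e)
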
  • Reliability modeling and assessment of the Star-Graph networks

    Publication Year: 2002 , Page(s): 49 - 57
    Cited by:  Papers (6)

    The reliability of the Star Graph architecture is discussed. The robustness of the Star Graph network under node failures, link failures, and combined node and link failures is shown. The degradation of the Star Graph into Substar Graphs is used as the measure of system effectiveness in the face of failures. Models are provided for each of the failure and re-mapping modes evaluated herein, and the resilience of the Star Graph to failures is emphasized. This paper defines failure of a Star Graph as the state in which no fault-free (n - 1)-substars remain operational; the intermediate states are defined by the number of (n - 1)-substars that remain operational. A powerful tool (re-mapping) is introduced by which the number of operational (n - 1)-substars can be maintained for longer periods, thus improving the overall MTTF (mean time to failure). For comparison, the results of a similar reliability analysis of the hypercube are shown. The comparisons are considered conservative due to the failure model used herein for the star graph. One might apply re-mapping to hypercubes; while it would improve the overall MTTF of hypercubes, the hypercubes would still have appreciably poorer performance than star graphs.


Aims & Scope

IEEE Transactions on Reliability is concerned with the problems involved in attaining reliability, maintaining it through the life of the system or device, and measuring it.

Meet Our Editors

Editor-in-Chief
Way Kuo
City University of Hong Kong