IEEE Transactions on Reliability

Issue 4 • Dec. 2004

  • Table of contents

    Publication Year: 2004, Page(s): c1
    PDF (35 KB)
    Freely Available from IEEE
  • IEEE Transactions on Reliability publication information

    Publication Year: 2004, Page(s): c2
    PDF (33 KB)
    Freely Available from IEEE
  • Recovery schemes for mesh arrays utilizing dedicated spares

    Publication Year: 2004, Page(s): 445 - 451
    PDF (384 KB) | HTML

    In this paper, new schemes are presented in which the spare nodes of a permanent-fault-tolerant processing array are utilized in their idling state to aid in an online transient-error recovery process. Though spares-based methods are well-known solutions to permanent-fault tolerance, the cost of these solutions and the idling spare capacity during normal operation have limited their widespread use. Manufacturers must be offered fault tolerance solutions which provide useful work at all times. We propose enhancing the utility of spares-based methods by commissioning idling spares (those spares remaining after fabrication and subsequent replacement of faulty units) to perform transient-error recovery tasks. Our scheme commissions idling spares to perform periodic on-line testing (verifying whether the system is functioning correctly) and recovery point validation during normal operation. When an error occurs, the spare performs additional testing to select recovery points. Transient-error recovery is required in harsh environments, such as high radiation, where frequent transient errors are unavoidable. In these environments, the cost of job completion can be extremely high without some form of error recovery. Successful job completion can be attained in environments frequented by error bursts by identifying reliable data through periodic on-line testing. We apply our scheme to a mesh array architecture that has applications in digital signal processing. Simulations highlight the overhead of our schemes in terms of job completion time in environments burdened with frequent transient random errors and burst errors. The proposed recovery strategies are limited to systems of regular structure. Many applications in signal and image processing require array processing in which the various nodes perform similar operations on different data sets. Therefore, it is not necessary to switch the application algorithms for the spares when they perform redundant computation in a staggered mode. While this is a significant feature, there is a small cost associated with presenting the same data to a node as well as a spare. With built-in hardware and reconfiguration switches in the fault tolerant arrays, we believe this cost will be insignificant. Extension of our work to more general systems requires consideration of many issues, including system timing and sub-unit communication and dependence; this is a problem for future research.

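    A toy simulation conveys the intended payoff. The sketch below is entirely illustrative (invented error rate, an abstracted validation step, and none of the paper's array or testing detail): a job either restarts from scratch on a transient error, or rolls back to the last recovery point validated by an idling spare.

```python
import random

# Toy comparison (invented parameters): job completion time with restarts
# from scratch vs. rollback to spare-validated recovery points.
def run(job_len, p_err, checkpoint=None, seed=0, trials=2000):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        t, done, saved = 0, 0, 0
        while done < job_len:
            t += 1
            if rng.random() < p_err:       # transient error hits this step
                done = saved               # roll back (to 0 without validation)
            else:
                done += 1
                if checkpoint and done % checkpoint == 0:
                    saved = done           # spare has validated a recovery point
        total += t
    return total / trials

print("no recovery points:      ", run(100, 0.02))
print("validated point every 10:", run(100, 0.02, checkpoint=10))
```
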
  • An optimal replacement policy for a three-state repairable system with a monotone process model

    Publication Year: 2004, Page(s): 452 - 457
    Cited by: Papers (11)
    PDF (224 KB) | HTML

    In this paper, a deteriorating simple repairable system with three states, two failure states and one working state, is studied. We assume that the system after repair cannot be "as good as new", and that the deterioration of the system is stochastic. Under these assumptions, we use a replacement policy N based on the number of failures of the system. Our aim is then to determine an optimal replacement policy N* such that the average cost rate (i.e., the long-run average cost per unit time) is minimized. An explicit expression for the average cost rate is derived, and an optimal replacement policy is then determined analytically or numerically. Furthermore, we show that the repair model for the three-state repairable system in this paper forms a general monotone process model. Finally, we present a numerical example and provide some discussion and a sensitivity analysis of the model.

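    The renewal-reward structure behind the average cost rate is easy to reproduce numerically. The sketch below assumes a geometric-process style model with invented parameters (working times shrinking as E[X_k] = lambda/a^(k-1), repair times growing as E[Y_k] = mu*b^(k-1)), not the paper's exact model, and searches for the failure count N* minimizing the long-run average cost per unit time.

```python
# Illustrative optimal-replacement search under a geometric-process style
# model (hypothetical parameters and cost structure, not the paper's).
lam, a = 100.0, 1.05    # E[X_k] = lam / a**(k-1): successive working times shrink
mu, b = 10.0, 1.10      # E[Y_k] = mu * b**(k-1): successive repair times grow
c_repair = 5.0          # cost per unit repair time
c_reward = 2.0          # reward per unit working time
C_replace = 2000.0      # cost of replacing the system at the N-th failure

def avg_cost_rate(N):
    """Renewal-reward argument: expected cycle cost / expected cycle length."""
    work = sum(lam / a ** (k - 1) for k in range(1, N + 1))
    repair = sum(mu * b ** (k - 1) for k in range(1, N))  # no repair at failure N
    cost = c_repair * repair + C_replace - c_reward * work
    return cost / (work + repair)

rates = {N: avg_cost_rate(N) for N in range(1, 51)}
N_star = min(rates, key=rates.get)
print(f"optimal policy N* = {N_star}, average cost rate = {rates[N_star]:.3f}")
```
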
  • Human-reliability analysis of cooperative redundancy to support diagnosis

    Publication Year: 2004, Page(s): 458 - 464
    Cited by: Papers (8)
    PDF (328 KB) | HTML

    This paper presents a human-reliability analysis method for cooperative redundancy in support of diagnosis. We study a particular form of redundancy in which the diagnosis function is realized jointly by interacting human and automated controllers. The proposed cooperative redundancy supports a retrospective and experiential failure-location function. It integrates an automated controller based on several elements: a list of possible system failures, a model of coherence between points of view, and operations such as focusing on or excluding components, or cancelling a previous operation. A particular dialogue interface is specified for an industrial case to join the human and automated controllers' reasoning, in order to optimize their mutual understanding when making inferences about failures. This cooperative redundancy is analyzed with a discrete human-reliability approach in order to evaluate its efficiency. The new approach is a conditional, multi-objective probabilistic method which takes into account two types of constraints: constraints based on human behavior, e.g., the time-consuming nature of human reasoning, and constraints regarding the system being diagnosed, e.g., the quality of the diagnosis. It is based on different modes of reasoning: the normal mode, the degraded mode, the failed mode, and the success mode. Both the normal and degraded modes concern the human-behavior-dependent constraints. The outputs of both modes are either the success mode, if the system-dependent constraints are satisfied, or the failed mode if not. The cooperative redundancy for diagnosis support is applied to phone-network troubleshooting, and experimental results are analyzed using the defined analysis method. The conclusions show the feasibility of applying cooperative redundancy to diagnosis support. However, even though this cooperative redundancy improves the quality of the diagnosis, future work must improve it in order to reduce the average delay in making a diagnosis.

  • A scenario-based reliability analysis approach for component-based software

    Publication Year: 2004, Page(s): 465 - 480
    Cited by: Papers (52) | Patents (1)
    PDF (608 KB) | HTML

    This paper introduces a reliability model, and a reliability analysis technique, for component-based software. The technique is named Scenario-Based Reliability Analysis (SBRA). Using scenarios of component interactions, we construct a probabilistic model named the Component-Dependency Graph (CDG). Based on the CDG, a reliability analysis algorithm is developed to analyze the reliability of the system as a function of the reliabilities of its architectural constituents. An extension of the proposed model and algorithm is also developed for distributed software systems. The proposed approach has the following benefits: 1) It can be used to analyze the impact of variations and uncertainties in the reliability of individual components, subsystems, and links between components on the overall reliability estimate of the software system. This is particularly useful when the system is built partially or fully from existing off-the-shelf components. 2) It is suitable for analyzing the reliability of distributed software systems because it incorporates link and delivery-channel reliabilities. 3) The technique can be used to identify critical components, interfaces, and subsystems, and to investigate the sensitivity of the application reliability to these elements. 4) The approach is applicable early in the development lifecycle, at the architecture level. Early detection of critical architectural elements, those that affect the overall reliability of the system the most, is useful in allocating resources in later development phases.

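    A rough illustration of the idea (a toy graph traversal, not SBRA's actual algorithm): model the component-dependency graph as a DAG whose nodes carry component reliabilities and whose edges carry transition probabilities and link reliabilities, then sum, over all scenarios from entry to exit, the product of the terms along the way. All names and numbers below are invented.

```python
# Toy component-dependency-graph traversal (illustrative; not the paper's
# exact CDG construction or SBRA algorithm). Nodes carry component
# reliabilities; edges carry (transition probability, link reliability).
comp_rel = {"A": 0.99, "B": 0.95, "C": 0.97, "D": 0.98}
edges = {  # node -> list of (successor, transition prob, link reliability)
    "A": [("B", 0.6, 0.999), ("C", 0.4, 0.995)],
    "B": [("D", 1.0, 0.999)],
    "C": [("D", 1.0, 0.990)],
    "D": [],
}

def system_reliability(node="A"):
    """Sum over scenarios: P(scenario) * product of reliabilities along it."""
    r = comp_rel[node]
    if not edges[node]:          # exit node: the scenario ends here
        return r
    return r * sum(p * link * system_reliability(nxt)
                   for nxt, p, link in edges[node])

print(f"estimated system reliability: {system_reliability():.4f}")
```
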
  • Sequential capacity determination of subnetworks in network performance analysis

    Publication Year: 2004, Page(s): 481 - 486
    Cited by: Papers (2)
    PDF (216 KB) | HTML

    In the process of performance evaluation for a stochastic network whose links are subject to failure, subnetworks are repeatedly generated to reflect various states of the network, and the capacity of each subnetwork must be determined upon generation. The capacity of a network is the maximum amount of flow which can be transmitted through it. Although there are existing algorithms for network capacity computation, computing the capacity of each subnetwork anew upon generation would involve a great deal of repetition: subnetworks are generated by adding certain links to the current one, so each current subnetwork is embedded in the new subnetworks. Recently, a number of methods have been proposed in search of a way to compute the capacity of subnetworks efficiently from the given information about minimal paths, preferably without many repetitions in the sequential computations. Most of these methods, however, either fail to give correct results in certain situations, or still compute the capacity of each subnetwork anew whenever the subnetwork is generated. In this paper, we propose a method based on the concepts of the signed minimal path, and the unilateral link, as defined in the text. Our method not only computes the capacity of each subnetwork correctly, but also eliminates the repetitive steps in the sequential computations, and thereby efficiently reduces the number of subnetworks to consider for capacity computation as well. Numerical examples are presented to illustrate the method. The drawbacks of other methods are also discussed with counterexamples.

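    The quantity being tracked is a max flow, and the incremental flavor of the problem can be mimicked with a resumable augmenting-path routine: when links are added to the current subnetwork, the existing flow remains feasible, so only new augmenting paths need to be found. The sketch below is a plain Edmonds-Karp style illustration of that reuse on a made-up network, not the paper's signed-minimal-path method.

```python
from collections import deque

# Resumable max-flow sketch (illustrative; not the paper's method).
# residual[u][v] holds remaining capacity; augmenting resumes from whatever
# flow is already in place, so adding links never restarts from zero.
def augment(residual, s, t):
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:           # BFS for an augmenting path
            u = q.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                        # trace the path back from t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)   # bottleneck capacity
        for u, v in path:
            residual[u][v] -= push
            residual.setdefault(v, {})[u] = residual.get(v, {}).get(u, 0) + push
        flow += push

residual = {"s": {"a": 3}, "a": {"t": 2}}      # initial subnetwork
total = augment(residual, "s", "t")
print("capacity:", total)                      # -> 2
residual["s"]["b"] = 4                         # links added to the subnetwork
residual["b"] = {"t": 4}
total += augment(residual, "s", "t")           # resume from the existing flow
print("capacity after adding links:", total)   # -> 6
```
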
  • Sequential diagnosis of processor array systems

    Publication Year: 2004, Page(s): 487 - 498
    Cited by: Papers (4)
    PDF (992 KB) | HTML

    We examine the diagnosis of processor array systems formed as two-dimensional arrays, with boundaries, and either four or eight neighbors for each interior processor. We employ a parallel test schedule: neighboring processors test each other, and report the results. Our diagnostic objective is to find a fault-free processor or set of processors. The system may then be sequentially diagnosed by repairing those processors tested faulty according to the identified fault-free set, or a job may be run on the identified fault-free processors. We establish an upper bound on the maximum number of faults which can be sustained without invalidating the test results under worst-case conditions. We give test schedules and diagnostic algorithms which meet the upper bound as far as the highest-order term. We compare these near-optimal diagnostic algorithms to alternative algorithms, both new and already in the literature, and against an upper-bound ideal-case algorithm, which is not necessarily practically realizable. For eight-way array systems with N processors, an ideal algorithm has diagnosability 3N^(2/3) - 2N^(1/2) plus lower-order terms; no algorithm exists which can exceed this. We give an algorithm which starts with tests on diagonally connected processors, and which achieves approximately this diagnosability, so the given algorithm is optimal to within the two most significant terms of the maximum diagnosability. Similarly, for four-way array systems with N processors, no algorithm can have diagnosability exceeding 3N^(2/3)/2^(1/3) - 2N^(1/2) plus lower-order terms. We give an algorithm which begins with tests arranged in a zigzag pattern, one consisting of pairing nodes for tests in two different directions in two consecutive test stages; this algorithm achieves diagnosability (3/2)(5/2)^(1/3)N^(2/3) - (5/4)N^(1/2) plus lower-order terms, which is about 0.85 of the upper bound due to an ideal algorithm.

  • Modular solution of dynamic multi-phase systems

    Publication Year: 2004, Page(s): 499 - 508
    Cited by: Papers (18)
    PDF (400 KB) | HTML

    Binary Decision Diagram (BDD)-based solution approaches, and Markov chain based approaches, are commonly used for the reliability analysis of multi-phase systems. These approaches either assume that every phase is static, and thus can be solved with combinatorial methods, or assume that every phase must be modeled via Markov methods. If every phase is indeed static, then the combinatorial approach is much more efficient than the Markov chain approach. But in a multi-phased system, using currently available techniques, if the failure criterion in even one phase is dynamic, then a Markov approach must be used for every phase. The problem with Markov chain based approaches is that the size of the Markov model can expand exponentially with the size of the system, and the model therefore becomes computationally intensive to solve. Two new concepts, the phase module and the module joint probability, are introduced in this paper to deal with the s-dependency among phases. We also present a new modular solution for nonrepairable dynamic multi-phase systems, which combines BDD solution techniques for static modules with Markov chain solution techniques for dynamic modules. Our modular approach divides the multi-phase system into its static and dynamic subsystems, solves them independently, and then combines the results for the solution of the entire system using the module joint probability method. A hypothetical example multi-phase system is given to demonstrate the modular approach.

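    A minimal sketch of the divide-and-solve idea, with invented numbers: a static phase is solved combinatorially while a dynamic phase (primary plus cold spare) is solved as a small Markov chain. For brevity the two modules are combined as if s-independent, which is precisely the simplification the paper's module joint probability method is designed to remove.

```python
import numpy as np
from scipy.linalg import expm

# Toy two-phase mission (invented numbers). Phase 1 is static (2-of-3,
# combinatorial); phase 2 is dynamic (primary + cold spare, a small CTMC).
p = np.exp(-1e-4 * 10.0)               # component survives a 10 h static phase
r_static = p**3 + 3 * p**2 * (1 - p)   # 2-of-3:G reliability

lam = 1e-4                             # failure rate of the active unit
Q = np.array([[-lam, lam, 0.0],        # states: primary up / spare up / failed
              [0.0, -lam, lam],
              [0.0,  0.0, 0.0]])
P = expm(Q * 20.0)                     # transient solution over a 20 h phase
r_dynamic = P[0, 0] + P[0, 1]          # still operating at the phase end

# Naive combination assuming s-independent modules (illustration only).
print(f"mission reliability ~ {r_static * r_dynamic:.6f}")
```
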
  • Dependability modeling and evaluation of multiple-phased systems using DEEM

    Publication Year: 2004, Page(s): 509 - 522
    Cited by: Papers (23)
    PDF (592 KB) | HTML

    Multiple-Phased Systems (MPS), i.e., systems whose operational life can be partitioned into a set of disjoint periods, called "phases", include several classes of systems such as Phased Mission Systems and Scheduled Maintenance Systems. Because of their deployment in critical applications, the dependability modeling and analysis of Multiple-Phased Systems is a task of primary relevance, but the phased behavior makes their analysis extremely complex. This paper describes the modeling methodology and the solution procedure implemented in DEEM, a dependability modeling and evaluation tool specifically tailored for Multiple-Phased Systems, and describes its use for the solution of representative MPS problems. DEEM relies upon Deterministic and Stochastic Petri Nets as the modeling formalism, and on Markov Regenerative Processes for the model solution. When compared to existing general-purpose tools based on similar formalisms, DEEM offers advantages on both the modeling side (sub-models neatly capture the phase-dependent behaviors of MPS) and the evaluation side (a specialized algorithm allows a considerable reduction of the solution cost and time). Thus, DEEM is able to deal with all the MPS scenarios which have been analytically treated in the literature, at a cost comparable to that of the cheapest of them, completely addressing the issues posed by the phased behavior of MPS.

  • Combined k-out-of-n:G and consecutive kc-out-of-n:G systems

    Publication Year: 2004, Page(s): 523 - 531
    Cited by: Papers (15)
    PDF (368 KB) | HTML

    Qualification tests for a system are normally carried out according to either a k-out-of-n:G scheme, or a consecutive kc-out-of-n:G structure. The reliability of a combination of the two systems is evaluated, showing its benefit over each of the individual structures. As expected, the mean time to failure of the combined system is larger than that of either of them. Generalizations of the analysis are presented for tests with multi-state results, and for dependent tests. Illustrative numerical results are presented to substantiate the theory.

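    For small n the combined structure can be checked by direct enumeration. The sketch below assumes the combined system is good when either criterion holds (consistent with the abstract's claim that the combined MTTF exceeds either individual structure's) and uses illustrative parameters.

```python
from itertools import product

n, k, kc, p = 6, 4, 3, 0.8   # illustrative sizes and component reliability

def has_run(s, length):      # any run of `length` consecutive good components?
    run = 0
    for good in s:
        run = run + 1 if good else 0
        if run >= length:
            return True
    return False

r_k = r_c = r_comb = 0.0
for s in product([0, 1], repeat=n):          # enumerate all 2^n component states
    pr = 1.0
    for good in s:
        pr *= p if good else (1 - p)
    ok_k = sum(s) >= k                       # k-out-of-n:G criterion
    ok_c = has_run(s, kc)                    # consecutive kc-out-of-n:G criterion
    r_k += pr * ok_k
    r_c += pr * ok_c
    r_comb += pr * (ok_k or ok_c)            # combined: either criterion holds

print(f"k-out-of-n: {r_k:.4f}  consecutive: {r_c:.4f}  combined: {r_comb:.4f}")
```
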
  • Telecommunication access network design with reliability constraints

    Publication Year: 2004, Page(s): 532 - 541
    Cited by: Papers (6)
    PDF (480 KB) | HTML

    In this paper, we study the design of telecommunication access networks with reliability constraints. These networks form an important part of the telecommunications infrastructure of large organizations, such as banks. Using data patterned after an actual bank network in the U.S., we formulate an optimization model for this problem which specifically takes into account the various cost and discount structures offered by telecommunication carriers. We then develop dedicated solution procedures. Starting from a cluster solution, we apply perturbation techniques developed specifically for this problem within an overall simulated annealing algorithm. We show how to make the solution procedure more efficient by implicitly determining the values of many variables. We then report the results of our computational testing for a variety of problems, comparing our solutions to a lower bound obtained using a linear programming relaxation. We show that substantial cost savings can be realized with our model and solution procedure. Finally, we discuss which types of annealing steps in the simulated annealing algorithm are important.

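    The perturb-and-anneal loop at the heart of such a procedure has a generic skeleton, shown below with a placeholder cost function and perturbation rather than the paper's access-network moves and tariff model.

```python
import math, random

# Generic simulated-annealing skeleton (the cost function and perturbation
# here are stand-ins, not the paper's problem-specific moves).
def anneal(x0, cost, perturb, t0=1.0, alpha=0.995, steps=20_000, seed=0):
    rng = random.Random(seed)
    x, cx, t = x0, cost(x0), t0
    best, cbest = x, cx
    for _ in range(steps):
        y = perturb(x, rng)
        cy = cost(y)
        # accept improvements always, uphill moves with Boltzmann probability
        if cy <= cx or rng.random() < math.exp((cx - cy) / t):
            x, cx = y, cy
            if cx < cbest:
                best, cbest = x, cx
        t *= alpha                       # geometric cooling schedule
    return best, cbest

# Toy stand-in problem: integer vector minimizing a quadratic cost.
cost = lambda v: sum((vi - 3) ** 2 for vi in v)
perturb = lambda v, rng: [vi + rng.choice((-1, 0, 1)) for vi in v]
print(anneal([0, 0, 0], cost, perturb))
```
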
  • Bounding the times to failure of 2-component systems

    Publication Year: 2004, Page(s): 542 - 550
    Cited by: Papers (2)
    PDF (424 KB) | HTML

    Characterizing the distribution of times to failure in 2-component systems is an important special case of a more general problem: finding the distribution of a function of random variables. Advances in this area are relevant to reliability as well as other fields, and influential papers on the topic have appeared in the reliability literature over a span of many years. Using failure times of 2-component systems as a vehicle, this report begins by reviewing a technique for characterizing distributions of functions of random variables when the dependency relationship between the random variables used as inputs to the function is unknown. The technique addressed is called Distribution Envelope determination (DEnv). Using this review as a foundation, an extension to DEnv is described which applies to cases where the means and variances of the input distributions are known, and partial information about dependency is available in the form of a value for correlation. Pearson correlation is used because it is the most commonly encountered correlation measure. This matters because the assumption of independence, while common, is frequently problematic; yet the opposite extreme of making no assumption about dependency may mean ignoring available information that could affect the analysis.

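    The no-information-about-dependency case has a classical backbone: Fréchet-Hoeffding bounds on the joint probability. The sketch below bounds the survival function of a 2-component series system with exponential marginals when the dependence is unknown; it illustrates the envelope idea only, not the DEnv algorithm or its correlation-constrained extension.

```python
import math

# Envelope for a series system T = min(X, Y) with known exponential marginals
# but unknown dependence: Frechet-Hoeffding bounds on P(X > t, Y > t).
lx, ly = 1 / 1000.0, 1 / 1500.0           # illustrative failure rates

def envelope(t):
    sx, sy = math.exp(-lx * t), math.exp(-ly * t)
    lower = max(0.0, sx + sy - 1.0)       # most antagonistic dependence
    upper = min(sx, sy)                   # most favorable dependence
    indep = sx * sy                       # independence, for comparison
    return lower, indep, upper

for t in (100, 500, 1000):
    lo, ind, up = envelope(t)
    print(f"t={t:5d}  P(T>t) in [{lo:.4f}, {up:.4f}]  (independence: {ind:.4f})")
```
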
  • Reliability improvement of Internet web servers through statistical process control

    Publication Year: 2004, Page(s): 551 - 556
    Cited by: Papers (1)
    PDF (224 KB) | HTML

    The proposed method, General Cross-Correlation Process Control (GCCPC), is intended for the timely detection of hacker attacks and faults, before they become critical and disrupt the normal activity of a web server. It is based on observing the correlations between the variables measured in the course of server activity: some of the stronger of these correlations undergo drastic weakening at the onset of faults. A methodology is presented for choosing the variables to be observed and the monitoring parameters. Notable features are the robustness of the method, and the low cost of installing and maintaining the system. Recommendations are given on real-world implementation of the method as a basis for an automated protection system.

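    The core signal, a normally strong correlation weakening abruptly, is easy to demonstrate: track the rolling Pearson correlation between two server variables and flag windows where it falls below a control limit. The data and the limit below are synthetic and arbitrary; choosing the variables and calibrating the limit are exactly what the paper's methodology addresses.

```python
import numpy as np

rng = np.random.default_rng(1)
n, window, limit = 600, 50, 0.5   # illustrative sizes and control limit

# Two synthetic server metrics, strongly correlated during normal operation;
# the coupling is broken at t=400 to mimic the onset of an attack or fault.
load = rng.normal(100, 10, n)
resp = 0.8 * load + rng.normal(0, 3, n)
resp[400:] = rng.normal(80, 10, n - 400)      # correlation destroyed

for t in range(window, n, window):
    r = np.corrcoef(load[t - window:t], resp[t - window:t])[0, 1]
    flag = "  <-- ALARM" if r < limit else ""
    print(f"t={t:3d}  rolling r = {r:+.2f}{flag}")
```
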
  • EWMA forecast of normal system activity for computer intrusion detection

    Publication Year: 2004, Page(s): 557 - 566
    Cited by: Papers (18) | Patents (1)
    PDF (280 KB) | HTML

    Intrusions into computer systems have caused many quality/reliability problems. Quickly detecting intrusions and the associated quality/reliability problems, so that corrective actions can be taken, is an important part of assuring the quality/reliability of computer systems. In this paper, we present and compare two methods of forecasting normal activities in computer systems for intrusion detection. One method uses the average of long-term normal activities as the forecast. The other uses the EWMA (exponentially weighted moving average) one-step-ahead forecast; a Markov chain model is used to learn and predict the normal activities feeding this EWMA forecast. A forecast of normal activities is used to detect a large deviation of the observed activities from the forecast as a possible intrusion into computer systems. A chi-square distance metric is used to measure the deviation of the observed activities from the forecast of normal activities. The two forecasting methods are tested on computer audit data of normal and intrusive activities. The results indicate that the chi-square distance measure with EWMA forecasting provides better intrusion-detection performance than with the average-based forecasting method.

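    Both ingredients are compact enough to show directly: an EWMA one-step-ahead forecast of the normal activity profile (here, a vector of audit event-type frequencies) and a chi-square distance between each observation and the forecast, with an alarm on large distances. The data, smoothing constant, and threshold are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, threshold = 0.2, 1.0     # smoothing constant and alarm threshold (invented)
normal_mix = np.array([0.5, 0.3, 0.15, 0.05])   # normal event-type frequencies

forecast = normal_mix.copy()    # EWMA state, seeded with the normal profile
for t in range(30):
    if t < 20:                  # normal activity
        obs = rng.multinomial(200, normal_mix) / 200
    else:                       # intrusive activity shifts the event mix
        obs = rng.multinomial(200, [0.2, 0.2, 0.2, 0.4]) / 200
    # chi-square distance of the observation from the one-step-ahead forecast
    x2 = np.sum((obs - forecast) ** 2 / forecast)
    if x2 > threshold:
        print(f"t={t:2d}  X^2 = {x2:.3f}  <-- possible intrusion")
    forecast = alpha * obs + (1 - alpha) * forecast   # EWMA update
```
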
  • Optimal design of k-out-of-n:G subsystems subjected to imperfect fault-coverage

    Publication Year: 2004, Page(s): 567 - 575
    Cited by: Papers (21)
    PDF (280 KB) | HTML

    Systems subjected to imperfect fault-coverage may fail even prior to the exhaustion of spares due to uncovered component failures. This paper presents optimal cost-effective design policies for k-out-of-n:G subsystems subjected to imperfect fault-coverage. It is assumed that there exists a k-out-of-n:G subsystem in a nonseries-parallel system and that, except for this subsystem, the redundancy configurations of all other subsystems are fixed. This paper also presents optimal design policies which maximize overall system reliability. As a special case, results are presented for k-out-of-n:G systems subjected to imperfect fault-coverage. Examples then demonstrate how to apply the main results of this paper to find the optimal configurations of all subsystems simultaneously. We show that the optimal n which maximizes system reliability is always less than or equal to the n which maximizes the reliability of the subsystem itself. Similarly, if the failure cost is the same, then the optimal n which minimizes the average system cost is always less than or equal to the n which minimizes the average cost of the subsystem. It is also shown that if the subsystem being analyzed is in series with the rest of the system, then the optimal n which maximizes subsystem reliability also maximizes the system reliability. The computational procedure of the proposed algorithms is illustrated through examples.

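    Under a common per-failure coverage model (each component failure is covered with probability c, and a single uncovered failure brings the subsystem down), the subsystem reliability is R(n) = sum_{i=k..n} C(n,i) p^i ((1-p)c)^(n-i), and the diminishing return from extra spares is easy to see numerically. The sketch below uses this standard model with illustrative parameters; it is not necessarily the paper's exact formulation.

```python
from math import comb

# k-out-of-n:G reliability with imperfect fault coverage: each failed
# component is handled successfully (covered) with probability c; one
# uncovered failure fails the subsystem despite remaining redundancy.
# Standard model with illustrative parameters.
def rel(n, k, p, c):
    return sum(comb(n, i) * p**i * ((1 - p) * c)**(n - i)
               for i in range(k, n + 1))

k, p, c = 3, 0.9, 0.95
for n in range(k, k + 8):
    print(f"n={n}  R={rel(n, k, p, c):.6f}")   # rises, peaks, then declines
n_star = max(range(k, k + 20), key=lambda n: rel(n, k, p, c))
print("optimal n for the subsystem:", n_star)
```
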
  • IEEE Transactions on Reliability information for authors

    Publication Year: 2004, Page(s): 576 - 577
    PDF (51 KB)
    Freely Available from IEEE
  • Have you visited lately? www.ieee.org [advertisement]

    Publication Year: 2004, Page(s): 578
    PDF (220 KB)
    Freely Available from IEEE
  • 2004 Index

    Publication Year: 2004, Page(s): 579 - 587
    Cited by: Papers (1)
    PDF (76 KB)
    Freely Available from IEEE
  • IEEE Member Digital Library [advertisement]

    Publication Year: 2004, Page(s): 588
    PDF (179 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Reliability is concerned with the problems involved in attaining reliability, maintaining it through the life of the system or device, and measuring it.


Meet Our Editors

Editor-in-Chief
Way Kuo
City University of Hong Kong