Proceedings of IEEE International Computer Performance and Dependability Symposium

4-6 Sept. 1996

  • Proceedings of IEEE International Computer Performance and Dependability Symposium

    Publication Year: 1996
    Freely Available from IEEE
  • Author index

    Publication Year: 1996
    Freely Available from IEEE
  • Using time to improve the performance of coordinated checkpointing

    Publication Year: 1996, Page(s):282 - 291
    Cited by:  Papers (15)

    This paper describes and evaluates a coordinated checkpoint protocol that uses time to eliminate several performance overheads that are present in traditional protocols. The time-based protocol does not have to exchange coordination messages, does not need to add information to the processes' messages, and only accesses stable storage when checkpoints are saved. This protocol uses a simple initial...
  • Performance and availability evaluation of NUMA architectures

    Publication Year: 1996, Page(s):271 - 280
    Cited by:  Papers (3)

    A new approximation technique is proposed for obtaining analytic estimates of the performance of computing systems exhibiting non-uniform memory access (NUMA) times, where components are subject to failure and repair. The technique uses a multi-chain mean-value analysis together with a service time scaling factor derived to correct for the non-exponential (deterministic) service times found in mos...
  • MultiKron: performance measurement instrumentation

    Publication Year: 1996
    Cited by:  Patents (2)

    The focus of our instrumentation work at NIST is to provide hardware support in obtaining performance measurement data of parallel computers, as well as uniprocessors, with tolerable perturbation to both the executing processes and the architecture on which they are executing. Tracing events and counting are two basic concepts of performance measurement; both should be controlled from within the co...
  • A non-homogeneous Markov software reliability model with imperfect repair

    Publication Year: 1996, Page(s):262 - 270
    Cited by:  Papers (6)

    This paper reviews existing non-homogeneous Poisson process (NHPP) software reliability models and their limitations, and proposes a more powerful non-homogeneous Markov model for the fault detection/removal problem. In addition, this non-homogeneous Markov model allows for the possibility of a finite time to repair a fault and for imperfections in the repair process. The proposed scheme provides ...
  • SHARPE: a modeler's toolkit

    Publication Year: 1996

    SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) is a program that supports the specification and automated solution of reliability and performance models. It contains support for fault trees, reliability block diagrams, reliability graphs, Markov and semi-Markov chains, generalized stochastic Petri nets, product-form queueing networks, and acyclic task graphs. These ...
  • Exact and approximate analysis of ATM cell loss correlation

    Publication Year: 1996, Page(s):120 - 129

    Cell loss behavior is analyzed for an ATM switch with homogeneous Markov sources. While prior work in this regard has focused mainly on measures such as the steady-state loss probability and conditional loss probability, we consider a more refined measure, namely the probability distribution of a cell loss period. Evaluation is based on a discrete-time, finite-state Markov process where cell arriv...
  • DEPEND: a simulation environment for system dependability modeling and evaluation

    Publication Year: 1996
    Cited by:  Papers (1)

    DEPEND is an evolving simulation-based environment for the evaluation of designs from functional and dependability viewpoints. DEPEND supports the VHDL hardware description language as well as the C++ programming language. In addition, DEPEND provides a graphical modeling facility to allow interactive model construction. Designs under evaluation can be functional or structural descriptions of the ...
  • On-line recovery for rediscovered software problems

    Publication Year: 1996, Page(s):78 - 87
    Cited by:  Papers (2)

    This paper discusses a method that can allow a system to avoid or recover from the exercise of certain known faults at runtime and thus make certain software upgrades unnecessary. The method uses the knowledge of the characteristic symptoms of a software fault and the appropriate recovery action for the fault to detect and recover from the future exercise of the fault. An analysis of field data sh...
  • Dependability of fault-tolerant systems - explicit modeling of the interactions between hardware and software components

    Publication Year: 1996, Page(s):252 - 261
    Cited by:  Papers (8)

    This paper addresses the dependability modeling of hardware and software fault-tolerant systems taking into account explicitly the interactions between the various components. It presents a framework for modeling these interactions based on Generalized Stochastic Petri Nets (GSPNs). The modeling approach is modular: the behavior of each component and each interaction is represented by its own GSPN...
  • QNAUT: approximately analyzing networks of PH|PH|1|K queues

    Publication Year: 1996
    Cited by:  Papers (1)

    With QNAUT we can perform the (approximate) analysis of possibly large, open networks of PH|PH|1 and PH|PH|1|K queues. Since no exact means are yet available to study such queueing networks (QNs), the approach supported by QNAUT is currently the best alternative. The starting point of our approach is the analysis of large open QNs as proposed by W. Whitt (known as QNA) in which large QNs...
  • SMART: simulation and Markovian analyzer for reliability and timing

    Publication Year: 1996
    Cited by:  Papers (12)

    SMART is a new tool for performance, reliability, availability, and performability modeling. Numerical solution algorithms are available for both continuous- and discrete-time Markov chains. Mixed-time non-Markovian models can be studied using simulation. Multiple interacting models and fixed-point iterative techniques for the decomposition and solution of complex models can be easily specified. T...
  • A structured approach to redundant disk array implementation

    Publication Year: 1996, Page(s):11 - 20
    Cited by:  Papers (3)  |  Patents (2)

    Error recovery in redundant disk arrays is typically performed in an ad hoc fashion, requiring architecture-specific code which limits extensibility and is difficult to verify. In this paper, we describe a technique for automating the execution of redundant disk array operations, including recovery from errors, independent of array architecture. Our approach employs a graphical representation of a...
  • Steady state solution of MRSPN with mixed preemption policies

    Publication Year: 1996, Page(s):106 - 115
    Cited by:  Papers (11)

    Markov Regenerative Stochastic Petri Nets (MRSPN) have been recently recognized as a valuable tool to model systems with non-exponential timed activities. The usual assumption in the implementation of such models is that at most a single non-exponential transition, with associated enabling memory policy, can be enabled in each marking. More recently, new memory policies have been studied in order ...
  • Optimal routing for distributed computing systems with data replication

    Publication Year: 1996, Page(s):42 - 51
    Cited by:  Papers (1)  |  Patents (1)

    In a Distributed Computing System (DCS), data files are often replicated as several copies and stored on different computers to improve reliability and availability. The particular program and the data files required for its execution may thus be placed on different nodes. Successfully executing the program, however, often requires the access of data files from remote machines. Thus, the program exec...
  • Conjoint simulation - a technique for the combined performance and dependability analysis of large-scale computer systems

    Publication Year: 1996, Page(s):68 - 77
    Cited by:  Papers (3)

    In this paper we propose an approach which seamlessly integrates two modeling techniques to facilitate the simulation and analysis of performance and dependability of large-scale parallel computer systems. The approach, called Conjoint Simulation, combines object-oriented, process-based simulation with Petri net modeling. The approach splits a system into a performance model and a dependability mo...
  • Synchronisation delay in hardware fault-tolerance techniques

    Publication Year: 1996, Page(s):240 - 249
    Cited by:  Papers (1)

    This work deals with analytical modelling of synchronisation delay experienced in multiprocessing systems which execute redundant software to improve dependability figures. We propose a stochastic model based on a queuing network approach of system behaviour. We derive a bounded approximation of the distribution function of synchronisation delay by applying the matrix-geometric technique to analys...
  • Performance visualization: 2-D, 3-D, and beyond

    Publication Year: 1996, Page(s):188 - 197

    During the past ten years, performance data visualization techniques have evolved from static, two-dimensional graphics to dynamic graphics and immersive virtual environments. We sketch the domain of applicability for each visualization technique using analysis of input/output behavior and WWW traffic as example problem domains. With this background, we describe experiences with virtual environmen...
  • Evaluation of integrated system-level checks for on-line error detection

    Publication Year: 1996, Page(s):292 - 301
    Cited by:  Papers (20)

    This paper evaluates the capabilities of an integrated system level error detection technique using fault and error injection. This technique is comprised of two software level mechanisms for concurrent error detection, control flow checking using assertions (CCA) and data error checking using application specific data checks. Over 300,000 faults and errors were injected and the analysis of the re...
  • ORCHESTRA: a probing and fault injection environment for testing protocol implementations

    Publication Year: 1996
    Cited by:  Papers (11)

    Ensuring that a distributed system meets its prescribed specification is a growing challenge that confronts software developers and system engineers. Meeting this challenge is particularly important for applications with strict dependability and/or timeliness constraints. We have developed a software fault injection tool, called ORCHESTRA, for testing dependability and timing properties of distrib...
  • SIMPLE: a universal tool box for event trace analysis

    Publication Year: 1996
    Cited by:  Patents (7)

    The event trace analysis system SIMPLE allows the evaluation of arbitrarily formatted event traces. SIMPLE is designed as a software package which comprises independent tools that are all based on a new kind of event trace access: the trace format is described in a trace description language (TDL) and evaluation tools access the event trace through a standardized problem-oriented event trace inter...
  • Design and simulation of a reliable token-ring protocol for realtime communications

    Publication Year: 1996, Page(s):130 - 138
    Cited by:  Papers (1)

    The key to the successful use of computer networks to support realtime communications is an adequate set of network communication protocols that ensure not only timely but also reliable message transmission. Traditional fault tolerance techniques may become invalid in a realtime environment as the latency introduced by those error recovery mechanisms may become unacceptably high. This paper presen...
  • Computation of the distribution of accumulated reward with Fluid Stochastic Petri-Nets

    Publication Year: 1996, Page(s):90 - 95
    Cited by:  Papers (6)

    We describe the recently introduced Fluid Stochastic Petri-Nets as a means of computing the distribution of the accumulated rate reward in a GSPN. In practice, it is the expected value of a reward which is computed, a quantity which is dependent solely on the solution of the underlying Markov chain. Until now, the instantaneous reward rates have been a function of the GSPN marking only, and the Ma...
  • SURF-2 - a tool for modeling and evaluation of dependability measures

    Publication Year: 1996

    SURF-2 is a software tool for evaluating system dependability. It is especially designed for an evaluation based system design approach in which multiple design solutions need to be compared from the dependability viewpoint. SURF-2 allows the user to describe the model behavior using generalized stochastic Petri nets (GSPNs) whose reachability graph is then interpreted as a Markov chain, or enter ...