Proceedings 1999 Pacific Rim International Symposium on Dependable Computing

16-17 Dec. 1999

  • Proceedings 1999 Pacific Rim International Symposium on Dependable Computing

    Publication Year: 1999
  • Index of authors

    Publication Year: 1999, Page(s): 277
  • FBD: a fault-tolerant buffering disk system for improving write performance of RAID5 systems

    Publication Year: 1999, Page(s):95 - 102
    Cited by:  Papers (1)  |  Patents (1)

    The parity calculation technique of RAID5 provides high reliability, efficient disk space usage, and good read performance for parallel-disk-array configurations. However, it requires four disk accesses for each write request. The write performance of a RAID5 is therefore poor compared with its read performance. We propose a buffering system to improve write performance while maintaining the r...
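The four disk accesses per write mentioned in the abstract above can be illustrated with a minimal sketch of the standard RAID5 read-modify-write parity update. The `Raid5Stripe` class, its access counter, and the sample blocks are hypothetical names introduced only for illustration; this is not the FBD buffering scheme the paper proposes.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid5Stripe:
    """Toy model of one stripe: several data blocks plus one parity block.

    Hypothetical illustration only; real arrays rotate parity across disks.
    """

    def __init__(self, data_blocks):
        self.data = [bytes(b) for b in data_blocks]
        self.parity = reduce(xor_blocks, self.data)
        self.accesses = 0  # counts simulated disk accesses

    def small_write(self, index: int, new_block: bytes) -> None:
        old_data = self.data[index]      # access 1: read old data
        self.accesses += 1
        old_parity = self.parity         # access 2: read old parity
        self.accesses += 1
        # new parity = old parity XOR old data XOR new data
        new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_block)
        self.data[index] = bytes(new_block)  # access 3: write new data
        self.accesses += 1
        self.parity = new_parity             # access 4: write new parity
        self.accesses += 1


stripe = Raid5Stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
stripe.small_write(1, b"\xaa\xbb")
print("disk accesses for one small write:", stripe.accesses)
```

Because the new parity is derived from the old parity and the old data, a single small write costs two reads and two writes, which is the overhead a write buffer would aim to hide.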

  • Empirical-Bayesian availability indices of safety and time critical software systems with corrective maintenance

    Publication Year: 1999, Page(s):84 - 91
    Cited by:  Papers (3)

    If the recovery or remedial time is not incorporated in the reliability of a software module in a safety and time-critical integrated system operation, then a mere reliability index based on failure characteristics is simply not adequate and realistic. In deriving the probability density function (pdf) of the software availability, empirical Bayesian procedures will be used to employ expert engine...

  • Using physical and simulated fault injection to evaluate error detection mechanisms

    Publication Year: 1999, Page(s):186 - 192
    Cited by:  Papers (7)

    Effective error detection is paramount for building highly dependable computing systems. A new methodology, based on physical and simulated fault injection, is developed for evaluating error detection mechanisms. Our approach consists of two steps. First, transient faults are physically injected at the IC pin level of a prototype server. Experiments are carried out in a three-dimensional space of ...

  • Measurement and modeling of burst packet losses in Internet end-to-end communications

    Publication Year: 1999, Page(s):260 - 267
    Cited by:  Papers (17)

    We have measured the packet loss ratio, its time dependency, and the frequency of burst packet losses in Internet end-to-end communications. To do this, we developed a tool that sends and receives UDP (User Datagram Protocol) packets. Our measurements showed that long burst losses are more likely when the packet loss ratio is high. We then examined two models for calculating the burst packet loss,...

  • Testing-resource allocation for redundant software systems

    Publication Year: 1999, Page(s):78 - 83
    Cited by:  Papers (8)

    For many safety-critical systems, redundancy is the only acceptable method to achieve high operational reliability, as individual modules can hardly be certified to have reached that level. When limited resources are available for testing a redundant software system, it is important to allocate the testing time efficiently so that the maximum reliability of the complete system is achieved. In...

  • Networked Windows NT system field failure data analysis

    Publication Year: 1999, Page(s):178 - 185
    Cited by:  Papers (28)  |  Patents (1)

    This paper presents a measurement-based dependability study of a networked Windows NT system based on field data collected from NT System Logs from 503 servers running in a production environment over a four-month period. The event logs at hand contain only system reboot information. We study individual server failures and domain behavior in order to characterize failure behavior and explore erro...

  • Reconfiguration of two-dimensional meshes embedded in hypercubes

    Publication Year: 1999, Page(s):234 - 241

    Proposes a method of reconfiguring 2D meshes embedded in hypercubes. Our reconfiguration for link failures consists of two stages. The first stage assigns the d dimensions of the hypercubes to two directions with respect to rows and columns in the mesh, so that the number of disconnected pairs with adjacent rows and columns becomes smaller. The second stage re-establishes the mesh communication by...

  • A fuzzy based approach for the design and evaluation of dependable systems using the Markov model

    Publication Year: 1999, Page(s):112 - 119
    Cited by:  Papers (1)

    Dependability is a subject of great importance in the development of critical systems and may be quantified in terms of various factors, such as reliability, maintainability and availability, whose significance may vary between different applications. A generic technique based on the Markov model is proposed in this paper, using fuzzy theory, for the reliability and safety assessment of fault-tole...

  • Performance of message logging protocols for NOWs with MPI

    Publication Year: 1999, Page(s):252 - 259

    Among the various systems developed for parallel and distributed computing, networks of workstations (NOWs) based on the Message Passing Interface (MPI) have been recognized as an efficient platform. In this paper, we implement and compare two important message logging protocols, pessimistic and optimistic, for a NOW employing MPI. An experiment reveals that the total execution time is not signifi...

  • Combining methods for the analysis of a fault-tolerant system

    Publication Year: 1999, Page(s):135 - 142
    Cited by:  Papers (2)

    This paper presents experiences gained from the verification of a large-scale real-world embedded system by means of formal methods. This industrial verification project was performed for a fault-tolerant system designed and implemented by DaimlerChrysler Aerospace for the International Space Station (ISS). The verification involved various aspects of system correctness, like deadlock and livelock a...

  • Hardware fault tolerance in arithmetic coding for data compression

    Publication Year: 1999, Page(s):70 - 77

    New fault tolerance techniques are presented for protecting a lossless compression algorithm, arithmetic coding, whose recursive nature makes it vulnerable to temporary hardware failures. The fundamental arithmetic operations are protected by low-cost residue codes, employing fault tolerance in multiplications and additions. Additional fault-tolerant design techniques are developed to protect othe...
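As a rough illustration of how a residue code can guard additions and multiplications, the sketch below checks each result against the residues of its operands. The check modulus of 3, the function names, and the `fault_mask` parameter used to simulate a transient bit flip are all assumptions made for this example; the abstract does not state which residue base or hardware structure the paper uses.

```python
MODULUS = 3  # assumed low-cost check base; not taken from the paper


def residue(x: int) -> int:
    """Residue of x with respect to the check modulus."""
    return x % MODULUS


def checked_add(a: int, b: int, fault_mask: int = 0):
    """Add a and b; fault_mask XORs bits into the result to simulate a fault.

    Returns (result, ok), where ok is False when the residue check fails.
    """
    result = (a + b) ^ fault_mask
    expected = (residue(a) + residue(b)) % MODULUS
    return result, residue(result) == expected


def checked_mul(a: int, b: int, fault_mask: int = 0):
    """Multiply a and b with the same residue check as checked_add."""
    result = (a * b) ^ fault_mask
    expected = (residue(a) * residue(b)) % MODULUS
    return result, residue(result) == expected


print(checked_add(1234, 5678))                # fault-free addition
print(checked_add(1234, 5678, fault_mask=1))  # one flipped bit in the sum
print(checked_mul(12, 34, fault_mask=4))      # one flipped bit in the product
```

A single flipped bit changes the result by a power of two, and no power of two is divisible by 3, so a mod-3 check catches every single-bit error in the result. Errors whose magnitude is a multiple of 3 would pass unnoticed.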

  • A simulated fault injection tool for dependable VoD application design

    Publication Year: 1999, Page(s):170 - 177

    This work presents a simulation-based tool for dependability-oriented design of Video on Demand (VoD) applications. The tool is organized in a layered architecture, so that simulation models can be built and detailed according to a hierarchical and modular approach. The higher layer, namely the Application Level, provides a variety of objects to rapidly model fundamental components typically found...

  • Fault-tolerant routing algorithms based on optimal path matrices

    Publication Year: 1999, Page(s):227 - 233

    Presents a new concept - optimal path matrices (OPMs) - for fault-tolerant routing on hypercube multicomputers. OPMs stored on each node of a hypercube hold the fault information and indicate whether there is an optimal path from the node to a destination. Two fault-tolerant routing algorithms based on OPMs are proposed in order to maintain the matrices and to route messages from sources to destin...

  • An automatic testing and diagnosis for FPGAs

    Publication Year: 1999, Page(s):45 - 52
    Cited by:  Papers (3)

    This paper presents a new design for testing and diagnosing SRAM-based field-programmable gate arrays (FPGAs). By slightly modifying the original FPGA's SRAM memory, the new architecture permits the configuration data to be looped on a chip. Full testing and diagnosis of the FPGA are then achieved by loading typically only one carefully chosen testing configuration datum instead of loadi...

  • The effect of interconnect schemes on the dependability of a modular multi-processor system with shared resources

    Publication Year: 1999, Page(s):103 - 110

    AlliedSignal's Avionics & Lighting business unit is expanding the performance of its flight safety avionics by means of functional integration (added functionality enabled by exchanging information between traditionally stand-alone subsystems), as well as physical integration (sharing of system resources) and full dual redundancy. Major performance goals of this integrated modular architecture...

  • Cost of ensuring safety in distributed database management systems

    Publication Year: 1999, Page(s):193 - 200
    Cited by:  Patents (2)

    Generally, applications employing database management systems (DBMS) require that the integrity of the data stored in the database be preserved during normal operation as well as after crash recovery. Preserving database integrity and availability requires extra safety measures in the form of consistency checks. Increased safety measures adversely affect performance by reducing throughput an...

  • A fault-tolerant data communication setup to improve reliability and performance for Internet based distributed applications

    Publication Year: 1999, Page(s):268 - 275
    Cited by:  Papers (10)

    The proposed fault-tolerant data communication setup has two main features: a consecutive transmission scheme that improves the reliability of message transmission, and an adaptive buffer management scheme that prevents message losses due to buffer overflow. These two features together reduce message retransmissions and produce better channel reliability and system performance. Simulation data con...

  • A new placement algorithm dedicated to parallel computers: bases and application

    Publication Year: 1999, Page(s):242 - 249
    Cited by:  Patents (4)

    One way to improve reliability in parallel computers consists of adding supplementary processors and interconnections to the functional structure in order to replace faulty processors with respect to the network structure. This approach is named structural fault tolerance (SFT). Highly integrated parallel computers are one way to implement a parallel structure. The material structure is then compose...

  • Enhancing dependability via parameterized refinement

    Publication Year: 1999, Page(s):120 - 127
    Cited by:  Papers (1)

    A probabilistic extension of the refinement calculus has been successfully applied in the design of safety-critical systems. The approach is based on a firm mathematical foundation within which the reasoning about correctness and behavior of the system under construction is carried out. The framework also allows us to obtain a quantitative assessment of the attributes of system dependability. We p...

  • An architecture-based software reliability model

    Publication Year: 1999, Page(s):143 - 150
    Cited by:  Papers (34)

    We present an analytical model for estimating architecture-based software reliability, according to the reliability of each component, the operational profile, and the architecture of software. Our approach is based on Markov chain properties and architecture view to state view transformations to perform reliability analysis on heterogeneous software architectures. We demonstrate how this analytic...

  • Parity sensitive comparators

    Publication Year: 1999, Page(s):53 - 59

    Parity sensitive comparators are a new type of comparator designed to take advantage of the parity information present in most buses. Instead of simply comparing the signals carried by the buses, parity information is used to select the probably correct output in case of mismatch, thus preventing a significant percentage of errors from stopping the system. These devices verify parity for each pair...
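The selection idea described in the abstract above can be sketched in software as follows. This is only an interpretation under stated assumptions: each bus is modeled as a word plus one even-parity bit, and the function name, status strings, and tie-breaking behavior are hypothetical rather than taken from the paper's hardware design.

```python
def even_parity(word: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_sensitive_compare(a: int, parity_a: int, b: int, parity_b: int):
    """Compare two bus values and fall back on parity when they disagree.

    Returns (output, status). On a mismatch, the value whose parity checks
    out is selected as the probably correct one. If both or neither check
    out, the comparator cannot decide and reports the mismatch.
    """
    if a == b:
        return a, "match"
    a_ok = even_parity(a) == parity_a
    b_ok = even_parity(b) == parity_b
    if a_ok and not b_ok:
        return a, "selected_a"
    if b_ok and not a_ok:
        return b, "selected_b"
    return None, "unresolved"


good = 0b1011
good_parity = even_parity(good)
corrupted = good ^ 0b0001  # single bit flip on bus B; parity bit unchanged
print(parity_sensitive_compare(good, good_parity, corrupted, good_parity))
```

A plain comparator would halt on any mismatch. Here a single-bit error on one bus leaves that bus with inconsistent parity, so the other bus's value can be passed through and operation can continue.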

  • Experimental assessment of COTS DBMS robustness under transient faults

    Publication Year: 1999, Page(s):201 - 208
    Cited by:  Papers (1)

    This paper evaluates the behavior of a common off-the-shelf (COTS) database management system (DBMS) in the presence of transient faults. Database applications have traditionally been a field with fault-tolerance needs, concerning both data integrity and availability. While most of the commercially available DBMS provide support for data recovery and fault-tolerance, very limited knowledge was availab...

  • Reliable probabilistic checkpointing

    Publication Year: 1999, Page(s):153 - 160

    Recently proposed probabilistic checkpointing has one drawback, namely aliasing. When analyzed, 64-bit signatures show negligible possibility of aliasing. But in practice, the shift-XOR signature generation function used with probabilistic checkpointing shows a high aliasing rate, which limits the practicality of probabilistic checkpointing. In this paper, two enhancements are considered to make p...
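To make the gap between the analytic and the practical aliasing rate concrete, the sketch below uses a hypothetical rotate-and-XOR fold as a stand-in for a shift-XOR signature. The exact function from the probabilistic checkpointing work is not given in the abstract, so `shift_xor_signature`, the one-bit rotation, and the sample pages are assumptions chosen only to show why a linear signature can alias on structured changes.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, k: int = 1) -> int:
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def shift_xor_signature(words) -> int:
    """Hypothetical signature: rotate the accumulator, then XOR in each word."""
    sig = 0
    for w in words:
        sig = rotl64(sig) ^ (w & MASK64)
    return sig


# For an ideal, uniformly distributed 64-bit signature, two different
# blocks would collide with probability 2**-64.
ideal_alias_probability = 2.0 ** -64

# Because rotation and XOR are linear, a change in one word can be
# cancelled exactly by a related change in the next word.
page_a = [0x1111, 0x2222]
delta = 0x00FF
page_b = [page_a[0] ^ delta, page_a[1] ^ rotl64(delta)]

print(page_a != page_b)
print(shift_xor_signature(page_a) == shift_xor_signature(page_b))
```

The two pages differ, yet their signatures are equal, so a checkpointer relying on this signature alone would skip saving the modified block. Real memory updates are far from uniformly random, which is one plausible reason the observed aliasing rate can exceed the analytic estimate.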
