International Conference on Dependable Systems and Networks, 2004

28 June-1 July 2004

Papers 1-25 of 114
  • The effect of testing on reliability of fault-tolerant software

    Publication Year: 2004, Page(s):265 - 274
    Cited by:  Papers (5)

    Previous models have investigated the impact upon diversity - and hence upon the reliability of fault-tolerant software built from 'diverse' versions - of the variation in 'difficulty' of demands over the demand space. These models are essentially static, taking a single snapshot view of the system. In this paper, we consider a generalisation in which the individual versions are allowed to evolve ...

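    To make the 'difficulty' argument concrete, the sketch below assumes the static setting the abstract attributes to previous models: two versions developed independently, and a hypothetical difficulty θ(x), the probability that a randomly developed version fails on demand x, over a made-up four-demand profile. The probability that both versions fail on the same random demand is E[θ(X)²], which exceeds the naive independence estimate E[θ(X)]² whenever θ varies over the demand space. It does not model the evolving versions this paper introduces.

```python
# Hypothetical demand profile: (probability of demand x, difficulty theta(x)),
# where theta(x) is the chance that an independently developed version fails on x.
PROFILE = [
    (0.70, 0.001),
    (0.20, 0.010),
    (0.09, 0.050),
    (0.01, 0.500),
]


def single_version_pfd(profile):
    """Mean probability of failure on demand of one version: E[theta(X)]."""
    return sum(p * theta for p, theta in profile)


def both_fail_probability(profile):
    """Probability that two independently developed versions both fail
    on the same random demand: E[theta(X)^2]."""
    return sum(p * theta ** 2 for p, theta in profile)


if __name__ == "__main__":
    q = single_version_pfd(PROFILE)
    both = both_fail_probability(PROFILE)
    print(f"single version:             {q:.6f}")
    print(f"naive independence q^2:     {q * q:.8f}")
    print(f"with difficulty E[theta^2]: {both:.8f}")
```

    With these made-up numbers the joint failure probability is about 18 times the naive product, which is why variation in difficulty erodes the benefit of diversity.
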
  • A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints

    Publication Year: 2004, Page(s):347 - 356
    Cited by:  Papers (27)

    Multi-criteria scheduling problems, involving optimization of more than one criterion, are attracting growing interest. In this paper, we present a new bi-criteria scheduling heuristic for scheduling data-flow graphs of operations onto parallel heterogeneous architectures according to two criteria: first the minimization of the schedule length, and second the maximization of the system reliabili...

  • Caching-enhanced scalable reliable multicast

    Publication Year: 2004, Page(s):253 - 262

    We present the caching-enhanced scalable reliable multicast (CESRM) protocol. CESRM augments the scalable reliable multicast (SRM) protocol (S. Floyd et al., 1995 and 1997) with a caching-based expedited recovery scheme. CESRM exploits the packet loss locality occurring in IP multicast transmissions in order to expeditiously recover from losses in the manner in which recent losses were recovered. ...

  • Dependable initialization of large-scale distributed software

    Publication Year: 2004, Page(s):335 - 344
    Cited by:  Papers (1)

    Most documented efforts in fault-tolerant computing address the problem of recovering from failures that occur during normal system operation. To bring a system to a point where it can begin performing its duties first requires that the system successfully complete initialization. Large-scale distributed systems may take hours to initialize. For such systems, a key challenge is tolerating failures...

  • High throughput Byzantine fault tolerance

    Publication Year: 2004, Page(s):575 - 584
    Cited by:  Papers (23)

    This paper argues for a simple change to Byzantine fault tolerant (BFT) state machine replication libraries. Traditional BFT state machine replication techniques provide high availability and security but fail to provide high throughput. This limitation stems from the fundamental assumption of generalized state machine replication techniques that all replicas execute requests sequentially in the s...

  • Timed uniform consensus resilient to crash and timing faults

    Publication Year: 2004, Page(s):243 - 252

    Δ-timed uniform consensus is a stronger variant of traditional consensus that satisfies two additional properties: every correct process terminates its execution within a constant time Δ (Δ-timeliness), and no two processes decide differently (uniformity). In this paper, we consider the Δ-timed uniform consensus problem in the presence of ft crash process...

  • A framework for dynamic Byzantine storage

    Publication Year: 2004, Page(s):325 - 334
    Cited by:  Papers (11)  |  Patents (2)

    We present a framework for transforming several quorum-based protocols so that they can dynamically adapt their failure threshold and server count, allowing them to be reconfigured in anticipation of possible failures or to replace servers as desired. We demonstrate this transformation on the dissemination quorum protocol. The resulting system provides confirmable wait-free atomic semantics while ...

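    As a reference point for what such a reconfiguration has to adjust, the sketch below uses the standard threshold form of dissemination quorums for n servers of which at most b may be Byzantine: quorums of size ⌈(n + b + 1)/2⌉, so any two quorums share at least b + 1 servers (at least one of them correct), and a quorum of correct servers stays reachable only while n ≥ 3b + 1. The function names are hypothetical, and whether the paper's transformation uses exactly this threshold construction is an assumption.

```python
from itertools import combinations


def dissemination_quorum_size(n, b):
    """Threshold quorum size ceil((n + b + 1) / 2), in integer arithmetic."""
    return (n + b + 2) // 2


def is_valid_config(n, b):
    """Usable only if a whole quorum of correct servers survives b faults."""
    return dissemination_quorum_size(n, b) <= n - b


def check_by_enumeration(n, b):
    """Brute-force check (small n only): every pair of quorums shares
    at least b + 1 servers, and a quorum fits among the n - b correct ones."""
    q = dissemination_quorum_size(n, b)
    quorums = [set(c) for c in combinations(range(n), q)]
    intersect_ok = all(len(x & y) >= b + 1 for x in quorums for y in quorums)
    return intersect_ok and q <= n - b


if __name__ == "__main__":
    # Hypothetical reconfigurations: raising b forces both n and q to grow.
    for n, b in [(4, 1), (6, 2), (7, 2), (10, 3)]:
        print(f"n={n}, b={b}: quorum size {dissemination_quorum_size(n, b)}, "
              f"valid={is_valid_config(n, b)}")
```
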
  • Intrusion tolerance and anti-traffic analysis strategies for wireless sensor networks

    Publication Year: 2004, Page(s):637 - 646
    Cited by:  Papers (75)  |  Patents (3)

    Wireless sensor networks face acute security concerns in applications such as battlefield monitoring. A central point of failure in a sensor network is the base station, which acts as a collection point of sensor data. In this paper, we investigate two attacks that can lead to isolation or failure of the base station. In one set of attacks, the base station is isolated by blocking communication be...

  • Dependable adaptive real-time applications in wormhole-based systems

    Publication Year: 2004, Page(s):567 - 572
    Cited by:  Papers (2)

    This paper describes and discusses the work carried out in the context of the CORTEX project for the development of adaptive real-time applications in wormhole-based systems. The architecture of CORTEX relies on the existence of a timeliness wormhole, called the timely computing base (TCB), which we have described in previous papers. Here we focus on the practical demonstration of the wormhole concept...

  • Data-aware multicast

    Publication Year: 2004, Page(s):233 - 242
    Cited by:  Papers (22)  |  Patents (1)

    This paper presents a multicast algorithm for peer-to-peer dissemination of events in a distributed topic-based publish-subscribe system, where processes publish events of certain topics, organized in a hierarchy, and expect events of topics they subscribed to. Our algorithm is "data-aware" in the sense that it exploits information about process subscriptions and topic inclusion relationships to b...

  • Tolerating hard faults in microprocessor array structures

    Publication Year: 2004, Page(s):51 - 60
    Cited by:  Papers (46)  |  Patents (7)

    In this paper, we present a hardware technique, called self-repairing array structures (SRAS), for masking hard faults in microprocessor array structures, such as the reorder buffer and branch history table. SRAS masks errors that could otherwise lead to slow system recoveries. To detect row errors, every write to a row is mirrored to a dedicated "check row". We then read out both the written row ...

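    The check-row detection step can be modelled in a few lines. This is a toy software sketch, not the SRAS hardware: hard faults are modelled as stuck-at bit masks on a row, the check row itself is assumed fault-free, and comparing the read-back row against the check row is an assumption about how the truncated abstract continues. All class and attribute names are hypothetical.

```python
class ArrayWithCheckRow:
    """Toy array structure with one dedicated check row (hypothetical model).

    Rows hold fixed-width integers; a hard fault is a stuck-at-1 or
    stuck-at-0 bit mask that corrupts every read of that row.
    """

    def __init__(self, num_rows, width=32):
        self.mask = (1 << width) - 1
        self.rows = [0] * num_rows
        self.check_row = 0                 # assumed fault-free
        self.stuck_at_1 = [0] * num_rows   # bits forced to 1 on read
        self.stuck_at_0 = [0] * num_rows   # bits forced to 0 on read
        self.faulty = set()                # rows found to contain hard faults

    def _read_row(self, index):
        value = self.rows[index] | self.stuck_at_1[index]
        return value & ~self.stuck_at_0[index] & self.mask

    def write(self, index, value):
        """Write a row, mirror the value into the check row, read both back
        and compare. Returns True if no fault was detected on this write."""
        value &= self.mask
        self.rows[index] = value
        self.check_row = value
        ok = self._read_row(index) == self.check_row
        if not ok:
            self.faulty.add(index)
        return ok


if __name__ == "__main__":
    arr = ArrayWithCheckRow(num_rows=8)
    arr.stuck_at_1[3] = 0b100          # hypothetical fault: bit 2 of row 3 stuck at 1
    print(arr.write(1, 0xFF))          # healthy row
    print(arr.write(3, 0b011))         # bit 2 reads back as 1: mismatch
    print(arr.write(3, 0b111))         # data already has bit 2 set: fault hidden
    print(sorted(arr.faulty))
```

    As the last write shows, a stuck-at fault is only exposed when the written data exercises the faulty bit.
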
  • The join problem in dynamic network algorithms

    Publication Year: 2004, Page(s):315 - 324
    Cited by:  Papers (2)

    Distributed algorithms in dynamic networks often employ communication patterns whose purpose is to disseminate information among the participants. Gossiping is one form of such communication pattern. In dynamic settings, the set of participants can change substantially as new participants join, and as failures and voluntary departures remove those who have joined previously. A natural question for...

  • Discovering 1-FT routes in mobile ad hoc networks

    Publication Year: 2004, Page(s):627 - 636
    Cited by:  Papers (1)

    Transmitting messages in mobile wireless networks typically involves on-demand route discovery implemented via network-wide broadcast. Due to the dynamic nature of the network topology, the lifetime of a route is very short, so a source frequently requires a new route to an old destination. Simultaneous discovery of multiple routes can reduce the overhead due to repeated route discovery broadcasts...

  • Efficient Byzantine-tolerant erasure-coded storage

    Publication Year: 2004, Page(s):135 - 144
    Cited by:  Papers (49)  |  Patents (14)

    This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations to erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nod...

  • Why PCs are fragile and what we can do about it: a study of Windows registry problems

    Publication Year: 2004, Page(s):561 - 566
    Cited by:  Papers (6)  |  Patents (6)

    Software configuration problems are a major source of failures in computer systems. In this paper, we present a new framework for categorizing configuration problems. We apply this categorization to Windows registry-related problems obtained from various internal as well as external sources. Although infrequent, registry-related problems are difficult to diagnose and repair. Consequently, they frus...

  • A method for performance analysis of earliest-deadline-first scheduling policy

    Publication Year: 2004, Page(s):826 - 834
    Cited by:  Papers (4)

    This paper introduces an analytical method for approximating the fraction of jobs that miss their deadlines in a real-time system when the earliest-deadline-first (EDF) scheduling policy is used. Either all jobs in the system have deadlines until the beginning of service, or all have deadlines until the end of service. In the former case, EDF is known to be optimal and, in the latter case, it is optimal if pre...

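    The paper's contribution is an analytical approximation, which the sketch below does not reproduce. Instead it estimates the same quantity, the fraction of jobs that miss their deadlines, by simulating a single non-preemptive server under EDF for the "deadline until the beginning of service" case. Poisson arrivals, exponential service times and laxities, and dropping a job whose deadline passes while it waits are all assumptions made for illustration; the function names are hypothetical.

```python
import heapq
import random


def edf_missed_fraction(jobs):
    """Simulate one non-preemptive server under EDF.

    jobs: list of (arrival_time, service_time, absolute_deadline).
    Assumed deadline model: a job must *begin* service by its deadline;
    a job still waiting when its deadline passes is dropped as missed.
    """
    jobs = sorted(jobs)
    waiting = []          # min-heap keyed by absolute deadline
    clock, missed, i, n = 0.0, 0, 0, len(jobs)
    while i < n or waiting:
        if not waiting and clock < jobs[i][0]:
            clock = jobs[i][0]            # idle: jump to the next arrival
        while i < n and jobs[i][0] <= clock:
            arrival, service, deadline = jobs[i]
            heapq.heappush(waiting, (deadline, arrival, service))
            i += 1
        deadline, arrival, service = heapq.heappop(waiting)
        if deadline < clock:
            missed += 1                   # expired while waiting
            continue
        clock += service                  # serve earliest-deadline job to completion
    return missed / n if n else 0.0


def poisson_workload(num_jobs, arrival_rate, service_rate, mean_laxity, seed=0):
    """Hypothetical workload: Poisson arrivals, exponential service and laxity."""
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for _ in range(num_jobs):
        t += rng.expovariate(arrival_rate)
        service = rng.expovariate(service_rate)
        jobs.append((t, service, t + rng.expovariate(1.0 / mean_laxity)))
    return jobs


if __name__ == "__main__":
    for load in (0.5, 0.8, 0.95):
        jobs = poisson_workload(20_000, arrival_rate=load, service_rate=1.0,
                                mean_laxity=2.0, seed=1)
        print(f"load {load:.2f}: missed fraction {edf_missed_fraction(jobs):.3f}")
```

    In a small case such as jobs (0, 2, 10), (0, 1, 0.5), (1, 1, 1.5), EDF serves the deadline-1.5 arrival before the long deadline-10 job and misses nothing, whereas first-come-first-served order would let the deadline-1.5 job expire.
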
  • Exposing and eliminating vulnerabilities to denial of service attacks in secure gossip-based multicast

    Publication Year: 2004, Page(s):223 - 232
    Cited by:  Papers (4)

    We propose a framework and methodology for quantifying the effect of denial of service (DoS) attacks on a distributed system. We present a systematic study of the resistance of gossip-based multicast protocols to DoS attacks. We show that even distributed and randomized gossip-based protocols, which eliminate single points of failure, do not necessarily eliminate vulnerabilities to DoS attacks. We...

  • Fault detection and isolation techniques for quasi delay-insensitive circuits

    Publication Year: 2004, Page(s):41 - 50
    Cited by:  Papers (26)

    This paper presents a circuit fault detection and isolation technique for quasi delay-insensitive asynchronous circuits. We achieve fault isolation by a combination of physical layout and circuit techniques. The asynchronous nature of quasi delay-insensitive circuits combined with layout techniques makes the design tolerant to delay faults. Circuit techniques are used to make sections of the desig...

  • Cheap Paxos

    Publication Year: 2004, Page(s):307 - 314
    Cited by:  Papers (15)  |  Patents (12)

    Asynchronous algorithms for implementing a fault-tolerant distributed system, which can make progress despite the failure of any F processors, require 2F + 1 processors. Cheap Paxos, a variant of the Paxos algorithm, guarantees liveness under the additional assumption that the set of nonfaulty processors does not "jump around" too fast, but uses only F + 1 main processors that actually execute the...

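    The 2F + 1 figure can be checked for majority quorums: with N = 2F + 1 processors, a majority of F + 1 survives any F crashes, and any two majorities share at least one processor, so decisions cannot diverge. The sketch verifies both properties by enumeration for small F and shows that N = 2F is not enough for majorities. Counting the remaining F processors beside the F + 1 main ones as auxiliary follows from the 2F + 1 total and is stated here as an assumption, not as a description of Cheap Paxos itself.

```python
from itertools import combinations


def majority_quorums(n):
    """All subsets of size floor(n/2) + 1 of processors 0..n-1."""
    size = n // 2 + 1
    return [frozenset(c) for c in combinations(range(n), size)]


def tolerates(n, f):
    """True if some majority quorum survives every pattern of f crashes
    and every two majority quorums intersect."""
    quorums = majority_quorums(n)
    survives = all(
        any(q.isdisjoint(crashed) for q in quorums)
        for crashed in map(frozenset, combinations(range(n), f))
    )
    intersect = all(a & b for a in quorums for b in quorums)
    return survives and intersect


if __name__ == "__main__":
    for f in range(1, 4):
        n = 2 * f + 1
        print(f"F={f}: N={n}, tolerates F crashes: {tolerates(n, f)}, "
              f"tolerates F+1: {tolerates(n, f + 1)}, "
              f"with N=2F: {tolerates(2 * f, f)}, main={f + 1}, auxiliary={f}")
```
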
  • Fault diversity among off-the-shelf SQL database servers

    Publication Year: 2004, Page(s):389 - 398
    Cited by:  Papers (11)

    Fault tolerance is often the only viable way of obtaining the required system dependability from systems built out of "off-the-shelf" (OTS) products. We have studied a sample of bug reports from four off-the-shelf SQL servers so as to estimate the possible advantages of software fault tolerance - in the form of modular redundancy with diversity - in complex off-the-shelf software. We checked wheth...

  • Customizing dependability attributes for mobile service platforms

    Publication Year: 2004, Page(s):617 - 626
    Cited by:  Patents (2)

    Mobile service platforms are used to facilitate access to enterprise services such as email, product inventory, or design drawing databases by a wide range of mobile devices using a variety of access protocols. This paper presents a quality of service (QoS) architecture that allows flexible combinations of dependability attributes such as reliability, timeliness, and security to be enforced on a p...

  • A decentralized algorithm for erasure-coded virtual disks

    Publication Year: 2004, Page(s):125 - 134
    Cited by:  Papers (24)  |  Patents (3)

    A federated array of bricks is a scalable distributed storage system composed of inexpensive storage bricks. It achieves high reliability at low cost by using erasure coding across the bricks to maintain data reliability in the face of brick failures. Erasure coding generates n encoded blocks from m data blocks (n > m) and permits the data blocks to be reconstructed from any m of these encoded...

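    The m-of-n property can be demonstrated with a small Reed-Solomon-style code. This sketch treats each block as a single integer symbol modulo the prime P = 257 (a simplifying assumption; practical systems code over byte vectors, typically in GF(2^8)), views the m data symbols as values of a polynomial of degree below m at x = 1..m, and stores its values at x = 1..n. Any m stored values determine that polynomial by Lagrange interpolation, so the data survives any n - m brick failures. It is a generic construction, not necessarily the code the paper uses, and the function names are hypothetical.

```python
from itertools import combinations

P = 257  # prime modulus; every symbol is an integer in [0, P)


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through
    the (xi, yi) points, modulo P, using the Lagrange form."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total


def encode(data, n):
    """Systematic encoding: blocks 1..m equal the data, blocks m+1..n are
    redundancy. Returns {block_index: symbol} for indices 1..n."""
    m = len(data)
    if not 0 < m <= n < P or any(not 0 <= d < P for d in data):
        raise ValueError("need 0 < m <= n < P and symbols in [0, P)")
    points = list(zip(range(1, m + 1), data))
    return {x: _interpolate(points, x) for x in range(1, n + 1)}


def decode(surviving, m):
    """Rebuild the m data symbols from any m surviving {index: symbol} blocks."""
    points = list(surviving.items())[:m]
    if len(points) < m:
        raise ValueError("fewer than m blocks survive; data is lost")
    return [_interpolate(points, x) for x in range(1, m + 1)]


if __name__ == "__main__":
    data = [10, 200, 37]                    # m = 3 data symbols
    blocks = encode(data, n=5)              # n = 5 encoded blocks
    for lost in combinations(blocks, 2):    # lose any n - m = 2 bricks
        survivors = {i: s for i, s in blocks.items() if i not in lost}
        assert decode(survivors, m=3) == data
    print("recovered from every 2-brick failure pattern")
```
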
  • In advance activation of backup channels for real-time transmission

    Publication Year: 2004, Page(s):555 - 560

    Real-time transmission implies guaranteeing a given quality of service (QoS), which requires substantial network resources. Backup channels introduce the notion of availability to real-time transmission at the cost of increasing the use of network resources. However, this over-provisioning of resources is potentially wasted, since the fault rate is very low. This paper introduces a new failure detection s...

  • A framework for evaluating storage system dependability

    Publication Year: 2004, Page(s):877 - 886
    Cited by:  Papers (14)  |  Patents (10)

    Designing storage systems to provide business continuity in the face of failures requires the use of various data protection techniques, such as backup, remote mirroring, point-in-time copies and vaulting, often in concert. Predicting the dependability provided by such compositions of techniques is difficult, yet necessary for dependable system design. We present a framework for evaluating the dep...

  • Hierarchical computation of interval availability and related metrics

    Publication Year: 2004, Page(s):693 - 698
    Cited by:  Papers (6)

    As new-generation high-availability commercial computer systems incorporate deferred repair service strategies, steady-state availability metrics may no longer reflect reality. Transient solution of availability models for such systems to calculate interval availability over shorter time horizons is desirable. While many solution methods for transient analysis have been proposed, how to apply t...

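    The gap between steady-state and interval availability is easy to see in the simplest case: a single component with exponential failure rate λ and repair rate μ that starts in the up state (a two-state Markov model assumed here for illustration; the paper's hierarchical method targets much larger models). Its point availability is A(s) = μ/(λ+μ) + λ/(λ+μ)·e^(-(λ+μ)s), and averaging over [0, t] gives the closed form below, which the sketch compares with the steady-state value μ/(λ+μ) and checks by numerical integration. The rates are hypothetical.

```python
import math


def point_availability(lam, mu, s):
    """Probability the component is up at time s, given it is up at time 0."""
    total = lam + mu
    return mu / total + lam / total * math.exp(-total * s)


def expected_interval_availability(lam, mu, t):
    """Expected fraction of [0, t] spent up: (1/t) * integral of A(s) ds."""
    total = lam + mu
    return mu / total + lam / (total ** 2 * t) * (1.0 - math.exp(-total * t))


def steady_state_availability(lam, mu):
    return mu / (lam + mu)


def numeric_interval_availability(lam, mu, t, steps=100_000):
    """Midpoint-rule approximation of the same average, as a cross-check."""
    h = t / steps
    area = sum(point_availability(lam, mu, (k + 0.5) * h) for k in range(steps)) * h
    return area / t


if __name__ == "__main__":
    lam, mu = 0.001, 0.01  # hypothetical: ~1000 h between failures, ~100 h deferred repair
    print(f"steady state: {steady_state_availability(lam, mu):.4f}")
    for t in (24.0, 168.0, 8760.0):
        print(f"t = {t:>6.0f} h: closed form {expected_interval_availability(lam, mu, t):.4f}, "
              f"numeric {numeric_interval_availability(lam, mu, t):.4f}")
```

    With these rates the expected interval availability over a 24-hour horizon comes out near 0.99, well above the steady-state value of about 0.909, which is the kind of discrepancy the abstract points to.
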