Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281)

20-23 Oct. 1998

  • Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281)

    Publication Year: 1998
    Freely Available from IEEE
  • Programming the grid: component systems for distributed applications

    Publication Year: 1998

    Summary form only given. The traditional model of software design for large scale scientific problem solving is outdated. The emphasis is now on large teams that must build simulation software that integrates physical systems from multiple scientific disciplines. In addition to the problem of multi-disciplinary physics, the computational environment is now a grid of distributed resources consistin...
  • Security in mobile systems

    Publication Year: 1998, Page(s):407 - 412
    Cited by:  Papers (1)  |  Patents (35)

    Mobile computing has become popular over the last few years. Users need to have continuous access to information even when they are mobile, e.g., a doctor may need to constantly monitor a patient's health or a stock broker may need periodic information about the stock market, etc. Communication in such cases is typically over wireless links and it becomes critical to ensure secure message exchange...
  • Author index

    Publication Year: 1998, Page(s):493 - 494
    Freely Available from IEEE
  • Dependability analysis of a cache-based RAID system via fast distributed simulation

    Publication Year: 1998, Page(s):254 - 260
    Cited by:  Papers (5)

    We propose a new speculation-based, distributed simulation method for dependability analysis of complex systems in which a detailed functional simulation of a system component is essential to obtain an accurate overall result. Our target example is a networked cluster with compute nodes and a single I/O node. Accurate system dependability characterization is achieved via a combination of detailed ...
  • AQuA: an adaptive architecture that provides dependable distributed objects

    Publication Year: 1998, Page(s):245 - 253
    Cited by:  Papers (92)  |  Patents (28)

    Dependable distributed systems are difficult to build. This is particularly true if they have dependability requirements that change during the execution of an application, and are built with commercial off-the-shelf hardware. In that case, fault tolerance must be achieved using middleware software, and mechanisms must be provided to communicate the dependability requirements of a distributed appl...
  • Architecture for group communication in mobile systems

    Publication Year: 1998, Page(s):235 - 242
    Cited by:  Papers (18)  |  Patents (2)

    In mobile computing systems the network configuration changes due to node mobility. The paper identifies the issues a group communication service has to take into account in order to handle node mobility. These include the need to identify the location of a node, and the ability to cope with inaccuracies in the determination of a group membership. A multi level architecture for group communication...
  • Evolving distributed software engineering environments

    Publication Year: 1998, Page(s):151 - 157
    Cited by:  Papers (1)

    For a software engineering environment to be useful for the development process, it must provide a complete set of tools to assist the software development tasks. The tools focus on separate issues of a highly integrated problem and, in general, must be capable of assisting one another in the midst of intelligently pursuing their own goals for improving the software products. These tools must addr...
  • Fault-tolerant Total Order Multicast to asynchronous groups

    Publication Year: 1998, Page(s):228 - 234
    Cited by:  Papers (19)

    While Total Order Broadcast (or Atomic Broadcast) primitives have received a lot of attention, the paper concentrates on Total Order Multicast to Multiple Groups in the context of asynchronous distributed systems in which processes may suffer crash failures. “Multicast to Multiple Groups” means that each message is sent to a subset of the process groups composing the system, distinct m...
  • Look-ahead traffic distribution in wormhole-routed networks

    Publication Year: 1998, Page(s):318 - 323
    Cited by:  Patents (2)

    A new approach, look-ahead traffic distribution, is proposed in this paper to improve the performance of adaptive routing algorithms in wormhole routed networks. In most adaptive routing algorithms, a packet changes its forwarding direction only if its requesting channel is not available, i.e., the routing decision is based only on the traffic information (buffer availability) in adjacent nodes. T...
  • On patterns for practical fault tolerant software in Java

    Publication Year: 1998, Page(s):144 - 150
    Cited by:  Papers (2)  |  Patents (3)

    Fault tolerance is important for both sequential and distributed software, and particularly so for long-running applications. The ability to stop an application and restart it, with minimal lost work, is especially useful. If components of the application can be restarted on arbitrary hosts, so much the better. In this paper, we explore Java's potential to support fault tolerant software design. W...
  • A randomized algorithm for distributed consensus

    Publication Year: 1998, Page(s):287 - 292
    Cited by:  Patents (1)

    We describe a randomized, fully distributed algorithm for distributed consensus and evaluate its performance assuming probabilistically bounded message delay. Each node randomly contacts a few other nodes and incorporates their values into its own value. All the nodes are able to reach consensus in this manner after a few rounds. The results show that the randomized algorithm is flexible, efficien...
  • High throughput networks for petaflops computing

    Publication Year: 1998, Page(s):312 - 317
    Cited by:  Papers (2)

    The smallest networks that can connect eight thousand processing elements and memory interfaces in a petaflops cryocomputer contain hundreds of thousands of 2×2 switching nodes. We have determined circuit costs, maximal throughput and average latency for feasible multistage banyan and multidimensional pruned ring mesh networks. Each can deliver 20000 single-word packets every 30 picoseconds,...
  • ROI: an invocation mechanism for replicated objects

    Publication Year: 1998, Page(s):29 - 35
    Cited by:  Papers (1)

    The reliable object invocation mechanism provided by HIDRA for the coordinator-cohort and the passive replication models offers support to ensure that all the replicas of the object being invoked are correctly updated before such an invocation is terminated. This mechanism also ensures that if a primary or coordinator replica crashes, the client is able to reconnect to the previously initiated inv...
  • Safe and efficient active network programming

    Publication Year: 1998, Page(s):135 - 143
    Cited by:  Papers (12)

    Active networks are aimed at incorporating programmability into the network to achieve extensibility. One approach to obtaining extensibility is to download router programs into network nodes. This programmability is critical to allow multipoint distributed systems to adapt to network conditions and individual clients' needs. Although promising, this approach raises critical issues such as safety ...
  • Tolerating client and communication failures in distributed groupware systems

    Publication Year: 1998, Page(s):221 - 227
    Cited by:  Papers (2)

    If a groupware system is to be effectively used, especially over a wide area network such as the Internet, where the quality of networking and computing resources are unpredictable, it should allow clients to tolerate client, link, and server failures. In particular, clients should be able to join groups and transfer groups' current state in the presence of most client and link failures. In order ...
  • Consensus in asynchronous systems where processes can crash and recover

    Publication Year: 1998, Page(s):280 - 286
    Cited by:  Papers (19)

    The consensus problem is now well identified as being one of the most important problems encountered in the design and the construction of fault-tolerant distributed systems. This problem is defined as follows: processes have to reach a common decision, which depends on their inputs, despite failures. We consider the consensus problem in asynchronous distributed systems augmented with unreliable f...
  • An integration of the primary-shadow TMO replication scheme with a supervisor-based network surveillance scheme and its recovery time bound analysis

    Publication Year: 1998, Page(s):168 - 176
    Cited by:  Papers (2)

    The time-triggered message-triggered object (TMO) scheme was formulated a few years ago (K.H. Kim et al., 1994; K.H. Kim and C. Subbaraman, 1997), as a major extension of the conventional object structuring schemes with the idealistic goal of facilitating general form design and timeliness-guaranteed design of complex real time application systems. Recently, as a new scheme for realizing TMO-struc...
  • Load balancing of dynamic and adaptive mesh-based computations

    Publication Year: 1998

    One ingredient which is viewed as vital to the successful conduct of many large-scale numerical simulations is the ability to dynamically repartition the underlying adaptive finite element mesh among the processors so that the computations are balanced and interprocessor communication is minimized. We present two new schemes for adaptive repartitioning: Locally-Matched Multilevel Scratch-Remap (or...
  • An efficient algorithm for causal message logging

    Publication Year: 1998, Page(s):19 - 25
    Cited by:  Papers (3)

    Causal message logging has many good properties such as nonblocking message logging and no rollback propagation. However, it requires a large amount of information to be piggybacked on each message, which may incur severe performance degradation. This paper presents an efficient causal logging algorithm based on the new message log structure, LogOn, which represents the causal interprocess depende...
  • Checkpoint-recovery protocol for reliable mobile systems

    Publication Year: 1998, Page(s):93 - 99
    Cited by:  Papers (15)

    Information systems consist of mobile stations and fixed stations. Mission critical applications are required to be executed fault-tolerantly in these systems. However, mobile stations support neither enough volume of storage and processing power nor enough capacity of battery to do reliable, long-term communications. Moreover, wireless channels are less reliable. Hence, the channels with the mobi...
  • System-level versus user-defined checkpointing

    Publication Year: 1998, Page(s):68 - 74
    Cited by:  Papers (13)

    Checkpointing and rollback recovery is a very effective technique to tolerate transient faults and preventive shutdowns. In the past, most of the checkpointing schemes published in the literature were supposed to be transparent to the application programmer and implemented at the operating-system level. In recent years, there has been some work on higher-level forms of checkpointing. In this secon...
  • A metaobject protocol for fault-tolerant CORBA applications

    Publication Year: 1998, Page(s):127 - 134
    Cited by:  Papers (9)

    The use of metalevel architectures for the implementation of fault-tolerant systems is today very appealing. Nevertheless, all such fault-tolerant systems have used a general-purpose metaobject protocol (MOP) or are based on restricted reflective features of some object-oriented language. According to our past experience, we define in this paper a suitable metaobject protocol, called FT-MOP for bu...
  • Survivable consensus objects

    Publication Year: 1998, Page(s):271 - 279
    Cited by:  Papers (3)  |  Patents (2)

    Reaching consensus among multiple processes in a distributed system is fundamental to coordinating distributed actions. We present a new approach to building survivable consensus objects in a system consisting of a (possibly large) collection of persistent object servers and a transient population of clients. Our consensus object implementation requires minimal support from servers, but at the sam...
  • Optimizing join index based join processing: a graph partitioning approach

    Publication Year: 1998, Page(s):302 - 308
    Cited by:  Papers (2)  |  Patents (1)

    The cost of join computation, which uses a join index in a sequential system with limited buffer space, depends primarily on the page access sequence used to fetch the pages of the base relations. We introduce a graph partitioning model that will minimize the length of the page access sequence thus minimizing the redundant I/O, given a fixed buffer. Experiments with Sequoia 2000 data sets show tha...