Hot Topics in Operating Systems, 2001. Proceedings of the Eighth Workshop on

Date 20-22 May 2001

  • Proceedings Eighth Workshop on Hot Topics in Operating Systems

    Publication Year: 2001
    Freely Available from IEEE
  • Notes from the HotOS-VIII workshop hot topics in operating systems: 21-23 May 2001, Schloss Elmau, Germany

    Publication Year: 2001 , Page(s): xv - xxx
    Freely Available from IEEE
  • Author index

    Publication Year: 2001 , Page(s): 191 - 192
    Freely Available from IEEE
  • Using abstraction to improve fault tolerance

    Publication Year: 2001 , Page(s): 27 - 32
    Cited by:  Papers (11)

    Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. The paper describes a replication technique, BFTA, which uses abstraction to reduce the cost of Byzantine fault tolerance and to improve its ability to mask software errors. BFTA reduces cost because it enables reuse of off-the-shelf service implementations. It improves availability because each replica can be repaired periodically using an abstract view of the state stored by correct replicas, and because each replica can run distinct or non-deterministic service implementations, which reduces the probability of common mode failures. We built an NFS service that allows each replica to run a different operating system. This example suggests that BFTA can be used in practice; the replicated file system required only a modest amount of new code, and preliminary performance results indicate that it performs comparably to the off-the-shelf implementations that it wraps.
  • Fail-stutter fault tolerance

    Publication Year: 2001 , Page(s): 33 - 38
    Cited by:  Papers (14)

    Traditional fault models present system designers with two extremes: the Byzantine fault model, which is general and therefore difficult to apply, and the fail-stop fault model, which is easier to employ but does not accurately capture modern device behavior. To address this gap, we introduce the concept of fail-stutter fault tolerance, a realistic and yet tractable fault model that accounts for both absolute failure and a new range of performance failures common in modern components. Systems built under the fail-stutter model will likely perform well, be highly reliable and available, and be easier to manage when deployed.
  • Secure OS extensibility needn't cost an arm and a leg

    Publication Year: 2001

    This paper makes the claim that secure extensibility of operating systems is not only desirable but also achievable. We claim that OS extensibility should be done at user-level to avoid the security problems inherent in other approaches. We furthermore claim (backed up by some initial results) that user-level extensibility is possible at a performance that is similar to in-kernel extensions. Finally, user-level extensions allow the use of modern software engineering techniques.
  • Mansion, a distributed multi-agent system

    Publication Year: 2001

    In this paper we present work in progress on a worldwide, scalable multi-agent system, based on a paradigm of hyperlinked rooms. The framework offers facilities for managing distribution, security and mobility aspects for both active elements (agents) and passive elements (objects) in the system. Our framework offers separation of logical concepts from physical representation, distribution support, mobility support, and a security architecture.
  • Position summary: hinting for goodness' sake

    Publication Year: 2001
    Cited by:  Papers (1)

    Modern operating systems and adaptive applications offer an overwhelming number of parameters affecting application latency, throughput, image resolution, audio quality, and so on. We are designing a system to automatically tune resource allocation and application parameters at runtime, with the aim of maximizing user happiness or goodness. Consider a 3D graphics application that operates at variable resolution, trading output fidelity for processor time. Simultaneously, a data mining application adapts to network and processor load by migrating computation between the client and storage node. We must allocate resources between these applications and select their adaptive parameters to meet the user's overall goals. Since the user lacks the time and expertise to translate his preferences into parameter values, we would like the system to do this. Existing systems lack the right abstractions for applications to expose information for automated parameter tuning. Goodness hints are the solution to this problem. Applications use these hints to tell the operating system how resource allocations will affect their goodness (utility). Goodness hints are used by the operating system to make resource allocation decisions and by applications to tune their adaptive parameters. Our contribution is a decomposition of goodness hints into manageable and independent pieces and a methodology to automatically generate them.
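
    To make the idea of a goodness hint concrete, here is a minimal sketch rather than the authors' system. It assumes a hint is simply a function from a processor share to a utility value, that the overall goal is the sum of the two utilities, and that the allocator searches a coarse grid of splits. The names `GoodnessHint` and `allocate` and both utility curves are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

# Assumption: a goodness hint maps a resource share in [0, 1] to a utility in [0, 1].
GoodnessHint = Callable[[float], float]


def allocate(hints: Dict[str, GoodnessHint], steps: int = 10) -> Tuple[Dict[str, float], float]:
    """Split one resource between exactly two applications to maximise summed goodness."""
    (name_a, hint_a), (name_b, hint_b) = hints.items()
    best_split: Dict[str, float] = {}
    best_total = -math.inf
    for i in range(steps + 1):
        share_a = i / steps
        total = hint_a(share_a) + hint_b(1.0 - share_a)
        if total > best_total:
            best_split, best_total = {name_a: share_a, name_b: 1.0 - share_a}, total
    return best_split, best_total


hints = {
    # Hypothetical 3D viewer: fidelity saturates once it has half the processor.
    "graphics": lambda cpu: min(1.0, 2.0 * cpu),
    # Hypothetical data-mining job: goodness grows linearly with processor share.
    "mining": lambda cpu: cpu,
}
split, total = allocate(hints)
```

    With these invented curves the search settles on an even split, because giving the viewer more than half the processor adds nothing to its goodness.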
  • eOS - the dawn of the resource economy

    Publication Year: 2001
    Cited by:  Papers (1)

    Summary form only given. We believe that achieving the benefits of a resource economy, which supports the execution of services wherever and whenever is most convenient, cost-effective, and trustworthy, represents the next big computer systems research opportunity. That is, the emphasis in the operating systems research community should move away from extracting a few more percentage points of speed from individual computing resources, and focus instead on how to size, provision, and manage those resources to serve the needs of a rapidly diversifying set of services. HP Laboratories are embarking on a major endeavor to pursue this, and are actively seeking research partners to collaborate with us.
  • Smart Messages: a system architecture for large networks of embedded systems

    Publication Year: 2001

    We propose a system architecture and a computing model, based on Smart Messages (SMs), for computation and communication in large networks of embedded systems. In this model, communication is realized by sending SMs in the network. These messages are comprised of code, which is executed at each hop in the path of the message, and data which the message carries in the network. The execution at each hop determines the next hop in the message's path - SMs are responsible for their own routing.
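
    The self-routing model described above can be illustrated with a small sketch. It assumes a toy in-memory network; `SmartMessage`, `run`, `seek_target`, and the hop limit are hypothetical names and choices for illustration, not the authors' API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical toy network: node name -> list of neighbour names.
Network = Dict[str, List[str]]


@dataclass
class SmartMessage:
    """A message that carries both code (`step`) and data (`payload`)."""
    # `step` runs at every hop and returns the next hop, or None to stop.
    step: Callable[["SmartMessage", str, Network], Optional[str]]
    payload: dict = field(default_factory=dict)
    path: List[str] = field(default_factory=list)


def run(message: SmartMessage, start: str, network: Network, max_hops: int = 32) -> List[str]:
    """Execute the message's code at each hop until it stops routing itself."""
    node: Optional[str] = start
    while node is not None and len(message.path) < max_hops:
        message.path.append(node)
        node = message.step(message, node, network)
    return message.path


def seek_target(message: SmartMessage, node: str, network: Network) -> Optional[str]:
    """Example routing code: stop at the target, else move to an unvisited neighbour."""
    if node == message.payload["target"]:
        message.payload["delivered"] = True
        return None
    for neighbour in network[node]:
        if neighbour not in message.path:
            return neighbour
    return None  # dead end: the message gives up


network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
msg = SmartMessage(step=seek_target, payload={"target": "d"})
route = run(msg, "a", network)
```

    The nodes only execute whatever `step` the message brings along; the routing decision lives entirely in the message.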
  • Virtualization considered harmful: OS design directions for well-conditioned services

    Publication Year: 2001 , Page(s): 139 - 144
    Cited by:  Papers (2)

    We argue that existing OS designs are ill-suited for the needs of Internet service applications. These applications demand massive concurrency (supporting a large number of requests per second) and must be well-conditioned to load (avoiding degradation of performance and predictability when demand exceeds capacity). The transparency and virtualization provided by existing operating systems lead to limited concurrency and lack of control over resource usage. We claim that Internet services would be far better supported by operating systems by reconsidering the role of resource virtualization. We propose a new design for server applications, the staged event-driven architecture (SEDA). In SEDA, applications are constructed as a set of event driven stages separated by queues. We present the SEDA architecture and its consequences for operating system design.
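
    The phrase "event driven stages separated by queues" can be sketched in a few lines. This is an illustration under assumptions, not the SEDA implementation: the `Stage` class, the one-thread-per-stage choice, the `_STOP` sentinel, and the queue capacity are all hypothetical.

```python
import queue
import threading
from typing import Callable, Optional

_STOP = object()  # sentinel that shuts a stage down


class Stage:
    """One event-driven stage: an input queue plus a handler run by a worker thread."""

    def __init__(self, handler: Callable[[object], object],
                 downstream: Optional["Stage"] = None, capacity: int = 8) -> None:
        self.inbox: queue.Queue = queue.Queue(maxsize=capacity)  # bounded queue
        self.handler = handler
        self.downstream = downstream
        self.results: list = []  # only filled by the last stage of a pipeline
        self.worker = threading.Thread(target=self._loop, daemon=True)
        self.worker.start()

    def _loop(self) -> None:
        while True:
            event = self.inbox.get()
            if event is _STOP:
                if self.downstream is not None:
                    self.downstream.inbox.put(_STOP)  # propagate shutdown
                return
            out = self.handler(event)
            if self.downstream is not None:
                self.downstream.inbox.put(out)  # blocks while the next queue is full
            else:
                self.results.append(out)


# Build a three-stage pipeline back to front: parse -> double -> render.
render = Stage(lambda n: f"result={n}")
double = Stage(lambda n: n * 2, downstream=render)
parse = Stage(lambda s: int(s), downstream=double)

for request in ["1", "2", "3"]:
    parse.inbox.put(request)
parse.inbox.put(_STOP)
for stage in (parse, double, render):
    stage.worker.join(timeout=5)

output = render.results
```

    Because each queue is explicit and bounded, a stage that falls behind pushes back on the stage feeding it. Treating that queue as the place to shed or delay load is this sketch's reading of "well-conditioned", not a detail taken from the abstract.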
  • Position summary: applying the VVM kernel to flexible Web caches

    Publication Year: 2001

    The VVM (virtual virtual machine) is a systematic approach to adaptability and reconfigurability for portable, object-oriented applications based on byte-coded languages such as Java and Smalltalk. The main objectives of the VVM are (i) to allow adaptation of language and system according to a particular application domain; (ii) to provide extensibility by allowing a live execution environment to evolve according to new protocols or language standards; and (iii) to provide a common substrate on which to achieve true interoperability between different languages. On the way to implementing a VVM we have already implemented VVM1 (and its application to active networks) and VVM2 (and its application to flexible Web cache and distributed observation). The VVM2 is a highly-flexible language kernel which consists of a minimal, complete programming language in which the most important goal is to maximise the amount of reflective access and intercession that are possible, at the lowest possible software level.
  • Towards global storage management and data placement

    Publication Year: 2001
    Cited by:  Papers (1)

    As users and companies increasingly depend on shared, networked information services, we continue to see growth in data centers and service providers. This happens as services and servers are consolidated (for ease of management and reduced duplication), while also being distributed (for fault tolerance and to accommodate the global reach of customers). Since access to data is the lifeblood of any organization, a global storage system is a core element in such an infrastructure. Based on success in automatically managing local storage, we believe that the key attribute of such a system is the ability to flexibly adapt to a variety of application semantics and requirements as they arise and as they change over time. Our work has shown that it is possible to automatically design and configure a storage system of one or more disk arrays to meet a set of application requirements and to dynamically reconfigure as needs change, all without human intervention. Work on global data placement expands the scope of this system to a world of distributed data centers.
  • Position summary: toward a rigorous data type model for HTTP

    Publication Year: 2001

    The HTTP protocol depends on a structure of several data types, such as messages and resources. The current ad hoc data type model has served to support a huge variety of HTTP-based applications, but its weaknesses have been exposed in attempts to formalize and (especially) to extend the protocol. These weaknesses particularly affect the semantics of caching within the HTTP distributed system.
  • Don't trust your file server

    Publication Year: 2001 , Page(s): 113 - 118
    Cited by:  Papers (10)  |  Patents (1)

    All too often, decisions about whom to trust in computer systems are driven by the needs of system management rather than data security. In particular, data storage is often entrusted to people who have no role in creating or using the data - through outsourcing of data management, hiring of outside consultants to administer servers, or even collocation of servers in physically insecure machine rooms to gain better network connectivity. This paper outlines the design of SUNDR, a network file system designed to run on untrusted servers. SUNDR servers can safely be managed by people who have no permission to read or write data stored in the file system. Thus, people can base their trust decisions on who needs to use data and their administrative decisions on how best to manage the data. Moreover, with SUNDR, attackers will no longer be able to wreak havoc by compromising servers and tampering with data. They will need to compromise clients while legitimate users are logged on. Since clients do not need to accept incoming network connections, they can more easily be firewalled and protected from compromise than servers.
  • Authentication confidences

    Publication Year: 2001
    Cited by:  Patents (1)

    "Over the Internet, no one knows you're a dog," goes the joke. Yet, in most systems, a password submitted over the Internet gives one the same access rights as one typed at the physical console. We promote an alternate approach to authentication, in which a system fuses observations about a user into a probability (an authentication confidence) that the user is who they claim to be. Relevant observations include password correctness, physical location, activity patterns, and biometric readings. Authentication confidences refine current yes-or-no authentication decisions, allowing systems to cleanly provide partial access rights to authenticated users whose identities are suspect.
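
    The abstract does not say how observations are fused, so the following is only one possible sketch. It assumes independent observations combined by multiplying likelihood ratios into posterior odds (a naive Bayes style update). Every number, threshold, and name (`LIKELIHOOD_RATIOS`, `authentication_confidence`, `access_level`) is invented for illustration.

```python
from typing import Dict, Iterable

# Hypothetical likelihood ratios: P(observation | genuine) / P(observation | impostor).
# The numbers are invented for illustration, not taken from the paper.
LIKELIHOOD_RATIOS: Dict[str, float] = {
    "password_correct": 50.0,
    "usual_location": 4.0,
    "unusual_location": 0.25,
    "typical_activity": 2.0,
}


def authentication_confidence(observations: Iterable[str], prior: float = 0.5) -> float:
    """Fuse observations into P(user is genuine), assuming they are independent."""
    odds = prior / (1.0 - prior)
    for name in observations:
        odds *= LIKELIHOOD_RATIOS[name]
    return odds / (1.0 + odds)


def access_level(confidence: float) -> str:
    """Map a confidence to partial access rights (thresholds are assumptions)."""
    if confidence >= 0.99:
        return "full"
    if confidence >= 0.90:
        return "read-only"
    return "denied"


console = authentication_confidence(["password_correct", "usual_location", "typical_activity"])
remote = authentication_confidence(["password_correct", "unusual_location"])
```

    Under these made-up numbers the same correct password yields full access from the usual location but only read-only access from an unusual one, which is the kind of graded decision the abstract argues for.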
  • A backup appliance composed of high-capacity disk drives

    Publication Year: 2001
    Cited by:  Papers (1)

    Disk drives are now available with capacity and price per capacity comparable to nearline tape systems. Because disks have superior performance, density and maintainability characteristics, it seems likely that they will soon overtake tapes as the backup medium of choice. The authors outline the potential advantages of a backup system composed of high-capacity disk drives and describe what implications such a system would have for backup software.
  • Protium, an infrastructure for partitioned applications

    Publication Year: 2001 , Page(s): 47 - 52
    Cited by:  Papers (3)  |  Patents (3)

    Remote access feels different from local access. The major issues are consistency (machines vary in GUIs, applications, and devices) and responsiveness (the user must wait for network and server delays). Protium attacks these by partitioning programs into local viewers that connect to remote services using application-specific protocols. Partitioning allows viewers to be customized to adapt to local features and limitations. Services are responsible for maintaining long-term state. Viewers manage the user interface and use state to reduce communication between viewer and service, reducing latency whenever possible. System infrastructure sits between the viewer and service, supporting replication, consistency, session management, and multiple simultaneous viewers. The prototype system includes an editor, a draw program, a PDF viewer, a map database, a music jukebox, and windowing system support. It runs on servers, workstations, PCs, and PDAs under Plan 9, Linux, and Windows; services and viewers have been written in C, Java, and Concurrent ML.
  • Supporting coordinated adaptation in networked systems

    Publication Year: 2001
    Cited by:  Papers (4)

    Summary form only given. Our position is that the true potential of adaptation can only be realized if support is provided for more general solutions, including adaptations that span multiple hosts and multiple system components, and algorithmic adaptations that involve changing the underlying algorithms used by the system at runtime. Such a general solution must, however, address the difficult issues related to these types of adaptations. Adaptation by multiple related components, for example, must be coordinated so that these adaptations work together to implement consistent adaptation policies. Likewise, large-scale algorithmic adaptations need to be coordinated using graceful adaptation strategies in which as much normal processing as possible continues during the changeover. Here, we summarize our approach to addressing these problems in Cactus, a system for constructing highly-configurable distributed services and protocols.
  • Recursive restartability: turning the reboot sledgehammer into a scalpel

    Publication Year: 2001 , Page(s): 125 - 130
    Cited by:  Papers (25)  |  Patents (3)

    Even after decades of software engineering research, complex computer systems still fail, primarily due to nondeterministic bugs that are typically resolved by rebooting. Conceding that Heisenbugs will remain a fact of life, we propose a systematic investigation of restarts as "high availability medicine." In this paper we show how recursive restartability (RR) - the ability of a system to gracefully tolerate restarts at multiple levels - improves fault tolerance, reduces time-to-repair, and enables system designers to build flexible, highly available software infrastructures. Using several examples of widely deployed software systems, we identify properties that are required of RR systems and outline an agenda for turning the recursive restartability philosophy into a practical software structuring tool. Finally, we describe infrastructural support for RR systems, along with initial ideas on how to analyze and benchmark such systems.
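
    "Restarts at multiple levels" can be pictured as a tree of components in which restarting a node restarts its whole subtree. The sketch below assumes one simple policy, restart the smallest subtree first and escalate to the parent if the fault persists. That policy and the names `Component` and `recover` are illustrative assumptions, not the paper's design.

```python
from typing import Callable, List, Optional


class Component:
    """A node in a hypothetical restart tree; restarting a node restarts its subtree."""

    def __init__(self, name: str, children: Optional[List["Component"]] = None) -> None:
        self.name = name
        self.children = children or []
        self.parent: Optional["Component"] = None
        self.restarts = 0
        for child in self.children:
            child.parent = self

    def restart(self) -> None:
        self.restarts += 1
        for child in self.children:
            child.restart()


def recover(failed: Component, healthy: Callable[[], bool]) -> Optional[Component]:
    """Restart the smallest subtree first; escalate to the parent if the fault persists."""
    node: Optional[Component] = failed
    while node is not None:
        node.restart()
        if healthy():
            return node  # the restart at this level cured the fault
        node = node.parent
    return None  # even restarting the root did not help


parser = Component("parser")
cache = Component("cache")
database = Component("database")
frontend = Component("frontend", [parser, cache])
system = Component("system", [frontend, database])

# Simulated fault: it clears only once the cache has been restarted.
cured_at = recover(parser, healthy=lambda: cache.restarts > 0)
```

    Here the fault is observed in the parser but is cured one level up, at the frontend, so the database and the system root are never restarted.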
  • Position summary: separating mobility from mobile agents

    Publication Year: 2001

    The reasons for using mobile agents are well-known: moving computation to data to avoid transferring large amounts of data; supporting disconnected operation by, for example, moving a computation to a network that has better connectivity; supporting autonomous distributed computation by, for example, deploying a personalized filter near a real-time data source. Many mobile agent systems have been constructed and are in the public domain. But, despite these well-known advantages and widely available software, mobile agents are not yet being used as a common programming abstraction. We have been working since 1993, under the name of TACOMA, on operating system support and application of mobile agents. We have addressed issues including fault-tolerance, security, efficiency, and runtime structures and services. We have built a series of mobile agent middleware systems and evaluated them by building realistic and deployed applications. We have found that mobile agents are especially useful for large-scale systems configuration and deployment, system and service extensibility, and distributed application self-management. The programming model TACOMA supports has changed over these years to reflect our experience with writing real applications. Like other mobile agent systems, TACOMA started with a programming model that resembled the characterization given above of mobile agents being processes with explicit control over where they execute. We call this the traditional model of mobile agents.
  • Supporting hot-swappable components for system software

    Publication Year: 2001
    Cited by:  Patents (1)

    Summary form only given. A hot-swappable component is one that can be replaced with a new or different implementation while the system is running and actively using the component. For example, a component of a TCP/IP protocol stack, when hot-swappable, can be replaced (perhaps to handle new denial-of-service attacks or improve performance), without disturbing existing network connections. The capability to swap components offers a number of potential advantages such as: online upgrades for high availability systems, improved performance due to dynamic adaptability and simplified software structures by allowing distinct policy and implementation options to be implemented in separate components (rather than as a single monolithic component) and dynamically swapped as needed. In order to hot-swap a component, it is necessary to (i) instantiate a replacement component; (ii) establish a quiescent state in which the component is temporarily idle; (iii) transfer state from the old component to the new component; (iv) swap the new component for the old; and (v) deallocate the old component.
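
    The five steps listed above can be sketched as follows. This is an illustration under assumptions: clients reach the component through a proxy, a single lock stands in for the quiescence mechanism, and state transfer uses hypothetical `export_state` and `import_state` methods. None of these names come from the paper.

```python
import threading


class CounterV1:
    """Old implementation: counts one unit at a time."""

    def __init__(self) -> None:
        self.count = 0

    def handle(self, n: int) -> int:
        for _ in range(n):
            self.count += 1
        return self.count

    def export_state(self) -> dict:
        return {"count": self.count}

    def import_state(self, state: dict) -> None:
        self.count = state["count"]


class CounterV2(CounterV1):
    """Replacement implementation: same interface, adds in one step."""

    def handle(self, n: int) -> int:
        self.count += n
        return self.count


class SwappableProxy:
    """Clients call the proxy; the lock doubles as the quiescence mechanism."""

    def __init__(self, component) -> None:
        self._component = component
        self._lock = threading.Lock()

    def handle(self, n: int) -> int:
        with self._lock:  # every call holds the lock while it uses the component
            return self._component.handle(n)

    def hot_swap(self, factory) -> None:
        replacement = factory()                      # (i) instantiate a replacement
        with self._lock:                             # (ii) quiesce: no call is in flight
            replacement.import_state(
                self._component.export_state())      # (iii) transfer state
            old, self._component = self._component, replacement  # (iv) swap
        del old                                      # (v) drop this reference to the old one


service = SwappableProxy(CounterV1())
before = service.handle(3)
service.hot_swap(CounterV2)
after = service.handle(4)
```

    A single coarse lock makes quiescence trivial but serializes every call. A real system would likely want something cheaper on the common path, which is outside the scope of this sketch.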
  • Aspect-oriented system structure

    Publication Year: 2001

    Operating system structure is important; it leads to understandable, maintainable, 'pluggable' code. But despite our best efforts, some system elements have been difficult to structure. We propose a new analysis of this problem, and a new technology that can structure these elements. Aspect-oriented programming (AOP) (G. Kiczales et al., 1997) uses linguistic mechanisms to support the separation of crosscutting elements, or aspects of the system, from primary functionality. We have developed a proof-of-concept AOP implementation of prefetching in FreeBSD (www.cs.ubc.ca/labs/spl/aspects/aspectc.html). In our implementation, we have been able to modularize prefetching.
  • Robustness in complex systems

    Publication Year: 2001 , Page(s): 21 - 26
    Cited by:  Papers (16)

    The paper argues that a common design paradigm for systems is fundamentally flawed, resulting in unstable, unpredictable behavior as the complexity of the system grows. In this flawed paradigm, designers carefully attempt to predict the operating environment and failure modes of the system in order to design its basic operational mechanisms. However, as a system grows in complexity, the diffuse coupling between the components in the system inevitably leads to the butterfly effect, in which small perturbations can result in large changes in behavior. We explore this in the context of distributed data structures, a scalable, cluster-based storage server. We then consider a number of design techniques that help a system to be robust in the face of the unexpected, including overprovisioning, admission control, introspection, and adaptivity through closed control loops. Ultimately, however, all complex systems eventually must contend with the unpredictable. Because of this, we believe systems should be designed to cope with failure gracefully.
  • The case for resilient overlay networks

    Publication Year: 2001 , Page(s): 152 - 157
    Cited by:  Papers (199)

    This paper makes the case for Resilient Overlay Networks (RONs), an application-level routing and packet forwarding service that gives end-hosts and applications the ability to take advantage of network paths that traditional Internet routing cannot make use of, thereby improving their end-to-end reliability and performance. Using RON, nodes participating in a distributed Internet application configure themselves into an overlay network and cooperatively forward packets for each other. Each RON node monitors the quality of the links in the underlying Internet and propagates this information to the other nodes; this enables a RON to detect and react to path failures within several seconds rather than several minutes, and allows it to select application-specific paths based on performance. We argue that RON has the potential to substantially improve the resilience of distributed Internet applications to path outages and sustained overload.
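
    As a rough illustration of path selection from shared link measurements, the sketch below compares the direct path with detours through one other overlay node. The single-relay restriction, the latency metric, the `best_path` name, and the node names are assumptions made for this example. The abstract itself does not specify the selection algorithm.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical probe results: measured latency in ms between overlay nodes.
# math.inf marks a path that the probes currently see as failed.
Latency = Dict[Tuple[str, str], float]


def best_path(src: str, dst: str, nodes: List[str], latency: Latency) -> Tuple[List[str], float]:
    """Pick the direct path or the cheapest detour through one other overlay node."""
    def cost(a: str, b: str) -> float:
        return latency.get((a, b), math.inf)  # unmeasured links count as unusable

    best, best_cost = [src, dst], cost(src, dst)
    for relay in nodes:
        if relay in (src, dst):
            continue
        detour = cost(src, relay) + cost(relay, dst)
        if detour < best_cost:
            best, best_cost = [src, relay, dst], detour
    return best, best_cost


nodes = ["n1", "n2", "n3"]
latency = {
    ("n1", "n3"): math.inf,  # the direct Internet path is down
    ("n1", "n2"): 20.0,
    ("n2", "n3"): 45.0,
}
path, total = best_path("n1", "n3", nodes, latency)
```

    Swapping the latency table for loss rate or throughput would give a different, application-specific notion of "best", which is the flexibility the abstract points to.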