
Proceedings of the Fourteenth IEEE Symposium on Mass Storage Systems, 1995: 'Storage - At the Forefront of Information Infrastructures'

Date: 11-14 Sept. 1995


Displaying Results 1 - 25 of 37
  • Proceedings of IEEE 14th Symposium on Mass Storage Systems

  • Data management at CERN: current status and future trends

    Page(s): 174 - 181

    The European Laboratory for Particle Physics (CERN) straddles the Franco-Swiss border near Geneva. The accelerator currently in operation, LEP, entered service in 1989 and is expected to run until the end of the current millennium. The recently approved Large Hadron Collider (LHC), which will coexist with LEP in the existing tunnel, is scheduled to start operation in 2004. This new facility will generate many petabytes, or even many tens of PB, of new data per year. Even the calibration and monitoring data will be in the 100 GB/year range! (Today, we have a few hundred TB of event data and 10-100 MB of calibration data.) We describe the evolution of the CERN-developed mass storage system originally built for LEP, the impact of the IEEE MSS reference model on this evolution, and our plans for the future. We also comment on the evolution and state of the MSS reference model itself, and on the response (or lack of response) from industry to the mass storage challenge facing many sites today.
  • Author index

  • Distributed access system for uniform and scalable data and service access

    Page(s): 284 - 292

    Computational modeling systems (CMS) are designed to resolve many of the shortcomings of systems currently employed to support a wide range of scientific modeling applications. We identify the requirements of a “reasonable” CMS and introduce Amazonia, a CMS intended to support modeling in large-scale earth science research. Amazonia has been implemented as an open, layered architecture. In this paper we discuss the design and implementation of the distributed access system, a key component of the Amazonia Kernel that supports the organization of and access to data and services in a distributed environment.
  • A framework for understanding large scale digital storage systems

    Page(s): 293 - 304

    The digital revolution is now underway. The use of binary zeros and ones to store data is increasing at a steady rate; they may represent text, images, pictures, sounds, maps, books, music, instructions, programs, or just about anything else which can be represented digitally. As the size of digital data holdings has continued to grow, so too has the need to provide meaningful access to this data. There are a number of efforts now underway to provide such access. In most cases the efforts have been domain specific, and progress in one area has been hard to replicate in a different domain. Part of this difficulty has been the lack of a general set of concepts and a vocabulary sufficiently broad to bridge the gaps. The paper presents a general taxonomy of knowledge that is independent of subject matter domain. It begins with knowledge as the most general class and then subdivides knowledge into its constituent parts: factual knowledge, procedural knowledge, and judgmental knowledge. Definitions of each type of knowledge are given, along with examples sufficient to understand each subclass. A vocabulary is introduced that provides a means to discuss the topic independently of a specific problem domain. Understanding the differences between classes of knowledge is necessary if a person or an organization is to build systems that acquire, organize, store, and retrieve various types of knowledge. The paper concludes with a discussion of some tools currently available to assist in building and maintaining a knowledge resource.
  • Client/server data serving for high-performance computing

    Page(s): 107 - 119

    This paper examines the industry requirements for shared network data storage and sustained high-speed (tens to thousands of megabytes per second) network data serving via the NFS and FTP protocol suite. It discusses the current structural and architectural impediments to achieving these data rates cost-effectively on many general-purpose servers, and describes an architecture and resulting product family that addresses these problems. We show the sustained performance levels achieved in the lab and discuss early customer experiences with both the HIPPI-IP and ATM OC3-IP network interfaces.
  • Database systems for efficient access to tertiary memory

    Page(s): 120 - 126

    Tertiary storage devices have long been used to store massive amounts of data in file-oriented mass storage systems. However, their use in database systems is relatively new. Database systems associate more structure with the data than a raw sequence of bytes; hence, if they are allowed control of the tertiary memory devices, they can greatly reduce access cost by performing informed caching, query optimization, and query scheduling. Most conventional database systems, however, are designed for data stored on magnetic disks. Accesses to tertiary storage devices are slow and nonuniform compared to secondary storage, so including tertiary memory as an active part of the storage hierarchy requires a rethinking of conventional query processing techniques. In this project, our aim is to design a database system that can use its knowledge of the data layout on storage devices to speed up queries.
  • Caching and migration for multilevel persistent object stores

    Page(s): 127 - 135

    We propose an architecture for scalable persistent object managers that provide access to large numbers of objects distributed over a variety of physical media. Our approach is lightweight in that we provide direct support for the creation, access, and updating of persistent objects, but only indirect support for the other functions traditionally associated with an object-oriented database, such as transactions, backup, recovery, or a query language. This design gives application programmers the productivity and performance of using objects, while relying on an underlying hierarchical storage system to manage the large amounts of data. Our design is layered and multilevel in that it caches and migrates large-grained physical collections of objects, called folios, from tape to networked disks. Separately, it also caches and migrates smaller-grained physical collections of objects, called segments, between nodes on a network. Segments are then moved into memory as usual for persistent object managers. We also describe the implementation of a system called PTool based upon this design and give preliminary performance results.
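The layered caching and migration scheme described in this abstract lends itself to a compact sketch: two stacked LRU caches, one migrating large-grained folios from tape to disk and one migrating smaller-grained segments toward memory. The class, capacities, and names below are illustrative assumptions, not PTool's actual implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Simple LRU cache keyed by id; evicts the least-recently-used entry on overflow."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key, load):
        """Return the cached value, loading (and possibly evicting) on a miss."""
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)   # evict the LRU entry
        self.items[key] = load(key)
        return self.items[key]

# Two-level hierarchy: folios migrate tape -> disk, segments disk -> memory.
disk_folios = LRUCache(capacity=4)        # large-grained folios on networked disk
memory_segments = LRUCache(capacity=16)   # smaller-grained segments in memory

def read_object(folio_id, segment_id):
    folio = disk_folios.access(folio_id, lambda f: f"folio-{f}-from-tape")
    return memory_segments.access((folio_id, segment_id),
                                  lambda k: f"segment-{k[1]}-of-{folio}")
```

A miss at the segment level first ensures the enclosing folio is staged to disk, which is the essence of the multilevel migration the paper describes.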

  • Changing horses in mid-stream (or, how do you follow one of the most successful acts in mass storage?)

    Page(s): 2 - 10

    The European Centre for Medium-Range Weather Forecasts (ECMWF) has for many years operated a large mass-storage system based on CFS/DataTree. The CFS system will need replacement before the end of the decade, and ECMWF accordingly embarked on a study of available systems which might be suitable. We found that, for various reasons, none of the systems available on the market met our needs. Three particular areas stood out in which commercial systems either could not meet our end-of-decade requirements or in which the existing CFS system was already superior. Over the past two years, during our active pursuit of a system that might satisfactorily follow in the footsteps of CFS, we have learned just how good the existing system was. Our initial assumptions that the market would provide, and clearly indicate, an appropriate successor were quickly proven over-optimistic, and we were forced to make the necessary plans to extend the life of CFS. By making common cause with a number of other sites with a similar background, we were able to indicate more forcibly to the suppliers the direction in which our dissatisfaction lay. Although the timescales of the intended installation meant that we were unable to await the delivery of systems with superior specifications, it also meant that we went into a tendering exercise knowing that we might well have to install a noncompliant system that the supplier would undertake to enhance. By the time of the Monterey Symposium, the selected successor system will be only weeks away from acceptance, and we shall be able to present the decisions and plans that led to its selection. The remaining tasks will be to commission the system, develop and integrate the necessary services, and phase operations over from the old CFS system to the new one, including the copying of up to 60 terabytes of old data.
  • Design and implementation of a network-wide concurrent file system in a workstation cluster

    Page(s): 239 - 245

    We estimate the performance of a network-wide concurrent file system implemented using conventional disks as disk arrays. Tests were carried out on both single system and network-wide environments. On single systems, a file was split across several disks to test the performance of file I/O operations. We concluded that performance was proportional to the number of disks, up to four, on a system with high computing power. Performance of a system with low computing power, however, did not increase, even with more than two disks. When we split a file across disks in a network-wide system called the Network-wide Concurrent File System (N-CFS), we found performance similar to or slightly higher than that of disk arrays on single systems. Since file access through N-CFS is transparent, this system enables traditional disks on single and networked systems to be used as disk arrays for I/O intensive jobs.
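The file-splitting idea behind N-CFS can be illustrated with a minimal round-robin block layout, the same placement rule RAID-0 disk arrays use; the block size and disk count below are arbitrary examples, not N-CFS parameters.

```python
def stripe_layout(file_size, block_size, n_disks):
    """Map each block of a file onto disks round-robin (RAID-0 style striping)."""
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    layout = {d: [] for d in range(n_disks)}
    for b in range(n_blocks):
        layout[b % n_disks].append(b)   # block b lands on disk b mod n_disks
    return layout

# A 1 MB file in 64 KB blocks over 4 disks: 16 blocks, 4 per disk,
# so a sequential read can keep all 4 spindles busy at once.
layout = stripe_layout(1 << 20, 64 << 10, 4)
```

Because consecutive blocks live on different disks, sequential I/O is spread over all spindles, which is why the paper sees throughput grow with the number of disks until the host CPU becomes the bottleneck.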

  • Implementation of a campus-wide distributed mass storage service: the dream vs. reality

    Page(s): 190 - 199

    In 1990, a technical team at NASA Lewis Research Center, Cleveland, Ohio, began defining a mass storage service to provide long-term archival storage, short-term storage for very large files, distributed NFS access, and backup services for critical data that resides on workstations and PCs. Because of software availability and budgets, the total service was phased in over three years. During the process of building the service from the commercial technologies available, our mass storage team refined the original vision and learned from the problems and mistakes that had occurred. We also enhanced some technologies to better meet the needs of users and system administrators. This paper describes our team's journey from dream to reality, outlines some of the problem areas that still exist, and suggests some solutions.
  • Implementing a shared file system on a HIPPI disk array

    Page(s): 77 - 88

    Shared file systems which use a physically shared mass storage device have existed for many years, although not on UNIX based operating systems. This paper describes a shared file system (SFS) that was implemented first as a special project on the Cray Research Inc. (CRI) UNICOS operating system. A more general product was then built on top of this project using a HIPPI disk array for the shared mass storage. The design of SFS is outlined, as well as some performance experiences with the product. We describe how SFS interacts with the OSF distributed file service (DFS) and with the CRI data migration facility (DMF). We also describe possible development directions for the SFS product.
  • IMACTS: an interactive, multiterabyte image archive

    Page(s): 146 - 161

    Efficient softcopy access to future intelligence imagery archives will require novel applications of a wide range of imaging technologies for mass storage, communications, pixel management, and database access. Many of the necessary hardware technologies to realize massive all-softcopy archives are just now emerging. Examples include hierarchical mass storage systems capable of storing multipetabytes of imagery, high-speed communication networks with bandwidth scalable to gigabits/second, high-performance servers and clients with intelligent image caching strategies and embedded image compression, and powerful image and metadata query, browse, and access methods. We describe a testbed designed to investigate a multiterabyte “archive-to-the-desktop” all-softcopy environment for remote sensing imagery. This paper presents an overview of the IMACTS system including the functional architecture, data caching model, concept of operations, and software architecture. It focuses on the key issues, challenges, and the solutions developed for each of the IMACTS technologies including hierarchical mass storage, image communications, pixel management, object management, image access, and performance monitoring.
  • Data retrieval from climate model archives

    Page(s): 258 - 262

    From an accumulated 7 TByte of climate model data at the end of 1994, on the order of 60 TByte is expected by the end of 1996. There is probably no physical problem in storing the data on available sequential mass storage devices; the problem is organizing the data for retrieval and data mining.
  • Multikey index support for tuple sets on parallel mass storage systems

    Page(s): 136 - 145

    The development and evaluation of a tuple set manager (TSM) based on multikey index data structures is a main part of the PARABASE project at the University of Vienna. The TSM provides access to parallel mass storage systems using tuple sets, instead of conventional files, as the central data structure for application programs. A proof-of-concept prototype TSM is implemented and operational on an iPSC/2. It supports tuple insert and delete operations as well as exact match, partial match, and range queries at the system call level. Results are available both from this prototype and from various performance evaluations. The evaluation results demonstrate the performance gain achieved by implementing the tuple set management concept on a parallel mass storage system.
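The query types the TSM supports (exact match and partial match over tuple sets) can be sketched with a toy multikey index that keeps one inverted index per attribute position; this is an illustration of the concept, not the PARABASE data structure.

```python
from collections import defaultdict

class TupleSet:
    """Toy multikey index: one inverted index per attribute position,
    supporting exact-match and partial-match queries (None = wildcard)."""
    def __init__(self, arity):
        self.arity = arity
        self.tuples = []
        self.index = [defaultdict(set) for _ in range(arity)]

    def insert(self, tup):
        tid = len(self.tuples)
        self.tuples.append(tup)
        for pos, val in enumerate(tup):
            self.index[pos][val].add(tid)   # index the tuple under each attribute

    def query(self, pattern):
        """pattern is a tuple with None in unconstrained positions."""
        ids = None
        for pos, val in enumerate(pattern):
            if val is None:
                continue
            matches = self.index[pos][val]
            ids = matches if ids is None else ids & matches  # intersect per key
        if ids is None:                      # fully unconstrained pattern
            ids = set(range(len(self.tuples)))
        return [self.tuples[i] for i in sorted(ids)]
```

An exact-match query constrains every position; a partial-match query leaves some positions as `None`, and the intersection of the per-position index entries yields the result set.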

  • Storage systems for movies-on-demand video servers

    Page(s): 246 - 256

    We evaluate storage system alternatives for movies-on-demand video servers. We begin by characterizing the movies-on-demand workload and briefly discuss performance in disk arrays. First, we study disk farms in which one movie is stored per disk. This is a simple scheme, but it wastes substantial disk bandwidth, because disks holding less popular movies are underutilized; also, good performance requires that movies be replicated to reflect the user request pattern. Next, we examine disk farms in which movies are striped across disks, and find that striped video servers offer nearly full utilization of the disks by achieving better load balancing. For the remainder of the paper, we concentrate on tertiary storage systems. We evaluate the use of storage hierarchies for video service, combining a tertiary library with a disk farm, and examine both magnetic tape libraries and optical disk jukeboxes. We show that, unfortunately, neither tertiary system performs adequately as part of a storage hierarchy serving the predicted distribution of movie accesses. We suggest changes to tertiary libraries that would make them better suited to these applications.
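The load-balancing argument for striping can be made concrete with a small back-of-the-envelope model, assuming a Zipf-like popularity distribution (an assumption for illustration; the paper's actual workload characterization may differ).

```python
def zipf_popularity(n_movies, s=1.0):
    """Zipf-like request mix: the k-th most popular movie gets weight 1/k^s."""
    weights = [1.0 / (k ** s) for k in range(1, n_movies + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def hottest_disk_streams(popularity, streams):
    """One movie per disk: the busiest disk serves the most popular movie's
    entire share of the request stream."""
    return max(popularity) * streams

def striped_disk_streams(streams, n_disks):
    """Striping spreads every stream across all disks evenly."""
    return streams / n_disks

pop = zipf_popularity(100)
# With 1000 streams over 100 disks, striping places ~10 streams per disk,
# while one-movie-per-disk concentrates roughly 190 streams (the most
# popular movie's ~19% share) on a single disk.
```

This is the bandwidth-waste effect the abstract describes: under a skewed request mix, the one-movie-per-disk layout overloads a few disks while leaving the rest idle, unless popular movies are replicated.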

  • The Data Management Applications Programming Interface

    Page(s): 327 - 335

    Many data management applications such as hierarchical storage management, file backup, and others often require special hooks in the operating system to function. Most third-party vendors of these applications have had no choice but to modify the OS kernel on each platform they wish to support. To address this problem, the Data Management Interfaces Group (DMIG), a collection of OS, filesystem, and third-party application vendors, has been working to develop the Data Management Applications Programming Interface (DMAPI). The DMAPI is designed to enhance the portability of applications across different platforms.
  • Managing and serving a multiterabyte data set at the Fermilab DØ experiment

    Page(s): 200 - 208

    The DØ experiment at Fermilab is accumulating data from the electronic detection of collisions between protons and anti-protons. The presentation describes the data structure, data cataloging and serving of the multiterabyte data set to a user community. The current data consists of over 85 terabytes stored in a hierarchy of data sets with various latencies and frequencies of use. The primary data storage is on some 40,000 8-mm tapes while the most frequently used data is on nearly 300 gigabytes of SCSI disks. Data is served to VMS and UNIX analysis clusters over an FDDI network from a centralized file server. We also describe plans for handling a future data set anticipated to be an order of magnitude larger. Some of the ideas being considered are alternative data structures, parallel disk access, automated tape libraries, and centralized analysis servers.
  • An architecture for a scalable, high-performance digital library

    Page(s): 89 - 98

    Requirements for a high-performance, scalable digital library of multimedia data are presented together with a layered architecture for a system that addresses the requirements. The approach is to view digital data as persistent collections of complex objects and to use lightweight object management to manage this data. To scale as the amount of data increases, the object management component is layered over a storage management component. The storage management component supports hierarchical storage, third-party data transfer and parallel input-output. Several issues that arise from the interface between the storage management and object management components are discussed. The authors have developed a prototype of a digital library using this design. Two key components of the prototype are AIM Net and HPSS. AIM Net is a persistent object manager and is a product of Oak Park Research. HPSS is the High Performance Storage System, developed by a collaboration including IBM Government Systems and several national labs.
  • An approximate performance model of a Unitree mass storage system

    Page(s): 210 - 224

    Mass storage systems are finding greater use in scientific computing research environments for retrieving and archiving the large volumes of data generated and manipulated by scientific computation. The paper presents a queuing network model that can be used to carry out capacity planning studies of the Unitree mass storage system. Measurements taken on an existing system and a detailed workload characterization provided the workload intensity and resource demand parameters for the various types of read and write requests. The performance model developed here is based on approximations to multi-class mean value analysis of queuing networks. The approximations were validated through the use of discrete event simulation. The resulting baseline model was used to predict the performance of the system as the workload intensity increases.
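The paper's multi-class approximate MVA builds on classic exact mean value analysis of closed queuing networks. A minimal single-class version, shown for intuition only (not the paper's model), iterates the arrival theorem and Little's law from 1 up to N customers.

```python
def mva(demands, n_customers, think_time=0.0):
    """Exact single-class Mean Value Analysis for a closed queuing network.
    demands[i] is the total service demand at station i (seconds per job).
    Returns (throughput, response_time) at the given customer population."""
    n_stations = len(demands)
    queue = [0.0] * n_stations            # mean queue length at each station
    x = resp = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving job sees the queue lengths of the
        # network with one fewer customer.
        resid = [demands[i] * (1.0 + queue[i]) for i in range(n_stations)]
        resp = sum(resid)                 # total response time
        x = n / (resp + think_time)       # system throughput
        queue = [x * r for r in resid]    # Little's law applied per station
    return x, resp
```

For a single station with demand 0.5 s and one customer, this yields a throughput of 2 jobs/s; with more customers, throughput saturates below the bottleneck bound 1/max(demands), which is the kind of capacity-planning curve the paper's model produces.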

  • The IEEE Storage System Standards Working Group overview and status

    Page(s): 306 - 311

    A brief description is presented of the IEEE Reference Model for Open Storage Systems Interconnection (OSSI), approved for public review by vote of the IEEE Storage System Standards Working Group (SSSWG, Project 1244) in September 1994. Minor differences between the OSSI Model and the Mass Storage System Reference Model (MSSRM) Version 5 introduced in April 1993 at the Twelfth IEEE Symposium on Mass Storage Systems are summarized. The status of the SSSWG and its plans for production of standards are discussed.
  • Silicon microstructures and microactuators for compact computer disk drives

    Page(s): 350 - 356

    It is projected that in another five years, the industry will be capable of delivering credit-card size gigabyte disk drive cartridges at about 10 cents per megabyte. At UCLA and Caltech, we believe silicon micromachining technology will play an important role in the fabrication of high-bandwidth, servo-controlled miniaturized microelectromechanical components for such super-high-capacity, super-compact computer disk drives. For the past four years, we have been collaborating on a number of industry and government supported joint research projects to develop the necessary technology building blocks for design of a low-cost integrated drive of the future. These efforts include the design and fabrication of a silicon read/write head, microgimbaled with integrated electrical and mechanical interconnects, which targets the next-generation, 30 percent form factor pico-sliders. The efforts also include an electromagnetic piggyback planar microactuator for super-high-track-density applications. Both efforts utilize state-of-the-art silicon micromachining fabrication techniques.
  • Physical volume library deadlock avoidance in a striped media environment

    Page(s): 54 - 64

    Most modern high performance storage systems store data in large repositories of removable media volumes. Management of the removable volumes is performed by a software module known as a physical volume library (PVL). To meet performance and scalability requirements, a PVL can be asked to mount multiple removable media volumes for use by a single client for parallel data transfer. Mounting sets of volumes creates an environment in which it is possible for multiple client requests to deadlock while attempting to gain access to storage resources. Scenarios leading to deadlock in a PVL include multiple client requests that contend for the same cartridge(s), and client requests that vie for a limited set of drive resources. These deadlock scenarios are further complicated by the potential for volumes to be mounted out-of-order (for example, by automatic cartridge loaders or human operators). This paper begins by introducing those PVL requirements which create the possibility of deadlock, then discusses approaches to deadlock avoidance and how they might be applied in a PVL. This leads to a design for a PVL that addresses the deadlock scenarios. Following the design presentation is a discussion of possible design enhancements. We end with a case study of an actual implementation of the PVL design in the High Performance Storage System (HPSS).
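One standard way to avoid deadlock when granting sets of drives to competing clients is a banker's-algorithm safety check: only grant an allocation if some completion order still lets every client finish. The sketch below is illustrative and is not the HPSS PVL's actual algorithm.

```python
def is_safe(available_drives, allocated, maximum):
    """Banker's-algorithm safety check for tape-drive allocation.
    allocated[c] = drives client c currently holds;
    maximum[c]   = most drives client c may ever need at once.
    The state is safe iff the clients can all finish in some order."""
    free = available_drives
    done = [False] * len(allocated)
    while True:
        progressed = False
        for c in range(len(allocated)):
            # Client c can run to completion if its remaining need fits
            # in the currently free drives; it then releases its drives.
            if not done[c] and maximum[c] - allocated[c] <= free:
                free += allocated[c]
                done[c] = True
                progressed = True
        if all(done):
            return True
        if not progressed:
            return False   # the remaining clients could deadlock
```

A PVL using such a check would simulate each candidate mount before granting it and queue the request whenever the resulting state would be unsafe, which prevents the drive-contention deadlocks the abstract describes.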

  • A knowledge-based system approach for scientific data analysis and the notion of metadata

    Page(s): 274 - 283

    Over the last few years, dramatic advances in mass storage, for both secondary and tertiary storage, have made it possible to handle large amounts of data (for example, satellite data and complex scientific experiments). However, to make full use of these advances, metadata for data analysis and interpretation, as well as the complexity of managing and accessing large datasets through intelligent and efficient methods, are still considered the main challenges facing the information-science community when dealing with large databases. Scientific data must be analyzed and interpreted via metadata, which plays a descriptive role for the underlying data. Metadata can be partly defined a priori according to the domain of discourse under consideration (for example, atmospheric chemistry) and the conceptualization of the information system to be built. It may also be extracted, using learning methods, from time-series measurement and observation data. In this paper, a knowledge-based management system (KBMS) is presented for the extraction and management of metadata in order to bridge the gap between data and information. The KBMS is a component of an intelligent information system based upon a federated architecture, which also includes a database management system for time-series-oriented data and a visualization system.
  • Scientific data management in the Environmental Molecular Sciences Laboratory

    Page(s): 162 - 172

    The Environmental Molecular Sciences Laboratory (EMSL) is currently under construction at Pacific Northwest Laboratory (PNL) for the US Department of Energy (DOE). This laboratory will be used for molecular and environmental sciences research to identify comprehensive solutions to DOE's environmental problems. Major facilities within the EMSL include the Molecular Sciences Computing Facility (MSCF), a laser-surface dynamics laboratory, a high-field nuclear magnetic resonance (NMR) laboratory, and a mass spectrometry laboratory. The EMSL is scheduled to open early in 1997 and will house about 260 resident and visiting scientists. It is anticipated that at least six terabytes of data will be archived in the first year of operation. Both the size of individual datasets and the total amount of data each researcher will manage is expected to become unwieldy and overwhelming for researchers and archive administrators. An object-oriented database management system (OODBMS) and a mass storage system will be integrated to provide an intelligent, automated mechanism to manage data. The resulting system, called the DataBase Computer System (DBCS), will provide total scientific data management capabilities to EMSL users. This paper describes all efforts associated with DBCS-0 and DBCS-1, including software development, key lessons learned, and long-term goals.