
24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007)

Date: 24-27 September 2007

  • 24th IEEE Conference on Mass Storage Systems and Technologies - Title

    Page(s): i - iii
  • 24th IEEE Conference on Mass Storage Systems and Technologies - Copyright

    Page(s): iv
  • 24th IEEE Conference on Mass Storage Systems and Technologies - TOC

    Page(s): v - vii
  • Message from the Chairs

    Page(s): viii
  • Conference and Program Committees

    Page(s): ix
  • Preservation DataStores: Architecture for Preservation Aware Storage

    Page(s): 3 - 15

    The volumes of digital information are growing continuously, and most of today's information is "born digital". Alongside this trend, business, scientific, artistic and cultural needs require much of this information to be kept for decades, centuries or longer. The convergence of these two trends implies the need for storage systems that support very long-term preservation of digital information. We describe Preservation DataStores, a novel storage architecture to support digital preservation. It is a layered architecture that builds upon open standards, including the OAIS, XAM and OSD standards. This new architecture transforms the logical information-object, a basic concept in preservation systems, into a physical storage object. The transformation allows more robust and optimized implementations of preservation-aware storage. The architecture of Preservation DataStores is being developed as an infrastructure component of the CASPAR project and will be tested in the context of this project using scientific, cultural, and artistic data. (An illustrative sketch follows this entry.)

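    The core idea above - turning a logical information-object into a single self-describing physical storage object - can be sketched as follows. This is an illustrative sketch only (class and field names are hypothetical, not the CASPAR/PDS implementation); it bundles OAIS-style content, representation information, provenance and a fixity digest into one unit handed to the storage layer.

        import hashlib, json

        class PreservationObject:
            """Hypothetical bundle of content plus the metadata needed to preserve it."""
            def __init__(self, content, representation_info, provenance):
                self.content = content
                self.metadata = {
                    "representation_info": representation_info,  # how to interpret the bits
                    "provenance": provenance,                     # where the data came from
                    "fixity_sha256": hashlib.sha256(content).hexdigest(),  # integrity check
                }

            def to_storage_object(self):
                """Serialize metadata and content together as one storage object."""
                header = json.dumps(self.metadata).encode()
                return len(header).to_bytes(4, "big") + header + self.content

        obj = PreservationObject(b"raw instrument data",
                                 {"format": "FITS", "version": "3.0"},
                                 {"producer": "observatory-X"})
        blob = obj.to_storage_object()   # one self-contained unit handed to the OSD layer
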
  • GreenStor: Application-Aided Energy-Efficient Storage

    Page(s): 16 - 29

    The volume of online data content has shown unprecedented growth in recent years. Fueling this growth are new federal regulations that warrant longer data retention and a general increase in the richness of data content. To cope with this growth, high-performance computing and enterprise environments are making use of large disk-based solutions that, unlike tape-based solutions, consume power all the time. As a consequence, the energy consumption of storage solutions has grown significantly. In this work we propose a storage solution called GreenStor, which makes use of application hinting on top of massive arrays of idle disks (MAID) to improve energy efficiency. GreenStor is centered on MAID, but with more efficient data movement to aid energy conservation. Specifically, we propose an extent-based metadata manager that achieves better space efficiency without sacrificing cache utilization, and an opportunistic scheduling scheme that makes better use of application hints in a MAID system. Results show that our proposed opportunistic scheme for application-hint scheduling consumes up to 40% less energy than traditional non-MAID storage solutions, whereas standard schemes for scheduling application hints on typical MAID systems achieve smaller energy savings of about 25% versus non-MAID storage. (An illustrative sketch of the hint scheduler follows this entry.)

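    A minimal sketch of the "opportunistic" use of application hints on a MAID array, under the assumption (ours, not necessarily the paper's) that hinted prefetches are served only while their target disk is already spinning for foreground I/O, so hints never force an extra spin-up. Names are illustrative.

        from collections import defaultdict, deque

        class OpportunisticHintScheduler:
            def __init__(self):
                self.pending = defaultdict(deque)   # disk id -> queued hinted requests

            def add_hint(self, disk_id, block):
                self.pending[disk_id].append(block)

            def on_demand_io(self, disk_id, serve):
                """Called while disk_id is spun up for a foreground request:
                piggyback any deferred hints on the already-active disk."""
                while self.pending[disk_id]:
                    serve(disk_id, self.pending[disk_id].popleft())

        sched = OpportunisticHintScheduler()
        sched.add_hint(3, block=1024)                     # application hints a future read
        sched.on_demand_io(3, lambda d, b: print(d, b))   # hint drained while disk 3 is active
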
  • Modeling the Impact of Checkpoints on Next-Generation Systems

    Page(s): 30 - 46

    The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands of processors. For application-driven, periodic checkpoint operations, the state of the art does not provide a solution that scales to next-generation systems. We demonstrate this by using mathematical modeling to compute a lower bound on the impact of these approaches on the performance of applications executed on three massive-scale, in-production DOE systems and a theoretical petaflop system. We also adapt the model to investigate a proposed optimization that makes use of "lightweight" storage architectures and overlay networks to overcome the storage-system bottleneck. Our results indicate that (1) as we approach the scale of next-generation systems, traditional checkpoint/restart approaches will increasingly impact application performance, accounting for over 50% of total application execution time; (2) although our alternative approach improves performance, it has limitations of its own; and (3) there is a critical need for new approaches to fault tolerance that allow continuous computing with minimal impact on application scalability. (An illustrative first-order model follows this entry.)

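    For orientation, a classical first-order model from the checkpoint/restart literature (Young's approximation) captures the trade-off the authors quantify; it is illustrative and not necessarily the model used in the paper. With checkpoint interval tau, per-checkpoint cost delta, and system mean time to interrupt M:

        % Fraction of wall-clock time lost to writing checkpoints plus expected
        % recomputation after a failure (first-order approximation):
        \[
          \mathrm{overhead}(\tau) \;\approx\; \frac{\delta}{\tau} + \frac{\tau + \delta}{2M},
          \qquad
          \tau_{\mathrm{opt}} \;\approx\; \sqrt{2\,\delta M}.
        \]
        % As systems scale out, M shrinks (more components can fail) while \delta grows
        % with the state pushed through the storage system, so even the optimal
        % overhead grows quickly, which is the effect the paper's model quantifies.
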
  • Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations

    Page(s): 47 - 59

    Storage management is one of the most important enabling technologies for large-scale scientific investigations. Having to deal with multiple heterogeneous storage and file systems is one of the major bottlenecks in managing, replicating, and accessing files in distributed environments. Storage resource managers (SRMs), named after their Web services control protocol, provide the technology needed to manage the rapidly growing distributed data volumes that result from faster and larger computational facilities. SRMs are grid storage services providing interfaces to storage resources, as well as advanced functionality such as dynamic space allocation and file management on shared storage systems. They call on transport services to bring files into their space transparently and provide effective sharing of files. SRMs are based on a common specification that emerged over time and evolved into an international collaboration. This approach of an open specification that various institutions can adapt to their own storage systems has proven to be a remarkable success - the challenge has been to provide a consistent, homogeneous interface to the grid while allowing sites to have diverse infrastructures. In particular, supporting optional features while preserving interoperability is one of the main challenges we describe in this paper. We also describe the use of SRM in a large international high-energy physics collaboration, WLCG, to prepare to handle the large volume of data expected when the Large Hadron Collider (LHC) goes online at CERN. This intense collaboration led to refinements and additional functionality in the SRM specification, and to the development of multiple interoperating implementations of SRM for various complex multi-component storage systems.

  • Grid-Enabled Standards-based Data Management

    Page(s): 60 - 71

    The world's largest scientific machine - the Large Hadron Collider (LHC), situated outside Geneva, Switzerland - will generate some 15 PB of data per year of operation, written to tape at rates of up to 1.5 GB/s (in the case of the heavy-ion experiment ALICE). The processing of this data will be performed using a worldwide grid, the Worldwide LHC Computing Grid, built on top of the Enabling Grids for E-sciencE and Open Science Grid infrastructures. The LHC Computing Grid, which has offered a service for over two years now, is based upon a tier model comprising some 150 sites in tens of countries. In this paper, we describe the data management middleware stack - one of the key services provided by data grids. We give an overview of the different services implemented: a disk-based storage system that can support encryption, tools to manage the storage system and access files, the LCG file catalogue, and the file transfer service. We also review the relationships between these services.

  • Quota enforcement for high-performance distributed storage systems

    Page(s): 72 - 86

    Storage systems manage quotas to ensure that no single user can consume more than their share of storage, and that each user gets the storage they need. This is difficult for large, distributed systems, especially those used for high-performance computing applications, because resource allocation occurs on many nodes concurrently. While quota management is an important problem, no robust, scalable solutions have been proposed to date. We present a solution that has less than 0.2% performance overhead while the system is below saturation, compared with not enforcing quota at all. It provides byte-level accuracy at all times in the absence of failures and cheating; if nodes fail or cheat, we recover within a bounded period. In our scheme, quota is enforced asynchronously by intelligent storage servers: storage clients contact a shared management service to obtain vouchers, which the clients can spend like cash at participating storage servers to allocate storage space. Like a digital cash system, the system periodically reconciles voucher usage to ensure that clients do not cheat by spending the same voucher at multiple storage servers. We report on a simulation study that validates this approach and evaluates its performance. (An illustrative sketch of the voucher scheme follows this entry.)

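    A minimal sketch of the voucher idea, assuming HMAC-signed vouchers and a trivial reconciliation pass (function and key names are hypothetical; the paper's protocol and recovery details are more involved):

        import hmac, hashlib, uuid

        SECRET = b"shared-manager-server-key"   # hypothetical key known to manager and servers

        def issue_voucher(user, byte_limit):
            vid = uuid.uuid4().hex
            msg = "{}:{}:{}".format(vid, user, byte_limit).encode()
            return {"id": vid, "user": user, "limit": byte_limit,
                    "tag": hmac.new(SECRET, msg, hashlib.sha256).hexdigest()}

        def server_accept(voucher, nbytes):
            msg = "{}:{}:{}".format(voucher["id"], voucher["user"], voucher["limit"]).encode()
            ok = hmac.compare_digest(voucher["tag"],
                                     hmac.new(SECRET, msg, hashlib.sha256).hexdigest())
            return ok and nbytes <= voucher["limit"]   # allocate without contacting the manager

        def reconcile(spend_reports):
            """Manager-side pass over (server, voucher_id) pairs: the same voucher id
            seen at two different servers indicates double-spending."""
            seen = {}
            for server, vid in spend_reports:
                if vid in seen and seen[vid] != server:
                    yield vid                           # flag for bounded-time recovery
                seen.setdefault(vid, server)
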
  • Implementing and Evaluating Security Controls for an Object-Based Storage System

    Page(s): 87 - 99

    This paper presents the implementation and performance evaluation of a real, secure object-based storage system compliant with the T10 OSD standard. In contrast to previous work, our system implements all three security methods of the OSD security protocol defined in the standard - CAPKEY, CMDRSP and ALLDATA - as well as an Oakley-based authentication protocol by which the Metadata Server (MDS) and client can be sure of each other's identities. Moreover, our system supports concurrent operations from multiple clients to multiple OSDs. The MDS, a combination of security manager and storage/policy manager, performs access control, global namespace management, and concurrency control. We also evaluate the performance and scalability of our implementation and compare it with iSCSI, NFS and Lustre storage configurations. The overhead of access control is small: compared with the same system without any security mechanism, bandwidth for reads and writes with the CAPKEY and CMDRSP methods decreases by less than 5%, while latency for metadata operations with any of the security methods increases by less than 0.3 ms (5%). The system with the ALLDATA method suffers a higher performance penalty: large sequential accesses run at 46% and 52% of the maximum bandwidth of unsecured storage for reads and writes respectively. The aggregate throughput scales with the number of OSDs (up to 8 in our experiments). The overhead of the SET KEY commands for frequently refreshed partition and working keys is less than 2 ms. (An illustrative sketch follows this entry.)

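    A simplified sketch in the spirit of the CAPKEY method: the security manager derives a validation tag over the capability with a key it shares with the OSD, and the OSD recomputes the tag before honoring a request. Field names and encoding are illustrative, not the T10 wire format.

        import hmac, hashlib

        OSD_SECRET = b"key-shared-by-security-manager-and-osd"   # hypothetical shared key

        def make_credential(capability):
            encoded = repr(sorted(capability.items())).encode()
            tag = hmac.new(OSD_SECRET, encoded, hashlib.sha1).hexdigest()
            return {"capability": capability, "tag": tag}

        def osd_check(credential, requested_op, obj):
            cap = credential["capability"]
            encoded = repr(sorted(cap.items())).encode()
            expected = hmac.new(OSD_SECRET, encoded, hashlib.sha1).hexdigest()
            return (hmac.compare_digest(expected, credential["tag"])   # capability unaltered
                    and cap["object"] == obj
                    and requested_op in cap["ops"])

        cred = make_credential({"object": 42, "ops": ("READ",), "expires": 1700000000})
        assert osd_check(cred, "READ", 42)
        assert not osd_check(cred, "WRITE", 42)   # operation not granted by the capability
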
  • Trustworthy Migration and Retrieval of Regulatory Compliant Records

    Page(s): 100 - 113

    Compliance storage servers are designed to meet organizational needs for trustworthy records retention, largely mandated by recent legislation such as HIPAA, SEC Rule 17a, and the Sarbanes-Oxley Act. These devices export a file-system-level interface and enforce write-once, read-many (WORM) semantics for file access. Compliance storage protects records from alteration as long as they remain on the same storage server. However, the decades-long records-retention requirements of recent legislation mean that a compliance storage server will often be obsolete long before the documents it contains can be destroyed. Unfortunately, records will be vulnerable to change during migration to a new server. Records are also vulnerable during retrieval, when they are taken off the server and "migrated" to the person or organization who needs them. In this paper, we propose techniques for trustworthy document migration and retrieval that enhance the storage servers with the capability to sign their files and directories. The proposed techniques can be used to verify that a migration was carried out properly, even across multiple migrations, deletions of expired documents, and changes in the content and structure of migrated directories. In our approach, file writers incur no performance penalty, which is important since compliance workloads are write-intensive. Migration incurs a reasonable 5-10% space overhead and requires 24 msec of processing time per file. The result of the migration can be verified at a rate of 24 msec per file by a trustworthy auditor (or ordinary user), who can then generate a certificate attesting to the correctness of the migration. (An illustrative sketch follows this entry.)

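    An illustrative sketch of the verification idea, assuming a keyed digest over a per-file manifest (the paper uses signatures over files and directories and supports deletions and restructuring, which this sketch does not):

        import hashlib, hmac, json, os

        SERVER_KEY = b"compliance-server-signing-key"   # stands in for a real signature key

        def manifest_for(root):
            """Map each file's relative path to a SHA-256 digest of its contents."""
            entries = {}
            for dirpath, _, files in os.walk(root):
                for name in sorted(files):
                    path = os.path.join(dirpath, name)
                    with open(path, "rb") as f:
                        entries[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
            return entries

        def sign(entries):
            blob = json.dumps(entries, sort_keys=True).encode()
            return hmac.new(SERVER_KEY, blob, hashlib.sha256).hexdigest()

        def audit(target_root, entries, tag):
            """Auditor: check the manifest's tag, then recompute digests on the target."""
            return hmac.compare_digest(tag, sign(entries)) and manifest_for(target_root) == entries
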
  • Capability based Secure Access Control to Networked Storage Devices

    Page(s): 114 - 128

    Today, access control security for storage area networks (zoning and masking) is implemented by mechanisms that are inherently insecure and tied to the physical network components. However, what we want to secure is at a higher logical level, independent of the transport network; raising security to a logical level simplifies management, provides a more natural fit to a virtualized infrastructure, and enables finer-grained access control. In this paper, we describe the problems with existing access control security solutions and present our approach, which leverages the OSD (Object-based Storage Device) security model to provide logical, cryptographically secured, in-band access control for today's existing devices. We then show how this model can easily be integrated into existing systems and demonstrate that this in-band security mechanism has negligible performance impact while simplifying management, providing a clean match to compute virtualization, and enabling fine-grained access control.

  • Enabling database-aware storage with OSD

    Page(s): 129 - 142

    The ANSI object-based storage device (OSD) standard is a major step toward enabling explicit application-awareness in storage systems behind a standard, fully interoperable interface [3]. In this paper, we explore a particular flavor of application-awareness, that of database applications. We describe the design and implementation of a database-aware storage system that uses the OSD interface not only as a means to access data, but also to permit explicit communication between the application and the storage system. This communication is significant, as it enables our storage system to transparently optimize data placement and request scheduling. We demonstrate that OSD makes it practical to improve storage performance in these ways without exposing proprietary disk drive parameters to application code, and without labor-intensive, fragile parameter measurement.

  • Design and Implementation of a Network Aware Object-based Tape Device

    Page(s): 143 - 156

    Data storage requirements have evolved constantly over time. In recent years the volume of data has grown exponentially, driven partly by regulations and partly by the increasing richness of data. This trend has led to an equally explosive increase in the cost of management. Intelligent storage devices built using object-based storage (OSD) interfaces have gained increased acceptance due to the benefits of reduced management costs. The command set of the current OSD standard does not work well with tape-based storage solutions. In this work we propose a few extensions to the OSD standard to facilitate easier integration of tape devices into the object storage ecosystem. Further, we propose an intelligent buffering mechanism to maximize the utility of network-attached tape devices, and mechanisms to make tape cartridges more portable within any object storage ecosystem. Our results for the intelligent buffering mechanism show that our scheme adapts better to mismatches between network and tape-drive bandwidths. Specifically, our scheme helps minimize repositioning of the tape, leading to better utilization and increased lifetime of the tape.

  • Providing Quality of Service Support in Object-Based File System

    Page(s): 157 - 170

    Bourbon is a quality-of-service framework designed to work with the Ceph object-based storage system. Ceph is a highly scalable distributed file system that can scale up to tens of thousands of object-based storage devices (OSDs). The Bourbon framework enables Ceph to become QoS-aware by providing the capability to isolate performance between different classes of workloads. The framework is enabled by Q-EBOFS, a QoS-aware enhancement of the EBOFS object-based file system. Q-EBOFS allows individual OSDs to become QoS-aware, and by leveraging the random element of the CRUSH data distribution algorithm employed by Ceph, a collection of independent QoS-aware OSDs can provide class-based performance isolation at the global level. This preserves the highly scalable nature of Ceph by avoiding the introduction of any centralized components or the need to collect and propagate global state information. This paper presents the Bourbon framework by first describing Q-EBOFS and then examining how a collection of OSDs running Q-EBOFS can work together to provide global-level QoS. (An illustrative sketch follows this entry.)

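    A minimal sketch of per-OSD, class-based performance isolation via proportional sharing; the class names and scheduling policy are illustrative and much simpler than Q-EBOFS:

        from collections import deque

        class ClassScheduler:
            def __init__(self, weights):            # e.g. {"interactive": 3, "batch": 1}
                self.weights = weights
                self.queues = {c: deque() for c in weights}
                self.served = {c: 0 for c in weights}

            def submit(self, cls, request):
                self.queues[cls].append(request)

            def dispatch(self):
                backlogged = [c for c in self.queues if self.queues[c]]
                if not backlogged:
                    return None
                # serve the backlogged class that is furthest behind its weighted share,
                # so no class can starve the others
                cls = min(backlogged, key=lambda c: self.served[c] / self.weights[c])
                self.served[cls] += 1
                return self.queues[cls].popleft()
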
  • Efficient Logging and Replication Techniques for Comprehensive Data Protection

    Page(s): 171 - 184

    Mariner is an iSCSI-based storage system designed to provide comprehensive data protection on commodity ATA disk and gigabit Ethernet technologies while offering the same performance as systems without such protection. In particular, Mariner supports continuous data protection (CDP), which allows every disk update within a time window to be undone, and local/remote mirroring to guard data against machine/site failures. To minimize the performance overhead associated with CDP, Mariner employs a modified track-based logging technique that unifies the long-term logging required for CDP and the short-term logging used for low-latency disk writes. This new logging technique strikes an optimal balance among log space utilization, disk write latency, and ease of access to historical data. To reduce the performance penalty of the physical data replication used in local/remote mirroring, Mariner features a modified two-phase commit protocol built on top of a novel transparent reliable multicast (TRM) mechanism designed specifically for Ethernet-based storage area networks. Without flooding the network, TRM keeps the network traffic load of reliable N-way replication roughly at the same level as the no-replication case, regardless of the value of N. Empirical performance measurements on the first Mariner prototype, built from gigabit Ethernet and ATA disks, show that the average end-to-end latency of a 4 KB iSCSI write is under 1.2 msec with data logging and replication both turned on.

  • Partial Disk Failures: Using Software to Analyze Physical Damage

    Page(s): 185 - 198

    A good understanding of disk failures is crucial to ensuring reliable storage of data. There have been numerous studies characterizing disk failures under the common assumption that failed disks are generally unusable. Contrary to this assumption, partial disk failures are very common, e.g., those caused by a head crash that leaves a small number of disk sectors inaccessible. The damage can nevertheless be catastrophic if file-system metadata were among the affected sectors. As disk density rapidly increases, the likelihood of losing data also rises. This paper describes our experience in analyzing partial disk failures using the physical locations of damaged disk sectors to assess the extent and characteristics of the damage on disk platter surfaces. Based on our findings, we propose several fault-tolerance techniques to proactively guard against permanent data loss due to partial disk failures.

  • RAIF: Redundant Array of Independent Filesystems

    Page(s): 199 - 214

    Storage virtualization and data management are well-known problems for individual users as well as large organizations. Existing storage-virtualization systems either do not support a complete set of possible storage types, do not provide flexible data-placement policies, or do not support per-file conversion (e.g., encryption). This results in suboptimal utilization of resources, inconvenience, low reliability, and poor performance. We have designed a stackable file system called Redundant Array of Independent Filesystems (RAIF). It combines the data survivability and performance benefits of traditional RAID with the flexibility of composition and ease of development of stackable file systems. RAIF can be mounted on top of directories and thus on top of any combination of network, distributed, disk-based, and memory-based file systems. Individual files can be replicated, striped, or stored with erasure-correction coding on any subset of the underlying file systems. RAIF has performance similar to RAID. In configurations with parity, RAIF's write performance is better than that of driver-level and even entry-level hardware RAID systems, because RAIF has better control over data and parity caching. (An illustrative sketch follows this entry.)

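    An illustrative user-level sketch of the placement idea (RAIF itself is a stackable kernel file system): one file is striped across several underlying directories, each of which could be a different mounted file system, with an XOR parity chunk on a dedicated branch so the file survives the loss of any one branch.

        import os
        from functools import reduce

        def raif_write(name, data, branches, chunk=4096):
            """Stripe data across branches[:-1]; branches[-1] holds the XOR parity chunk."""
            data_branches, parity_branch = branches[:-1], branches[-1]
            pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]
            for stripe_no in range(0, len(pieces), len(data_branches)):
                stripe = pieces[stripe_no:stripe_no + len(data_branches)]
                padded = [p.ljust(chunk, b"\0") for p in stripe]          # pad short chunks
                parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), padded)
                for branch, piece in zip(data_branches, stripe):
                    with open(os.path.join(branch, "%s.%d" % (name, stripe_no)), "wb") as f:
                        f.write(piece)
                with open(os.path.join(parity_branch, "%s.%d" % (name, stripe_no)), "wb") as f:
                    f.write(parity)
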
  • TPT-RAID: a High Performance Box-Fault Tolerant Storage System

    Page(s): 215 - 220

    TPT-RAID is a multi-box RAID wherein each ECC group comprises at most one block from any given storage box, and can thus tolerate a box failure. It extends the idea of an out-of-band SAN controller into the RAID: data is sent directly between hosts and targets and among targets, and the RAID controller supervises ECC calculation by the targets. By preventing a communication bottleneck in the controller, excellent scalability is achieved while retaining the simplicity of centralized control. TPT-RAID, whose controller can be a software module within an out-of-band SAN controller, moreover conforms to a conventional switched network architecture, whereas an in-band RAID controller would either constitute a communication bottleneck or would have to also be a full-fledged router. The design is validated in an InfiniBand-based prototype using iSCSI and iSER, and required changes to the relevant protocols are introduced.

  • Tornado Codes for MAID Archival Storage

    Page(s): 221 - 226

    This paper examines the application of Tornado codes, a class of low-density parity-check (LDPC) erasure codes, to archival storage systems based on massive arrays of idle disks (MAID). We present a log-structured, extent-based archival file system built on Tornado-coded stripe storage. The file system is combined with a MAID simulator to emulate the behavior of a large-scale storage system, with the goal of employing Tornado codes to provide fault tolerance and performance in a power-constrained environment. The effect of power-conservation constraints on system throughput is examined, and a policy of placing multiple data nodes on a single device is shown to increase read throughput at the cost of a measurable, but negligible, decrease in fault tolerance. Finally, a system prototype is implemented on a 100 TB Lustre storage cluster, providing GridFTP-accessible storage with higher reliability and availability than the underlying storage architecture. (An illustrative sketch follows this entry.)

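    A sketch of the XOR construction underlying LDPC/Tornado-style erasure codes. Real Tornado codes use carefully designed irregular degree distributions and cascaded levels; here each check block is simply the XOR of a small random subset of data blocks, and decoding peels checks with exactly one missing input.

        import random

        def encode(data_blocks, n_checks, degree=3, seed=1):
            rng = random.Random(seed)
            checks = []
            for _ in range(n_checks):
                members = rng.sample(range(len(data_blocks)), degree)
                block = bytes(len(data_blocks[0]))            # all-zero accumulator
                for m in members:
                    block = bytes(a ^ b for a, b in zip(block, data_blocks[m]))
                checks.append((members, block))
            return checks

        def decode(partial, checks, n_blocks):
            """partial: {index: block} for the data blocks that survived the erasures."""
            progress = True
            while progress and len(partial) < n_blocks:
                progress = False
                for members, block in checks:
                    missing = [m for m in members if m not in partial]
                    if len(missing) == 1:                      # peel one unknown per check
                        for m in members:
                            if m in partial:
                                block = bytes(a ^ b for a, b in zip(block, partial[m]))
                        partial[missing[0]] = block
                        progress = True
            return partial
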
  • Cryptographic Security for a High-Performance Distributed File System

    Page(s): 227 - 232

    Storage systems are increasingly subject to attacks. Cryptographic file systems mitigate the danger of exposing data by using encryption and integrity-protection methods, and they guarantee end-to-end security for their clients. This paper describes a generic design for cryptographic file systems and its realization in a distributed storage-area network (SAN) file system. Key management is integrated with the metadata service of the SAN file system. The implementation supports file encryption as well as integrity protection through hash trees. Both techniques have been implemented in the client file-system driver. Benchmarks demonstrate that the overhead is noticeable for some artificially constructed use cases, but very small for typical file-system applications. (An illustrative sketch follows this entry.)

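    A minimal hash-tree (Merkle tree) sketch of the integrity-protection technique mentioned above; in the paper the root is managed through the SAN file system's metadata service, which this sketch omits:

        import hashlib

        def _h(x):
            return hashlib.sha256(x).digest()

        def merkle_root(blocks):
            level = [_h(b) for b in blocks]
            while len(level) > 1:
                if len(level) % 2:
                    level.append(level[-1])              # duplicate the odd node out
                level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            return level[0]

        blocks = [b"block-%d" % i for i in range(8)]
        root = merkle_root(blocks)                       # stored with the file's metadata
        assert merkle_root(blocks) == root
        blocks[3] = b"tampered"
        assert merkle_root(blocks) != root               # any modification changes the root
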
  • Implementation and Evaluation of a Popularity-Based Reconstruction Optimization Algorithm in Availability-Oriented Disk Arrays

    Page(s): 233 - 238

    In this paper, we incorporate a popularity-based, multi-threaded reconstruction optimization algorithm, PRO, into the recovery mechanism of the Linux software RAID (MD), a well-known and widely used availability-oriented disk array scheme. To evaluate the impact of PRO on RAID-structured storage systems such as MD, we conduct extensive trace-driven experiments. Our results demonstrate PRO's significant performance advantage over existing reconstruction schemes, especially on a RAID-5 disk array, in terms of measured reconstruction time and response time. (An illustrative sketch follows this entry.)

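    An illustrative sketch of the popularity-guided idea (not the MD/PRO code): rebuild the most frequently accessed stripes of the failed disk first, so foreground reads are more likely to hit already-reconstructed data and the user-visible degraded window shrinks:

        import heapq
        from collections import Counter

        def reconstruction_order(num_stripes, recent_accesses):
            popularity = Counter(recent_accesses)             # stripe id -> access count
            heap = [(-popularity[s], s) for s in range(num_stripes)]   # max-heap by popularity
            heapq.heapify(heap)
            while heap:
                _, stripe = heapq.heappop(heap)
                yield stripe                                   # rebuild this stripe next

        order = list(reconstruction_order(6, recent_accesses=[2, 2, 2, 5, 5, 0]))
        # -> stripe 2 first, then 5, then 0, then the never-accessed stripes
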
  • Performance Evaluation of Multiple TCP connections in iSCSI

    Page(s): 239 - 244

    Storage area networks (SANs) based on Fibre Channel have been used extensively in the last decade, while iSCSI is fast becoming a serious contender due to its reduced costs and unified infrastructure. This work examines the performance of iSCSI with multiple TCP connections. Multiple TCP connections are often used to realize higher bandwidth, but there may be no fairness in how bandwidth is distributed among them. We propose Fair-TCP, a mechanism that shares congestion information across multiple flows for improved performance. Our results show that Fair-TCP significantly improves performance for I/O-intensive workloads. (An illustrative sketch follows this entry.)

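    A sketch of the shared-congestion-state idea behind Fair-TCP, with a hypothetical API (not the authors' kernel implementation): the connections of one iSCSI session share a single congestion window, so a loss seen by any flow slows the whole aggregate, and the window is divided evenly among the flows:

        class SharedCongestionState:
            def __init__(self, n_flows, mss=1460):
                self.n_flows = n_flows
                self.mss = mss
                self.cwnd = mss * n_flows        # one aggregate window for the whole session

            def per_flow_window(self):
                return self.cwnd // self.n_flows # each connection gets an equal share

            def on_ack(self):                    # additive increase on the aggregate
                self.cwnd += self.mss * self.mss // self.cwnd

            def on_loss(self):                   # multiplicative decrease felt by all flows
                self.cwnd = max(self.mss, self.cwnd // 2)
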