Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on

Date 3-7 May 2010

  • [Front cover]

    Page(s): 1 - 2
  • [Front matter]

    Page(s): 1 - 2
  • Table of contents

    Page(s): 1 - 3
  • An adaptive partitioning scheme for DRAM-based cache in Solid State Drives

    Page(s): 1 - 12

    Recently, NAND flash-based Solid State Drives (SSDs) have been rapidly adopted in laptops, desktops, and server storage systems because their performance is superior to that of traditional magnetic disks. However, NAND flash memory has limitations such as out-of-place updates, bulk erase operations, and a limited number of write operations. To alleviate these unfavorable characteristics, various techniques for improving internal software and hardware components have been devised. In particular, the internal device cache of an SSD has a significant impact on performance. The device cache is used for two main purposes: to absorb frequent read/write requests and to store logical-to-physical address mapping information. We observed that the optimal ratio between the data buffering and address mapping space in the device cache changes with workload characteristics. To achieve optimal performance, the device cache should be appropriately partitioned between these two purposes. In this paper, we propose an adaptive partitioning scheme, based on a ghost caching mechanism, that adaptively tunes the ratio of buffering to mapping space in the device cache according to workload characteristics. Simulation results demonstrate that the performance of the proposed scheme closely approaches the best achievable performance.

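
    The ghost-caching idea behind such adaptive partitioning can be sketched as follows. This is a minimal illustrative model, not the authors' implementation: the LRU policy, ghost-list size, and one-entry adaptation step are all assumptions. A ghost list remembers recently evicted keys; a hit in a region's ghost list is evidence that region was sized too small, so its target share of the shared cache is nudged up (the same feedback principle ARC uses).

```python
from collections import OrderedDict

class AdaptiveCache:
    """Toy model of ghost-cache-driven partitioning of one fixed-size device
    cache between a data-buffer region ('buf') and a mapping-table region
    ('map'). A ghost hit grows the corresponding region's target share."""

    def __init__(self, capacity, ghost_size=64):
        self.capacity = capacity
        self.target_buf = capacity // 2        # target size of the buffer region
        self.buf, self.map = OrderedDict(), OrderedDict()
        self.ghost_buf, self.ghost_map = OrderedDict(), OrderedDict()
        self.ghost_size = ghost_size

    def _evict(self, region, ghost):
        key, _ = region.popitem(last=False)    # LRU entry out
        ghost[key] = None                      # remember it in the ghost list
        while len(ghost) > self.ghost_size:
            ghost.popitem(last=False)

    def access(self, key, kind):
        region, ghost = ((self.buf, self.ghost_buf) if kind == 'buf'
                         else (self.map, self.ghost_map))
        if key in region:                      # real hit
            region.move_to_end(key)
            return True
        if key in ghost:                       # ghost hit: this region is too small
            del ghost[key]
            if kind == 'buf':
                self.target_buf = min(self.capacity - 1, self.target_buf + 1)
            else:
                self.target_buf = max(1, self.target_buf - 1)
        region[key] = None                     # miss: insert the new entry
        while len(self.buf) + len(self.map) > self.capacity:
            if len(self.buf) > self.target_buf:
                self._evict(self.buf, self.ghost_buf)
            else:
                self._evict(self.map, self.ghost_map)
        return False
```

    Under a buffer-heavy workload the buffer region's ghost hits steadily shift the partition toward buffering, and vice versa.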
  • High performance solid state storage under Linux

    Page(s): 1 - 12

    Solid state drives (SSDs) allow single-drive performance that is far greater than disks can produce. Their low latency and potential for parallel operations mean that they are able to read and write data at speeds that strain operating system I/O interfaces. Additionally, their performance characteristics expose gaps in existing benchmarking methodologies. We discuss the impact on Linux system design of a prototype PCI Express SSD that operates at least an order of magnitude faster than most drives available today. We develop benchmarking strategies and focus on several areas where current Linux systems need improvement, and suggest methods of taking full advantage of such high-performance solid state storage. We demonstrate that an SSD can perform with high throughput, high operation rates, and low latency under the most difficult conditions. This suggests that high-performance SSDs can dramatically improve parallel I/O performance for future high performance computing (HPC) systems.

  • Achieving page-mapping FTL performance at block-mapping FTL cost by hiding address translation

    Page(s): 1 - 12

    The Flash Translation Layer (FTL) is one of the most important components of an SSD; its main purpose is to perform logical-to-physical address translation in a way that suits the unique physical characteristics of Flash memory. The pure page-mapping FTL scheme, arguably the best FTL scheme due to its ability to map any logical page number (LPN) to any physical page number (PPN) to minimize erase operations, cannot be practically deployed because it consumes a prohibitively large RAM (SRAM or DRAM) space to store the page-mapping table for an SSD of moderate to large size. Alternatives to pure page mapping, such as block-mapping FTLs, hybrid FTLs (e.g., FAST) and the latest demand-based page-mapping FTLs (e.g., DFTL), require significantly less RAM but suffer from performance issues. Block-mapping FTLs perform poorly and incur higher erasure counts, particularly under random write workloads. Hybrid FTL schemes incur costly merge operations that hurt performance and increase erasure counts. The performance of demand-based FTLs depends heavily on workload characteristics such as access locality, read/write ratio, and request inter-arrival time. This paper proposes a new FTL scheme, called HAT, that achieves the performance of a pure page-mapping FTL at the RAM cost of a block-mapping FTL, while consuming less energy, by hiding the address translation (HAT). The basic idea behind our scheme is to create a separate access path for reading and writing the address mapping information, significantly hiding the address-translation latency by incorporating a low-energy solid-state memory device that stores the entire page-mapping table. We implement an SSD simulator, SSDsim, to validate our HAT design and evaluate its performance. Extensive trace-driven simulation results show that the performance of HAT is within 0.8% of the pure page-mapping FTL, while consuming about 50% of the energy.

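
    The pure page-mapping scheme the abstract refers to can be sketched in a few lines; the names and structure below are illustrative, not the HAT design itself. The RAM cost it mentions is easy to see: a 1 TB SSD with 4 KB pages needs 2^28 mapping entries, roughly a gigabyte of table at 4 bytes per entry.

```python
class PageMappingFTL:
    """Minimal sketch of a pure page-mapping FTL: any logical page number
    (LPN) can map to any physical page number (PPN), and every update goes
    out-of-place to a fresh physical page."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                        # LPN -> PPN: the RAM-hungry table
        self.free = list(range(num_physical_pages))
        self.invalid = set()                     # PPNs awaiting garbage collection

    def write(self, lpn):
        if lpn in self.mapping:                  # out-of-place update:
            self.invalid.add(self.mapping[lpn])  # the old copy becomes garbage
        ppn = self.free.pop(0)                   # program a fresh page
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.mapping.get(lpn)             # None if never written
```

    Because no page is ever rewritten in place, erases are deferred to garbage collection of the `invalid` set, which is why this mapping minimizes erase operations.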
  • Block storage listener for detecting file-level intrusions

    Page(s): 1 - 12

    An intrusion detection system (IDS) is usually located and operated at the host, where it captures local suspicious events, or at an appliance that listens to network activity. Providing an online IDS at the storage controller is essential for dealing with compromised hosts or coordinated attacks by multiple hosts. SAN block storage controllers are connected to the world via block-level protocols, such as iSCSI and Fibre Channel. Usually, block-level storage systems do not maintain information specific to the file system using them, so the range of threats that can be handled at the block level is limited. A file system view at the controller, together with the knowledge of which arriving block belongs to which file or inode, enables the detection of file-level threats. In this paper, we present IDStor, an IDS for block-based storage. IDStor acts as a listener to storage traffic, outside the controller's I/O path, and is therefore attractive for integration into existing SAN-based storage solutions. IDStor maintains a block-to-file mapping that is updated online. Using this mapping, IDStor infers the semantics of file-level commands from the intercepted block-level operations, thereby detecting file-level intrusions by merely observing the block read and write commands passing between the hosts and the controller.

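
    The core block-to-file inference can be sketched as below. The rule format, the `protected` set, and how the mapping is learned are assumptions for illustration; IDStor builds its mapping by parsing file-system metadata in the observed block stream.

```python
class BlockListener:
    """Toy sketch of the IDStor idea: attribute raw block I/O to files via a
    block-to-file map, then check file-level policy on block-level traffic."""

    def __init__(self):
        self.block_to_file = {}    # block address -> file name
        self.protected = set()     # files whose modification should raise an alert
        self.alerts = []

    def learn(self, filename, blocks):
        """Record which blocks belong to which file (from observed metadata)."""
        for b in blocks:
            self.block_to_file[b] = filename

    def observe(self, op, block):
        """Intercept one block-level command and infer its file-level meaning."""
        name = self.block_to_file.get(block)
        if op == 'write' and name in self.protected:
            self.alerts.append(name)   # file-level intrusion inferred from a block write
        return name
```
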
  • Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs

    Page(s): 1 - 14

    Large-scale storage systems require multi-disk fault tolerant erasure codes. Replication and RAID extensions that protect against two- and three-disk failures offer a stark tradeoff between how much data must be stored and how much data must be read to recover a failed disk. Flat XOR-codes (erasure codes in which parity disks are calculated as the XOR of some subset of the data disks) offer a tradeoff between these extremes. In this paper, we describe constructions of two novel flat XOR-codes, the Stepped Combination and HD-Combination codes. We describe an algorithm for flat XOR-codes that enumerates recovery equations, i.e., sets of disks that can recover a failed disk. We also describe two algorithms for flat XOR-codes that generate recovery schedules, i.e., sets of recovery equations that can be used in concert to achieve efficient recovery. Finally, we analyze the key storage properties of many flat XOR-codes and of MDS codes such as replication and RAID 6 to show the cost-benefit tradeoff gap that flat XOR-codes can fill.

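
    The structure of a flat XOR-code and of its recovery equations can be shown on toy integer "disks". The parity subsets below are illustrative, not the Stepped or HD-Combination constructions from the paper.

```python
from functools import reduce

# Each parity disk stores the XOR of a subset of the data disks.
DATA = {'d0': 5, 'd1': 9, 'd2': 12, 'd3': 7}
PARITY_SETS = {'p0': {'d0', 'd1', 'd2'}, 'p1': {'d1', 'd2', 'd3'}}

def xor_all(values):
    return reduce(lambda a, b: a ^ b, values, 0)

stored = dict(DATA)
stored.update({p: xor_all(DATA[d] for d in s) for p, s in PARITY_SETS.items()})

def recovery_equations(failed):
    """Each parity whose subset contains the failed disk yields one recovery
    equation: the lost value is the XOR of the equation's surviving members."""
    return [{p} | (s - {failed}) for p, s in PARITY_SETS.items() if failed in s]

# Recover d1 using the first available equation.
eq = recovery_equations('d1')[0]
recovered = xor_all(stored[x] for x in eq)
```

    A disk covered by several parities has several recovery equations; a recovery schedule picks equations whose member sets overlap little, so recovery reads spread across the surviving disks.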
  • S2-RAID: A new RAID architecture for fast data recovery

    Page(s): 1 - 9

    As disk capacity grows rapidly, with terabyte disks becoming the norm, RAID reconstruction after a failure takes a prohibitively long time. This paper presents a new RAID architecture, S2-RAID, that allows the disk array to be reconstructed very quickly after a disk failure. The idea is to form skewed sub-RAIDs (S2-RAID) in the RAID structure so that reconstruction can be done in parallel, dramatically speeding up data reconstruction and hence minimizing the chance of data loss. To make such parallel reconstruction conflict-free, each sub-RAID is formed by selecting one logical partition from each disk group, with the group size being a prime number. We have implemented a prototype S2-RAID system in the Linux operating system to evaluate its performance potential. SPC IO traces and standard benchmarks were used to measure the performance of S2-RAID against the existing baseline software RAID, MD. Experimental results show that S2-RAID speeds up data reconstruction by a factor of 3 to 6 compared to traditional RAID. At the same time, S2-RAID shows similar or better production performance than the baseline RAID while online RAID reconstruction is in progress.

  • Security Aware Partitioning for efficient file system search

    Page(s): 1 - 14

    Index partitioning techniques, in which indexes are broken into multiple distinct sub-indexes, are a proven way to improve metadata search speed and scalability for large file systems, permitting early triage of the file system. A partitioned metadata index can rule out irrelevant files and quickly focus on files that are more likely to match the search criteria. Also, in a large file system with many users, a user's search should not include confidential files the user does not have permission to view. To meet these two parallel goals, we propose a new partitioning algorithm, Security Aware Partitioning, that integrates security with the partitioning method to enable efficient and secure file system search. To evaluate our claim of improved efficiency, we compare the results of Security Aware Partitioning to six other partitioning methods, including implementations of the metadata partitioning algorithms of SmartStore and Spyglass, two recent systems that perform partitioned search in similar environments. We propose a general set of criteria for comparing partitioning algorithms and use them to evaluate the algorithms. Our results show that Security Aware Partitioning can provide excellent search performance at a low, O(n), computational cost to build indexes. Based on metrics such as information gain, we also conclude that expensive clustering algorithms do not offer enough benefit to be worth the additional cost in time and memory.

  • The Linear Tape File System

    Page(s): 1 - 8

    While there are many financial and practical reasons to prefer tape storage over disk for various applications, the difficulty of using tape in a general way is a major inhibitor to its wider usage. We present a file system that takes advantage of a new generation of tape hardware to provide efficient access to tape using standard, familiar system tools and interfaces. The Linear Tape File System (LTFS) makes using tape as easy, flexible, portable, and intuitive as using other removable and sharable media, such as a USB drive.

  • The Hadoop Distributed File System

    Page(s): 1 - 10

    The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.

  • A study of self-similarity in parallel I/O workloads

    Page(s): 1 - 6

    A challenging issue in the performance evaluation of parallel storage systems through trace-driven simulation is to accurately characterize and emulate the I/O behavior of real applications. A correlation study of inter-arrival times between I/O requests, with an emphasis on I/O-intensive scientific applications, shows the necessity of further studying the self-similarity of parallel I/O arrivals. This paper analyzes several I/O traces collected on large-scale supercomputers and concludes that parallel I/Os exhibit statistically self-similar behavior. Instead of a Markov model, a new stochastic model is proposed and validated to accurately capture parallel I/O burstiness. This model can be used to predict I/O workloads in real systems and to generate reliable synthetic I/O sequences in simulation studies.

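
    The abstract does not specify the proposed model, but self-similarity in arrival counts is conventionally quantified by the Hurst exponent H; a standard estimator is the aggregated-variance method, sketched below for illustration.

```python
import math
import random
from statistics import mean, variance

def hurst_aggvar(series, scales=(1, 2, 4, 8, 16)):
    """Estimate the Hurst exponent H with the aggregated-variance method:
    for a self-similar series, the variance of the m-aggregated series decays
    like m**(2H - 2), so the slope of log-variance vs. log-m gives H."""
    xs, ys = [], []
    for m in scales:
        agg = [mean(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]
        xs.append(math.log(m))
        ys.append(math.log(variance(agg)))
    n = len(xs)
    slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
            (n * sum(x * x for x in xs) - sum(xs) ** 2)
    return 1 + slope / 2

# Independent arrivals carry no long-range dependence, so H should sit near
# 0.5; a bursty, self-similar I/O arrival process pushes H toward 1.
random.seed(0)
h = hurst_aggvar([random.random() for _ in range(4096)])
```
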
  • BPAC: An adaptive write buffer management scheme for flash-based Solid State Drives

    Page(s): 1 - 6

    Solid State Drives (SSDs) have shown promise as a candidate to replace traditional hard disk drives, but certain physical characteristics of NAND flash leave challenging areas for improvement and further research. We focus on the layout and management of the small amount of RAM that serves as a cache between the SSD and the system that uses it. Among the techniques previously proposed to manage this cache, we identify several sources of inefficient cache space management due to the way pages are clustered in blocks and the limited replacement policy. We develop a hybrid page/block architecture along with an advanced replacement policy, called BPAC (Block-Page Adaptive Cache), to exploit both temporal and spatial locality. Our technique adaptively partitions the SSD on-disk cache to separately hold pages with high temporal locality in a page list and clusters of pages with low temporal but high spatial locality in a block list. We run trace-driven simulations to verify our design and find that it outperforms other popular flash-aware cache schemes under different workloads.

  • Automated lookahead data migration in SSD-enabled multi-tiered storage systems

    Page(s): 1 - 6

    The significant IO improvements of Solid State Disks (SSDs) over traditional rotational hard disks make it attractive to integrate SSDs into tiered storage systems for performance enhancement. However, to integrate SSDs into a multi-tiered storage system effectively, automated data migration between the SSD and HDD tiers plays a critical role. In many real-world application scenarios, such as banking and supermarket environments, the workload and IO profile present interesting characteristics and are also constrained by workload deadlines. Fully exploiting the power of data migration while guaranteeing the migration deadline is critical to maximizing the performance of an SSD-enabled multi-tiered storage system. In this paper, we present an automated, deadline-aware, lookahead migration scheme to address this data migration challenge. We analyze the factors that impact the efficiency of lookahead migration and develop a greedy algorithm that adaptively determines the optimal lookahead window size, aiming to improve overall system performance and resource utilization while meeting workload deadlines. We compare our lookahead migration approach with the basic migration model and validate its effectiveness and efficiency through a trace-driven experimental study.

  • Delayed partial parity scheme for reliable and high-performance flash memory SSD

    Page(s): 1 - 6

    The I/O performance of flash memory solid-state disks (SSDs) is increasing through the exploitation of parallel I/O architectures. However, reliability remains a critical issue in building large-scale flash storage. We propose a novel Redundant Arrays of Inexpensive Disks (RAID) architecture that uses delayed parity updates and partial parity caching for reliable, high-performance flash memory SSDs. The proposed techniques improve the performance of the RAID-5 SSD by 38% and 30% on average in comparison to the original RAID-5 technique and the previous delayed parity update technique, respectively.

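
    The delayed-parity idea can be sketched on a single toy RAID-5 stripe. The class names, commit granularity, and the in-memory `pending` map are illustrative assumptions, not the paper's exact scheme: the point is that small writes are held back, their XOR (the partial parity) is cheap to cache, and the full parity read-modify-write happens once per batch instead of once per write.

```python
from functools import reduce

class PartialParityStripe:
    """Toy RAID-5 stripe with delayed parity updates and partial parity."""

    def __init__(self, n_chunks):
        self.chunks = [0] * n_chunks   # data chunks, one per disk
        self.parity = 0                # committed full parity (XOR of chunks)
        self.pending = {}              # delayed writes not yet folded into parity

    def write(self, idx, value):
        self.pending[idx] = value      # no parity read-modify-write yet

    def partial_parity(self):
        """Cheap cached summary of the pending writes."""
        return reduce(lambda a, b: a ^ b, self.pending.values(), 0)

    def commit(self):
        """One deferred, batched parity update for all pending writes."""
        for idx, value in self.pending.items():
            self.parity ^= self.chunks[idx] ^ value   # fold old out, new in
            self.chunks[idx] = value
        self.pending.clear()

    def recover(self, lost_idx):
        survivors = (c for i, c in enumerate(self.chunks) if i != lost_idx)
        return reduce(lambda a, b: a ^ b, survivors, self.parity)
```
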
  • Deferred updates for flash-based storage

    Page(s): 1 - 6

    NAND flash-based storage offers faster reads, greater power savings, and lower cooling costs than conventional rotating magnetic disk drives. However, in flash memory, read and write operations are not symmetric: writes are much slower than reads. Moreover, frequent update operations reduce the lifetime of the flash memory. Because of its fast reads, flash-based storage is particularly attractive for read-intensive database workloads, but it can perform poorly under update-intensive database workloads. This paper aims to improve the write performance and lifetime of flash-based storage for update-intensive workloads. In particular, we propose a new hierarchical approach called deferred updates. Instead of directly updating the data records, we first buffer the changes as update logs in two intermediate in-flash layers, and then apply multiple update logs to the data records in bulk. Experimental results show that the proposed methodology significantly reduces update-processing overhead and improves the longevity of flash-based storage.

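
    The deferral pattern the abstract describes can be sketched in miniature; the single log layer, flush threshold, and dict-backed record store here are simplifying assumptions (the paper uses two in-flash log layers).

```python
class DeferredStore:
    """Sketch of deferred updates: changes accumulate as small log records
    and are applied to the data records in one bulk pass, so the flash sees
    few large writes instead of many in-place updates."""

    def __init__(self, flush_threshold=4):
        self.records = {}            # stable copy of the data records
        self.log = []                # buffered update log (intermediate layer)
        self.bulk_applies = 0
        self.flush_threshold = flush_threshold

    def update(self, key, value):
        self.log.append((key, value))
        if len(self.log) >= self.flush_threshold:
            self.apply_logs()

    def read(self, key):
        # The latest buffered update wins over the stable record.
        for k, v in reversed(self.log):
            if k == key:
                return v
        return self.records.get(key)

    def apply_logs(self):
        for k, v in self.log:        # one bulk pass instead of per-update writes
            self.records[k] = v
        self.log.clear()
        self.bulk_applies += 1
```

    Reads check the log first, so deferral is invisible to readers; the cost of each update is amortized over the bulk apply.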
  • dedupv1: Improving deduplication throughput using solid state drives (SSD)

    Page(s): 1 - 6

    Data deduplication systems discover and remove redundancies between data blocks. The search for redundant blocks is often based on hashing the content of a block and comparing the resulting hash value with entries already stored inside an index. The limited random IO performance of hard disks limits the overall throughput of such systems if the index does not fit into main memory. This paper presents the architecture of the dedupv1 deduplication system, which uses solid-state drives (SSDs) to improve throughput compared to disk-based systems. dedupv1 is designed to use the sweet spots of SSD technology (random reads and sequential operations) while avoiding random writes inside the data path. This is achieved with a hybrid deduplication design: it is an inline deduplication system, performing chunking and fingerprinting online and storing only new data, but it is able to delay much of the processing as well as IO operations.

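
    The fingerprint-index mechanism the abstract describes can be sketched as follows. dedupv1 keeps this index on SSD; here it is just an in-memory dict, and the fixed-size-block interface is an assumption for illustration.

```python
import hashlib

class Dedup:
    """Minimal content-addressed deduplication: blocks are fingerprinted by a
    cryptographic hash and only previously unseen blocks are stored."""

    def __init__(self):
        self.index = {}              # fingerprint -> stored block
        self.logical_bytes = 0       # bytes written by clients
        self.stored_bytes = 0        # bytes actually stored after dedup

    def write(self, block: bytes):
        fp = hashlib.sha256(block).hexdigest()
        self.logical_bytes += len(block)
        if fp not in self.index:     # index miss: genuinely new data
            self.index[fp] = block
            self.stored_bytes += len(block)
        return fp                    # recipe entry referencing the block

    def read(self, fp):
        return self.index[fp]
```

    Every `write` is one random index lookup; that lookup is exactly the operation whose cost dedupv1 moves from disk to SSD.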
  • Red: An efficient replacement algorithm based on REsident Distance for exclusive storage caches

    Page(s): 1 - 6

    This paper presents our replacement algorithm, RED, for storage caches. RED is exclusive: it eliminates duplication between a storage cache and its client cache. RED performs well: a new criterion, Resident Distance, is proposed for making efficient replacement decisions in place of Recency and Frequency. Moreover, RED is non-intrusive to the storage client: it requires no changes to client software and can be used in real-life systems. Previous work on storage cache management attains one or two of these benefits, but not all of them. We have evaluated the performance of RED using simulations with both synthetic and real-life traces. The simulation results show that RED significantly outperforms LRU, ARC, and MQ, and is better than DEMOTE and PROMOTE, for a wide range of cache sizes.

  • A performance model and file system space allocation scheme for SSDs

    Page(s): 1 - 6

    Solid State Drives (SSDs) are now becoming part of mainstream computers. Even though today's disk scheduling algorithms and file systems have been optimized to exploit the characteristics of hard drives, relatively little attention has been paid to modeling and exploiting the characteristics of SSDs. In this paper, we consider the use of SSDs from the file system standpoint. To do so, we derive a performance model for SSDs. Based on this model, we devise a file system space allocation scheme, which we call Greedy-Space, for block- or hybrid-mapping SSDs. From the Postmark benchmark results, we observe substantial performance improvements when employing the Greedy-Space scheme in the ext3 and Reiser file systems running on three SSDs available on the market.

  • Observations made while running a multi-petabyte storage system

    Page(s): 1 - 4

    We give an overview of the CERN Advanced Storage (CASTOR) version 2 system and its usage at CERN in serving the High Energy Physics community. We further explore some of the observations made between 2005 and 2010 while managing this multi-petabyte distributed storage system.

  • Write amplification reduction in NAND Flash through multi-write coding

    Page(s): 1 - 6

    The block erase requirement in NAND Flash devices leads to the need for garbage collection. Garbage collection results in write amplification, that is, an increase in the number of physical page programming operations. Write amplification adversely impacts the limited lifetime of a NAND Flash device and can add significant system overhead unless a large spare factor is maintained. This paper proposes a NAND Flash system that uses multi-write coding to reduce write amplification. Multi-write coding allows a NAND Flash page to be written more than once without an intervening block erase. We present a novel two-write coding technique based on enumerative coding, which achieves linear coding rates with low computational complexity. The proposed technique also seeks to minimize memory wear by reducing the number of programmed cells per page write. We describe a system that uses lossless data compression in conjunction with multi-write coding, and show through simulations that the proposed system significantly reduces write amplification and memory wear.

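
    The principle behind two-write coding can be illustrated with the textbook Rivest-Shamir write-once-memory (WOM) code, which stores 2 bits twice in 3 set-only cells; the paper's enumerative technique is more general, so this is an illustration of the idea, not their construction. Cells may only go 0 to 1 between erases, yet the page can still be rewritten once.

```python
# First-generation codewords for the four 2-bit values (weight <= 1);
# second-generation codewords are their bitwise complements (weight >= 2).
GEN1 = {(0, 0): (0, 0, 0), (0, 1): (0, 0, 1),
        (1, 0): (0, 1, 0), (1, 1): (1, 0, 0)}

def wom_read(cells):
    if sum(cells) <= 1:                         # first generation
        return next(d for d, c in GEN1.items() if c == cells)
    comp = tuple(1 - b for b in cells)          # second generation: complement
    return next(d for d, c in GEN1.items() if c == comp)

def wom_write(cells, data):
    """Return a new cell state encoding `data`; bits may only go 0 -> 1."""
    if wom_read(cells) == data:
        return cells                            # already encodes the right value
    if sum(cells) <= 1:                         # try to stay in generation one
        target = GEN1[data]
        if all(c <= t for c, t in zip(cells, target)):
            return target
    target = tuple(1 - b for b in GEN1[data])   # fall to generation two
    assert all(c <= t for c, t in zip(cells, target))
    return target
```

    Two logical writes per erase at a rate of 4 bits per 3 cells is how multi-write coding trades capacity for fewer erases, and hence less write amplification.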
  • Disk-enabled authenticated encryption

    Page(s): 1 - 6

    Storage is increasingly becoming a vector for data compromise. Solutions for protecting on-disk data confidentiality and integrity have so far been limited in their effectiveness. Providing authenticated encryption, i.e., simultaneous encryption with integrity information, is important for protecting data at rest. In this paper, we propose that disks augmented with non-volatile storage (e.g., hybrid hard disks) and cryptographic processors (e.g., FDE drives) may provide a solution for authenticated encryption, storing security metadata within the drive itself to eliminate dependencies on other parts of the system. We augment the DiskSim simulator with a flash simulator to evaluate the associated operational overheads. These experiments show that proper tuning of system parameters can eliminate many of the costs of managing security metadata, with less than a 2% decrease in IOPS versus regular disks.

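
    The authenticated-encryption pattern the abstract refers to is, generically, encrypt-then-MAC: a ciphertext plus an integrity tag that detects any tampering. The sketch below illustrates only the pattern, not the paper's on-drive design; because Python's standard library has no AES, a toy SHA-256 counter-mode keystream stands in for the real cipher (do not use this stream in production).

```python
import hashlib
import hmac

def _keystream(key, nonce, n):
    """Toy keystream: SHA-256 over key || nonce || counter (AES stand-in)."""
    out, ctr = b'', 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, 'big')).digest()
        ctr += 1
    return out[:n]

def seal(enc_key, mac_key, nonce, plaintext):
    """Encrypt-then-MAC: the tag covers nonce || ciphertext."""
    ct = bytes(p ^ k for p, k in
               zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def open_(enc_key, mac_key, nonce, ct, tag):
    """Verify the tag in constant time before decrypting; None on tampering."""
    expect = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        return None
    return bytes(c ^ k for c, k in
                 zip(ct, _keystream(enc_key, nonce, len(ct))))
```

    The security metadata the paper stores inside the drive corresponds to the per-sector tags (and nonces) that such a scheme must keep alongside the ciphertext.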
  • Scalable storage support for data stream processing

    Page(s): 1 - 6

    Continuous data stream processing systems have offered limited support for data persistence in the past, for three main reasons. First, online, real-time queries examine current streaming data and (assuming no server failures) do not require access to past data. Second, stable storage devices are commonly thought to constrain system throughput and response times compared to main memory, and are thus kept off the common path. Finally, the scalable storage solutions that would be required to sustain high data streaming rates have not been thoroughly investigated. Our work advances the state of the art by providing data streaming systems with a scalable path to persistent storage. This path has a low impact on the performance of a scalable streaming system and enables two fundamental enhancements to its capabilities: first, it allows stream persistence for reference or archival purposes (in other words, queries can now be applied to past data on demand); second, fault tolerance is achievable through checkpointing and stream-replay schemes that are not constrained by the size of main memory.

  • Leveraging disk drive acoustic modes for power management

    Page(s): 1 - 9

    Reduction of disk drive power consumption is a challenging task, particularly since the most prevalent way of achieving it, powering down idle disks, has many undesirable side effects. Some hard disk drives support acoustic modes, meaning they can be configured to reduce the acceleration and velocity of the disk head. This reduces instantaneous power consumption but sacrifices performance: input/output (I/O) operations run longer at reduced power. This is useful for power capping, since it significantly reduces the peak power consumption of the disks. We conducted experiments on several disk drives that support acoustic management. Most of these disk drives support only two modes, quiet and normal. We ran different I/O workloads, including SPC-1, to simulate a real-world online transaction processing workload. We found that the reduction in peak power can reach up to 23% when using quiet mode. We show that for some workloads this translates into a reduction of 12.5% in overall energy consumption; in other workloads we encountered the opposite phenomenon, an increase of more than 6% in overall energy consumption.