
5th Petascale Data Storage Workshop (PDSW), 2010

Date: 15 Nov. 2010


Displaying Results 1 - 11 of 11
  • [Title page]

    Page(s): 1 - 2
  • [Copyright notice]

    Page(s): 1
  • Self-adjusting two-failure tolerant disk arrays

    Page(s): 1 - 5

    We have presented a representation for a two-failure-tolerant storage system based on flat XOR codes. We argue that this representation allows us to implement fast algorithms for the layout of very large, evolving disk arrays. Much remains to be done. Fast yet efficient algorithms for major changes in the disk array, such as a rack failure or the insertion of new disks, still need to be implemented and tested. Our goal is usually not to find an optimal layout (in a sense to be defined precisely), but one that is close to optimal. To assert that our algorithms perform at this level requires a more mathematical analysis of the consequences of failures in such an array, deriving bounds on the robustness of optimal layouts, a task we have barely started. Nevertheless, the results we have indicate that the algorithms are quite effective and certainly fast and easy to implement. This represents definite progress over true optimization (including the search for provably optimal designs), which can be done only for special, small cases, and supports our pragmatic attitude. (A sketch of the two-failure recoverability check appears after this listing.)

  • Using a shared storage class memory device to improve the reliability of RAID arrays

    Page(s): 1 - 5

    Storage class memories (SCMs) constitute an emerging class of non-volatile storage devices that promise to be significantly faster and more reliable than magnetic disks. We propose to add one of these devices to each group of two or three RAID arrays and store additional parity data on it. We show that the new organization can tolerate all double disk failures, between 75 and 90 percent of all triple disk failures, and between 50 and 70 percent of all failures involving two disks and the SCM device, without incurring any data loss. As a result, the additional parity device increases the mean time to data loss of the arrays in the group it protects by at least 200-fold. (A sketch of the reliability arithmetic appears after this listing.)

  • Semantic data placement for power management in archival storage

    Page(s): 1 - 5

    Power is the greatest lifetime cost in an archival system, and, as decreasing costs make disks more attractive than tapes, spinning disks account for the majority of the power drawn. To reduce this cost, we propose reducing the number of times disks have to spin up by grouping files together so that a typical spin-up serves several file accesses. For a typical system, we show that if only 30% of total accesses occur while disks are still spinning, we can conserve 12% of the power cost. Classifying files according to directory structure, we see access hit rates of up to 66% in easily separable workloads, for a power savings of up to 52% relative to spinning up for every read. (A sketch of directory-based grouping appears after this listing.)

  • Workload characterization of a leadership class storage cluster

    Page(s): 1 - 5

    Understanding workload characteristics is critical for optimizing and improving the performance of current systems and software, and for architecting new storage systems based on observed workload patterns. In this paper, we characterize the scientific workloads of the world's fastest HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF). Spider provides an aggregate bandwidth of over 240 GB/s with over 10 petabytes of RAID 6 formatted capacity. OLCF's flagship petascale simulation platform, Jaguar, and other large HPC clusters, with over 250 thousand compute cores in total, depend on Spider for their I/O needs. We characterize the system utilization, the demands of reads and writes, idle time, and the ratio of read requests to write requests for the storage system observed over a period of 6 months. From this study we develop synthesized workloads, and we show that the read and write I/O bandwidth usage as well as the inter-arrival time of requests can be modeled as a Pareto distribution. (A sketch of a Pareto fit appears after this listing.)

  • Performance analysis of commodity and enterprise class flash devices

    Page(s): 1 - 5

    Five different flash-based storage devices were evaluated: two commodity SATA-attached MLC devices and three enterprise PCIe-attached SLC devices. Specifically, their peak bandwidth and IOPS capabilities were measured. The results show that the PCIe-attached devices have a significant performance advantage over the SATA ones: by a factor of four to six in read and write bandwidth respectively, by a factor of eight in random-read IOPS, and by a factor of 80 in random-write IOPS. The performance degradation that occurred when the drives were already partially filled with data was also recorded. These measurements show that significant bandwidth degradation occurred for all the devices, whereas only one of the PCIe and one of the SATA drives showed any IOPS degradation. Across these tests no single device consistently outperforms the others; these results therefore indicate that there is no one-size-fits-all flash solution currently on the market and that devices should be evaluated carefully, with I/O usage patterns as close as possible to those they are expected to encounter in a production environment. (A sketch of an IOPS measurement loop appears after this listing.)

  • Virtualization-based bandwidth management for parallel storage systems

    Page(s): 1 - 5

    This paper presents a new parallel storage management approach which supports the allocation of shared storage bandwidth on a per-application basis. Existing parallel storage systems are unable to differentiate I/Os from different applications and meet per-application bandwidth requirements. This limitation presents a hurdle for applications to achieve their desired performance, and it will become even more challenging as high-performance computing (HPC) systems continue to scale up in both the amount of available resources and the number of concurrent applications. This paper proposes a novel solution to address this challenge through the virtualization of parallel file systems (PFSes). Such PFS virtualization is achieved with user-level PFS proxies, which interpose between native PFS clients and servers and schedule the I/Os from different applications according to a resource-sharing algorithm (e.g., SFQ(D)). In this way, virtual PFSes can be created on a per-application basis, each with a specific bandwidth share allocated according to its I/O requirement. This approach is applicable to different PFS-based parallel storage systems and can be transparently integrated with existing as well as future HPC systems. A prototype of this approach is implemented on PVFS2, a widely used PFS, and evaluated with experiments using a typical parallel I/O benchmark (IOR). Results show that the approach's overhead is very small and that it achieves effective proportional sharing under different usage scenarios. (A sketch of SFQ-style proportional sharing appears after this listing.)

  • Extracting information ASAP!

    Page(s): 1 - 5

    Designing I/O systems capable of scaling up to deal with the next generation of extreme-scale scientific environments is a significant challenge. Scientific applications already strain the capabilities of current filesystems and storage systems. This work presents a middleware-based approach that generalizes from previous work on staging areas to focus more generally on staging resources. Exploiting the steady increase in the ratio of compute capability to I/O bandwidth, the EnStage middleware system allows metadata characterization and I/O processing to occur when and where appropriate: in reserved staging areas, in buffered memory, and even in the writing processes' execution context. Using the EnStage extension to previous work, we find that a 1.4% increase in runtime, due to the additional functionality, resulted in the I/O time in the staging area dropping to only 16% of that of the non-reduced output. (A sketch of the staging pattern appears after this listing.)

  • Collective prefetching for parallel I/O systems

    Page(s): 1 - 5

    Data prefetching can be beneficial for improving parallel I/O system performance, but the amount of benefit depends on how efficiently and swiftly prefetches can be done. In this study, we propose a new prefetching strategy, called collective prefetching. The idea is to exploit the correlation among the I/O accesses of the multiple processes of a parallel application and carry out prefetches collectively, instead of the traditional strategy of each process carrying out prefetches individually. The rationale behind this new collective prefetching strategy is that the concurrent processes of the same parallel application are strongly correlated with respect to their I/O requests. We present the idea, initial design, and implementation of the new collective prefetching strategy, and the preliminary experimental results show that it holds promise for improving parallel I/O performance. (A sketch of stride-based collective prefetch prediction appears after this listing.)

  • Towards parallel access of multi-dimensional, multi-resolution scientific data

    Page(s): 1 - 5

    Large-scale scientific simulations routinely produce data of increasing resolution. Analyzing this data is key to scientific discovery. A critical bottleneck facing data analysis is the I/O time to access the data, due to the disparity between a simulation's data layout and the layout requirements of analysis applications. One method of addressing this problem is to reorganize the data in a manner that makes it more amenable to analysis and visualization. The IDX file format is one example of this approach: it orders data points so that they can be accessed at multiple resolution levels with favorable spatial locality and caching properties. IDX has been used successfully in fields such as digital photography and the visualization of large scientific data, and it is a promising approach for the analysis of HPC data. Unfortunately, the existing tools for writing data in this format provide only a serial interface. HPC applications must therefore either write all data from a single process or convert existing data as a post-processing step, in either case failing to utilize available parallel I/O resources. In this work, we provide an overview of the IDX file format and of the existing ViSUS library that provides serial access to IDX data. We investigate methods for writing IDX data in parallel and demonstrate that it is possible for HPC applications to write data directly into IDX format with scalable performance. Our preliminary results achieve 60% of the peak I/O throughput when reorganizing and writing data from 512 processes on an IBM BG/P system. We also analyze the performance bottlenecks and propose future work towards a flexible and efficient implementation. (A sketch of bit-interleaved ordering appears after this listing.)

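Illustrative sketches for the abstracts above. These are minimal, hedged examples in Python, built on made-up data, placeholder parameters, and simplified assumptions; none of the code comes from the papers themselves.

Self-adjusting two-failure tolerant disk arrays: the abstract builds on flat XOR codes, in which each parity disk stores the XOR of a subset of data disks and a lost disk can be rebuilt from any equation in which it is the only missing member. The layout below is a small hypothetical example (a 2x2 data grid with row and column parities), not one from the paper, and the peeling-style check is only a sketch of how one might verify that a candidate layout tolerates every one- and two-disk failure.

    from itertools import combinations

    # Hypothetical flat XOR layout: a 2x2 grid of data disks D0..D3 with row
    # parities P0, P1 and column parities P2, P3 (not a layout from the paper).
    parity_groups = {
        "P0": {"D0", "D1"},
        "P1": {"D2", "D3"},
        "P2": {"D0", "D2"},
        "P3": {"D1", "D3"},
    }
    data_disks = {"D0", "D1", "D2", "D3"}
    all_disks = data_disks | set(parity_groups)

    def recoverable(failed):
        """Peeling decoder: rebuild any lost disk whose XOR equation has exactly
        one missing member; repeat until stuck or everything is rebuilt."""
        lost = set(failed)
        progress = True
        while lost and progress:
            progress = False
            for parity, group in parity_groups.items():
                members = group | {parity}      # disks appearing in this equation
                missing = members & lost
                if len(missing) == 1:           # one unknown -> solvable
                    lost -= missing
                    progress = True
        return not lost

    # Exhaustively check every single- and double-disk failure of the array.
    for r in (1, 2):
        ok = all(recoverable(f) for f in combinations(sorted(all_disks), r))
        print(f"all {r}-disk failures recoverable: {ok}")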
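
Using a shared storage class memory device to improve the reliability of RAID arrays: the abstract quotes survival fractions (all double disk failures, at least 75% of triple failures, at least 50% of failures involving two disks and the SCM device) and a resulting MTTDL improvement. The sketch below is only back-of-the-envelope arithmetic using those quoted fractions; the per-window failure probabilities, the group size, and the assumption that the unprotected baseline loses data on any double disk failure are placeholders, not values or assumptions from the paper.

    from math import comb

    # Hypothetical per-repair-window failure probabilities and group size.
    p_disk = 0.01        # probability a given disk fails during one repair window
    p_scm = 0.001        # probability the shared SCM device fails in that window
    n_disks = 16         # disks in the protected group

    def exactly(k, n, p):
        """Probability of exactly k failures among n independent components."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # Baseline assumption (placeholder): without the extra parity, any double
    # disk failure in the group loses data.
    loss_baseline = 1 - exactly(0, n_disks, p_disk) - exactly(1, n_disks, p_disk)

    # With the shared SCM parity, using the pessimistic end of each quoted range;
    # double disk failures alone lose nothing.
    p2 = exactly(2, n_disks, p_disk)
    p3 = exactly(3, n_disks, p_disk)
    loss_with_scm = (
        p3 * (1 - p_scm) * (1 - 0.75)   # at most 25% of triple failures lose data
        + p2 * p_scm * (1 - 0.50)       # at most 50% of "two disks + SCM" lose data
    )

    print(f"baseline loss probability per window: {loss_baseline:.3e}")
    print(f"with shared SCM parity:               {loss_with_scm:.3e}")
    print(f"relative improvement:                 {loss_baseline / loss_with_scm:.0f}x")
    # The printed ratio depends entirely on the placeholder rates above; the
    # paper's >= 200-fold MTTDL figure comes from its own reliability model.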
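
Semantic data placement for power management in archival storage: the proposal is to group files so that one spin-up serves several accesses, classifying files by directory structure. The sketch below compares spin-up counts for a hypothetical access trace under a semantics-blind hash placement and a placement that co-locates files sharing a top-level directory; the trace, disk count, and spin-down timeout are all invented.

    import zlib

    SPIN_DOWN_TIMEOUT = 60     # seconds a disk keeps spinning after an access
    N_DISKS = 4

    # Hypothetical archival access trace: (timestamp in seconds, file path).
    trace = [
        (0,   "/climate/run1/t000.nc"),
        (2,   "/climate/run1/t001.nc"),
        (4,   "/climate/run1/t002.nc"),
        (6,   "/climate/run1/t003.nc"),
        (500, "/genomics/sampleA/reads.fq"),
        (502, "/genomics/sampleA/calls.vcf"),
        (504, "/genomics/sampleA/stats.txt"),
    ]

    def place_by_hash(path):
        """Baseline: spread files over disks with no regard for semantics."""
        return zlib.crc32(path.encode()) % N_DISKS

    def place_by_directory(path):
        """Semantic placement: files under the same top-level directory share a
        disk, so a burst of related accesses finds the disk already spinning."""
        return zlib.crc32(path.split("/")[1].encode()) % N_DISKS

    def count_spin_ups(place):
        last_access = {}       # disk -> time of its most recent access
        spin_ups = 0
        for t, path in trace:
            disk = place(path)
            if disk not in last_access or t - last_access[disk] > SPIN_DOWN_TIMEOUT:
                spin_ups += 1  # the disk had spun down (or was never spun up)
            last_access[disk] = t
        return spin_ups

    print("spin-ups, hash placement:     ", count_spin_ups(place_by_hash))
    print("spin-ups, directory placement:", count_spin_ups(place_by_directory))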
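
Workload characterization of a leadership class storage cluster: the abstract reports that bandwidth usage and request inter-arrival times can be modeled as a Pareto distribution. The sketch below fits a Pareto distribution to inter-arrival samples and checks the fit; the samples are synthetic stand-ins (the study itself used six months of Spider traces), and NumPy/SciPy are assumed to be available.

    import numpy as np
    from scipy import stats

    # Stand-in for measured inter-arrival times (seconds): heavy-tailed samples.
    rng = np.random.default_rng(0)
    inter_arrivals = stats.pareto.rvs(b=1.5, scale=0.01, size=10_000, random_state=rng)

    # Fit a Pareto distribution (shape b, location, scale), location fixed at 0.
    b, loc, scale = stats.pareto.fit(inter_arrivals, floc=0.0)
    print(f"fitted shape b = {b:.2f}, scale = {scale:.4f}")

    # Goodness of fit: Kolmogorov-Smirnov test against the fitted distribution.
    ks = stats.kstest(inter_arrivals, "pareto", args=(b, loc, scale))
    print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")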
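
Performance analysis of commodity and enterprise class flash devices: the evaluation measures peak bandwidth and random-read/random-write IOPS. The loop below only sketches the bookkeeping of such a measurement against a placeholder test file; a serious benchmark would bypass the page cache (O_DIRECT or a dedicated tool), drive deep queue depths, precondition the device, and run far longer, so the numbers this prints say nothing about any real drive.

    import os
    import random
    import time

    TEST_FILE = "/tmp/flash_test.bin"   # placeholder path
    BLOCK = 4096                        # 4 KiB requests, as in typical IOPS tests
    DURATION = 5.0                      # seconds to run

    if not os.path.exists(TEST_FILE):   # create a small test file if needed
        with open(TEST_FILE, "wb") as f:
            f.write(os.urandom(256 * 1024 * 1024))   # 256 MiB

    size = os.path.getsize(TEST_FILE)
    n_blocks = size // BLOCK
    fd = os.open(TEST_FILE, os.O_RDONLY)

    ops = 0
    start = time.perf_counter()
    while time.perf_counter() - start < DURATION:
        offset = random.randrange(n_blocks) * BLOCK
        os.pread(fd, BLOCK, offset)     # one random 4 KiB read
        ops += 1
    elapsed = time.perf_counter() - start
    os.close(fd)

    print(f"{ops / elapsed:,.0f} random-read ops/s "
          f"({ops * BLOCK / elapsed / 1e6:.1f} MB/s)")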
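
Virtualization-based bandwidth management for parallel storage systems: the proxies schedule I/O with a resource-sharing algorithm such as SFQ(D). The sketch below is a simplified start-time fair queueing simulation with invented flows, weights, and uniform request costs; the depth parameter D, which bounds the number of requests outstanding at the server, is not modeled, and this is not the paper's proxy implementation.

    from collections import Counter, deque

    weights = {"appA": 3, "appB": 1}            # appA should receive ~3x the service
    cost = 1.0                                  # uniform request cost (e.g. 1 MB)

    backlog = {app: deque(range(200)) for app in weights}   # pending requests per app
    last_finish = {app: 0.0 for app in weights} # finish tag of each flow's last request
    virtual_time = 0.0
    dispatched = Counter()

    for _ in range(200):                        # dispatch 200 requests in total
        # Tag the head request of every backlogged flow, then pick the smallest
        # start tag (ties broken by application name).
        candidates = [(max(virtual_time, last_finish[app]), app)
                      for app, q in backlog.items() if q]
        start, app = min(candidates)
        last_finish[app] = start + cost / weights[app]   # finish tag = start + cost/weight
        virtual_time = start                    # virtual time tracks the dispatched start tag
        backlog[app].popleft()
        dispatched[app] += 1

    print(dict(dispatched))                     # shares roughly proportional to the weights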
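
Extracting information ASAP!: the abstract describes moving metadata characterization and I/O processing off the critical path into staging resources (reserved staging areas, buffered memory, or the writer's own execution context). The sketch below shows that general pattern with a background staging thread and a made-up metadata summary; it is not the EnStage interface.

    import json
    import queue
    import threading

    OUTPUT = "staged_output.bin"        # placeholder output file
    INDEX = "staged_output.idx"         # placeholder metadata index
    work = queue.Queue()

    def staging_worker():
        """Drain buffers, compute a small metadata summary, then write them out."""
        index = []
        with open(OUTPUT, "wb") as out:
            while True:
                item = work.get()
                if item is None:        # sentinel: no more buffers
                    break
                step, buf = item
                # Stand-in for "metadata characterization": size, checksum, offset,
                # so later analysis can find interesting steps without rereading.
                index.append({"step": step, "bytes": len(buf),
                              "sum": sum(buf) % 2**32, "offset": out.tell()})
                out.write(buf)
        with open(INDEX, "w") as f:
            json.dump(index, f)

    stager = threading.Thread(target=staging_worker)
    stager.start()

    # The "simulation" enqueues its output and keeps computing; the handoff
    # returns immediately, keeping the extra processing off the critical path.
    for step in range(5):
        work.put((step, bytes([step]) * 1024 * 1024))   # 1 MiB of fake output per step
    work.put(None)
    stager.join()
    print("wrote", OUTPUT, "and", INDEX)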
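
Collective prefetching for parallel I/O systems: the idea is to exploit the correlation among the I/O accesses of a job's processes and issue prefetches collectively rather than per process. The sketch below assumes a hypothetical block-cyclic access pattern and a constant-stride predictor; the aggregation step simply unions each rank's predicted next block into one combined prefetch, as an illustration of the idea rather than the paper's design.

    def predict_next(history):
        """Predict a rank's next block assuming a constant stride."""
        if len(history) < 2:
            return None
        stride = history[-1] - history[-2]
        return history[-1] + stride

    def collective_prefetch(per_rank_history):
        """Union of every rank's predicted next block, issued as one request set."""
        predictions = {predict_next(h) for h in per_rank_history.values()}
        predictions.discard(None)
        return sorted(predictions)

    # Hypothetical trace: 4 ranks reading a shared file block-cyclically
    # (rank r touches blocks r, r+4, r+8, ...).
    per_rank_history = {
        0: [0, 4, 8],
        1: [1, 5, 9],
        2: [2, 6, 10],
        3: [3, 7, 11],
    }

    print("one collective prefetch for blocks:",
          collective_prefetch(per_rank_history))        # -> [12, 13, 14, 15]
    # A purely per-process strategy would issue four separate prefetch requests
    # for the same set of blocks.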
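
Towards parallel access of multi-dimensional, multi-resolution scientific data: IDX reorders data points so that coarse resolution levels are contiguous and spatial neighbors stay close in the file. The sketch below shows only the generic bit-interleaving (Z/Morton order) that this kind of layout builds on; IDX's actual HZ ordering additionally groups points by resolution level, and this is not the ViSUS implementation.

    def morton2d(x, y, bits=16):
        """Interleave the bits of (x, y) into a single 1-D index."""
        z = 0
        for i in range(bits):
            z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
            z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
        return z

    # Nearby points in 2-D get nearby 1-D indices, which is what gives the
    # reordered file its favorable locality for region and resolution queries.
    for y in range(4):
        print([morton2d(x, y) for x in range(4)])
    # 0  1  4  5
    # 2  3  6  7
    # 8  9 12 13
    # 10 11 14 15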