Petascale Data Storage Workshop, 2008. PDSW '08. 3rd

Date: 17 Nov. 2008

  • Proceedings of the 2008 3rd Petascale Data Storage Workshop (PDSW '08) [front matter]

    Publication Year: 2008 , Page(s): 1 - 2
    Freely Available from IEEE
  • Input/output APIs and data organization for high performance scientific computing

    Publication Year: 2008 , Page(s): 1 - 6
    Cited by:  Papers (3)

    Scientific Data Management has become essential to the productivity of scientists using ever larger machines and running applications that produce ever more data. There are several specific issues when running on petascale (and beyond) machines. One is the need for massively parallel data output, which, in part, depends on the data formats and semantics being used. Here, the inhibition of parallelism by file system notions of strict and immediate consistency can be addressed with 'delayed data consistency' methods. Such methods can also be used to remove the runtime coordination steps required for immediate consistency from machine resources like Bluegene's separate networks for barrier calls and its dedicated IO nodes, thereby freeing them to instead perform alternate tasks that enhance data output performance and/or richness. Second, once data is generated, it is important to be able to efficiently access it, which implies the need for rapid data characterization and indexing. This can be achieved by adding small amounts of metadata to the output process, thereby permitting scientists to quickly make informed decisions about which files to process from large-scale science runs. Third, failure probabilities increase with an increasing number of nodes, which suggests the need for organizing output data to be resilient to failures in which the output from a single or from a small number of nodes is lost or corrupted. This paper demonstrates the utility of using delayed consistency methods for the process of data output from the compute nodes of petascale machines. It also demonstrates the advantages derived from resilient data organization coupled with lightweight methods for data indexing. An implementation of these techniques is realized in ADIOS, the Adaptable IO System, and its BP intermediate file format. The implementation is designed to be compatible with existing, well-known file formats like HDF-5 and NetCDF, thereby permitting end users to exploit the rich tool chains for these formats. Initial performance evaluations of the approach exhibit substantial performance advantages over using native parallel HDF-5 in the Chimera supernova code.

  • Fast log-based concurrent writing of checkpoints

    Publication Year: 2008 , Page(s): 1 - 4
    Cited by:  Papers (3)

    This report describes how a file system level log-based technique can improve the write performance of the many-to-one checkpoint write workload typical of high performance computations. It is shown that a simple log-based organization can provide substantial improvements in write performance while retaining the convenience of a single flat file abstraction. The improvement in write performance, however, comes at the cost of degraded read performance. Techniques to alleviate the read performance penalty, such as file reconstruction on the first read, are discussed.

  • Zest Checkpoint storage system for large supercomputers

    Publication Year: 2008 , Page(s): 1 - 5
    Cited by:  Papers (8)  |  Patents (1)

    The PSC has developed a prototype distributed file system infrastructure that vastly accelerates aggregated write bandwidth on large compute platforms. Write bandwidth, more than read bandwidth, is the dominant bottleneck in HPC I/O scenarios due to writing checkpoint data, visualization data and post-processing (multi-stage) data. We have prototyped a scalable solution that will be directly applicable to future petascale compute platforms having on the order of 10^6 cores. Our design emphasizes high-efficiency scalability, low-cost commodity components, lightweight software layers, end-to-end parallelism, client-side caching and software parity, and a unique model of load-balancing outgoing I/O onto high-speed intermediate storage followed by asynchronous reconstruction to a 3rd-party parallel file system.

  • Scalable full-text search for petascale file systems

    Publication Year: 2008 , Page(s): 1 - 7
    Cited by:  Papers (1)

    As file system capacities reach the petascale, it is becoming increasingly difficult for users to organize, find, and manage their data. File system search has the potential to greatly improve how users manage and access files. Unfortunately, existing file system search is designed for smaller scale systems, making it difficult for existing solutions to scale to petascale file systems. In this paper, we motivate the importance of file system search in petascale file systems and present a new full-text file system search design for petascale file systems. Unlike existing solutions, our design exploits file system properties. Using a novel index partitioning mechanism that utilizes file system namespace locality, we are able to improve search scalability and performance, and we discuss how such a design can potentially improve search security and ranking. We describe how our design can be implemented within the Ceph petascale file system.

  • Performance of RDMA-capable storage protocols on wide-area network

    Publication Year: 2008 , Page(s): 1 - 5
    Cited by:  Papers (6)

    Because of its high throughput, low CPU utilization, and direct data placement, RDMA (Remote Direct Memory Access) has been adopted for transport in a number of storage protocols, such as NFS and iSCSI. In this presentation, we provide a performance evaluation of RDMA-based NFS and iSCSI on a Wide-Area Network (WAN). We show that these protocols, though they benefit from RDMA on a Local Area Network (LAN) and on a short-distance WAN, face a number of challenges in achieving good performance on a long-distance WAN. This is because of (a) the low performance of RDMA reads on WAN, (b) the small 4 KB chunks used in NFS over RDMA, and (c) the lack of RDMA capability in handling discontinuous data. Our experimental results document the performance behavior of these RDMA-based storage protocols on WAN.

  • Comparing performance of solid state devices and mechanical disks

    Publication Year: 2008 , Page(s): 1 - 7
    Cited by:  Papers (11)

    In terms of performance, solid state devices promise to be a superior technology to mechanical disks. This study investigates the performance of several up-to-date high-end consumer and enterprise Flash solid state devices (SSDs) and relates their performance to that of mechanical disks. For the purpose of this evaluation, the IOZone benchmark is run in single-threaded mode with varying request size and access pattern on an ext3 filesystem mounted on these devices. The price of the measured devices is then used to allow for comparison of price per performance. Measurements presented in this study offer an evaluation of the cost-effectiveness of a Flash-based SSD storage solution over a range of workloads. In particular, for sequential access patterns the SSDs are up to 10 times faster for reads and up to 5 times faster than the disks. For random reads, the SSDs provide up to a 200 times performance advantage. For random writes, the SSDs provide up to a 135 times performance advantage. After weighing these numbers against the prices of the tested devices, we can conclude that SSDs are approaching the price per performance of magnetic disks for sequential access pattern workloads and are a superior technology to magnetic disks for random access patterns.

  • Arbitrary dimension Reed-Solomon coding and decoding for extended RAID on GPUs

    Publication Year: 2008 , Page(s): 1 - 3
    Cited by:  Papers (3)

    Reed-Solomon coding is a method of generating arbitrary amounts of checksum information from original data via matrix-vector multiplication in finite fields. Previous work has shown that CPUs are not well-matched to this type of computation, but recent graphical processing units (GPUs) have been shown through a case study to perform this encoding quickly for the 3 + 3 (three data + three parity) case. In order to be utilized in a true RAID-like system, it is important to understand how well this computation can scale in the number of data disks supported. This paper details the performance of a general Reed-Solomon encoding and decoding library that is suitable for use in RAID-like systems. Both generation and recovery are performance-tested and discussed.

  • Pianola: A script-based I/O benchmark

    Publication Year: 2008 , Page(s): 1 - 6
    Cited by:  Papers (1)

    Script-based I/O benchmarks record the I/O behavior of applications by using an instrumentation library to trace I/O events and their timing. A replay engine can then reproduce these events from the script in the absence of the original application. This type of benchmark reproduces real-world I/O workloads without the need to distribute, build, or run complex applications. However, faithfully recreating the I/O behavior of the original application requires careful design in both the instrumentation library and the replay engine. This paper presents the Pianola script-based benchmarking system, which includes an accurate and unobtrusive instrumentation system and a simple-to-use replay engine, along with some additional utility programs to manage the creation and replay of scripts. We show that for some sample applications, Pianola reproduces the qualitative features of the I/O behavior. Moreover, the overall replay time and the cumulative read and write times are usually within 10% of the values measured for the original applications.

  • Introducing map-reduce to high end computing

    Publication Year: 2008 , Page(s): 1 - 6
    Cited by:  Papers (15)

    In this work we present a scientific application that has been given a Hadoop MapReduce implementation. We also discuss other scientific fields of supercomputing that could benefit from a MapReduce implementation. We recognize in this work that Hadoop has potential benefit for more applications than simply data mining, but that it is not a panacea for all data-intensive applications. We provide an example of how the halo finding application, when applied to large astrophysics datasets, benefits from the model of the Hadoop architecture. The halo finding application uses a friends-of-friends algorithm to quickly cluster together large sets of particles and output files that visualization software can interpret. The current implementation requires that large datasets be moved from storage to computation resources for every simulation of astronomy data. Our Hadoop implementation allows for an in-place halo finding application on the datasets, which removes the time-consuming process of transferring data between resources.

  • Logan: Automatic management for evolvable, large-scale, archival storage

    Publication Year: 2008 , Page(s): 1 - 8

    Archival storage systems designed to preserve scientific data, business data, and consumer data must maintain and safeguard tens to hundreds of petabytes of data on tens of thousands of media for decades. Such systems are currently designed in the same way as higher-performance, shorter-term storage systems, which have a useful lifetime but must be replaced in their entirety via a "fork-lift" upgrade. Thus, while existing solutions can provide good energy efficiency and relatively low cost, they do not adapt well to continuous improvements in technology, becoming less efficient relative to current technology as they age. In an archival storage environment, this paradigm implies an endless series of wholesale migrations and upgrades to remain efficient and up to date. Our approach, Logan, manages node addition, removal, and failure on a distributed network of intelligent storage appliances, allowing the system to gradually evolve as device technology advances. By automatically handling most of the common administration chores (integrating new devices into the system, managing groups of devices that work together to provide redundancy, and recovering from failed devices), Logan reduces management overhead and thus cost. Logan can also improve cost and space efficiency by identifying and decommissioning outdated devices, thus reducing space and power requirements for the archival storage system.

  • Just-in-time staging of large input data for supercomputing jobs

    Publication Year: 2008 , Page(s): 1 - 5
    Cited by:  Papers (1)

    High performance computing is facing a data deluge from state-of-the-art colliders and observatories. Large data sets from these facilities, and other end-user sites, are often inputs to intensive analyses on modern supercomputers. Timely staging of input data to the supercomputer's local storage can not only optimize space usage, but also protect against delays due to storage system failures. To this end, we propose a just-in-time staging framework that uses a combination of batch-queue predictions, user-specified intermediate nodes, and decentralized data delivery to make input data staging coincide with job startup. Our preliminary prototype has been integrated with widely used tools such as the PBS job submission system, BitTorrent data delivery, and the Network Weather Service network monitoring facility.

  • Revisiting the metadata architecture of parallel file systems

    Publication Year: 2008 , Page(s): 1 - 9
    Cited by:  Papers (2)

    As the types of problems we solve in high-performance computing and other areas become more complex, the amount of data generated and used is growing at a rapid rate. Today many terabytes of data are common; tomorrow petabytes of data will be the norm. Much work has been put into increasing capacity and I/O performance for large-scale storage systems. However, one often ignored area is metadata management. Metadata can have a significant impact on the performance of a system. Past approaches have moved metadata activities to a separate server in order to avoid potential interference with data operations. However, with the advent of object-based storage technology, there is a compelling argument to re-couple metadata and data. In this paper we present two metadata management schemes, both of which remove the need for a separate metadata server and replace it with object-based storage.
