2014 International Workshop on Data Intensive Scalable Computing Systems

16 Nov. 2014

  • [Title page iii]

    Publication Year: 2014, Page(s): i
  • Copyright Page

    Publication Year: 2014, Page(s): ii
  • Table of Contents

    Publication Year: 2014, Page(s): iii - iv
  • Workshop Organization

    Publication Year: 2014, Page(s): v
  • Par-BF: A Parallel Partitioned Bloom Filter for Dynamic Data Sets

    Publication Year: 2014, Page(s): 1 - 8
    Cited by: Papers (2)

    Compared with a hash table, a Bloom Filter (BF) is more space-efficient for supporting fast matching though resulting in a controllable and acceptable false positive probability. The space size of the basic BF is predetermined based on the expected number of elements to be stored. However, we cannot predict the scale of a BF space for dynamic sets. The two existing solutions for supporting dynamic...
  • dispel4py: A Python Framework for Data-Intensive Scientific Computing

    Publication Year: 2014, Page(s): 9 - 16
    Cited by: Papers (5)

    This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. The main aim of dispel4py is to enable scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Therefore, special care has been taken to provide dispel4py with the ability to map abstract ...
  • Efficient, Failure Resilient Transactions for Parallel and Distributed Computing

    Publication Year: 2014, Page(s): 17 - 24
    Cited by: Papers (2)

    Scientific simulations are moving away from using centralized persistent storage for intermediate data between workflow steps towards an all online model. This shift is motivated by the relatively slow IO bandwidth growth compared with compute speed increases. The challenges presented by this shift to Integrated Application Workflows are motivated by the loss of persistent storage semantics for no...
  • BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution

    Publication Year: 2014, Page(s): 25 - 32
    Cited by: Papers (3)

    In today's "Big Data" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF and HDF5 to garner enough performance to manage the vast amount of data that scientific applications require. These I/O techniques offer parallel access to shared datasets and, together with a set of optimizations such as data sieving and two-phase I/O, boost I/O throughput. While most of these techn...
  • Rethinking Key-Value Store for Parallel I/O Optimization

    Publication Year: 2014, Page(s): 33 - 40
    Cited by: Papers (1)

    Key-Value Stores (KVStore) are being widely used as the storage system for large-scale Internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems (PFS) are the dominant storage systems. In this study, we carefully examine the architecture difference and performance characteristics of PFS and KVStore. We propose that it is valuable to util...
  • PSA: A Performance and Space-Aware Data Layout Scheme for Hybrid Parallel File Systems

    Publication Year: 2014, Page(s): 41 - 48
    Cited by: Papers (9)

    The underlying storage of hybrid parallel file systems (PFS) is composed of both SSD-based file servers (SServer) and HDD-based file servers (HServer). Unlike a traditional HServer, an SServer consistently provides improved storage performance but lacks storage space. However, most current data layout schemes do not consider the differences in performance and space between heterogeneous servers, a...
  • Distributed Multipath Routing Algorithm for Data Center Networks

    Publication Year: 2014, Page(s): 49 - 56
    Cited by: Papers (1)

    Multipath routing has been studied in diverse contexts such as wide-area networks and wireless networks in order to minimize the finish time of data transfer or the latency of message sending. The fast adoption of cloud computing for various applications including high-performance computing applications has drawn more attention to efficient network utilization through adaptive or multipath routing...
  • CULZSS-Bit: A Bit-Vector Algorithm for Lossless Data Compression on GPGPUs

    Publication Year: 2014, Page(s): 57 - 64

    In this paper, we describe an algorithm to improve dictionary based lossless data compression on GPGPUs. The presented algorithm uses bit-wise computations and leverages bit parallelism for the core part of the algorithm which is the longest prefix match calculations. Using bit parallelism, also known as bit-vector approach, is a fundamentally new approach for data compression and promising in per...
  • A Caching Approach to Reduce Communication in Graph Search Algorithms

    Publication Year: 2014, Page(s): 65 - 72

    In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: few high-degree vertexes connect many low-degree vertexes. Despite the randomness in a graph search, it is possible to capitalize on this characteristic and cache relevant information in high-degree vertexes. We applied this idea by cac...
  • Mapping of RAID Controller Performance Data to the Job History on Large Computing Systems

    Publication Year: 2014, Page(s): 73 - 80

    For systems executing a mixture of different data intensive applications in parallel there is always the question about the impact that each application has on the storage subsystem. From the perspective of storage, I/O is typically anonymous as it does not contain user identifiers or similar information. This paper focuses on the analysis of performance data collected on shared system components ...
  • Author Index

    Publication Year: 2014, Page(s): 81