aHDFS: An Erasure-Coded Data Archival System for Hadoop Clusters

Abstract:

In this paper, we propose an erasure-coded data archival system called aHDFS for Hadoop clusters, where RS(k + r, k) codes are employed to archive data replicas in the Hadoop distributed file system (HDFS). We develop two archival strategies in aHDFS, namely aHDFS-Grouping and aHDFS-Pipeline, to speed up the data archival process. aHDFS-Grouping, a MapReduce-based data archiving scheme, keeps each mapper's intermediate output key-value pairs in a local key-value store. With the local store in place, aHDFS-Grouping merges all the intermediate key-value pairs with the same key into one single key-value pair, followed by shuffling the single key-value pair to reducers to generate final parity blocks. aHDFS-Pipeline forms a data archival pipeline using multiple data nodes in a Hadoop cluster. aHDFS-Pipeline delivers the merged single key-value pair to a subsequent node's local key-value store. The last node in the pipeline is responsible for outputting parity blocks. We implement aHDFS in a real-world Hadoop cluster. The experimental results show that aHDFS-Grouping and aHDFS-Pipeline speed up Baseline's shuffle and reduce phases by factors of 10 and 5, respectively. When the block size is larger than 32 MB, aHDFS improves on the performance of HDFS-RAID and HDFS-EC by approximately 31.8 and 15.7 percent, respectively.
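The abstract describes aHDFS-Grouping's key step only in prose: intermediate key-value pairs that share a key are merged in a mapper-local store before the shuffle. This is possible because Reed-Solomon parity is a linear combination of the data blocks over GF(2^8), so partial contributions to the same parity block can be added (XORed) in any order, either locally (Grouping) or node by node (Pipeline). The sketch below is a minimal single-process illustration under stated assumptions, not the authors' implementation: the key layout `(stripe, parity index)`, the Cauchy coefficient matrix, the primitive polynomial 0x11D, and every function name are hypothetical choices; real aHDFS runs on Hadoop MapReduce with a local key-value store.

```python
from collections import defaultdict
from functools import reduce

# GF(2^8) arithmetic; the primitive polynomial 0x11D is an assumed, common RS choice.
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with shift-and-add plus polynomial reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_inv(a):
    """Inverse of nonzero a: a^254 == a^-1 because a^255 == 1 in GF(2^8)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def cauchy_coeffs(k, r):
    """Hypothetical r x k Cauchy matrix c[j][i] = 1 / (x_j + y_i), x_j = k + j, y_i = i.
    The x and y values are distinct, so x_j XOR y_i is never 0 (requires k + r <= 256)."""
    return [[gf_inv((k + j) ^ i) for i in range(k)] for j in range(r)]

def scale(coeff, block):
    return bytes(gf_mul(coeff, x) for x in block)

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def map_phase(local_blocks, coeffs):
    """One mapper. local_blocks maps (stripe, i) -> data block; emits one
    partial-parity pair per (data block, parity index)."""
    for (stripe, i), block in local_blocks.items():
        for j, row in enumerate(coeffs):
            yield (stripe, j), scale(row[i], block)

def group_locally(pairs):
    """Grouping-style local store: XOR-merge (GF(2^8) addition) all pairs with
    the same key, so only one pair per key leaves the mapper."""
    store = {}
    for key, value in pairs:
        store[key] = xor_bytes(store[key], value) if key in store else value
    return store

def reduce_phase(stores):
    """Reducer: XOR together the merged pairs from every store for each key."""
    return group_locally(pair for store in stores for pair in store.items())

def pipeline(node_blocks, coeffs):
    """Pipeline-style: each node folds its merged contributions into the running
    partial parities and forwards them; the last node holds the final parity."""
    running = {}
    for local_blocks in node_blocks:
        running = reduce_phase([running, group_locally(map_phase(local_blocks, coeffs))])
    return running

def encode_direct(stripes, coeffs):
    """Reference encoder: parity_j = XOR over i of c[j][i] * d_i, per stripe."""
    return {
        (s, j): reduce(xor_bytes, (scale(c, d) for c, d in zip(row, data)))
        for s, data in enumerate(stripes)
        for j, row in enumerate(coeffs)
    }

if __name__ == "__main__":
    k, r = 4, 2
    coeffs = cauchy_coeffs(k, r)
    # Two stripes of k toy 8-byte blocks (stand-ins for HDFS blocks).
    stripes = [[bytes((s * 31 + i * 7 + b) % 256 for b in range(8)) for i in range(k)]
               for s in range(2)]
    # Hypothetical placement: node 0 holds blocks 0-1, node 1 holds blocks 2-3.
    nodes = [{(s, i): stripes[s][i] for s in range(2) for i in range(n * 2, n * 2 + 2)}
             for n in range(2)]
    raw = [list(map_phase(nb, coeffs)) for nb in nodes]
    stores = [group_locally(p) for p in raw]
    print(sum(len(p) for p in raw), "pairs without grouping,",
          sum(len(s) for s in stores), "with grouping")
    assert reduce_phase(stores) == encode_direct(stripes, coeffs) == pipeline(nodes, coeffs)
```

Because XOR is associative and commutative, the grouped MapReduce result, the pipelined result, and direct encoding agree. Grouping shrinks the shuffled data from one pair per (data block, parity index) to one pair per (stripe, parity index) on each mapper, which is the effect the abstract attributes to the local key-value store.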

Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 28, Issue: 11, 01 November 2017)

Page(s): 3060 - 3073

Date of Publication: 19 May 2017

DOI: 10.1109/TPDS.2017.2706686

IEEE Keywords
- Mathematical model
- Distributed databases
- Redundancy
- Encoding
- Programming
- Pipelines
- Data models
Index Terms
- Data Repository
- Archiving System
- Hadoop Cluster
- Block Size
- Local Store
- File System
- Parity-check
- Intermediate Output
- Key-value Pairs
- Subsequent Nodes
- Hadoop Distributed File System
- Storage Systems
- Side Of Equation
- Fault-tolerant
- Intermediate Results
- Urban Network
- Storage Cost
- File Size
- Key Values
- Pipelining
- Reed-Solomon Codes
- Total Execution Time
- Data Block
- Reduction In Execution Time
- Map Tasks
- Phase Map
- Optimal Guidance
- Intermediate Data
- Multiple Mapping