Benchmarking the performance of hadoop triple replication and erasure coding on a nation-wide distributed cloud


Abstract:

Large-scale distributed storage systems play a vital role in maintaining data across storage locations globally. These systems use replication as the default mechanism for providing fault tolerance. Recently, erasure codes have come into use as a viable alternative to replication, since they provide the same fault tolerance at a reduced storage overhead. However, their performance in a geographically diverse distributed storage system is unclear. This paper compares the performance of triple replication with that of the erasure coding (Reed-Solomon codes) used in Apache Hadoop’s implementation of a distributed file system, on a cluster distributed across Australia that runs on the NeCTAR research cloud. Our results show that using erasure coding does not degrade read performance in such a setting. We also compare Hadoop’s code with a local reconstruction code implemented in the XORBAS version of Hadoop. These codes perform well in our clusters, but the performance gain observed in our results does not match the previously reported results. Hence, we need new codes that perform better by addressing the geographical diversity issue. We believe that our framework is readily usable to test a range of novel erasure codes that are being introduced in the literature.
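
The storage-overhead argument in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the paper's benchmark code: the function names are hypothetical, and the Reed-Solomon parameters (4 data + 2 parity, 10 data + 4 parity) are illustrative assumptions, since the abstract does not state which parameters were used.

```python
from fractions import Fraction


def replication_profile(copies):
    """Return (storage overhead, tolerated block losses) for n-way replication.

    Storing `copies` full copies costs `copies` times the raw data size
    and survives the loss of all but one copy.
    """
    return Fraction(copies), copies - 1


def mds_code_profile(data_blocks, parity_blocks):
    """Return (storage overhead, tolerated block losses) for an MDS code.

    A maximum distance separable code such as Reed-Solomon stores
    data_blocks + parity_blocks blocks per stripe and can recover from
    the loss of any parity_blocks of them.
    """
    overhead = Fraction(data_blocks + parity_blocks, data_blocks)
    return overhead, parity_blocks


if __name__ == "__main__":
    # Parameters below are assumptions chosen for illustration only.
    schemes = {
        "triple replication": replication_profile(3),
        "Reed-Solomon (4 data + 2 parity)": mds_code_profile(4, 2),
        "Reed-Solomon (10 data + 4 parity)": mds_code_profile(10, 4),
    }
    for name, (overhead, losses) in schemes.items():
        print(f"{name}: {float(overhead):.2f}x storage, "
              f"tolerates {losses} lost blocks")
```

Under these assumed parameters, a code with two parity blocks tolerates the same number of losses as triple replication while storing half as much. The sketch says nothing about read or repair cost, which is what the paper measures.
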
Date of Conference: 22-24 June 2015
Date Added to IEEE Xplore: 06 August 2015
Electronic ISBN: 978-1-4799-1911-6
Print ISSN: 2374-9660
Conference Location: Sydney, NSW, Australia
