Scaling HDFS to More Than 1 Million Operations Per Second with HopsFS | IEEE Conference Publication | IEEE Xplore

Abstract:

HopsFS is an open-source, next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that replaces the main scalability bottleneck in HDFS, its single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. By removing the metadata bottleneck in Apache HDFS, HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters. In this paper, we detail the techniques and optimizations that enable HopsFS to surpass 1 million file system operations per second, at least 16 times higher throughput than HDFS. In particular, we discuss how we exploit recent high-performance features from NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions. Together with more traditional techniques, such as batching and write-ahead caches, we show how many incremental optimizations have enabled a revolution in distributed hierarchical file system performance.
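
The abstract names application-defined partitioning and partition-pruned index scans without showing what they look like. The following is a minimal sketch of the general idea only, not the HopsFS implementation. It assumes, purely for illustration, that file metadata rows are partitioned by the ID of their parent directory, so that listing a directory touches a single partition. The class name `PartitionedInodeTable`, the modulo partitioning function, and the partition count are all hypothetical.

```python
from collections import defaultdict

# Hypothetical partition count, chosen only for this illustration.
NUM_PARTITIONS = 4


class PartitionedInodeTable:
    """Toy in-memory stand-in for a partitioned metadata table.

    Assumption (not stated in the abstract): rows are partitioned by the
    parent directory ID, so all children of one directory share a partition.
    """

    def __init__(self, num_partitions=NUM_PARTITIONS):
        # Each partition maps parent_id -> {child name -> inode id}.
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]
        self.partitions_scanned = 0  # counter used to compare scan strategies

    def _partition_of(self, parent_id):
        # Application-defined partitioning: the application picks the key.
        return parent_id % len(self.partitions)

    def insert(self, parent_id, name, inode_id):
        part = self.partitions[self._partition_of(parent_id)]
        part[parent_id][name] = inode_id

    def list_children_pruned(self, parent_id):
        # Partition-pruned scan: only the one partition that can hold
        # the children of parent_id is visited.
        part = self.partitions[self._partition_of(parent_id)]
        self.partitions_scanned += 1
        return sorted(part.get(parent_id, {}))

    def list_children_full_scan(self, parent_id):
        # Baseline without pruning: every partition is visited.
        names = []
        for part in self.partitions:
            self.partitions_scanned += 1
            names.extend(part.get(parent_id, {}))
        return sorted(names)


table = PartitionedInodeTable()
table.insert(parent_id=1, name="a.txt", inode_id=10)
table.insert(parent_id=1, name="b.txt", inode_id=11)
table.insert(parent_id=2, name="c.txt", inode_id=12)

pruned_result = table.list_children_pruned(1)
pruned_cost = table.partitions_scanned

table.partitions_scanned = 0
full_result = table.list_children_full_scan(1)
full_cost = table.partitions_scanned

print(pruned_result, pruned_cost, full_cost)
```

Both strategies return the same children, but the pruned scan visits one partition while the full scan visits all of them. In a distributed database, that difference determines how many nodes a single file system operation has to contact.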
Date of Conference: 14-17 May 2017
Date Added to IEEE Xplore: 13 July 2017
Conference Location: Madrid, Spain