Exploring Benefits of NVMe SSDs for BigData Processing in Enterprise Data Centers


Abstract:

Big data processing environments such as Apache Spark are widely deployed for applications with large-scale workloads. New storage technologies such as Non-Volatile Memory Express Solid State Drives (NVMe SSDs) provide higher throughput than traditional Hard Disk Drives (HDDs). As a result, NVMe SSDs are rapidly replacing HDDs in modern data centers. In this paper, we explore whether NVMe SSDs are actually necessary for a large workload running on the Spark big data framework. Specifically, we investigate which factors in application design and in the Spark data processing framework determine whether the benefits of NVMe SSDs can be exploited. Our results from real experiments reveal that some applications, even with large workloads, cannot fully utilize NVMe SSDs to obtain high I/O throughput. Interestingly, we find that characteristics of the Spark data processing framework such as shuffling (i.e., the volume of intermediate data generated by an application) and parallelism (i.e., the number of concurrently running tasks) have a crucial impact on the performance of big data applications running on NVMe SSDs.
Date of Conference: 09-11 August 2019
Date Added to IEEE Xplore: 21 November 2019
Conference Location: QingDao, China
