Abstract:
Apache Hive, Apache Pig, and Pivotal HAWQ are popular open-source cluster computing frameworks for large-scale data analytics. These frameworks hide the complexity of task parallelism and fault tolerance by exposing a simple programming API to users. In this paper, we discuss the major architectural differences among them and conduct detailed experiments to compare their performance on different inputs. Furthermore, we attribute the performance differences to components that are architected differently in the three frameworks, and we show the detailed execution overheads of Apache Hive, Apache Pig, and Pivotal HAWQ by analyzing CPU utilization, memory utilization, and disk read/write activity at runtime. Finally, we present a discussion and summary of our findings and suggestions.
Date of Conference: 25-30 June 2017
Date Added to IEEE Xplore: 11 September 2017
ISBN Information:
Electronic ISSN: 2159-6190