Abstract:
Since its inception in 2004, MapReduce has emerged as a paramount platform and disruptive technology for the execution of high-performance applications that process very large volumes of data. Hadoop is one of the most popular and widely adopted open source MapReduce implementations. Companies that run large applications over hundreds or thousands of machines every day invest considerable effort in performance tuning and optimization to reduce infrastructure costs. However, the framework has around 190 parameters, which can be combined into a very large number of configurations that can significantly affect application performance. Optimizing Hadoop parameters therefore requires deep knowledge of a myriad of platform details. In this paper, we propose and evaluate the use of derivative-free optimization (DFO) methods for the automatic setup of Hadoop parameters to optimize application performance. DFO methods provide a simple and efficient means of automatically optimizing Hadoop MapReduce programs. Parameter changes are deployed through DevOps tools, which efficiently reconfigure the cluster according to the DFO decisions. In the best scenario in our experiments, on a cluster of 28 nodes, automatic optimization reduces execution time by 71% relative to the default parameter setup (i.e., a speedup of about 3.5 times), with very low overhead for production environments. These results show that DFO methods and automatic optimization are a promising tool for improving performance and reducing costs for Hadoop applications whose behavior does not vary dramatically in daily production environments.
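To make the idea of derivative-free tuning concrete, the following is a minimal Python sketch of a coordinate pattern search, one generic member of the DFO family. It is not necessarily the method evaluated in the paper. The cost function `synthetic_runtime` is an invented stand-in for actually running a job on the cluster, and the starting values, step sizes, and bounds are illustrative assumptions; only the two parameter names are taken from Hadoop's configuration.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, int]


def pattern_search(cost: Callable[[Params], float], start: Params,
                   steps: Params, bounds: Dict[str, Tuple[int, int]],
                   max_evals: int = 200) -> Tuple[Params, float]:
    """Coordinate pattern search: probe +/- step on each parameter,
    keep any improvement, and halve the steps when nothing improves.
    Only cost values are used; no derivatives are needed."""
    best, best_cost = dict(start), cost(start)
    steps = dict(steps)
    evals = 1
    # The evaluation budget is checked once per sweep over the parameters.
    while evals < max_evals and any(s >= 1 for s in steps.values()):
        improved = False
        for name in list(best):
            if steps[name] < 1:
                continue
            low, high = bounds[name]
            for direction in (1, -1):
                cand = dict(best)
                # Clamp the probe to the allowed range.
                cand[name] = min(high, max(low, best[name] + direction * steps[name]))
                if cand[name] == best[name]:
                    continue
                cand_cost = cost(cand)
                evals += 1
                if cand_cost < best_cost:
                    best, best_cost = cand, cand_cost
                    improved = True
                    break
        if not improved:
            # No probe helped: refine the search with smaller steps.
            steps = {k: v // 2 for k, v in steps.items()}
    return best, best_cost


def synthetic_runtime(p: Params) -> float:
    """ASSUMPTION: made-up runtime model (seconds), not measured data.
    In a real setup this would reconfigure the cluster and time a job."""
    return (100.0
            + (p["mapreduce.task.io.sort.mb"] - 400) ** 2 / 100.0
            + 3.0 * (p["mapreduce.job.reduces"] - 24) ** 2)


# Illustrative starting point, step sizes, and bounds (assumptions).
start = {"mapreduce.task.io.sort.mb": 100, "mapreduce.job.reduces": 1}
steps = {"mapreduce.task.io.sort.mb": 128, "mapreduce.job.reduces": 8}
bounds = {"mapreduce.task.io.sort.mb": (50, 1024),
          "mapreduce.job.reduces": (1, 64)}

best, best_cost = pattern_search(synthetic_runtime, start, steps, bounds)
print(best, best_cost)
```

In practice each call to the cost function would correspond to one job execution, so the evaluation budget (`max_evals` here) is the main lever for keeping tuning overhead low. The reported figures are related by simple arithmetic: a 71% reduction in execution time corresponds to a speedup of 1 / (1 - 0.71) ≈ 3.45.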
Published in: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)
Date of Conference: 17-19 February 2016
Date Added to IEEE Xplore: 04 April 2016
ISBN Information:
Electronic ISSN: 2377-5750