Towards Machine Learning-Based Auto-tuning of MapReduce


Abstract:

MapReduce, the de facto programming model for large-scale distributed data processing, and its most popular implementation, Hadoop, have enjoyed widespread adoption in industry during the past few years. Unfortunately, from a performance point of view, getting the most out of Hadoop is still a big challenge due to the large number of configuration parameters. Currently these parameters are tuned manually by trial and error, which is ineffective due to the large parameter space and the complex interactions among the parameters. Even worse, the parameters have to be re-tuned for different MapReduce applications and clusters. To make the parameter tuning process more effective, in this paper we explore machine learning-based performance models that we use to auto-tune the configuration parameters. To this end, we first evaluate several machine learning models with diverse MapReduce applications and cluster configurations, and we show that the support vector regression (SVR) model has good accuracy and is also computationally efficient. We further assess our auto-tuning approach, which uses the SVR performance model, against the Starfish auto-tuner, which uses a cost-based performance model. Our findings reveal that our auto-tuning approach can provide comparable or in some cases better performance improvements than Starfish with a smaller number of parameters. Finally, we propose and discuss a complete and practical end-to-end auto-tuning flow that combines our machine learning-based performance models with smart search algorithms for the effective training of the models and the effective exploration of the parameter space.
Date of Conference: 14-16 August 2013
Date Added to IEEE Xplore: 03 February 2014
Electronic ISBN: 978-0-7695-5102-9

Conference Location: San Francisco, CA, USA
