In this paper we present a study of resource consumption profiles for MapReduce applications running on Hadoop on Amazon EC2. We selected three applications, Grep, Word Count, and Sort, and measured their resource usage in terms of CPU and memory footprint while varying Hadoop's I/O buffer configuration parameters. Our study brings up three key points: first, the effect of the I/O parameters on the total running time of an application; second, the invalidity of the Hadoop scheduler's assumption that the three phases (copy, sort, and reduce) of a Reduce task are equal; and finally, an insight, supported by our experimental results, into ways to improve the Hadoop scheduler for running multiple jobs by capturing the resource consumption information of different applications. To the best of our knowledge, this is the first work to present such a resource usage study.
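As an illustration of the second point, the following minimal sketch contrasts a progress estimate that gives each Reduce phase an equal share with one weighted by per-phase durations. It is not Hadoop's actual scheduler code; the function names and the duration values are hypothetical and chosen only for illustration, not taken from the experiments.

```python
PHASES = ("copy", "sort", "reduce")


def equal_weight_progress(phase, fraction_done):
    """Progress score when each phase is assumed to be one third of the task."""
    index = PHASES.index(phase)
    return (index + fraction_done) / len(PHASES)


def time_based_progress(phase, fraction_done, durations):
    """Progress score weighted by per-phase durations (in seconds)."""
    index = PHASES.index(phase)
    # Time spent in phases that have already finished.
    elapsed = sum(durations[p] for p in PHASES[:index])
    # Time spent so far in the current phase.
    elapsed += fraction_done * durations[phase]
    return elapsed / sum(durations.values())


if __name__ == "__main__":
    # Hypothetical durations for a copy-heavy job (assumption, not measured data).
    durations = {"copy": 60.0, "sort": 10.0, "reduce": 30.0}
    # Both calls describe the same moment: the copy phase has just finished.
    print(equal_weight_progress("sort", 0.0))
    print(time_based_progress("sort", 0.0, durations))
```

Under these assumed durations, the equal-share estimate reports one third of the task as complete at the end of the copy phase, while the duration-weighted estimate reports 60 percent, which suggests how unequal phases could mislead a scheduler that relies on the equal-share score.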