Skip to Main Content
MapReduce is a widely-used model for data parallel applications. We found its resource utilization is inefficient when there are not enough tasks to fill all task slots as the resources "reserved" for idle slots are just wasted. We propose resource stealing which enables running tasks to steal the unutilized resources and return them when new tasks are assigned. It exploits the opportunistic use of the otherwise wasted resources to improve overall resource utilization and reduce job execution time. Besides, our practical use of Hadoop shows the current mechanism adopted to trigger speculative execution creates many unnecessary speculative tasks that are killed soon after creation as the original tasks complete earlier. To alleviate the issue, we propose Benefit Aware Speculative Execution which predicts the benefit of running new speculative tasks and greatly eliminates unnecessary runs. Finally, MapReduce is mainly optimized for homogeneous environments and its inefficiency in heterogeneous network environments has been observed in our experiments. We investigate network heterogeneity aware scheduling of both map and reduce tasks. Overall, our goal is to enhance Hadoop to cope with significant network heterogeneity and improve resource utilization.