Skip to Main Content
Hadoop is seriously limited by its MapReduce scheduler which does not scale well in heterogeneous environment. Heterogenous environment is characterized by various devices which vary greatly with respect to the capacities of computation and communication, architectures, memorizes and power. As an important extension of Hadoop, LATE MapReduce scheduling algorithm takes heterogeneous environment into consideration. However, it falls short of solving the crucial problem - poor performance due to the static manner in which it computes progress of tasks. Consequently, neither Hadoop nor LATE schedulers are desirable in heterogeneous environment. To this end, we propose SAMR: a Self-Adaptive MapReduce scheduling algorithm, which calculates progress of tasks dynamically and adapts to the continuously varying environment automatically. When a job is committed, SAMR splits the job into lots of fine-grained map and reduce tasks, then assigns them to a series of nodes. Meanwhile, it reads historical information which stored on every node and updated after every execution. Then, SAMR adjusts time weight of each stage of map and reduce tasks according to the historical information respectively. Thus, it gets the progress of each task accurately and finds which tasks need backup tasks. What's more, it identifies slow nodes and classifies them into the sets of slow nodes dynamically. According to the information of these slow nodes, SAMR will not launch backup tasks on them, ensuring the backup tasks will not be slow tasks any more. It gets the final results of the fine-grained tasks when either slow tasks or backup tasks finish first. The proposed algorithm is evaluated by extensive experiments over various heterogeneous environment. Experimental results show that SAMR significantly decreases the time of execution up to 25% compared with Hadoop's scheduler and up to 14% compared with LATE scheduler.
Date of Conference: June 29 2010-July 1 2010