Data locality is one of the critical factors affecting system performance. In this paper, we focus on the data locality problem in Hadoop MapReduce and propose a scheduling method to improve it. After receiving a request from a node, the method selects a task for that node, searching the node's first level, then its second level, then its third level. It then checks whether the selected task is the only one on the first level of the node to issue a request. If so, the method skips the selected task and selects another task for the requesting node. Otherwise, it schedules the selected task to the requesting node. We have analyzed the method. Compared with the default scheduling method of Hadoop MapReduce, the proposed method can improve data locality.
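The selection-and-skip procedure can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation. It assumes that the three levels correspond to node-local, rack-local, and off-rack tasks (the abstract does not define them), that the skip rule protects a task that is the sole node-local task of some other node, and that a skipped task is used as a fallback when nothing else is available. All names (`Node`, `Task`, `locality_level`, `pick_task`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rack: str


@dataclass
class Task:
    task_id: str
    replicas: set  # names of nodes holding the task's input block


def locality_level(task, node, nodes):
    """Assumed levels: 1 = node-local, 2 = rack-local, 3 = off-rack."""
    if node.name in task.replicas:
        return 1
    if any(nodes[r].rack == node.rack for r in task.replicas):
        return 2
    return 3


def is_sole_local_task_elsewhere(task, requester, pending, nodes):
    """True if `task` is the only node-local task of some other node."""
    for other in nodes.values():
        if other.name == requester.name:
            continue
        local = [t for t in pending if other.name in t.replicas]
        if len(local) == 1 and local[0] is task:
            return True
    return False


def pick_task(requester, pending, nodes):
    """Pick a task for `requester`, preferring level 1, then 2, then 3."""
    # sorted() is stable, so tasks on the same level keep their queue order.
    ranked = sorted(pending, key=lambda t: locality_level(t, requester, nodes))
    for task in ranked:
        level = locality_level(task, requester, nodes)
        # Assumption: only non-local candidates are skipped, so that another
        # node can still run its single node-local task.
        if level > 1 and is_sole_local_task_elsewhere(task, requester, pending, nodes):
            continue
        return task
    # Assumption: if every candidate was skipped, run the best one anyway
    # rather than leaving the requesting node idle.
    return ranked[0] if ranked else None


nodes = {
    "n1": Node("n1", "r1"),
    "n2": Node("n2", "r1"),
    "n3": Node("n3", "r2"),
}
pending = [
    Task("tA", {"n2"}),  # rack-local for n1, but the only local task of n2
    Task("tB", {"n3"}),  # off-rack for n1; n3 also holds tC
    Task("tC", {"n3"}),
]
print(pick_task(nodes["n1"], pending, nodes).task_id)  # prints "tB"
```

In this example `tA` ranks best for `n1` but is skipped because it is the only task `n2` could run locally, so `n1` receives `tB` instead. A scheduler that always takes the best-ranked task would assign `tA` to `n1` and leave `n2` with no node-local work.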
Published in:
Electrical & Electronics Engineering (EEESYM), 2012 IEEE Symposium on
Date of Conference: 24-27 June 2012