The MapReduce framework has received wide acclaim over the past few years for large-scale computing, and it has become a standard paradigm for batch-oriented workloads. As adoption of this paradigm has grown rapidly, scheduling MapReduce jobs has become a problem of great interest in the research community. We propose an approach that tries to maintain harmony among the jobs running on a cluster and, in turn, decrease their runtime. In our model, the scheduler is made aware of the different types of jobs running on the cluster. The scheduler allocates a task to a node only if the incoming task does not adversely affect the tasks already running on that node: from the list of pending tasks, our algorithm selects the one that is most compatible with the tasks already running there. We present heuristic-based and machine-learning-based solutions within this approach, both of which try to maintain resource balance across the cluster by not overloading any node, thereby reducing the overall runtime of the jobs. Compared with Yahoo's Capacity Scheduler, the results show a runtime saving of around 21% for the heuristic-based approach and around 27% for the machine-learning-based approach.
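To make the selection rule concrete, the following is a minimal Python sketch of a compatibility-aware task picker. It is not the authors' heuristic or their machine-learning model. The three resource dimensions, the per-task demand fractions, and the names `Task`, `projected_load`, and `pick_task` are all hypothetical. The sketch makes two assumptions. A candidate counts as compatible when adding it keeps every resource on the node within capacity. Among compatible candidates, the picker prefers the one that leaves the lowest peak utilization on the node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical resource dimensions; the source does not specify which are modeled.
RESOURCES = ("cpu", "disk", "net")


@dataclass
class Task:
    name: str
    # Hypothetical fractional demand per resource (1.0 = a whole node's capacity).
    demand: Dict[str, float] = field(default_factory=dict)


def projected_load(running: List[Task], candidate: Task) -> Dict[str, float]:
    """Sum the demand of the running tasks plus the candidate, per resource."""
    load = {r: sum(t.demand.get(r, 0.0) for t in running) for r in RESOURCES}
    for r in RESOURCES:
        load[r] += candidate.demand.get(r, 0.0)
    return load


def pick_task(running: List[Task], pending: List[Task],
              capacity: float = 1.0) -> Optional[Task]:
    """Return the pending task most compatible with the node's running tasks.

    A task that would push any resource above `capacity` is skipped, so the
    node is never overloaded; ties keep the earliest pending task.
    Returns None when no pending task fits.
    """
    best: Optional[Task] = None
    best_peak = float("inf")
    for task in pending:
        peak = max(projected_load(running, task).values())
        if peak > capacity:
            continue  # would overload this node
        if peak < best_peak:
            best, best_peak = task, peak
    return best


# Usage: a CPU-heavy map task is already running on the node.
running = [Task("sort-map", {"cpu": 0.7, "disk": 0.2, "net": 0.1})]
pending = [
    Task("wordcount-map", {"cpu": 0.6, "disk": 0.1, "net": 0.0}),  # CPU-heavy
    Task("grep-map", {"cpu": 0.1, "disk": 0.6, "net": 0.1}),       # disk-heavy
]
print(pick_task(running, pending).name)  # grep-map
```

Here the second CPU-heavy task would push CPU demand to 1.3, so it is skipped, and the disk-heavy task is co-located instead. In this sketch, a plain threshold stands in for the paper's notion of "compatibility". A learned model could replace the `peak` score without changing the surrounding selection loop.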