Abstract:
Hadoop, as an open-source implementation of the MapReduce paradigm, is increasingly used in both industry and academia for large-scale data processing. Yarn, one of the core components of second-generation Hadoop, manages cluster resources and job scheduling. Minimizing the total completion time of a set of MapReduce jobs is therefore a key aspect of Yarn's performance. Hadoop's default schedulers, including first-in-first-out (FIFO), Fair, and Capacity, do not consider the characteristics and preferences of jobs' resource demands, resulting in insufficient resource utilization. Therefore, in this paper, a new job scheduler named Q-scheduler is proposed. It uses reinforcement learning (RL) to accumulate scheduling experience autonomously, building on the Fair scheduler. Specifically, the proposed scheduler consists of a Classifier and a Decider. The Classifier classifies jobs through similarity measurement, and the Decider, as an agent with a Q-Table, considers the execution order of different job classes and updates the state-action values of the Q-Table to learn an optimal schedule. The experimental results show that Q-scheduler reduces the total completion time of the job set and improves resource utilization.
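The abstract does not specify the Decider's exact state and action encoding, but a Q-Table agent of this kind typically follows the standard Q-learning update rule. A minimal sketch, with all names (states such as cluster load, actions such as which job class to dispatch next, and the reward signal) purely hypothetical:

```python
import random

class Decider:
    """Toy Q-learning agent: states could encode cluster conditions and
    actions could be which job class to schedule next. All names here are
    illustrative assumptions, not the paper's actual design."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {}              # Q-Table: (state, action) -> value
        self.actions = actions
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount factor
        self.epsilon = epsilon   # exploration probability

    def choose(self, state):
        # Epsilon-greedy: occasionally explore, otherwise exploit the
        # action with the highest learned state-action value.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update:
        #   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old
        )
```

In a scheduling setting, the reward would plausibly be tied to job-set completion time (e.g. the negative of elapsed time), so that the agent learns execution orders that finish the job set sooner.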
Published in: 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD)
Date of Conference: 08-10 May 2024
Date Added to IEEE Xplore: 10 July 2024