Abstract:
Distributed dataflow systems have been developed to help users analyze and process large datasets. While they make it easier for users to develop massively parallel programs, users still have to choose the amount of resources for executing their jobs. Yet users do not necessarily understand workload and system dynamics, even though they often have constraints such as runtime targets and budgets. Addressing this problem, systems have been developed that automatically select the amount of resources required to fulfill the users' constraints. However, interference with co-located workloads can introduce significant variance into job runtimes and make accurate runtime prediction harder. This paper presents CoBell, a resource allocation system that incorporates information about co-located workloads to improve runtime prediction for jobs in shared clusters. CoBell receives jobs from users with runtime and scale-out constraints and then reserves resources based on predicted runtimes. We implemented CoBell as a job submission tool for YARN. As such, it works with existing YARN cluster setups. The paper evaluates CoBell using five different distributed dataflow jobs, showing that using CoBell results in runtimes that do not violate the runtime constraints by more than 7.2%.
Published in: 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 27 December 2018
ISBN Information:
Electronic ISSN: 2330-2186