CoBell: Runtime Prediction for Distributed Dataflow Jobs in Shared Clusters


Abstract:

Distributed dataflow systems have been developed to help users analyze and process large datasets. While they make it easier for users to develop massively-parallel programs, users still have to choose the amount of resources for the execution of their jobs. Yet, users do not necessarily understand workload and system dynamics, while they often have constraints like runtime targets and budgets. Addressing this problem, systems have been developed that automatically select the required amount of resources to fulfill the users' constraints. However, interference with co-located workloads can introduce a significant variance into the runtimes of jobs and make accurate runtime prediction harder. This paper presents CoBell, a resource allocation system that incorporates information about co-located workloads to improve the runtime prediction for jobs in shared clusters. CoBell receives jobs from users with runtime and scale-out constraints and then reserves resources based on predicted runtimes. We implemented CoBell as a job submission tool for YARN. As such, it works with existing YARN cluster setups. The paper evaluates CoBell using five different distributed dataflow jobs, showing that using CoBell results in runtimes that do not violate the runtime constraints by more than 7.2%.
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 27 December 2018
Electronic ISSN: 2330-2186
Conference Location: Nicosia, Cyprus