Abstract:
Resource allocation is one of the most critical problems in high performance computing (HPC). Jobs that cannot obtain enough resources are likely to fail. In this paper, we propose a new approach to predict whether the CPUs and time slots requested for a job are sufficient. To do so, we train a machine learning (ML) system on statistical data collected from reference queue systems. Our ML system predicts the resources a job requires at submission time, so that jobs do not fail for lack of resources. The system uses supervised learning and covers both regression and classification tasks. Our results show that prediction accuracy depends strongly on the information available before a job is submitted. This information can be used to train our ML system more effectively.
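To make the two supervised tasks concrete, here is a minimal sketch that does not reproduce the paper's method. The abstract does not name its models or features, so this example substitutes k-nearest neighbours for both tasks. Regression estimates the runtime a new job will need, and classification predicts whether the requested time slot is too short. The field names (`cpus`, `walltime_min`, `runtime_min`), the rule that a job "fails" when its runtime exceeds its requested time slot, and the toy history are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical submission-time features; the paper does not list its features.
    cpus: int             # CPUs requested at submission
    walltime_min: float   # time slot requested, in minutes


@dataclass(frozen=True)
class FinishedJob:
    job: Job
    runtime_min: float    # runtime observed in the (hypothetical) queue-system logs

    @property
    def failed(self) -> bool:
        # Assumption: a job fails for lack of resources when its actual
        # runtime exceeds the requested time slot.
        return self.runtime_min > self.job.walltime_min


def _distance(a: Job, b: Job) -> float:
    # Compare on a log scale so that a 2x difference weighs the same
    # whether a job asks for 4 or 400 CPUs.
    return math.hypot(math.log(a.cpus) - math.log(b.cpus),
                      math.log(a.walltime_min) - math.log(b.walltime_min))


def neighbours(history, job, k):
    # The k past jobs whose requests most resemble the new job's request.
    return sorted(history, key=lambda h: _distance(h.job, job))[:k]


def predict_runtime(history, job, k=3) -> float:
    """Regression task: mean observed runtime of the k most similar past jobs."""
    near = neighbours(history, job, k)
    return sum(h.runtime_min for h in near) / len(near)


def predict_failure(history, job, k=3) -> bool:
    """Classification task: majority vote on 'failed' among the k neighbours."""
    near = neighbours(history, job, k)
    return sum(h.failed for h in near) > len(near) / 2


# Toy, made-up history used only to exercise the functions.
history = [
    FinishedJob(Job(4, 60), 40), FinishedJob(Job(4, 60), 50), FinishedJob(Job(4, 60), 45),
    FinishedJob(Job(8, 120), 150), FinishedJob(Job(8, 120), 140), FinishedJob(Job(8, 120), 160),
    FinishedJob(Job(64, 600), 300),
]

if __name__ == "__main__":
    new_job = Job(cpus=8, walltime_min=120)
    print(predict_failure(history, new_job))   # True: similar jobs overran 120 min
    print(predict_runtime(history, new_job))   # 150.0
```

In this toy setting, a scheduler front end could warn the user at submission time and suggest requesting at least the predicted runtime. A real system would use richer features and a trained model rather than a fixed neighbour vote.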
Published in: 2018 Global Smart Industry Conference (GloSIC)
Date of Conference: 13-15 November 2018
Date Added to IEEE Xplore: 09 December 2018