Guarantee Strict Fairness and Utilize Prediction Better in Parallel Job Scheduling

Authors
Yulai Yuan ; Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China ; Yongwei Wu ; Weimin Zheng ; Keqin Li

As the most widely used parallel job scheduling strategy, EASY backfilling has achieved great success, not only because it balances fairness and performance, but also because it is applicable to most HPC systems. However, EASY is not entirely fair: our simulation shows that, on real workloads, a blocked job can be delayed by later jobs for more than 90 hours. Additionally, directly employing runtime prediction techniques in EASY would lead to a serious problem called reservation violation. In this paper, we aim to guarantee strict fairness (no job is delayed by any job of lower priority) while achieving attractive performance, and to employ prediction without causing reservation violation in parallel job scheduling. We propose two novel strategies, namely shadow load preemption (SLP) and venture backfilling (VB), which are integrated into EASY to construct preemptive venture EASY backfilling (PV-EASY). Experimental results on three real HPC workloads demonstrate that PV-EASY is more attractive than EASY for parallel job scheduling, from both academic and industry perspectives.
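For readers unfamiliar with the baseline, the following is a minimal sketch of one scheduling pass of classic EASY backfilling, the strategy PV-EASY builds on. It is not the paper's PV-EASY, and SLP and VB are not reproduced here. It assumes a single homogeneous pool of nodes, priority given by queue order, and that each job's `runtime` is an upper bound the system enforces (as with user estimates, where jobs are killed at their limit). The names `Job`, `easy_backfill`, and `runtime` are hypothetical. The blocked head job receives a reservation at the "shadow time", and a later job may start early only if it ends before that time or uses nodes the head job will not need.

```python
from dataclasses import dataclass

# Hypothetical illustration of classic EASY backfilling (not PV-EASY).
# Assumes one homogeneous node pool and enforced runtime bounds.

@dataclass
class Job:
    name: str
    nodes: int       # processors requested
    runtime: float   # runtime bound used for scheduling (user estimate, or a prediction)

def easy_backfill(free_nodes, running, queue, now=0.0):
    """One pass of EASY backfilling; returns the jobs started at `now`.

    free_nodes: idle nodes at `now`
    running:    (expected_end_time, nodes) pairs for jobs already running
    queue:      waiting jobs in priority order (e.g. FCFS)
    """
    queue, running, started = list(queue), list(running), []
    # 1. Start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job)
    if not queue:
        return started
    # 2. Reserve the earliest time ("shadow time") at which the blocked head job fits.
    head, avail = queue[0], free_nodes
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow = end
            break
    else:
        raise ValueError("head job requests more nodes than the machine has")
    extra = avail - head.nodes  # nodes the head job will not need at the shadow time
    # 3. Backfill later jobs only if they cannot delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_before_shadow = now + job.runtime <= shadow
        if ends_before_shadow or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_before_shadow:
                extra -= job.nodes  # it will still be running at the shadow time
            started.append(job)
    return started

# 10-node machine: a 6-node job runs until t=10, leaving 4 nodes idle.
queue = [Job("A", 8, 5.0), Job("B", 2, 5.0), Job("C", 2, 20.0), Job("D", 3, 3.0)]
started = easy_backfill(free_nodes=4, running=[(10.0, 6)], queue=queue)
print([j.name for j in started])  # ['B', 'C']

# Same machine, but D's runtime (6.0) is a prediction; assume it actually runs for 15.
predicted = easy_backfill(free_nodes=4, running=[(10.0, 6)],
                          queue=[Job("A", 8, 5.0), Job("D", 4, 6.0)])
print([j.name for j in predicted])  # ['D']
```

In the first call, A (8 nodes) gets a reservation at t=10 with 2 spare nodes; B fits because it ends by t=10, and C fits because it uses only the 2 spare nodes. The second call shows one way an underestimating prediction can break a reservation: D is backfilled because its predicted end (t=6) precedes the shadow time, but if it actually runs until t=15, only 6 nodes are free at t=10 and A must wait until t=15, past its reserved start time.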

Published in:

IEEE Transactions on Parallel and Distributed Systems, Volume 25, Issue 4