This paper addresses open job scheduling questions for the challenge workloads that run on the large-scale parallel systems at supercomputer centers. Simulation results for six recent one-month job traces from the NCSA Origin 2000 (O2K) system are used to evaluate (1) the experimentally tuned NCSA LSF* policy, (2) the FCFS-backfill policy, (3) the Priority-backfill policy with alternative priority functions and with limited pre-emption to provide immediate service to each arriving job, and (4) the spatial equipartitioning (EQspatial) policy with an optional modification to reduce the maximum waiting time for the largest jobs in the challenge workloads. Measurements on the O2K validate the simulation results for two of the policies. The Priority-backfill policy with immediate service and a starvation-free priority measure that favors short jobs is shown to be the most promising if jobs cannot adapt to changing processor allocations at runtime, but EQspatial provides significantly better 95th percentile waiting time
Published in:
Parallel and Distributed Processing Symposium, Proceedings 15th International
Date of Conference: Apr 2001