Abstract:
Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of th...Show MoreMetadata
Abstract:
Parallel systems divide jobs into smaller tasks that can be serviced by many workers at the same time. Some parallel systems have blocking barriers that require all of their tasks to start and/or depart in unison. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. The drawback of these barriers is reduced performance and stability compared to equivalent non-blocking systems.We derive analytical expressions for the stability regions for parallel systems with blocking start and/or departure barriers. We extend results from queueing theory to derive waiting and sojourn time bounds for systems with blocking start barriers. Our results show that for a given system utilization and number of servers, there is an optimal degree of parallelism that balances waiting time and job execution time. This observation leads us to propose and implement a class of self-adaptive schedulers, we call "Take-Half", that modulate the allowed degree of parallelism based on the instantaneous system load, improving mean performance and eliminating stability issues.
Date of Conference: 02-05 May 2022
Date Added to IEEE Xplore: 20 June 2022
ISBN Information: