Skip to Main Content
Workflow-based workloads usually consist of multiple instances of the same workflow, which are jobs with control or data dependencies to carry out a well-defined scientific computation task, with each instance acting on its own input data. To maximize the performance, a high degree of concurrency is always achieved by running multiple instances simultaneously. However, since the amount of storage is limited on most systems, deadlock due to oversubscribed storage requests is a potential problem. To address this problem, we integrate two novel concepts with the traditional problem of deadlock avoidance by proposing two algorithms that can maximize active (not just allocated) resource utilization and minimize makespan. Our approach is based on the well-known banker's algorithm, but our algorithms make the important distinction between active and inactive resources, which is not a part of previous approaches. The central idea is to leverage the data-flow information to dynamically approximate localized maximum claim (i.e., the resource requirements of the remaining jobs of the instance) to improve either interinstance or intrainstance concurrency and still avoid deadlock. Through simulation-based studies, we show how our proposed algorithms are better than the classic banker's algorithm and the more recent Lang's algorithm in terms of makespan and active storage resource utilization.