Scheduling parallel applications on existing or emerging computing platforms is challenging, and the scheduling must, among other attributes, be efficient and robust. This paper proposes a dual-stage framework for evaluating the robustness of efficient resource allocation and dynamic load balancing of scientific applications in heterogeneous computing environments with uncertain availability. The first stage employs robust resource allocation heuristics, while the second stage incorporates robust dynamic loop scheduling techniques. Together, the two stages form a comprehensive framework that enables, and provides guarantees for, the robust execution of scientific applications in computing systems where uncertainty is caused by various unpredictable perturbations. The paper reports on studies for determining the best techniques to use in each stage, namely those that (a) maximize the probability that the system makespan satisfies a deadline, and (b) minimize the system makespan for every given availability level in the system. The usefulness and benefits of the proposed framework are demonstrated via a small-scale example.
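
Criterion (a) above can be illustrated with a minimal Monte Carlo sketch. This is not the paper's method: it assumes a fixed (static) allocation of work to processors, models availability as an independently sampled fraction of each processor's nominal speed, and omits the second-stage dynamic loop scheduling entirely. All names (`makespan`, `prob_meets_deadline`, `sample_availability`) and the numbers in the usage lines are hypothetical.

```python
import random


def makespan(workloads, speeds, availabilities):
    """Finishing time of the slowest processor.

    Assumption: processor i finishes its work in
    workloads[i] / (speeds[i] * availabilities[i]) time units.
    """
    return max(w / (s * a) for w, s, a in zip(workloads, speeds, availabilities))


def prob_meets_deadline(workloads, speeds, sample_availability, deadline,
                        trials=10_000, seed=0):
    """Estimate P(makespan <= deadline) by repeated sampling.

    sample_availability(rng) is a hypothetical callback that returns one
    availability fraction in (0, 1] for a single processor.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one availability level per processor for this trial.
        avail = [sample_availability(rng) for _ in speeds]
        if makespan(workloads, speeds, avail) <= deadline:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Illustrative values only: two processors, availability uniform in [0.5, 1.0].
    estimate = prob_meets_deadline(
        workloads=[10.0, 20.0],
        speeds=[1.0, 2.0],
        sample_availability=lambda rng: rng.uniform(0.5, 1.0),
        deadline=15.0,
    )
    print(f"Estimated P(makespan <= deadline): {estimate:.3f}")
```

Under these assumptions, comparing the estimate across candidate allocations gives a rough way to rank them by criterion (a), while evaluating `makespan` at a fixed availability vector corresponds to criterion (b).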