Skip to Main Content
Large-scale Bag-of-Tasks (BoT) applications are characterized by their massively parallel, yet independent operations. The use of resources in public clouds to dynamically expand the capacity of a private computer system might be an appealing alternative to cope with such massive parallelism. To fully realize the benefit of this 'cloud bursting', the performance to cost ratio (or cost efficiency) must be thoroughly studied and incorporated into scheduling and resource allocation strategies. In this paper, we present PANDA, a framework for scheduling BoT applications across resources in both private and public clouds. The framework at the core incorporates a fully polynomial-time approximation scheme (FPTAS) as a novel scheduling algorithm, which generates schedules with the best tradeoff point between cost and performance; hence Pareto-optimality. We have theoretically discussed the complexity and correctness of our algorithms, and experimentally verified their efficacy and practicality using ISOMAP, a widely-used nonlinear manifold method as a real-world BoT application. Our evaluation conducted in a 'multi-cloud' environment of 48-core local/private system and Amazon EC2 public cloud demonstrates the scheduling quality of PANDA is guaranteed to be within a measurable distance from the optimal solution. Results obtained from our experiments show such quality is 8% or less from the optimum.