Skip to Main Content
Optimizing ordered throughput not only improves the system efficiency but also makes cloud bursting transparent to the user. This is critical from the perspective of user fairness in customer-facing systems, correctness in stream processing systems, and so on. In this paper, we consider optimizing ordered throughput for near real-time, data-intensive, independent computations using cloud bursting. Intercloud computation of data-intensive applications is a challenge due to large data transfer requirements, low intercloud bandwidth, and best-effort traffic on the Internet. The system model we consider is comprised of two processing stages. The first stage uses cloud bursting opportunistically for parallel processing, while the second stage (sequential) expects the output of the first stage to be in the same order as the arrival sequence. We propose three scheduling heuristics as part of an autonomic cloud bursting approach that adapt to changing workload characteristics, variation in bandwidth, and available resources to optimize ordered throughput. We also characterize the operational regimes for cloud bursting as stabilization mode versus acceleration mode, depending on the workload characteristics like the size of data to be transferred for a given compute load. The operational regime characterization helps in deciding how many instances can be optimally utilized in the external cloud.