Purpose-built clusters permeate many of today's organizations, providing both large-scale data storage and computing. Within local clusters, competition for resources makes it difficult for applications to meet deadlines. However, given the emergence of the cloud's pay-as-you-go model, users are increasingly storing portions of their data remotely and allocating compute nodes on demand to meet deadlines. This scenario gives rise to a hybrid cloud, where data stored across local and cloud resources may be processed over both environments. While a hybrid execution environment may be used to meet time constraints, users must now attend to the costs associated with data storage, data transfer, and node allocation time on the cloud. In this paper, we describe a modeling-driven resource allocation framework to support both time- and cost-sensitive execution for data-intensive applications executed in a hybrid cloud setting. We evaluate our framework using two data-intensive applications and a number of time and cost constraints. Our experimental results show that our system is capable of meeting execution deadlines within a 3.6% margin of error. Similarly, cost constraints are met within a 1.2% margin of error, while minimizing the application's execution time.
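To make the two constraint types concrete, the following is a minimal illustrative sketch, not the framework described in this paper. It assumes a deliberately simplified model in which every node processes work at the same fixed rate and the only cloud cost is node allocation time; data storage and transfer costs are ignored. All function names, parameters, and numbers are hypothetical.

```python
from typing import Optional, Tuple


def estimate(work_units: float, local_nodes: int, cloud_nodes: int,
             units_per_node_hour: float,
             price_per_node_hour: float) -> Tuple[float, float]:
    """Return (hours, cost) under the simplified model.

    Assumption: all nodes are identical, and only cloud nodes are billed.
    """
    total_nodes = local_nodes + cloud_nodes
    hours = work_units / (total_nodes * units_per_node_hour)
    cost = cloud_nodes * hours * price_per_node_hour
    return hours, cost


def fewest_nodes_for_deadline(work_units: float, local_nodes: int,
                              units_per_node_hour: float,
                              price_per_node_hour: float,
                              deadline_hours: float,
                              max_cloud_nodes: int = 64) -> Optional[int]:
    """Time-sensitive mode: smallest cloud allocation that meets the deadline."""
    for cloud_nodes in range(max_cloud_nodes + 1):
        if local_nodes + cloud_nodes == 0:
            continue  # no nodes at all, nothing can run
        hours, _ = estimate(work_units, local_nodes, cloud_nodes,
                            units_per_node_hour, price_per_node_hour)
        if hours <= deadline_hours:
            return cloud_nodes
    return None  # deadline not reachable within max_cloud_nodes


def fastest_within_budget(work_units: float, local_nodes: int,
                          units_per_node_hour: float,
                          price_per_node_hour: float,
                          budget: float,
                          max_cloud_nodes: int = 64) -> Optional[int]:
    """Cost-sensitive mode: allocation with the lowest time that fits the budget."""
    best_nodes, best_hours = None, float("inf")
    for cloud_nodes in range(max_cloud_nodes + 1):
        if local_nodes + cloud_nodes == 0:
            continue
        hours, cost = estimate(work_units, local_nodes, cloud_nodes,
                               units_per_node_hour, price_per_node_hour)
        if cost <= budget and hours < best_hours:
            best_nodes, best_hours = cloud_nodes, hours
    return best_nodes


if __name__ == "__main__":
    # Hypothetical workload: 1200 work units, 4 local nodes,
    # 10 units per node-hour, 0.5 currency units per cloud node-hour.
    print(fewest_nodes_for_deadline(1200, 4, 10, 0.5, deadline_hours=20))
    print(fastest_within_budget(1200, 4, 10, 0.5, budget=30))
```

In this toy model, adding cloud nodes always shortens execution time and raises cost, so the time-sensitive mode looks for the cheapest allocation that still meets the deadline, while the cost-sensitive mode looks for the fastest allocation the budget allows. A realistic model would also need to account for data placement and transfer between the local and cloud sites.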