Owing to its cost-effectiveness, on-demand resource provisioning, and ease of sharing, cloud computing has grown in popularity with the research community for deploying scientific applications such as workflows. As this interest continues to grow and workflows are increasingly run in collaborative cloud environments that span multiple data centers, there is an urgent need for strategies that place application data across globally distributed data centers and schedule tasks according to the data layout, reducing both the latency and the makespan of workflow execution. In this paper, by utilising the dependencies among datasets and tasks, we propose an efficient data and task co-scheduling strategy that places input datasets in a load-balanced way while grouping the most closely related datasets and tasks together. We build a simulation environment on the Tianhe supercomputer to evaluate the proposed strategy and run simulations with both random and realistic workflows. The results demonstrate that the proposed strategy can effectively improve workflow performance while reducing the total volume of data transferred across data centers.
Published in:
Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth International Conference on
Date of Conference: 12-14 Dec. 2011