Data-intensive workflows often need to process and transfer large datasets. Existing scheduling strategies attempt to achieve high performance by scheduling tasks and moving datasets to the resources with the highest processing capacity. However, this strategy may fail to achieve the minimum makespan in some situations. In this paper, we propose a two-step scheduling approach for data-intensive workflows, called Set Cover Problem (SCP) Based Critical Path (SBCP) scheduling. First, an SCP-based heuristic matches the submitted tasks to one compute resource and a set of data hosts. Then, the system schedules these tasks on that compute resource using a critical-path workflow scheduling algorithm. Experimental results show that the proposed approach shortens data transfer time and reduces the total makespan of data-intensive workflows.
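The abstract does not specify the SCP heuristic's cost function or the exact critical-path algorithm, so the following is only a minimal Python sketch of the two-step idea under stated assumptions. Step 1 uses the classic greedy set-cover rule (pick the data host with the lowest cost per newly covered dataset). Step 2 assumes each task's weight is its compute time plus the serial transfer time of its input datasets, then uses the bottom level (longest weighted path to an exit task) to find the critical path and a task priority order. All function names (`greedy_set_cover`, `task_weights`, `bottom_levels`, `critical_path`), the per-host `cost` values, and the example workflow are hypothetical and are not the authors' SBCP implementation.

```python
def greedy_set_cover(required, hosts, cost):
    """Step 1 (assumed greedy SCP heuristic): choose data hosts covering all datasets.

    required: set of dataset names the tasks need
    hosts: dict host -> set of datasets stored on that host
    cost: dict host -> hypothetical cost of fetching from that host
    Returns dict dataset -> chosen host.
    """
    uncovered = set(required)
    assignment = {}
    while uncovered:
        # Lowest cost per newly covered dataset wins; hosts covering nothing new are skipped.
        best = min(
            (h for h in hosts if hosts[h] & uncovered),
            key=lambda h: cost[h] / len(hosts[h] & uncovered),
            default=None,
        )
        if best is None:
            raise ValueError(f"no host stores {sorted(uncovered)}")
        for d in hosts[best] & uncovered:
            assignment[d] = best
        uncovered -= hosts[best]
    return assignment


def task_weights(compute, inputs, size, bandwidth, assignment):
    """Assumed task weight: compute time + serial transfer time of each input dataset."""
    return {
        t: compute[t] + sum(size[d] / bandwidth[assignment[d]] for d in inputs[t])
        for t in compute
    }


def bottom_levels(succ, weight):
    """Longest weighted path from each task to an exit task (memoized DFS over the DAG)."""
    memo = {}

    def bl(t):
        if t not in memo:
            memo[t] = weight[t] + max((bl(s) for s in succ[t]), default=0.0)
        return memo[t]

    for t in succ:
        bl(t)
    return memo


def critical_path(succ, weight):
    """Step 2: follow the largest bottom level from the heaviest entry task to an exit."""
    levels = bottom_levels(succ, weight)
    node = max(levels, key=levels.get)  # with positive weights this is an entry task
    path = [node]
    while succ[node]:
        node = max(succ[node], key=levels.get)
        path.append(node)
    return path, levels[path[0]], levels


# --- Hypothetical example: three data hosts and a four-task diamond workflow ---
hosts = {"h1": {"d1", "d2"}, "h2": {"d2", "d3"}, "h3": {"d3"}}
cost = {"h1": 2.0, "h2": 2.0, "h3": 0.5}
bandwidth = {"h1": 2.0, "h2": 2.0, "h3": 3.0}  # data units per time unit
size = {"d1": 10, "d2": 4, "d3": 6}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
inputs = {"A": ["d1"], "B": ["d2"], "C": ["d3"], "D": []}
compute = {"A": 3, "B": 2, "C": 4, "D": 1}

assignment = greedy_set_cover(set(size), hosts, cost)
weights = task_weights(compute, inputs, size, bandwidth, assignment)
path, length, levels = critical_path(succ, weights)
# Critical-path-first priority: larger bottom level is scheduled earlier.
order = sorted(levels, key=levels.get, reverse=True)
print(assignment, path, length, order)
```

In this toy run the cheap host `h3` is chosen for `d3` first and `h1` then covers `d1` and `d2`; the critical path is A, C, D with length 15.0, and the priority order is A, C, B, D. A full scheduler would feed this order into a list scheduler that places ready tasks on the processors of the selected compute resource; that step is omitted here.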
Published in:
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on (Volume 4)
Date of Conference: 10-12 Aug. 2010