Skip to Main Content
Selection of resources for execution of scientific workflows in data grids becomes challenging with the exponential growth of files as a result of the distribution of scientific experiments around the world. With more runs of these experiments, huge number of data-files produced can be made available from numerous resources. There is lack of work in optimal selection of data-hosts and compute resources in the presence of replicated files for scientific workflows. Foreseeing this, the thesis work aims at proposing novel workflow scheduling algorithms on data grids with large number of replicated files that incorporates practical constraints in heterogeneous environments such as grids. In this paper, we define the workflow scheduling problem statement in the context of data grids, supported by motivating applications; list research issues arising from practical constraints; propose two algorithms for experimenting with the problem; report simulation results obtained as a result of preliminary studies. The results are promising enough to motivate us to research on the problem stated.