A Data Grid consists of geographically distributed computing and storage resources that are used in large-scale scientific applications. Job scheduling and data replication are two well-known techniques for boosting the performance of a Data Grid, and there has been extensive research on integrating the two to improve performance further. However, most existing work is heuristic-based and offers no performance guarantees. In this paper, we propose to integrate data replication and job scheduling into one framework that minimizes the total job execution time in a Data Grid. We refer to this problem as the Integrated Scheduling and Replication Problem, and it is NP-hard. We first propose a job scheduling and data replication algorithm that not only has a provable performance guarantee but also dramatically reduces the time complexity compared with that of the optimal algorithm. We then design a series of polynomial-time heuristic algorithms. Using extensive simulations, we demonstrate that, among the heuristic algorithms, the integrated replication and scheduling algorithm performs closest to the algorithm with the performance guarantee.
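To make the problem statement concrete, the following is a minimal sketch of an exhaustive search over joint scheduling and replication decisions on a toy instance. Everything in it is an assumption made for illustration and is not taken from the paper: the cost model (each job's time is its run time plus the distance to the nearest replica of every data object it needs), the rule of at most one extra replica per object, the per-site replica capacity, and the names `total_time` and `brute_force`. It also ignores contention between jobs placed on the same site. The nested enumeration grows exponentially with the number of jobs and data objects, which suggests why an exact solution is impractical at scale.

```python
from itertools import product


def total_time(assignment, placement, jobs, run_time, dist):
    """Toy objective: run time plus transfer distance to the nearest replica."""
    total = 0
    for j, site in enumerate(assignment):
        total += run_time[j]
        for obj in jobs[j]:
            total += min(dist[site][holder] for holder in placement[obj])
    return total


def brute_force(jobs, run_time, origin, dist, capacity):
    """Enumerate every feasible (replica placement, job assignment) pair.

    Hypothetical simplification: each object may get at most one extra
    replica, and capacity[s] bounds the number of extra replicas at site s.
    """
    sites = range(len(dist))
    objects = sorted(origin)
    best = None
    for extra in product(sites, repeat=len(objects)):
        load = [0] * len(dist)
        for obj, s in zip(objects, extra):
            if s != origin[obj]:  # choosing the origin means "no extra replica"
                load[s] += 1
        if any(l > c for l, c in zip(load, capacity)):
            continue  # placement exceeds some site's replica capacity
        placement = {obj: {origin[obj], s} for obj, s in zip(objects, extra)}
        for assignment in product(sites, repeat=len(jobs)):
            t = total_time(assignment, placement, jobs, run_time, dist)
            if best is None or t < best[0]:
                best = (t, assignment, placement)
    return best


# Toy instance: three sites, two data objects, three jobs.
dist = [[0, 4, 6], [4, 0, 3], [6, 3, 0]]   # symmetric transfer cost between sites
origin = {"a": 0, "b": 2}                  # site holding the original copy
jobs = [{"a"}, {"a", "b"}, {"b"}]          # data objects each job needs
run_time = [1, 1, 1]

with_replica = brute_force(jobs, run_time, origin, dist, capacity=[0, 1, 0])
without_replica = brute_force(jobs, run_time, origin, dist, capacity=[0, 0, 0])
print(with_replica[0], without_replica[0])
```

In this toy instance, allowing a single replica slot at the middle site lowers the best achievable total, because the job that needs both objects can then run next to one of them. This is the kind of interaction between placement and scheduling that motivates treating the two decisions jointly.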