Data-intensive applications are becoming increasingly common in Grid environments. These applications require enormous volumes of data for their computation. Most conventional meta-scheduling approaches are aimed at computation-intensive applications and do not take the data requirements of applications into account, which leads to poor performance. Efficient scheduling of data-intensive applications in Grid environments is a challenging problem. In addition to processor utilization and average turnaround time, it is important to consider the worst-case turnaround time when evaluating the performance of Grid scheduling strategies. In this paper, we propose an adaptive scheduling scheme that takes into account both the computational requirements and the data requirements of jobs when making scheduling decisions. In our scheme, data transfer is treated on par with computation and is explicitly considered during scheduling. Jobs are dispatched to the sites that are optimal in terms of both data transfer time and computation time. In addition, our scheme overlaps a job's data transfer time with its own queuing time and with other jobs' computation time as much as possible. Trace-based simulations show that the proposed scheme can yield significant performance benefits for data-intensive jobs.
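To make the idea of choosing a site by both data transfer time and computation time concrete, here is a minimal sketch under a deliberately simple cost model. It is an illustration, not the paper's algorithm: the `Site` fields (`queue_wait`, `bandwidth`, `cpu_speed`, `cached_mb`), the helper names, and the completion-time formula are all hypothetical assumptions. The sketch estimates a job's completion time at each site and picks the minimum. When transfer is overlapped with queuing, the job can start once both the processor is free and its input has arrived, so the start time is the maximum of the two rather than their sum.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are hypothetical inputs to an assumed cost model.
    name: str
    queue_wait: float       # estimated seconds until a processor frees up
    bandwidth: float        # MB/s from the data source to this site
    cpu_speed: float        # work units per second
    cached_mb: float = 0.0  # input data already present at the site


def estimated_completion(site: Site, job_work: float, input_mb: float,
                         overlap: bool = True) -> float:
    """Estimate when a job would finish if dispatched to `site`."""
    missing_mb = max(input_mb - site.cached_mb, 0.0)
    transfer = missing_mb / site.bandwidth
    compute = job_work / site.cpu_speed
    if overlap:
        # Data is staged while the job waits in the queue: computation
        # starts once the processor is free AND the input has arrived.
        start = max(site.queue_wait, transfer)
    else:
        # Naive model: wait in the queue, then transfer, then compute.
        start = site.queue_wait + transfer
    return start + compute


def choose_site(sites: list[Site], job_work: float, input_mb: float,
                overlap: bool = True) -> Site:
    """Pick the site with the smallest estimated completion time."""
    return min(sites, key=lambda s: estimated_completion(s, job_work, input_mb, overlap))


# Illustrative (made-up) sites and a job needing 200 work units and 1000 MB of input.
sites = [
    Site("A", queue_wait=100, bandwidth=10, cpu_speed=2.0),
    Site("B", queue_wait=10, bandwidth=1, cpu_speed=2.0),
    Site("C", queue_wait=50, bandwidth=5, cpu_speed=1.0, cached_mb=900),
]

if __name__ == "__main__":
    for s in sites:
        print(s.name, estimated_completion(s, 200, 1000),
              estimated_completion(s, 200, 1000, overlap=False))
    print("overlapped choice:", choose_site(sites, 200, 1000).name)
    print("serial choice:", choose_site(sites, 200, 1000, overlap=False).name)
```

With these made-up numbers, site A finishes at 200 s when its 100 s transfer is hidden behind its 100 s queue wait, beating site C (250 s). Under the serial model, A costs 300 s and C (270 s) wins instead. Site B is never chosen because its slow link dominates even though its queue is short. This shows how accounting for overlap can change which site is "optimal".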