Almost every computational job requires input data before it can produce a solution, and the computation cannot proceed until the required data becomes available. Consequently, properly interleaving data transfers with job execution has a significant impact on overall efficiency. In this paper we analyze the computational complexity of the shared-data job scheduling problem, both with and without a storage capacity constraint. We show that if there is an upper bound on the server's storage capacity, the problem is NP-complete, even when each job depends on at most three data items. On the other hand, if there is no upper bound on the server capacity, we give an efficient algorithm that produces an optimal job schedule when each job depends on at most two data items. We also present an efficient heuristic algorithm that produces good schedules when there is no limit on the number of data items a job may access.
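The abstract does not spell out the model or the paper's algorithms, but one common formalization of shared-data scheduling is: each job needs a set of data items, the server can hold at most `capacity` items at once, and the cost of a schedule is the total number of data-item loads. The sketch below is purely illustrative under that assumed model; the greedy rule (run next the job that overlaps most with the currently loaded data) is a generic heuristic, not the algorithm from the paper.

```python
def greedy_schedule(jobs, capacity):
    """Illustrative greedy heuristic for an assumed shared-data model.

    jobs     -- list of sets; jobs[j] is the data items job j depends on
    capacity -- max number of data items the server can hold at once
                (assumed to be at least the size of the largest job)
    Returns (job execution order, total number of data-item loads).
    """
    remaining = list(range(len(jobs)))
    loaded = set()              # data items currently on the server
    order, loads = [], 0
    while remaining:
        # Greedy rule: pick the job sharing the most data with what
        # is already loaded (ties broken by original job order).
        best = max(remaining, key=lambda j: len(jobs[j] & loaded))
        remaining.remove(best)
        need = jobs[best] - loaded
        loads += len(need)
        # Evict items the chosen job does not need until the new data
        # fits; sorted order keeps the eviction choice deterministic.
        evictable = sorted(x for x in loaded if x not in jobs[best])
        while len(loaded) + len(need) > capacity and evictable:
            loaded.discard(evictable.pop())
        loaded |= need
        order.append(best)
    return order, loads

# Four jobs, each depending on at most two data items (the tractable
# case mentioned in the abstract), on a server holding three items.
jobs = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]
order, loads = greedy_schedule(jobs, capacity=3)
# Each of "a".."d" is loaded exactly once here, so loads == 4.
```

With unlimited capacity this heuristic never evicts anything, so its cost is simply the number of distinct data items; the interesting behavior, and the hardness shown in the paper, arises only when the capacity bound forces evictions and reloads.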