Many companies use business intelligence (BI) to improve organizational performance. BI relies on the data in the enterprise information system (EIS), and this data must be updated periodically to guarantee its freshness. The EIS contains data warehouses that frequently reside on many machines in one or more data centers. Many jobs, also called tasks or operations, run to bring data into the data warehouses. Some jobs depend on others, whereas others are independent. Traditionally, these jobs are scheduled according to their dependencies but not according to the critical path; thus, the conventional scheduling algorithm is nonoptimal, which leads to excessive runtime and nonoptimal use of resources. Because data processing jobs in the data warehouse system involve many resources, it is necessary to find the best job-scheduling methodology. According to the existing literature, job scheduling in the distributed data warehouse system has not been studied. This paper discusses how to use historical data and the critical path to generate an optimal scheduling plan that involves the minimum number of machines while guaranteeing the shortest runtime. Experimental results show that the optimal scheduling algorithm can reduce runtime and optimize the use of resources.
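To make the notion of a critical path concrete, the following Python sketch computes it for a small job graph. It is an illustration only and not the algorithm proposed in this paper: the job names, the runtimes (standing in for estimates drawn from historical runs), and the functions `critical_path` and `peak_concurrency` are all hypothetical. The sketch assumes positive runtimes and an acyclic dependency graph.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, deps):
    """Return (makespan, path, earliest_start) for a job DAG.

    durations: job -> estimated runtime (hypothetical historical averages)
    deps:      job -> set of jobs that must finish before it starts
    """
    graph = {job: deps.get(job, ()) for job in durations}
    order = TopologicalSorter(graph).static_order()  # predecessors first
    start, finish, prev = {}, {}, {}
    for job in order:
        preds = deps.get(job, ())
        # A job can start once its slowest predecessor has finished.
        last = max(preds, key=lambda p: finish[p], default=None)
        start[job] = finish[last] if last is not None else 0
        finish[job] = start[job] + durations[job]
        prev[job] = last
    # Walk back from the job that finishes last to recover the path.
    node = max(finish, key=finish.get)
    makespan = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return makespan, path[::-1], start


def peak_concurrency(durations, start):
    """Machines needed if every job starts at its earliest start time."""
    events = []
    for job, s in start.items():
        events.append((s, 1))                    # job begins
        events.append((s + durations[job], -1))  # job ends
    # At equal times, process ends before starts so machines are reused.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical warehouse-loading jobs; runtimes in minutes.
durations = {"extract_orders": 30, "extract_customers": 20,
             "join": 15, "aggregate": 10, "audit_log": 5}
deps = {"join": {"extract_orders", "extract_customers"},
        "aggregate": {"join"}}

makespan, path, start = critical_path(durations, deps)
print(makespan)  # 55
print(path)      # ['extract_orders', 'join', 'aggregate']
print(peak_concurrency(durations, start))  # 3
```

The critical path length (55 minutes here) is a lower bound on total runtime no matter how many machines are available. Starting every job as early as possible reaches that bound with three machines, but this count is only sufficient, not minimal: delaying `audit_log` until `extract_customers` finishes would achieve the same runtime on two machines. Finding the smallest such machine count is the kind of question the scheduling plan in this paper addresses.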