Apache Hadoop is a framework for managing large-scale, storage-based datacenters whose primary job is to deliver data to clients. In such systems, a key task is to assign each data request to a specific replica among the many available replicas. This assignment affects the workload and power distribution across the storage servers. In this paper, we explore thermal- and power-aware task scheduling for Hadoop-based, storage-centric datacenters. To maintain datacenter reliability, each node must operate below a given temperature threshold. At the same time, we want to minimize the total power consumed by the air conditioning (A/C) system that provides the cooling needed to stay below that threshold. We formulate the resulting optimization problem as an Integer Linear Programming (ILP) problem and develop a minimum-cost-flow-based heuristic to solve it. Experimental results show that our method requires the A/C output air temperature to be only 0.69 K lower on average than the optimal ILP solution does, while its runtime is only 1%-2.5% of that of the ILP solver. By contrast, randomly selecting a data replica for each request requires the A/C output air temperature to be 6.35 K lower than with our method, which forces the A/C system to work harder.
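To make the replica-assignment idea concrete, the following is a minimal sketch of how assigning requests to replicas could be posed as a minimum cost flow problem. It is not the formulation used in the paper: the network layout, the per-server capacities, and the linear per-request `unit_cost` (a stand-in for a thermal or power penalty) are all assumptions chosen for illustration, and the names `MinCostFlow` and `assign_requests` are hypothetical.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths with a queue-based Bellman-Ford search."""

    def __init__(self, n):
        self.n = n
        # Each edge is stored as [to, residual_capacity, cost, index_of_reverse_edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if prev[t] is None:  # no augmenting path is left
                return flow, cost
            # Find the bottleneck capacity along the shortest path.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Push flow along the path and update the residual edges.
            v = t
            while v != s:
                u, i = prev[v]
                self.graph[u][i][1] -= push
                self.graph[v][self.graph[u][i][3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def assign_requests(replicas, capacity, unit_cost):
    """Assign each request to one server that holds a replica of its data.

    replicas:  request -> list of servers holding a replica (assumed input)
    capacity:  server -> maximum number of requests it may serve (assumed)
    unit_cost: server -> hypothetical integer penalty per request served
    """
    requests, servers = list(replicas), list(capacity)
    source, sink = 0, 1
    req_id = {r: 2 + i for i, r in enumerate(requests)}
    srv_id = {v: 2 + len(requests) + j for j, v in enumerate(servers)}
    srv_of = {node: v for v, node in srv_id.items()}

    mcf = MinCostFlow(2 + len(requests) + len(servers))
    for r in requests:
        mcf.add_edge(source, req_id[r], 1, 0)
        for v in replicas[r]:
            mcf.add_edge(req_id[r], srv_id[v], 1, 0)
    for v in servers:
        mcf.add_edge(srv_id[v], sink, capacity[v], unit_cost[v])

    flow, cost = mcf.solve(source, sink)
    # A saturated request -> server edge means that replica was chosen.
    assignment = {}
    for r in requests:
        for to, cap, _, _ in mcf.graph[req_id[r]]:
            if to in srv_of and cap == 0:
                assignment[r] = srv_of[to]
    return flow, cost, assignment
```

Calling `assign_requests` with a mapping from requests to candidate servers returns the number of requests served, the total penalty, and the chosen server for each request. A linear per-request penalty is a simplification; a temperature threshold per node, as described in the abstract, would need a richer cost or capacity model than this sketch provides.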