Distributed data processing systems are becoming one of the most important components of enterprise software infrastructure for data-intensive computational tasks. Deploying and operating such systems incurs substantial costs, including the hardware cost of building clusters and the energy cost of running them. To make these systems sustainable and scalable, power management has become an important research problem. In this paper, we take Hadoop as an example to illustrate the power peak problem, which causes power inefficiency, and we provide an in-depth analysis explaining the issues with existing system designs. We propose a novel power capping module in the Hadoop scheduler to mitigate power peaks. Extensive simulation studies show that the proposed solution can effectively smooth the power consumption curve and mitigate temporary power peaks in Hadoop clusters.
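The abstract does not say how the power capping module decides when to launch tasks. The Python below is a purely illustrative sketch of the general idea of power-capped task admission, not the authors' design. It assumes a linear per-node power model with hypothetical constants `IDLE_WATTS` and `WATTS_PER_TASK`. It also uses a hypothetical FIFO scheduler, `simulate`, that launches a pending task only if the projected cluster power stays at or below `power_cap`. The point is the basic trade-off: capping admission flattens the power curve but lengthens the run.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical linear power model (an assumption for illustration only).
IDLE_WATTS = 100.0      # power drawn by an idle node
WATTS_PER_TASK = 25.0   # additional power per running task


@dataclass
class Node:
    slots: int                                   # max concurrent tasks on this node
    running: list = field(default_factory=list)  # remaining time steps per running task

    def power(self) -> float:
        return IDLE_WATTS + WATTS_PER_TASK * len(self.running)


def simulate(task_durations, num_nodes, slots_per_node, power_cap=None):
    """Hypothetical greedy FIFO scheduler over discrete time steps.

    If power_cap is set, a pending task is launched only when the projected
    cluster power (current power plus one more task) stays <= power_cap.
    Returns the cluster power recorded at each time step.
    """
    min_power = num_nodes * IDLE_WATTS + WATTS_PER_TASK
    if power_cap is not None and power_cap < min_power:
        # No task could ever start, so the loop below would never finish.
        raise ValueError("power_cap too low to run even a single task")

    nodes = [Node(slots_per_node) for _ in range(num_nodes)]
    pending = deque(task_durations)  # durations in time steps, assumed positive
    trace = []
    while pending or any(n.running for n in nodes):
        # Launch phase: fill free slots, subject to the optional power cap.
        for node in nodes:
            while pending and len(node.running) < node.slots:
                projected = sum(n.power() for n in nodes) + WATTS_PER_TASK
                if power_cap is not None and projected > power_cap:
                    break  # defer this task to a later time step
                node.running.append(pending.popleft())
        trace.append(sum(n.power() for n in nodes))
        # Advance one time step; tasks with one step left finish now.
        for node in nodes:
            node.running = [t - 1 for t in node.running if t > 1]
    return trace


tasks = [3, 3, 3, 3, 2, 2]
uncapped = simulate(tasks, num_nodes=2, slots_per_node=2)
capped = simulate(tasks, num_nodes=2, slots_per_node=2, power_cap=260)
print(max(uncapped), len(uncapped))  # 300.0 5
print(max(capped), len(capped))      # 250.0 8
```

In this toy run, the cap lowers the peak from 300 W to 250 W, while the workload takes 8 time steps instead of 5. A real module would need measured power models and Hadoop's actual scheduling hooks, neither of which this sketch attempts to reproduce.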