Distributed systems have driven the adoption of cloud computing across many enterprises, and a large number of companies have already benefited from delegating IT services to cloud service providers. At the same time, interest in energy efficiency has increased dramatically, and energy efficiency in large distributed systems is a major concern for system engineers. In addition, the proliferation of distributed data processing frameworks such as MapReduce has led to a large body of research and practice. In this paper, we are particularly interested in providing energy proportionality for MapReduce. To this end, we propose Data Aware Scaling Down (DASCA), a scaling-down framework for MapReduce. Supporting scaling down for MapReduce requires solving two problems. The first is choosing a proper set of nodes to suspend, which we call the candidate set. The second is minimizing the replica redistribution that occurs when power-save mode is initiated. To address these problems, we exploit the data awareness of the MapReduce framework. For the first problem, we provide two greedy algorithms that exploit this data awareness. For the second, we propose locality-aware replica redistribution, which efficiently redistributes lost replicas while preserving replica availability and distributed processing performance.
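The abstract does not spell out the two greedy algorithms, so the following is only a rough illustration of what a data-aware greedy candidate-set selection could look like, not DASCA's actual method. It assumes a hypothetical `placement` map from block id to the set of nodes holding a replica, a fixed number `k` of nodes to suspend, and a simple cost model (assumed here) in which every replica on a suspended node must be redistributed. At each step it suspends the active node with the fewest replicas, skipping any node whose suspension would leave some block with no active replica. The function name `choose_candidate_set` is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def choose_candidate_set(
    placement: Dict[str, Set[str]],
    nodes: List[str],
    k: int,
) -> Tuple[List[str], int]:
    """Greedily pick up to k nodes to suspend (illustrative heuristic only).

    placement maps a block id to the set of nodes holding a replica of it.
    Returns the chosen nodes, in the order picked, and the number of replicas
    that would need redistribution under the assumed cost model.
    Availability is checked against the original placement; the sketch does
    not model where redistributed replicas end up.
    """
    active = set(nodes)
    chosen: List[str] = []
    lost = 0
    for _ in range(k):
        best: Optional[str] = None
        best_cost: Optional[int] = None
        for node in sorted(active):  # sorted for deterministic tie-breaking
            remaining = active - {node}
            # Skip nodes whose suspension would leave a block with no active replica.
            if any(holders and not (holders & remaining) for holders in placement.values()):
                continue
            # Assumed cost: replicas on this node that must be re-created elsewhere.
            cost = sum(1 for holders in placement.values() if node in holders)
            if best_cost is None or cost < best_cost:
                best, best_cost = node, cost
        if best is None or best_cost is None:
            break  # no remaining node can be suspended safely
        active.remove(best)
        chosen.append(best)
        lost += best_cost
    return chosen, lost


if __name__ == "__main__":
    placement = {"b1": {"n1", "n2", "n3"}, "b2": {"n1", "n4"}, "b3": {"n2", "n4"}}
    print(choose_candidate_set(placement, ["n1", "n2", "n3", "n4"], 2))
```

With the sample placement, `n3` is picked first because it holds only one replica; ties among the remaining nodes are broken alphabetically. A real data-aware policy would likely also weigh rack locality and the input data of running jobs, which this sketch ignores.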