Skip to Main Content
MapReduce has gradually become the framework of choice for "big data". The MapReduce model allows for efficient and swift processing of large scale data with a cluster of compute nodes. However, the efficiency here comes at a price. The performance of widely used MapReduce implementations such as Hadoop suffers in heterogeneous and load-imbalanced clusters. We show the disparity in performance between homogeneous and heterogeneous clusters in this paper to be high. Subsequently, we present MARLA, a MapReduce framework capable of performing well not only in homogeneous settings, but also when the cluster exhibits heterogeneous properties. We address the problems associated with existing MapReduce implementations affecting cluster heterogeneity, and subsequently present through MARLA the components and trade-offs necessary for better MapReduce performance in heterogeneous cluster and cloud environments. We quantify the performance gains exhibited by our approach against Apache Hadoop and MARIANE in data intensive and compute intensive applications.