MapReduce is a widely used parallel programming model for processing large data sets. This paper discusses strategies for implementing a MapReduce runtime system using the Message Passing Interface (MPI) library. The implementation uses blocking communication functions in MPI, e.g. MPI_Send and MPI_Recv, to transfer intermediate data, making communication between mappers and reducers in the MapReduce model much more efficient. Experimental results indicate that our MPI implementation performs better than Hadoop when the data volume is below 60 MB, and performs five times better than native Hadoop when the input size is below 5 MB.
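The mapper-to-reducer data flow described above can be illustrated with a small word-count sketch. It is not the paper's implementation and does not use MPI: threads stand in for MPI ranks, and a bounded `queue.Queue` stands in for a blocking MPI_Send/MPI_Recv pair. All names here (`partition`, `mapper`, `reducer`, `word_count`, `DONE`), the hash-based partitioning, and the end-of-stream sentinel are assumptions made for illustration.

```python
import queue
import threading
import zlib
from collections import Counter

DONE = None  # hypothetical end-of-stream marker, one per mapper per reducer


def partition(key, num_reducers):
    # Stable hash so a given word is always routed to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def mapper(chunk, inboxes):
    # Emit (word, 1) pairs. put() blocks while the inbox is full,
    # loosely mirroring a blocking send.
    for word in chunk.split():
        inboxes[partition(word, len(inboxes))].put((word, 1))
    for inbox in inboxes:
        inbox.put(DONE)


def reducer(inbox, num_mappers, results, index):
    # get() blocks until a message arrives, loosely mirroring a blocking receive.
    counts = Counter()
    finished = 0
    while finished < num_mappers:
        message = inbox.get()
        if message is DONE:
            finished += 1
        else:
            word, n = message
            counts[word] += n
    results[index] = counts


def word_count(chunks, num_reducers=2):
    # One bounded inbox per reducer; one mapper thread per input chunk.
    inboxes = [queue.Queue(maxsize=4) for _ in range(num_reducers)]
    results = [None] * num_reducers
    threads = [
        threading.Thread(target=reducer, args=(inboxes[i], len(chunks), results, i))
        for i in range(num_reducers)
    ]
    threads += [threading.Thread(target=mapper, args=(c, inboxes)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Reducers own disjoint key sets, so merging is a plain union of counts.
    merged = Counter()
    for partial in results:
        merged.update(partial)
    return dict(merged)


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In an actual MPI runtime each mapper and reducer would be a separate process identified by its rank, and the intermediate pairs would have to be serialized into buffers before being sent. The sketch only shows the routing and the blocking hand-off.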