MapReduce is an emerging programming model for data-intensive applications, proposed by Google, that has recently attracted a lot of attention. MapReduce borrows from functional programming: the programmer defines Map and Reduce tasks that are executed on large sets of distributed data. In this paper we propose an implementation of the MapReduce programming model. We present the architecture of the prototype, which is based on BitDew, a middleware for large-scale data management on Desktop Grids. We describe the set of features that makes our approach suitable for large-scale, loosely connected Internet Desktop Grids: massive fault tolerance, replica management, barrier-free execution, latency-hiding optimisation, and distributed result checking. We also present a performance evaluation of the prototype against both micro-benchmarks and a real MapReduce application. The scalability test shows that we achieve linear speedup on the classical Word Count benchmark. Several scenarios involving lagging (straggler) hosts and host crashes demonstrate that the prototype is able to cope with an experimental context similar to the real-world Internet.
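To make the programming model concrete, the following is a minimal, single-process sketch of the classical Word Count benchmark expressed as Map and Reduce functions. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical illustrations, not the API of the BitDew-based prototype, and the sketch deliberately omits distribution, fault tolerance, replication, and result checking.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map task: emit an intermediate (word, 1) pair for every word in a chunk."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key; a real runtime does this across hosts."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce task: sum all partial counts collected for one word."""
    return word, sum(counts)


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run Map over every input chunk, group by key, then Reduce each group."""
    intermediate = (pair for chunk in chunks for pair in map_words(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Each string stands in for a chunk of distributed input data.
    data = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(data))
    # → {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because each Map call depends only on its own chunk and each Reduce call only on one key's values, both phases can be spread over independent hosts; this independence is what lets a runtime re-execute or replicate individual tasks when hosts lag or crash.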