MapReduce is a widely used parallel programming model and computing platform. With MapReduce, it is easy to develop scalable parallel programs for data-intensive applications on clusters of commodity machines. However, it does not directly support the processing of heterogeneous related data sets, which is common in operations such as spatial joins. This paper presents SJMR (Spatial Join with MapReduce), a novel parallel algorithm that addresses this problem. Its strategies include a strip-based plane-sweep algorithm, a tile-based spatial partitioning function, and a duplication avoidance technique. We evaluated the performance of the SJMR algorithm in various situations with real-world data sets. The results demonstrate the applicability of MapReduce to computing-intensive spatial applications on small-scale clusters.
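To make the three strategies concrete, the following is a minimal single-process sketch of how a tile-partitioned spatial join with duplicate avoidance might look. It is an illustration under stated assumptions, not the SJMR implementation: the names `tiles_for`, `intersects`, and `spatial_join` are hypothetical, objects are reduced to axis-aligned bounding rectangles, a sorted scan along x stands in for the strip-based plane sweep, and duplicates are suppressed with a reference-point test, a common technique that is assumed here rather than taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Tile = Tuple[int, int]                    # (column, row) index of a grid tile


def tiles_for(rect: Rect, tile_size: float) -> Iterator[Tile]:
    """Yield every grid tile that the rectangle overlaps (hypothetical helper)."""
    xmin, ymin, xmax, ymax = rect
    for tx in range(int(xmin // tile_size), int(xmax // tile_size) + 1):
        for ty in range(int(ymin // tile_size), int(ymax // tile_size) + 1):
            yield (tx, ty)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed-rectangle intersection test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(r: List[Rect], s: List[Rect], tile_size: float) -> List[Tuple[int, int]]:
    """Return sorted index pairs (i, j) such that r[i] intersects s[j]."""
    # "Map" step: assign each rectangle to every tile it overlaps.
    # A rectangle spanning several tiles is replicated into each of them.
    buckets: Dict[Tile, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for i, rect in enumerate(r):
        for tile in tiles_for(rect, tile_size):
            buckets[tile][0].append(i)
    for j, rect in enumerate(s):
        for tile in tiles_for(rect, tile_size):
            buckets[tile][1].append(j)

    # "Reduce" step: join the two sets independently inside each tile.
    results: List[Tuple[int, int]] = []
    for tile, (r_ids, s_ids) in buckets.items():
        # Sorting by xmin lets the inner scan stop early (simplified sweep).
        s_sorted = sorted(s_ids, key=lambda j: s[j][0])
        for i in r_ids:
            a = r[i]
            for j in s_sorted:
                b = s[j]
                if b[0] > a[2]:
                    break  # every later rectangle starts to the right of a
                if not intersects(a, b):
                    continue
                # Duplicate avoidance (assumed reference-point test): report
                # the pair only in the tile that contains the lower-left
                # corner of the intersection of the two rectangles.
                ref_x, ref_y = max(a[0], b[0]), max(a[1], b[1])
                if (int(ref_x // tile_size), int(ref_y // tile_size)) == tile:
                    results.append((i, j))
    return sorted(results)
```

For example, `spatial_join([(0, 0, 3, 3)], [(1, 1, 5, 5), (10, 10, 11, 11)], 2.0)` returns `[(0, 0)]`: the two overlapping rectangles share four tiles, but the pair is reported only in the tile that holds the reference point (1, 1). In an actual MapReduce job the bucket-building loop would correspond to the map phase and the per-tile join to the reduce phase.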