With the rapid development of hardware and network technology, cloud computing has become an important research topic. For large-scale data processing applications such as data warehouses, MapReduce is the best-known platform for parallel data processing in cloud computing. To support star-join queries in data warehouses, Scatter-Gather-Merge (SGM) provides an efficient algorithm on the MapReduce framework. However, SGM supports only equi-join queries; non-equi-join queries may cause it to fail. In this paper, we propose a method that handles theta-join queries, i.e., both equi-join and non-equi-join queries. The proposed method uses a novel manipulation of keys for partitioning data. This key manipulation fits the MapReduce paradigm and makes theta-join queries workable on the MapReduce platform. Our experimental results show that the proposed method achieves performance similar to that of SGM while supporting more join-query types. For some query types with high data selectivity, our method even outperforms SGM.
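To illustrate why key-based partitioning needs extra care for non-equi-joins, the following is a minimal sketch, not the method proposed in the paper (the abstract does not specify it). It simulates a single MapReduce round for the condition `r < s` using a generic range-bucket replication scheme. All names (`bucket_of`, `map_phase`, `theta_join_less_than`), the bucket width, and the bucket count are hypothetical, and values are assumed to be non-negative integers below `width * num_buckets`.

```python
from collections import defaultdict

def bucket_of(value, width):
    """Map a numeric join attribute to a range-bucket id (hypothetical helper)."""
    return value // width

def map_phase(r_rows, s_rows, width, num_buckets):
    """Emit (bucket, tagged_value) pairs for the join condition r < s.

    Assumes every value lies in [0, width * num_buckets).
    """
    for r in r_rows:
        # An R value can only match S values in its own bucket or a later one,
        # so it is replicated to every bucket from its own onward.
        for b in range(bucket_of(r, width), num_buckets):
            yield b, ("R", r)
    for s in s_rows:
        # Each S value goes to exactly one bucket, so no pair is produced twice.
        yield bucket_of(s, width), ("S", s)

def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Within each bucket, test the theta condition on every R/S combination."""
    for rows in groups.values():
        r_vals = [v for tag, v in rows if tag == "R"]
        s_vals = [v for tag, v in rows if tag == "S"]
        for r in r_vals:
            for s in s_vals:
                if r < s:
                    yield r, s

def theta_join_less_than(r_rows, s_rows, width=10, num_buckets=4):
    """Run the simulated map, shuffle, and reduce steps and return sorted pairs."""
    pairs = map_phase(r_rows, s_rows, width, num_buckets)
    return sorted(reduce_phase(shuffle(pairs)))
```

With an equi-join, hashing the join key sends matching rows to the same reducer. With `r < s`, matching rows generally have different keys, so this sketch replicates one side across buckets; the cost of that replication is the kind of overhead a more careful key manipulation would aim to limit.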
Published in:
Computer, Consumer and Control (IS3C), 2012 International Symposium on
Date of Conference: 4-6 June 2012