This paper studies the feasibility of using dataflow systems to run complex graph queries and proposes a general query optimization framework for parallel dataflow systems. The proposed methods are applied to a suite of benchmark queries, and their effectiveness is evaluated. The performance of the optimized queries is measured on an actual parallel dataflow machine using a large semantic graph and compared with that of equivalent SQL queries on a high-end parallel relational database system. The study shows that dataflow systems can achieve significant performance improvements over state-of-the-art database systems and can be a viable, scalable alternative for running large, complex graph queries.
Published in:
2010 43rd Hawaii International Conference on System Sciences (HICSS)
Date of Conference: 5-8 Jan. 2010