Skip to Main Content
Due to rapid growth of the Internet and new scientific/technological advances, there exist many new applications that model data as graphs, because graphs have sufficient expressiveness to model complicated structures. The dominance of graphs in real-world applications demands new graph processing techniques to access large data graphs effectively and efficiently. In this paper, we study a graph pattern matching problem, which is to find all patterns in a large data graph that match a user-given graph pattern. We propose new two-step R-join (reachability join) algorithms with a filter step (R-semijoin) and a fetch step (R-join) by utilizing a new cluster-based join index with graph codes in a relational database context. We also propose two optimization approaches to further optimize sequences of R-joins/R-semijoins. The first approach is based on R-join order selection followed by R-semijoin enhancement, and the second approach is to interleave R-joins with R-semijoins. We conducted extensive performance studies, and confirm the efficiency of our proposed new approaches.