Graphs are increasingly used to model a wide range of scientific data, and this widespread use has generated considerable interest in mining patterns from graph databases. While an array of techniques exists to mine frequent patterns, we still lack a scalable approach to mine statistically significant patterns, that is, patterns with low p-values, when they occur at low frequencies. We propose a highly scalable technique, called GraphSig, to mine significant subgraphs from large graph databases. We convert each graph into a set of feature vectors, where each vector represents a region within the graph. Domain knowledge is used to select a meaningful feature set. Prior probabilities of features are computed empirically to evaluate the statistical significance of patterns in the feature space. Following this analysis in the feature space, only a small portion of the exponential search space is accessed for further analysis. This enables the use of existing frequent subgraph mining techniques to mine significant patterns in a scalable manner, even when they are infrequent. Extensive experiments on the proposed technique demonstrate that GraphSig is effective and efficient for mining significant patterns. To further demonstrate the power of significant patterns, we develop a classifier using patterns mined by GraphSig. Experimental results show that the proposed classifier outperforms state-of-the-art classifiers in both quality and computation cost.
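The abstract does not spell out the significance test, so the following is only a minimal sketch of how empirically estimated feature priors could be turned into a p-value. It assumes (i) a region's feature vector "contains" a sub-feature vector when every feature value is at least as large, (ii) features are independent, so per-feature tail probabilities P(f_i >= x_i) multiply into a prior, and (iii) the support of a sub-feature vector among m vectors follows a binomial distribution. The function names `prior_probability`, `support_count` and `p_value`, and the toy data, are hypothetical illustrations, not GraphSig's published implementation.

```python
from math import comb
from typing import List, Sequence


def prior_probability(sub_vector: Sequence[int], vectors: List[Sequence[int]]) -> float:
    """Empirical probability that a random vector contains sub_vector.

    Assumption: features are independent, so the joint probability is the
    product of per-feature empirical tail probabilities P(f_i >= x_i).
    """
    n = len(vectors)
    p = 1.0
    for i, x in enumerate(sub_vector):
        # Fraction of observed vectors whose i-th feature is at least x.
        count = sum(1 for v in vectors if v[i] >= x)
        p *= count / n
    return p


def support_count(sub_vector: Sequence[int], vectors: List[Sequence[int]]) -> int:
    """Number of vectors that contain sub_vector (every feature >= its value)."""
    return sum(
        1 for v in vectors if all(vi >= xi for vi, xi in zip(v, sub_vector))
    )


def p_value(support: int, m: int, p: float) -> float:
    """Binomial upper tail P(X >= support) for X ~ Binomial(m, p)."""
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(support, m + 1))


if __name__ == "__main__":
    # Hypothetical feature vectors, one per graph region (toy data).
    vectors = [(2, 0, 1), (1, 1, 0), (3, 0, 2), (0, 2, 1)]
    sub = (1, 0, 1)

    prior = prior_probability(sub, vectors)
    mu = support_count(sub, vectors)
    print("prior:", prior)
    print("support:", mu)
    print("p-value:", p_value(mu, len(vectors), prior))
```

In this toy run the sub-feature vector occurs about as often as its prior predicts, so its p-value is high and it would not be reported as significant. A pattern with a low prior but a comparatively high support would instead receive a small p-value, even if its absolute frequency is low. That is the kind of infrequent but significant pattern the abstract targets.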