One method for discovering knowledge in structural data is the identification of common substructures, or subgraphs, within the data. Once identified, these substructures can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. In this article, we describe the SUBDUE system, which discovers interesting substructures in structural data. SUBDUE discovers substructures that compress the original database and represent interesting structural concepts in the data. By compressing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. The capabilities of SUBDUE are used to discover patterns in protein and DNA databases.
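
The compression step described above can be illustrated with a minimal sketch. It is not the SUBDUE implementation: the graph representation, the function names `size` and `compress`, the concept label `SUB_1`, and the use of "vertices plus edges" as a size measure are all assumptions made for illustration, and the instances are supplied by hand rather than discovered.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: (vertex id -> label, set of undirected edges).
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]


def size(graph: Graph) -> int:
    """Crude size measure (an assumption): vertex count plus edge count."""
    labels, edges = graph
    return len(labels) + len(edges)


def compress(graph: Graph, instances: List[Set[int]], concept: str) -> Graph:
    """Replace each non-overlapping instance with one vertex labeled `concept`."""
    labels, edges = graph
    owner: Dict[int, int] = {}  # original vertex -> id of its replacement vertex
    new_labels = dict(labels)
    next_id = max(labels) + 1
    for inst in instances:
        for v in inst:
            owner[v] = next_id
            del new_labels[v]
        new_labels[next_id] = concept  # the "pointer" to the discovered concept
        next_id += 1
    new_edges: Set[Tuple[int, int]] = set()
    for u, v in edges:
        a, b = owner.get(u, u), owner.get(v, v)
        if a != b:  # edges internal to an instance are absorbed by the concept
            new_edges.add((min(a, b), max(a, b)))
    return new_labels, new_edges


# Two triangles of "C" vertices joined through a single "O" vertex.
labels = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "O"}
edges = {(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 7), (4, 7)}
graph: Graph = (labels, edges)

compressed = compress(graph, [{1, 2, 3}, {4, 5, 6}], "SUB_1")
print(size(graph), size(compressed))
```

Feeding `compressed` back into a discovery step, with `SUB_1` treated as an ordinary vertex label, is one way to picture how repeated passes could yield a hierarchical description.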