Protein classification is a successful application of data mining to bioinformatics. A number of techniques have been developed to classify proteins according to important features in their sequences, secondary structures, or three-dimensional structures. In this paper, we introduce a novel approach to protein classification based on significant patterns discovered on the surface of a protein. We define a notion called the α-surface, discuss its geometric properties, and present an algorithm that calculates the α-surface from a finite set of points in ℝ³. We apply the algorithm to extract the α-surface of a protein and use a pattern discovery algorithm, which relies on a new index structure called the ΔB+ tree, to find frequently occurring patterns on the surfaces. We then use these patterns to classify the proteins. While most existing techniques focus on the binary classification problem, we apply our approach to classifying three families of proteins. Experimental results show that the proposed approach performs well.