Considering that proteins with similar 3D structures have similar functions or biological actions, classifying proteins by their 3D structure can help biologists investigate the structural, evolutionary, and functional relatedness of new proteins. However, 3D protein structure classification remains a hard task, because different protein classes may have different discriminant features and an unbalanced data distribution. As a result, a standard pairwise distance such as the Euclidean distance, which weights every feature equally, is not always appropriate: the features do not all carry the same discriminant strength in the feature space. Most existing 3D protein structure classification methods rely on a K-Nearest Neighbor (KNN) classifier with a Euclidean distance; these methods do not consider the above factors and therefore yield low classification accuracy. To improve KNN for 3D protein structure classification, we propose the Class Conditional Distance Metric (CCDM), which takes into account the within-class neighborhood distribution of the protein descriptors and iteratively estimates distance update terms, thereby modifying the within-class neighborhood structure. Experimental results on the FSSP/DALI protein data set show that our approach gives significantly better results than a standard KNN classifier and is comparable to the state of the art in terms of accuracy.