Current protein structure prediction methods often generate a large number of structural candidates (decoys) and then select near-native decoys through clustering. Classical decoy clustering methods are time-consuming because they compute pairwise distances between all decoys. In this study, we developed a novel method for very fast decoy clustering. Instead of the commonly used pairwise RMSD (pRMSD), we propose a new distance measure, C-score, based on the contact maps of decoys. Our analysis indicates that C-score and pRMSD are highly correlated and that the clusters obtained from pRMSD and from C-score are highly similar. Our C-score-based clustering runs in time linear in the number of decoys, while achieving nearly the same accuracy in near-native model selection as existing methods such as SPICKER and Calibur, whose running time is quadratic in the number of decoys. The method is implemented in a package named MUFOLD-CL, available at http://mufold.org/clustering.php.
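The abstract does not define C-score, so the sketch below only illustrates how contact maps can avoid the quadratic pairwise step. It assumes, hypothetically, a contact between two residues when their C-alpha atoms lie closer than 8 Å. Each decoy is then compared against a single consensus (average) contact map instead of against every other decoy. The names `contact_map`, `consensus_scores`, and `pick_representative`, the cutoff, and the squared-difference score are all illustrative assumptions, not MUFOLD-CL's actual C-score.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Binary, symmetric contact map from C-alpha coordinates (hypothetical 8 A cutoff)."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # diagonal (self-contacts) stays 0
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def consensus_scores(decoys: Sequence[Sequence[Coord]], cutoff: float = 8.0) -> List[float]:
    """Score each decoy by its deviation from the average contact map.

    Cost is O(N * L^2) for N decoys of length L: linear in N, because each
    decoy is compared with one consensus map rather than with all other decoys.
    """
    maps = [contact_map(d, cutoff) for d in decoys]
    n_dec, length = len(maps), len(maps[0])
    # Consensus map: fraction of decoys in which each residue pair is in contact.
    freq = [[sum(m[i][j] for m in maps) / n_dec for j in range(length)]
            for i in range(length)]
    # Hypothetical score: mean squared difference from the consensus (lower is better).
    return [
        sum((m[i][j] - freq[i][j]) ** 2 for i in range(length) for j in range(length))
        / (length * length)
        for m in maps
    ]


def pick_representative(decoys: Sequence[Sequence[Coord]], cutoff: float = 8.0) -> int:
    """Index of the decoy closest to the consensus (first one wins on ties)."""
    scores = consensus_scores(decoys, cutoff)
    return min(range(len(scores)), key=scores.__getitem__)
```

For example, with two copies of a compact four-residue decoy and one extended decoy, the compact copies agree most with the consensus, so `pick_representative` returns index 0. Comparing against one consensus map replaces the N(N-1)/2 pairwise comparisons of RMSD-based clustering with two passes over the decoys.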
Published in: 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Date of Conference: 12-15 Nov. 2011