High-throughput technologies have produced vast amounts of protein-protein interaction (PPI) data, and a number of approaches based on PPI networks have been proposed for protein function prediction. However, these approaches perform poorly when PPI information is insufficient. To address this issue, we propose a novel collective-classification-based approach that combines protein sequence information with PPI information to improve prediction performance. We first reconstruct the PPI network by adding computed edges based on protein sequence similarity, and then apply a collective classification algorithm to predict protein function on the new network. Experiments on two real datasets demonstrate that our approach outperforms most existing approaches across a range of labeling situations, especially in sparsely labeled networks, where existing approaches fail because of inadequate PPI information. The experimental results also show that our approach is robust to the number of labeled proteins in the PPI network.
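The two-step idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the edge threshold, or the collective classification algorithm, so everything here is an assumption made for illustration: the similarity scores are given as a precomputed dictionary, the threshold is arbitrary, and a simple iterative neighbor-majority vote stands in for the collective classifier. The function names `augment_network` and `collective_classify` and the toy protein identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def augment_network(ppi_edges, similarity, threshold):
    """Build an undirected adjacency map from the observed PPI edges, then add
    a computed edge for every protein pair whose (assumed, precomputed)
    sequence-similarity score is at least `threshold`."""
    adj = defaultdict(set)
    for u, v in ppi_edges:
        adj[u].add(v)
        adj[v].add(u)
    for (u, v), score in similarity.items():
        if u != v and score >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def collective_classify(adj, known_labels, max_iters=10):
    """Stand-in collective classifier: repeatedly assign each unlabeled protein
    the most common label among its currently labeled neighbors, until no
    assignment changes or `max_iters` is reached."""
    labels = dict(known_labels)
    unlabeled = sorted(n for n in adj if n not in known_labels)
    for _ in range(max_iters):
        changed = False
        for node in unlabeled:
            votes = Counter(labels[nb] for nb in adj[node] if nb in labels)
            if not votes:
                continue  # no labeled neighbor yet; retry in a later pass
            # Highest vote count wins; ties are broken by label name.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data (hypothetical): two disconnected PPI components, one labeled protein.
ppi_edges = [("P1", "P2"), ("P3", "P4")]
similarity = {("P2", "P3"): 0.9, ("P1", "P4"): 0.2, ("P5", "P1"): 0.8}
known = {"P1": "kinase"}

plain = collective_classify(augment_network(ppi_edges, {}, 0.5), known)
augmented = collective_classify(augment_network(ppi_edges, similarity, 0.5), known)
```

In this toy case the original network leaves P3 and P4 unreachable from any labeled protein, so `plain` assigns them no label. Once the computed P2-P3 edge is added, the label can propagate to them, which mirrors the sparsely labeled setting the abstract highlights.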