Recent advances in high-throughput technology have dramatically increased the quantity of available protein-protein interaction (PPI) data. They have also stimulated the development of many methods for predicting protein complexes, which are important for understanding the functional organization of PPI networks in different biological processes. However, automated protein complex prediction from PPI data alone is significantly hindered by three properties of PPI networks: a high level of noise, sparseness, and a highly skewed degree distribution. Here we present a novel network topology-based algorithm that removes spurious interactions, recovers missing ones through computational prediction, and increases the accuracy of protein complex prediction by reducing the impact of hub nodes. The key idea of our algorithm is that two proteins sharing high-order topological similarities, measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex. We applied our algorithm to a yeast PPI network and assessed the result with multiple types of information: gene ontology, gene expression, essentiality, conservation between species, and known protein complexes. By these measures, the interactions in the reconstructed network have more significant biological relevance than those in the original network. Comparison with several existing methods shows that the network reconstructed by our method has the highest quality. Finally, using two independent graph clustering algorithms, we found that the reconstructed network yields significantly improved prediction accuracy for protein complexes.
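The abstract does not spell out the random walk-based similarity procedure. The sketch below is an illustration only and is not the authors' method. It uses a standard random walk with restart (RWR) from each protein, then scores a protein pair by the cosine similarity of their visiting-probability profiles. This stands in for "high-order topological similarity". Pairs whose score clears a threshold are kept as edges of a reconstructed network. The function names `rwr_profiles`, `cosine`, and `reconstruct`, the restart probability, the iteration count, and the threshold are all hypothetical choices.

```python
from itertools import combinations


def rwr_profiles(adj, restart=0.5, iters=100):
    """Random walk with restart from every node (hypothetical stand-in procedure).

    adj maps each node to a set of neighbours. Returns {seed: {node: probability}}.
    Each step: p <- (1 - restart) * (walk one hop) + restart * (jump back to seed).
    """
    nodes = sorted(adj)
    profiles = {}
    for seed in nodes:
        p = {v: 0.0 for v in nodes}
        p[seed] = 1.0
        for _ in range(iters):
            nxt = {v: 0.0 for v in nodes}
            for u, mass in p.items():
                nbrs = adj[u]
                if not nbrs:
                    # Dangling node: return its walking mass to the seed.
                    nxt[seed] += (1 - restart) * mass
                    continue
                share = (1 - restart) * mass / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            nxt[seed] += restart  # restart mass; total probability stays 1
            p = nxt
        profiles[seed] = p
    return profiles


def cosine(a, b):
    """Cosine similarity of two profiles defined over the same node set."""
    dot = sum(a[k] * b[k] for k in a)
    na = sum(x * x for x in a.values()) ** 0.5
    nb = sum(x * x for x in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def reconstruct(adj, threshold, restart=0.5):
    """Keep every pair whose profile similarity reaches the (hypothetical) threshold.

    Original edges with low similarity are dropped (candidate spurious interactions);
    non-adjacent pairs with high similarity are added (candidate missing ones).
    """
    prof = rwr_profiles(adj, restart)
    return {
        (u, v)
        for u, v in combinations(sorted(adj), 2)
        if cosine(prof[u], prof[v]) >= threshold
    }


if __name__ == "__main__":
    # Toy network: a triangle A-B-C and a separate edge X-Y.
    toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
           "X": {"Y"}, "Y": {"X"}}
    print(sorted(reconstruct(toy, threshold=0.5)))
```

Because each profile is a full probability distribution over the network, two proteins can score highly even without a direct edge. This holds whenever their walks visit the same neighbourhood, which is the intuition the abstract describes. Nodes in disconnected components always score zero. This sketch does not model the hub-node down-weighting mentioned above.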
Date of Conference: 4-7 Oct. 2012