Protein-protein interaction (PPI) database curation requires text-mining systems that can recognize and normalize interactor genes and return a ranked list of PPI pairs for each article. The order of PPI pairs in this list is essential for ease of curation. Most current PPI pair ranking approaches rely on association analysis between the two genes in the pair. However, we propose that ranking an extracted PPI pair by considering both the association between the paired genes and each gene's global association with all other genes mentioned in the paper can provide a more reliable ranked list. In this work, we present a composite interaction score that considers not only the association score between two interactors (the pair association score) but also their global association scores. We test three representative data fusion algorithms to estimate this global association score: two Borda-Fuse models and one linear combination model (LCM). The three estimation methods are evaluated on the data set of the BioCreative II.5 Interaction Pair Task (IPT) in terms of the area under the interpolated precision/recall curve (AUC iP/R). Our experimental results indicate that using LCM to estimate the global association score can boost the AUC iP/R score from 0.0175 to 0.2396, outperforming the best BioCreative II.5 IPT system.
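The abstract does not give the exact formulas, so the following Python sketch shows only one plausible reading of the approach, under stated assumptions. It assumes symmetric pair association scores in [0, 1] for the genes mentioned in one article. Borda-Fuse is modeled as a plain Borda count in which every gene "votes" by ranking the other genes by association with itself. LCM is modeled as a uniformly weighted linear combination (a mean) of a gene's association scores. The composite score is modeled as a hypothetical weighted sum `lam * pair + (1 - lam) * mean_global`. The function names (`borda_global`, `lcm_global`, `rank_pairs`), the weight `lam`, and the max-normalization step are all illustrative assumptions, not the authors' definitions. The paper also evaluates two Borda-Fuse variants, while this sketch shows just one basic Borda count.

```python
from itertools import combinations


def assoc(pair_scores, a, b):
    """Symmetric pair association score; 0.0 if the pair was never scored."""
    return pair_scores.get(frozenset((a, b)), 0.0)


def borda_global(pair_scores, genes):
    """Basic Borda count (assumed form of Borda-Fuse): each gene acts as a voter
    that ranks all other genes by association with itself; the candidate at
    0-based rank r among n candidates receives n - r points.
    Ties keep the input order because Python's sort is stable."""
    points = {g: 0.0 for g in genes}
    for voter in genes:
        candidates = sorted(
            (g for g in genes if g != voter),
            key=lambda g: assoc(pair_scores, voter, g),
            reverse=True,
        )
        n = len(candidates)
        for rank, g in enumerate(candidates):
            points[g] += n - rank
    return points


def lcm_global(pair_scores, genes):
    """Linear combination (assumed uniform weights, i.e. a plain mean) of a
    gene's association scores with every other gene in the article."""
    scores = {}
    for g in genes:
        others = [o for o in genes if o != g]
        if not others:
            scores[g] = 0.0
            continue
        weight = 1.0 / len(others)
        scores[g] = sum(weight * assoc(pair_scores, g, o) for o in others)
    return scores


def rank_pairs(pair_scores, genes, global_fn, lam=0.5):
    """Rank every gene pair by a hypothetical composite score:
    lam * pair score + (1 - lam) * mean of the two max-normalized global scores."""
    glob = global_fn(pair_scores, genes)
    top = max(glob.values(), default=0.0) or 1.0  # scale global scores to [0, 1]
    ranked = []
    for a, b in combinations(genes, 2):
        global_part = (glob[a] + glob[b]) / (2 * top)
        score = lam * assoc(pair_scores, a, b) + (1 - lam) * global_part
        ranked.append(((a, b), score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for four genes mentioned in one article.
    example = {
        frozenset(("A", "B")): 0.9,
        frozenset(("A", "C")): 0.4,
        frozenset(("B", "C")): 0.3,
        frozenset(("C", "D")): 0.2,
    }
    for pair, score in rank_pairs(example, ["A", "B", "C", "D"], lcm_global):
        print(pair, round(score, 4))
```

With the toy scores above, both estimators rank (A, B) first. Pairs whose members are well connected to the rest of the article, such as (A, C), rise above pairs with similar raw pair scores but weaker global associations.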
IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 9, Issue 6
Date of Publication: Nov.-Dec. 2012