Protein-Protein Interaction (PPI) extraction from the biomedical literature can quickly supply biomedical researchers with useful information. This paper presents a PPI extraction system based on an ensemble kernel model and active learning. First, an ensemble kernel within a support vector machine (SVM) classifier combines a lexical feature-based kernel and a path-based kernel. Under 10-fold cross-validation, the ensemble kernel model achieves F-scores of 64.50%, 69.74%, and 60.38% on the AIMED, IEPA, and BCPPI corpora, respectively, outperforming both the lexical feature-based kernel and the path-based kernel used on their own. Because this SVM-based ensemble kernel model requires a large amount of labeled data, and labeling data manually is expensive, we integrate active learning into the model using an uncertainty-based sampling strategy. With active learning, the F-scores on the AIMED, IEPA, and BCPPI corpora rise to 65.24%, 70.19%, and 61.87%, respectively, exceeding those of the ensemble kernel model trained with passive learning, while reducing the amount of labeled data required by 20%, 30%, and 30%, respectively.
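The abstract does not state how the two kernels are combined or how uncertainty is measured. The sketch below assumes a convex combination of cosine-normalized kernels and the common SVM heuristic of querying the unlabeled pairs whose decision value lies closest to zero. The instance representation, `lexical_kernel`, `path_kernel` (bigram overlap along a dependency path), the mixing weight, and the support vectors and dual coefficients are all hypothetical illustrations, not the paper's method; SVM training and the label-and-retrain loop are omitted.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of one candidate protein pair:
# (bag of lexical features, tokens/edges on the dependency path between the proteins).
Instance = Tuple[Counter, List[str]]
Kernel = Callable[[Instance, Instance], float]


def lexical_kernel(a: Instance, b: Instance) -> float:
    """Assumed lexical kernel: linear kernel (dot product) on feature counts."""
    return float(sum(v * b[0][k] for k, v in a[0].items()))


def path_kernel(a: Instance, b: Instance) -> float:
    """Assumed path kernel: dot product of bigram counts along the path."""
    ga = Counter(zip(a[1], a[1][1:]))
    gb = Counter(zip(b[1], b[1][1:]))
    return float(sum(v * gb[k] for k, v in ga.items()))


def normalized(kernel: Kernel, a: Instance, b: Instance) -> float:
    """Cosine-normalize a kernel; with non-negative counts the result is in [0, 1]."""
    kaa, kbb = kernel(a, a), kernel(b, b)
    if kaa == 0.0 or kbb == 0.0:
        return 0.0
    return kernel(a, b) / math.sqrt(kaa * kbb)


def ensemble_kernel(a: Instance, b: Instance, weight: float = 0.5) -> float:
    """Convex combination of the two normalized kernels (assumed combination rule).
    A non-negative weighted sum of valid kernels is itself a valid kernel."""
    return (weight * normalized(lexical_kernel, a, b)
            + (1.0 - weight) * normalized(path_kernel, a, b))


def decision_value(x: Instance, support: Sequence[Instance],
                   dual_coef: Sequence[float], bias: float,
                   kernel: Kernel = ensemble_kernel) -> float:
    """SVM decision function f(x) = sum_i alpha_i * y_i * K(s_i, x) + b,
    where dual_coef[i] stores alpha_i * y_i from an already trained SVM."""
    return sum(c * kernel(s, x) for s, c in zip(support, dual_coef)) + bias


def select_most_uncertain(pool: Sequence[Instance], support: Sequence[Instance],
                          dual_coef: Sequence[float], bias: float,
                          batch_size: int = 1) -> List[int]:
    """Uncertainty sampling: indices of the pool items with the smallest |f(x)|,
    i.e. the unlabeled pairs lying closest to the SVM hyperplane."""
    margins = [abs(decision_value(x, support, dual_coef, bias)) for x in pool]
    return sorted(range(len(pool)), key=lambda i: margins[i])[:batch_size]


# Usage with made-up support vectors and dual coefficients (not trained values).
pos = (Counter({"interacts": 1, "with": 1}),
       ["PROT1", "nsubj", "interacts", "prep_with", "PROT2"])
neg = (Counter({"and": 1}), ["PROT1", "conj_and", "PROT2"])
support, dual_coef, bias = [pos, neg], [1.0, -1.0], 0.0

pool = [
    pos,
    neg,
    (Counter({"interacts": 1, "and": 1}), ["PROT1", "conj_and", "PROT2"]),
    (Counter({"binds": 1}), ["PROT1", "dobj", "PROT2"]),
]
print(select_most_uncertain(pool, support, dual_coef, bias, batch_size=2))  # [3, 2]
```

In a full active-learning loop, the selected pairs would be labeled by an annotator and added to the training set, and the SVM would be retrained with the ensemble kernel before the next query round. Normalizing each kernel before mixing keeps one kernel from dominating the sum simply because its raw feature counts are larger.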