Given the essential role of proteins in life processes, the computational assignment of protein function has become one of the most important tasks in bioinformatics. While the Gene Ontology (GO) is widely used in functional annotation, new approaches that address annotation incompleteness while leveraging the GO framework are urgently needed. In this paper, we propose two new models for predicting GO terms from domain content: a Correlation Coefficient based model (CC-M) and a Support Vector Machine (SVM) based model (SVM-M). We have developed predictors for all GO terms with manually curated annotations. Compared with the previously published Bayesian probabilistic approach [Forslund et al., 2008], our methods are shown to cope better with incomplete training data. In particular, CC-M is suited to GO terms with extremely low occurrence frequency, and SVM-M to the remaining GO terms. CC-M and SVM-M are therefore integrated into a single model (CC-SVM) that combines their respective advantages.
Published in:
Bioinformatics and Biomedicine Workshop, 2009. BIBMW 2009. IEEE International Conference on
Date of Conference: 1-4 Nov. 2009
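The integration described in the abstract, where CC-M handles very rare GO terms and SVM-M handles the rest, can be illustrated with a minimal sketch. The abstract does not give the exact correlation measure or the frequency cutoff, so everything here is an assumption for illustration only: the phi coefficient (Pearson correlation of two binary vectors) stands in for the correlation score between a domain and a GO term, and the names `phi_coefficient`, `choose_model`, and the `threshold` value of 10 are hypothetical, not taken from the paper.

```python
from math import sqrt


def phi_coefficient(domain_present, term_annotated):
    """Pearson correlation between two equal-length binary (0/1) vectors.

    Assumption: each position is one protein; the first vector marks whether
    a domain occurs in it, the second whether it carries the GO term.
    """
    n = len(domain_present)
    both = sum(1 for d, t in zip(domain_present, term_annotated) if d and t)
    n_domain = sum(1 for d in domain_present if d)
    n_term = sum(1 for t in term_annotated if t)
    denom = sqrt(n_domain * (n - n_domain) * n_term * (n - n_term))
    if denom == 0:
        # One vector is constant, so the correlation is undefined; use 0.0.
        return 0.0
    return (n * both - n_domain * n_term) / denom


def choose_model(term_count, threshold=10):
    """Pick a predictor by how often the GO term occurs in the training set.

    The threshold is a hypothetical placeholder, not a value from the paper.
    """
    return "CC-M" if term_count < threshold else "SVM-M"
```

For example, `choose_model(3)` selects the correlation-based predictor for a term seen only three times, while `choose_model(50)` falls through to the SVM-based predictor; in practice the cutoff would have to be tuned on held-out annotations.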