In bioinformatics, predicting protein function is considered an important but difficult task. Using a set of enzymes from the Hydrolase, Isomerase, Ligase, Lyase, Transferase and Oxidoreductase classes, previously used by Dobson et al., this paper proposes a self-learning process that predicts enzyme class from primary and secondary structure by combining a Support Vector Machine (SVM) classifier with a genetic algorithm. An SVM is a supervised machine learning algorithm capable of solving linear and non-linear classification problems. During learning, both the training data and the corresponding outputs are presented to the SVM so that its parameters can be adjusted. This study used genetic algorithms, optimization heuristics often used to estimate parameters, to tune the main parameters of the classifier: the kernel function type and the parameter C, which controls the trade-off between the training error and the margin of separation between classes. For this prediction problem, the results indicate that the best kernel is a radial basis function (RBF) with a width of 6.1 and C equal to 6.9. With these parameters, the classifier obtains an average accuracy of 79.74%.
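
The parameter search described above can be illustrated with a minimal sketch of a real-coded genetic algorithm over two genes, the RBF width and C. This is not the authors' implementation: the function names, search bounds, population size, and the selection, crossover and mutation operators are all assumptions chosen for illustration, and the kernel-type gene is omitted for brevity. The fitness function `toy_fitness` is a stand-in with an arbitrary peak; in an actual experiment it would be replaced by the cross-validated accuracy of an SVM trained with the candidate parameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Maximize fitness(*genes) over box bounds with a simple real-coded GA."""
    rng = random.Random(seed)

    def mutate(genes):
        # Perturb each gene with probability mutation_rate, then clip to bounds.
        out = []
        for value, (lo, hi) in zip(genes, bounds):
            if rng.random() < mutation_rate:
                value += rng.gauss(0.0, 0.1 * (hi - lo))
            out.append(max(lo, min(hi, value)))
        return out

    def crossover(p1, p2):
        # Arithmetic (blend) crossover: a random convex combination of parents.
        alpha = rng.random()
        return [alpha * a + (1.0 - alpha) * b for a, b in zip(p1, p2)]

    def select(scored):
        # Binary tournament: the fitter of two distinct random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best_score, best = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(*ind), ind) for ind in population]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best_score, best = top_score, list(top)
        # Elitism: carry the best individual found so far over unchanged.
        population = [list(best)] + [
            mutate(crossover(select(scored), select(scored)))
            for _ in range(pop_size - 1)
        ]
    return best, best_score


def toy_fitness(width, c):
    # Hypothetical stand-in for cross-validated SVM accuracy (NOT enzyme data):
    # a smooth surface with an arbitrary peak at width=2.0, c=5.0.
    return -((width - 2.0) ** 2 + (c - 5.0) ** 2)


# Hypothetical search ranges for the RBF width and for C.
bounds = [(0.1, 10.0), (0.1, 10.0)]
best, best_score = genetic_search(toy_fitness, bounds)
print("best (width, C):", best, "fitness:", best_score)
```

Elitism keeps the best candidate from being lost between generations, which matters when each fitness evaluation is an expensive training run. Tuning the kernel type as well would require a categorical gene with its own mutation rule.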