Accurate prediction of splice sites in DNA sequences is a challenging problem in bioinformatics. Above all, it is not clear how many features, and which ones, are relevant to the splicing process. Feature selection is therefore often used to improve prediction accuracy, and it can also yield useful biological knowledge. In addition, the parameter settings of the classifier have a significant influence on classification performance. Hence we use a method based on the univariate marginal distribution algorithm (UMDA) that selects features and optimizes parameters simultaneously. Moreover, most splice sites are strongly conserved and can be predicted correctly using only the conserved signal features around the splice site, whereas sites with weak conservation may need more complex features. Therefore, according to the differences in conservation among splice site signal sequences, we propose a layered prediction algorithm based on feature selection and parameter optimization: the UMDA SVM 2 layer algorithm. Our experimental results show that this two-layer algorithm, which optimizes features and parameters simultaneously, achieves better performance than some current methods.
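To make the idea of selecting features and tuning parameters in a single search concrete, the following is a minimal sketch of a univariate marginal distribution algorithm (UMDA) over a bit string, not the authors' implementation. Everything named here is an assumption introduced for illustration: the encoding (the first `N_FEATURES` bits form a feature mask and the last two bits index a hypothetical grid `C_GRID` of SVM penalty values), the helper names `decode`, `toy_fitness` and `umda`, and the population settings. In particular, `toy_fitness` is only a stand-in for the cross-validated classifier accuracy that a real run would presumably use as the fitness.

```python
import random

N_FEATURES = 8                      # hypothetical number of candidate features
C_GRID = [0.1, 1.0, 10.0, 100.0]    # hypothetical SVM C values, indexed by 2 bits
N_BITS = N_FEATURES + 2             # one individual = feature mask + parameter bits


def decode(bits):
    """Split a bit list into (feature mask, C value)."""
    mask = bits[:N_FEATURES]
    c_index = 2 * bits[N_FEATURES] + bits[N_FEATURES + 1]
    return mask, C_GRID[c_index]


def toy_fitness(bits):
    """Stand-in for cross-validated accuracy (NOT a real classifier).

    Assumption for the demo: features 0-2 are informative, the others
    only add noise, and C = 10.0 is the best parameter value.
    """
    mask, c = decode(bits)
    score = sum(mask[:3]) - 0.5 * sum(mask[3:])
    if c == 10.0:
        score += 1.0
    return score


def umda(fitness, n_bits, pop_size=60, n_select=20, generations=30, seed=0):
    """Return the best bit list found by a basic UMDA with truncation selection."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # independent marginals P(bit i = 1)
    best = None
    for _ in range(generations):
        # Sample a population from the current marginal distribution.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        if best is None or fitness(population[0]) > fitness(best):
            best = population[0]
        # Re-estimate each marginal from the fittest individuals,
        # clamped away from 0 and 1 so that sampling keeps some diversity.
        selected = population[:n_select]
        probs = [
            min(0.95, max(0.05, sum(ind[i] for ind in selected) / n_select))
            for i in range(n_bits)
        ]
    return best


best = umda(toy_fitness, N_BITS)
mask, c = decode(best)
print("selected features:", [i for i, b in enumerate(mask) if b], "C =", c)
```

Because the feature mask and the parameter bits sit in the same individual, each candidate is scored as a complete configuration, which is what lets the search account for interactions between the chosen features and the classifier parameters. In a layered setting, one such search could be run per layer, with a different candidate feature pool for each.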