In this paper, we apply the least-squares support vector machine (LS-SVM) to operon prediction in Escherichia coli (E. coli), using different combinations of intergenic distance, gene expression data, and phylogenetic profiles. Experimental results demonstrate that within-operon (WO) gene pairs tend to have shorter intergenic distances, higher correlation coefficients, and much stronger co-evolution between their phylogenetic profiles. We extracted data sets from WO pairs and transcription unit border (TUB) pairs, processed the intergenic distances with log-energy entropy, de-noised the Pearson correlation coefficients of the two genes' expression data with a wavelet transform, and computed the Hamming distance between the two phylogenetic profiles. We then trained the LS-SVM on part of the data and tested the trained classifier on the remainder. The results show that the choice of feature combination affects prediction performance. When intergenic distance, gene expression data, and phylogenetic profile are combined as input to an LS-SVM with a linear kernel, the accuracy, sensitivity, and specificity are 92.34%, 93.54%, and 90.73%, respectively.
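
The pipeline summarized above can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' implementation. All function names, the regularization value `gamma`, and the label convention (+1 for WO pairs, -1 for TUB pairs) are assumptions made for illustration. The log-energy entropy is taken here as log(d^2) of a single distance, with log(0) treated as 0, which is one possible reading of the text. The wavelet de-noising step is omitted because the abstract does not specify the wavelet or the threshold. The classifier follows the standard LS-SVM formulation, in which training reduces to solving one linear system.

```python
import numpy as np

def log_energy(distance):
    """Log-energy entropy of one value: log(d^2), with log(0) taken as 0 (assumed form)."""
    d2 = float(distance) ** 2
    return float(np.log(d2)) if d2 > 0 else 0.0

def pearson(expr_a, expr_b):
    """Pearson correlation coefficient between two expression vectors."""
    return float(np.corrcoef(expr_a, expr_b)[0, 1])

def hamming(prof_a, prof_b):
    """Number of positions at which two binary phylogenetic profiles differ."""
    return int(np.sum(np.asarray(prof_a) != np.asarray(prof_b)))

def pair_features(distance, expr_a, expr_b, prof_a, prof_b):
    """Three-feature vector for one adjacent gene pair (hypothetical layout)."""
    return np.array([log_energy(distance),
                     pearson(expr_a, expr_b),
                     hamming(prof_a, prof_b)], dtype=float)

def lssvm_train(X, y, gamma=10.0):
    """Train a linear-kernel LS-SVM classifier; labels y must be +1 or -1.

    Solves the system
        [ 0   y^T           ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * <x_i, x_j>.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    omega = np.outer(y, y) * (X @ X.T)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    b, alpha = solution[0], solution[1:]
    w = (alpha * y) @ X  # with a linear kernel the weight vector is explicit
    return w, b

def lssvm_predict(X, w, b):
    """Predict +1 or -1 from the sign of the decision function."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)

def scores(y_true, y_pred):
    """Accuracy, sensitivity (recall on +1) and specificity (recall on -1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    return ((tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp))
```

In use, one feature vector would be built per adjacent gene pair with `pair_features`, the rows stacked into a matrix, a subset passed to `lssvm_train`, and the held-out rows passed to `lssvm_predict` and `scores`. Dropping columns from the feature matrix gives the reduced feature combinations that the abstract compares.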