The task of finding transcription start sites (TSSs) can be modeled as a classification problem. Relevance vector machines (RVMs) are a family of machine learning methods that take a Bayesian approach to training general linear models (GLMs). Based on a Markov chain Monte Carlo (MCMC) sampler, we propose a model that uses the RVM to explore very large numbers of candidate features. The model applies the power of the RVM to classifying and detecting interesting points and regions in biological sequence data. It has been used successfully to predict transcription start sites and other features in genome sequences. Experimental results on real nucleotide sequence data show that our method greatly improves prediction accuracy and performs significantly better than Promoter Inspector and CpG-island-based prediction.
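To make the idea of sampling over candidate features concrete, the following is a minimal sketch, not the model proposed here. It assumes k-mer counts as candidate features and replaces the RVM evidence with a toy relevance score (the difference in mean k-mer count between classes). The function names, the `penalty` parameter, and the example sequences are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a nucleotide string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_score(windows, labels, kmer):
    """Toy stand-in for feature relevance (NOT the RVM evidence):
    absolute difference in mean k-mer count between the two classes.
    Assumes both classes are present in `labels`."""
    k = len(kmer)
    pos = [kmer_counts(w, k)[kmer] for w, y in zip(windows, labels) if y == 1]
    neg = [kmer_counts(w, k)[kmer] for w, y in zip(windows, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def metropolis_feature_search(scores, penalty=0.5, steps=2000, seed=0):
    """Metropolis sampler over feature subsets S.

    Target distribution: p(S) proportional to
    exp(sum over f in S of (scores[f] - penalty)).
    Proposal: toggle one uniformly chosen feature (a symmetric move).
    Returns the fraction of steps in which each feature was included.
    """
    rng = random.Random(seed)
    names = sorted(scores)
    included = set()
    visits = Counter()
    for _ in range(steps):
        f = rng.choice(names)
        # Change in log target if f is toggled in (or out).
        delta = scores[f] - penalty
        if f in included:
            delta = -delta
        # Metropolis acceptance: always accept uphill moves,
        # accept downhill moves with probability exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            included ^= {f}
        visits.update(included)
    return {f: visits[f] / steps for f in names}


# Synthetic toy windows: label 1 = near a TSS, label 0 = background.
windows = ["CGCGCGCGCG", "GCGCGCGCGC", "ATATATATAT", "TATATATTTA"]
labels = [1, 1, 0, 0]
candidates = ["CG", "AT", "AC"]
scores = {k: feature_score(windows, labels, k) for k in candidates}
inclusion = metropolis_feature_search(scores)
```

In this sketch, features whose toy score exceeds the penalty are included in most sampled subsets, while uninformative ones are included less often. A real implementation would score each subset with the model's marginal likelihood rather than a per-feature heuristic.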