Coding and non-coding gene prediction is still a challenge. Diverse computer-based tools have been created to screen sequences using elaborate strategies for gene prediction. Many of these implement various statistical tests to measure the plausibility of the prediction but until now, a comprehensive negative control did not exist. We developed an algorithm that generates sequences with characteristics of the intergenic regions of a genome, including nucleotide composition and typical inserted elements like interspersed repeats, low complexity sequences and pseudogenes. We also challenged some gene prediction programs to compare the artificial sequences with real intergenic regions.
Published in:
Genomic Signal Processing and Statistics, 2009. GENSIPS 2009. IEEE International Workshop on
Date of Conference: 17-21 May 2009