Numerical representations of symbolic DNA sequences are an indispensable prerequisite for characterizing the information content of DNA molecules by computational methods. Current numerical representation methods for DNA sequences do not capture genetic code context, which may play an important role in defining protein-coding regions. We propose a novel numerical representation of DNA sequences based on the genetic code context within them and explore the feasibility of applying it to identify protein-coding regions in genomes. Computational experiments indicate that incorporating genetic code information into numerical representations is a promising approach: DNA sequences are represented uniquely and with more information, so that digital processing tools can be applied effectively to periodicity analysis of DNA sequences.
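To make the idea of a numerical representation followed by periodicity analysis concrete, the following is a minimal sketch of a conventional baseline, not the representation proposed here. It assumes the widely used binary indicator mapping (one 0/1 sequence per nucleotide) and measures spectral power at frequency N/3, where protein-coding regions tend to show a peak. The function names `indicator_sequences`, `power_at_frequency`, and `period3_power` are hypothetical and chosen for illustration only.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per nucleotide.

    Assumption: this is the conventional indicator mapping, used as a
    baseline; it does not encode genetic code context.
    """
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


def power_at_frequency(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real sequence."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def period3_power(dna):
    """Spectral power at frequency N/3, summed over the four indicator sequences."""
    n = len(dna)
    if n == 0 or n % 3 != 0:
        # Keep the sketch simple: N/3 must be an integer DFT index.
        raise ValueError("sequence length must be a positive multiple of 3")
    k = n // 3
    return sum(power_at_frequency(seq, k) for seq in indicator_sequences(dna).values())
```

In this sketch a perfectly codon-periodic string such as `"ATG" * 10` yields a large value from `period3_power`, while a homopolymer such as `"A" * 30` yields a value near zero. A representation that also carries genetic code context would replace `indicator_sequences` while leaving the spectral step unchanged.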