Processing DNA sequences with traditional digital signal processing methods first requires converting them from character strings into numerical sequences. Many previously introduced representations assign values to the four DNA nucleotides A, C, G, and T in ways that impose mathematical structures not present in the actual DNA sequence. In this paper, almost all existing methods are compared for the purpose of identifying protein coding regions, using the discrete Fourier transform (DFT) based spectral content measure to exploit period-3 behaviour in the exonic regions of the GENSCAN test set. Results for false positives versus sensitivity, the receiver operating characteristic (ROC) curve, and exonic nucleotides detected as false positives all show that the two newly proposed numerical DNA representations perform better than the well-known Z-curve, tetrahedron, and Voss representations, with 66-75% less processing. Compared with the Voss representation, the proposed paired numeric method yields relative improvements of up to 12% in prediction accuracy of exonic nucleotides at a 10% false positive rate on the GENSCAN test set.
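As a rough illustration of the baseline pipeline mentioned above, the following sketch maps a DNA string to the four binary indicator sequences of the Voss representation and sums their DFT power at the period-3 frequency (bin k = N/3). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the window length, and the step size are hypothetical choices made for illustration, and the proposed paired numeric representation is not shown because the text above does not define it.

```python
import cmath


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss representation)."""
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"}


def period3_power(seq):
    """Sum over A, C, G, T of |X_b[N/3]|^2, the DFT power at period 3."""
    n = len(seq)
    # DFT bin k = N/3 corresponds to the complex exponential exp(-j*2*pi*i/3).
    twiddle = [cmath.exp(-2j * cmath.pi * i / 3) for i in range(n)]
    total = 0.0
    for indicator in voss_indicators(seq).values():
        coeff = sum(x * w for x, w in zip(indicator, twiddle))
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=1):
    """Period-3 power for each window position.

    The default window of 351 and step of 1 are assumptions for illustration;
    the window should be a multiple of 3 so that bin N/3 is an integer.
    """
    return [
        period3_power(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # A strictly codon-periodic toy sequence versus a period-4 sequence.
    print(period3_power("ATG" * 10))
    print(period3_power("ACGT" * 6))
```

Peaks in the output of `sliding_period3` along a sequence would then be thresholded to flag candidate coding regions; varying that threshold traces out a sensitivity versus false positive trade-off of the kind summarised in the results above.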