Hidden Markov models (HMMs) have demonstrated their usefulness in statistics and pattern recognition, particularly for speech recognition and handwriting recognition. The same principles of statistics and probability can be applied in genetics. DNA is built from four bases, adenine, guanine, thymine, and cytosine, each of which forms part of a nucleotide. However, the length of a nucleotide chain can be uncertain. The DNA sequence constitutes the heritable genetic information in nuclei that forms the basis for the developmental programs of all living organisms. Determining the DNA sequence is therefore useful in studying fundamental biological processes, as well as in diagnostic or forensic research. In this study, we use HMMs to determine DNA sequence likelihoods. The first 1000 nucleotide bases of rice chromosomes serve as the training sequence, and the resulting transition and emission probabilities are used to determine a probable DNA sequence for the next 2000 bases. This predicted sequence should be comparable to the actual sequence. However, experimentation did not show this to be the case, despite previous experiments showing otherwise: only a fourth of a nucleotide sequence was ever classified correctly.
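To illustrate how transition and emission probabilities yield a sequence likelihood, the following is a minimal sketch of the scaled forward algorithm over the four-base alphabet. It is not the implementation used in this study: the two hidden states, the parameter values, and the names `forward_log_likelihood`, `start`, `trans`, and `emit` are all hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"  # observation alphabet; column order used by the emission rows


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq) under an HMM, using the scaled forward algorithm.

    start[s]    : probability of beginning in hidden state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][b]  : probability that state s emits base BASES[b]
    """
    if not seq:
        return 0.0  # the empty sequence has probability 1
    n_states = len(start)
    # Initialise with the start distribution times the first emission.
    first = BASES.index(seq[0])
    alpha = [start[s] * emit[s][first] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(seq)):
        if t > 0:
            obs = BASES.index(seq[t])
            # Propagate one step through the transitions, then emit.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs]
                for s in range(n_states)
            ]
        # Rescale to avoid underflow on long sequences; accumulate the log.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Hypothetical two-state model (illustrative values, not fitted to rice data).
start = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
emit = [[0.35, 0.15, 0.15, 0.35],   # assumed AT-rich state
        [0.15, 0.35, 0.35, 0.15]]   # assumed GC-rich state

print(forward_log_likelihood("ACGTTTGA", start, trans, emit))
```

A useful reference point when reading such likelihoods: a model whose states all emit the four bases uniformly assigns every sequence of length n the probability 0.25^n, which corresponds to guessing each base correctly one time in four.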