Comparing biological sequences is one of the most important tools in computational biology. Comparing sequences lets us identify similar subsequences, which may in turn point to shared structures and similar functions. Sequence alignment has been the method of choice for measuring similarity and has earned considerable trust among researchers, but it suffers from some shortcomings. In particular, repetitions in the input sequences often lead to inaccurate results, especially when these repetitions are dispersed throughout the sequence. In this paper, we study alternative methods based on compression techniques borrowed from information theory to obtain more accurate comparisons of sequences. We test the proposed techniques on various datasets and show that they outperform alignment-based methods in several cases.
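The abstract does not specify which compression-based measure the study uses. One widely used measure of this kind is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It uses Python's standard `zlib` (DEFLATE) as the compressor C, and its toy sequences are randomly generated rather than drawn from any real dataset. It also shows why compression can tolerate block rearrangements: the compressor reuses repeated blocks wherever they occur.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib (DEFLATE) compression of `data`."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    Values near 0 mean the compressor finds each sequence largely
    redundant given the other; values near 1 mean little shared structure.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy data: random DNA strings, not from any real dataset.
rng = random.Random(0)
a = "".join(rng.choice("ACGT") for _ in range(2000))

# b: a copy of a with 20 point substitutions.
b_list = list(a)
for i in rng.sample(range(len(a)), 20):
    b_list[i] = "C" if b_list[i] == "A" else "A"
b = "".join(b_list)

# c: an unrelated random sequence of the same length.
c = "".join(rng.choice("ACGT") for _ in range(2000))

# d: the same four 500-base blocks of a, concatenated in a different order.
d = a[1000:1500] + a[:500] + a[1500:] + a[500:1000]

for name, other in [("mutated", b), ("unrelated", c), ("rearranged", d)]:
    print(f"NCD(a, {name}) = {ncd(a, other):.3f}")
```

With these inputs, the mutated and block-rearranged copies of `a` should score well below the unrelated sequence `c`. Note that DEFLATE only finds back-references within a 32 KB window, so comparing long genomes this way would require a compressor with a much larger context.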