Multiple sequence alignment (MSA) and phylogenetic tree inference are basic tasks in molecular biology data analysis. The quality of a phylogenetic tree depends on the quality of the MSA. The hidden Markov model (HMM) is one of the good methods for producing an MSA, but when the sequences have low similarity it produces a less optimal MSA. This research performs multiple alignment of protein sequences with low similarity using the HMM, so that the resulting MSA can be used as input to produce a more accurate phylogenetic tree. The research is carried out by building an augmented set. Its parameters are the number of child sequences and the percentage of mutation applied to each child sequence. The mutation process is based on the BLOSUM 80 substitution matrix. The augmented set is used as input to the HMM to obtain the MSA. The Baum-Welch learning algorithm is used to estimate the HMM parameters, while the Viterbi algorithm is used to build the alignment from the unaligned sequences. The prototype tool is built in the Java programming language using the BioJava library. The accuracy of phylogenetic trees inferred from an MSA built with the augmented set is compared with that of trees inferred from an MSA built without it. Two phylogenetic tree inference methods are used. First, neighbour joining is conducted using the ClustalX tool. Second, the parsimony method is conducted using the Phylip Protpars tool. The data are amino acid sequences of 16S ribosomes from mitochondria. With the neighbour joining method, the accuracy of the phylogenetic tree increases for datasets that meet the following criteria: the number of sequences and the number of highly divergent sequences (HDS) are small enough, and the difference between the maximum and the average sequence length is small enough. With the parsimony method, in contrast, the accuracy of phylogenetic trees built using the augmented set can increase or decrease arbitrarily.
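The augmented-set construction can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Java/BioJava prototype described above. Several things here are assumptions rather than details given in the text: the function names `mutate` and `build_augmented_set`, the choice to mutate a fixed fraction of randomly chosen positions, and the rule that a replacement residue is drawn with weight `2 ** score`. The score table is a made-up toy over four residues and does not contain BLOSUM 80 values; a real run would load the full matrix.

```python
import random

# Toy, made-up substitution scores over a 4-letter alphabet (assumption).
# These are NOT BLOSUM 80 values; a real run would load the full 20x20 matrix.
# Each row lists only residues different from the row's own residue.
TOY_SCORES = {
    "A": {"C": -1, "D": -2, "E": -1},
    "C": {"A": -1, "D": -4, "E": -4},
    "D": {"A": -2, "C": -4, "E": 2},
    "E": {"A": -1, "C": -4, "D": 2},
}


def mutate(seq, mutation_pct, scores, rng):
    """Return a child of `seq` with mutation_pct percent of positions replaced.

    Assumption: positions are chosen uniformly at random, and the replacement
    residue is drawn with weight 2 ** score, so higher-scoring (more
    conservative) substitutions are more likely.
    """
    n_mut = int(len(seq) * mutation_pct / 100)  # rounded down
    child = list(seq)
    for pos in rng.sample(range(len(seq)), n_mut):
        alternatives = scores[child[pos]]
        residues = list(alternatives)
        weights = [2.0 ** alternatives[r] for r in residues]
        child[pos] = rng.choices(residues, weights=weights, k=1)[0]
    return "".join(child)


def build_augmented_set(parents, n_children, mutation_pct, scores, seed=0):
    """Return each parent followed by n_children mutated copies of it."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    augmented = []
    for parent in parents:
        augmented.append(parent)
        augmented.extend(
            mutate(parent, mutation_pct, scores, rng) for _ in range(n_children)
        )
    return augmented


parents = ["ACDEACDEAC", "DDEEAACC"]
augmented = build_augmented_set(parents, n_children=3, mutation_pct=20,
                                scores=TOY_SCORES)
```

In this sketch the two parameters named in the abstract map directly to `n_children` and `mutation_pct`. The resulting list, parents plus children, would then be the unaligned input handed to HMM training and alignment.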