Multiple sequence alignment using Hidden Markov model with augmented set based on BLOSUM 80 and its influence on phylogenetic accuracy

2 Author(s)
Afiahayati ; Dept. of Comput. Sci. & Electron., Gadjah Mada Univ., Yogyakarta, Indonesia ; Hartati, S.

Multiple sequence alignment (MSA) and phylogenetic tree inference are basic tasks in molecular biology data analysis. The quality of a phylogenetic tree depends on the quality of the MSA. The hidden Markov model (HMM) is one of the good methods for producing an MSA, but for sequences with low similarity it produces a less optimal MSA. This research uses the HMM to perform multiple alignment of protein sequences with low similarity, so that the resulting MSA can be used as input to produce a more accurate phylogenetic tree. The research is carried out by building an augmented set. Its parameters are the number of child sequences and the percentage of mutation applied to each child sequence. The mutation process is based on the BLOSUM 80 substitution matrix. The augmented set is used as input to the HMM to obtain the MSA. The Baum-Welch learning algorithm is used to estimate the parameters of the HMM, while the Viterbi algorithm is used to arrange the alignment from the unaligned sequences. The prototype tool is built in the Java programming language using the BioJava library. The accuracy of phylogenetic trees built from the MSA with the augmented set is compared with that of trees built from the MSA without the augmented set. Two phylogenetic tree inference methods are used. First, neighbour joining is conducted using the ClustalX tool. Second, the parsimony method is conducted using the PHYLIP Protpars tool. The data are the amino acid sequences of ribosomes 16S from mitochondria. The accuracy of the phylogenetic tree using the neighbour joining method increases for datasets that meet these criteria: the number of sequences and the number of high-divergence sequences (HDS) are small enough, and the difference between the maximum length and the average length of the sequences is small enough. The accuracy of phylogenetic trees using the augmented set and the parsimony method, by contrast, can increase or decrease arbitrarily.
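
The augmented-set step described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java/BioJava prototype. The abstract does not say how BLOSUM 80 scores drive the choice of a replacement residue, so the sketch assumes that replacements are sampled with weight exp(score). The names `toy_score`, `mutate`, and `build_augmented_set` are hypothetical, keeping the parent sequences in the augmented set is an assumption, and `toy_score` is a placeholder that does not contain real BLOSUM 80 values.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(a, b):
    """Placeholder substitution score. NOT real BLOSUM 80 values:
    residues in the same rough physicochemical group score higher."""
    groups = ["AVLIMC", "FWY", "STNQ", "DE", "KRH", "GP"]
    same_group = any(a in g and b in g for g in groups)
    return 1 if same_group else -2


def mutate(seq, mutation_pct, score, rng):
    """Return a child sequence with mutation_pct percent of positions substituted.

    Assumption: a replacement residue is sampled with weight exp(score),
    so higher-scoring substitutions are more likely."""
    n_mut = min(len(seq), round(len(seq) * mutation_pct / 100))
    positions = rng.sample(range(len(seq)), n_mut)  # distinct positions
    child = list(seq)
    for pos in positions:
        original = child[pos]
        candidates = [aa for aa in AMINO_ACIDS if aa != original]
        weights = [math.exp(score(original, aa)) for aa in candidates]
        child[pos] = rng.choices(candidates, weights=weights, k=1)[0]
    return "".join(child)


def build_augmented_set(parents, n_children, mutation_pct, score=toy_score, seed=0):
    """Return each parent followed by n_children mutated copies of it."""
    rng = random.Random(seed)
    augmented = []
    for parent in parents:
        augmented.append(parent)  # assumption: parents stay in the set
        for _ in range(n_children):
            augmented.append(mutate(parent, mutation_pct, score, rng))
    return augmented
```

To use real scores, `score` could be replaced by a lookup into an actual BLOSUM 80 table loaded from a trusted source. The two parameters named in the abstract map onto `n_children` and `mutation_pct`.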

Published in:

Distributed Framework and Applications (DFmA), 2010 International Conference on

Date of Conference:

2-3 Aug. 2010