Barking Up The Wrong Treelength: The Impact of Gap Penalty on Alignment and Tree Accuracy

5 Author(s)
Kevin Liu ; The University of Texas at Austin, Austin ; Serita Nelesen ; Sindhu Raghavan ; C. Randal Linder

The current technique for estimating phylogenies from sequence data uses two phases: first, the sequences are aligned, and then the tree is estimated using the obtained alignment. More recently, however, several computational methods have been developed for simultaneous estimation of the alignment and the tree. The most popular of these is POY, a heuristic for the NP-hard "minimum treelength" problem, which extends maximum parsimony (MP) so that gaps contribute to the cost. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to the very simple two-phase method of estimating the alignment using ClustalW and then analyzing the resultant alignment using MP. They found that in the overwhelming majority of cases, ClustalW + MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques (collectively referred to as "Direct Optimization") are not competitive with two-phase techniques. Our paper presents a simulation study in which we take a closer look at the points raised by Ogden and Rosenberg. Instead of focusing specifically on POY, we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Since this optimization depends upon the specific edit distance criterion used to score a tree, our study considers the impact of the gap penalty (in particular, affine versus simple) on the accuracy of the resultant alignment and tree that optimize the treelength for that gap penalty function.
Our study suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs, but also suggests the intriguing possibility that optimizing under an affine gap penalty might produce alignments that are not only better than ClustalW alignments, but competitive with (or perhaps better than) those produced by the best current alignment methods. This study also shows that optimizing under this affine gap penalty produces trees whose topological accuracy is better than that of ClustalW + MP, and competitive with that of the current best two-phase methods.
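
To make the distinction between simple and affine gap penalties concrete, the following is a minimal illustrative sketch and not the scoring scheme used in the study. The function names, the penalty values (`per_gap`, `gap_open`, `gap_extend`), and the convention that an affine gap of length n costs `gap_open + gap_extend * n` are all assumptions chosen for illustration; the abstract does not specify them.

```python
def simple_gap_cost(length, per_gap=1.0):
    """Simple penalty: every gapped position costs the same (hypothetical value)."""
    return per_gap * length


def affine_gap_cost(length, gap_open=4.0, gap_extend=1.0):
    """Affine penalty: one opening cost per gap plus a cost per gapped position.

    The parameter values and the open + extend * length convention are
    assumptions for this sketch.
    """
    if length == 0:
        return 0.0
    return gap_open + gap_extend * length


def gap_run_lengths(row):
    """Return the lengths of maximal runs of '-' in one aligned sequence."""
    runs = []
    current = 0
    for ch in row:
        if ch == "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def total_gap_cost(row, cost_fn):
    """Sum the penalty over every gap run in the aligned sequence."""
    return sum(cost_fn(n) for n in gap_run_lengths(row))


one_long_gap = "AC---GT"     # a single gap of length 3
three_short_gaps = "A-C-G-T"  # three gaps of length 1

# Simple penalty: both rows cost the same.
print(total_gap_cost(one_long_gap, simple_gap_cost),
      total_gap_cost(three_short_gaps, simple_gap_cost))   # 3.0 3.0

# Affine penalty: one long gap is cheaper than three scattered ones.
print(total_gap_cost(one_long_gap, affine_gap_cost),
      total_gap_cost(three_short_gaps, affine_gap_cost))   # 7.0 15.0
```

Under the simple penalty the two rows are indistinguishable, whereas the affine penalty favors a single contiguous gap over several scattered ones. This difference in how gaps are scored is the property of the edit distance criterion whose effect on treelength optimization the study examines.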

Published in:

IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 6, Issue: 1)