Phylogenetic models of rate heterogeneity: a high performance computing perspective


Abstract:

Inference of phylogenetic trees using the maximum likelihood (ML) method is NP-hard. Furthermore, computing the likelihood function for huge trees of more than 1,000 organisms is computationally intensive because of the large number of floating-point operations and the high memory consumption. Within this context, the present paper compares two competing mathematical models that account for evolutionary rate heterogeneity: the Gamma and CAT models. The intention of this paper is to show that, from a purely empirical point of view, CAT can be used instead of Gamma. The main advantages of CAT over Gamma are significantly lower memory consumption and faster inference times. An experimental study using RAxML has been performed on 19 real-world datasets comprising between 73 and 1,663 DNA sequences. Results show that CAT is on average 5.5 times faster than Gamma and, surprisingly, also yields trees with slightly superior Gamma likelihood values. Using the CAT model reduces the average number of L2 and L3 cache misses by a factor of 8.55.
Date of Conference: 25-29 April 2006
Date Added to IEEE Xplore: 26 June 2006
Print ISBN: 1-4244-0054-6
Print ISSN: 1530-2075
Conference Location: Rhodes, Greece

1. Introduction

Phylogenetic trees are used to represent the evolutionary history of a set of organisms (also called taxa). A multiple alignment of a small region of their DNA or protein sequences can be used as input for the computation of phylogenies. In a computational context, phylogenetic trees are usually strictly bifurcating unrooted trees. The organisms of the alignment are located at the tips, and the inner nodes represent extinct common ancestors. The branches of the tree represent the time required for one species to mutate into another, new one. The inference of phylogenies with computational methods has many important applications in medical and biological research, such as drug discovery and conservation biology (see [1] for a summary). Due to the rapid growth of available sequence data and the constant improvement of multiple alignment methods, it has now become feasible to compute large trees that comprise more than 1,000 organisms. The computation of the tree of life, containing representatives of all living beings on earth, is one of the grand challenges in Bioinformatics.
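
To give a feeling for the size of the problem, the following minimal sketch computes the number of inner nodes and branches of a strictly bifurcating unrooted tree with n tips, as well as the number of distinct tree topologies, (2n-5)!!, which is a standard combinatorial result. The function names are hypothetical and chosen for illustration only; they are not taken from RAxML or from the paper.

```python
def unrooted_tree_counts(n_taxa: int) -> dict:
    """Node and branch counts of a strictly bifurcating unrooted tree.

    Hypothetical helper for illustration; not part of RAxML.
    """
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    return {
        "tips": n_taxa,               # organisms of the alignment
        "inner_nodes": n_taxa - 2,    # extinct common ancestors
        "branches": 2 * n_taxa - 3,   # edges connecting all nodes
    }


def num_unrooted_topologies(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating topologies: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (the leading factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    print(unrooted_tree_counts(1000))
    print(num_unrooted_topologies(10))  # 2027025
```

Already for 10 taxa there are more than two million candidate topologies, and the number grows super-exponentially with n, which illustrates why exhaustive search is out of the question for trees with more than 1,000 organisms.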
