Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics
Song, Y.S.
Lyngso, R.
Hein, J.
Dept. of Comput. Sci., California Univ., Davis, CA
This paper appears in: Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publication Date: July-Sept. 2006
Volume: 3
Issue: 3
On page(s): 239-251
ISSN: 1545-5963
INSPEC Accession Number: 9067095
Digital Object Identifier: 10.1109/TCBB.2006.31
Current Version Published: 2006-08-07
Abstract
Given a set D of input sequences, a genealogy for D can be constructed backward in time using evolutionary events such as mutation, coalescence, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D, so it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Last, we describe a method of counting the number of ACs that can appear in genealogies with at most a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D.
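The abstract mentions counting contingency tables by dynamic programming. As a rough illustration of that general idea only, the Python sketch below counts nonnegative integer tables with prescribed row and column sums by filling one column at a time and memoizing on the remaining row sums. This is a generic formulation, not the algorithm of the paper, and it does not model ancestral configurations; the function names `count_contingency_tables` and `compositions` are hypothetical.

```python
from functools import lru_cache


def compositions(total, caps):
    """Yield tuples x with sum(x) == total and 0 <= x[i] <= caps[i]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def count_contingency_tables(row_sums, col_sums):
    """Count nonnegative integer matrices with the given margins.

    Illustrative only: a plain memoized recursion over columns,
    assumed here as a stand-in for a contingency-table DP.
    """
    if sum(row_sums) != sum(col_sums):
        return 0  # margins are inconsistent, so no table exists
    cols = tuple(col_sums)

    @lru_cache(maxsize=None)
    def fill(col_index, remaining):
        # remaining[i] is the part of row i's sum not yet assigned
        if col_index == len(cols):
            return 1 if all(r == 0 for r in remaining) else 0
        total = 0
        # Try every way to split this column's sum across the rows
        for column in compositions(cols[col_index], remaining):
            new_remaining = tuple(r - x for r, x in zip(remaining, column))
            total += fill(col_index + 1, new_remaining)
        return total

    return fill(0, tuple(row_sums))


if __name__ == "__main__":
    print(count_contingency_tables((2, 2), (2, 2)))  # 3
```

For margins (2, 2) and (2, 2) the tables are determined by the top-left entry, which can be 0, 1, or 2, giving three tables. The state space grows quickly with the margins, so a sketch like this is practical only for small inputs.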