We describe a novel application of computational phylogenetic approaches to predict functional linkage among proteins, using proteomes derived from whole genome sequence data. The methods detect independent instances of the correlated gain or loss of pairs of genes on branches of a phylogenetic tree, on the assumption that functionally linked genes are often gained and lost at approximately the same time during evolution. According to this view, several correlated gain and/or loss events between a pair of genes suggest that the gene products are functionally linked. We implement this approach using Dollo parsimony and maximum likelihood (ML) to seek correlated evolution among 21 eukaryotic species. We compare these approaches to each other and to the existing method of phylogenetic profiles, which seeks an across-species correlation but does not explicitly incorporate a phylogenetic tree. We assess all methods against a positive test set of functionally linked protein pairs based on the MIPS catalogue of yeast protein complexes, and a negative test set of random protein pairs. Both Dollo parsimony and ML achieve far greater specificity than the existing method of phylogenetic profiles. We show that ML is by far the best approach, provided that an appropriate model is used. The best results are obtained if the rate of gain of genes is fixed at a low value, to prevent modeling of multiple gains. With such a model, proteins with strong ML evidence of correlated evolution among eukaryotes are almost certainly functionally linked.
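To make the idea of correlated gain and loss on branches concrete, the following is a minimal sketch of the Dollo parsimony variant, not the implementation used in this work. It assumes a toy seven-species tree written as nested tuples, and the names `clade`, `dollo_events`, `correlated_events` and `TREE` are hypothetical. Under the Dollo assumption a gene is gained once, at the most recent common ancestor of all species carrying it, and every other absence within that clade is explained by loss. The sketch simply tallies branches on which two genes undergo the same event; it does not reproduce the ML model or any significance assessment.

```python
def clade(node):
    """Return the frozenset of leaf (species) names under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(clade(child) for child in node))


def dollo_events(tree, present):
    """Map each branch (keyed by the clade below it) to 'gain' or 'loss'.

    Dollo assumption: a single gain at the smallest clade containing every
    carrier species; absences inside that clade are losses. The root has no
    incoming branch, so a gene already present at the root shows no gain.
    """
    present = frozenset(present)
    events = {}

    def visit(node, parent_has, inside):
        leaves = clade(node)
        if not inside:
            children = () if isinstance(node, str) else node
            # This node is the origin if it holds all carriers and no
            # single child does.
            inside = bool(present) and present <= leaves and not any(
                present <= clade(child) for child in children
            )
        # Inside the origin clade, a node has the gene if any descendant does.
        has = inside and bool(leaves & present)
        if parent_has is not None and has != parent_has:
            events[leaves] = "gain" if has else "loss"
        if not isinstance(node, str):
            for child in node:
                visit(child, has, inside)

    visit(tree, None, False)
    return events


def correlated_events(tree, gene_a, gene_b):
    """Count branches on which both genes show the same event type."""
    events_a = dollo_events(tree, gene_a)
    events_b = dollo_events(tree, gene_b)
    return sum(1 for branch, kind in events_a.items()
               if events_b.get(branch) == kind)


# Hypothetical toy tree of seven species (assumption, not the 21-species tree).
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), "G"))

# Two genes that share losses on the (C, D) branch and on the G branch.
shared = correlated_events(TREE, {"A", "B", "E"}, {"A", "B", "E", "F"})
print(shared)
```

In this toy case the two genes share two loss events, whereas a plain phylogenetic profile comparison would see only that their presence patterns differ in one species. Counting events on branches rather than species is what lets the tree-based approach treat a loss in an entire clade as one event instead of several.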