Kinship inference identifies genealogical relationships among organisms, typically in naturally occurring populations, from genetic marker information. This task is crucial to the conservation of endangered species and to understanding the diversity of populations. Some of the simplest problems in this domain are sibgroup and half-sibgroup discovery. Natural objectives in this domain are statistical ones (such as maximum likelihood) and combinatorial ones (such as parsimony). Unfortunately, even with error-free data, the simplest combinatorial objective, minimizing the number of matings, is NP-hard to approximate; the statistical objectives are even more challenging. Here, a simple combinatorial approach to the problem is shown. By enumerating the triplets of population members that could be siblings and those that could not, putative sibgroups are constructed greedily and merged until no further merges are possible. The simple algorithm performs comparably to or better than integer programming methods for the problem, in a tiny fraction of the runtime. Moreover, with high probability, the algorithm finds the correct sibgroups under a straightforward and standard probabilistic model of inheritance and mating. Hence, the NP-hardness of the original problem is ameliorated in "typical" instances of the problem. This phenomenon is common to a large variety of bioinformatics problems, so a discussion of how to respond to this observation is presented.
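
The triplet-based greedy construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes diploid individuals with error-free genotypes, encodes each individual as a tuple of loci with each locus an unordered pair of allele labels, and tests sibling compatibility by brute-force search for a pair of Mendelian-consistent parent genotypes. The function names (`locus_compatible`, `could_be_siblings`, `greedy_sibgroups`) and the seeding and merging order are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations, combinations_with_replacement

def locus_compatible(genotypes):
    """True if one pair of diploid parents could produce every genotype at a locus."""
    alleles = sorted({a for g in genotypes for a in g})
    if len(alleles) > 4:  # two parents carry at most four distinct alleles
        return False
    # Candidate parents only need alleles that appear in the offspring.
    parents = list(combinations_with_replacement(alleles, 2))
    for p1 in parents:
        for p2 in parents:
            # Each child takes one allele from p1 and the other from p2.
            if all((x in p1 and y in p2) or (y in p1 and x in p2)
                   for x, y in genotypes):
                return True
    return False

def could_be_siblings(individuals):
    """True if the individuals are compatible as full siblings at every locus."""
    return all(locus_compatible(locus) for locus in zip(*individuals))

def greedy_sibgroups(population):
    """Assumed greedy scheme: seed with triplets, merge while all triplets agree."""
    n = len(population)
    # Enumerate every triplet and record the ones that could be siblings.
    ok = {t for t in combinations(range(n), 3)
          if could_be_siblings([population[i] for i in t])}

    def all_triplets_ok(members):
        return all(t in ok for t in combinations(sorted(members), 3))

    # Seed groups with disjoint compatible triplets; the rest start as singletons.
    groups, used = [], set()
    for t in sorted(ok):
        if used.isdisjoint(t):
            groups.append(set(t))
            used.update(t)
    groups += [{i} for i in range(n) if i not in used]

    # Merge two groups whenever every triplet in their union is compatible.
    merged = True
    while merged:
        merged = False
        for a, b in combinations(range(len(groups)), 2):
            union = groups[a] | groups[b]
            if len(union) >= 3 and all_triplets_ok(union):
                groups[a] = union
                del groups[b]
                merged = True
                break
    return sorted(sorted(g) for g in groups)
```

Because the sketch only merges unions of three or more individuals, two leftover singletons are never paired with each other; and because it checks triplets rather than whole groups, a merged group is guaranteed only triplet-wise consistency. Both are simplifications chosen to keep the example short.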