Cluster analysis is widely used in genetic research, especially in phylogenetic analysis. However, inferring an evolutionary dendrogram from large biological data sets is time-consuming. In this paper, single nucleotide polymorphisms (SNPs), which characterize genetic variation, are mined from the genetic sequences to reduce the dimensionality of the original data in phylogenetic analysis. The mining algorithm reduces the cost of the analysis and eliminates noise. Commonly used measures of genetic divergence between subpopulations, such as the Euclidean distance, often lose important genetic variation information during clustering. Therefore, relative information entropy is used to evaluate the genetic diversity of the subpopulations of a given species. A new genetic distance is defined to measure subpopulation divergence by combining the genetic diversity evaluation value with the sequence structure similarity among subpopulations. A hierarchical clustering algorithm uses the new genetic distance to infer the dendrogram of the given species. Experimental results on human data show that our method can accurately evaluate the genetic divergence among subgroups of a given species and produce a reasonable evolutionary dendrogram in less time.
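The pipeline outlined in the abstract (mine SNP sites, score subpopulation divergence with relative entropy, then cluster hierarchically) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formula for the new genetic distance, so the sketch substitutes a symmetrized relative entropy averaged over SNP sites, and it uses plain average-linkage clustering. All function names, the pseudocount smoothing, and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log

ALPHABET = "ACGT"

def find_snp_sites(sequences):
    """Return column indices where aligned sequences show more than one allele."""
    length = len(sequences[0])
    return [i for i in range(length) if len({s[i] for s in sequences}) > 1]

def allele_frequencies(sequences, site, pseudocount=1.0):
    """Smoothed allele frequencies at one site (the pseudocount avoids log(0))."""
    counts = Counter(s[site] for s in sequences)
    total = len(sequences) + pseudocount * len(ALPHABET)
    return [(counts.get(a, 0) + pseudocount) / total for a in ALPHABET]

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

def subpopulation_distance(pop_a, pop_b, sites):
    """Assumed stand-in distance: symmetrized relative entropy averaged over SNP sites."""
    if not sites:
        return 0.0
    total = 0.0
    for site in sites:
        p = allele_frequencies(pop_a, site)
        q = allele_frequencies(pop_b, site)
        total += relative_entropy(p, q) + relative_entropy(q, p)
    return total / len(sites)

def average_linkage(names, leaf_dist):
    """Agglomerative clustering; returns the dendrogram as nested tuples."""
    clusters = {name: [name] for name in names}  # tree label -> leaf members
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                # Average pairwise leaf distance between the two clusters.
                d = sum(leaf_dist(x, y) for x in clusters[a] for y in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))

# Toy aligned sequences (invented for illustration, not the paper's human data).
populations = {
    "pop1": ["ACGTAC", "ACGTAC", "ACGTAT"],
    "pop2": ["ACGTAC", "ACGTAT", "ACGTAC"],
    "pop3": ["TCGAAC", "TCGAAC", "TCGAAT"],
}
all_sequences = [s for seqs in populations.values() for s in seqs]
sites = find_snp_sites(all_sequences)  # only polymorphic columns are kept

tree = average_linkage(
    list(populations),
    lambda a, b: subpopulation_distance(populations[a], populations[b], sites),
)
print(sites, tree)
```

Restricting the distance to polymorphic columns is what shrinks the input: invariant columns contribute nothing to the divergence, so dropping them lowers the cost of every pairwise comparison without changing the result of this stand-in distance.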
2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Volume 5
Date of Conference: 10-12 Aug. 2010