We propose a new measure, SimPLD, for calculating the semantic similarity of terms in the Gene Ontology (GO). It is based on the depth of the least common ancestor (LCA) of two terms and the length of the path between them in the GO hierarchy. The similarity between two genes is then computed by applying this measure to the GO terms annotated to those genes: it is the average of the SimPLD values between the GO terms annotated to each gene in the pair. We evaluated the proposed method in a series of experiments on large groups of genes and proteins from two genome databases, the Saccharomyces Genome Database (SGD) and FlyBase (Drosophila melanogaster), and on one dataset of human-yeast protein pairs. The experimental results show that the method agrees well with BLAST sequence similarity, which suggests that SimPLD can be used as an automated tool for determining the similarity between genes and proteins.
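The two-level scheme described above (a term-level score built from LCA depth and path length, then an average over the terms annotated to each gene) can be illustrated with a minimal Python sketch. The abstract does not state the SimPLD formula, so `placeholder_term_similarity` below is a hypothetical stand-in and not the authors' measure. The toy hierarchy, all function names, and the choice to average over every cross-gene term pair are likewise assumptions made for illustration.

```python
from collections import deque
from itertools import product

# Hypothetical toy hierarchy (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}


def depth(term):
    """Shortest distance from the root to `term` (the root has depth 0)."""
    if not PARENTS[term]:
        return 0
    return 1 + min(depth(parent) for parent in PARENTS[term])


def up_distances(term):
    """Map `term` and each of its ancestors to the minimal upward distance."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def lca_and_path_length(t1, t2):
    """Return the deepest common ancestor and the shortest path length
    between t1 and t2 that passes through a common ancestor."""
    d1, d2 = up_distances(t1), up_distances(t2)
    common = d1.keys() & d2.keys()
    # Ties in depth are broken by name so the result is deterministic.
    lca = max(common, key=lambda a: (depth(a), a))
    path_len = min(d1[a] + d2[a] for a in common)
    return lca, path_len


def placeholder_term_similarity(t1, t2):
    """Hypothetical score, NOT the SimPLD formula: it grows with LCA depth
    and shrinks with path length, which is all the abstract specifies."""
    lca, path_len = lca_and_path_length(t1, t2)
    d = depth(lca)
    return d / (d + path_len) if d + path_len > 0 else 0.0


def gene_similarity(terms1, terms2, term_sim=placeholder_term_similarity):
    """Average term similarity over all cross-gene term pairs (assumed scheme)."""
    pairs = list(product(terms1, terms2))
    return sum(term_sim(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Two hypothetical genes with their annotated terms.
    print(gene_similarity(["A1"], ["A1", "A2"]))
    print(gene_similarity(["A1"], ["B1"]))
```

Passing the term-level score as a parameter keeps the averaging step independent of the exact formula, so the published SimPLD definition could be substituted for the placeholder without changing `gene_similarity`.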
Published in: Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007)
Date of Conference: 14-17 Oct. 2007