Most knowledge-intensive organizations keep their information in large text document repositories, and most of these repositories and databases are either unstructured or semi-structured. Recently, various soft computing techniques have been used to improve information retrieval efficiency. In particular, genetic algorithms have been applied to several information retrieval components, such as matching function learning, document clustering, information extraction, and query optimization [1-6]. In most cases, the matching function in information retrieval is based on term frequency. The problem with this approach is that the syntactic information of the text document is lost and phrases are not considered, which results in poor accuracy. In this paper we propose a new semantic-based similarity measure in which each term can be a phrase or a single word, and the weight assigned to each term is based on its semantic importance within each sentence. We use this semantic similarity measure together with other standard similarity measures, such as Jaccard and cosine, to form the semantic-based combined similarity measure. A standard genetic algorithm is used to optimize the weight given to each similarity measure.
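The combination described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the formula for the semantic measure, so the semantic score is passed in as a placeholder number, and the combined measure is assumed to be a normalized weighted sum. The function names (`jaccard`, `cosine`, `combined`, `evolve_weights`), the GA operators (truncation selection, one-point crossover, Gaussian mutation), the squared-error fitness, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two term lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two term lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined(scores, weights):
    """Assumed form of the combined measure: weighted sum with normalized weights."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, scores)) / total


def evolve_weights(fitness, n_weights, pop_size=20, generations=50,
                   mutation_rate=0.1, seed=0):
    """Small genetic algorithm over weight vectors in [0, 1] (needs n_weights >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)    # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_weights):           # Gaussian mutation, clamped to [0, 1]
                if rng.random() < mutation_rate:
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy usage with made-up relevance labels; 0.8 is a placeholder semantic score.
doc_a = "genetic algorithm for information retrieval".split()
doc_b = "information retrieval with genetic algorithm".split()
scores = [jaccard(doc_a, doc_b), cosine(doc_a, doc_b), 0.8]
training = [(scores, 1.0), ([0.1, 0.2, 0.1], 0.0)]


def fitness(weights):
    """Hypothetical fitness: negative squared error against the relevance labels."""
    return -sum((combined(s, weights) - label) ** 2 for s, label in training)


best = evolve_weights(fitness, n_weights=3)
print(best, combined(scores, best))
```

In a real retrieval setting, the fitness function would more plausibly be a ranking metric computed over a labeled query set, and the placeholder score would be replaced by the phrase-aware semantic measure the paper proposes.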