Measuring semantic similarity between words plays a vital role in information retrieval and natural language processing. The existing system measures semantic similarity between words using page counts and snippets retrieved from a search engine. Several similarity scores are calculated from the page counts that the search engine returns for conjunctive queries of the word pair. A lexical pattern extraction algorithm identifies patterns in the snippets, and a lexical pattern clustering algorithm groups different patterns that express the same semantic relation. The existing system uses Support Vector Machines (SVM) to combine the similarity scores from page counts with the clusters of patterns from snippets into a single similarity measure. We propose a different machine learning approach, the Latent Structural Support Vector Machine (LS-SVM), which can handle missing data values, a frequent occurrence in statistical data analysis. We also present a comparative study of the similarity results obtained with SVM and LS-SVM.
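As an illustration of the page-count step, the following minimal sketch computes four co-occurrence scores from hit counts. The abstract does not name the specific scores used, so the choice of Jaccard, Dice, overlap, and pointwise mutual information is an assumption, as are the function names and the `index_size` parameter (an assumed estimate of the number of pages indexed by the search engine). The counts are passed in as plain integers; no search engine API is called.

```python
import math


def web_jaccard(count_p, count_q, count_pq):
    """Jaccard coefficient from page counts of P, Q, and the query 'P AND Q'."""
    union = count_p + count_q - count_pq
    if count_pq <= 0 or union <= 0:
        return 0.0
    return count_pq / union


def web_dice(count_p, count_q, count_pq):
    """Dice coefficient from page counts."""
    total = count_p + count_q
    if count_pq <= 0 or total <= 0:
        return 0.0
    return 2 * count_pq / total


def web_overlap(count_p, count_q, count_pq):
    """Overlap (Simpson) coefficient from page counts."""
    smaller = min(count_p, count_q)
    if count_pq <= 0 or smaller <= 0:
        return 0.0
    return count_pq / smaller


def web_pmi(count_p, count_q, count_pq, index_size):
    """Pointwise mutual information; index_size is an assumed index estimate."""
    if min(count_p, count_q, count_pq, index_size) <= 0:
        return 0.0
    joint = count_pq / index_size
    independent = (count_p / index_size) * (count_q / index_size)
    return math.log2(joint / independent)


def page_count_features(count_p, count_q, count_pq, index_size):
    """Collect the four scores into one feature vector for a classifier."""
    return [
        web_jaccard(count_p, count_q, count_pq),
        web_dice(count_p, count_q, count_pq),
        web_overlap(count_p, count_q, count_pq),
        web_pmi(count_p, count_q, count_pq, index_size),
    ]
```

A feature vector of this kind, together with features derived from the pattern clusters, is the sort of input that a classifier such as an SVM could combine into a single similarity score.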
Published in:
Computer Communication and Informatics (ICCCI), 2012 International Conference on
Date of Conference: 10-12 Jan. 2012