Unsupervised Semantic Similarity Computation using Web Search Engines
Iosif, Elias
Potamianos, Alexandros
This paper appears in: Web Intelligence, IEEE/WIC/ACM International Conference on
Publication Date: 2-5 Nov. 2007
On page(s): 381-387
Location: Fremont, CA
ISBN: 978-0-7695-3026-0
Digital Object Identifier: 10.1109/WI.2007.34
Current Version Published: 2008-01-07
Abstract
In this paper, we propose two novel web-based metrics for computing semantic similarity between words. Both metrics use a web search engine to retrieve information about the words of interest. The first metric considers only the page counts returned by the search engine, based on the work of [1]. The second downloads a number of the top-ranked documents and applies "wide-context" and "narrow-context" metrics. The proposed metrics work automatically, without consulting any human-annotated knowledge resource. The metrics are compared with WordNet-based methods. Their performance is evaluated in terms of correlation with the human similarity ratings for the word pairs of the commonly used Charles-Miller dataset. The proposed "wide-context" metric achieves 71% correlation, the highest score reported to date among the fully unsupervised metrics in the literature.
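As a rough illustration of the two ideas in the abstract (a similarity score derived from page counts, and evaluation by correlation with human ratings), the following is a minimal sketch. It is not the authors' metric: the Jaccard coefficient is swapped in here as one common page-count measure, the function names are hypothetical, and no particular search engine API is assumed. The page counts are expected to be supplied by the caller.

```python
import math


def jaccard_from_counts(hits_a, hits_b, hits_ab):
    """Jaccard coefficient from page counts (an assumed stand-in metric).

    hits_a, hits_b: page counts for each word queried alone.
    hits_ab: page count for the conjunctive query containing both words.
    """
    union = hits_a + hits_b - hits_ab
    if hits_ab <= 0 or union <= 0:
        return 0.0  # no co-occurrence evidence, or inconsistent counts
    return hits_ab / union


def pearson(xs, ys):
    """Pearson correlation between metric scores and human ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

In use, one score would be computed per word pair in the evaluation set, and `pearson` would then compare the list of scores against the corresponding list of human ratings.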