Search engines use content and links to search, rank, cluster, and classify Web pages. These information discovery applications use similarity measures derived from this data to estimate relatedness between pages. However, little research exists on the relationships among these similarity measures or between such measures and semantic similarity. The author analyzes and visualizes similarity relationships in massive Web data sets to identify how to integrate content and link analysis for approximating relevance. He uses human-generated metadata from Web directories to estimate semantic similarity, and semantic maps to visualize the relationships between content and link cues and what these cues suggest about page meaning. Highly heterogeneous topical maps point to a critical dependence on search context.