WISE: Hierarchical Soft Clustering of Web Page Search Results Based on Web Content Mining Techniques

3 Author(s)
Campos, R. (Centre of Human Language Technol. & Bioinformatics, Univ. of Beira Interior); Dias, Gael; Nunes, C.

Typically, search engines show low precision in response to a query, retrieving many useless Web pages and missing other important ones. In this paper, we study the problem of hierarchically clustering Web page search results. In particular, we propose an architecture called WISE, a meta-search engine that automatically builds clusters of related Web pages, each embodying one meaning of the query. These clusters are then hierarchically organized and labeled with a phrase representing the key concept of the cluster and the corresponding Web documents. The system, which provides a Web-based interface (soon available at wise.di.ubi.pt), introduces some interesting new ideas, such as the preselection of the retrieved Web pages, the capacity to statistically detect phrases within documents, and the representation of documents based on their most relevant key concepts by using Web content mining techniques. The final step of the system is supported by a graph-based overlapping clustering algorithm, which groups the selected documents into a hierarchy of clusters.
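
The idea of graph-based overlapping clustering over documents represented by key concepts can be illustrated with a minimal sketch. The abstract does not name the algorithm WISE uses, so the code below substitutes a simple greedy star-style cover of a thresholded similarity graph; it is not the authors' method. The document names, concept sets, the Jaccard measure, the 0.5 threshold, the labeling rule, and all function names are assumptions made for illustration. The sketch produces only a flat set of overlapping clusters and does not build a hierarchy.

```python
from collections import Counter
from itertools import combinations

# Hypothetical search results for an ambiguous query, each reduced to a
# set of key concepts (assumed data, not taken from the paper).
DOCS = {
    "d1": {"car", "engine", "dealer"},
    "d2": {"car", "engine"},
    "d3": {"car", "dealer"},
    "d4": {"cat", "jungle", "prey"},
    "d5": {"cat", "jungle"},
    "d6": {"cat", "prey"},
    "d7": {"car", "engine", "dealer", "cat", "jungle", "prey"},
}


def jaccard(a, b):
    """Jaccard similarity of two concept sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(docs, threshold):
    """Link two documents when their concept sets are similar enough."""
    graph = {name: set() for name in docs}
    for u, v in combinations(docs, 2):
        if jaccard(docs[u], docs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def star_clusters(graph):
    """Greedy star cover: each cluster is a center plus all its neighbors.

    Centers are picked among still-uncovered nodes by descending degree
    (ties broken by name). A node adjacent to several centers ends up in
    several clusters, which is what makes the clustering overlapping.
    """
    uncovered = set(graph)
    clusters = []
    for center in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        if center not in uncovered:
            continue
        members = {center} | graph[center]
        clusters.append((center, members))
        uncovered -= members
    return clusters


def label(members, docs):
    """Label a cluster with the concept found in most of its documents."""
    counts = Counter(c for name in members for c in docs[name])
    return counts.most_common(1)[0][0]


clusters = star_clusters(build_graph(DOCS, threshold=0.5))
labels = [label(members, DOCS) for _, members in clusters]

for (center, members), name in zip(clusters, labels):
    print(name, center, sorted(members))
```

In this toy data, document `d7` covers both senses of the query, so it is placed in both clusters rather than being forced into one. A hard partitioning algorithm would have to assign it to a single group.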

Published in:

Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on

Date of Conference:

18-22 Dec. 2006