BINGO!: bookmark-induced gathering of information


Abstract:

Focused (thematic) crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It involves the automatic classification of visited documents into a user- or community-specific topic hierarchy (ontology). The quality of the classifier's training data is the most critical issue and a potential bottleneck for the effectiveness and scale of a focused crawler. This paper presents the BINGO! approach to focused crawling, which aims to overcome the limitations of the initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic "archetypes" and uses them to periodically re-train the classifier; in this way, the crawler is dynamically adapted based on the most significant documents seen so far. Two kinds of archetypes are considered: good authorities, as determined by Kleinberg's (1999) link analysis algorithm, and documents that have been automatically classified with high confidence by a linear SVM classifier. Our approach is fully implemented in the BINGO! system, and our experiments indicate that the dynamic enhancement of training data based on archetypes extends the "knowledge base" of the classifier by a substantial margin without loss of classification accuracy.
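
The archetype selection described in the abstract can be illustrated with a minimal sketch. It is not the BINGO! implementation: the function names, the fixed iteration count, the cutoff `k`, and the rule of taking the union of the two candidate sets are all assumptions made for illustration, and the classifier confidences are taken as given numbers rather than computed by an SVM. The authority scores follow the standard hub/authority power iteration from Kleinberg's algorithm.

```python
from math import sqrt


def _normalize(scores):
    # Scale scores to unit Euclidean length; guard against an all-zero vector.
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits_authorities(links, iterations=50):
    """Authority scores via hub/authority power iteration.

    `links` maps each page to the list of pages it points to.
    The iteration count is an illustrative assumption.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of all pages linked to.
        new_hub = {n: sum(new_auth[d] for d in links.get(n, ())) for n in nodes}
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth


def select_archetypes(auth, confidence, k=2):
    """Hypothetical selection rule: union of the top-k authorities and
    the top-k documents by classifier confidence."""
    by_auth = sorted(auth, key=auth.get, reverse=True)[:k]
    by_conf = sorted(confidence, key=confidence.get, reverse=True)[:k]
    return set(by_auth) | set(by_conf)
```

In a crawler loop, the selected documents would be added to the topic's training set before the classifier is re-trained; how many archetypes to admit per round, and under what thresholds, is left open here.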
Date of Conference: 14 December 2002
Date Added to IEEE Xplore: 25 February 2003
Print ISBN: 0-7695-1766-8
Conference Location: Singapore
