As an alternative to keyword search, many search engines provide directory servers containing categorized Web documents that users can navigate and browse. We investigate three issues in portal site construction, given a large collection of categorized Web documents: (1) distillation of important topics for each category of documents; (2) distillation of important documents and sites for these topics; and (3) automation of these two tasks. We have developed an automated technique for topic and Web site distillation. The technique integrates Web document content analysis with link structure analysis. It considers the local importance of keywords and their global distribution statistics on a given Web document category hierarchy.
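The abstract does not give the scoring formula, so the following is only an illustrative sketch of how local keyword importance might be combined with global distribution statistics across categories. It assumes a TF-IDF-style weighting over a flat set of categories, which may differ from the authors' actual method, and it leaves out the link structure analysis entirely. The names `topic_scores` and `top_topics` and the input layout are hypothetical.

```python
import math
from collections import Counter


def topic_scores(docs_by_category):
    """Score each term per category (hypothetical helper, not from the paper).

    docs_by_category maps a category name to a list of document strings.
    """
    # Local importance: how often each term occurs within a category.
    term_counts = {
        cat: Counter(word for doc in docs for word in doc.lower().split())
        for cat, docs in docs_by_category.items()
    }

    # Global distribution: in how many categories each term appears.
    category_freq = Counter()
    for counts in term_counts.values():
        category_freq.update(counts.keys())

    n_categories = len(term_counts)
    scores = {}
    for cat, counts in term_counts.items():
        total = sum(counts.values()) or 1
        # Assumed weighting: relative frequency times inverse category frequency.
        scores[cat] = {
            term: (count / total) * math.log(n_categories / category_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_topics(docs_by_category, k=3):
    """Return the k highest-scoring terms per category, ties broken alphabetically."""
    scores = topic_scores(docs_by_category)
    return {
        cat: sorted(s, key=lambda term: (-s[term], term))[:k]
        for cat, s in scores.items()
    }
```

Under this assumed weighting, a term that appears in every category scores zero, so only terms that are both frequent locally and concentrated in few categories surface as candidate topics.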
Published in:
Knowledge and Data Engineering Exchange, 1999. (KDEX '99) Proceedings. 1999 Workshop on
Date of Conference: 1999