Detecting the content related parts of Web pages

3 Author(s)
Yong Li (Fac. of Sci. & Technol., Macau Univ., Macau); Zhiguo Gong; Ke Qi

Many Web pages are semantically diverse: the content of a single page does not consistently address one topic. Current search engines, however, are page-oriented rather than topic-oriented, while most Web users retrieve their target information by topic. How to partition Web pages by semantics is therefore an interesting research topic. In this paper, we first build a tree (called a semantic tree, ST) from the Web page tags to partition the page into content parts (called semantic parts, SPs). We then analyze the characteristics of the words (or terms) appearing on the page in order to build a term-weighting formula. Based on these term weights, we apply a similarity formula to calculate the degree of semantic similarity between each pair of SPs. Finally, we take the balance point of precision and recall as the reference value for the similarity threshold. With this procedure we can find the content-related parts (or segments) of a Web page, and we achieved satisfactory results.
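
The abstract does not state the term-weighting or similarity formulas, so the following is only a minimal sketch of the general idea of comparing semantic parts pairwise against a threshold. It assumes length-normalised term frequency as the weight and cosine similarity as the measure; both are stand-ins, not the authors' formulas. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def term_weights(text):
    """Assumed weighting: raw term frequency scaled to unit length.

    The paper builds its own term-weighting formula; this is a placeholder.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def cosine_similarity(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def related_pairs(parts, threshold):
    """Return index pairs (i, j), i < j, of parts whose similarity
    meets the threshold. `parts` is a list of plain-text strings,
    one per semantic part (SP)."""
    vectors = [term_weights(p) for p in parts]
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical SP texts, as if already extracted from the tag tree.
example_parts = [
    "macau hotel prices",
    "hotel prices macau deals",
    "football scores today",
]
example_pairs = related_pairs(example_parts, threshold=0.5)
```

In the paper the threshold is chosen at the balance point of precision and recall; the value 0.5 here is an arbitrary placeholder for illustration.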

Published in:

Proceedings of ICSSSM '05: 2005 International Conference on Services Systems and Services Management (Volume 2)

Date of Conference:

13-15 June 2005