Study of Web Page Information topic extraction technology based on vision

Authors: Qingshui Li (Comput. Sci. & Technol. Coll., Zhejiang Univ. of Technol., Hangzhou, China); Kai Wu

Using the visual information of a Web page for information extraction avoids the need for sophisticated natural language processing. This paper studies information extraction for Web pages by combining natural language processing with the visual features of HTML pages. We propose a Web page information extraction algorithm based on visual features. Applying visual-feature rules of the page, the algorithm addresses two problems: coarse-grained page segmentation and the restructuring of the smallest page segments. It then analyzes the visual features of each page block to determine the topic data region accurately. Extracting the topic information in this way reduces the number of content blocks that must be indexed, which lowers the cost of index generation and increases the hit rate of the search engine.
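
The abstract does not spell out the visual rules it uses, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each page block has already been rendered to a bounding box with text counts. The `Block` type, the merge rule (same column, small vertical gap), and the scoring heuristic (area, horizontal centrality, low link ratio) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered page block (hypothetical structure, not from the paper)."""
    x: float            # left edge in pixels
    y: float            # top edge in pixels
    width: float
    height: float
    text_len: int       # number of text characters in the block
    link_text_len: int  # number of those characters inside links


def merge_vertically_adjacent(blocks, max_gap=10.0):
    """Restructure small segments: merge blocks stacked in the same column.

    Assumed rule: two blocks are merged when they share a left edge and
    width and the vertical gap between them is at most `max_gap` pixels.
    """
    merged = []
    for b in sorted(blocks, key=lambda blk: (blk.x, blk.y)):
        if merged:
            prev = merged[-1]
            same_column = prev.x == b.x and prev.width == b.width
            gap = b.y - (prev.y + prev.height)
            if same_column and 0 <= gap <= max_gap:
                merged[-1] = Block(
                    prev.x, prev.y, prev.width,
                    b.y + b.height - prev.y,
                    prev.text_len + b.text_len,
                    prev.link_text_len + b.link_text_len,
                )
                continue
        merged.append(b)
    return merged


def topic_score(block, page_width, page_height):
    """Assumed heuristic: large, centered, link-poor blocks score higher."""
    area = (block.width * block.height) / (page_width * page_height)
    center_x = block.x + block.width / 2
    centrality = 1.0 - abs(center_x - page_width / 2) / (page_width / 2)
    if block.text_len:
        link_ratio = block.link_text_len / block.text_len
    else:
        link_ratio = 1.0  # a block with no text cannot be the topic region
    return area * max(centrality, 0.0) * (1.0 - link_ratio)


def find_topic_block(blocks, page_width, page_height):
    """Merge small segments, then return the highest-scoring block."""
    candidates = merge_vertically_adjacent(blocks)
    return max(candidates, key=lambda b: topic_score(b, page_width, page_height))
```

Given a navigation bar, a link-heavy sidebar, and two stacked content blocks, `find_topic_block` would merge the two content blocks and select the merged region, so only that region needs to be passed on to indexing.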

Published in:

2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), Volume 9

Date of Conference:

9-11 July 2010