Data extraction from Web forums based on similarity of page layout

Authors: Yun Wang (Information Processing Dept, Information Technology Institute, Zhengzhou, China); Bicheng Li; Chen Lin

Web forums contain a wealth of information. Forum data can be used in areas such as Internet community mining, information retrieval, and public opinion analysis. This paper addresses two questions: what should be extracted from Web forums, and how to extract it. To overcome the limitations of current methods for extracting data from Web forums, an automated method is proposed to extract metadata from Web forum pages. The method proceeds in two steps. It first recognizes the topic-block by exploiting the special layout of Web forum pages, then extracts metadata from the topic-block by exploiting the statistical regularity of the metadata. The whole process requires no manual work. Experimental results show that the method performs well in both adjustability and accuracy.
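
The abstract does not give the algorithm itself, so the following is only a rough sketch of the two-step idea, not the authors' method. It assumes that the topic-block is the container whose children most often share an identical tag structure, and it substitutes a simple date regex and first-link lookup for the paper's statistical regularities. All names here (`Node`, `find_topic_block`, `extract_metadata`, the scoring rule) are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed date format, for illustration only


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []

    def signature(self):
        # Layout signature: tag names only, text content is ignored.
        return self.tag + "(" + ",".join(c.signature() for c in self.children) + ")"

    def full_text(self):
        parts = self.text + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    return builder.root


def iter_nodes(node):
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def find_topic_block(root):
    """Step 1 (assumed heuristic): pick the container with the most repeated child layout."""
    best_score, best_node, best_rows = 0, None, []
    for node in iter_nodes(root):
        if len(node.children) < 2:
            continue
        sig, count = Counter(c.signature() for c in node.children).most_common(1)[0]
        # Weight by signature length so rich repeated rows beat trivial repeats.
        score = count * len(sig) if count > 1 else 0
        if score > best_score:
            rows = [c for c in node.children if c.signature() == sig]
            best_score, best_node, best_rows = score, node, rows
    return best_node, best_rows


def extract_metadata(row):
    """Step 2 (stand-in): first link text as the title, first date-like string as the date."""
    title = next((" ".join(n.text) for n in iter_nodes(row) if n.tag == "a"), None)
    match = DATE_RE.search(row.full_text())
    return {"title": title, "date": match.group(0) if match else None}
```

Calling `find_topic_block(parse(html))` on a thread-list page returns the chosen container and its repeated rows, and `extract_metadata` can then be applied to each row. A real forum page would need a looser similarity measure than exact signature equality, since rows often differ slightly (for example, sticky-thread icons).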

Published in:

Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on

Date of Conference:

24-27 Sept. 2009