Web forums contain a wealth of information. Forum data can be used in areas such as Internet community mining, information retrieval, and public opinion analysis. This paper addresses two questions: what should be extracted from Web forums, and how to extract it. To overcome the limitations of current methods for extracting data from Web forums, an automated method is proposed for extracting metadata from Web forum pages. The method proceeds in two steps. It first recognizes the topic-block by exploiting the special layout of Web forum pages, then extracts metadata from the topic-block by exploiting the statistical regularity of the metadata. The whole process requires no manual work. Experimental results show that the method performs well in both adjustability and accuracy.
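The two-step idea can be illustrated with a minimal sketch. The abstract does not say which layout features or statistics the method uses, so everything below is an assumption made for illustration, not the authors' algorithm: the topic-block is taken to be the element whose direct children repeat the same tag/class signature most heavily, and the "statistical regularity" is approximated by a majority vote on the value type of each column across the repeated records. The names `Node`, `TreeBuilder`, `find_topic_block`, `extract_metadata`, and the sample page are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")   # assumed date format
INT_RE = re.compile(r"^\d+$")


class Node:
    """One element of a very small DOM tree."""
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.cls = dict(attrs).get("class") or ""
        self.parent = parent
        self.children = []
        self.text = []

    def signature(self):
        return (self.tag, self.cls)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.text.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def size(node):
    """Number of descendants of a node."""
    return sum(1 + size(c) for c in node.children)


def find_topic_block(root):
    """Step 1 (assumed heuristic): pick the element with the heaviest repeated children."""
    best, best_score = None, 0
    for node in walk(root):
        if not node.children:
            continue
        sig, count = Counter(c.signature() for c in node.children).most_common(1)[0]
        if count < 2:
            continue
        score = sum(1 + size(c) for c in node.children if c.signature() == sig)
        if score > best_score:
            best, best_score = node, score
    return best


def leaf_texts(node):
    out = list(node.text)
    for child in node.children:
        out.extend(leaf_texts(child))
    return out


def classify(value):
    if DATE_RE.match(value):
        return "date"
    if INT_RE.match(value):
        return "number"
    return "text"


def extract_metadata(block):
    """Step 2 (assumed heuristic): type each column by majority vote over all records."""
    sig, _ = Counter(c.signature() for c in block.children).most_common(1)[0]
    rows = [leaf_texts(c) for c in block.children if c.signature() == sig]
    width = min(len(r) for r in rows)
    types = [Counter(classify(r[i]) for r in rows).most_common(1)[0][0]
             for i in range(width)]
    return types, [r[:width] for r in rows]


# Hypothetical forum index page used only to exercise the sketch.
SAMPLE_PAGE = """
<html><body>
<ul class="nav"><li>Home</li><li>Help</li><li>Login</li><li>Register</li></ul>
<div class="topic-list">
  <div class="topic"><a class="title" href="/t/1">Install problem</a><span class="author">alice</span><span class="date">2009-09-01</span><span class="replies">12</span></div>
  <div class="topic"><a class="title" href="/t/2">404</a><span class="author">bob</span><span class="date">2009-09-03</span><span class="replies">3</span></div>
  <div class="topic"><a class="title" href="/t/3">Theme request</a><span class="author">carol</span><span class="date">2009-09-05</span><span class="replies">27</span></div>
</div>
</body></html>
"""

block = find_topic_block(parse(SAMPLE_PAGE))
types, rows = extract_metadata(block)
print(types)  # ['text', 'text', 'date', 'number']
```

In this sample the navigation list has more repeated children than the topic list, but each topic record is structurally richer, so the topic list scores higher. The second topic title, "404", looks like a number on its own; the column-wise vote still types the first column as text, which is the kind of regularity across records that a per-record rule would miss.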
Date of Conference: 24-27 Sept. 2009