Design and implementation of a web news extraction system

2 Author(s)
Hua-lin Xia ; Inst. of Intell. Inf. Process., Beijing Inf. Sci. & Technol. Univ., Beijing, China ; Yang-sen Zhang

With the widespread use of the Internet and the development of information technology, a tremendous amount of news information is now available online. Quickly obtaining useful content from this volume of news is a pressing problem. Based on an analysis of the structure of news portal pages, this paper combines regular expressions with HTML-Parser, introduces a general method for automatically extracting news information, and implements an efficient general-purpose news extraction system. The system not only extracts headlines, release times, and body text correctly, but can also extract news items that are relevant or similar to a given subject.
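As a rough illustration of combining an HTML parser with regular expressions, the following is a minimal sketch and not the authors' system. It swaps in Python's standard `html.parser` module for the HTML-Parser library named in the abstract. The choice of `<h1>` for the headline, `<p>` for body text, the timestamp pattern, and the names `NewsParser`, `extract_news`, and `SAMPLE` are all assumptions made for this example; the abstract does not give the actual extraction rules.

```python
import re
from html.parser import HTMLParser

# Assumed timestamp format, e.g. "2011-07-26 09:30"; real portals vary.
TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}(?:\s+\d{2}:\d{2}(?::\d{2})?)?")


class NewsParser(HTMLParser):
    """Collects the first <h1> as the headline and every <p> as body text."""

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # Start capturing only when not already inside a captured tag.
        if tag in ("h1", "p") and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. </b> nested inside a <p>
        # Join fragments and normalize whitespace.
        text = " ".join("".join(self._buffer).split())
        if tag == "h1" and not self.headline:
            self.headline = text
        elif tag == "p" and text:
            self.paragraphs.append(text)
        self._current = None


def extract_news(html):
    """Return headline, release time, and body text from one page."""
    parser = NewsParser()
    parser.feed(html)
    parser.close()
    # Simplification: the regex scans the raw markup, so it could also
    # match a date that appears inside an attribute value.
    match = TIME_RE.search(html)
    return {
        "headline": parser.headline,
        "time": match.group(0) if match else None,
        "body": "\n".join(parser.paragraphs),
    }


# Hypothetical page used only to exercise the sketch.
SAMPLE = """<html><body>
<h1>Example headline</h1>
<span class="date">2011-07-26 09:30</span>
<p>First paragraph.</p>
<p>Second <b>paragraph</b>.</p>
</body></html>"""

print(extract_news(SAMPLE))
```

In this sketch the parser handles the structural part (which tags hold the headline and body), while the regular expression handles a field whose position varies but whose format is predictable. A production system would likely need per-site rules or heuristics for both.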

Published in:

Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on (Volume: 3)

Date of Conference:

26-28 July 2011