A Generalized Hidden Markov Model Approach for Web Information Extraction

2 Author(s)
Ping Zhong (City University of New York, USA); Jinlin Chen

This paper presents a generalized hidden Markov model (GHMM) that extends traditional HMMs with Web-specific information for Web information extraction. Our approach uses Web content blocks, rather than content terms, as the basic extraction unit. In addition, instead of following the traditional sequential state transition order, the state transition orders of GHMMs are detected from the layout structures of the corresponding Web pages. Furthermore, multiple emission features are applied instead of a single emission feature. In this way, GHMMs can better accommodate Web information extraction. Experiments show promising results for GHMMs.
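The three ideas in the abstract (blocks as units, a layout-derived visiting order, and several emission features per block) can be illustrated with a small Viterbi decoder. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the `FLOOR` smoothing constant, the toy probabilities, the feature names `font` and `has_digit`, and above all the choice to combine features by multiplying their probabilities (an independence assumption) are all hypothetical. The abstract does not say how the features are combined or how the order is detected, so the order is simply passed in as a list of block indices.

```python
import math

FLOOR = 1e-6  # assumed smoothing floor for zero or unseen probabilities


def _log(p):
    """Log of a probability, floored to avoid log(0)."""
    return math.log(max(p, FLOOR))


def emission_logp(state, block, emit_p):
    """Combine several emission features of one block.

    Assumption: features are treated as independent given the state,
    so their log-probabilities are summed.
    """
    return sum(
        _log(emit_p[state][feat].get(val, 0.0))
        for feat, val in block.items()
    )


def viterbi(blocks, order, states, start_p, trans_p, emit_p):
    """Label content blocks, visiting them in the given order.

    `order` stands in for a layout-derived transition order; it is a
    list of block indices rather than the plain sequence 0, 1, 2, ...
    Returns a dict mapping block index to its most likely state.
    """
    first = order[0]
    score = {
        s: _log(start_p[s]) + emission_logp(s, blocks[first], emit_p)
        for s in states
    }
    back = []  # one dict of back-pointers per step after the first
    for idx in order[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + _log(trans_p[p][s]))
            new_score[s] = (
                score[prev]
                + _log(trans_p[prev][s])
                + emission_logp(s, blocks[idx], emit_p)
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return dict(zip(order, path))


# Toy example: two blocks, each described by two features.
states = ["title", "price"]
start_p = {"title": 0.8, "price": 0.2}
trans_p = {
    "title": {"title": 0.1, "price": 0.9},
    "price": {"title": 0.5, "price": 0.5},
}
emit_p = {
    "title": {"font": {"large": 0.9, "small": 0.1},
              "has_digit": {True: 0.2, False: 0.8}},
    "price": {"font": {"large": 0.2, "small": 0.8},
              "has_digit": {True: 0.9, False: 0.1}},
}
blocks = [
    {"font": "small", "has_digit": True},   # block 0
    {"font": "large", "has_digit": False},  # block 1
]
order = [1, 0]  # hypothetical layout order: block 1 is visited first

labels = viterbi(blocks, order, states, start_p, trans_p, emit_p)
```

Working in log space keeps the scores numerically stable when many features are multiplied together; a different feature-combination rule would only require changing `emission_logp`.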

Published in:

2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings) (WI'06)

Date of Conference:

18-22 Dec. 2006