Web Information Extraction Using Generalized Hidden Markov Model

3 Author(s)
Ping Zhong (Ph.D. student, Computer Science Dept., The Graduate Center, The City University of New York); Jinlin Chen; Terry Cook

The hidden Markov model (HMM) is an important approach to information extraction (IE). When applied to Web IE, however, HMM-based approaches suffer from several problems because they do not take Web-specific features into account. In this paper we present a generalized hidden Markov model (GHMM) that extends traditional HMMs by exploiting Web-specific information for Web IE. First, our approach uses the Web content block, rather than the term, as the basic extraction unit. Second, instead of following the traditional sequential state transition order, GHMM detects its state transition order from the layout structure of the corresponding Web page. Third, GHMM uses multiple emission features instead of a single emission feature. In this way, GHMM is better suited to Web IE. Experiments show promising results compared with traditional HMM-based Web IE.
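The following is a minimal sketch, not the authors' GHMM, of two ideas from the abstract. It decodes a sequence of Web content blocks rather than terms, and it scores each block with several emission features rather than one. It assumes discrete features that are conditionally independent given the state (a naive-Bayes simplification), uses hypothetical field labels and feature names (`title`, `author`, `tag`, `font`), and applies an assumed smoothing floor (`FLOOR`) for unseen events. It keeps the traditional sequential transition order and does not reproduce the paper's layout-based state transition order.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical representation: a content block is a mapping from feature name
# to a discrete feature value, e.g. {"tag": "h1", "font": "bold"}.
Block = Dict[str, str]
# emit[state][feature][value] -> P(value | state) for that feature.
Emissions = Dict[str, Dict[str, Dict[str, float]]]
FLOOR = 1e-6  # assumed smoothing floor for unseen events (not from the paper)


def log_p(p: float) -> float:
    """Log-probability, floored so unseen events never produce log(0)."""
    return math.log(max(p, FLOOR))


def emission_logprob(state: str, block: Block, emit: Emissions) -> float:
    """Score one block under `state` with several emission features,
    assuming the features are conditionally independent given the state."""
    return sum(log_p(emit[state].get(f, {}).get(v, 0.0)) for f, v in block.items())


def viterbi(blocks: Sequence[Block], states: Sequence[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Emissions) -> List[str]:
    """Most likely state (field label) for each block, decoded in sequence."""
    if not blocks:
        return []
    # score[s]: best log-probability of any state path ending in s so far.
    score = {s: log_p(start.get(s, 0.0)) + emission_logprob(s, blocks[0], emit)
             for s in states}
    backptrs: List[Dict[str, str]] = []
    for block in blocks[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Best predecessor of s under the (sequential) transition order.
            prev = max(states,
                       key=lambda p: score[p] + log_p(trans.get(p, {}).get(s, 0.0)))
            new_score[s] = (score[prev] + log_p(trans.get(prev, {}).get(s, 0.0))
                            + emission_logprob(s, block, emit))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ["title", "author"]
    start = {"title": 0.9, "author": 0.1}
    trans = {"title": {"title": 0.1, "author": 0.9},
             "author": {"title": 0.1, "author": 0.9}}
    emit = {"title": {"tag": {"h1": 0.8, "p": 0.2}, "font": {"bold": 0.7, "plain": 0.3}},
            "author": {"tag": {"h1": 0.1, "p": 0.9}, "font": {"bold": 0.2, "plain": 0.8}}}
    blocks = [{"tag": "h1", "font": "bold"}, {"tag": "p", "font": "plain"}]
    print(viterbi(blocks, states, start, trans, emit))  # → ['title', 'author']
```

If each block carried only one feature, `emission_logprob` would reduce to a single emission term, which corresponds to the single-emission-feature HMM that the abstract contrasts with GHMM.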

Published in:

2006 1st IEEE Workshop on Hot Topics in Web Systems and Technologies

Date of Conference:

13-14 Nov. 2006