Effect of Named Entities in Web Page Classification

2 Author(s)
Sameendra Samarawickrama (Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka); Lakshman Jayaratne

With the rapid growth of the World Wide Web, there is an increasing need for automated web page classification techniques. Web page classification is an important task in web mining and is used in many other areas of research as well. The general practice in classification is to use lexical terms as features. In this paper we investigate the effect of using named entities as features in web page classification. We conducted tests in five different domains (baseball, football, health, politics and science) with web pages collected from online news providers. Our results show that incorporating named entities can yield slight gains in classifier performance for narrow domains, but this does not hold for every domain. The results also show that classification based only on named entities can perform well in certain domains (e.g., baseball) but still falls short of the representation based on lexical terms.
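
The three representations compared in the abstract (lexical terms only, named entities only, and both combined) can be illustrated with a minimal sketch. The abstract does not say which named entity recognizer or classifier the authors used, so everything here is an assumption: `toy_named_entities` is a hypothetical stand-in that simply treats runs of capitalized words as entities, and the `term=` / `ne=` prefixes are an illustrative naming choice, not the paper's.

```python
import re
from collections import Counter


def lexical_terms(text):
    """Lowercased word tokens: the usual bag-of-words features."""
    return re.findall(r"[a-z]+", text.lower())


def toy_named_entities(text):
    """Hypothetical stand-in for a real NER tagger (assumption, not the
    paper's method): every run of capitalized words counts as one entity.
    This crude rule also picks up ordinary sentence-initial words."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)


def features(text, use_terms=True, use_entities=True):
    """Build a feature-count vector from terms, entities, or both.

    Prefixes keep the two feature spaces apart, so the term "boston"
    and the entity "Boston Red Sox" never collide.
    """
    feats = Counter()
    if use_terms:
        feats.update("term=" + t for t in lexical_terms(text))
    if use_entities:
        feats.update("ne=" + e for e in toy_named_entities(text))
    return feats


if __name__ == "__main__":
    page = "the New York Yankees beat the Boston Red Sox"
    print(features(page))                    # combined representation
    print(features(page, use_terms=False))   # named entities only
```

The resulting count vectors could be passed to any standard text classifier. Switching `use_terms` and `use_entities` on and off gives the three feature sets whose performance the paper compares.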

Published in:

2012 Fourth International Conference on Computational Intelligence, Modelling and Simulation

Date of Conference:

25-27 Sept. 2012