Graph-based Semi-supervised Learning Algorithm for Web Page Classification

3 Author(s)
Rong Liu ; Digital Eng. Res. Center, Huazhong Univ. of Sci. & Technol., Wuhan ; Jianzhong Zhou ; Ming Liu

Many application domains, such as Web page classification, suffer from a shortage of labeled training examples. Unlabeled examples are readily available, whereas labeled ones are fairly expensive to obtain. As a result, there has been a great deal of work in recent years on semi-supervised learning. This paper proposes a graph-based semi-supervised learning algorithm and applies it to Web page classification. The algorithm uses a similarity measure between Web pages to construct a k-nearest-neighbor graph. Labeled and unlabeled Web pages are represented as nodes in the weighted graph, with edge weights encoding the similarity between pages. To let unlabeled data improve classification accuracy, the edge weights are computed by combining weighting schemes with the link information of the Web pages. The learning problem is then formulated as label propagation in the graph: using probabilistic matrix methods and belief propagation, the labeled nodes push their labels out through the unlabeled nodes. Preliminary experiments on the WebKB dataset show that the algorithm can effectively exploit unlabeled data in addition to labeled data to achieve higher Web page classification accuracy.
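
The general idea of label propagation over a k-nearest-neighbor graph can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the weighting schemes, the use of link information, or the belief-propagation details, so the sketch assumes plain cosine similarity for edge weights and a standard clamped iterative propagation. The function names `knn_graph` and `propagate_labels`, the toy feature matrix, and the parameters `k` and `n_iter` are all hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Build a symmetric k-NN graph; edge weights are cosine similarities (an assumption)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)      # unit-length rows
    S = Xn @ Xn.T                          # pairwise cosine similarity
    np.fill_diagonal(S, -np.inf)           # a page is never its own neighbor
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(S[i])[-k:]       # indices of the k most similar pages
        W[i, nbrs] = np.maximum(S[i, nbrs], 0.0)
    return np.maximum(W, W.T)              # keep an edge if either endpoint chose it


def propagate_labels(W, y, n_classes, n_iter=200):
    """Clamped iterative label propagation; y uses -1 for unlabeled nodes."""
    labeled = y >= 0
    F = np.full((W.shape[0], n_classes), 1.0 / n_classes)
    F[labeled] = np.eye(n_classes)[y[labeled]]   # one-hot rows for labeled nodes
    clamp = F[labeled].copy()
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic transitions
    for _ in range(n_iter):
        F = P @ F                # each node averages its neighbors' label scores
        F[labeled] = clamp       # labeled nodes keep their known labels
    return F.argmax(axis=1)


# Toy term-count vectors for six hypothetical pages; only pages 0 and 3 are labeled.
X = np.array([
    [5, 1, 0, 0],
    [4, 2, 0, 0],
    [6, 1, 1, 0],
    [0, 0, 5, 1],
    [0, 1, 4, 2],
    [1, 0, 6, 1],
], dtype=float)
y = np.array([0, -1, -1, 1, -1, -1])

pred = propagate_labels(knn_graph(X, k=2), y, n_classes=2)
print(pred)
```

In this toy setting the two labeled pages spread their labels to the unlabeled pages that share most of their vocabulary. A fuller treatment along the lines of the abstract would also fold hyperlink structure into the edge weights, which this sketch omits.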

Published in:

Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on (Volume: 2)

Date of Conference:

16-18 Oct. 2006