A boosted semi-supervised learning framework for web page filtering

Authors:
Zhu He ; Nat. Lab. of Pattern Recognition, Chinese Acad. of Sci., Beijing, China ; Xi Li ; Weiming Hu

The World Wide Web makes it convenient for users to obtain information. However, the Internet also carries much harmful content, such as pornography and information about prohibited drugs, so filtering harmful Web pages is an important problem. Harmful Web page filtering is generally cast as Web page classification, which requires many well-labeled training samples, but labeling a large set of Web pages is expensive. To address this problem, we adopt a semi-supervised framework for Web page filtering. In this framework, each Web page is represented by bags of different features extracted using its HTML structure. A semi-supervised learning strategy is then used to obtain well-labeled training samples efficiently. Finally, a boosting classifier is used for harmful Web page filtering. Experiments demonstrate the effectiveness of our framework.
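
As a rough illustration of the first step described in the abstract (representing a page by bags of features taken from its HTML structure), the following Python sketch groups word counts by the structural region they appear in. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say which features are extracted, so the tag-to-bag grouping `BAG_FOR_TAG`, the class `BagExtractor`, and the function `extract_bags` are all hypothetical names and choices made for this example. It uses only the Python standard library.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical grouping of HTML tags into feature "bags". The abstract does
# not list the paper's actual feature sets; these are illustrative only.
# Tags mapped to None (script, style) have their text discarded.
BAG_FOR_TAG = {
    "title": "title",
    "a": "anchor",
    "h1": "heading",
    "h2": "heading",
    "script": None,
    "style": None,
}


class BagExtractor(HTMLParser):
    """Collect word counts into one bag per structural region of a page."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags that appear in BAG_FOR_TAG
        self.bags = {
            name: Counter() for name in ("title", "anchor", "heading", "body")
        }

    def handle_starttag(self, tag, attrs):
        if tag in BAG_FOR_TAG:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag, tolerating sloppy nesting.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Text outside any tracked tag falls into the generic "body" bag.
        bag = BAG_FOR_TAG[self.stack[-1]] if self.stack else "body"
        if bag is not None:
            self.bags[bag].update(re.findall(r"[a-z0-9]+", data.lower()))


def extract_bags(html):
    """Return a dict mapping bag name to a Counter of lowercase tokens."""
    parser = BagExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.bags
```

Calling `extract_bags(page_source)` on a page's HTML string returns one `Counter` per bag. Keeping the bags separate, rather than merging all text into a single count vector, lets a downstream classifier weight title, heading, and link text differently from body text.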

Published in:

IEEE International Conference on Systems, Man and Cybernetics, 2009 (SMC 2009)

Date of Conference:

11-14 Oct. 2009