By Topic

A Categorization Algorithm for Harmful Text Information Filtering

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

The purchase and pricing options are temporarily unavailable. Please try again later.
2 Author(s)
Juan Du ; Software Coll., Northeast Pet. Univ., Daqing, China ; Zhi an Yi

Harmful text information filtering is a typical pattern recognition problem of small sample, the prediction result of classifier was biased towards the class with more samples, because of the samples that including the harmful information were difficult to gain. Construct virtual samples is an effective means to solve the problem of pattern recognition in the small sample, using the up-sampling method to construct virtual samples in the data layer, the traditional KNN algorithm has been improved: a small sample set is divided into clusters by using the K-means clustering, the virtual samples are generated and verified the validity in the cluster. The experimental results show that this method can construct the virtual samples which are similar to the real sample characteristics, and expand the small sample collection in order to effectively identify the harmful text information.

Published in:

Multimedia Information Networking and Security (MINES), 2012 Fourth International Conference on

Date of Conference:

2-4 Nov. 2012