Using Web Search Results and Genetic Algorithm to Improve the Accuracy of Chinese Spam Email Filters

2 Author(s)
Kai-Shin Lu ; Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA ; Chang, C.K.

In recent years, much research has focused on developing effective spam email filters because spam has become a serious problem. Among existing solutions, studies have shown the Naive Bayesian spam email filter to be the best, since it achieves the highest accuracy in filtering out English spam emails. Filtering out Chinese spam emails, however, remains an open problem because Chinese sentences are difficult to segment correctly. This paper presents a Web-Search-Results (WSR) based Genetic Algorithm (GA) Chinese sentence tokenizer that segments Chinese sentences automatically. A fuzzy-splitting algorithm that helps the GA handle longer sentences is also proposed. We describe the implementation of this tokenizer alongside a standard Naive Bayesian email filter, and then present the training and evaluation process. Evaluation on a real-world spam email dataset, "CCERT Data Sets of Chinese Emails" (CDSCE), shows that our approach effectively improves the accuracy of identifying Chinese spam emails.
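
The abstract mentions a standard Naive Bayesian email filter that operates on tokenizer output. The following is only a minimal sketch of such a filter, not the authors' implementation: the class name `NaiveBayesFilter`, the Laplace smoothing parameter `alpha`, the smoothed class prior, and the sample tokens are all assumptions made for illustration. Emails are assumed to arrive already segmented into word tokens; the WSR-based GA tokenizer itself is not reproduced here.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes filter over pre-segmented tokens."""

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = {label: 0 for label in self.LABELS}

    def train(self, tokens, label):
        """Record the token counts of one labeled, already tokenized email."""
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def _log_score(self, tokens, label):
        # Vocabulary is the union of tokens seen in either class.
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total = sum(self.counts[label].values())
        # Smoothed class prior, so an unseen class never yields log(0).
        prior = (self.docs[label] + 1) / (sum(self.docs.values()) + len(self.LABELS))
        score = math.log(prior)
        for token in tokens:
            likelihood = (self.counts[label][token] + self.alpha) / (
                total + self.alpha * vocab_size
            )
            score += math.log(likelihood)
        return score

    def classify(self, tokens):
        """Return the label with the higher log posterior score."""
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))


if __name__ == "__main__":
    nb = NaiveBayesFilter()
    # Toy training data, invented for this sketch.
    nb.train(["免费", "中奖", "点击"], "spam")
    nb.train(["免费", "优惠"], "spam")
    nb.train(["会议", "报告", "明天"], "ham")
    nb.train(["项目", "报告"], "ham")
    print(nb.classify(["免费", "中奖"]))
    print(nb.classify(["报告", "会议"]))
```

Because the filter only ever sees tokens, its accuracy depends directly on segmentation quality: a mis-segmented sentence yields tokens that never match the training vocabulary, which is the problem the tokenizer in this paper is meant to address.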

Published in:

Computer Software and Applications Conference Workshops (COMPSACW), 2011 IEEE 35th Annual

Date of Conference:

18-22 July 2011