By Topic

GA-based feature subset selection in a spam/non-spam detection system

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

5 Author(s)
Behjat, A.R. ; Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia ; Mustapha, A. ; Nezamabadi-pour, H. ; Sulaiman, M.N.
more authors

Spam has created a significant security problem for computer users everywhere. Spammers take an advantage of defrauds to cover parts of messages that can be used for identification of spam. For instance, a spammer does not need to consume much cost and bandwidth for sending junk mails even more than one hundred emails. On the other hand, from the feature selection perspective, one of the specific problems that decrease accuracy of spam and non-spam emails classification is high data dimensionality. Therefore, the reduction of dimensionality is related to decrease the number of irrelevant features. In this paper, a genetic algorithm (GA) is applied during feature selection in effort to decrease the number of useless features in a collection of high-dimensional email body and subject. Next, a Multi-Layer Perceptron (MLP) is employed to classify features that have been selected by the GA. Using LingSpam benchmark corpora as the dataset, the experimental results showed that a GA feature selector with the MLP classifier does not only decrease the data dimensionality but increase the spam detection rate as compared against other classifiers such as SVM and Naïve Bayes.

Published in:

Computer and Communication Engineering (ICCCE), 2012 International Conference on

Date of Conference:

3-5 July 2012