Dealing with Class Noise in Large Training Datasets for Malware Detection

Authors:

Gavrilut, D. (Fac. of Comput. Sci., Al. I. Cuza Univ. of Iasi, Iasi, Romania); Ciortuz, L.

This paper presents the approaches we have explored so far for detecting and dealing with class noise in the large annotated datasets used to train the classifiers we previously designed for industrial-scale malware identification. First, we established a number of distance-based filtering rules that allow us to identify different "levels" of potential noise in the training data. Second, we analysed how either removing or "cleaning" the potentially noisy records affects the performance of our simplest classifiers. We show that careful distance-based filtering can lead to noticeably better malware detection results.
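The abstract does not spell out the authors' filtering rules. Purely as an illustration of the general idea, the sketch below uses a different, well-known technique: Wilson-style edited nearest neighbour filtering, which flags a record as potentially noisy when its label disagrees with the majority of its k nearest neighbours. It then applies the two options the abstract names: removing the suspects, or "cleaning" them by relabelling. Everything here is an assumption, not the paper's method: the binary feature encoding, Hamming distance, k=3, and the helper names `hamming`, `knn_majority`, `find_suspects`, `remove_suspects` and `relabel_suspects` are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical encoding (not from the paper): each record is a fixed-length
# binary feature vector; labels are 1 = malware, 0 = clean.
Vector = Sequence[int]


def hamming(a: Vector, b: Vector) -> int:
    """Count the positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_majority(i: int, features: Sequence[Vector], labels: Sequence[int], k: int) -> int:
    """Return the majority label among the k nearest neighbours of record i.

    Record i itself is excluded. Distance ties are broken by record index;
    use an odd k with binary labels so the vote itself cannot tie.
    """
    neighbours = sorted(
        (hamming(features[i], features[j]), j) for j in range(len(features)) if j != i
    )
    votes = Counter(labels[j] for _, j in neighbours[:k])
    return votes.most_common(1)[0][0]


def find_suspects(features: Sequence[Vector], labels: Sequence[int], k: int = 3) -> List[int]:
    """Indices of records whose label disagrees with their neighbourhood majority."""
    return [i for i in range(len(features)) if knn_majority(i, features, labels, k) != labels[i]]


def remove_suspects(
    features: Sequence[Vector], labels: Sequence[int], suspects: Sequence[int]
) -> Tuple[List[Vector], List[int]]:
    """'Removal' option: drop every suspect record from the training set."""
    drop = set(suspects)
    kept = [i for i in range(len(features)) if i not in drop]
    return [features[i] for i in kept], [labels[i] for i in kept]


def relabel_suspects(features: Sequence[Vector], labels: Sequence[int], k: int = 3) -> List[int]:
    """'Cleaning' option: replace each suspect's label with its neighbourhood majority."""
    cleaned = list(labels)
    for i in find_suspects(features, labels, k):
        cleaned[i] = knn_majority(i, features, labels, k)
    return cleaned


if __name__ == "__main__":
    # Toy data: record 3 is labelled clean, but its three nearest
    # neighbours are all labelled malware.
    X = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1), (1, 0, 1, 0),
         (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0)]
    y = [1, 1, 1, 0, 0, 0, 0]
    suspects = find_suspects(X, y, k=3)
    print(suspects)                            # [3]
    print(remove_suspects(X, y, suspects)[1])  # [1, 1, 1, 0, 0, 0]
    print(relabel_suspects(X, y, k=3))         # [1, 1, 1, 1, 0, 0, 0]
```

This toy version compares every pair of records, so its cost grows quadratically with the dataset size. On training sets of the scale the abstract describes, some form of indexing or bucketing of similar records would be needed instead of an exhaustive scan.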

Published in:

2011 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

Date of Conference:

26-29 Sept. 2011