By Topic

A parameter-free hybrid clustering algorithm used for malware categorization

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
ZhiXue Han ; Dept. of Comput. Sci., Xiamen Univ., Xiamen, China ; Shaorong Feng ; Yanfang Ye ; Qingshan Jiang

Nowadays, numerous attacks made by the malware, such as viruses, backdoors, spyware, trojans and worms, have presented a major security threat to computer users. The most significant line of defense against malware is antivirus products which detects, removes, and characterizes these threats. The ability of these AV products to successfully characterize these threats greatly depends on the method for categorizing these profiles of malware into groups. Therefore, clustering malware into different families is one of the computer security topics that are of great interest. In this paper, resting on the analysis of the extracted instruction of malware samples, we propose a novel parameter-free hybrid clustering algorithm (PFHC) which combines the merits of hierarchical clustering and K-means algorithms for malware clustering. It can not only generate stable initial division, but also give the best K. PFHC first utilizes agglomerative hierarchical clustering algorithm as the frame, starting with N singleton clusters, each of which exactly includes one sample, then reuses the centroids of upper level in every level and merges the two nearest clusters, finally adopts K-means algorithm for iteration to achieve an approximate global optimal division. PFHC evaluates clustering validity of each iteration procedure and generates the best K by comparing the values. The promising studies on real daily data collection illustrate that, compared with popular existing K-means and hierarchical clustering approaches, our proposed PFHC algorithm always generates much higher quality clusters and it can be well used for malware categorization.

Published in:

Anti-counterfeiting, Security, and Identification in Communication, 2009. ASID 2009. 3rd International Conference on

Date of Conference:

20-22 Aug. 2009