Loading [a11y]/accessibility-menu.js
Multi-label Classification for Android Malware Based on Active Learning | IEEE Journals & Magazine | IEEE Xplore

Multi-label Classification for Android Malware Based on Active Learning


Abstract:

The existing malware classification approaches (i.e., binary and family classification) can barely benefit subsequent analysis with their outputs. Even the family classif...Show More

Abstract:

The existing malware classification approaches (i.e., binary and family classification) can barely benefit subsequent analysis with their outputs. Even the family classification approaches suffer from lacking a formal naming standard and an incomplete definition of malicious behaviors. More importantly, the existing approaches are powerless for one malware with multiple malicious behaviors, while this is a very common phenomenon for Android malware in the wild. So that both of them actually cannot provide researchers with a direct and comprehensive enough understanding of malware. In this paper, we propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors. With an in-depth analysis, we summarize 6 basic malicious behaviors from real-world malware with security reports and construct a labeled dataset. We compare the results of 70 algorithm combinations to evaluate the effectiveness (best at 73.3%). Faced with the challenge of the expensive cost of data annotation, we further propose an active learning approach based on data augmentation, which can improve the overall accuracy to 86.7% with a data augmentation of 5,000+ high-quality samples from an unlabeled malware dataset. This is the first multi-label Android malware classification approach intending to provide more information on fine-grained malicious behaviors.
Page(s): 1 - 18
Date of Publication: 17 October 2022

ISSN Information: