Malicious software (malware) represents a threat to the security and privacy of computer users. Traditional signature-based and heuristic-based methods fail to detect some forms of malware. This paper presents a malware detection approach based on supervised learning. The main contributions of the paper are an ensemble learning algorithm, two pre-processing techniques, and an empirical evaluation of the proposed algorithm. Sequences of operation codes (opcodes) are extracted as features from malware and benign files. These sequences are used to produce three data sets, each with a different configuration. A set of learning algorithms is evaluated on the data sets, and their predictions are combined by the ensemble algorithm, which decides the final output by veto voting. The experimental results show that the approach can accurately detect both novel and known malware instances, with higher recall than majority voting.
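The difference between the two voting schemes can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the veto rule, so the sketch assumes the simplest form, in which a single malware vote overrides all benign votes. The function names, the label encoding, and the n-gram helper for opcode sequences are likewise hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed label encoding (not taken from the paper).
MALWARE = 1
BENIGN = 0


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous opcode sequences of length n (hypothetical helper)."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def majority_vote(votes: Sequence[int]) -> int:
    """Predict malware only if malware votes strictly outnumber benign votes."""
    counts = Counter(votes)
    return MALWARE if counts[MALWARE] > counts[BENIGN] else BENIGN


def veto_vote(votes: Sequence[int]) -> int:
    """Assumed veto rule: any single malware vote decides the outcome."""
    return MALWARE if MALWARE in votes else BENIGN


# Three base classifiers disagree on one file: only one flags it as malware.
votes = [BENIGN, BENIGN, MALWARE]
majority_result = majority_vote(votes)  # benign: the malware vote is outvoted
veto_result = veto_vote(votes)          # malware: the single vote acts as a veto

# Overlapping opcode pairs (n = 2) from a short, made-up disassembly.
bigrams = opcode_ngrams(["push", "mov", "call"], 2)
```

Under this assumed rule, a file that only one classifier recognises as malicious is still flagged, which is one way an ensemble could reach higher recall than majority voting, typically at the cost of more false alarms on benign files.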