Skip to Main Content
Malware is software designed to infiltrate or damage a computer system without the owner's informed consent (e.g., viruses, backdoors, spyware, trojans, and worms). Nowadays, numerous attacks made by the malware pose a major security threat to computer users. Unfortunately, along with the development of the malware writing techniques, the number of file samples that need to be analyzed, named "gray list," on a daily basis is constantly increasing. In order to help our virus analysts, quickly and efficiently pick out the malicious executables from the "gray list," an automatic and robust tool to analyze and classify the file samples is needed. In our previous work, we have developed an intelligent malware detection system (IMDS) by adopting associative classification method based on the analysis of application programming interface (API) execution calls. Despite its good performance in malware detection, IMDS still faces the following two challenges: (1) handling the large set of the generated rules to build the classifier; and (2) finding effective rules for classifying new file samples. In this paper, we first systematically evaluate the effects of the postprocessing techniques (e.g., rule pruning, rule ranking, and rule selection) of associative classification in malware detection, and then, propose an effective way, i.e., CIDCPF, to detect the malware from the "gray list." To the best of our knowledge, this is the first effort on using postprocessing techniques of associative classification in malware detection. CIDCPF adapts the postprocessing techniques as follows: first applying Chi-square testing and Insignificant rule pruning followed by using Database coverage based on the Chi-square measure rule ranking mechanism and Pessimistic error estimation, and finally performing prediction by selecting the best First rule. We have incorporated the CIDCPF method into our existing IMDS system, and we call the new system as CIMDS system. Case studies are performed on - - the large collection of file samples obtained from the Antivirus Laboratory at Kingsoft Corporation and promising experimental results demonstrate that the efficiency and ability of detecting malware from the "gray list" of our CIMDS system outperform popular antivirus software tools, such as McAfee VirusScan and Norton Antivirus, as well as previous data-mining-based detection systems, which employed Naive Bayes, support vector machine, and decision tree techniques. In particular, our CIMDS system can greatly reduce the number of generated rules, which makes it easy for our virus analysts to identify the useful ones.