For well-known reasons, signature-based virus detection schemes perform poorly when they encounter previously unknown viruses. Recently, machine learning methods have been introduced as a new approach to virus detection. These methods use classification algorithms to learn patterns in binary code files in order to classify unknown files. In this paper, we present a feature method based on graph features, which can be used in the machine learning process, and we design a virus detection model based on it. The features are extracted from the Control Flow Graph (CFG) of an executable. Our detection model follows a three-step methodology: (1) create the CFGs of the executables, (2) extract features from the CFGs and create training data, and (3) generate classifiers with specific machine learning algorithms and detect viruses with these classifiers. Because the number of features is fixed, our model avoids the situation, common in other feature methods, in which too many features are found, and it needs no filter step; it is therefore efficient and scalable. In our experiments, we achieved a detection rate as high as 95.9% and a false positive rate as low as 5.9% on novel malware.
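Steps (2) and (3) of the methodology can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not list the actual graph features or name the classifiers, so the six features below (block count, edge count, branching blocks, exit blocks, mean out-degree, cyclomatic number) and the nearest-centroid classifier are hypothetical stand-ins chosen only to show how a fixed-length vector can be derived from a CFG and then classified. The CFG is assumed to be already available as an adjacency list; the names `cfg_features`, `centroid`, and `classify` are illustrative.

```python
from typing import Dict, List

# Assumed CFG representation: basic-block id -> list of successor block ids.
CFG = Dict[int, List[int]]


def cfg_features(cfg: CFG) -> List[float]:
    """Return a fixed-length feature vector for one CFG (illustrative features)."""
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)  # include blocks that appear only as jump targets
    n = len(nodes)
    out_deg = {v: len(cfg.get(v, [])) for v in nodes}
    e = sum(out_deg.values())
    branches = sum(1 for d in out_deg.values() if d >= 2)  # blocks with 2+ successors
    exits = sum(1 for d in out_deg.values() if d == 0)     # blocks with no successor
    mean_out = e / n if n else 0.0
    # McCabe cyclomatic number, assuming a single connected component.
    cyclomatic = e - n + 2 if n else 0
    return [float(n), float(e), float(branches), float(exits),
            mean_out, float(cyclomatic)]


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of equally sized feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(vec: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(vec, centroids[label]))


# A diamond-shaped CFG: block 0 branches to 1 and 2, which both reach exit block 3.
diamond = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(cfg_features(diamond))
```

Every executable maps to a vector of the same length regardless of its size, which is the property the abstract credits for making a separate feature-filtering step unnecessary. In practice the vectors would be passed to whichever learning algorithms the model actually uses, not to this toy classifier.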