The metadata embedded in program executables provides information that can be useful for automated malware detection or classification. With potentially tens of thousands of variants per malware family, it is unclear how consistent this metadata is, and whether different families exhibit different degrees of consistency. Header information from multiple variants of recent malware was studied to understand how it varies within and among malware families. Classification accuracies obtained with several common classifiers showed that, even for rapidly mutating malware families, classifiers trained on header information can outperform those trained on the program bodies. The results also show that some families have highly consistent header information, which suggests limited evolutionary pressure from defense systems. The results indicate that care is needed when evaluating classifiers that operate on header information as well as on program body information.
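One simple way to express "consistency of header information within a family" is to take each header field and compute the fraction of variants that share its most common value. The sketch below does this. It is a hypothetical illustration and not the metric used in the paper. It assumes the headers have already been parsed into one dictionary per variant. The field names mirror real PE header fields (`Machine`, `NumberOfSections`, `SizeOfCode`), but the `field_consistency` helper and all the values are invented for illustration.

```python
from collections import Counter


def field_consistency(headers):
    """Hypothetical measure: for each header field, the fraction of variants
    that share the most common value of that field.

    headers: list of dicts, one per variant, mapping field name -> value.
    A field missing from a variant is counted as the value None.
    """
    if not headers:
        return {}
    # Union of all field names seen across the variants.
    fields = set().union(*headers)
    result = {}
    for field in sorted(fields):
        values = [h.get(field) for h in headers]
        _, count = Counter(values).most_common(1)[0]
        result[field] = count / len(headers)
    return result


# Invented header values for three variants of one hypothetical family.
family_a = [
    {"Machine": 0x14C, "NumberOfSections": 4, "SizeOfCode": 40960},
    {"Machine": 0x14C, "NumberOfSections": 4, "SizeOfCode": 45056},
    {"Machine": 0x14C, "NumberOfSections": 5, "SizeOfCode": 40960},
]

if __name__ == "__main__":
    print(field_consistency(family_a))
```

In this toy family, `Machine` is fully consistent (1.0). `NumberOfSections` and `SizeOfCode` each reach 2/3. Under this measure, a family whose fields all stay near 1.0 would be the kind of "highly consistent" family the abstract describes.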
Published in: 2010 5th International Conference on Malicious and Unwanted Software (MALWARE)
Date of Conference: 19-20 Oct. 2010