Skip to Main Content
The diversity, sophistication and availability of malicious software (malcode/malware) pose enormous challenges for securing networks and end hosts from attacks. In this paper, we analyze a large corpus of malcode meta data compiled over a period of 19 years. Our aim is to understand how malcode has evolved over the years, and in particular, how different instances of malcode relate to one another. We develop a novel graph pruning technique to establish the inheritance relationships between different instances of malcode based on temporal information and key common phrases identified in the malcode descriptions. Our algorithm enables a range of possible inheritance structures. We study the resulting ldquolikelyrdquo malcode families, which we identify through extensive manual investigation. We present an evaluation of gross characteristics of malcode evolution and also drill down on the details of the most interesting and potentially dangerous malcode families.