Since most malware is derived from prior code, understanding malware derivation and evolution is essential for many types of malware analysis. However, prior models of malware relationships are either insufficiently precise or fail to capture important relationships. A framework is proposed that treats both production and evolution uniformly as compositions of code transformations, and that distinguishes the disjoint but interleaved evolution of production code and malware code. Evolution relations are defined in terms of path patterns on derivation graphs; this generalizes and formalizes the relationship between phylogenies and provenance graphs. The comprehensiveness of the modeling framework is demonstrated using examples from the literature, and implications for future work in relationship reconstruction are drawn.
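To make the idea of defining an evolution relation through a path pattern on a derivation graph more concrete, the following is a minimal illustrative sketch, not the framework's actual formalization. It assumes a graph whose nodes are code artifacts and whose edges are labeled with one of two hypothetical transformation kinds, "evolve" (one source version modified into another) and "produce" (a source version turned into a binary). The node names, the edge labels, and the function `binary_descendants` are all assumptions introduced for illustration. Under these assumptions, one binary counts as an evolutionary descendant of another when a path exists that goes backward over a "produce" edge, forward over one or more "evolve" edges, and forward over a "produce" edge.

```python
# Hypothetical derivation graph: (source node, target node, transformation label).
# The labels "evolve" and "produce" are illustrative assumptions.
EDGES = [
    ("src_v1", "src_v2", "evolve"),
    ("src_v2", "src_v3", "evolve"),
    ("src_v1", "bin_v1", "produce"),
    ("src_v3", "bin_v3", "produce"),
]


def build_index(edges):
    """Index edges by (node, label) in both forward and reverse direction."""
    fwd, rev = {}, {}
    for src, dst, label in edges:
        fwd.setdefault((src, label), []).append(dst)
        rev.setdefault((dst, label), []).append(src)
    return fwd, rev


def evolved_sources(fwd, start):
    """Nodes reachable from `start` via one or more 'evolve' edges."""
    seen = set()
    stack = list(fwd.get((start, "evolve"), []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(fwd.get((node, "evolve"), []))
    return seen


def binary_descendants(edges, binary):
    """Binaries matching the path pattern: produce^-1, evolve+, produce."""
    fwd, rev = build_index(edges)
    result = set()
    for source in rev.get((binary, "produce"), []):
        for later in evolved_sources(fwd, source):
            result.update(fwd.get((later, "produce"), []))
    return result
```

In this toy graph, `binary_descendants(EDGES, "bin_v1")` yields only `bin_v3`: the intermediate version `src_v2` was never turned into a binary, so it appears in the derivation graph but not in the derived relation over binaries. This is one way to picture how a coarser, phylogeny-like relation might be read off a more detailed provenance-style graph.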