Skip to Main Content
Protocol reverse engineering is the process of extracting application-level specifications for network protocols. Such specifications are very useful in a number of security-related contexts, for example, to perform deep packet inspection and black-box fuzzing, or to quickly understand custom botnet command and control (C&C) channels. Since manual reverse engineering is a time-consuming and tedious process, a number of systems have been proposed that aim to automate this task. These systems either analyze network traffic directly or monitor the execution of the application that receives the protocol messages. While previous systems show that precise message formats can be extracted automatically, they do not provide a protocol specification.The reason is that they do not reverse engineer the protocol state machine. In this paper, we focus on closing this gap by presenting a system that is capable of automatically inferring state machines. This greatly enhances the results of automatic protocol reverse engineering, while further reducing the need for human interaction. We extend previous work that focuses on behavior-based message format extraction, and introduce techniques for identifying and clustering different types of messages not only based on their structure, but also according to the impact of each message on server behavior. Moreover, we present an algorithm for extracting the state machine. We have applied our techniques to a number of real-world protocols, including the command and control protocol used by a malicious bot. Our results demonstrate that we are able to extract format specifications for different types of messages and meaningful protocol state machines. We use these protocol specifications to automatically generate input for a stateful fuzzer, allowing us to discover security vulnerabilities in real-world applications.