In 2009, UnrealIRCd 3.2.8.1, an IRC (Internet Relay Chat) server, was replaced at its mirror sites by a version containing a backdoor. The tampering went undetected for seven months and caused irreversible damage to IRC services. Detecting implanted malicious code in newly developed systems before their deployment is therefore vitally important, and also challenging. We apply machine learning to uncover the structure of a system implementation, including both the normal functions derived from its design and any hidden malicious behaviors. Published machine learning approaches often assume that the systems under study are completely specified. Unfortunately, practical system implementations are usually incompletely specified, and the prevalent algorithms do not apply to them. We design generalized and efficient machine learning algorithms for incompletely specified protocol system implementations, aimed at detecting implanted malicious code. We further extend these results to the case where learning starts from an approximate model rather than from an empty conjecture, which is the usual starting point of machine learning algorithms; in this setting our approach learns an implementation structure more efficiently than the known algorithms. We implement our method and apply it to two case studies: an IRC server with a backdoor and an MSN client with a message flooder. Experiments show that our procedures successfully and efficiently detect the implanted malicious behaviors.
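To make the idea of an incompletely specified design concrete, the following is a minimal sketch, not the authors' learning algorithm. It assumes the design is a Mealy machine stored as a dictionary in which missing `(state, input)` entries are unspecified, and that the implementation is a resettable black box. Instead of actively learning a state model, it simply enumerates input words up to a bounded length (plain bounded exploration) and flags outputs that contradict the design, or that appear in unspecified places and are not in an assumed set of tolerated responses. All names (`SPEC`, `INPUTS`, `ALLOWED_UNSPECIFIED`, `Implementation`, `find_deviations`) and the `"AB"`/`"EXEC"` trigger are hypothetical illustrations of a backdoor, not details taken from the case studies.

```python
from collections import deque

# Hypothetical incompletely specified design (Mealy machine):
# (state, input) -> (next_state, output). Missing keys are unspecified.
SPEC = {
    ("start", "NICK"): ("registered", "OK"),
    ("registered", "JOIN"): ("in_channel", "JOINED"),
    ("in_channel", "MSG"): ("in_channel", "SENT"),
}
SPEC_INIT = "start"
INPUTS = ["NICK", "JOIN", "MSG", "AB"]
# Assumption: any implementation may answer an unspecified input with "ERR".
ALLOWED_UNSPECIFIED = {"ERR"}


class Implementation:
    """Hypothetical black box that follows the design, except that the
    unspecified input "AB" yields "EXEC" in every state (a stand-in for
    an implanted backdoor)."""

    def __init__(self):
        self.state = SPEC_INIT

    def reset(self):
        self.state = SPEC_INIT

    def step(self, symbol):
        if symbol == "AB":
            return "EXEC"  # hidden behavior; state is left unchanged
        key = (self.state, symbol)
        if key in SPEC:
            self.state, out = SPEC[key]
            return out
        return "ERR"  # default response to other unspecified inputs


def last_output(impl, word):
    """Reset the black box, feed it the word, return the final output."""
    impl.reset()
    out = None
    for symbol in word:
        out = impl.step(symbol)
    return out


def find_deviations(impl, max_len):
    """Breadth-first enumeration of input words up to max_len.

    Flags (word, output) pairs where the output contradicts a specified
    transition, or falls in an unspecified place and is not tolerated.
    Only words that stay inside the specified part are extended further.
    """
    deviations = []
    queue = deque([((), SPEC_INIT)])
    while queue:
        word, spec_state = queue.popleft()
        if len(word) == max_len:
            continue
        for symbol in INPUTS:
            new_word = word + (symbol,)
            out = last_output(impl, new_word)
            expected = SPEC.get((spec_state, symbol))
            if expected is None:
                if out not in ALLOWED_UNSPECIFIED:
                    deviations.append((new_word, out))
            elif expected[1] != out:
                deviations.append((new_word, out))
            else:
                queue.append((new_word, expected[0]))
    return deviations
```

With `max_len=2`, `find_deviations(Implementation(), 2)` reports the words `("AB",)` and `("NICK", "AB")`, both producing `"EXEC"`, while the tolerated `"ERR"` responses to other unspecified inputs are not flagged. Enumerating words grows exponentially with their length, which is one reason to learn a compact state model of the implementation instead, as the abstract describes.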