Statistical methods have long been used to detect viral code. This detection method is called spectral analysis because it works with statistical distributions, such as the frequency spectra of bytes, instructions, or system calls. Most statistical classification algorithms can be described as graphical models, namely Bayesian networks. In this paper, we first present an approach to viral detection by spectral analysis based on Bayesian networks, illustrated by two basic examples of such learning models: naive Bayes and hidden Markov models. Designing a statistical information retrieval model requires careful and thorough evaluation in order to demonstrate the superior performance of new techniques on representative program collections, and the field has developed into a highly empirical discipline. We then present information-theoretic criteria to characterize the effectiveness of spectral analysis models, and finally discuss the limits of such models.
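To make the idea of a byte-frequency spectrum concrete, the following is a minimal sketch of a multinomial naive Bayes classifier over byte counts. It is not taken from the paper: the function names (`byte_spectrum`, `train`, `classify`), the Laplace smoothing, the class labels, and the toy training samples are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def byte_spectrum(data: bytes) -> Counter:
    """Count how often each byte value (0..255) occurs in a sample."""
    return Counter(data)


def train(samples):
    """Fit a multinomial naive Bayes model on (bytes, label) pairs.

    Returns {label: (log_prior, log_likelihoods)}, where log_likelihoods
    is a list of 256 per-byte log-probabilities (Laplace-smoothed, which
    is an assumption of this sketch).
    """
    class_counts = Counter()
    byte_counts = {}
    for data, label in samples:
        class_counts[label] += 1
        byte_counts.setdefault(label, Counter()).update(byte_spectrum(data))

    total = sum(class_counts.values())
    model = {}
    for label, counts in byte_counts.items():
        n = sum(counts.values())
        log_prior = math.log(class_counts[label] / total)
        # Add-one smoothing so unseen byte values keep non-zero probability.
        log_like = [math.log((counts[b] + 1) / (n + 256)) for b in range(256)]
        model[label] = (log_prior, log_like)
    return model


def classify(model, data: bytes):
    """Return the label with the highest posterior log-score for the sample."""
    spectrum = byte_spectrum(data)

    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(c * log_like[b] for b, c in spectrum.items())

    return max(model, key=score)


# Hypothetical toy data, for illustration only (not real malware samples).
training = [
    (b"\x90\x90\x90\xcc\xcc\x90", "viral"),
    (b"\x90\xcc\x90\x90", "viral"),
    (b"hello world, plain text", "benign"),
    (b"another benign sample", "benign"),
]
model = train(training)
print(classify(model, b"\x90\x90\xcc"))
print(classify(model, b"plain"))
```

The same structure would apply to instruction or system call spectra by replacing `byte_spectrum` with a counter over the corresponding symbols; a hidden Markov model would additionally take the order of symbols into account, which this sketch ignores.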