Our contribution in this paper is twofold. First, we provide preliminary results showing that program-based anomaly detection is effective when short system call sequences are modeled together with their occurrence frequencies. Second, as a consequence, the resulting model of normal program behavior can tolerate some level of contamination in the training dataset. We describe an experimental system, Sequencegram, designed to validate these contributions. Sequencegram models short sequences of system calls as n-grams and stores them, for space efficiency, in a tree called an n-gram-tree. Every short sequence is assigned an anomaly score, based on its occurrence frequency, that represents the probability of the sequence being anomalous. Because the distribution of normal and abnormal sequences is generally assumed to be skewed, more frequently occurring sequences receive lower anomaly scores and less frequent ones receive higher scores. The anomaly scores of the individual n-grams contribute to the anomaly score of a program trace.
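The idea of an n-gram-tree with frequency-based scores can be illustrated with a minimal Python sketch. This is not the Sequencegram implementation: the class and method names (`NGramTree`, `add_trace`, `anomaly_score`, `trace_score`) are hypothetical, the scoring formula `1 / (1 + count)` is an assumed stand-in for any function that decreases with occurrence frequency, and averaging the n-gram scores is only one possible way for them to contribute to a trace score.

```python
class Node:
    """One node of the tree; a node at depth n ends an n-gram."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # system call name -> child Node
        self.count = 0      # times the n-gram ending here was seen


class NGramTree:
    """Stores system-call n-grams in a trie so shared prefixes are kept once."""

    def __init__(self, n):
        self.n = n
        self.root = Node()

    def _ngrams(self, trace):
        # Slide a window of length n over the trace.
        for i in range(len(trace) - self.n + 1):
            yield tuple(trace[i:i + self.n])

    def add_trace(self, trace):
        """Insert every n-gram of a training trace and count its occurrences."""
        for gram in self._ngrams(trace):
            node = self.root
            for call in gram:
                node = node.children.setdefault(call, Node())
            node.count += 1

    def count(self, gram):
        """Return how often the n-gram was seen in training (0 if never)."""
        node = self.root
        for call in gram:
            node = node.children.get(call)
            if node is None:
                return 0
        return node.count

    def anomaly_score(self, gram):
        # Assumed formula: unseen n-grams score 1.0, frequent ones approach 0.
        return 1.0 / (1.0 + self.count(gram))

    def trace_score(self, trace):
        # Assumed aggregation: mean of the individual n-gram scores.
        scores = [self.anomaly_score(g) for g in self._ngrams(trace)]
        return sum(scores) / len(scores) if scores else 0.0


tree = NGramTree(3)
for _ in range(2):
    tree.add_trace(["open", "read", "write", "close"])

normal_score = tree.trace_score(["open", "read", "write", "close"])
unseen_score = tree.trace_score(["open", "exec", "socket", "close"])
```

Under these assumptions, a sequence that entered the training data only a few times through contamination keeps a comparatively high score, while sequences that recur often in normal runs are pushed toward zero.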