Novelty detection is one of the primary tasks in data mining and machine learning. The task is to differentiate unseen outliers from normal patterns. Though novelty detection has been well studied for many years and has found a wide range of applications, identifying outliers is still very challenging because of the absence or scarcity of outliers. We observe three characteristics of outliers and normal patterns. First, normal patterns are usually grouped together and form clusters in high density regions of the data. Second, outliers differ markedly from the normal patterns and therefore lie far away from them. Third, the number of outliers is very small compared with the size of the dataset. Based on these observations, we can envisage that the decision boundary between outliers and normal patterns usually lies in low density regions of the data, which is known as the cluster assumption. The resulting optimization problem takes the form of a mixed integer program. We then present a cutting plane algorithm together with multiple kernel learning techniques to solve its convex relaxation. Moreover, we make use of the scarcity of outliers to find a violating solution in the cutting plane algorithm.
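The three observations above can be illustrated with a small sketch. This is not the mixed integer formulation or the cutting plane algorithm described in the abstract; it swaps in a simple k-nearest-neighbour distance score as a stand-in for local density. The function names, the parameter `outlier_fraction`, and the toy data are all assumptions made for illustration.

```python
import math


def knn_distance_scores(points, k=3):
    """Mean distance from each point to its k nearest neighbours.

    A large score means the point sits in a low density region
    (illustrative proxy only, not the paper's objective).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, outlier_fraction=0.1, k=3):
    """Return indices of the points with the lowest estimated density.

    outlier_fraction encodes the assumption that outliers are scarce.
    """
    scores = knn_distance_scores(points, k)
    n_outliers = max(1, int(round(outlier_fraction * len(points))))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_outliers])


# Hypothetical toy data: two tight clusters (normal patterns)
# and one isolated point far from both.
cluster_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
cluster_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
isolated = [(2.5, 9.0)]
data = cluster_a + cluster_b + isolated

outlier_idx = flag_outliers(data, outlier_fraction=0.1, k=3)
print(outlier_idx)
```

On this toy data the isolated point is the only one flagged, since every clustered point has close neighbours while the isolated point does not. A fixed fraction is the crudest way to use the scarcity of outliers; the abstract's approach instead uses it inside the optimization itself.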
Date of Conference: 14-16 Oct. 2011