Skip to Main Content
Anomaly detection techniques are widely used in a variety of applications, e.g., computer networks, security systems, etc. This paper describes and analyzes an approach to anomaly detection using proximity graphs and the PageRank algorithm. We run a variant of the PageRank algorithm on top of a proximity graph comprised of data points as vertices, which produces a score quantifying the extent to which each data point is anomalous. Previous work in this direction requires first forming a density estimate using the training data, e.g., using kernel methods, and this step is very computationally intensive for high-dimensional data sets. Under mild assumptions and appropriately chosen parameters, we show that PageRank produces point-wise consistent probability density estimates for the data points in an asymptotic sense, and with much less computational effort. As a result, big improvements in terms of running time are witnessed while maintaining similar detection performance. Experiments with synthetic and real-world data sets illustrate that the proposed approach is computationally tractable and scales well to large high-dimensional data sets.