This paper presents a statistical method for modeling large volumes of traffic data using Dirichlet Process Mixtures (DPM). Traffic signals are generally characterized by their spatial-temporal properties. Some of these properties are common or similar across a set of signals, while a minority of signals have characteristics inconsistent with the majority; these are termed outliers. Outlier detection aims to segment and eliminate them in order to improve signal quality, and it is widely accepted to be a non-trivial problem. Because traffic signals share a high degree of spatial-temporal similarity, both within a signal and between different types of traffic signals, traditional modeling approaches are ineffective at distinguishing these similarities and discerning the differences. This paper makes three contributions to modeling traffic data characteristics with DPM. First, a new generic statistical model for traffic data is proposed based on DPM. Second, this model achieves an outlier detection rate of 96.74% on a database of 764,027 vehicles. Third, the proposed model is scalable to the entire road network.
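For readers unfamiliar with DPM, the generic hierarchical form of a Dirichlet Process Mixture can be written as follows. This is the standard textbook formulation, not necessarily the exact model proposed in the paper. The symbols are notation assumed here for illustration: $x_i$ is an observation, $\theta_i$ its latent parameter, $F$ the component distribution, $G_0$ the base measure, and $\alpha$ the concentration parameter.

```latex
% Generic Dirichlet Process Mixture (standard form; notation assumed, not taken from the paper)
\begin{align*}
G \mid \alpha, G_0 &\sim \mathrm{DP}(\alpha, G_0), \\
\theta_i \mid G &\sim G, \\
x_i \mid \theta_i &\sim F(\theta_i), \qquad i = 1, \dots, n.
\end{align*}
```

Because a draw $G$ from a Dirichlet process is almost surely discrete, several $\theta_i$ share the same value, which groups the observations into clusters without fixing the number of clusters in advance. Under this view, an observation that falls in a very small cluster or has low predictive density could be flagged as an outlier. Whether the paper uses this particular criterion is an assumption, since the abstract does not specify it.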