In surveillance videos, tracking multiple people is of primary importance and is often a preliminary step before applying higher-level algorithms, e.g. to analyze interactions or to recognize behaviors. In this paper, we take a tracking-by-detection approach and formulate multi-person tracking as a statistical data association problem that seeks the optimal label field, in which detections belonging to the same person share the same label. Specifically, unlike most previous work, which relies on generative approaches, we use a Conditional Random Field (CRF) model whose pairwise detection factors, defined for both distance and color features, are modeled using a two-hypothesis framework: a pair of detections either corresponds to the same person or does not. The parameters of these two-hypothesis factors are learned from data in a fully unsupervised way. Optimization is conducted using a deterministic sliding window method. Qualitative and quantitative results on several surveillance datasets show that our method generates robust and accurate tracks despite occlusions and the noisy output of the human detector.
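
As a rough illustration of the two-hypothesis pairwise factor described above, the sketch below scores a candidate label field using only a distance feature. The zero-mean Gaussian likelihoods, the values of `sigma_same` and `sigma_diff`, and all function names are assumptions made for illustration and are not taken from the paper. The color factor, the unsupervised parameter learning, and the sliding window optimization are omitted.

```python
import math
from itertools import combinations


def gaussian_logpdf(x, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def pairwise_log_ratio(distance, sigma_same=10.0, sigma_diff=100.0):
    """Two-hypothesis factor (assumed form):
    log p(distance | same person) - log p(distance | different persons).
    Positive values favour giving both detections the same label."""
    return gaussian_logpdf(distance, sigma_same) - gaussian_logpdf(distance, sigma_diff)


def labeling_score(positions, labels):
    """Sum the pairwise factor over all detection pairs.

    A pair contributes +ratio when both detections share a label
    and -ratio otherwise, so a higher score means a more plausible label field.
    """
    score = 0.0
    for i, j in combinations(range(len(positions)), 2):
        d = math.dist(positions[i], positions[j])
        ratio = pairwise_log_ratio(d)
        score += ratio if labels[i] == labels[j] else -ratio
    return score


# Hypothetical detections: two near the origin, two far away.
positions = [(0.0, 0.0), (2.0, 1.0), (200.0, 200.0), (203.0, 199.0)]
grouped = labeling_score(positions, [0, 0, 1, 1])   # nearby detections share a label
crossed = labeling_score(positions, [0, 1, 0, 1])   # distant detections share a label
print(grouped > crossed)
```

Under these assumed likelihoods, the label field that groups nearby detections scores higher than one that pairs distant detections. A full tracker would search over label fields instead of comparing two hand-picked candidates.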