We propose an object detection system that uses the locations of tracked low-level feature points as input, and produces a set of independent coherent motion regions as output. As an object moves, tracked feature points on it span a coherent 3D region in the space-time volume defined by the video. In the case of multi-object motion, many possible coherent motion regions can be constructed around the set of all feature point tracks. Our approach is to identify all possible coherent motion regions, and extract the subset that maximizes an overall likelihood function while assigning each point track to at most one motion region. We solve the problem of finding the best set of coherent motion regions with a simple greedy algorithm, and show that our approach produces semantically correct detections and counts of similar objects moving through crowded scenes.
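As a rough illustration of the greedy selection step, the following minimal sketch assumes each candidate coherent motion region has already been reduced to a scalar score and the set of track IDs it covers. The `Region` type, the `greedy_select` function, the stopping rule on non-positive scores, and the toy data are all hypothetical; the abstract does not specify the likelihood function or the exact greedy procedure, so this should not be read as the authors' implementation.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (score, IDs of the point tracks in the region).
# The score stands in for the region's contribution to the overall likelihood.
Region = Tuple[float, FrozenSet[int]]


def greedy_select(candidates: List[Region]) -> List[Region]:
    """Pick regions in descending score order, skipping any region that
    would reuse a track already claimed by a previously chosen region."""
    chosen: List[Region] = []
    used: set = set()
    for score, tracks in sorted(candidates, key=lambda r: r[0], reverse=True):
        if score <= 0:
            # Assumption: a region with non-positive score does not improve
            # the overall objective, so selection stops here.
            break
        if used.isdisjoint(tracks):
            chosen.append((score, tracks))
            used |= tracks  # each track belongs to at most one region
    return chosen


# Toy example with four overlapping candidate regions.
candidates = [
    (5.0, frozenset({1, 2, 3})),
    (4.0, frozenset({3, 4})),   # conflicts with the first region on track 3
    (3.0, frozenset({4, 5})),
    (-1.0, frozenset({6})),     # never selected under the stopping rule
]
selected = greedy_select(candidates)
object_count = len(selected)
```

Under these assumptions, the number of selected regions serves as the object count, and tracks left unassigned (track 6 in the toy data) simply belong to no region. A greedy pass like this is not guaranteed to find the globally optimal subset; it trades optimality for simplicity.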