We propose a detection-free system for segmenting multiple interacting and deforming people in a video. People detectors often fail when agents interact closely, which limits the performance of detection-based tracking methods. Motion information, in turn, often fails to separate agents that move similarly or to group articulated body parts that move differently. We formulate video segmentation as graph partitioning in the trajectory domain. We classify trajectories as foreground or background based on trajectory saliencies, and use the foreground trajectories as graph nodes. We incorporate object connectedness constraints into our trajectory weight matrix based on the topology of the foreground: we set repulsive weights between trajectories that belong to different connected components in any frame of their time intersection, and attractive weights between similarly moving trajectories. Information from foreground topology complements motion information, so our spatiotemporal segments can be interpreted as connected moving entities rather than merely groups of trajectories with similar motion. All our cues are computed on trajectories and naturally encode large temporal context, which is crucial for resolving ambiguities that are local in time. We present results of our approach on challenging datasets, outperforming the state of the art by a wide margin.
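The weight construction described above can be illustrated with a minimal sketch. It assumes that each foreground trajectory stores its per-frame positions and, for every frame, the id of the foreground connected component it falls in (the hypothetical `labels` field). A pair is marked repulsive if the two trajectories lie in different components in any frame of their time intersection. Otherwise, it receives an attractive weight `exp(-d / sigma)`, where `d` is the maximum velocity difference over the shared frames. The exponential affinity, `sigma`, the max-over-time distance, and the rule that a repulsive pair gets no attraction are illustrative assumptions, not the exact formulation of this work. The final graph partitioning step is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    start: int    # index of the first frame the trajectory exists in
    points: list  # [(x, y), ...], one position per frame from `start`
    labels: list  # hypothetical: foreground connected-component id per frame

    @property
    def end(self):
        # index of the last frame the trajectory exists in
        return self.start + len(self.points) - 1


def time_intersection(a, b):
    """Frames in which both trajectories exist (empty range if disjoint)."""
    return range(max(a.start, b.start), min(a.end, b.end) + 1)


def velocity(t, f):
    """Forward-difference velocity of trajectory t at frame f (needs frame f+1)."""
    x0, y0 = t.points[f - t.start]
    x1, y1 = t.points[f + 1 - t.start]
    return (x1 - x0, y1 - y0)


def pair_weights(a, b, sigma=1.0):
    """Return (attractive_weight, is_repulsive) for a pair of trajectories."""
    frames = time_intersection(a, b)
    if len(frames) == 0:
        return 0.0, False  # no temporal overlap: no edge at all
    # Topology cue: different connected components in ANY shared frame.
    repulsive = any(a.labels[f - a.start] != b.labels[f - b.start] for f in frames)
    # Motion cue: largest velocity difference over the shared frames,
    # so a single frame of diverging motion lowers the affinity.
    motion_frames = frames[:-1]  # f+1 must also be a shared frame
    if len(motion_frames) == 0:
        return 0.0, repulsive
    d = max(math.dist(velocity(a, f), velocity(b, f)) for f in motion_frames)
    return math.exp(-d / sigma), repulsive


def build_weights(trajs, sigma=1.0):
    """Symmetric attractive and repulsive weight matrices over trajectories."""
    n = len(trajs)
    w_att = [[0.0] * n for _ in range(n)]
    w_rep = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            attr, rep = pair_weights(trajs[i], trajs[j], sigma)
            if rep:
                # assumption: topology repulsion overrides motion attraction
                w_rep[i][j] = w_rep[j][i] = 1.0
            else:
                w_att[i][j] = w_att[j][i] = attr
    return w_att, w_rep
```

For example, two trajectories moving identically inside the same component get an attractive weight of 1.0, while a third trajectory that shares their motion but falls into a different component in one overlapping frame is repelled from both. Using the topology cue in this way is what lets similarly moving but disconnected people end up in different segments.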