Skip to Main Content
We describe a framework that leverages mixed probabilistic and deterministic networks and their AND/OR search space to efficiently find and track the hands and feet of multiple interacting humans in 2D from a single camera view. Our framework detects and tracks multiple people's heads, hands, and feet through partial or full occlusion; requires few constraints (does not require multiple views, high image resolution, knowledge of performed activities, or large training sets); and makes use of constraints and AND/OR Branch-and-Bound with lazy evaluation and carefully computed bounds to efficiently solve the complex network that results from the consideration of interperson occlusion. Our main contributions are: 1) a multiperson part-based formulation that emphasizes extremities and allows for the globally optimal solution to be obtained in each frame, and 2) an efficient and exact optimization scheme that relies on AND/OR Branch-and-Bound, lazy factor evaluation, and factor cost sensitive bound computation. We demonstrate our approach on three datasets: the public single person HumanEva dataset, outdoor sequences where multiple people interact in a group meeting scenario, and outdoor one-on-one basketball videos. The first dataset demonstrates that our framework achieves state-of-the-art performance in the single person setting, while the last two demonstrate robustness in the presence of partial and full occlusion and fast nontrivial motion.