We present an algorithm to estimate the body pose of a walking person given synchronized video input from multiple uncalibrated cameras. We construct an appearance model of human walking motion by generating examples from the space of body poses and camera locations, and clustering them using expectation-maximization. Given a segmented input video sequence, we find the closest matching appearance cluster for each silhouette and use the sequence of matched clusters to extrapolate the position of the camera with respect to the person's direction of motion. For each frame, the matching cluster also provides an estimate of the walking phase. We combine these estimates from all views and find the most likely sequence of walking poses using a cyclical, feed-forward hidden Markov model. Our algorithm requires no manual initialization and no prior knowledge about the locations of the cameras.
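The cyclical, feed-forward HMM mentioned above can be sketched as a Viterbi decode over walking-phase states, where each state may only persist or advance to the next phase of the gait cycle (wrapping around at the end). The sketch below is illustrative, not the paper's implementation: the function name `viterbi_cyclic`, the self-transition probability `stay`, and the per-frame cluster-match log-likelihoods `obs_loglik` are all assumptions introduced for this example.

```python
import math

def viterbi_cyclic(obs_loglik, n_states, stay=0.6):
    """Viterbi decoding in a cyclical feed-forward HMM (a sketch).

    Each walking-phase state may only self-loop or advance to the
    next phase modulo n_states, modeling a repeating gait cycle.
    obs_loglik[t][s] is the log-likelihood of frame t's silhouette
    under phase cluster s (hypothetical input for illustration).
    """
    T = len(obs_loglik)
    log_stay = math.log(stay)
    log_adv = math.log(1.0 - stay)

    # Uniform prior over phases at the first frame.
    delta = [obs_loglik[0][s] - math.log(n_states) for s in range(n_states)]
    back = []

    for t in range(1, T):
        new, ptr = [], []
        for s in range(n_states):
            prev = (s - 1) % n_states  # only predecessor allowed besides s
            cand_stay = delta[s] + log_stay
            cand_adv = delta[prev] + log_adv
            if cand_stay >= cand_adv:
                new.append(cand_stay + obs_loglik[t][s])
                ptr.append(s)
            else:
                new.append(cand_adv + obs_loglik[t][s])
                ptr.append(prev)
        delta = new
        back.append(ptr)

    # Backtrack the most likely phase sequence.
    s = max(range(n_states), key=lambda k: delta[k])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]
```

For example, with three phase states and observations that strongly favor phases 0, 0, 1, 2 in turn, the decoder recovers that monotone phase progression; transitions that skip a phase or move backward are ruled out by the feed-forward structure.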