Skip to Main Content
Recognizing humans, estimating their pose and segmenting their body parts are key to high-level image understanding. Because humans are highly articulated, the range of deformations they undergo makes this task extremely challenging. Previous methods have focused largely on heuristics or pairwise part models in approaching this problem. We propose a bottom-up parsing of increasingly more complete partial body masks guided by a parse tree. At each level of the parsing process, we evaluate the partial body masks directly via shape matching with exemplars, without regard to how the parses are formed. The body is evaluated as a whole, not the sum of its constituent parses, unlike previous approaches. Multiple image segmentations are included at each of the levels of the parsing, to augment existing parses or to introduce ones. Our method yields both a pose estimate as well as a segmentation of the human. We demonstrate competitive results on this challenging task with relatively few training examples on a dataset of baseball players with wide pose variation. Our method is comparatively simple and could be easily extended to other objects.