We propose an image-based shape model for view-invariant human motion recognition. An image-based visual hull, computed from a set of silhouettes, explicitly represents the 3D shape of an object. We in turn use the set of silhouettes to represent the visual hull implicitly. Because a silhouette is the 2D projection of a 3D object with respect to a particular camera, and is therefore sensitive to the viewpoint, our multi-silhouette representation of the visual hull requires correspondence between views. To guarantee this correspondence, we define a canonical multi-camera system and a canonical human body orientation in motions. We then "normalize" all the constructed visual hulls into the canonical multi-camera system, align them to follow the canonical orientation, and finally render them. The rendered views thereby satisfy the correspondence requirement. In our representation of the visual hull, each silhouette is represented by a fixed number of points sampled on its closed contour, so the 3D shape information is implicitly encoded in the concatenation of multiple 2D contours. Each motion class is then learned by a Hidden Markov Model (HMM) with mixture-of-Gaussians outputs. Experiments with our algorithm on some data sets give encouraging results.
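The contour-based encoding described above can be illustrated with a minimal sketch. The abstract does not state how the points are sampled, so uniform spacing by arc length is an assumption here, and the function names `resample_closed_contour` and `multi_view_feature` are hypothetical. The sketch also assumes that the contours already come from corresponding canonical views and share a consistent starting point and traversal direction.

```python
import numpy as np


def resample_closed_contour(points, n_samples):
    """Resample a closed 2D contour to n_samples points.

    Assumption: points are spaced uniformly by arc length; the source
    does not specify the sampling scheme.
    """
    pts = np.asarray(points, dtype=float)
    # Close the polygon by repeating the first vertex at the end.
    closed = np.vstack([pts, pts[:1]])
    # Cumulative arc length at every vertex of the closed polygon.
    seg_len = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])
    # Target arc-length positions, excluding the duplicated end point.
    targets = np.linspace(0.0, cum_len[-1], n_samples, endpoint=False)
    x = np.interp(targets, cum_len, closed[:, 0])
    y = np.interp(targets, cum_len, closed[:, 1])
    return np.column_stack([x, y])


def multi_view_feature(contours, n_samples=32):
    """Concatenate the resampled contours of all views into one vector.

    The list order must be the same for every frame, so that each block
    of the vector always refers to the same canonical view.
    """
    return np.concatenate(
        [resample_closed_contour(c, n_samples).ravel() for c in contours]
    )


# Usage: three hypothetical views of a unit-square silhouette.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
feature = multi_view_feature([square, square, square], n_samples=8)
```

With K views and N sampled points per contour, each frame yields a vector of length 2KN. A sequence of such vectors is the kind of observation sequence that a per-class HMM with mixture-of-Gaussians outputs could be trained on.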