We present an effective real-time approach for automatically estimating 3D human body poses from monocular video sequences. In this approach, the human body is automatically detected in the video sequence; then image features such as silhouette, edges, and color are extracted and integrated to infer 3D human poses by iteratively minimizing a cost function defined between 2D features derived from the projected 3D model and those extracted from the video sequence. In addition, the 2D locations of the head, hands, and feet are tracked to facilitate 3D tracking. When a tracking failure occurs, the approach detects it and recovers quickly. Finally, the efficiency and robustness of the proposed approach are demonstrated in two real applications: human event detection and video gaming.
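The core of the described approach is an iterative fit: project the current 3D model into the image, compare its 2D features against those extracted from the frame, and update the pose to reduce the mismatch. The following is a minimal sketch of that loop, not the paper's actual method: it uses a toy two-joint planar "body" in place of a full 3D human model, treats observed joint positions as the extracted image features, and minimizes a squared-distance cost by numerical gradient descent. All function and parameter names (`project`, `cost`, `estimate_pose`, `step`, `iters`) are illustrative assumptions.

```python
import numpy as np

def project(pose, length=1.0):
    # Toy "projection": map the two joint angles of a planar 2-link limb
    # to 2D joint positions. Stands in for projecting a full articulated
    # 3D body model into the image plane.
    x1 = length * np.cos(pose[0])
    y1 = length * np.sin(pose[0])
    x2 = x1 + length * np.cos(pose[0] + pose[1])
    y2 = y1 + length * np.sin(pose[0] + pose[1])
    return np.array([x1, y1, x2, y2])

def cost(pose, observed):
    # Squared distance between features of the projected model and the
    # features extracted from the video frame (here, 2D joint positions).
    return float(np.sum((project(pose) - observed) ** 2))

def estimate_pose(observed, init, iters=500, step=0.1, eps=1e-5):
    # Iteratively minimize the cost with a finite-difference gradient,
    # mirroring the "iteratively minimizing the cost function" step.
    pose = np.asarray(init, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros_like(pose)
        for i in range(len(pose)):
            bumped = pose.copy()
            bumped[i] += eps
            grad[i] = (cost(bumped, observed) - cost(pose, observed)) / eps
        pose -= step * grad
    return pose
```

In practice each video frame would seed the optimizer with the pose from the previous frame, which is what makes per-frame refinement fast enough for real-time tracking.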