Skip to Main Content
In this paper we propose a novel appearance descriptor for 3D human pose estimation from monocular images using a learning-based technique. Our image-descriptor is based on the intermediate local appearance descriptors that we design to encapsulate local appearance context and to be resilient to noise. We encode the image by the histogram of such local appearance context descriptors computed in an image to obtain the final image-descriptor for pose estimation. We name the final image-descriptor the histogram of local appearance context (HLAC). We then use relevance vector machine (RVM) regression to learn the direct mapping between the proposed HLAC image-descriptor space and the 3D pose space. Given a test image, we first compute the HLAC descriptor and then input it to the trained regressor to obtain the final output pose in real time. We compared our approach with other methods using a synchronized video and 3D motion dataset. We compared our proposed HLAC image-descriptor with the Histogram of shape context and histogram of SIFT like descriptors. The evaluation results show that HLAC descriptor outperforms both of them in the context of 3D Human pose estimation.