
Gesture Recognition using Hidden Markov Models from Fragmented Observations

Authors: Ruiduo Yang; S. Sarkar (University of South Florida)

Abstract:

We consider the problem of computing the likelihood of a gesture from regular, unaided video sequences, without relying on perfect segmentation of the scene. Instead of requiring that low- and mid-level processes produce near-perfect segmentation of relevant body parts such as hands, we take into account that such processes can produce only uncertain information: the hands may be detected only as fragmented regions amid clutter. To address this problem, we propose an extension of the HMM formalism, which we call the frag-HMM, that allows reasoning over fragmented observations via an intermediate grouping process. In this formulation, we do not match the frag-HMM to one observation sequence, but rather to a sequence of observation sets, where each observation set is a collection of groups of fragmented observations. Based on the developed model, we show how to perform three kinds of computations. The first is to decide on the best observation group for each frame, given the sequence of observation groups for the past frames; this allows us to incrementally compute the best segmentation of the hand for each frame, given the model. The second is the computation of the likelihood of a sequence, averaged over all possible state sequences and possible groupings. The third is the computation of the likelihood of a sequence, maximized over all possible state sequences and group sequences; this also yields the best possible grouping for each frame. We demonstrate our ideas using a publicly available hand gesture dataset that spans multiple subjects, features complex backgrounds, and involves hand occlusions. The recognition performance is within 2% of that obtained with manually segmented hands and about 10% better than that obtained with segmentations that use prior knowledge of the hand color.
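To make the second and third computations concrete, here is a minimal illustrative sketch, not the paper's actual formulation: a standard HMM forward pass extended so that each frame contributes a set of candidate observation groups rather than a single observation. The emission model `emit` and the uniform (unweighted) treatment of groups are assumptions for illustration; the real frag-HMM weights groupings through its intermediate grouping process. Passing `reduce=np.sum` sums over groupings (the paper's second computation, up to group weighting), while `reduce=np.max` maximizes over state and group sequences (the third computation).

```python
import numpy as np

def frag_forward(pi, A, emit, obs_groups, reduce=np.sum):
    """Forward pass over sequences of observation *sets* (illustrative).

    pi         : (N,) initial state probabilities
    A          : (N, N) transitions, A[i, j] = P(state j | state i)
    emit       : emit(j, g) -> likelihood of observation group g in state j
                 (stand-in for the paper's emission model -- an assumption)
    obs_groups : obs_groups[t] is the list of candidate groupings of the
                 fragmented observations at frame t
    reduce     : np.sum -> likelihood summed over states and groupings;
                 np.max -> likelihood maximized over states and groupings
    """
    N = len(pi)
    # Per-state "emission" at a frame: reduce over that frame's groups.
    frame_b = lambda groups: np.array(
        [reduce([emit(j, g) for g in groups]) for j in range(N)])

    alpha = pi * frame_b(obs_groups[0])
    for groups in obs_groups[1:]:
        if reduce is np.max:
            # Viterbi-style: best predecessor state per state.
            alpha = (alpha[:, None] * A).max(axis=0) * frame_b(groups)
        else:
            # Forward-style: marginalize over predecessor states.
            alpha = (alpha @ A) * frame_b(groups)
    return reduce(alpha)
```

With a single group per frame this reduces exactly to the classical forward (or Viterbi) recursion, which is one way to sanity-check the sketch.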

Published in:

2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)  (Volume:1 )

Date of Conference:

17-22 June 2006