This paper discusses the use of Scale-Invariant Feature Transform (SIFT) features for bare-hand gesture recognition. In the training stage, the SIFT keypoints of the training images cannot be used directly with a multi-class Support Vector Machine (SVM) to build a classifier model, because the keypoint sets extracted from different training images, each of which contains only the hand gesture, do not share a common dimensionality. The bag-of-features model is therefore introduced. After the keypoints of every training image are extracted with the SIFT algorithm, a vector quantization technique is used to unify them: following K-means clustering, the keypoints extracted from each training image are mapped into a histogram vector of fixed dimension (a bag-of-words vector). This histogram is the input vector used to train the multi-class SVM classifier model. In the testing stage, keypoints are extracted from every image captured from the webcam and fed into the cluster model, which maps them to a single bag-of-words vector; this vector is finally fed into the trained multi-class SVM classifier model to recognize the hand gesture.
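The quantization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes toy 2-D descriptors in place of real 128-D SIFT descriptors, a hypothetical farthest-point initialization for K-means (the abstract does not state one), and it omits both SIFT extraction and SVM training. The function names are illustrative only. The point it shows is that images with different numbers of keypoints end up as histograms of the same length.

```python
# Toy bag-of-features quantization in pure Python (assumptions: 2-D
# descriptors instead of 128-D SIFT descriptors, farthest-point K-means
# initialization; neither detail comes from the paper).

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, d):
    """Index of the cluster center closest to descriptor d."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], d))

def init_centers(descriptors, k):
    """Deterministic start: first descriptor, then repeatedly the
    descriptor farthest from all centers chosen so far."""
    centers = [list(descriptors[0])]
    while len(centers) < k:
        far = max(descriptors,
                  key=lambda d: min(sq_dist(d, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(descriptors, k, iters=20):
    """Plain K-means over the pooled descriptors of all training images."""
    centers = init_centers(descriptors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bag_of_words(descriptors, centers):
    """Map a variable-size descriptor set to a normalized histogram
    whose length equals the number of clusters."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(centers, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Two "images" with different numbers of keypoints.
image_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
image_b = [(5.1, 4.9), (4.9, 5.0), (0.0, 0.0), (5.0, 5.0), (0.1, 0.1)]

centers = kmeans(image_a + image_b, k=2)
hist_a = bag_of_words(image_a, centers)
hist_b = bag_of_words(image_b, centers)
print(len(hist_a), len(hist_b))  # both histograms have length k
```

In a full pipeline, `hist_a` and `hist_b` would be the fixed-length input vectors passed to the multi-class SVM, and a webcam frame at test time would go through `bag_of_words` with the same `centers`.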
Published in:
Haptic Audio-Visual Environments and Games (HAVE), 2010 IEEE International Symposium on
Date of Conference: 16-17 Oct. 2010