Abstract:
This work develops a continuous sign language (SL) recognition framework with deep neural networks, which directly transcribes videos of SL sentences to sequences of ordered gloss labels. Previous methods for continuous SL recognition usually employ hidden Markov models, which have limited capacity to capture temporal information. In contrast, our proposed architecture adopts deep convolutional neural networks with stacked temporal fusion layers as the feature extraction module, and bidirectional recurrent neural networks as the sequence learning module. We propose an iterative optimization process for our architecture to fully exploit the representation capability of deep neural networks with limited data. We first train the end-to-end recognition model to produce alignment proposals, and then use these alignment proposals as strong supervisory information to directly tune the feature extraction module. This training process can be run iteratively to further improve recognition performance. We further contribute by exploring the multimodal fusion of RGB images and optical flow in sign language. Our method is evaluated on two challenging SL recognition benchmarks, and outperforms the state of the art with a relative improvement of more than 15% on both databases.
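The abstract does not say how an alignment proposal is derived from the end-to-end model. As a minimal sketch only, one plausible reading is a monotonic forced alignment: given frame-wise gloss log-probabilities and the ordered gloss labels of a sentence, pick the most probable assignment of one gloss per frame that preserves the gloss order. The function name `align_frames_to_glosses`, the input layout, and the dynamic-programming formulation are assumptions made for illustration, not the authors' procedure.

```python
import math


def align_frames_to_glosses(log_probs, glosses):
    """Hypothetical helper: monotonic forced alignment by dynamic programming.

    log_probs[t][g] is the log-probability of gloss id g at frame t.
    glosses is the ordered list of gloss ids for the sentence.
    Returns one gloss id per frame; every gloss covers at least one frame
    and the gloss order is preserved.
    """
    num_frames, num_glosses = len(log_probs), len(glosses)
    if num_glosses == 0 or num_frames < num_glosses:
        raise ValueError("need at least one frame per gloss")

    neg_inf = -math.inf
    # score[t][n]: best log-score of frames 0..t with frame t assigned to gloss n
    score = [[neg_inf] * num_glosses for _ in range(num_frames)]
    # back[t][n]: gloss index chosen at frame t-1 on the best path
    back = [[0] * num_glosses for _ in range(num_frames)]
    score[0][0] = log_probs[0][glosses[0]]

    for t in range(1, num_frames):
        for n in range(min(t + 1, num_glosses)):
            stay = score[t - 1][n]                          # same gloss continues
            move = score[t - 1][n - 1] if n > 0 else neg_inf  # next gloss starts
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            if best > neg_inf:
                score[t][n] = best + log_probs[t][glosses[n]]

    # Backtrack from the last frame, which must belong to the last gloss.
    n = num_glosses - 1
    frame_labels = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        frame_labels[t] = glosses[n]
        n = back[t][n]
    return frame_labels


# Toy example (made-up numbers): 4 frames, 3 gloss classes, sentence = [2, 0].
probs = [
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
]
toy_log_probs = [[math.log(p) for p in row] for row in probs]
print(align_frames_to_glosses(toy_log_probs, [2, 0]))
```

Under this reading, the per-frame labels returned by such an alignment would play the role of the strong supervisory signal used to tune the feature extraction module, and the alignment could be recomputed after each training round.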
Published in: IEEE Transactions on Multimedia (Volume: 21, Issue: 7, July 2019)
IEEE Keywords (Index Terms):
- Training Iterations
- Sign Language
- Sign Language Recognition
- Continuous Recognition
- Continuous Sign
- Continuous Sign Language Recognition
- Neural Network
- Convolutional Network
- Convolutional Neural Network
- Deep Network
- Deep Neural Network
- Hidden Markov Model
- Recurrent Neural Network
- Deep Convolutional Neural Network
- Deep Convolutional Network
- RGB Images
- Recognition Performance
- Optical Flow
- Sequence Learning
- Bidirectional Recurrent Neural Network
- Word Error Rate
- Deep Neural Architecture
- Bidirectional Long Short-term Memory
- Gesture Recognition
- Deep Architecture
- Gestures
- Spatiotemporal Representation
- Suitable Structure
- Temporal Convolution
- Recognition System