Abstract:
Speech recognition may be an intuitive process for humans, but making computers recognize speech automatically turns out to be daunting. Although recent progress in speech recognition for other languages has been very promising, Bengali lacks comparable progress, and very little research on Bengali speech recognizers has been published. In this paper, we investigate a long short-term memory (LSTM) recurrent neural network approach to recognizing individual Bengali words. We divided each word into a number of frames, each containing 13 mel-frequency cepstral coefficients (MFCCs), which provide a useful set of distinctive features. We trained a deep LSTM model on these frames to recognize the most plausible phonemes. The final layer of our deep model is a softmax layer with as many units as there are phonemes. We picked the most probable phoneme for each time frame. Finally, we passed these phonemes through a filter that outputs individual words. Our system achieves a word detection error rate of 13.2% and a phoneme detection error rate of 28.7% on the Bangla-Real-Number audio dataset.
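The abstract states only that the most probable phoneme is picked for each frame and that the resulting sequence passes through a "filter" that yields a word; the filter itself is not described. The following is a minimal Python sketch under stated assumptions: the filter is taken to collapse repeated frame labels, drop a silence label, and choose the lexicon word whose pronunciation is nearest by edit distance. The phoneme inventory, the `sil` label, the romanized lexicon entries, the fake softmax outputs, and every function name are hypothetical illustrations, not taken from the paper. The sketch also shows one common way to compute phoneme and word error rates.

```python
from typing import Dict, List, Sequence

# Hypothetical phoneme inventory and silence label (not the paper's actual set).
PHONEMES = ["sil", "e", "k", "d", "u", "i", "t", "n"]
BLANK = "sil"

# Hypothetical romanized lexicon for a few Bengali numbers.
LEXICON: Dict[str, List[str]] = {
    "ek": ["e", "k"],
    "dui": ["d", "u", "i"],
    "tin": ["t", "i", "n"],
}

def greedy_phonemes(frame_probs: Sequence[Sequence[float]], phonemes: Sequence[str]) -> List[str]:
    """Pick the most probable phoneme in each frame (argmax of the softmax output)."""
    return [phonemes[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]

def collapse(frame_labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Merge consecutive repeated labels and drop the assumed silence label."""
    out: List[str] = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance; substitution, insertion and deletion each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def filter_to_word(phones: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Assumed filter: return the word whose pronunciation is closest to `phones`."""
    return min(lexicon, key=lambda w: edit_distance(phones, lexicon[w]))

def phoneme_error_rate(refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]) -> float:
    """Total edit distance divided by the total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)

def word_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """For isolated words: the fraction of utterances recognized as the wrong word."""
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)

def fake_frame(label: str) -> List[float]:
    """Hypothetical softmax output peaked at `label` (stands in for the LSTM)."""
    return [0.9 if p == label else 0.1 / (len(PHONEMES) - 1) for p in PHONEMES]

if __name__ == "__main__":
    probs = [fake_frame(x) for x in ["sil", "e", "e", "k", "k", "sil"]]
    phones = collapse(greedy_phonemes(probs, PHONEMES))
    print(phones, filter_to_word(phones, LEXICON))
    # A noisy decode with an extra "t" still maps to the nearest word.
    print(filter_to_word(["d", "u", "t", "i"], LEXICON))
```

Running the script should print `['e', 'k'] ek` followed by `dui`. Nearest-word matching means a phoneme error does not necessarily become a word error, which is consistent with a word error rate lower than the phoneme error rate, though the paper's actual filter may work differently.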
Date of Conference: 22-24 December 2017
Date Added to IEEE Xplore: 08 February 2018