Skip to Main Content
Continuous, real-time speech recognition is required for various mobile and hands-free applications. In this paper, hardware implementation of real-time speech recognition system is proposed using two approaches and their performances are evaluated. The first approach uses Mel Filter Banks with Mel Frequency Cepstrum Coefficients (MFCC) as feature input and the second approach uses Cochlear Filter Banks with Zero-crossings (ZC) as feature input for recognition. The features extracted from input speech are fed to multi-class Support Vector Machine (SVM) classifier for recognition. The proposed recognition systems are implemented on a Texas Instruments TMS320C6713 floating point digital signal processor for recognizing isolated digits (0-9) and their performances are compared. It is observed that the program memory required for MFCC feature extraction is 44.42% higher than that required for feature extraction using Cochlear filters. Recognition accuracies of 93.33% and 98.67% are achieved for feature inputs from Mel filter banks and Cochlear filter banks respectively. It is also observed that the computational complexity of feature extraction using cochlear filters is 1.53 times of that required for MFCC feature extraction. The recognition performance is also studied for different combinations of test and training utterances. It is found that training using 15 utterances of each digit results in best recognition accuracy. The techniques proposed here can be adapted for various other hands-free consumer applications such as washing machines, hands-free cordless and many more.
Date of Conference: 2-7 Jan. 2011