We present several methods for fitting the requirements of a speech recognition system to the resources of a cellular phone. The proposed techniques are evaluated on a digit recognition task using both French and English corpora. We investigate three aspects of speech processing in particular: acoustic parameterization, recognition algorithms, and acoustic modeling. Several parameterization algorithms (LPCC, MFCC and PLP) are compared to the linear predictive coding (LPC) included in the GSM standard. The MFCC and PLP parameterization algorithms perform significantly better than the others. Moreover, the feature vector size can be reduced to 6 PLP coefficients, which decreases memory and computation requirements without a significant loss of performance. To achieve good performance with reasonable resource needs, we develop several methods for embedding a classical HMM-based speech recognition system in a cellular phone. We first propose the automatic on-line building of a phonetic lexicon, which allows a minimal but unlimited lexicon. We then reduce the HMM complexity by decreasing the number of Gaussian components per state. Finally, we evaluate our proposals by comparing dynamic time warping (DTW) with our HMM system, in the cellular phone context, under clean conditions. The experiments show that our HMM system outperforms DTW for speaker-independent tasks and allows more practical applications for the cellular phone user interface.
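For readers unfamiliar with the DTW baseline mentioned above, the following is a minimal sketch of textbook dynamic time warping between two sequences of feature vectors. It is not the system evaluated in the paper: the function name `dtw_distance`, the Euclidean local distance, and the unconstrained three-way step pattern are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Illustrative assumptions: Euclidean local distance and the basic
    (insert, delete, match) step pattern with no path constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against b[j - 1]
                cost[i][j - 1],      # frame of b repeated against a[i - 1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

In a template-based recognizer of this kind, an utterance would typically be labeled with the stored template that yields the smallest `dtw_distance`. This requires keeping at least one template per word, which is one reason such systems are usually speaker dependent.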
Date of Conference: 17-21 May 2004