A low-cost, robust Mandarin speech recognition system is investigated for embedded car navigation applications. In the front end, the log-spectral minimum mean-square error (LogMMSE) estimation algorithm is applied to suppress background noise, and a piecewise linear function approximates the traditional Taylor expansion in the gain function calculation to reduce computational complexity. After speech enhancement, spectral smoothing is applied along both the time and frequency axes, using geometric-sequence weights, to further compensate for spectral components distorted by noise over-reduction. In acoustic model training, an immunity learning scheme is applied, in which pre-recorded car noise is artificially added to clean training utterances to simulate the in-car environment. A particular difficulty in Mandarin speech recognition is the diversity of Chinese dialects: pronunciation differences among accents degrade recognition performance when the acoustic models are trained on a database with a mismatched accent. We propose training the models on multiple accented Mandarin databases to address this problem. Evaluation results on isolated phrase recognition confirm the effectiveness of the proposed techniques.
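The noise-addition step of immunity learning can be illustrated with a minimal sketch. The abstract does not state the signal-to-noise ratios used or how the noise is scaled, so the function name `mix_at_snr`, the `snr_db` parameter, and the power-based scaling below are assumptions for illustration, not the authors' procedure.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean samples, scaled so the mix has the requested SNR (dB).

    Hypothetical helper: the scaling rule is an assumption, not taken
    from the paper.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Loop the noise recording so that it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in tiled) / len(tiled)
    if noise_power == 0.0:
        raise ValueError("noise is silent")
    # Pick gain g so that clean_power / (g^2 * noise_power) = 10^(snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, tiled)]


# Usage with synthetic stand-ins for a clean utterance and a car-noise recording.
rng = random.Random(0)
clean = [math.sin(0.1 * i) for i in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

In a training pipeline, a function of this kind would presumably be applied to each clean utterance at one or more SNR levels before feature extraction, so that the acoustic models see noise conditions closer to those in the car.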