Abstract:
Automatic speech recognition (ASR) suffers severe performance degradation on low-resource languages with limited training data. In this work, we propose a series of training strategies for more effective data utilization in low-resource speech recognition. In such scenarios, multilingual pretraining is particularly helpful, and we exploit relationships among different languages to improve it. The knowledge extracted from a language classifier is then used to weight training samples, biasing the model toward the target low-resource language. In addition, we design dynamic curriculum learning as a warm-up strategy and length perturbation as a data augmentation. Together, these three methods form an improved training strategy for low-resource speech recognition. We evaluate the proposed strategies by pretraining (PT) the model on rich-resource languages and finetuning (FT) it on the target language with limited data. Experimental results on the CommonVoice dataset show that, compared with the commonly used multilingual PT+FT method, the proposed strategies achieve a relative 15-25% reduction in word error rate across different target languages, demonstrating the significant effect of the proposed data utilization strategy.
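The abstract describes weighting training samples by knowledge from a language classifier so that the loss is biased toward the target low-resource language. A minimal sketch of one plausible realization (the function name, the posterior-based weighting formula, and the `floor` parameter are illustrative assumptions, not details taken from the paper):

```python
# Hypothetical sketch of classifier-based data weighting: each sample's
# loss is scaled by a language classifier's posterior probability for the
# target low-resource language, with a floor so that samples from other
# languages still contribute a little. This is an assumed formulation,
# not the paper's exact method.

def weighted_loss(per_sample_losses, lang_posteriors, target_lang, floor=0.1):
    """Weight each sample's loss by the classifier posterior for the
    target language and return the weighted average."""
    total, norm = 0.0, 0.0
    for loss, posteriors in zip(per_sample_losses, lang_posteriors):
        w = max(posteriors.get(target_lang, 0.0), floor)
        total += w * loss
        norm += w
    return total / norm if norm else 0.0

# Toy batch: the first sample is classified as mostly target language
# ("cy" here stands in for a low-resource target), the second is not.
losses = [2.0, 4.0]
posteriors = [{"cy": 0.9, "en": 0.1}, {"cy": 0.2, "en": 0.8}]
print(weighted_loss(losses, posteriors, "cy"))  # weighted toward sample 1
```

Under this weighting, batches dominated by target-like speech drive larger updates, which matches the abstract's goal of making the model "more biased towards the target low-resource language" during finetuning.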
Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022