A key challenge in rapidly building Tibetan speech recognition applications is minimizing the manual effort required to transcribe and label speech data. Accurately labeling Tibetan speech utterances is extremely time-consuming and requires trained linguists. To alleviate this problem, we present an approach that aims to reduce the amount of manually transcribed speech data required for building automatic speech recognition (ASR) models. Experimental results show that, when only a few labeled Tibetan speech utterances are available, our approach outperforms traditional methods based on semi-supervised and supervised learning.