
Cross-Lingual Language Modeling for Low-Resource Speech Recognition

Authors: Ping Xu and Pascale Fung, The Hong Kong University of Science and Technology, Kowloon, Hong Kong

Abstract: This paper proposes using cross-lingual language modeling with syntactic information for low-resource speech recognition. We propose phrase-level transduction and syntactic reordering for transcribing a resource-poor language and translating it into a resource-rich language, if necessary. The phrase-level transduction is capable of performing n-to-m cross-lingual transduction. The syntactic reordering serves to model the syntactic discrepancies between the source and target languages. Our purpose is to leverage the statistics of a resource-rich language model to improve the language model of a resource-poor language, and at the same time to improve low-resource speech recognition performance. We implement our cross-lingual language model using weighted finite-state transducers (WFSTs) and integrate it into a WFST-based speech recognition search space to output transcriptions in both the resource-poor and resource-rich languages. This creates an integrated speech transcription and translation framework. Evaluations on Cantonese speech transcription and Cantonese-to-standard-Chinese translation tasks show that our proposed approach improves system performance significantly, with up to a 12.5% relative character error rate (CER) reduction over baseline language model interpolation, and a 6.6% relative CER reduction and an 18.5% relative BLEU score improvement over the best word-level transduction approach.
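To make the phrase-level n-to-m transduction idea concrete, here is a minimal toy sketch (not the paper's WFST implementation): a hand-invented phrase table maps source phrases to target phrases of possibly different lengths, with weights as negative log probabilities, and a Viterbi-style dynamic program picks the lowest-weight phrase segmentation. All phrase entries and weights below are hypothetical placeholders; a real system would compose WFSTs (e.g. with OpenFst) inside the recognition search space.

```python
# Hypothetical toy phrase table: source phrases -> (target phrase, weight).
# Weights are negative log probabilities; phrases may differ in length,
# illustrating n-to-m transduction. All entries are invented placeholders.
PHRASE_TABLE = {
    ("sik", "zo"): [(("chi", "le"), 0.2)],                 # 2-to-2
    ("faan",): [(("fan",), 0.1)],                          # 1-to-1
    ("mei", "aa"): [(("le", "ma"), 0.4), (("ma",), 0.9)],  # 2-to-2 vs 2-to-1
}

def transduce(tokens):
    """Viterbi-style search over phrase segmentations.

    Returns (total weight, target token tuple) for the lowest-weight
    full cover of `tokens`, or None if no segmentation covers it.
    """
    n = len(tokens)
    best = [None] * (n + 1)   # best[i] = best (cost, target) covering tokens[:i]
    best[0] = (0.0, ())
    for i in range(n):
        if best[i] is None:
            continue
        for j in range(i + 1, n + 1):
            src = tuple(tokens[i:j])
            for tgt, w in PHRASE_TABLE.get(src, []):
                cand = (best[i][0] + w, best[i][1] + tgt)
                if best[j] is None or cand[0] < best[j][0]:
                    best[j] = cand
    return best[n]

result = transduce(["sik", "zo", "faan", "mei", "aa"])
print(result)  # lowest-weight target sequence and its total weight
```

The dynamic program plays the role that shortest-path search over the composed WFSTs plays in the paper's framework: among all ways of segmenting the source into known phrases, it keeps the cheapest target hypothesis.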

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 21, Issue: 6)