N-gram language modeling typically requires large quantities of in-domain training data, i.e., data that matches the task in both topic and style. For conversational speech applications, particularly meeting transcription, obtaining large volumes of speech transcripts is often unrealistic; topics change frequently and collecting conversational-style training data is time-consuming and expensive. In particular, new topics introduce new vocabulary items which are not included in existing models. In this work, we use a variety of data sources (reflecting different sizes and styles), combined using mixture n-gram models. We study the impact of the different sources on vocabulary expansion and recognition accuracy, and investigate possible indicators of the usefulness of a data source. For the task of recognizing meeting speech, we obtain a 9% relative reduction in the overall word error rate and a 61% relative reduction in the word error rate for "new" words added to the vocabulary over a baseline language model trained from general conversational speech data.
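The core idea of combining data sources via mixture n-gram models is linear interpolation: each source contributes a component model, and their conditional probabilities are averaged with mixture weights. A minimal sketch follows; the corpora, function names, and fixed weights are illustrative assumptions, not taken from the paper.

```python
# Sketch of a mixture (linearly interpolated) bigram model combining
# multiple data sources. Toy corpora and fixed weights are hypothetical;
# in practice weights are tuned on held-out in-domain data.

from collections import defaultdict

def train_bigram(corpus):
    """Estimate conditional bigram probabilities P(w | prev) from a corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            counts[prev][w] += 1
    return {prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
            for prev, nxt in counts.items()}

def mixture_prob(models, weights, prev, w):
    """P(w | prev) under a linear mixture of component bigram models."""
    return sum(lam * m.get(prev, {}).get(w, 0.0)
               for lam, m in zip(weights, models))

# Two hypothetical sources: general conversational text and a small
# in-domain (meeting-style) sample that introduces new vocabulary.
general = train_bigram(["we should meet tomorrow", "we should talk"])
in_domain = train_bigram(["action items for the meeting",
                          "meeting agenda items"])

# "meeting" is unseen after "the" in the general source, but the
# in-domain component gives it probability mass through the mixture.
p = mixture_prob([general, in_domain], [0.7, 0.3], "the", "meeting")
```

Weights would typically be estimated by minimizing perplexity on held-out in-domain data (e.g., with EM), so that small but well-matched sources can receive large weights despite their size.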