We propose a latent Dirichlet-tree allocation (LDTA) model, a correlated latent semantic model, for unsupervised language model adaptation. The LDTA model extends the latent Dirichlet allocation (LDA) model by replacing the Dirichlet prior over topic proportions with a Dirichlet-tree prior. Latent topics under the same subtree are expected to be more correlated than topics under different subtrees. The LDTA model reduces to the LDA model when the Dirichlet-tree has depth one, and it fits within the variational Bayes inference framework employed for the LDA model. Empirical results show that the LDTA model converges faster during training than the LDA model when both start from the same initial flat model. Experimental results show that the LDTA-adapted LM outperformed the LDA-adapted LM on the Mandarin RT04-eval set when the models were trained on a small text corpus, while both models achieved the same recognition performance when trained on a large text corpus. We observed a 0.4% absolute CER reduction after LM adaptation using LSA marginals.
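To make the prior structure concrete, the following is a minimal sketch (not the paper's implementation) of sampling topic proportions from a Dirichlet-tree: each internal node draws a Dirichlet over its branches, and a leaf topic's proportion is the product of branch probabilities along its path. The tree encoding (`(alphas, children)` tuples with integer topic leaves) is a hypothetical structure chosen for illustration. With a depth-one tree this reduces to an ordinary Dirichlet draw, matching the fallback to LDA described above.

```python
import random


def sample_dirichlet(alphas):
    # Sample from Dirichlet(alphas) by normalizing independent Gamma draws.
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_tree(tree):
    # tree is either a leaf (an int topic index) or an internal node
    # encoded as (alphas, children). Returns {topic: proportion}.
    if isinstance(tree, int):
        return {tree: 1.0}
    alphas, children = tree
    branch_probs = sample_dirichlet(alphas)
    theta = {}
    for p, child in zip(branch_probs, children):
        # A leaf's proportion is the product of branch probabilities
        # on the path from the root, so proportions sum to one.
        for topic, q in sample_dirichlet_tree(child).items():
            theta[topic] = p * q
    return theta


# Depth-one tree over 4 topics: equivalent to a flat Dirichlet prior (LDA case).
flat_tree = ([1.0, 1.0, 1.0, 1.0], [0, 1, 2, 3])

# Depth-two tree: topics 0,1 (and 2,3) share a subtree, so their
# proportions co-vary more than topics in different subtrees.
deep_tree = ([2.0, 2.0], [([1.0, 1.0], [0, 1]), ([1.0, 1.0], [2, 3])])

theta = sample_dirichlet_tree(deep_tree)
```

Under a depth-one tree the single root Dirichlet is the only source of randomness, which is why the LDTA construction contains LDA as a special case.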