Abstract:
N-gram language models are ubiquitous in speech applications and many other natural language systems. One issue with n-gram language models is that the model never represents the language completely: when words appear that were not seen in training, a smoothing method is needed to redistribute probability mass to these unseen events. Many smoothing techniques exist, with widely varying performance characteristics, and the performance of a smoothing algorithm often depends on the application of the language model (for example, unigram models with interpolation smoothing may perform better in information retrieval, while trigram models with backoff smoothing may perform better in speech recognition). This paper examines the relative performance of selected smoothing methods with bigram language models built from chat data. The language models are used for machine translation of chat data and for building text classification models.
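As a rough illustration of the interpolation idea mentioned in the abstract (a sketch, not the paper's implementation), a bigram model can mix the bigram maximum-likelihood estimate with a unigram estimate, so that unseen bigrams still receive probability from the unigram term. The mixing weight lam below is a hypothetical parameter; the paper does not specify one.

    from collections import Counter

    def train_bigram_interpolated(tokens, lam=0.7):
        """Interpolated bigram model (Jelinek-Mercer style sketch):
        P(w2|w1) = lam * P_ML(w2|w1) + (1 - lam) * P_ML(w2)."""
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        total = len(tokens)

        def prob(w1, w2):
            # Unigram estimate; still zero for words never seen in training,
            # so a full system would also reserve mass for out-of-vocabulary words.
            p_uni = unigrams[w2] / total
            # Bigram maximum-likelihood estimate (zero for unseen bigrams).
            p_bi = bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
            return lam * p_bi + (1 - lam) * p_uni

        return prob

    # Usage: probability mass flows to unseen bigrams via the unigram term.
    p = train_bigram_interpolated("a b a c a b".split())
    print(p("a", "b"))  # seen bigram: dominated by the bigram estimate
    print(p("c", "b"))  # unseen bigram: falls back to the unigram estimate

A backoff scheme, by contrast, would use the bigram estimate alone when the bigram was observed and consult the unigram distribution only otherwise; the paper compares methods of both kinds on chat data.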
Date of Conference: 20-24 November 2012
Date Added to IEEE Xplore: 22 April 2013