The statistical and syntactic approaches to the modelling of language are consolidated in order to improve performance in speech recognition. The authors also aim to minimise the need for human intervention in training the language model from a corpus. Hybrid speech recognition systems that use both bigram and grammar models can outperform either model used alone, but performance is still sub-optimal because the grammar is abandoned completely for sentences which fail to parse as a whole. Extending the concept of a bigram from the immediately preceding word to the most informative previous word leads to a reduction in perplexity; a purely statistical approach to this extension is presented. Incorporating syntax from a substring parser will require these principles to be extended to strings of nonterminal symbols, which raises important training issues but opens the way towards a language model with greater capacity for adaptive enhancement of performance.
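The idea of conditioning on the most informative previous word, rather than the immediately preceding one, can be illustrated with a minimal sketch. The abstract does not say how informativeness is measured, so everything here is an assumption made for illustration: the conditioning word is chosen as the one among the previous `max_dist` words whose smoothed next-word distribution has the lowest entropy, probabilities use add-one smoothing, and the function names (`train`, `best_context`, `perplexity`) are hypothetical. It is not a reconstruction of the authors' method.

```python
import math
from collections import Counter, defaultdict

def train(sentences, max_dist=3):
    """Count target words following each (distance, context word) pair."""
    counts = defaultdict(Counter)  # (d, context_word) -> Counter of target words
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split()
        vocab.update(tokens)
        for i in range(1, len(tokens)):
            for d in range(1, max_dist + 1):
                if i - d >= 0:
                    counts[(d, tokens[i - d])][tokens[i]] += 1
    return counts, sorted(vocab)

def prob(counts, vocab, key, word):
    """Add-one smoothed P(word | context word at distance d), key = (d, context)."""
    c = counts.get(key, Counter())
    return (c[word] + 1) / (sum(c.values()) + len(vocab))

def entropy(counts, vocab, key):
    """Entropy in bits of the smoothed next-word distribution for one context."""
    probs = (prob(counts, vocab, key, w) for w in vocab)
    return -sum(p * math.log2(p) for p in probs)

def best_context(counts, vocab, history, max_dist):
    """Assumed criterion: pick the (distance, word) pair with the lowest entropy."""
    limit = min(max_dist, len(history))
    candidates = [(d, history[-d]) for d in range(1, limit + 1)]
    return min(candidates, key=lambda k: entropy(counts, vocab, k))

def perplexity(counts, vocab, sentences, max_dist):
    """Perplexity when each word is predicted from its chosen context word."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split()
        for i in range(1, len(tokens)):
            key = best_context(counts, vocab, tokens[:i], max_dist)
            log_sum += math.log2(prob(counts, vocab, key, tokens[i]))
            n += 1
    return 2 ** (-log_sum / n)

if __name__ == "__main__":
    corpus = ["the dog barks", "the cat sleeps", "a dog barks loudly"]  # toy data
    counts, vocab = train(corpus, max_dist=3)
    print("bigram (distance 1 only):", perplexity(counts, vocab, corpus, 1))
    print("most informative of last 3:", perplexity(counts, vocab, corpus, 3))
```

With `max_dist=1` the model reduces to an ordinary smoothed bigram, so the two printed figures compare the baseline with the extended model on the same data. Because the conditioning word is chosen from the history alone, before the predicted word is seen, the probabilities at each position still sum to one. Whether perplexity actually falls depends on the corpus and on the selection criterion; a toy corpus like this one proves nothing either way.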
Published in:
Grammatical Inference: Theory, Applications and Alternatives, IEE Colloquium on
Date of Conference: 22-23 Apr 1993