Extensions of recurrent neural network language model


Abstract:

We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to more than a 15-fold speedup in both the training and testing phases. Next, we show the importance of using the backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. Finally, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster during both training and testing, and more accurate than the basic one.
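
The speedup reported above comes from factorizing the output layer into word classes, so that only the class scores and the scores of words sharing the target word's class are computed, rather than all V outputs. Below is a minimal sketch of that factorization; the shapes, the random parameters, and the random class assignment are illustrative assumptions (the paper assigns words to classes by unigram frequency), not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: vocabulary, hidden layer, class count.
V, H, C = 10000, 100, 100
# Assumed random class assignment; the paper bins words by unigram frequency.
word_class = rng.integers(0, C, V)

# Parameters of the factorized output layer (randomly initialized here).
W_class = rng.standard_normal((C, H)) * 0.1  # hidden -> class scores
W_word = rng.standard_normal((V, H)) * 0.1   # hidden -> within-class word scores

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_probability(h, w):
    """P(w | h) = P(class(w) | h) * P(w | class(w), h).

    Only C class scores plus the scores of words in w's class are
    computed, instead of all V output scores."""
    c = word_class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word_class == c)  # words in the same class
    scores = W_word[members] @ h
    p_word = softmax(scores)[np.flatnonzero(members == w)[0]]
    return p_class * p_word

h = rng.standard_normal(H)  # stand-in for a hidden state from the recurrent layer
print(word_probability(h, 42))

With the number of classes chosen near the square root of V, the per-word cost of the output layer drops from O(V*H) to roughly O(sqrt(V)*H), which is consistent with the order of magnitude of the reported speedup.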
Date of Conference: 22-27 May 2011
Date Added to IEEE Xplore: 11 July 2011
Conference Location: Prague, Czech Republic

1. INTRODUCTION

Statistical models of natural language are a key part of many systems today. The most widely known applications are automatic speech recognition (ASR), machine translation (MT) and optical character recognition (OCR). In the past, there was always a struggle between those who followed the statistical way and those who claimed that we need to adopt linguistics and expert knowledge to build models of natural language. The most serious criticism of statistical approaches is that there is no true understanding occurring in these models, which are typically limited by the Markov assumption and are represented by n-gram models. Prediction of the next word is often conditioned on just the two preceding words, which is clearly insufficient to capture semantics. On the other hand, the criticism of linguistic approaches was even more straightforward: despite all the efforts of linguists, statistical approaches have dominated whenever performance in real-world applications was the measure.
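
To make the Markov limitation concrete, the toy trigram model below (a sketch with an assumed corpus and a maximum-likelihood estimate, purely for illustration) predicts the next word from only the two preceding words, so any earlier context is invisible to it.

from collections import Counter, defaultdict

# Toy corpus, assumed for demonstration only.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count trigrams: how often each word follows each two-word history.
trigram_counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    trigram_counts[(w1, w2)][w3] += 1

def p_next(w1, w2, w3):
    """Maximum-likelihood P(w3 | w1, w2): everything before w1 is ignored."""
    context = trigram_counts[(w1, w2)]
    total = sum(context.values())
    return context[w3] / total if total else 0.0

# Both predictions condition on the same two-word history "the cat",
# regardless of what appeared earlier in the sentence.
print(p_next("the", "cat", "sat"))  # 0.5
print(p_next("the", "cat", "ate"))  # 0.5

Because different sentences collapse to the same two-word history, the model assigns them identical next-word distributions, which is exactly the limitation the recurrent architecture is meant to overcome.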
