1. INTRODUCTION
Statistical models of natural language are a key component of many systems today. The most widely known applications are automatic speech recognition (ASR), machine translation (MT) and optical character recognition (OCR). In the past, there was a long-standing struggle between those who followed the statistical approach and those who claimed that models of natural language must be built on linguistics and expert knowledge. The most serious criticism of statistical approaches is that no true understanding occurs in these models, which are typically limited by the Markov assumption and represented by n-gram models. Prediction of the next word is often conditioned on just the two preceding words, which is clearly insufficient to capture semantics. On the other hand, the criticism of linguistic approaches was even more straightforward: despite all the efforts of linguists, statistical approaches dominated whenever performance in real-world applications was the measure.
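To make the limitation concrete, the following is a minimal sketch of a maximum-likelihood trigram model in Python (the toy corpus and function names are illustrative, not from the original text). The model predicts the next word from only the two preceding words, so any dependency spanning a longer distance is invisible to it.

```python
from collections import Counter, defaultdict

def train_trigram_model(tokens):
    """Count, for each two-word history, how often each next word follows."""
    trigram_counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        trigram_counts[(w1, w2)][w3] += 1
    return trigram_counts

def next_word_prob(trigram_counts, w1, w2, w3):
    """P(w3 | w1, w2) by maximum likelihood; 0.0 if the history was never seen."""
    history = trigram_counts.get((w1, w2), Counter())
    total = sum(history.values())
    return history[w3] / total if total else 0.0

# Toy corpus: with only two words of history, the model cannot tell
# the two occurrences of "the cat" apart, whatever came before them.
corpus = "the cat sat on the mat because the cat was tired".split()
model = train_trigram_model(corpus)
print(next_word_prob(model, "the", "cat", "sat"))  # 0.5
print(next_word_prob(model, "the", "cat", "was"))  # 0.5
```

In practice such models also require smoothing (e.g., Kneser-Ney) so that unseen histories do not receive zero probability, but no amount of smoothing extends the two-word context window.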