In Korean, a large proportion of word units are pronounced differently from their written forms, because the language is agglutinative and highly inflective, with severe phonological phenomena and coarticulation effects. This paper reports on an ongoing study of Korean pronunciation modeling, in which the mapping between phonemic and orthographic units is modeled by a Bayesian network (BN). The advantage of this graphical model framework is that the probabilistic relationships between these symbols, as well as additional knowledge sources, can be learned in a general and flexible way. Thus, we can easily incorporate additional knowledge sources from different domains. In this preliminary study, we start with a simple topology in which the additional knowledge includes only the preceding and succeeding contexts of the current phonemic unit. In practice, the proposed BN pronunciation model is applied to our syllable-based Korean large-vocabulary continuous speech recognition (LVCSR) system, where we structure the speech recognition task as a serial architecture composed of two independent parts. The first part performs standard hidden Markov model (HMM)-based recognition of phonemic syllable units of the actual pronunciation (surface forms). In this way, the lexicon and the out-of-vocabulary rate can be kept small while avoiding high acoustic confusability. In the second part, the system transforms the phonemic syllable surface forms into the desired Korean orthographic eumjeol units by means of the proposed BN pronunciation model. Experimental results show that the proposed BN model can successfully map the phonemic syllable surface forms to eumjeol transcriptions with more than 97% accuracy on average. The results also show that the model enhances our Korean LVCSR system, giving an average absolute improvement of about 25.53% over the baseline orthographic syllable recognition.
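As a rough illustration of the second stage, the following minimal sketch treats each orthographic syllable as a node whose parents are the current phonemic syllable and its preceding and succeeding neighbors, and estimates that conditional table from relative frequencies. It is not the authors' implementation: the one-to-one syllable alignment, the boundary symbol `PAD`, the back-off to the current syllable alone, and the function names `train` and `decode` are all assumptions made for this example, and the three training words are toy data.

```python
from collections import Counter, defaultdict

PAD = "#"  # hypothetical word-boundary symbol (an assumption of this sketch)


def train(pairs):
    """Count orthographic syllables given (previous, current, next) phonemic syllables.

    `pairs` holds (phonemic_syllables, orthographic_syllables) sequences that are
    assumed to be aligned one-to-one.
    """
    full = defaultdict(Counter)     # (prev, cur, next) -> orthographic counts
    backoff = defaultdict(Counter)  # cur only -> orthographic counts
    for phon, orth in pairs:
        padded = [PAD] + list(phon) + [PAD]
        for i, o in enumerate(orth):
            prev, cur, nxt = padded[i], padded[i + 1], padded[i + 2]
            full[(prev, cur, nxt)][o] += 1
            backoff[cur][o] += 1
    return full, backoff


def decode(phon, model):
    """Pick the most frequent orthographic syllable for each phonemic syllable."""
    full, backoff = model
    padded = [PAD] + list(phon) + [PAD]
    out = []
    for i in range(len(phon)):
        prev, cur, nxt = padded[i], padded[i + 1], padded[i + 2]
        # Use the full context if it was seen, else the context-free counts.
        counts = full.get((prev, cur, nxt)) or backoff.get(cur)
        # Unseen syllables are copied through unchanged.
        out.append(counts.most_common(1)[0][0] if counts else cur)
    return out


# Toy data: (pronunciation, spelling) for three Korean words.
toy_pairs = [
    (["궁", "물"], ["국", "물"]),  # nasalization: written 국물, pronounced 궁물
    (["구", "거"], ["국", "어"]),  # liaison: written 국어, pronounced 구거
    (["궁", "전"], ["궁", "전"]),  # written and pronounced 궁전
]
model = train(toy_pairs)
print("".join(decode(["궁", "물"], model)))
print("".join(decode(["궁", "전"], model)))
```

The same surface syllable 궁 is mapped to a different written syllable depending on what follows it, which is the kind of context dependence that the preceding and succeeding context nodes are meant to capture.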