This paper studies text-to-phoneme mapping of isolated English words. We compare multilayer perceptron, recurrent, and bidirectional recurrent neural network architectures on this task. Multilayer perceptron networks draw on the contextual information provided by the orthography of a word, that is, the letter context. In our study, the recurrent and bidirectional recurrent networks, by contrast, do not take the letter context into account. Instead, they use the contextual information carried by the previously transcribed phonemes, which the feedback loop in the networks introduces.
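The difference between the two kinds of context can be illustrated with a minimal sketch of how the inputs might be assembled. This is not the paper's implementation: the function names, the padding symbol, the window size, and the one-phoneme-per-letter alignment are all assumptions made for illustration, and the bidirectional case is not shown.

```python
from typing import List, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the word


def letter_windows(word: str, context: int = 2) -> List[str]:
    """MLP-style inputs: each letter with `context` letters on either side.

    The window size is an assumed parameter, not a value from the paper.
    """
    padded = PAD * context + word + PAD * context
    return [padded[i:i + 2 * context + 1] for i in range(len(word))]


def feedback_inputs(word: str, phonemes: List[str]) -> List[Tuple[str, str]]:
    """Recurrent-style inputs: each letter paired with the previously
    transcribed phoneme (PAD before the first letter), with no letter context.

    Assumes one phoneme per letter, which is a simplification.
    """
    previous = [PAD] + phonemes[:-1]
    return list(zip(word, previous))
```

For the word "cat" with a context of one letter, `letter_windows` yields one three-letter window per position, whereas `feedback_inputs` yields the current letter together with the phoneme produced at the previous step. In a trained recurrent network that previous phoneme would come from the network's own output through the feedback loop rather than from a supplied list.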