Improving phoneme and accent estimation by leveraging a dictionary for a stochastic TTS front-end

Authors:
Nagano, T. (Tokyo Res. Lab., IBM Res., Yamato); Tachibana, R.; Itoh, N.; Nishimura, M.

Determining the correct phonemes and pitch accents is important for creating natural Japanese speech. We implemented a TTS front-end system based on an n-gram model. However, the vocabulary of a word n-gram model is limited to the words found in the training corpus, and collecting a very large training corpus is not an easy task. In this paper, we propose using an additional class n-gram model that incorporates not only the words found in the training corpus but also the words found in the dictionary, to further improve accuracy. In our experiments, the proposed model achieves relative improvements of 16.9% in accent estimation accuracy and 21.6% in phoneme estimation accuracy over the word n-gram model.
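The abstract does not give the formulation of the class n-gram model, so the following is only a minimal sketch of one common way such a model can cover dictionary words that never occur in the training corpus. Everything in it is an assumption made for illustration: the toy corpus, the part-of-speech classes, the uniform word-given-class distribution, the interpolation weight `lam`, and all function names. It scores word sequences only and does not model phonemes or accents.

```python
from collections import Counter

# Toy training corpus (hypothetical): each token is (word, class).
corpus = [
    [("tokyo", "NOUN"), ("ni", "PARTICLE"), ("iku", "VERB")],
    [("kyoto", "NOUN"), ("ni", "PARTICLE"), ("iku", "VERB")],
]

# Hypothetical dictionary mapping words to classes.
# "osaka" is in the dictionary but never appears in the corpus.
dictionary = {
    "tokyo": "NOUN", "kyoto": "NOUN", "osaka": "NOUN",
    "ni": "PARTICLE", "iku": "VERB",
}

word_bigrams, word_history = Counter(), Counter()
class_bigrams, class_history = Counter(), Counter()

# Collect bigram counts at both the word level and the class level.
for sentence in corpus:
    for (w1, c1), (w2, c2) in zip(sentence, sentence[1:]):
        word_bigrams[(w1, w2)] += 1
        word_history[w1] += 1
        class_bigrams[(c1, c2)] += 1
        class_history[c1] += 1

# Number of dictionary words per class, used for a uniform P(word | class).
class_sizes = Counter(dictionary.values())


def word_bigram_prob(prev, word):
    """Maximum-likelihood P(word | prev); zero for unseen histories."""
    if word_history[prev] == 0:
        return 0.0
    return word_bigrams[(prev, word)] / word_history[prev]


def class_bigram_prob(prev, word):
    """P(class(word) | class(prev)) * P(word | class(word))."""
    c_prev, c_word = dictionary[prev], dictionary[word]
    if class_history[c_prev] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c_word)] / class_history[c_prev]
    p_word_given_class = 1.0 / class_sizes[c_word]  # assumed uniform
    return p_class * p_word_given_class


def interpolated_prob(prev, word, lam=0.7):
    """Linear interpolation of the word and class models (lam is assumed)."""
    return lam * word_bigram_prob(prev, word) + (1 - lam) * class_bigram_prob(prev, word)
```

In this sketch, `word_bigram_prob("osaka", "ni")` is zero because "osaka" never appears in the corpus, while `interpolated_prob("osaka", "ni")` is nonzero because the class model has seen a NOUN followed by a PARTICLE. That is the general effect the abstract describes: dictionary-only words become usable by the model.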

Published in:

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Date of Conference:

March 31-April 4, 2008