Grapheme-to-phoneme (G2P) conversion is an important component of TTS systems. The main difficulty in Chinese G2P conversion is disambiguating polyphones. In this paper, we formulate polyphone disambiguation as a classification problem and propose a language-independent classifier based on maximum entropy to address it. Furthermore, we introduce inequality smoothing to alleviate data sparseness and exploit language-independent character features as linguistic knowledge. Experimental results show that the character features perform as well as language-dependent features such as words and part-of-speech tags. Compared with the widely used Gaussian smoothing, inequality smoothing greatly reduces the number of active features used in the classifier and achieves better performance. Our classifier achieves an overall accuracy of 96.35%, far above the 81.22% obtained by using the most frequent "pin-yin" (the Romanization of Chinese phonemes). Finally, we explore merging all key polyphones into 6 groups and find that the overall accuracy decreases by only about 2%, while the number of active features is reduced by more than a further 33%.
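As an illustration of the classification view described above, the following minimal sketch trains a small maximum entropy (multinomial logistic) model on character-window features for one polyphone and compares it with a most-frequent-pinyin baseline. It is not the paper's implementation: the function names, the toy sentences, the window size, and the training settings are all hypothetical, and the sketch uses a Gaussian (L2) penalty rather than inequality smoothing.

```python
import math
from collections import Counter, defaultdict

def char_features(sentence, idx, window=2):
    """Characters around the polyphone at position idx (hypothetical feature set)."""
    feats = []
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = idx + off
        ch = sentence[j] if 0 <= j < len(sentence) else "<pad>"
        feats.append(f"c[{off:+d}]={ch}")
    return feats

def predict_proba(w, labels, feats):
    """Softmax over summed feature weights; unseen features contribute 0."""
    scores = {y: sum(w.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def predict(w, labels, feats):
    probs = predict_proba(w, labels, feats)
    return max(labels, key=lambda y: probs[y])

def train_maxent(samples, epochs=200, lr=0.5, l2=0.01):
    """Stochastic gradient ascent on the log-likelihood with an L2 (Gaussian) penalty."""
    labels = sorted({y for _, y in samples})
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in samples:
            probs = predict_proba(w, labels, feats)
            for y in labels:
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    w[(f, y)] += lr * (grad - l2 * w[(f, y)])
    return w, labels

# Toy data for the polyphone 行 (hang2 vs. xing2); invented for illustration only.
raw = [
    ("银行存款", 1, "hang2"), ("我去银行", 3, "hang2"), ("行业标准", 0, "hang2"),
    ("自行车", 1, "xing2"), ("步行回家", 1, "xing2"),
    ("行走江湖", 0, "xing2"), ("他在旅行", 3, "xing2"),
]
samples = [(char_features(s, i), y) for s, i, y in raw]

# Baseline: always output the most frequent pinyin seen in training.
baseline = Counter(y for _, y in samples).most_common(1)[0][0]

w, labels = train_maxent(samples)
guess = predict(w, labels, char_features("去银行办事", 2))
print("baseline:", baseline, "| maxent:", guess)
```

On the unseen sentence in this toy set, the baseline always answers with the majority reading, while the classifier can use the neighbouring characters 去 and 银, which occur only with the other reading in the training data.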
Published in:
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on (Volume: 4)
Date of Conference: 15-20 April 2007