Letter-to-Sound (LTS) conversion, which is used to compress the lexicon for embedded applications, has become an important part of Text-to-Speech (TTS) systems. In this paper, coupled Hidden Markov Models (CHMMs) are proposed for LTS conversion. In the preprocessing phase, many-to-many alignment is adopted for lexicon alignment instead of the one-to-one alignment commonly used in previous approaches. In the phoneme generation phase, two Hidden Markov Models (HMMs) are coupled: one is designed to predict the best phonemic string, and the other to predict the corresponding segmentation of the graphemic string into substrings. The best phonemic string is obtained as the globally optimal solution by maximizing the joint likelihood. For stress assignment, both combined and separate phone/stress prediction are considered. Experimental results show that our approach performs better than previous approaches.
Published in:
Speech Database and Assessments (Oriental COCOSDA), 2012 International Conference on
Date of Conference: 9-12 Dec. 2012