Efficient text analyser with prosody generator-driven approach for Mandarin text-to-speech

Authors: Yeh, C.-Y. (Dept. of Electr. Eng., Nat. Taipei Univ. of Technol., Taiwan); Hwang, S.-H.

A new approach to designing an efficient text analyser for Mandarin text-to-speech is proposed, in which the design is driven by the needs of the prosody generator. This approach yields a simpler structure for text analysis, a more suitable classification of linguistic features and a more efficient contribution of linguistic features to the prosody generator. Three heuristic and theoretical methods are used to analyse and examine the capability of each linguistic feature: (1) the contribution of each linguistic feature to the prosody generator is examined experimentally; (2) the cross-influence of each linguistic feature on the prosody generator is analysed; (3) the problem of over- and under-classification of the linguistic features is inspected. The results of these three analyses are then used to design an efficient text analyser. In total, 35,243 Chinese characters are employed to examine the performance of the text analyser. Only 79 ms of CPU time on a P4-1.4G PC is needed for word segmentation and POS tagging. Accuracy rates of 97.5% and 93.2% are achieved for word segmentation and POS tagging, respectively, confirming that the text analyser performs well. Moreover, a Mandarin text-to-speech system is implemented to inspect the performance of the text analysis and its contribution to the prosody generator. More natural and fluent speech is obtained at a lower computational cost. The mean opinion scores (MOS) for the prosody of the synthesised and original speech are 4.2 and 4.8, respectively, which is reasonably good.

Published in: IEE Proceedings - Vision, Image and Signal Processing (Volume: 152, Issue: 6)