Scheduled System Maintenance:
On April 27th, single article purchases and IEEE account management will be unavailable from 2:00 PM - 4:00 PM ET (18:00 - 20:00 UTC).
We apologize for the inconvenience.
By Topic

Efficient text analyser with prosody generator-driven approach for Mandarin text-to-speech

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

The purchase and pricing options are temporarily unavailable. Please try again later.
2 Author(s)
Yeh, C.-Y. ; Dept. of Electr. Eng., Nat. Taipei Univ. of Technol., Taiwan ; Hwang, S.-H.

A new approach for an efficient text analyser is proposed. A prosody generator-driven method is employed to design an efficient text analyser for Mandarin text-to-speech. A simpler structure for text analysis, a more suitable classification of linguistic features and a more efficient contribution of linguistic features to the prosody generator can be achieved. Three heuristic and theoretical methods are used to analyse and examine the capability of each linguistic feature: (1) the contribution of each linguistic feature to the prosody generator is examined experimentally; (2) the cross-influence of each linguistic feature on the prosody generator is analysed; (3) the problem of over- and under-classification of the linguistic features is inspected. Finally, these three analytic results are referenced to design an efficient text analyser. In total 35,243 Chinese characters are employed to examine the performance of our text analyser. Only 79 ms CPU time on a P4-1.4G PC is needed for word segmentation and POS tagging. Correction rates of 97.5% and 93.2% are achieved for word segmentation and POS tagging, respectively. This confirms that the performance of our text analyser is very good. Moreover, a Mandarin text-to-speech system is implemented to inspect the performance of the text analysis and the contribution to the prosody generator. More natural and fluent speech is obtained under the lower computation. The MOS of prosody of the synthesised and original speech are 4.2 and 4.8, respectively, which is reasonably good.

Published in:

Vision, Image and Signal Processing, IEE Proceedings -  (Volume:152 ,  Issue: 6 )