A hybrid diphone speech unit and a speech corpus construction technique for a Thai text-to-speech system on mobile devices

7 Author(s)
K. Wongpatikaseree ; TAIST Tokyo Tech, ICTES Program, Sirindhorn International Institute of Technology, Thammasat University, Thailand, 12121 ; A. Ratikan ; A. Chotimongkol ; P. Chootrakool

Most Thai text-to-speech (TTS) systems on personal computers can synthesize speech in real time with acceptable quality. However, when a Thai TTS system is ported to a resource-limited platform such as a mobile device, computation time has to be reduced, and the quality of the synthesized speech decreases as a result. Flite_Thai, a unit concatenation synthesizer for Thai, reduces computation time enough to run in real time, but its output is quite unintelligible. In this paper, we aim to select an appropriate speech unit for Flite_Thai in order to improve its intelligibility. We design a new speech corpus that consists of three different speech units: demi-syllable, diphone, and a new speech unit called hybrid diphone. We record this corpus with a nonsense carrier sentence technique, since our focus is on clear articulation of each speech unit. Each carrier sentence contains one speech unit, or a set of similar speech units, without regard to meaning. We compare the quality of speech synthesized from four types of speech units: the diphone from the TsynC corpus, recorded with natural sentences, and the three types of units from the new corpus, recorded with nonsense carrier sentences. In terms of intelligibility, all of the speech units from the new corpus achieved a higher MOS (Mean Opinion Score) than the existing Flite_Thai system, which uses speech units from TsynC. Among the three unit types in the new corpus, demi-syllable obtained the highest score. Although hybrid diphone obtained a higher MOS than both the existing system and the diphone, it still suffers from a similar problem: unsmooth joints between units.
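
To illustrate how an MOS comparison of this kind is usually computed, here is a minimal sketch in Python. It is not the authors' evaluation code. The 1 to 5 rating scale, the function name mean_opinion_score, the labels unit_type_a and unit_type_b, and all rating values are assumptions made up for illustration; none of them come from the paper.

```python
from statistics import mean


def mean_opinion_score(ratings, scale=(1, 5)):
    """Return the arithmetic mean of listener ratings.

    The default 1-5 scale is an assumption; the abstract does not
    state which scale the listening test used.
    """
    low, high = scale
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside the scale {scale}")
    return mean(ratings)


# Placeholder ratings, invented for illustration only.
# They are NOT results from the paper.
placeholder_ratings = {
    "unit_type_a": [4, 3, 5, 4],
    "unit_type_b": [2, 3, 3, 2],
}

# One MOS per speech unit type, then pick the highest-scoring type.
scores = {name: mean_opinion_score(r) for name, r in placeholder_ratings.items()}
best = max(scores, key=scores.get)
```

Each speech unit type gets one score, the average over all listener ratings for speech synthesized with that unit, and the unit types are then ranked by that score.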

Published in:

Electrical Engineering/Electronics Computer Telecommunications and Information Technology (ECTI-CON), 2010 International Conference on

Date of Conference:

19-21 May 2010