A systematic approach to the extraction of diphone elements from natural speech

Author:
Kaeslin, H.; Swiss Federal Institute of Technology, CH Zürich, Switzerland

Synthetic speech can be generated with an unrestricted vocabulary by concatenating stored units such as diphone elements. When joining speech segments that were not adjacent in the original context they were taken from, discontinuities in the spectral envelope may arise that impair intelligibility. The method proposed here attempts to find optimum diphone boundaries in order to minimize these discontinuities. Steady-state zones of all phones carrying a diphone boundary are specified by means of a centroid vector. Based on the centroids and on an objective distance measure, hypothetical boundary cost functions are defined. Their minimization, together with the evaluation of a set of additional rules, determines the boundary locations. A rhyme test carried out with speech generated by concatenating diphone elements extracted according to this method yielded an intelligibility score of 96.7 percent for isolated words.
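
The centroid-and-cost idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the distance measure, the exact form of the cost functions, or the additional rules, so the sketch assumes a plain Euclidean distance between spectral feature vectors, uses the distance to the centroid as the whole cost, and ignores the rules. All function names and the toy data are hypothetical.

```python
import math

def centroid(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def euclidean(a, b):
    """Assumed distance measure; the abstract does not specify one."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def boundary_costs(frames, center, distance=euclidean):
    """Cost of cutting at each frame: its distance to the phone centroid."""
    return [distance(frame, center) for frame in frames]

def best_boundary(frames, center, distance=euclidean):
    """Index of the frame with the lowest boundary cost."""
    costs = boundary_costs(frames, center, distance)
    return min(range(len(costs)), key=costs.__getitem__)

# Toy two-dimensional "spectral" vectors (made up for illustration).
steady_zone = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8]]    # frames taken as steady state
phone_frames = [[3.0, 0.0], [2.0, 1.0], [1.1, 2.0], [0.0, 3.0]]

center = centroid(steady_zone)
cut_index = best_boundary(phone_frames, center)
print("centroid:", center)
print("boundary frame index:", cut_index)
```

Cutting every occurrence of a phone at the frame closest to the same centroid means that two diphone elements joined at that phone meet at spectrally similar frames, which is the intuition behind minimizing the discontinuity at the joint.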

Published in:
IEEE Transactions on Acoustics, Speech and Signal Processing (Volume: 34, Issue: 2)