By Topic

An auditory-based measure for improved phone segment concatenation

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
D. T. Chappell ; Robust Speech Process. Lab., Duke Univ., Durham, NC, USA ; J. H. L. Hansen

This paper describes a new auditory-based distance measure intended for use in a concatenated synthesis technique wherein the time- and frequency-domain characteristics are used to perform natural-sounding speaker synthesis. Whereas most concatenation systems use large databases (often +100,000 units), we begin from a small, limited database (approx. 400 units) and use a new spectral distortion measure to aid in the selection of phones for optimal concatenation. At the transition between speech segments, the new auditory-based distance metric assesses perceived discontinuities in the frequency domain. The distortion measure, which employs the Carney (see J. Acoust. Soc. Am., vol.93, p.401-17, 1993) auditory model, is used to select phones which minimize the perceived distortion between concatenated segments. Moreover, time- and frequency-domain methods can shape the prosodic and spectral characteristics of each speech segment. The final results demonstrate improved performance over standard concatenation methods applied to small databases

Published in:

Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on  (Volume:3 )

Date of Conference:

21-24 Apr 1997