By Topic

Concatenative synthesis based on a harmonic model

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
O'Brien, D. ; Sun Microsyst. Inc., Dublin, UK ; Monaghan, A.I.C.

One of the most successful approaches to synthesizing speech, concatenative synthesis, combines recorded speech units to build full utterances. However, the prosody of the stored units is often not consistent with that of the target utterance and must be altered. Furthermore, several types of mismatch can occur at unit boundaries and must be smoothed. Thus, both pitch and time-scale modification techniques as well as smoothing algorithms play a crucial role in such concatenation based systems. In this paper, we describe novel approaches to each of these issues. First, we present a conceptually simple technique for pitch and time-scale modification of speech. Our method is based upon a harmonic coding of each speech frame, and operates entirely within the original sinusoidal model. Crucially, it makes no use of “pitch pulse onset times.” Instead, phase coherence, and thus shape invariance, is ensured by exploiting the harmonic relation existing between the sine waves used to code each analysis frame so that their phases at each synthesis frame boundary are consistent with those derived during analysis. Secondly, a smoothing algorithm, aimed specifically at correcting phase mismatches at unit boundaries, is described. Results are presented showing our prosodic modification techniques to be highly suitable for use within a concatenative speech synthesizer

Published in:

Speech and Audio Processing, IEEE Transactions on  (Volume:9 ,  Issue: 1 )