
Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models



Authors: A. Ljolje and F. Fallside, Cambridge University, Cambridge, England

A novel technique for characterizing prosodic structure is introduced and applied to speech synthesis. The mechanism models a set of observations as a probabilistic function of a hidden Markov chain, using mixtures of continuous Gaussian probability density functions to represent the essential, perceptually relevant structure of intonation by observing movements of fundamental frequency in monosyllabic words of varying phonetic structure. High-quality speech synthesis with multipulse excitation is used to demonstrate the power of the HMM in preserving the naturalness of the intonational meaning conveyed by the variation of fundamental frequency and duration. Fundamental frequency contours are synthesized from the models using a random number generator and imposed on a synthesized prototype word whose original intonation was a low fall. The resulting monosyllabic words with imposed synthesized fundamental frequency contours exhibit a high level of naturalness and are found to be perceptually indistinguishable from the original recordings with the same intonation. The results clearly show the high potential of hidden Markov models as a mechanism for representing prosodic structure by naturally capturing its essentials.
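The generative procedure the abstract describes can be sketched in a few lines: walk a Markov chain of hidden states and, at each frame, draw a fundamental-frequency (F0) value from that state's Gaussian mixture. The sketch below is illustrative only; the topology, state count, and all mixture parameters are invented for the example, not taken from the paper.

```python
import random

# Hypothetical 3-state left-to-right HMM over F0 frames.
# All numbers are invented for illustration.
TRANS = [
    [0.7, 0.3, 0.0],  # state 0 -> {0, 1, 2}
    [0.0, 0.7, 0.3],  # state 1 -> {1, 2}
    [0.0, 0.0, 1.0],  # state 2 is absorbing
]

# Per-state Gaussian mixtures: lists of (weight, mean_Hz, std_Hz).
MIXTURES = [
    [(0.6, 180.0, 10.0), (0.4, 170.0, 15.0)],  # high onset
    [(1.0, 140.0, 12.0)],                      # falling body
    [(0.5, 110.0, 8.0), (0.5, 100.0, 10.0)],  # low tail
]


def sample_index(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    r, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1


def sample_f0_contour(n_frames, rng=None):
    """Generate n_frames of F0 values (Hz) by running the HMM forward."""
    rng = rng or random.Random()
    state, contour = 0, []
    for _ in range(n_frames):
        # Pick a mixture component by weight, then draw a Gaussian sample.
        comps = MIXTURES[state]
        _, mean, std = comps[sample_index([w for w, _, _ in comps], rng)]
        contour.append(rng.gauss(mean, std))
        # Advance the hidden state.
        state = sample_index(TRANS[state], rng)
    return contour


contour = sample_f0_contour(20, random.Random(42))
```

In the paper's setup a contour sampled this way would then be imposed on a resynthesized prototype word; here the output is simply a list of per-frame F0 values.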

Published in: IEEE Transactions on Acoustics, Speech and Signal Processing (Volume 34, Issue 5)