Continuous speech from one male American speaker is modeled by a single large hidden Markov model. Model parameter estimation, speech-state decoding, and speech-observation decoding are treated in a unified framework based on maximum likelihood. The model has 64 states and 1024 observation symbols and is assumed to be fully connected. It can be used for speech synthesis as well as speech analysis. Very-low-bit-rate speech coding is possible because very little information is needed to encode speech-state sequences. Intelligible speech was produced at 1.68 bits/frame, whereas a 2 bits/frame pitch-excited LPC vector quantizer did not produce intelligible speech.
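The Python sketch below illustrates the state-decoding and coding ideas. It is not the paper's implementation. It assumes a discrete-observation HMM with strictly positive parameters, which is what a fully connected model with smoothed estimates would have. It recovers the most likely state sequence with standard Viterbi decoding, then estimates the cost of coding state sequences by the entropy rate of the state Markov chain. The functions `viterbi`, `entropy_rate_bits`, and `log_matrix` and the toy 2-state, 3-symbol parameters are hypothetical. The abstract does not say which coder achieved 1.68 bits/frame.

```python
import math


def viterbi(obs, log_pi, log_A, log_B):
    """Most likely state sequence for a discrete-observation HMM.

    Runs in the log domain so long utterances do not underflow.
    """
    n = len(log_pi)
    # delta[j]: best log-probability of any state path ending in state j
    delta = [log_pi[j] + log_B[j][obs[0]] for j in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_A[best_i][j] + log_B[j][o])
        backptrs.append(ptr)
        delta = new_delta
    # Trace back from the best final state
    path = [max(range(n), key=lambda j: delta[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def entropy_rate_bits(A, iters=1000):
    """Entropy rate (bits/frame) of a state chain with transition matrix A.

    Assumes the chain is irreducible and aperiodic (true for a fully
    connected model with positive transitions), so power iteration
    converges to the stationary distribution.
    """
    n = len(A)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * A[i][j] for i in range(n)) for j in range(n)]
    row_entropy = [-sum(p * math.log2(p) for p in row if p > 0) for row in A]
    return sum(m * h for m, h in zip(mu, row_entropy))


def log_matrix(M):
    # Requires strictly positive entries (math.log(0) raises ValueError)
    return [[math.log(p) for p in row] for row in M]


# Hypothetical toy model: 2 states, 3 observation symbols
# (the paper's model has 64 states and 1024 symbols)
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 0, 1, 2, 2, 2, 0]

path = viterbi(obs, [math.log(p) for p in pi], log_matrix(A), log_matrix(B))
rate = entropy_rate_bits(A)
fixed = math.log2(len(pi))  # fixed-length code: log2(number of states)
print("decoded states:", path)
print(f"entropy rate {rate:.3f} bits/frame vs fixed-length {fixed:.0f} bit/frame")
```

With these toy parameters the entropy rate is about 0.55 bits/frame, below the 1 bit/frame a fixed-length code needs for two states. For a 64-state model a fixed-length state code would cost log2 64 = 6 bits/frame. The much lower 1.68 bits/frame reported above is consistent with state sequences being highly predictable from one frame to the next.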
Published in:
Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on
Date of Conference: 11-14 Apr 1988