On 450-600 b/s natural sounding speech coding

2 Author(s)
Cheng, Y.M.; INRS Telecommun., Verdun, Que., Canada; O'Shaughnessy, D.

Algorithms for encoding speech with good intelligibility and naturalness at very low bit rates are studied. Naturalness is retained by accurately encoding the speech excitation information from an LPC (linear predictive coding) model. A glottal ARX (autoregressive with exogenous input) technique is used to model the speech signal for high quality. A large reduction in coding rate is achieved through short-term temporal compression of the speech and vector quantization. Application of traditional vector quantization to the temporal decomposition output is discussed, with consideration of distortion measures and codebook generation. Based on properties of short-term temporal decomposition, finite-state vector quantization is introduced to further decrease the coding rate. A problem associated with this technique, the estimation of a state transition matrix from incomplete data, is treated. The general result is that practical coders can be designed that operate in the range of 450-600 b/s, with a delay of about 200 ms and natural-sounding output speech.
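
To illustrate why finite-state vector quantization can lower the rate relative to traditional vector quantization, here is a minimal sketch. It is not the coder described in the abstract: the 8-entry toy codebook, the rule that each state allows only its 4 nearest codevectors, the squared-error distortion, and the names `fsvq_encode`, `fsvq_decode`, and `next_states` are all assumptions made for illustration. The encoder sends only the position of the chosen codevector within the list allowed by the current state, so each index costs log2(4) = 2 bits instead of log2(8) = 3 bits.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fsvq_encode(vectors, codebook, next_states, start_state=0):
    """Encode vectors, searching only codevectors allowed in the current state.

    next_states[s] lists the codebook indices reachable from state s.
    The emitted symbol is the position inside that list.
    """
    state, symbols = start_state, []
    for x in vectors:
        allowed = next_states[state]
        best = min(allowed, key=lambda i: sq_dist(codebook[i], x))
        symbols.append(allowed.index(best))
        state = best  # assumed rule: the state is the last chosen codevector
    return symbols


def fsvq_decode(symbols, codebook, next_states, start_state=0):
    """Rebuild the codevector sequence by tracking the same state as the encoder."""
    state, out = start_state, []
    for s in symbols:
        state = next_states[state][s]
        out.append(codebook[state])
    return out


# Hypothetical toy codebook: 8 two-dimensional codevectors.
codebook = [(float(i), float(i % 3)) for i in range(8)]

# Assumed transition rule: from each state, only the 4 nearest codevectors.
next_states = [
    sorted(range(len(codebook)), key=lambda j: sq_dist(codebook[i], codebook[j]))[:4]
    for i in range(len(codebook))
]

bits_plain = math.log2(len(codebook))        # 3.0 bits per vector
bits_fsvq = math.log2(len(next_states[0]))   # 2.0 bits per vector

frames = [codebook[0], codebook[1], codebook[2]]
symbols = fsvq_encode(frames, codebook, next_states)
decoded = fsvq_decode(symbols, codebook, next_states)
```

In a real design the allowed transitions would be derived from training data rather than from codebook geometry, which is where estimating a state transition matrix from incomplete data becomes an issue.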

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 1, Issue: 2)