MMSE-Based Packet Loss Concealment for CELP-Coded Speech Recognition

4 Author(s)
Carmona, J.L. ; Dept. de Teor. de la Senal, Telematica y Comun., Univ. de Granada, Granada, Spain ; Peinado, A.M. ; Pérez-Córdoba, J.L. ; Gomez, A.M.

In this paper, we analyze the performance of network speech recognition (NSR) over IP networks, adapting existing solutions and proposing new ones to the packet loss problem for code excited linear prediction (CELP) codecs. NSR has a client-server architecture that places the recognizer at the server side and uses a standard speech codec for speech transmission. Its main advantage is that no changes are required to existing client devices and networks. However, the use of speech codecs degrades recognition performance, particularly in the presence of packet loss. First, we study the degradations introduced by CELP codecs in lossy packet networks. Then, we propose a reconstruction technique based on minimum mean square error (MMSE) estimation using hidden Markov models. This approach also allows us to obtain a reliability measure associated with each estimate. We show how to use this information to improve recognition performance by means of soft-data decoding and a weighted Viterbi algorithm. The experimental results are obtained for two well-known CELP codecs, G.729 and AMR 12.2 kbps, carrying out recognition from decoded speech. Finally, we analyze an efficient and improved implementation of the proposed techniques using an NSR system that extracts speech recognition features directly from the bit-stream parameters. The experimental results show that the proposed NSR systems achieve performance comparable to that of distributed speech recognition (DSR).
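The MMSE reconstruction mentioned in the abstract can be illustrated with a generic forward-backward posterior mean. The sketch below is not the authors' formulation: it assumes that each frame is represented by a quantization index treated as a directly observed Markov state, that `A` is a strictly positive transition matrix trained beforehand, and that `centroids` holds the feature vector for each index. The function name `mmse_conceal` and all parameter names are hypothetical.

```python
import numpy as np

def mmse_conceal(A, centroids, i_before, i_after, n_lost):
    """Posterior-mean estimate of n_lost frames lying between two received indices.

    A         : (S, S) transition matrix, A[i, j] = P(next = j | current = i)
    centroids : (S, D) feature vector associated with each index (assumed codebook)
    i_before  : last index received before the loss burst
    i_after   : first index received after the loss burst
    """
    n_states = A.shape[0]

    # Forward pass: alpha[t, j] = P(state at lost frame t = j | i_before)
    alpha = np.zeros((n_lost, n_states))
    alpha[0] = A[i_before]
    for t in range(1, n_lost):
        alpha[t] = alpha[t - 1] @ A

    # Backward pass: beta[t, i] = P(i_after is observed | state at lost frame t = i)
    beta = np.zeros((n_lost, n_states))
    beta[-1] = A[:, i_after]
    for t in range(n_lost - 2, -1, -1):
        beta[t] = A @ beta[t + 1]

    # Posterior over states for each lost frame
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # MMSE estimate is the posterior mean of the centroids
    estimates = gamma @ centroids

    # Per-dimension posterior variance, used here as an (assumed) reliability measure
    variances = np.maximum(gamma @ (centroids ** 2) - estimates ** 2, 0.0)
    return estimates, variances
```

A low variance indicates that the surrounding received frames pin down the lost frame well. A soft-data or weighted Viterbi decoder could use such values to down-weight unreliable frames, although the abstract does not state how the paper maps reliability to weights.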

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 18, Issue: 6)