A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification

Authors: Hui Jiang (Dept. of Computer Science, University of Toronto, Ontario, Canada); F. K. Soong; Chin-Hui Lee

In this paper, we propose a dynamic in-search data selection method that automatically identifies competing information in speech data. In our method, Viterbi beam search is used to decode all training data. During decoding, all partial paths within the beam are examined to identify the so-called competing-token and true-token sets for each individual hidden Markov model (HMM). In this work, the collected data tokens are applied to two specific tasks: acoustic modeling and utterance verification. In acoustic modeling, the true-token sets are used to adapt HMMs with a sequential maximum a posteriori (MAP) adaptation method, while a discriminative training method based on generalized probabilistic descent (GPD) is proposed to improve HMMs using the competing-token sets. In utterance verification, under the framework of likelihood ratio testing, the true-token sets are employed to train positive models for the null hypothesis, and the competing-token sets are used to estimate negative models for the alternative hypothesis. All the proposed methods are evaluated on the Bell Laboratories Communicator system. Experimental results show that the new acoustic modeling method consistently improves recognition performance over our best maximum likelihood estimation (MLE) models, reducing word error rate by roughly 1% absolute. The results also show that the new verification models significantly outperform the conventional anti-models in utterance verification, yielding a relative reduction of almost 30% in equal error rate when identifying misrecognized words in the recognition results.
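The abstract does not say exactly how a partial path in the beam is matched against the transcription. What follows is one plausible reading, offered only as a sketch under stated assumptions. Each in-beam hypothesis is reduced to labeled frame segments (`Segment`, a hypothetical type), and the reference comes from a forced alignment. A hypothesized segment is assigned to the reference segment it overlaps most. A matching label puts it in the true-token set of the hypothesized model, and a mismatch puts it in that model's competing-token set. The 50% overlap threshold (`min_overlap`) and all function names are illustrative, not the authors' criteria.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical token: a labeled frame span [start, end)."""
    label: str
    start: int
    end: int


def overlap(a: Segment, b: Segment) -> int:
    """Number of frames shared by two segments."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def select_tokens(reference, beam_segments, min_overlap=0.5):
    """Split in-beam segments into true and competing tokens per HMM label.

    reference: segments from a forced alignment of the transcription (assumed).
    beam_segments: segments hypothesized by partial paths inside the beam.
    Each hypothesized segment is matched to the reference segment it overlaps
    most and kept only if that overlap covers at least `min_overlap` of its
    length (an illustrative criterion, not the paper's).
    """
    true_tokens = defaultdict(set)       # label -> correctly hypothesized spans
    competing_tokens = defaultdict(set)  # label -> spans it wrongly claimed
    if not reference:
        return {}, {}
    for seg in beam_segments:
        length = seg.end - seg.start
        if length <= 0:
            continue
        best = max(reference, key=lambda ref: overlap(ref, seg))
        if overlap(best, seg) < min_overlap * length:
            continue  # too poorly aligned to assign either way
        if seg.label == best.label:
            true_tokens[seg.label].add(seg)
        else:
            competing_tokens[seg.label].add(seg)
    # Sets drop duplicates, since many partial paths share the same segment.
    return dict(true_tokens), dict(competing_tokens)


if __name__ == "__main__":
    ref = [Segment("yes", 0, 40), Segment("please", 40, 90)]
    beam = [
        Segment("yes", 0, 38),
        Segment("yeah", 2, 40),
        Segment("please", 40, 90),
        Segment("yes", 0, 38),  # same span reached by another partial path
    ]
    true_t, comp_t = select_tokens(ref, beam)
    print(sorted(true_t))  # ['please', 'yes']
    print(sorted(comp_t))  # ['yeah']
```

Indexing competing tokens by the *hypothesized* label reflects the reading that a model's competing set holds data it scored well on but should not have claimed. That is the kind of data a discriminative update or a negative model would need, but the paper's precise definition may differ.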
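For utterance verification, the abstract frames the accept/reject decision as a likelihood ratio test between positive models (null hypothesis) and negative models (alternative hypothesis), with results reported as equal error rate (EER). Below is a minimal Python sketch of that scoring and of an EER threshold sweep. It assumes the per-word log-likelihoods under both models are already computed. It also assumes that the log ratio is normalized by the number of frames, which is a common convention rather than something the abstract states. The function names are hypothetical.

```python
def llr_score(pos_loglik: float, neg_loglik: float, num_frames: int) -> float:
    """Frame-normalized log likelihood ratio: log p(X|H0) - log p(X|H1).

    pos_loglik: log-likelihood under the positive (null-hypothesis) model.
    neg_loglik: log-likelihood under the negative (alternative) model.
    """
    return (pos_loglik - neg_loglik) / num_frames


def accept(score: float, threshold: float) -> bool:
    """Accept the recognized word when the ratio clears the threshold."""
    return score >= threshold


def equal_error_rate(correct_scores, incorrect_scores):
    """Return (eer, threshold) at the candidate threshold where false
    rejection of correct words and false acceptance of misrecognized words
    are closest; the EER is taken as their average there.

    Assumes both score lists are non-empty.
    """
    best = None
    for t in sorted(set(correct_scores) | set(incorrect_scores)):
        fr = sum(not accept(s, t) for s in correct_scores) / len(correct_scores)
        fa = sum(accept(s, t) for s in incorrect_scores) / len(incorrect_scores)
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, (fa + fr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    print(llr_score(-100.0, -120.0, 10))  # 2.0
    correct = [2.0, 1.5, 1.0, -0.5]
    incorrect = [-2.0, -1.0, 0.5, 1.2]
    print(equal_error_rate(correct, incorrect))  # (0.25, 1.0)
```

Sweeping only over observed scores keeps the example short. With larger score sets, one would typically interpolate between thresholds to locate the crossing point more precisely.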

Published in: IEEE Transactions on Speech and Audio Processing, Volume 13, Issue 5