This paper studies the uncertainty of trained acoustic models and the confusion among them in the context of speech recognition. The goal is to identify the most relevant voice features, so the analysis is performed on a per-feature basis. Model uncertainty is defined as a measure of the overlap between feature distributions, and each model is compared only against the models most similar to it. Accordingly, confusion matrices are built both from the feature distributions and from the recognition results. The voice features are then weighted according to their relevance, so as to increase the discrimination among models; relevance itself is derived from the model-uncertainty values. Experimental results show that appropriate weighting improves recognition accuracy in terms of Word Error Rate (WER). Moreover, removing the features with the lowest weights preserves recognition accuracy while significantly reducing the computational cost.
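The idea of weighting features by how little their distributions overlap can be sketched as follows. The abstract does not specify the overlap measure, so this illustration assumes per-feature Gaussian models and uses the Bhattacharyya coefficient as a stand-in for the paper's uncertainty measure; the model parameters and the pruning threshold are hypothetical.

```python
import math

def gaussian_overlap(mu1, var1, mu2, var2):
    """Bhattacharyya coefficient between two 1-D Gaussians.
    Returns a value in (0, 1]; 1 means identical distributions
    (maximum uncertainty, i.e. the feature cannot discriminate)."""
    db = (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
          + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))
    return math.exp(-db)

def feature_weights(model_a, model_b):
    """Weight each feature by how well it separates the two models:
    low overlap (low uncertainty) -> high relevance weight."""
    weights = [1.0 - gaussian_overlap(ma[0], ma[1], mb[0], mb[1])
               for ma, mb in zip(model_a, model_b)]
    total = sum(weights) or 1.0
    return [w / total for w in weights]  # normalise weights to sum to 1

# Two hypothetical acoustic models; each feature is a (mean, variance) pair.
model_a = [(0.0, 1.0), (2.0, 0.5), (1.0, 1.0)]
model_b = [(0.1, 1.0), (5.0, 0.5), (1.0, 1.0)]

w = feature_weights(model_a, model_b)
# The well-separated feature (index 1) dominates; the identical one
# (index 2) gets weight ~0 and can be pruned to save computation.
kept = [i for i, wi in enumerate(w) if wi > 0.05]
```

Pruning features whose normalised weight falls below a small threshold mirrors the paper's finding that dropping low-weight features keeps accuracy while cutting the number of calculations.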