Because the performance of speech recognition systems is still inadequate, an accurate confidence scoring mechanism should be employed to understand user requests correctly. To determine a confidence score for a hypothesis, several confidence features are combined. This work investigates the performance of filler-model-based confidence features. Five types of filler model networks were defined: a triphone network, a phone network, a phone-class network, a 5-state catch-all model and a 3-state catch-all model. First, all the models were evaluated on a Turkish speech recognition task in terms of their ability to tag recognition hypotheses correctly as either a recognition error or a correct result. The best performance was obtained with the triphone recognition network. Then, the performance of reliable combinations of these models was investigated, and it was observed that certain combinations of filler models could significantly improve the accuracy of the confidence annotation.
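As a rough illustration of the idea, the sketch below tags a hypothesis from filler-model scores. The abstract does not say how each feature is computed or how the features are combined, so everything here is an assumption: the feature is taken to be a duration-normalized log-likelihood ratio between the recognizer and a filler network (a common formulation), the combination is a plain weighted sum, and the names `Hypothesis`, `filler_confidence`, `combined_confidence`, `tag`, the weights and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Hypothesis:
    """One recognition hypothesis with the scores needed for confidence tagging."""
    word: str
    n_frames: int                      # duration of the hypothesis in frames
    recognizer_loglik: float           # log-likelihood from the main recognizer
    filler_logliks: Dict[str, float]   # log-likelihood per filler network, same segment


def filler_confidence(hyp: Hypothesis, filler: str) -> float:
    """Assumed feature: per-frame log-likelihood ratio, recognizer vs. one filler network."""
    return (hyp.recognizer_loglik - hyp.filler_logliks[filler]) / hyp.n_frames


def combined_confidence(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Assumed combination: weighted sum of the per-filler features."""
    return sum(w * filler_confidence(hyp, name) for name, w in weights.items())


def tag(hyp: Hypothesis, weights: Dict[str, float], threshold: float) -> str:
    """Label the hypothesis as 'correct' or 'recognition-error'."""
    score = combined_confidence(hyp, weights)
    return "correct" if score >= threshold else "recognition-error"


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    hyp = Hypothesis(
        word="merhaba",
        n_frames=10,
        recognizer_loglik=-50.0,
        filler_logliks={"triphone": -55.0, "catch_all_3state": -80.0},
    )
    weights = {"triphone": 0.5, "catch_all_3state": 0.5}
    print(tag(hyp, weights, threshold=1.0))
```

In practice the weights and the threshold would be tuned on held-out data with known correct and incorrect hypotheses; a trained classifier could replace the weighted sum without changing the overall structure.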
Date of Conference: 28-30 April 2004