
Speech recognition performance as an effective perceived quality predictor

Authors:

Wenyu Jiang (Dept. of Comput. Sci., Columbia Univ., New York, NY, USA); Henning Schulzrinne

Determining the perceived quality of packet audio under packet loss usually requires human-based mean opinion score (MOS) listening tests. We propose a new MOS estimation method based on machine speech recognition. Its automated, machine-based nature facilitates real-time monitoring of transmission quality without the need to conduct time-consuming listening tests. Our evaluation of this new method shows that it can use the word recognition ratio metric to reliably predict perceived quality. In particular, we find that although the absolute word recognition ratio of a speech recognizer may vary depending on the speaker, the relative word recognition ratio, obtained by dividing the absolute word recognition ratio by its own value at 0% loss, is speaker-independent. Therefore, the relative word recognition ratio is well suited as a universal, speaker-independent MOS predictor. Finally, we have also conducted human-based word recognition tests and examined their relationship with the machine-based recognition results. Our analysis shows that the two are correlated, although not very linearly. We also find that the human-based word recognition ratio does not degrade significantly once packet loss is large (≥10%).
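
The normalization described in the abstract (the absolute word recognition ratio divided by its value at 0% loss) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the per-speaker numbers are hypothetical, made up to show the arithmetic, and are not taken from the paper's measurements or implementation.

```python
def relative_word_recognition_ratio(absolute_ratio, baseline_ratio):
    """Divide a speaker's absolute ratio by that speaker's ratio at 0% loss."""
    if baseline_ratio <= 0:
        raise ValueError("the ratio at 0% loss must be positive")
    return absolute_ratio / baseline_ratio


def normalize(ratios_by_loss):
    """Map {packet loss in %: absolute ratio} to relative ratios.

    The entry for 0% loss is used as the baseline, so it must be present.
    """
    baseline = ratios_by_loss[0]
    return {
        loss: relative_word_recognition_ratio(ratio, baseline)
        for loss, ratio in ratios_by_loss.items()
    }


# Hypothetical absolute ratios for two speakers (illustrative values only).
speaker_a = {0: 0.80, 5: 0.60, 10: 0.40}
speaker_b = {0: 0.60, 5: 0.45, 10: 0.30}

for name, ratios in (("A", speaker_a), ("B", speaker_b)):
    relative = normalize(ratios)
    print(name, {loss: round(value, 3) for loss, value in relative.items()})
```

In this made-up data the two speakers have different absolute ratios at every loss rate, but the relative ratios coincide, which is the kind of speaker independence the abstract reports for the real measurements.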

Published in:

Quality of Service, 2002. Tenth IEEE International Workshop on

Date of Conference: