Determining the perceived quality of packet audio under packet loss usually requires human-based mean opinion score (MOS) listening tests. We propose a new MOS estimation method based on machine speech recognition. Because the method is automated, it allows real-time monitoring of transmission quality without time-consuming listening tests. Our evaluation shows that the word recognition ratio metric can reliably predict perceived quality. In particular, we find that although the absolute word recognition ratio of a speech recognizer may vary with the speaker, the relative word recognition ratio, obtained by dividing the absolute word recognition ratio by its own value at 0% loss, is speaker-independent. The relative word recognition ratio is therefore well suited as a universal, speaker-independent MOS predictor. Finally, we have also conducted human-based word recognition tests and examined their relationship with the machine-based recognition results. Our analysis shows that the two are correlated, although the relationship is not very linear. We also find that the human-based word recognition ratio does not degrade significantly once packet loss is large (≥10%).
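The normalization behind the relative word recognition ratio can be illustrated with a minimal sketch. The function name, the dictionary layout, and the numbers below are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def relative_word_recognition_ratio(absolute_by_loss):
    """Divide each absolute ratio by the same speaker's ratio at 0% loss.

    absolute_by_loss maps a packet loss rate (0.0 means 0% loss) to the
    absolute word recognition ratio measured at that loss rate.
    """
    if 0.0 not in absolute_by_loss:
        raise ValueError("a measurement at 0% loss is required as the baseline")
    baseline = absolute_by_loss[0.0]
    if baseline <= 0:
        raise ValueError("the baseline ratio at 0% loss must be positive")
    return {loss: ratio / baseline for loss, ratio in absolute_by_loss.items()}


if __name__ == "__main__":
    # Hypothetical absolute ratios for two speakers (illustrative values only).
    speaker_a = {0.0: 0.80, 0.05: 0.60, 0.10: 0.40}
    speaker_b = {0.0: 0.60, 0.05: 0.45, 0.10: 0.30}

    for name, data in (("A", speaker_a), ("B", speaker_b)):
        relative = relative_word_recognition_ratio(data)
        print(name, {loss: round(value, 3) for loss, value in relative.items()})
```

In this made-up example the two speakers have different absolute ratios at every loss rate, yet their normalized curves coincide, which is the kind of speaker independence the abstract describes.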