In this paper, we consider the problem of characterizing the quality of audio impaired by time-varying distortions. The quality of such audio must be characterized by a time-dependent function. We first study the issues in human subjective testing that must be addressed to obtain such a function, and we conclude that a combination of two testing methodologies is most likely the best approach for achieving both high temporal resolution and high accuracy. Using the collected subjective data, we then design an objective metric that attempts to replicate the subjective responses. At its core, the proposed metric uses a subset of the model output variables (MOVs) defined in the ITU perceptual evaluation of audio quality (PEAQ) recommendation, together with the structural similarity (SSIM) measure and the segmental signal-to-noise ratio. Depending on the dataset used in its design, the estimated quality curves achieve an average correlation coefficient of between 0.887 and 0.933 with the ground-truth curves. We also find that the complexity of the metric can be reduced considerably while degrading its performance by only 0.3% to 1.6%, and that the reduced-complexity metric appears to be more robust than the original one.
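To illustrate what a time-dependent objective feature and its comparison against a ground-truth curve might look like, the sketch below computes a frame-wise SNR curve (whose mean is the usual segmental SNR) and a Pearson correlation coefficient between two equal-length curves. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the frame length, the clamping limits, or how curves are aligned, so the 256-sample non-overlapping frames and the [-10, 35] dB clamp used here are assumed conventions, and the function names `frame_snr_curve`, `segmental_snr`, and `pearson` are hypothetical. The PEAQ MOVs and SSIM features are not shown.

```python
import math
from typing import List, Sequence


def frame_snr_curve(reference: Sequence[float], degraded: Sequence[float],
                    frame_len: int = 256, floor_db: float = -10.0,
                    ceil_db: float = 35.0) -> List[float]:
    """Per-frame SNR in dB over non-overlapping frames (assumed framing),
    clamped to [floor_db, ceil_db] (assumed limits)."""
    if len(reference) != len(degraded):
        raise ValueError("reference and degraded must have equal length")
    curve = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, deg))
        if noise_energy == 0.0:
            snr = ceil_db          # identical frame: treat as maximum quality
        elif signal_energy == 0.0:
            snr = floor_db         # silent reference with added noise
        else:
            snr = 10.0 * math.log10(signal_energy / noise_energy)
        curve.append(min(max(snr, floor_db), ceil_db))
    return curve


def segmental_snr(curve: Sequence[float]) -> float:
    """Segmental SNR: the mean of the clamped per-frame SNR values."""
    return sum(curve) / len(curve)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length curves."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("curves must have equal length >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant curve")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Toy signal: a 440 Hz tone at 8 kHz whose second half is attenuated,
    # i.e. a distortion that appears partway through the excerpt.
    ref = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(2048)]
    deg = ref[:1024] + [0.9 * x for x in ref[1024:]]
    curve = frame_snr_curve(ref, deg)
    print([round(v, 2) for v in curve])
    print(round(segmental_snr(curve), 2))
```

In this toy case the curve stays at the 35 dB ceiling for the unimpaired frames and drops to about 20 dB once the attenuation starts, which is the kind of temporal structure a single overall score would hide. A metric curve and a subjective curve sampled at the same time points could then be compared with `pearson`.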