Perceived conversational speech quality is a key quality of service (QoS) metric for voice over IP (VoIP) applications. Speech quality is mainly affected by network impairments such as delay, jitter and packet loss. Playout buffer algorithms compensate for jitter by trading delay against loss, and this choice can have a significant effect on perceived quality. The main aim of this paper is to assess how buffer algorithms affect perceived speech quality, and how to choose the best algorithm and its parameters to obtain optimum perceived speech quality (in terms of an objective mean opinion score). The contributions of the paper are three-fold. First, we introduce a new methodology for predicting conversational speech quality (conversational mean opinion score, or MOSc), which combines the latest ITU-T speech quality measurement algorithm (PESQ) with the concepts of the E-model. Second, we assess different playout buffer algorithms using the new MOSc metric on Internet trace data. Our findings indicate that, in general, end-to-end delay has a major effect on the selection of a buffer algorithm and its parameters. For small end-to-end delays, an algorithm that seeks to minimise loss is preferred, whereas for large end-to-end delays, an algorithm that aims at a minimum buffer delay is best. Third, we propose a modified buffer algorithm together with an adaptive parameter adjustment scheme. Preliminary results show that this can achieve an "optimum" perceived speech quality for all the traces considered. The results are based on Internet trace data measured between the UK and the USA, the UK and China, and the UK and Germany.
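The delay/loss tradeoff described above can be illustrated with a minimal sketch. It is not the paper's MOSc method: the PESQ-derived loss impairment is replaced here by an assumed G.711-style curve, the delay impairment uses a commonly cited piecewise-linear approximation of the E-model, and all function names are hypothetical. A fixed playout delay is scored by treating every packet that arrives after its playout time as lost; codec and packetisation delays are ignored.

```python
import math

def delay_impairment(d_ms):
    # Assumed approximation of the E-model delay impairment:
    # linear in delay, with a steeper slope beyond about 177.3 ms.
    return 0.024 * d_ms + 0.11 * max(0.0, d_ms - 177.3)

def loss_impairment(loss_rate):
    # Assumed stand-in for the PESQ-derived impairment in the paper:
    # a logarithmic curve often quoted for G.711 under random loss.
    return 30.0 * math.log(1.0 + 15.0 * loss_rate)

def r_to_mos(r):
    # ITU-T G.107 mapping from the rating factor R to a MOS value.
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)

def score_fixed_buffer(network_delays_ms, playout_delay_ms):
    # A packet whose network delay exceeds the playout delay arrives
    # too late to be played and is counted as lost.
    late = sum(1 for d in network_delays_ms if d > playout_delay_ms)
    loss = late / len(network_delays_ms)
    r = 93.2 - delay_impairment(playout_delay_ms) - loss_impairment(loss)
    return r_to_mos(r), loss

def best_playout_delay(network_delays_ms, candidates_ms):
    # Pick the candidate playout delay with the highest predicted MOS.
    return max(candidates_ms,
               key=lambda p: score_fixed_buffer(network_delays_ms, p)[0])

# Synthetic traces (not measured data): 2% of packets hit a 100 ms spike.
short_path = [50.0] * 98 + [150.0] * 2
long_path = [350.0] * 98 + [450.0] * 2

choice_short = best_playout_delay(short_path, [60.0, 160.0])
choice_long = best_playout_delay(long_path, [360.0, 460.0])
```

Under these assumed curves, the short path favours the larger playout delay (no late loss), while the long path favours the smaller one (some loss accepted to avoid further delay), which matches the qualitative finding stated in the abstract.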