This paper deals with the measurement and calculation of various speech temporal parameters of interest in an environment where speech activity detection is employed. In particular, it is shown that, given either a measurement or a model of the probability density function (pdf) of silence durations for the case of zero talkspurt "hangover" or "fill-in," the following temporal parameters can be computed for any value of hangover or fill-in: the mean (and pdf) of silence durations, the mean talkspurt duration, the mean talkspurt rate, and the speech activity. Directly measured values of these parameters are compared with those computed from both measured and fitted versions of the pdf of silence durations, and the two are shown to be in reasonable agreement. The illustrated results are based on measurements of about two minutes of taped male monolog source speech. However, the approach to calculating the above parameters is general in the sense that it can be applied to any measured or modeled pdf of silence durations. The significance of this work lies in the important role that talkspurt hangover plays, for example, in minimizing speech-detector-induced back-end clipping of talkspurts, in reducing exposure to the variable talkspurt delay impairment, and in determining signaling overhead and resource occupancy in various speech interpolation, packet voice, and integrated voice/data systems.
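The idea of deriving all four parameters from zero-hangover silence durations can be illustrated with a minimal sketch. It is not the paper's own formulation. It works on a list of measured silence durations rather than a pdf, and it rests on three assumptions: a hangover of h bridges every silence of duration at most h, each longer silence is shortened by h, and the number of talkspurts equals the number of surviving silences (reasonable for a long record of alternating talkspurts and silences). The names `TemporalParams`, `params_with_hangover`, `silences`, `total_time`, and `hangover` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TemporalParams:
    mean_silence: float      # seconds
    mean_talkspurt: float    # seconds
    talkspurt_rate: float    # talkspurts per second
    speech_activity: float   # fraction of time classified as speech


def params_with_hangover(
    silences: Sequence[float], total_time: float, hangover: float
) -> TemporalParams:
    """Estimate temporal parameters for a given hangover.

    silences   -- silence durations measured with zero hangover (seconds)
    total_time -- total length of the observed record (seconds)
    hangover   -- talkspurt hangover to apply (seconds)

    Assumption: silences no longer than the hangover are bridged, and
    every longer silence is shortened by exactly the hangover.
    """
    surviving = [s - hangover for s in silences if s > hangover]
    if not surviving:
        raise ValueError("hangover bridges every silence in the record")

    n = len(surviving)                 # also taken as the talkspurt count
    total_silence = sum(surviving)
    total_speech = total_time - total_silence

    return TemporalParams(
        mean_silence=total_silence / n,
        mean_talkspurt=total_speech / n,
        talkspurt_rate=n / total_time,
        speech_activity=total_speech / total_time,
    )


if __name__ == "__main__":
    # Made-up durations for illustration only, not data from the paper.
    example = [0.1, 0.5, 1.0, 0.2]
    for h in (0.0, 0.2):
        print(h, params_with_hangover(example, total_time=10.0, hangover=h))
```

Under these assumptions, increasing the hangover raises the speech activity and the mean talkspurt duration while lowering the talkspurt rate, which is the trade-off behind the signaling overhead and resource occupancy mentioned above. A pure fill-in scheme, which bridges short silences without shortening the longer ones, would need a different rule for `surviving`.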