A Comparison of Measured and Calculated Speech Temporal Parameters Relevant to Speech Activity Detection

1 Author(s)
Gruber, J.G. ; Bell-Northern Research, Ottawa, Ont., Canada

This paper deals with the measurement and calculation of various speech temporal parameters of interest in an environment where speech activity detection is employed. In particular, it is shown that, based on either a measurement or a model of the probability density function (pdf) of silence durations for the case of zero talkspurt "hangover" or "fill-in," the following temporal parameters can be computed for any value of hangover or fill-in: the mean (and pdf) of silence durations, the mean talkspurt duration, the mean talkspurt rate, and the speech activity. Directly measured values of these parameters and those computed from both measured and fitted versions of the pdf for silence durations are compared and are shown to be in reasonable agreement. The illustrated results are based on measurements of about two minutes of taped male monolog source speech. However, the approach to calculating the above parameters is general in the sense that it can be applied to any measured or modeled pdf for silence durations. The significance of this work lies in the important role that talkspurt hangover plays, for example, in minimizing speech-detector-induced back-end clipping of talkspurts, reducing exposure to the variable talkspurt delay impairment, and in determining signaling overhead and resource occupancy in various speech interpolation, packet voice, and integrated voice/data systems.
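
As a rough illustration of the idea in the abstract, the following Python sketch estimates the four temporal parameters for a chosen hangover from silence durations measured at zero hangover. It is not the paper's procedure, which is not given in the abstract. It uses an empirical sample in place of the pdf, and it rests on these assumptions: a hangover `h` bridges every silence of duration `h` or less, shortens every longer silence by `h`, and talkspurts and silences strictly alternate. The names `TemporalParams` and `params_with_hangover` are hypothetical, and the fill-in case is not covered.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TemporalParams:
    mean_silence: float     # mean duration of surviving silences (s)
    mean_talkspurt: float   # mean talkspurt duration, hangover included (s)
    talkspurt_rate: float   # talkspurts per second
    speech_activity: float  # fraction of time classified as speech


def params_with_hangover(silences: Sequence[float],
                         mean_talkspurt_zero: float,
                         hangover: float) -> TemporalParams:
    """Estimate temporal parameters for a given hangover.

    silences: silence durations measured with zero hangover (s).
    mean_talkspurt_zero: mean talkspurt duration with zero hangover (s).
    hangover: hangover time applied after each talkspurt (s).
    """
    if not silences:
        raise ValueError("need at least one silence duration")

    n = len(silences)
    mean_silence_zero = sum(silences) / n
    # Mean length of one talkspurt-plus-silence cycle at zero hangover.
    cycle_zero = mean_talkspurt_zero + mean_silence_zero

    # Assumption: silences no longer than the hangover are bridged;
    # longer silences survive, shortened by the hangover.
    surviving = [s - hangover for s in silences if s > hangover]
    if not surviving:
        raise ValueError("hangover bridges every silence")

    p_survive = len(surviving) / n
    mean_silence = sum(surviving) / len(surviving)

    # Total time is unchanged, so fewer cycles means longer cycles.
    cycle = cycle_zero / p_survive
    mean_talkspurt = cycle - mean_silence

    return TemporalParams(
        mean_silence=mean_silence,
        mean_talkspurt=mean_talkspurt,
        talkspurt_rate=1.0 / cycle,
        speech_activity=mean_talkspurt / cycle,
    )


if __name__ == "__main__":
    # Made-up durations for illustration only, not data from the paper.
    print(params_with_hangover([0.1, 0.2, 0.5, 1.0], 1.0, 0.25))
```

The sketch keeps total observation time fixed, so bridging silences lengthens the mean talkspurt and raises speech activity while lowering the talkspurt rate, which is the trade-off the abstract attributes to hangover.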

Published in:

IEEE Transactions on Communications (Volume: 30, Issue: 4)