Automatic voicing-decision algorithms rely on thresholds that vary with speaker, channel, S/N ratio, etc. Low-frequency energy (LFE) is one of the best voicing statistics when properly thresholded; it is even better if two thresholds are set, one for the onset of voicing and one for the offset. Two schemes are proposed for adaptive estimation of these thresholds. The first finds stretches that are "surely" voiced or unvoiced, locates the boundaries between them by heuristic algorithms, and sets thresholds consistent with those boundaries. The second finds segments that are "surely" voiced or unvoiced according to voicing statistics other than LFE and uses them to estimate the distribution of LFE in the voiced and unvoiced cases. Both schemes successfully determine speaker-dependent thresholds in about 15 seconds, during which "standard" thresholds can be used. The overall voicing error rate using LFE with adaptive thresholds is about 1%.
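
The two-threshold decision and the idea of estimating thresholds from "surely" voiced and unvoiced frames can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the assumption that the onset threshold lies above the offset threshold, and the rule that places both thresholds at fixed fractions between the two class means are all hypothetical choices made for illustration. The abstract does not specify how thresholds are derived from the estimated LFE distributions.

```python
from statistics import mean


def hysteresis_voicing(lfe, onset, offset):
    """Label each frame as voiced (True) or unvoiced (False).

    Assumption (not stated in the abstract): onset >= offset, so a frame
    becomes voiced when LFE reaches `onset` and stays voiced until LFE
    drops below `offset`.
    """
    voiced = False
    labels = []
    for value in lfe:
        if not voiced and value >= onset:
            voiced = True       # voicing onset detected
        elif voiced and value < offset:
            voiced = False      # voicing offset detected
        labels.append(voiced)
    return labels


def estimate_thresholds(voiced_lfe, unvoiced_lfe,
                        onset_frac=0.6, offset_frac=0.4):
    """Hypothetical threshold rule based on "surely" labeled frames.

    Places each threshold at a fixed fraction of the distance from the
    mean unvoiced LFE to the mean voiced LFE. The fractions are
    illustrative placeholders, not values from the paper.
    """
    low = mean(unvoiced_lfe)
    high = mean(voiced_lfe)
    span = high - low
    return low + onset_frac * span, low + offset_frac * span


if __name__ == "__main__":
    # Toy LFE values per frame (arbitrary units).
    frames = [1, 3, 6, 4, 3, 1, 5, 7, 2]
    print(hysteresis_voicing(frames, onset=5, offset=2))
    print(estimate_thresholds([8, 10, 12], [1, 2, 3]))
```

With a single threshold at 5, frames 4 and 5 of the toy sequence (LFE 4 and 3) would flip to unvoiced in the middle of a voiced stretch. The separate, lower offset threshold keeps them voiced, which is the kind of benefit the two-threshold approach is meant to provide.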