
Teager–Kaiser Energy Operators for Overlapped Speech Detection


Abstract:

Overlapped speech refers to a monophonic audio signal in which at least two speakers are present at the same time. In this study, the focus is on distinguishing overlapped from single-speaker speech, i.e., overlapped speech detection. We develop an overlap detection algorithm using an enhanced time-frequency representation, called a Pyknogram, estimated directly from the input audio signal. Pyknograms use the Teager–Kaiser energy operator to detect resonant time-frequency units and thereby suppress nonharmonic structures. We show how the resulting Pyknograms provide high separability in terms of detecting the presence of interfering speech. Our proposed unsupervised Pyknogram-based detection yields a relative improvement of over 30% in overlap detection error rates across different signal-to-interference ratios (SIR) compared to baseline systems. In addition, a case study is presented where we evaluate speaker verification performance under different overlap conditions using the GRID database and observe that speaker verification equal error rates (EER) vary from 2% to 30%, depending on the average SIR values introduced into the training and test sets. In order to estimate the reliability of speaker verification scores across different trials, overlap detection results are interpreted as low-level information and stacked alongside verification outputs. The resulting high-dimensional space is passed through a support vector machine classifier to find the separating hyperplane between target and impostor scores. Combining overlap detection scores with speaker verification yields a 20% relative decrease in EER on average. We also provide an upper bound for this approach using existing overlap labels, which yields a 23% relative improvement.
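
For readers unfamiliar with the operator named in the abstract, the following is a minimal sketch of the standard discrete-time Teager–Kaiser energy operator, Psi[x(n)] = x(n)^2 - x(n-1) x(n+1). It is not the authors' Pyknogram pipeline, whose details are not given in the abstract; the function name `teager_kaiser_energy` and the test tone are assumptions made for illustration only. For a pure tone A cos(Omega n + phi), the operator output is the constant A^2 sin^2(Omega), which the usage lines check numerically.

```python
import numpy as np


def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration. The output has two fewer samples
    than the input because the first and last samples lack a neighbour.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("expected a 1-D signal with at least 3 samples")
    # Centre samples squared, minus the product of left and right neighbours.
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Usage: for a pure tone the operator output is constant.
amplitude, omega = 2.0, 0.3          # assumed values, radians per sample
n = np.arange(200)
tone = amplitude * np.cos(omega * n + 0.5)
psi = teager_kaiser_energy(tone)
expected = (amplitude * np.sin(omega)) ** 2
print(np.allclose(psi, expected))
```

Because the output depends on both amplitude and frequency, the operator responds strongly to resonant components, which is consistent with the abstract's description of using it to pick out resonant time-frequency units. How the operator is applied per frequency band to form a Pyknogram is not specified here and is left out of the sketch.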
Page(s): 1035 - 1047
Date of Publication: 06 March 2017
