In this paper, we analyze our speaker diarization system through a series of oracle experiments. In this analysis, each system component is replaced by an oracle component that uses the reference transcripts to perform flawlessly. The original components are then placed back into the system one at a time, in either a top-down or a bottom-up manner, so that the performance of each individual component can be measured. The approach can be applied to any speaker diarization system that consists of a concatenation of separate components. Our experimental findings are relevant for most RT09s diarization systems, which apply similar techniques. The analysis revealed that three factors caused most errors: speech activity detection, the inability to handle overlapping speech, and the merging component's lack of robustness to cluster impurity.
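The substitution procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function `oracle_ablation`, the component names, the `toy_score` function, and every number in `TOY_COST` are hypothetical placeholders. A real study would score each configuration by running the diarization system against the reference transcripts.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-component error contributions (toy numbers, NOT results from the paper).
TOY_COST: Dict[str, float] = {"sad": 4.0, "segmentation": 1.0, "merging": 2.5}


def toy_score(use_real: Dict[str, bool]) -> float:
    """Stand-in for a full evaluation run: sums the toy cost of every
    component that is currently the real one rather than its oracle."""
    return sum(TOY_COST[name] for name, real in use_real.items() if real)


def oracle_ablation(
    order: Sequence[str],
    score: Callable[[Dict[str, bool]], float],
) -> List[Tuple[str, float, float]]:
    """Start from an all-oracle system and restore the real components one
    at a time in the given order. Returns, for each step, the restored
    component, the resulting error, and the increase over the previous step."""
    use_real = {name: False for name in order}  # False means the oracle is in place
    previous = score(dict(use_real))
    report = []
    for name in order:
        use_real[name] = True  # put the original component back
        current = score(dict(use_real))
        report.append((name, current, current - previous))
        previous = current
    return report


components = list(TOY_COST)
forward = oracle_ablation(components, toy_score)
backward = oracle_ablation(list(reversed(components)), toy_score)

for name, error, delta in forward:
    print(f"{name}: error={error:.1f} (+{delta:.1f})")
```

In this toy scorer the contributions are additive, so each component's increase is the same in either order. In a real system the components interact, which is one reason to run the analysis in both directions.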
Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 20, Issue: 2)
Biometrics Compendium, IEEE
Date of Publication: Feb. 2012