The Impact of Word Alignment Accuracy on Audio-visual Word Prominence Detection | VDE Conference Publication | IEEE Xplore

The Impact of Word Alignment Accuracy on Audio-visual Word Prominence Detection

Abstract:

To automatically detect prominent syllables or words, most approaches require a segmentation of the speech signal and a subsequent extraction of prosodic features in these segments. In this paper we investigate the impact of the precision of this segmentation on the detection. We segment our audio-visual, prosodically rich corpus using an HMM trained on a large dataset, and we investigate different training strategies for the HMM: on the one hand, training without any prior information, i.e. flat start, and on the other hand, training with partially manually created segmentations. Additionally, we introduce features tailored to detecting onsets in the spectrogram. We evaluate the segmentation performance on our corpus in two ways: by comparing it to manual annotations, and functionally, i.e. via its impact on the prominent word detection. The results show that using manual annotations in the training and the onset features significantly improves the segmentation accuracy. Yet the results of the prominent word detection do not benefit from the better segmentation. From this we conclude that the extraction of the prosodic features is robust against segmentation errors.
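
As a rough illustration of what comparing an automatic segmentation to manual annotations can look like, the following minimal sketch measures absolute word-boundary deviations and the share of boundaries that fall within a tolerance. The abstract does not state which metric the authors used, so the function names, the (word, start, end) segment layout, and the 20 ms tolerance are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical segment layout: (word, start time in s, end time in s).
Segment = Tuple[str, float, float]


def boundary_errors(auto: List[Segment], manual: List[Segment]) -> List[float]:
    """Absolute deviation (in seconds) of every word start and end boundary."""
    if len(auto) != len(manual):
        raise ValueError("both segmentations must contain the same words")
    errors = []
    for (word_a, start_a, end_a), (word_m, start_m, end_m) in zip(auto, manual):
        if word_a != word_m:
            raise ValueError(f"word mismatch: {word_a!r} vs {word_m!r}")
        errors.append(abs(start_a - start_m))
        errors.append(abs(end_a - end_m))
    return errors


def fraction_within(errors: List[float], tolerance_s: float = 0.02) -> float:
    """Share of boundaries whose deviation is at most tolerance_s (assumed 20 ms)."""
    if not errors:
        return 0.0
    return sum(e <= tolerance_s for e in errors) / len(errors)


# Toy data, invented for illustration; not taken from the paper's corpus.
manual = [("the", 0.00, 0.10), ("cat", 0.10, 0.45)]
auto = [("the", 0.01, 0.13), ("cat", 0.13, 0.45)]

errors = boundary_errors(auto, manual)
accuracy = fraction_within(errors)
```

A functional evaluation, as described in the abstract, would instead feed each segmentation into the prominence detector and compare the resulting detection scores.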
Date of Conference: 24-26 September 2014
Date Added to IEEE Xplore: 17 October 2014
Print ISBN: 978-3-8007-3640-9
Conference Location: Erlangen, Germany
