This paper reports on the improvement of speech and music indexation performance under various noisy conditions for radio broadcast using warped features fused with traditional features at the output stage. The system employs a bank of four parallel front ends followed by a classification in speech and music by Gaussian mixture models, where each front end employs a different feature extraction technique. Then an automatic gathering in macro classes is made. Indexing was performed on 8 hours of manually labelled radio broadcast from multilingual Radio France International recordings containing diverse speech and music content with different speaking styles, speakers, noise conditions and channels. For speech signal classification under the noisiest conditions, the warped features fused with traditional features produced an error rate three times smaller than that of either the warped features or the traditional features alone. Significant improvements were also found for speech classification under less noisy conditions.
Published in:
Multimedia Signal Processing, 2004 IEEE 6th Workshop on
Date of Conference: 29 Sept.-1 Oct. 2004