Music models for music-speech separation

Abstract:

We consider the task of speech recognition in the presence of loud background music. We use model-based music-speech separation and train Gaussian mixture models (GMMs) for music on the audio preceding the speech. We show over 8% relative improvement in word error rate (WER) at 10 dB SNR for a real-world Voice Search ASR system. We investigate the relationship between ASR accuracy, the amount of background music used as prologue, and the size of the music models. Our study shows that performance peaks when a music prologue of around 6 seconds is used to train the music model. We hypothesize that this is due to the dynamic nature of music and the structure of popular music. Adding more history beyond a certain point does not improve results. Additionally, we show that moderately sized 8-component music GMMs suffice to model this amount of music prologue.
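
The idea of training a small music model on the audio just before the speech can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the feature type, frame rate, or training procedure, so the 10 ms frame hop, the scalar (1-D) features, the synthetic sine-wave "music", and the helper names `prologue_frames` and `fit_gmm_1d` are all assumptions made for illustration. Only the 6-second prologue and the 8 components come from the abstract.

```python
import math

FRAME_HOP_S = 0.010   # ASSUMPTION: 10 ms frame hop (not stated in the abstract)
PROLOGUE_S = 6.0      # prologue length the abstract reports as working best
N_COMPONENTS = 8      # GMM size the abstract reports as sufficient


def prologue_frames(features, speech_start_frame):
    """Return the frames covering PROLOGUE_S seconds just before speech starts."""
    n = int(round(PROLOGUE_S / FRAME_HOP_S))
    start = max(0, speech_start_frame - n)
    return features[start:speech_start_frame]


def fit_gmm_1d(x, k=N_COMPONENTS, n_iter=25, var_floor=1e-3):
    """Fit a k-component 1-D GMM with plain EM; returns (weights, means, variances)."""
    xs = sorted(x)
    # Deterministic initialisation: means at evenly spaced quantiles.
    means = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    mu = sum(x) / len(x)
    var0 = max(sum((v - mu) ** 2 for v in x) / len(x), var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: accumulate responsibility-weighted statistics.
        nk, sx, sxx = [0.0] * k, [0.0] * k, [0.0] * k
        for v in x:
            p = [w * math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            for j in range(k):
                r = p[j] / total
                nk[j] += r
                sx[j] += r * v
                sxx[j] += r * v * v
        # M-step: re-estimate parameters, skipping components with no support.
        for j in range(k):
            if nk[j] < 1e-8:
                continue
            weights[j] = nk[j] / len(x)
            means[j] = sx[j] / nk[j]
            variances[j] = max(sxx[j] / nk[j] - means[j] ** 2, var_floor)
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, means, variances


# Hypothetical input: 10 s of synthetic scalar "music" features, speech at 8 s.
features = [math.sin(0.05 * i) for i in range(1000)]
music = prologue_frames(features, speech_start_frame=800)
weights, means, variances = fit_gmm_1d(music)
```

In a real system the features would be multi-dimensional spectral vectors and the resulting music model would feed a separation stage; neither is specified in the abstract, so both are left out here.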

Published in: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 25-30 March 2012

Date Added to IEEE Xplore: 30 August 2012

DOI: 10.1109/ICASSP.2012.6289022

Conference Location: Kyoto, Japan
