Abstract:
Weakly supervised learning algorithms are critical for scaling audio event detection to several hundred sound categories. Such learning models should not only disambiguate sound events efficiently with minimal class-specific annotation but also be robust to label noise, which is more apparent with weak labels than with strong annotations. In this work, we propose a new framework for designing learning models with weak supervision by bridging ideas from sequential learning and knowledge distillation. We refer to the proposed methodology as SeCoST (pronounced Sequest), short for Sequential Co-supervision for training generations of Students. SeCoST incrementally builds a cascade of student-teacher pairs via a novel knowledge transfer method. Our evaluations on Audioset (the largest weakly labeled dataset available) show that SeCoST achieves a mean average precision of 0.383, outperforming the prior state of the art by a considerable margin.
Published in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
- IEEE Keywords
- Index Terms
- Event Detection
- Weak Labels
- Audio Events
- Audio Event Detection
- Average Precision
- Sequence Learning
- Mean Average Precision
- Sound Detection
- Sound Effects
- Weak Supervision
- Loss Function
- Neural Network
- Audio Recordings
- Network Training
- Deep Convolutional Neural Network
- Previous Stage
- Ground Truth Labels
- Student Model
- Teacher Network
- Single Teacher
- Area Under Receiver Operating Characteristic Curve
- Multiple Teachers