Abstract:
Recently, the convolutional recurrent neural net-work (CRNN) has been widely used in weakly labeled sound event detection (SED) and audio tagging (AT) tasks. However, it ...Show MoreMetadata
Abstract:
Recently, the convolutional recurrent neural net-work (CRNN) has been widely used in weakly labeled sound event detection (SED) and audio tagging (AT) tasks. However, it is possible that the information of frequency dimension is not well used in the existing network design, which may cause information loss or redundancy. We propose a frequency axis pooling method to further boost the representation power of CRNN. Based on the existing pooling functions, the frequency axis pooling is applied on the feature map before recurrent neural network (RNN) input in CRNN. Compared to frequency axis no-pooling method, our method assigns different weights to different frequency dimensions during compressing, which can better compress frequency information and reduce information redundancy. To evaluate the proposed method, three commonly used pooling functions on frequency axis are compared on the Dcase2017 task4 dataset. The experimental results show that reasonable compression of frequency information helps to improve the performance of AT and SED tasks significantly. Among them, the frequency axis pooling based on linear softmax performs the best on both tasks.
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 03 February 2022
ISBN Information:
ISSN Information:
Conference Location: Tokyo, Japan