Environment Sound Classification Based on Visual Multi-Feature Fusion and GRU-AWS


First, the GRU network is used for the LM-MFCC feature extraction. Then, the attention mechanism is used to redistribute the hidden layer weights of LM and MFCC. Moreover...

Abstract:

Two major questions remain open in Environmental Sound Classification (ESC): what is the best audio recognition framework, and what is the most robust audio feature? To investigate these questions, this paper uses a Gated Recurrent Unit (GRU) network to analyze the effect of single features, namely the Mel Scale Spectrogram (Mel), the Log-Mel Scale Spectrogram (LM), and Mel frequency cepstral coefficients (MFCC), as well as the multi-features Mel-MFCC, LM-MFCC, and Mel-LM-MFCC (T-M). The experimental results show that, in ESC tasks, multi-features outperform single features of the same dimensions, and that LM-MFCC is the most robust. Reverse sequence MFCC (R-MFCC) and forward and reverse mixed sequence MFCC (FR-MFCC) are also proposed to study the effect of sequence changes on audio. The experimental results show that sequence transformation of audio features has little influence on the recognition tasks. To investigate the ESC task further, we introduce the attention weight similar model (AWS) into the multi-feature. The AWS model allows the attention weights of different audio features of the same sound to learn from each other, which makes the GRU-AWS model focus on the frame-level features more effectively. The experimental results show that GRU-AWS achieves a recognition rate of 94.3% and outperforms other state-of-the-art methods.
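
The relationship between the features named in the abstract can be illustrated with a small sketch. It relies on the standard definitions that LM is the logarithm of the Mel spectrogram and that MFCC is a discrete cosine transform (DCT-II) of each LM frame. Everything else here is an assumption for illustration, not the paper's implementation: the function names are hypothetical, LM-MFCC fusion is assumed to be frame-wise concatenation, R-MFCC is assumed to mean reversing the frame order, and the input is a toy two-frame Mel spectrogram.

```python
import math

def log_mel(mel, eps=1e-10):
    """Log-compress a Mel spectrogram stored as frames x bands (list of lists)."""
    return [[math.log(max(v, eps)) for v in frame] for frame in mel]

def dct2(frame, n_coeffs):
    """Unnormalized DCT-II of one frame, keeping the first n_coeffs coefficients."""
    n = len(frame)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(frame))
        for k in range(n_coeffs)
    ]

def mfcc_from_log_mel(lm, n_coeffs):
    """MFCC as the DCT-II of each log-Mel frame."""
    return [dct2(frame, n_coeffs) for frame in lm]

def fuse(a, b):
    """Assumed fusion: concatenate two feature sets frame by frame."""
    if len(a) != len(b):
        raise ValueError("feature sets must have the same number of frames")
    return [fa + fb for fa, fb in zip(a, b)]

def reverse_frames(feats):
    """Assumed R-MFCC construction: reverse the order of the frames."""
    return feats[::-1]

# Toy input: 2 frames x 4 Mel bands (made-up values, not real audio).
mel = [[1.0, 1.0, 1.0, 1.0],
       [1.0, math.e, 1.0, math.e]]

lm = log_mel(mel)                    # 2 frames x 4 bands
mfcc = mfcc_from_log_mel(lm, 2)      # 2 frames x 2 coefficients
lm_mfcc = fuse(lm, mfcc)             # 2 frames x 6 fused dimensions
r_mfcc = reverse_frames(mfcc)        # same frames, last frame first
```

In practice each fused frame would be fed to the GRU as one time step; a real pipeline would compute the Mel spectrogram from audio with a signal-processing library rather than from hand-written values.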
Published in: IEEE Access (Volume: 8)
Page(s): 191100 - 191114
Date of Publication: 19 October 2020
Electronic ISSN: 2169-3536
