3DSEAVNet: 3D-Squeeze-and-Excitation Networks for Audio-Visual Saliency Prediction | IEEE Conference Publication | IEEE Xplore




Abstract:

Video saliency prediction is an important task in computer vision. Most existing video saliency prediction methods focus only on image information and often ignore audio information. This leads to an incomplete perception mode, which makes it difficult to achieve optimal performance. SENet is an effective attention-based network that significantly enhances the performance of 2D convolutional networks. However, whether this attention mechanism can be applied to 3D convolutional networks remains to be studied. To address these problems, we propose an audio-visual fusion saliency prediction network that extracts and predicts the various kinds of information in videos. We also adapt the traditional SENet so that it is applicable to 3D convolutional neural networks, and we discuss its role. Compared with state-of-the-art methods, our model is strongly competitive on multiple datasets.
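
The abstract does not say how the squeeze-and-excitation block is adapted to 3D convolutions. One common way to extend it, shown here only as a minimal sketch and not as the authors' design, is to average each channel over time as well as over height and width before computing the channel gates. The function name `se_block_3d`, the weight names, the tensor layout (N, C, T, H, W), and the reduction ratio `r` are all assumptions made for illustration.

```python
import numpy as np


def se_block_3d(x, w1, b1, w2, b2):
    """Illustrative squeeze-and-excitation over a 5D feature map.

    x has the assumed layout (N, C, T, H, W); w1, b1, w2, b2 are the
    weights and biases of a hypothetical two-layer bottleneck.
    """
    # Squeeze: global average over time, height, and width -> (N, C)
    z = x.mean(axis=(2, 3, 4))
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1)
    h = np.maximum(z @ w1 + b1, 0.0)
    s = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
    # Scale: reweight every channel, broadcasting the gate over T, H, W
    return x * s[:, :, None, None, None]


# Toy input: 2 clips, 8 channels, 4 frames of 5x5 features, reduction ratio 4
rng = np.random.default_rng(0)
N, C, T, H, W, r = 2, 8, 4, 5, 5, 4
x = rng.standard_normal((N, C, T, H, W))
w1 = rng.standard_normal((C, C // r)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)) * 0.1
b2 = np.zeros(C)

y = se_block_3d(x, w1, b1, w2, b2)
print(y.shape)
```

The output keeps the input shape, so a block of this kind can be placed after any 3D convolutional stage. Only the per-channel magnitudes change.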
Date of Conference: 18-23 June 2023
Date Added to IEEE Xplore: 02 August 2023

Conference Location: Gold Coast, Australia
