Detecting Highlighted Video Clips Through Emotion-Enhanced Audio-Visual Cues | IEEE Conference Publication | IEEE Xplore

Abstract:

Recent years have witnessed growing research interest in video highlight detection. Existing studies mainly focus on detecting highlights in user-generated videos with simple topics, based on visual content. However, relying solely on visual features limits the ability of conventional methods to capture highlights in videos with more complicated semantics, such as movies. Therefore, we propose to mine the emotional information in video sounds to enhance highlight detection. Specifically, we design a novel emotion-enhanced framework with multi-stage fusion to detect highlights in complex videos. Along this line, we first extract multi-grained features from the audio waves. Then, a tailor-designed intra-modal fusion is applied to the audio features to obtain an emotional representation. Furthermore, a cross-modal fusion is developed to generate a comprehensive representation of each clip by merging the audio emotional representations and the visual features. This representation is then used to predict the highlight probability. Finally, extensive experiments on real-world datasets demonstrate the effectiveness of our method.
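
The three stages named in the abstract (multi-grained audio features, intra-modal fusion, cross-modal fusion) can be illustrated with a minimal NumPy sketch. This is not the paper's model: the abstract gives no architectural details, so the window sizes, the RMS and zero-crossing statistics, the attention-style weighting, the concatenation step, and every function and parameter name below are assumptions chosen only to make the data flow concrete, and the weights are random rather than learned.

```python
import numpy as np

def multi_grained_audio_features(wave, window_sizes=(256, 1024, 4096)):
    """Hypothetical stand-in for multi-grained features:
    summary statistics of the waveform at several window sizes."""
    feats = []
    for w in window_sizes:
        n = len(wave) // w
        frames = wave[: n * w].reshape(n, w)
        energy = np.sqrt((frames ** 2).mean(axis=1))               # RMS per frame
        zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)  # sign-change rate
        feats.append(np.array([energy.mean(), energy.std(),
                               zcr.mean(), zcr.std()]))
    return np.stack(feats)  # shape: (num_granularities, 4)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def intra_modal_fusion(audio_feats, query):
    """Assumed attention-style fusion: weight each granularity by its
    similarity to a query vector, then take the weighted sum."""
    weights = softmax(audio_feats @ query)
    return weights @ audio_feats  # shape: (4,)

def cross_modal_fusion(audio_repr, visual_feat, W, b):
    """Assumed fusion by concatenation, followed by a linear layer
    and a sigmoid that yields a highlight probability."""
    joint = np.concatenate([audio_repr, visual_feat])
    return 1.0 / (1.0 + np.exp(-(joint @ W + b)))

# Synthetic inputs; in practice the weights would be learned.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)   # one clip of fake audio samples
visual = rng.standard_normal(8)     # fake visual feature vector
query = rng.standard_normal(4)
W = rng.standard_normal(12)
b = 0.0

audio_feats = multi_grained_audio_features(wave)
audio_repr = intra_modal_fusion(audio_feats, query)
prob = cross_modal_fusion(audio_repr, visual, W, b)
print(f"highlight probability: {prob:.3f}")
```

With random weights the printed probability carries no meaning; the sketch only shows how a per-clip score could be produced from audio and visual inputs under these assumptions.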
Date of Conference: 05-09 July 2021
Date Added to IEEE Xplore: 09 June 2021
Conference Location: Shenzhen, China
