
Saliency Prediction on Mobile Videos: A Fixation Mapping-Based Dataset and A Transformer Approach



Abstract:

With the rapid proliferation of smart devices, mobile videos have drawn broad interest as people browse social media. Unlike traditional long-form videos, mobile videos exhibit distinctive and largely uncharted human attention behavior owing to their specific display mode, which motivates research on saliency prediction for mobile videos. Unfortunately, existing eye-tracking experiments are not applicable to mobile videos, since stationary eye-trackers and their fixation acquisition pipelines are designed for videos presented on computer monitors. To tackle this issue, we propose using a wearable eye-tracker to record viewers’ egocentric fixations and devise a fixation mapping technique that projects the eye fixations from egocentric videos onto mobile videos. With this technique, we establish the large-scale mobile video saliency (MVS) dataset, comprising 1,007 mobile videos and 5,935,927 fixations. On this dataset, we exhaustively analyze the characteristics of subjects’ fixations and obtain two findings. Based on the MVS dataset and these findings, we propose a saliency prediction approach for mobile videos built upon the Video Swin Transformer (MVFormer), in which long-range spatio-temporal dependencies are captured to model the human attention mechanism on mobile videos. In MVFormer, we develop a selective feature fusion module to balance multi-scale features, and a progressive saliency prediction module to generate saliency maps via progressive aggregation of multi-scale features. Extensive experiments show that our MVFormer approach significantly outperforms state-of-the-art saliency prediction approaches. Finally, we demonstrate a potential application of MVFormer in the H.265 video coding standard by embedding it into the rate control scheme, such that the perceptual quality of compressed mobile videos is significantly improved. The dataset and code are available at https://github.com/wenshijie110/MVFormer.
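The abstract describes fixation mapping only at a high level. The sketch below illustrates the underlying idea, assuming the phone screen is planar and its four corners can be located in each egocentric frame (e.g., via fiducial markers or screen detection): gaze points in egocentric-camera coordinates are projected onto the mobile video's own coordinate frame through a planar homography. The function name and OpenCV pipeline here are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of homography-based fixation mapping (assumed pipeline,
# not the paper's exact implementation).
import cv2
import numpy as np

def map_fixation_to_mobile_video(gaze_xy, screen_corners_egocentric,
                                 video_w, video_h):
    """Project one egocentric gaze point onto the mobile video plane.

    gaze_xy: (x, y) fixation in egocentric-frame pixels.
    screen_corners_egocentric: 4x2 array of phone-screen corners
        (top-left, top-right, bottom-right, bottom-left) located in the
        egocentric frame, e.g. by marker or screen detection (assumption).
    video_w, video_h: resolution of the displayed mobile video.
    """
    src = np.asarray(screen_corners_egocentric, dtype=np.float32)
    dst = np.float32([[0, 0], [video_w, 0],
                      [video_w, video_h], [0, video_h]])
    H, _ = cv2.findHomography(src, dst)      # plane-to-plane mapping
    pt = np.float32([[gaze_xy]])             # shape (1, 1, 2) for OpenCV
    mapped = cv2.perspectiveTransform(pt, H)[0, 0]
    return float(mapped[0]), float(mapped[1])
```

Fixations mapped this way can then be accumulated per frame and smoothed into ground-truth saliency maps, which is the standard practice for eye-tracking datasets.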
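The two MVFormer modules named in the abstract can likewise be sketched. Below is a hedged PyTorch rendering in which "selective feature fusion" is approximated by learned per-scale gating and "progressive saliency prediction" by coarse-to-fine aggregation of multi-scale backbone features; the channel widths, module names, and exact composition are assumptions, and the released code at the repository above is authoritative.

```python
# Hedged sketches of the two MVFormer module ideas (assumed designs).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveFusion(nn.Module):
    """Balance multi-scale features with learned, spatially varying gates."""
    def __init__(self, in_channels, width=64):  # in_channels: coarse -> fine
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        self.gate = nn.Conv2d(width * len(in_channels), len(in_channels), 1)

    def forward(self, feats):                    # feats: coarse -> fine
        size = feats[-1].shape[-2:]              # finest spatial resolution
        feats = [F.interpolate(p(f), size=size, mode='bilinear',
                               align_corners=False)
                 for p, f in zip(self.proj, feats)]
        w = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))

class ProgressiveHead(nn.Module):
    """Aggregate scales coarse-to-fine, then predict a saliency map."""
    def __init__(self, in_channels, width=64):   # in_channels: coarse -> fine
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, width, 1) for c in in_channels)
        self.refine = nn.ModuleList(nn.Conv2d(width, width, 3, padding=1)
                                    for _ in range(len(in_channels) - 1))
        self.out = nn.Conv2d(width, 1, 1)

    def forward(self, feats):                    # feats: coarse -> fine
        x = self.proj[0](feats[0])
        for f, p, conv in zip(feats[1:], self.proj[1:], self.refine):
            x = F.interpolate(x, size=f.shape[-2:], mode='bilinear',
                              align_corners=False)
            x = conv(x + p(f))                   # upsample, add, refine
        return torch.sigmoid(self.out(x))

if __name__ == "__main__":
    # Four scales with Swin-like channel widths on a portrait frame (assumed).
    chans = [768, 384, 192, 96]                  # coarse -> fine
    feats = [torch.randn(2, c, 7 * 2 ** i, 4 * 2 ** i)
             for i, c in enumerate(chans)]
    fused = SelectiveFusion(chans)(feats)        # (2, 64, 56, 32)
    sal = ProgressiveHead(chans)(feats)          # (2, 1, 56, 32)
```

In the paper these modules sit on top of a Video Swin Transformer backbone, whose windowed attention supplies the long-range spatio-temporal dependencies the abstract refers to; the sketch above only shows the multi-scale fusion and prediction stages.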
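Finally, the rate-control application can be illustrated by saliency-weighted QP offsets per coding tree unit (CTU): salient regions receive a lower QP (more bits) and non-salient regions a higher QP. This is a generic sketch of saliency-driven bit allocation under assumed parameters, not the paper's actual H.265 integration.

```python
# Generic illustration of saliency-driven QP offsets (assumed scheme).
import numpy as np

def ctu_qp_offsets(saliency, ctu=64, max_offset=4):
    """Per-CTU QP offsets from a saliency map (H x W, values >= 0).

    Salient CTUs get negative offsets (lower QP, more bits); non-salient
    CTUs get positive offsets. 'ctu' and 'max_offset' are illustrative.
    """
    h, w = saliency.shape
    rows, cols = h // ctu, w // ctu
    # Mean saliency per CTU.
    means = np.array([[saliency[r*ctu:(r+1)*ctu, c*ctu:(c+1)*ctu].mean()
                       for c in range(cols)] for r in range(rows)])
    # Normalize to [0, 1] across the frame.
    s = (means - means.min()) / (np.ptp(means) + 1e-8)
    # Saliency 1 -> -max_offset, saliency 0 -> +max_offset.
    return np.round(max_offset * (1.0 - 2.0 * s)).astype(int)
```

An H.265 encoder's rate control would then add these offsets to each CTU's base QP, shifting bits toward the regions viewers actually fixate.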
Page(s): 5935 - 5950
Date of Publication: 14 December 2023


