I. Introduction
Saliency prediction has recently received considerable attention in computer vision. Humans can rapidly analyze and interpret complex surroundings and focus their attention on specific regions. Saliency prediction techniques emerged to give machines a similar capability. The goal of video saliency prediction is to model human attention and predict the regions where viewers are likely to focus while watching a video. Previous studies have shown that this technology benefits many applications, such as visual tracking [1], [2], video captioning [3], [4], and video compression [5].