Skip to Main Content
Visual attention, which is an important characteristic of human visual system, is a useful clue for image processing and compression applications in the real world. This paper proposes a computational scheme that adopts both low-level and high-level features to predict visual attention from video signal by machine learning. The adoption of low-level features (color, orientation, and motion) is based on the study of visual cells, and the adoption of the human face as a high-level feature is based on the study of media communications. We show that such a scheme is more robust than those using purely single low- or high-level features. Unlike conventional techniques, our scheme is able to learn the relationship between features and visual attention to avoid perceptual mismatch between the estimated salience and the actual human fixation. We also show that selecting the representative training samples according to the fixation distribution improves the efficacy of regressive training. Experimental results are shown to demonstrate the advantages of the proposed scheme.