A new computational scheme for visual attention modeling is proposed. It uses both low-level and high-level features to predict visual attention from a video signal and fuses these features with machine learning. We show that this scheme is more robust than schemes that rely on features from a single level only. Unlike conventional techniques, our scheme avoids the perceptual mismatch between the estimated saliency and actual human fixations. We also show that selecting representative training samples according to the fixation distribution improves the efficacy of regression training. Experimental results demonstrate the advantages of the proposed scheme.
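The abstract does not name the learner or the sample-selection rule, so the following is only a rough sketch under stated assumptions. It assumes a linear least-squares regressor that fuses one low-level and one high-level per-pixel feature into a saliency value. It also assumes a hypothetical stratified rule that draws the same number of training pixels from each band of the fixation-density range. The function names (`select_by_fixation`, `fit_fusion`, `fuse`) and the toy data are illustrative and are not taken from the paper.

```python
import random
from typing import List, Sequence


def select_by_fixation(fixation: Sequence[float], per_bin: int,
                       n_bins: int = 4, seed: int = 0) -> List[int]:
    """Hypothetical selection rule: split the fixation-density range into
    equal-width bins and draw up to `per_bin` pixel indices from each bin,
    so that low- and high-fixation regions are both represented."""
    rng = random.Random(seed)
    lo, hi = min(fixation), max(fixation)
    width = (hi - lo) / n_bins or 1.0  # guard against a flat fixation map
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, f in enumerate(fixation):
        bins[min(int((f - lo) / width), n_bins - 1)].append(i)
    chosen: List[int] = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))
    return sorted(chosen)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_fusion(low, high, target, idx) -> List[float]:
    """Assumed linear fusion: least-squares weights w for
    target ~ w0 + w1*low + w2*high, fitted on the selected pixels only."""
    rows = [(1.0, low[i], high[i]) for i in idx]
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    aty = [sum(r[p] * target[i] for r, i in zip(rows, idx)) for p in range(3)]
    return solve(ata, aty)  # normal equations A^T A w = A^T y


def fuse(low, high, w) -> List[float]:
    """Apply the learned linear fusion to every pixel."""
    return [w[0] + w[1] * l + w[2] * h for l, h in zip(low, high)]


# Toy 16-pixel frame: the "fixation" map is an exact linear mix of the features.
low = [i / 15 for i in range(16)]
high = [(i % 4) / 3 for i in range(16)]
fixation = [0.1 + 0.3 * l + 0.6 * h for l, h in zip(low, high)]
idx = select_by_fixation(fixation, per_bin=2)
w = fit_fusion(low, high, fixation, idx)
saliency = fuse(low, high, w)
```

Stratifying by fixation density keeps the training set from being dominated by the many low-fixation background pixels. This is one plausible reading of "representative" samples; the paper's actual criterion and regressor may differ.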