Skip to Main Content
This paper introduces a new computational visual-attention model for static and dynamic saliency maps. First, we use the Earth Mover's Distance (EMD) to measure the center-surround difference in the receptive field, instead of using the Difference-of-Gaussian filter that is widely used in many previous visual-attention models. Second, we propose to take two steps of biologically inspired nonlinear operations for combining different features: combining subsets of basic features into a set of super features using the Lm-norm and then combining the super features using the Winner-Take-All mechanism. Third, we extend the proposed model to construct dynamic saliency maps from videos by using EMD for computing the center-surround difference in the spatiotemporal receptive field. We evaluate the performance of the proposed model on both static image data and video data. Comparison results show that the proposed model outperforms several existing models under a unified evaluation setting.