Skip to Main Content
A purely bottom-up model of visual attention is proposed and compared to five state-of-the-art models. The role of the low-level visual features is examined in two contexts. Two datasets are used: one containing data coming from an eye tracking experiment obtained in a free-viewing task and a second containing 5000 hand-label pictures (observers had to enclose the most visually interesting objects in a rectangle). The relevance of the bottom-up models, i.e., the ability of a model to predict where the salient areas are located, is evaluated. Whatever the metrics and the datasets, the degree of similarity between predictions and ground truth is significantly above chance. The proposed model, resting on a small number of features, is shown to be a good predictor of the human visual fixations but also a good predictor of the objects chosen as interesting by observers. This study suggests that the low-level visual features have a significant role in a free-viewing task but also in a high-level visual task, such as the choice of the object of interest in a complex visual scene. Another outcome concerns the viewing duration used in eye tracking experiments. Results suggest that this parameter is finally not as critical as one would expect.