Abstract:
Predicting the target of visual search from human gaze data is a challenging problem. In contrast to previous work that focused on predicting specific instances of search targets, we propose the first approach to predict a target's category and attributes. However, state-of-the-art models for categorical recognition require large amounts of training data, which are prohibitively expensive to collect for gaze data. We thus propose a novel Gaze Pooling Layer that integrates gaze information and CNN-based features through an attention mechanism, incorporating both spatial and temporal aspects of gaze behaviour. We show that our approach can leverage pre-trained CNN architectures, thus eliminating the need for expensive joint data collection of image and gaze data. We demonstrate the effectiveness of our method on a new 14-participant dataset, and indicate directions for future research in the gaze-based prediction of mental states.
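The abstract does not spell out how the Gaze Pooling Layer works, so the following is only a hypothetical sketch of the general idea of gaze-weighted attention pooling, not the authors' layer. It assumes that fixations are already mapped to feature-map cells as (x, y, duration) tuples, that the spatial aspect of gaze is modelled by a Gaussian around each fixation, and that the temporal aspect is modelled by weighting each Gaussian by fixation duration. The names `gaze_density_map` and `gaze_pool` and the `sigma` parameter are illustrative assumptions. The resulting normalised gaze map acts as attention weights over a feature map from a pre-trained CNN, which is then sum-pooled into a single descriptor.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical fixation format: (x, y, duration), with x, y in feature-map cells.
Fixation = Tuple[float, float, float]


def gaze_density_map(
    fixations: Sequence[Fixation], height: int, width: int, sigma: float = 1.0
) -> List[List[float]]:
    """Duration-weighted sum of Gaussians on an H x W grid, normalised to sum to 1."""
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy, duration in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                # Longer fixations contribute proportionally more attention (assumption).
                grid[y][x] += duration * math.exp(-d2 / (2 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    if total == 0:
        raise ValueError("fixations produce an empty gaze map")
    return [[v / total for v in row] for row in grid]


def gaze_pool(
    features: List[List[List[float]]], fixations: Sequence[Fixation], sigma: float = 1.0
) -> List[float]:
    """Weight each position of an H x W x C feature map by the gaze map and sum over space."""
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    weights = gaze_density_map(fixations, height, width, sigma)
    pooled = [0.0] * channels
    for y in range(height):
        for x in range(width):
            w = weights[y][x]
            for c in range(channels):
                pooled[c] += w * features[y][x][c]
    return pooled


if __name__ == "__main__":
    # Two fixations on a 1 x 2 feature map: the first is held three times as long.
    feats = [[[4.0, 0.0], [0.0, 4.0]]]
    print(gaze_pool(feats, [(0, 0, 3.0), (1, 0, 1.0)], sigma=0.1))
```

Because the gaze map is normalised, the pooled vector is a convex combination of the CNN features, so the CNN itself can stay frozen and only the gaze data changes between trials. This is one way such a design could avoid joint image-and-gaze training, under the assumptions above.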
Date of Conference: 22-29 October 2017
Date Added to IEEE Xplore: 22 January 2018
Electronic ISSN: 2473-9944