I. Introduction
Gaze following is one of the extraordinary human abilities to precisely track others’ gaze directions and identify their gaze targets. In real-world scenarios, the human visual system operates with speed and precision, allowing us to perform complex vision tasks with little conscious thoughts, such as effortlessly understanding the behavioural intentions of others. Similarly, fast and accurate gaze following detection algorithms will enable machines to better interpret human behaviors, thereby presenting a significantly potential in various human-centric vision tasks, such as saliency prediction [23], human action detection [32], and human-object interaction detection [43], among others [45].