Skip to Main Content
A user's focus of attention plays an important role in human-computer interaction applications, such as a ubiquitous computing environment and intelligent space, where the user's goal and intent have to be continuously monitored. We are interested in modeling people's focus of attention in a meeting situation. We propose to model participants' focus of attention from multiple cues. We have developed a system to estimate participants' focus of attention from gaze directions and sound sources. We employ an omnidirectional camera to simultaneously track participants' faces around a meeting table and use neural networks to estimate their head poses. In addition, we use microphones to detect who is speaking. The system predicts participants' focus of attention from acoustic and visual information separately. The system then combines the output of the audio- and video-based focus of attention predictors. We have evaluated the system using the data from three recorded meetings. The acoustic information has provided 8% relative error reduction on average compared to only using one modality. The focus of attention model can be used as an index for a multimedia meeting record. It can also be used for analyzing a meeting.