Actor and Observer: Joint Modeling of First and Third-Person Videos | IEEE Conference Publication | IEEE Xplore

Actor and Observer: Joint Modeling of First and Third-Person Videos


Abstract:

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor). Despite this, learning such models for human action recognition has not been achievable due to the lack of data. This paper takes a step in this direction, with the introduction of Charades-Ego, a large-scale dataset of paired first-person and third-person videos, involving 112 people, with 4000 paired videos. This enables learning the link between the two, actor and observer perspectives. Thereby, we address one of the biggest bottlenecks facing egocentric vision research, providing a link from first-person to the abundant third-person data on the web. We use this data to learn a joint representation of first and third-person videos, with only weak supervision, and show its effectiveness for transferring knowledge from the third-person to the first-person domain.
Date of Conference: 18-23 June 2018
Date Added to IEEE Xplore: 16 December 2018
Conference Location: Salt Lake City, UT, USA

1. Introduction

What is an action? How do we represent and recognize actions? Most of the current research has focused on a data-driven approach using abundantly available third-person (observer's perspective) videos. But can we really learn how to represent an action without understanding goals and intentions? Can we learn goals and intentions without simulating actions in our own mind? A popular theory in cognitive psychology, the Theory of Mind [30], suggests that humans have the ability to put themselves in each other's shoes, and this is a fundamental attribute of human intelligence. In cognitive neuroscience, the presence of activations in mirror neurons and motor regions even for passive observations suggests the same [33].
