Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition


Abstract:

Video-based dimensional emotion recognition aims to map human affect into a dimensional emotion space from visual signals, and is a fundamental challenge in affective computing and human-computer interaction. In this paper, we present a novel encoder-decoder framework to tackle this problem. It adopts a fully convolutional design that cascades a 2D-convolution-based spatial encoder with a 1D-convolution-based temporal encoder-decoder for joint spatio-temporal modeling. In particular, to address the key issue of capturing discriminative long-term dynamic dependencies, our temporal model, referred to as the Temporal Hourglass Convolutional Neural Network (TH-CNN), extracts contextual relationships by integrating both low-level encoded and high-level decoded cues. Temporal Intermediate Supervision (TIS) is then introduced to enhance the affective representations generated by TH-CNN under a multi-resolution strategy, which guides TH-CNN to learn the macroscopic long-term trend and the refined short-term fluctuations progressively. Furthermore, thanks to TH-CNN and TIS, the knowledge learnt in the intermediate layers makes it possible to offer customized solutions for different applications by adjusting the decoder depth. Extensive experiments on three benchmark databases (RECOLA, SEWA and OMG) show superior results compared to state-of-the-art methods, indicating the effectiveness of the proposed approach.
Published in: IEEE Transactions on Affective Computing (Volume: 12, Issue: 3, 01 July-Sept. 2021)
Page(s): 565 - 578
Date of Publication: 10 September 2019
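
The hourglass-shaped temporal path and the multi-resolution supervision described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: fixed average pooling and nearest-neighbour upsampling stand in for the learned 1D convolutions, the skip connection is a plain average, the loss is mean squared error, and every function name (`avg_pool`, `upsample`, `hourglass`, `multi_resolution_targets`, `intermediate_supervision_loss`) is hypothetical. The abstract does not specify any of these choices, so they are assumptions made only to show the overall structure.

```python
# Illustrative sketch only: fixed pooling and upsampling stand in for the
# learned 1D convolutions of TH-CNN. All names and choices are assumptions.

def avg_pool(seq, factor=2):
    """Reduce temporal resolution by averaging non-overlapping windows."""
    return [sum(seq[i:i + factor]) / factor
            for i in range(0, len(seq) - factor + 1, factor)]


def upsample(seq, factor=2):
    """Nearest-neighbour upsampling: repeat every time step `factor` times."""
    return [v for v in seq for _ in range(factor)]


def hourglass(features, depth):
    """Return one output per decoder level, ordered from coarse to fine."""
    # Encoder: keep the features at every resolution for the skip connections.
    skips = [features]
    for _ in range(depth):
        skips.append(avg_pool(skips[-1]))

    # Decoder: start at the bottleneck and merge the encoded (low-level)
    # features with the decoded (high-level) ones at each resolution.
    x = skips[-1]
    outputs = [x]
    for skip in reversed(skips[:-1]):
        up = upsample(x)
        x = [(u + s) / 2 for u, s in zip(up, skip)]
        outputs.append(x)
    return outputs


def multi_resolution_targets(labels, depth):
    """Pool per-frame labels to the resolution of each decoder level."""
    targets = [labels]
    for _ in range(depth):
        targets.append(avg_pool(targets[-1]))
    return targets[::-1]  # coarse (long-term trend) to fine (fluctuations)


def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def intermediate_supervision_loss(features, labels, depth):
    """Sum the losses of all decoder levels against their pooled targets."""
    outputs = hourglass(features, depth)
    targets = multi_resolution_targets(labels, depth)
    return sum(mse(o, t) for o, t in zip(outputs, targets))
```

In this sketch, an 8-step sequence with `depth=2` yields decoder outputs of lengths 2, 4 and 8, each compared against labels pooled to the same length. Keeping only the first few entries of `hourglass(...)` mimics a shallower decoder that reports just the coarser trend, which is one possible reading of "adjusting the decoder depth".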

1 Introduction

Perceiving human beings is an important field in Artificial Intelligence (AI), and understanding emotions is a major branch of it that attracts steadily increasing attention from both academia and industry. In recent years, emotion recognition techniques have been applied to Human-Computer Interaction (HCI) so that AI systems can collaborate with humans more deeply and smoothly [1], [2]. They are also in ongoing demand in a wide variety of applications, including humanoid robots [3] and healthcare [4], [5].
