I. Introduction
Automatic emotion recognition enables machines to communicate naturally with humans, which plays an essential role in maintaining long-term human-machine interactions. It has a wide range of applications in modern dyadic interaction scenarios involving various human relationships, such as therapist-patient, teacher-student, agent-customer, and employer-employee interactions [1]. In recent years, there has been growing interest in automatic techniques for recognizing the emotional states of individuals in various scenarios, especially with the rapid development of deep neural networks.