Feature representation is an important research topic in facial expression recognition from video sequences. In this letter, we propose spatiotemporal monogenic binary patterns to describe both the appearance and the motion information of dynamic sequences. First, we use monogenic signal analysis to extract, from each frame, the magnitude and the real and imaginary pictures of the orientation, since the magnitude provides rich appearance information and the orientation provides complementary information. Second, the phase-quadrant encoding method and the local exclusive-or (XOR) bit operator are used to encode the real and imaginary orientation pictures on three orthogonal planes, while the local binary pattern operator captures texture and motion information from the magnitude on the same three planes. Finally, feature fusion is handled either by concatenation or by multiple kernel learning. Experimental results on the Extended Cohn-Kanade and Oulu-CASIA facial expression databases demonstrate that the proposed methods outperform state-of-the-art methods and are robust to illumination variations.
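The monogenic decomposition mentioned above can be sketched as follows. This is a minimal illustration of the standard Riesz-transform construction in the frequency domain, not the letter's exact pipeline (the paper may apply an additional band-pass filter before the transform); the function name and the interpretation of the "real and imaginary pictures" of the orientation as the cosine and sine of the orientation angle are assumptions for illustration.

```python
import numpy as np

def monogenic_components(img):
    """Compute the monogenic magnitude and orientation pictures of a
    2-D image via the Riesz transform applied in the Fourier domain.
    A sketch of the standard construction; the paper's filtering
    details may differ."""
    rows, cols = img.shape
    u = np.fft.fftfreq(rows)[:, None]   # vertical frequency grid
    v = np.fft.fftfreq(cols)[None, :]   # horizontal frequency grid
    radius = np.sqrt(u**2 + v**2)
    radius[0, 0] = 1.0                  # avoid division by zero at DC
    # Transfer functions of the two Riesz-transform components
    H1 = -1j * u / radius
    H2 = -1j * v / radius
    F = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(F * H1))  # first Riesz component
    r2 = np.real(np.fft.ifft2(F * H2))  # second Riesz component
    # Local magnitude (energy) and orientation angle
    magnitude = np.sqrt(img**2 + r1**2 + r2**2)
    orientation = np.arctan2(r2, r1)
    # "Real" and "imaginary" pictures of the orientation, taken here
    # as cos/sin of the orientation angle (an assumption)
    return magnitude, np.cos(orientation), np.sin(orientation)
```

Per frame, the magnitude picture would then feed the LBP-TOP-style encoding, while the two orientation pictures feed the phase-quadrant/XOR encoding across the three orthogonal planes.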