
CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition


Abstract:

As one of the most informative human behaviors, facial expressions are often compound and variable, as different people may express the same expression in very different ways. However, most facial expression recognition (FER) methods still use one-hot or soft labels as supervision, which lack sufficient semantic descriptions of facial expressions and are less interpretable. Recently, contrastive vision-language pre-training models (e.g., CLIP) have used text as supervision and injected new vitality into various computer vision tasks, benefiting from the rich semantics in text. Therefore, we propose CLIPER, a unified framework for both static and dynamic facial Expression Recognition based on CLIP. Besides, we introduce multiple expression text descriptors (METD) to learn fine-grained expression representations and a two-stage training paradigm to preserve the interpretability of CLIP. We conduct extensive experiments on several popular FER benchmarks to demonstrate the effectiveness of CLIPER. The source code will be available at https://github.com/muse1998/CLIPER.
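For intuition, the sketch below illustrates CLIP-style expression classification with multiple learnable text descriptors per class, in the spirit of the METD idea described in the abstract. It is not the authors' implementation: the encoder stub, the number of descriptors per class, the mean-pooling aggregation, and all module and variable names are assumptions chosen for illustration.

```python
# Minimal sketch (assumed, not the authors' code): image-text cosine
# similarity with several learnable descriptors per expression class.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 7            # e.g., 7 basic expressions (assumed)
DESCRIPTORS_PER_CLASS = 4  # number of fine-grained descriptors per class (assumed)
EMBED_DIM = 512

class ImageEncoderStub(nn.Module):
    """Stand-in for a CLIP image encoder producing D-dim features."""
    def __init__(self, dim=EMBED_DIM):
        super().__init__()
        self.proj = nn.Linear(3 * 224 * 224, dim)

    def forward(self, images):
        return self.proj(images.flatten(1))

class METDHead(nn.Module):
    """Learnable text-side embeddings: K descriptors for each expression class."""
    def __init__(self, num_classes=NUM_CLASSES, k=DESCRIPTORS_PER_CLASS, dim=EMBED_DIM):
        super().__init__()
        # In a CLIP-based system these embeddings would come from the text
        # encoder applied to (possibly learnable) prompts; here they are
        # free parameters for simplicity.
        self.descriptors = nn.Parameter(torch.randn(num_classes, k, dim) * 0.02)
        self.logit_scale = nn.Parameter(torch.tensor(4.6))  # ~log(100), as in CLIP

    def forward(self, image_features):
        img = F.normalize(image_features, dim=-1)            # (B, D)
        txt = F.normalize(self.descriptors, dim=-1)          # (C, K, D)
        # Cosine similarity between each image and every descriptor.
        sims = torch.einsum("bd,ckd->bck", img, txt)         # (B, C, K)
        # Aggregate the K descriptors of each class (mean pooling here).
        logits = self.logit_scale.exp() * sims.mean(dim=-1)  # (B, C)
        return logits

if __name__ == "__main__":
    encoder, head = ImageEncoderStub(), METDHead()
    images = torch.randn(8, 3, 224, 224)
    logits = head(encoder(images))
    print(logits.shape)  # torch.Size([8, 7])
```

In this toy setup the class score is the pooled similarity over its descriptors, so each class can be matched through several fine-grained textual variants rather than a single prompt; training would optimize the descriptors (and any trainable encoder parts) with a standard cross-entropy loss over the logits.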
Date of Conference: 15-19 July 2024
Date Added to IEEE Xplore: 30 September 2024
Conference Location: Niagara Falls, ON, Canada

