This study aims to develop a multimodal, ensemble-based system for emotion recognition. Special attention is given to an often neglected problem: missing data in one or more modalities. In offline evaluation, the issue can easily be sidestepped by excluding those parts of the corpus where one or more channels are corrupted or otherwise unsuitable for evaluation. In real applications, however, missing data cannot be ignored, and adequate ways to handle it must be found. Our experiments therefore do not assume that the examined data are fully available at all times. The presented system addresses the problem at the multimodal fusion stage: we explain various ensemble techniques, covering both established methods and rather novel emotion-specific approaches, and extend them with strategies for compensating for temporarily unavailable modalities. We compare and discuss the advantages and drawbacks of the different fusion categories, and we carry out an extensive evaluation of these techniques on the CALLAS Expressivity Corpus, which features facial, vocal, and gestural modalities.
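As a rough illustration of what compensating for a missing modality can mean at the decision-fusion stage, the following sketch uses a plain weighted average of per-modality class scores that skips absent channels and renormalizes the weights over the channels that are present. This is an assumed, generic scheme for illustration only, not the ensemble or emotion-specific fusion methods evaluated in the study; the function name `fuse_available`, the modality keys, the weights, and the two-class score vectors are all hypothetical.

```python
from typing import Dict, List, Optional


def fuse_available(
    scores: Dict[str, Optional[List[float]]],
    weights: Dict[str, float],
) -> Optional[List[float]]:
    """Weighted average of per-modality class scores (hypothetical sketch).

    A modality whose score vector is None is treated as temporarily
    unavailable and simply left out; the remaining weights are
    renormalized so they still sum to one.
    """
    # Keep only the modalities that delivered a score vector this frame.
    available = {m: s for m, s in scores.items() if s is not None}
    if not available:
        # Nothing to fuse; the caller decides what to do (e.g. reuse the
        # previous decision or fall back to a prior).
        return None

    total_w = sum(weights[m] for m in available)
    if total_w <= 0:
        raise ValueError("available modalities have no positive weight")

    n_classes = len(next(iter(available.values())))
    fused = [0.0] * n_classes
    for m, s in available.items():
        if len(s) != n_classes:
            raise ValueError(f"modality {m!r} has a mismatched class count")
        w = weights[m] / total_w  # renormalize over present modalities only
        for k, p in enumerate(s):
            fused[k] += w * p
    return fused


if __name__ == "__main__":
    # Voice channel missing for this frame; face and gesture still vote.
    frame = {"face": [0.7, 0.3], "voice": None, "gesture": [0.4, 0.6]}
    w = {"face": 0.5, "voice": 0.3, "gesture": 0.2}
    print(fuse_available(frame, w))
```

Renormalizing keeps the fused output a valid probability-like vector regardless of which channels drop out, at the cost of implicitly trusting the remaining modalities more; more elaborate strategies would instead estimate or substitute the missing channel's contribution.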