Abstract:
The massive expansion of social media and the rapid growth of multimedia content on it have resulted in a growing interest in visual content analysis and classification. Many studies now focus on identifying hateful and offensive content in social media posts. Such content is often analyzed by automated algorithmic approaches to determine whether it is unsuitable or harmful for particular groups, such as women and children. There is, however, a noticeable gap in the exploration of positive content, particularly multimodal content such as GIFs. This work addresses that gap by introducing a high-quality annotated dataset of animated GIFs. The dataset supports two subtasks: 1) subtask 1 is binary classification, determining whether a GIF provides emotional support; and 2) subtask 2 is multiclass classification, in which GIFs are categorized into three emotional support categories. Annotation quality is assessed using Fleiss' kappa. Various unimodal models, using text-only and image-only approaches, are implemented. In addition, an effective multimodal approach is proposed that combines visual and textual information to detect emotional support in animated GIFs. Both sequence-level and frame-level visual features are extracted from the GIFs and used for classification. The proposed multimodal long-term spatiotemporal (LTST) model employs a weighted late fusion technique. The results show that the proposed multimodal model outperforms the implemented unimodal models on both subtasks, achieving weighted F1-scores of 0.8304 and 0.7180 on subtask 1 and subtask 2, respectively. The experimental work and analysis confirm the suitability of the dataset and the proposed model for the task.
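To make the weighted late fusion step concrete, the following is a minimal sketch of how class probabilities from a text model and a visual model might be combined after each has made its own prediction. The abstract does not state the fusion weights, the model outputs, or any implementation detail, so the function name `weighted_late_fusion`, the `text_weight` parameter, and the sample probabilities are all hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def weighted_late_fusion(
    text_probs: Sequence[Sequence[float]],
    visual_probs: Sequence[Sequence[float]],
    text_weight: float = 0.5,
) -> List[int]:
    """Fuse per-class probabilities from two models; return predicted labels.

    Assumption: each row holds one sample's class probabilities, and the
    visual model receives the remaining weight (1 - text_weight).
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    if len(text_probs) != len(visual_probs):
        raise ValueError("both models must score the same samples")

    visual_weight = 1.0 - text_weight
    predictions = []
    for t_row, v_row in zip(text_probs, visual_probs):
        if len(t_row) != len(v_row):
            raise ValueError("both models must score the same classes")
        # Weighted average of the two models' scores for each class.
        fused = [text_weight * t + visual_weight * v for t, v in zip(t_row, v_row)]
        # Predict the class with the highest fused score.
        predictions.append(max(range(len(fused)), key=fused.__getitem__))
    return predictions


# Hypothetical probabilities for two GIFs in a binary task.
text_probs = [[0.9, 0.1], [0.4, 0.6]]
visual_probs = [[0.2, 0.8], [0.3, 0.7]]
equal_weight_labels = weighted_late_fusion(text_probs, visual_probs, 0.5)
visual_heavy_labels = weighted_late_fusion(text_probs, visual_probs, 0.2)
```

In a sketch like this, the weight would typically be tuned on a validation set; shifting it toward the visual model changes the prediction for the first sample, where the two modalities disagree.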
Published in: IEEE Transactions on Computational Social Systems (Early Access)