
Beyond Vision: A Multimodal Recurrent Attention Convolutional Neural Network for Unified Image Aesthetic Prediction Tasks


Abstract:


Over the past few years, image aesthetic prediction has attracted increasing attention because of its wide applications, such as image retrieval, photo album management and aesthetic-driven image enhancement. However, previous studies in this area have achieved only limited success because (1) they depend primarily on visual features and ignore textual information, and (2) they tend to attend equally to every part of an image, ignoring the selective attention mechanism. This paper overcomes these limitations by proposing a novel multimodal recurrent attention convolutional neural network (MRACNN). More specifically, the MRACNN consists of two streams: a vision stream and a language stream. The former employs a recurrent attention network to tune out irrelevant information and focus on key regions when extracting visual features. The latter utilizes a Text-CNN to capture the high-level semantics of user comments. Finally, a multimodal factorized bilinear (MFB) pooling approach is used to achieve effective fusion of the textual and visual features. Extensive experiments demonstrate that the proposed MRACNN significantly outperforms state-of-the-art methods on three unified aesthetic prediction tasks: (i) aesthetic quality classification; (ii) aesthetic score regression; and (iii) aesthetic score distribution prediction.
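The MFB fusion step mentioned in the abstract can be illustrated with a minimal sketch: each modality's feature vector is projected into a shared factorized space, the projections are multiplied element-wise, and the result is sum-pooled over the factor dimension, then power- and l2-normalized. The projection matrices below are random stand-ins for what would be learned parameters, and the dimensions (`k`, `o`, feature sizes) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mfb_fuse(x, y, k=5, o=8, seed=0):
    """Sketch of Multimodal Factorized Bilinear (MFB) pooling.

    x: visual feature vector; y: textual feature vector.
    k: number of factors per output unit; o: fused output dimension.
    U, V stand in for learned projection weights.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((x.size, k * o))  # visual projection (learned in practice)
    V = rng.standard_normal((y.size, k * o))  # textual projection (learned in practice)
    joint = (x @ U) * (y @ V)                 # element-wise product in the factorized space
    z = joint.reshape(o, k).sum(axis=1)       # sum-pool over the k factors per output unit
    z = np.sign(z) * np.sqrt(np.abs(z))       # power normalization
    return z / (np.linalg.norm(z) + 1e-12)    # l2 normalization

# Hypothetical feature sizes, purely for illustration.
fused = mfb_fuse(np.ones(16), np.ones(12))
print(fused.shape)  # (8,)
```

The factorized form keeps the expressiveness of a bilinear interaction between the two modalities while avoiding the quadratic parameter count of a full bilinear map.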
Published in: IEEE Transactions on Multimedia ( Volume: 23)
Page(s): 611 - 623
Date of Publication: 06 April 2020


