Overview of the proposed graph embedding learning framework. The framework combines supervised learning with a graph context learning process.
Abstract:
The main task of cross-modal analysis is to learn a discriminative representation shared across different modalities. To pursue aligned representations, conventional approaches either construct and optimize a linear projection or train a complex architecture of deep layers, yet it is difficult to balance accuracy and efficiency when modeling multimodal data. This paper proposes a novel graph-embedding learning framework implemented with neural networks. The learned embedding directly approximates the cross-modal aligned representation to perform cross-modal retrieval and image classification combined with text information. The proposed framework extracts the learned representation from a graph model and simultaneously trains a classifier under semi-supervised settings. For optimization, unlike previous methods based on graph Laplacian regularization, a sampling strategy is adopted to generate training pairs that fully explore the inter-modal and intra-modal similarity relationships. Experimental results on various datasets show that the proposed framework outperforms other state-of-the-art methods on cross-modal retrieval. The framework also demonstrates convincing improvements on the new task of image classification combined with text information on the Wiki dataset.
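To make the pair-based objective concrete, below is a minimal PyTorch sketch of the general idea: two modality-specific networks map image and text features into a shared embedding space, sampled cross-modal pairs are pulled together or pushed apart by a contrastive-style loss instead of a graph Laplacian regularizer, and a classifier head is trained jointly on the labeled portion. All names (CrossModalEmbedder, pair_loss), dimensions, and the specific loss form are illustrative assumptions, not the paper's actual architecture or sampling strategy.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEmbedder(nn.Module):
    """Hypothetical model: maps image and text features into a shared
    embedding space and attaches a classifier head for the
    semi-supervised branch."""
    def __init__(self, img_dim, txt_dim, emb_dim, n_classes):
        super().__init__()
        self.img_net = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                     nn.Linear(512, emb_dim))
        self.txt_net = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                     nn.Linear(512, emb_dim))
        self.classifier = nn.Linear(emb_dim, n_classes)

    def forward(self, img, txt):
        return self.img_net(img), self.txt_net(txt)

def pair_loss(anchor, other, same_class, margin=1.0):
    """Contrastive loss over sampled pairs: pull same-class pairs
    together, push different-class pairs at least `margin` apart.
    This stands in for the paper's pair-sampling objective."""
    d = F.pairwise_distance(anchor, other)
    pos = same_class.float() * d.pow(2)
    neg = (1 - same_class.float()) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()

# One mini-batch of sampled inter-modal pairs (dimensions are made up).
model = CrossModalEmbedder(img_dim=4096, txt_dim=300, emb_dim=128, n_classes=10)
img = torch.randn(32, 4096)
txt = torch.randn(32, 300)
same_class = torch.randint(0, 2, (32,))   # 1 if the pair shares a label
labels = torch.randint(0, 10, (32,))      # labels for the supervised branch

z_img, z_txt = model(img, txt)
loss = pair_loss(z_img, z_txt, same_class) \
     + F.cross_entropy(model.classifier(z_img), labels)
loss.backward()

In this sketch the same pair_loss could also be applied to sampled intra-modal pairs (image-image, text-text), which is one plausible reading of how the inter-modal and intra-modal similarity relationships would both be explored.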
Published in: IEEE Access (Volume: 6)