This paper proposes the concept of image-text association, a cornerstone of cross-media web information fusion. Two learning methods are proposed for discovering the underlying associations between images and texts from small training data sets. The first method is based on a vague transformation that measures the information similarity between visual features and textual features through a set of predefined, domain-specific information categories. The second method uses a neural network to learn a direct mapping between the visual and textual features by automatically and incrementally summarizing the associated features into a set of information templates. The visual and textual features are compared using cosine similarity, from which a resonance score is computed; the image-text pair with the highest resonance score, matching the domain category shared by the image and the text, is retrieved. Gabor texture features are used to cluster the images, with each cluster representing a distinct information category. Based on this information, the training samples are mapped to the corresponding category, which in turn retrieves the relevant text sentences. Finally, the image-text pair is obtained, and the proposed method exhibits better performance in terms of the number of iterations and the relevance of the retrieved pairs.
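The retrieval step described above, scoring candidate text features against an image feature by cosine similarity and keeping the highest-scoring pair, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `cosine_similarity` and `best_match` are hypothetical, and the resonance score is approximated here by plain cosine similarity.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (assumed non-empty,
    same length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors, which have no defined direction.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_match(image_feature, text_features):
    """Return (index, score) of the candidate text feature with the
    highest similarity to the given image feature."""
    scores = [cosine_similarity(image_feature, t) for t in text_features]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]

# Toy usage: the second candidate points in the same direction as the image
# feature, so it wins with a score of 1.0.
idx, score = best_match([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0]])
```

In a full system the image feature would come from, e.g., Gabor texture extraction and the text features from a term-weighting scheme over the candidate sentences; the argmax over scores then selects the retrieved image-text pair.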