Classical image classification approaches rely on low-level features, but the high dimensionality of the resulting feature spaces makes feature selection and distance measurement difficult during clustering. In this paper, we propose an approach that generates visual keywords and combines the visual and text keywords of an image into a multimodal vector for image classification. This multimodality helps in extracting image-to-image, text-to-text, and text-to-image relations. A visual keyword is derived by vector quantization of image tiles. We arrange the visual keywords in a manner analogous to the term-document matrix in information retrieval. Combining the visual keywords with text keywords improves the quality of classification. We use a recently proposed nonlinear dimensionality reduction technique, diffusion maps, to reduce the dimensionality of the image representation. Our method is evaluated on two public datasets, LabelMe and Corel. The results support the conclusion that the proposed method of combining visual and text keywords is robust and produces good-quality clusters.
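The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the quantizer, the tile size, the codebook size, or the kernel, so the code below assumes plain k-means for vector quantization, non-overlapping square tiles of grayscale images, a Gaussian kernel with a median-distance bandwidth, and simple concatenation of visual and text counts. All function names (`image_tiles`, `kmeans`, `visual_keyword_matrix`, `diffusion_map`) are hypothetical.

```python
import numpy as np

def image_tiles(image, tile):
    """Split a 2-D grayscale image into non-overlapping tile x tile patches (flattened)."""
    h, w = image.shape
    h, w = h - h % tile, w - w % tile          # crop so the image divides evenly
    patches = image[:h, :w].reshape(h // tile, tile, w // tile, tile)
    return patches.transpose(0, 2, 1, 3).reshape(-1, tile * tile)

def nearest(x, centers):
    """Index of the closest center (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means; an assumed stand-in for the paper's vector quantizer."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = nearest(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):                   # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def visual_keyword_matrix(images, tile, k):
    """Image-by-visual-keyword count matrix, analogous to a term-document matrix."""
    tiles = [image_tiles(img, tile) for img in images]
    codebook = kmeans(np.vstack(tiles), k)     # each center is one visual keyword
    counts = np.zeros((len(images), k))
    for i, t in enumerate(tiles):
        counts[i] = np.bincount(nearest(t, codebook), minlength=k)
    return counts

def diffusion_map(features, n_components=2, t=1):
    """Diffusion-map embedding with a Gaussian kernel (bandwidth choice is an assumption)."""
    sq = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    sigma2 = np.median(sq[sq > 0])             # median heuristic for the kernel width
    kernel = np.exp(-sq / (2 * sigma2))
    degree = kernel.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the Markov matrix P = D^-1 K.
    sym = kernel / np.sqrt(np.outer(degree, degree))
    vals, vecs = np.linalg.eigh(sym)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(degree)[:, None]      # right eigenvectors of P
    # Skip the trivial first eigenvector (constant, eigenvalue 1).
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

Under these assumptions, a multimodal vector per image would be formed by stacking the two count matrices, for example `np.hstack([visual_counts, text_counts])`, and passing the result to `diffusion_map`. The low-dimensional rows can then be clustered with any standard method.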