Cross-Media Image Retrieval via Latent Semantic Indexing and Mixed Bagging

2 Author(s)
Jing Guo; Xianjun Liao (Management School, Graduate University of the Chinese Academy of Sciences, Beijing, China)

Semantic image retrieval using text such as keywords or captions at different semantic levels has attracted considerable research attention in recent years. Automatic image annotation (AIA) has proven to be an effective and promising way to automatically deduce high-level semantics from low-level visual features. Due to the complexity of the image-label mapping, systematically developing precise, well-performing models remains a challenge. In this paper, we address the problem using latent semantic indexing (LSI) together with mixed bagging (bootstrap aggregating). Given a training set of annotated images, we first perform vector quantization of the visual space to construct a visual vocabulary of visual terms, using K-means constrained by the semantic relationships that latent semantic indexing derives from the associated annotations. The statistical hidden correlation between the visual terms and keywords is then modeled by a two-level mixed bagging model. At the base level, each multi-class SVM is trained independently on a replicate training set drawn by the bootstrap technique, and the decisions of the multi-class SVMs are combined by average aggregating into a combined confidence label vector (CLV) for each visual term. All CLVs are then propagated to the subsequent language-model fusion level. At this level, WordNet is used to re-weight the CLV of each visual term by incorporating word correlations, yielding an appropriate and consistent image annotation. We carried out experiments on a medium-sized collection of about 1000 images from the Corel stock photo CDs. The experimental results show that this method outperforms several traditional approaches in annotation performance, demonstrating the feasibility and effectiveness of the proposed unified framework, into which semantic image classification and language models are seamlessly integrated.
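The base-level aggregation described in the abstract (train base learners on bootstrap replicates, then average their per-class confidences into a CLV) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: it substitutes a simple nearest-centroid classifier for the multi-class SVMs the paper actually uses, and all function names and parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_centroid_classifier(X, y, n_classes):
    # Stand-in base learner (the paper uses multi-class SVMs):
    # one mean feature vector (centroid) per keyword class.
    return np.stack([X[y == c].mean(axis=0) if np.any(y == c)
                     else np.zeros(X.shape[1]) for c in range(n_classes)])

def confidence(centroids, x):
    # Softmax over negative distances to each class centroid,
    # giving a per-class confidence vector that sums to 1.
    d = -np.linalg.norm(centroids - x, axis=1)
    e = np.exp(d - d.max())
    return e / e.sum()

def bagged_clv(X, y, x_query, n_classes, n_learners=10):
    # Average the confidence vectors of base learners trained on
    # bootstrap replicates -> combined confidence label vector (CLV).
    clv = np.zeros(n_classes)
    for _ in range(n_learners):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        model = train_centroid_classifier(X[idx], y[idx], n_classes)
        clv += confidence(model, x_query)
    return clv / n_learners
```

In the paper's pipeline, the resulting CLV for each visual term would then be passed to the WordNet-based fusion level, where word correlations re-weight the per-keyword confidences before the final annotation is selected.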

Published in:

2009 WRI World Congress on Computer Science and Information Engineering (Volume 4)

Date of Conference:

March 31 - April 2, 2009