Semantic image retrieval using text such as keywords or captions at different semantic levels has attracted considerable research attention in recent years. Automatic image annotation (AIA) has proved to be an effective and promising way to automatically deduce high-level semantics from low-level visual features. Owing to the complexity of the image-label mapping, systematically developing precise, high-performing models remains a challenge. In this paper, we address the problem using latent semantic indexing (LSI) together with mixed bagging (bootstrap aggregating). Given a training set of annotated images, we first perform vector quantization of the visual space to construct a visual vocabulary of visual terms, using K-means constrained by the semantic relationships that LSI derives from the associated annotations; the statistical hidden correlation between visual terms and keywords is then modeled by a two-level mixed bagging model. At the base level, each multi-class SVM is trained independently on a replicate training set drawn via the bootstrap technique, and the decisions of the multi-class SVMs are combined by average aggregation to generate a combined confidence label vector (CLV) for each visual term. All CLVs are then propagated to the subsequent language-model fusion level. At this level, WordNet is used to re-weight the CLV of each visual term by incorporating word correlations, yielding an appropriate and consistent image annotation. We carried out experiments on a medium-sized image collection of about 1000 images from Corel stock photo CDs. The experimental results show that the proposed method outperforms several traditional approaches in annotation performance, demonstrating the feasibility and effectiveness of the proposed unified framework, into which semantic image classification and language models are seamlessly integrated.
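The base level described above (bootstrap replicates of the training set, one multi-class classifier per replicate, and average aggregation of their confidence outputs into a CLV) can be illustrated with a minimal sketch. This is not the authors' code: the per-replicate classifier here is a toy class-frequency scorer standing in for a multi-class SVM, and all names (`bootstrap_replicate`, `train_toy_scorer`, `combined_clv`) are hypothetical.

```python
import random
from collections import Counter

def bootstrap_replicate(data, rng):
    """Draw len(data) samples with replacement (the bootstrap technique)."""
    return [rng.choice(data) for _ in data]

def train_toy_scorer(replicate, labels):
    """Toy stand-in for a base-level multi-class SVM: scores each label
    by its relative frequency for a given visual term in the replicate."""
    counts = {}
    for term, label in replicate:
        counts.setdefault(term, Counter())[label] += 1

    def score(term):
        c = counts.get(term, Counter())
        total = sum(c.values()) or 1  # unseen term -> all-zero scores
        return [c[label] / total for label in labels]
    return score

def combined_clv(term, scorers, labels):
    """Average-aggregate the base classifiers' outputs into one CLV."""
    votes = [s(term) for s in scorers]
    return [sum(v[i] for v in votes) / len(votes) for i in range(len(labels))]

# Hypothetical training pairs of (visual term, keyword annotation).
rng = random.Random(0)
labels = ["sky", "grass", "water"]
train = [("t1", "sky")] * 6 + [("t1", "water")] * 2 + [("t2", "grass")] * 5
scorers = [train_toy_scorer(bootstrap_replicate(train, rng), labels)
           for _ in range(10)]
clv = combined_clv("t1", scorers, labels)  # confidence vector for term t1
```

In the paper's framework this CLV would then be passed to the language-model fusion level, where WordNet-based word correlations re-weight its components.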