The increasing number and resolution of earth observation (EO) imaging sensors have had a significant impact on both the volume of acquired image data and the information content of the images. There is consequently a strong need for highly efficient search tools for EO image databases and for search methods that automatically identify and recognize structures within EO images. Content-Based Image Retrieval (CBIR) and automatic image annotation systems have been designed to tackle the problem of image retrieval in large image databases. Both kinds of system share a common goal: to learn the mapping function between low-level visual features and high-level image semantics. A setting that has hardly been explored in annotation systems, although it is the rule rather than the exception, is the one in which the training database used to learn the mapping function is not exhaustive with respect to the semantic classes present in the images. This means that there exist unknown image classes for which there are no training examples in the training database. In this paper, we propose a semi-supervised method for auto-annotating satellite image databases and discovering unknown semantic image classes in these databases. The idea is to incorporate into the learning process the unannotated data, which by definition contain the unknown image classes. The latter are considered to be latent structures in the data that appear when we train a hierarchical latent variable model with both the labeled and unlabeled data. We also show that, in our case, the use of unlabeled data leads to more reliable estimates of the model parameters. We present experimental results on a synthetic dataset, comparing our algorithm with a semi-supervised Support Vector Machine (S3VM) on this dataset. We also demonstrate the effectiveness of our procedure for discovering unknown image classes on a database of SPOT5 satellite images.
We show that the results obtained on this database are rather positive, since the new structures detected correspond to semantic classes which are not represented in the training database.
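To make the idea of unknown classes appearing as latent structure more concrete, the following is a minimal sketch and not the hierarchical latent variable model of the paper. It assumes a flat Gaussian mixture with diagonal covariances fitted by EM, in which labeled samples are clamped to the component of their class and unlabeled samples are free to join any component, including one extra component reserved for a class absent from the training labels. The function names, the farthest-point initialization, and the toy two-dimensional data are all assumptions made for illustration.

```python
import numpy as np


def log_gaussian(X, mean, var):
    """Log density of a diagonal-covariance Gaussian, one value per row of X."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var, axis=1)


def fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_known, n_extra=1, n_iter=50):
    """EM for a mixture with n_known labeled classes plus n_extra free components.

    Hypothetical helper: labeled points keep one-hot responsibilities,
    unlabeled points get soft responsibilities over all components.
    """
    K = n_known + n_extra
    X = np.vstack([X_lab, X_unl])
    n_lab, dim = X_lab.shape

    # Labeled samples are clamped to the component of their class.
    R_lab = np.zeros((n_lab, K))
    R_lab[np.arange(n_lab), y_lab] = 1.0

    # Known components start at the labeled class means; each extra component
    # starts at the unlabeled point farthest from the means chosen so far.
    means = np.zeros((K, dim))
    for k in range(n_known):
        means[k] = X_lab[y_lab == k].mean(axis=0)
    for k in range(n_known, K):
        d2 = ((X_unl[:, None, :] - means[None, :k, :]) ** 2).sum(axis=2).min(axis=1)
        means[k] = X_unl[np.argmax(d2)]
    variances = np.tile(X.var(axis=0), (K, 1))
    weights = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: soft assignments for the unlabeled samples only.
        log_p = np.stack(
            [np.log(weights[k] + 1e-12) + log_gaussian(X_unl, means[k], variances[k])
             for k in range(K)],
            axis=1,
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        R_unl = np.exp(log_p)
        R_unl /= R_unl.sum(axis=1, keepdims=True)

        # M-step: labeled and unlabeled samples both update the parameters.
        R = np.vstack([R_lab, R_unl])
        Nk = R.sum(axis=0) + 1e-12
        weights = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        sq_dev = (X[:, None, :] - means[None, :, :]) ** 2
        variances = np.einsum("nk,nkd->kd", R, sq_dev) / Nk[:, None] + 1e-6

    return weights, means, variances, R_unl


# Toy data (assumed): two labeled classes and a third cluster that has no labels.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 8.0]])
X_lab = np.vstack([centers[k] + 0.5 * rng.standard_normal((20, 2)) for k in (0, 1)])
y_lab = np.repeat([0, 1], 20)
X_unl = np.vstack([centers[k] + 0.5 * rng.standard_normal((100, 2)) for k in (0, 1, 2)])

weights, means, variances, R_unl = fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_known=2)
print("mean of the extra component:", means[2])
print("unlabeled samples assigned to it:", int((R_unl.argmax(axis=1) == 2).sum()))
```

In this sketch the unlabeled samples contribute to the M-step for every component, which is one simple way in which unlabeled data can inform the parameter estimates of the known classes as well as reveal an unknown one. The number of extra components is fixed by hand here; choosing it automatically would require a model selection step that this example does not attempt.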