Content-based image retrieval techniques are promising for handling large quantities of geospatial image data, but they are prone to producing overfitted models. This is because supervised models most often capture patterns in the existing observations rather than patterns that hold for the whole population, so they do not generalize well to new, unseen images. This article proposes a methodology to reduce overfitting when ranking high-resolution satellite images by domain semantics. Our approach uses ensemble methods based on PathFinder network scaling. We generate cross-fold co-occurrence matrices that measure the relevance of feature subspaces to each semantic. Each matrix is then reduced with the PathFinder network scaling algorithm, and irrelevant nodes are removed using node strength metrics. The result is an optimized model for ranking by semantic that generalizes better to new images. The results show that this approach can significantly improve the quality of ranking by semantic. Over the cross-fold experiments, Mean Average Precision (MAP) of ranking increased by 13.2% and the standard deviation of MAP decreased by 16.8%, relative to experiments without PathFinder network scaling.
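The network-scaling and pruning steps can be illustrated with a minimal sketch. The abstract does not state the PathFinder parameters, so this assumes the widely used PFNET(r = ∞, q = n − 1) variant. Under that variant, an edge survives only if no alternative path has a smaller maximum edge distance. The sketch also makes three further assumptions: co-occurrence counts are mapped to distances as d = 1/s, node strength is the summed co-occurrence weight of a node's surviving edges, and nodes are kept only when their strength reaches a cut-off called `min_strength`. All function names and `min_strength` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Set, Tuple

Matrix = List[List[float]]


def cooccurrence_to_distance(sim: Matrix) -> Matrix:
    """Map a symmetric co-occurrence (similarity) matrix to distances.

    Assumption: d = 1 / s for s > 0; pairs that never co-occur get no edge (inf).
    """
    n = len(sim)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j and sim[i][j] > 0:
                dist[i][j] = 1.0 / sim[i][j]
    return dist


def pathfinder_edges(dist: Matrix) -> Set[Tuple[int, int]]:
    """PFNET(r=inf, q=n-1): keep edge (i, j) iff its distance equals the
    minimax path distance, i.e. no other path has a smaller largest edge."""
    n = len(dist)
    minimax = [row[:] for row in dist]
    # Floyd-Warshall over the (min, max) semiring gives minimax path distances.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.isfinite(dist[i][j]) and dist[i][j] <= minimax[i][j]}


def node_strength(sim: Matrix, edges: Set[Tuple[int, int]], n: int) -> List[float]:
    """Strength of a node: summed co-occurrence weight of its retained edges."""
    strength = [0.0] * n
    for i, j in edges:
        strength[i] += sim[i][j]
        strength[j] += sim[i][j]
    return strength


def prune_weak_nodes(sim: Matrix, min_strength: float) -> List[int]:
    """Return indices of feature subspaces kept after scaling and pruning.

    `min_strength` is a hypothetical threshold, not a value from the article.
    """
    n = len(sim)
    edges = pathfinder_edges(cooccurrence_to_distance(sim))
    strength = node_strength(sim, edges, n)
    return [v for v in range(n) if strength[v] >= min_strength]
```

For example, with four hypothetical feature subspaces and co-occurrence matrix `[[0, 8, 4, 0], [8, 0, 6, 1], [4, 6, 0, 0], [0, 1, 0, 0]]`, the direct edge between subspaces 0 and 2 (distance 0.25) is removed because the path through subspace 1 has a smaller largest edge (1/6). Subspace 3 then has strength 1 and is dropped when `min_strength` is 2. In a cross-fold ensemble, one such reduction would be applied to the matrix of each fold.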