MiniCPM-V LLaMA Model for Image Recognition: A Case Study on Satellite Datasets | IEEE Journals & Magazine | IEEE Xplore

MiniCPM-V LLaMA Model for Image Recognition: A Case Study on Satellite Datasets

Abstract:

This study evaluates the performance of the MiniCPM-V model on four satellite image datasets: MAI, RSICD, RSSCN7, and a newly created merged dataset that combines these three. The merged dataset was developed to broaden the variation in data distribution, and thereby the generalization, associated with the labeling and training processes inherent in satellite image analysis. We systematically collected prediction results for each individual dataset and compared them against results reported in previous studies to benchmark the model's effectiveness. The findings indicate that large language models (LLMs), such as MiniCPM-V, exhibit promising capabilities in satellite image recognition. MiniCPM-V achieved an accuracy of 70.57% on RSSCN7, 62.19% on RSICD, 7.01% on MAI, and 43.49% on the merged dataset. Specifically, the model achieved high accuracy (more than 80%) in identifying a majority of object classes across the datasets. However, it underperformed in classifying certain object categories and in recognizing all objects in multilabeled images, which suggests that while the model is robust overall, there are specific areas where its performance can be enhanced. Despite these limitations, the successful recognition of most objects underscores the potential of LLMs in advancing satellite imagery analysis. These results highlight the potential of integrating LLMs into remote sensing applications, offering a foundation for future research aimed at improving classification accuracy and expanding the range of detectable object classes by incorporating caption-level textual information.
Page(s): 7892 - 7903
Date of Publication: 03 March 2025
