Object Detection through Simulated Annealing for CLIP-YOLO Integration | IEEE Conference Publication | IEEE Xplore

Abstract:

As multimodal learning matures, there is growing demand for object detection that draws on text information to improve accuracy. Traditional object detection models usually rely on specific visual features, but as tasks become more complex, visual information alone is often insufficient for practical needs. Introducing the semantic information of natural language has therefore become a trend. This paper proposes a multimodal object detection model that combines the CLIP model with the YOLO model and optimizes it using the simulated annealing algorithm. First, YOLO performs preliminary object detection to obtain bounding boxes and category predictions. Then CLIP, fine-tuned on a Chinese dataset, enhances the features of each detected object region and computes the similarity between the image region and the text description. The image-text matching threshold is optimized with the simulated annealing algorithm and set to 86%, which yields more accurate reclassification. We introduce the contrastive loss of CLIP into the YOLO loss function to form a joint optimization framework, which improves the model's semantic understanding while retaining the efficiency of YOLO. Experimental results show that combining CLIP and YOLO significantly improves object detection performance, especially in complex scenes and in tasks that require multimodal understanding. This research provides new ideas and methods for the future integration of computer vision and natural language processing.
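The threshold-tuning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the annealing schedule, the objective, or the data, so everything below is an assumption made for illustration. The function names (reclassify, accuracy, anneal_threshold), the toy SAMPLES list, the cooling schedule, and the use of reclassification accuracy as the objective are all hypothetical. The CLIP similarity scores are hard-coded numbers standing in for real model outputs, and the toy data is not expected to reproduce the 86% value reported in the paper.

```python
import math
import random

# Toy data (hypothetical): each entry is
# (YOLO label, CLIP similarity per candidate text label, ground-truth label).
SAMPLES = [
    ("cat", {"cat": 0.70, "dog": 0.90}, "dog"),
    ("cat", {"cat": 0.60, "dog": 0.80}, "cat"),
    ("dog", {"cat": 0.95, "dog": 0.50}, "cat"),
    ("dog", {"cat": 0.75, "dog": 0.70}, "dog"),
]


def reclassify(yolo_label, clip_scores, threshold):
    """Replace YOLO's label with CLIP's best match only if it clears the threshold."""
    best_label = max(clip_scores, key=clip_scores.get)
    if clip_scores[best_label] >= threshold:
        return best_label
    return yolo_label


def accuracy(samples, threshold):
    """Fraction of samples whose reclassified label equals the ground truth."""
    hits = sum(
        reclassify(yolo_label, scores, threshold) == truth
        for yolo_label, scores, truth in samples
    )
    return hits / len(samples)


def anneal_threshold(samples, start=0.5, temp=1.0, cooling=0.95, steps=200, seed=0):
    """Search for a matching threshold in [0, 1] with simulated annealing.

    The objective (accuracy) and the geometric cooling schedule are
    assumptions; the paper's abstract does not specify either.
    """
    rng = random.Random(seed)
    current = best = start
    current_acc = best_acc = accuracy(samples, current)
    for _ in range(steps):
        # Propose a nearby threshold, clipped to the valid range.
        candidate = min(1.0, max(0.0, current + rng.uniform(-0.1, 0.1)))
        candidate_acc = accuracy(samples, candidate)
        delta = candidate_acc - current_acc
        # Metropolis rule: always accept improvements, and accept worse
        # candidates with a probability that shrinks as the temperature drops.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_acc = candidate, candidate_acc
        # Remember the best threshold visited so far.
        if current_acc > best_acc:
            best, best_acc = current, current_acc
        temp *= cooling
    return best, best_acc


if __name__ == "__main__":
    threshold, acc = anneal_threshold(SAMPLES)
    print(f"threshold={threshold:.2f} accuracy={acc:.2f}")
```

In this sketch a low threshold lets CLIP override YOLO too readily, while a high threshold ignores CLIP even when it is right, so the search is looking for the band in between. A real pipeline would replace the hard-coded scores with similarities computed from CLIP embeddings of the cropped YOLO regions and the candidate text prompts.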
Date of Conference: 29-31 December 2024
Date Added to IEEE Xplore: 03 March 2025
Conference Location: Changchun, China