I. Introduction
Deep learning has driven substantial progress in object detection [2], [44], with applications in autonomous vehicles, surveillance systems, and security, where objects of interest must be identified and tracked. However, a major criticism of deep-learning-based detectors is their heavy reliance on large-scale annotated datasets, which are both resource- and time-intensive to acquire. Obtaining a sufficient amount of labeled data is often challenging [31], and this scarcity presents a substantial obstacle in real-world scenarios such as medical image analysis, deep-sea exploration, and rare object recognition. Few-shot object detection (FSOD), which aims to train an object detector that generalizes effectively from only a few annotated samples, has emerged as a solution to this problem.