Abstract:
Object detection in images can benefit from coupling multiple spectra, each of which provides its own useful features. However, building an efficient architecture that couples the different modalities is a complex task. Transformers, thanks to their ability to extract meaningful correlations between different regions of their inputs, appear to be a promising way to fuse features across spectra. This work presents a multi-spectral object detection architecture based on cross-attention feature fusion (CAFF), combined with a transformer-based detector (DINO). We compare the object detection performance of the proposed approach with state-of-the-art approaches on infrared-visible multi-spectral datasets. Moreover, we study its robustness to systematic misalignment between image pairs. The proposed approach can be applied to any mono-spectrum transformer-based detector. The model developed in this study will be made available in a dedicated GitHub repository.
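To make the idea of cross-attention fusion between two spectra concrete, the following is a minimal sketch and not the CAFF module from the paper. It assumes each spectrum has already been encoded into a list of feature tokens of equal dimension, uses identity projections in place of learned query/key/value weights, and adds a residual connection. The names `cross_attention` and `fuse_spectra` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Let each query token attend to every token of the other spectrum.

    Assumption: identity projections (no learned W_q, W_k, W_v), so the
    tokens themselves serve as queries, keys, and values.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and each key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in keys_values]
        weights = softmax(scores)
        # Weighted sum of the other spectrum's tokens.
        attended = [sum(w * v[d] for w, v in zip(weights, keys_values))
                    for d in range(dim)]
        # Residual connection keeps the query's own features.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


def fuse_spectra(visible_tokens, infrared_tokens):
    """Symmetric fusion: each spectrum queries the other one."""
    visible_fused = cross_attention(visible_tokens, infrared_tokens)
    infrared_fused = cross_attention(infrared_tokens, visible_tokens)
    return visible_fused, infrared_fused
```

In this sketch every query attends to all positions of the other spectrum, so the two token lists may have different lengths and no pixel-wise correspondence is assumed. In a real detector the fused tokens would then be passed on to the detection stage.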
Date of Conference: 17-18 June 2024
Date Added to IEEE Xplore: 27 September 2024