Multimodal Crowd Counting with Mutual Attention Transformers | IEEE Conference Publication | IEEE Xplore

Abstract:

Crowd counting is a fundamental yet challenging task that aims to automatically estimate the number of people in crowded scenes. With the rapid development of thermal and depth sensors, thermal images and depth maps have become more accessible, and they have been shown to provide useful information for improving crowd counting performance. We therefore propose a Mutual Attention Transformer (MAT) module to fully leverage the complementary information of different modalities. Specifically, our MAT employs a cross-modal mutual attention mechanism that uses the features of one modality to enhance the features of the other. Moreover, to improve performance by learning better visual representations and further exploiting modality-wise complementarity, we design a self-supervised pre-training method based on cross-modal image reconstruction. Extensive experiments on two standard benchmarks (i.e., RGBT-CC and ShanghaiTechRGBD) show that the proposed method is effective and generally applicable to multimodal crowd counting, outperforming previous state-of-the-art methods.
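The cross-modal mutual attention idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' MAT implementation: it assumes single-head scaled dot-product attention, backbone features already flattened into token matrices of shape (tokens, channels), and a residual connection, and it omits the multi-head projections, normalization, and feed-forward layers a full transformer block would normally include. The names `cross_attention`, `mutual_attention`, and the `params` keys are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention in which tokens of one
    modality (queries) attend to tokens of another modality (keys/values)."""
    q = queries @ w_q        # (n_q, d)
    k = keys_values @ w_k    # (n_kv, d)
    v = keys_values @ w_v    # (n_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_kv)
    return softmax(scores, axis=-1) @ v      # (n_q, d)


def mutual_attention(feat_a, feat_b, params):
    """Hypothetical mutual attention step: each modality is enhanced with
    information gathered from the other, plus a residual connection."""
    a_enh = feat_a + cross_attention(feat_a, feat_b, *params["a_from_b"])
    b_enh = feat_b + cross_attention(feat_b, feat_a, *params["b_from_a"])
    return a_enh, b_enh


# Usage with random stand-in features (e.g. a 4x4 feature map -> 16 tokens).
rng = np.random.default_rng(0)
n_tokens, dim = 16, 32
feat_rgb = rng.standard_normal((n_tokens, dim))
feat_thermal = rng.standard_normal((n_tokens, dim))
params = {
    name: tuple(rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
    for name in ("a_from_b", "b_from_a")
}
rgb_out, thermal_out = mutual_attention(feat_rgb, feat_thermal, params)
print(rgb_out.shape, thermal_out.shape)  # (16, 32) (16, 32)
```

In this sketch, attention runs in both directions: RGB tokens query the thermal (or depth) tokens, and the auxiliary tokens query the RGB tokens, so each stream is enhanced by the other rather than being fused in one direction only. The residual connection keeps each modality's original features intact when the cross-modal contribution is small.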
Date of Conference: 18-22 July 2022
Date Added to IEEE Xplore: 26 August 2022
Conference Location: Taipei, Taiwan