DTTR: Detecting Text with Transformers | IEEE Conference Publication | IEEE Xplore

DTTR: Detecting Text with Transformers


Abstract:

Recently, most transformer-based approaches have achieved considerable success on vision tasks, even outperforming approaches based on convolutional neural networks (CNNs). In this paper, we present a novel transformer-based model for scene text detection, named detecting text with transformers (DTTR). In DTTR, a CNN backbone extracts local connectivity features, and a transformer decoder effectively captures global context information from the scene text. In addition, we propose a dynamic scale fusion (DSF) module that fuses multiscale feature maps dynamically, significantly improving scale robustness and yielding powerful representations for subsequent decoding. Experimental results show that DTTR achieves a 0.5% improvement in H-mean and 20.0% faster inference than the SOTA model with a ResNet-50 backbone on MMOCR. Code will be released at: https://github.com/ahsdx/DTTR.
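
For readers unfamiliar with the H-mean metric cited in the abstract: in text detection it is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that formula only. The function name `h_mean` and the example values are illustrative assumptions, not taken from the paper or from MMOCR, and the abstract does not say whether the reported 0.5% gain is absolute or relative.

```python
def h_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper name)."""
    # Guard against division by zero when both inputs are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only; these are NOT results reported for DTTR.
example_score = h_mean(0.90, 0.80)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector must balance precision and recall to score well on this metric.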
Date of Conference: 04-10 June 2023
Date Added to IEEE Xplore: 05 May 2023
Conference Location: Rhodes Island, Greece
