
CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning


Abstract:

Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision. Nevertheless, owing to the extremely varied aspect ratios and scales of text instances in real scenes, most conventional text detectors suffer from the sub-text problem: they localize only fragments of a text instance (i.e., sub-texts). In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, the COntrastive RElation (CORE) module, to mitigate this issue. CORE first leverages a vanilla relation block to model the relations among all text proposals (sub-texts of multiple text instances) and further enhances relational reasoning via instance-level sub-text discrimination in a contrastive manner. In this way, the module naturally learns instance-aware representations of text proposals and thus facilitates scene text detection. We integrate the CORE module into a two-stage Mask R-CNN text detector, yielding our detector CORE-Text. Extensive experiments on four benchmarks demonstrate the superiority of CORE-Text.
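
The abstract describes two coupled ideas: a relation block that lets every text proposal attend to all others, and an instance-level contrastive objective that treats sub-texts of the same ground-truth text instance as positives. The PyTorch sketch below illustrates one plausible realization of these two components; the class and function names, the single-head attention form, and the InfoNCE-style loss are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (hypothetical) of a relation block over text-proposal features
# plus an instance-level contrastive loss that pulls sub-texts of the same
# text instance together and pushes apart sub-texts of different instances.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationBlock(nn.Module):
    """Single-head self-attention over N proposal (sub-text) features of dim d."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, d) RoI features of all text proposals
        attn = (self.q(feats) @ self.k(feats).t()) * self.scale  # (N, N) pairwise relations
        attn = attn.softmax(dim=-1)
        return feats + attn @ self.v(feats)                      # relation-enhanced features


def instance_contrastive_loss(feats: torch.Tensor,
                              instance_ids: torch.Tensor,
                              tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: proposals assigned to the same ground-truth text
    instance are positives for each other; all remaining proposals are negatives."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.t() / tau                                        # (N, N) cosine similarities
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = (instance_ids[:, None] == instance_ids[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))              # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=-1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)          # keep only positive pairs
    valid = pos_mask.any(dim=-1)                                 # anchors with >= 1 positive
    loss = -pos_log_prob.sum(dim=-1)[valid] / pos_mask.sum(dim=-1)[valid]
    return loss.mean()


# Usage (shapes only):
#   enhanced = RelationBlock(256)(roi_feats)                     # roi_feats: (N, 256)
#   l_con = instance_contrastive_loss(enhanced, gt_instance_ids) # gt_instance_ids: (N,)
```

In a two-stage detector such as Mask R-CNN, a module of this kind would typically sit in the RoI head, with the contrastive term added as an auxiliary training loss and dropped at inference time.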
Date of Conference: 05-09 July 2021
Date Added to IEEE Xplore: 09 June 2021
Conference Location: Shenzhen, China
