Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Abstract:

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly related and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. We present an end-to-end trainable neural network named Mask TextSpotter. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure in which both detection and recognition are achieved directly in two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance performance and universality. Benefiting from the proposed two-dimensional representation for both detection and recognition, it easily handles text instances of irregular shapes, such as curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we investigate the recognition module of our method separately; it significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 43, Issue: 2, 01 February 2021)

Page(s): 532 - 548

Date of Publication: 26 August 2019

PubMed ID: 31449005

DOI: 10.1109/TPAMI.2019.2937086

Minghui Liao

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China

Minghui Liao received the BS degree from the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), China, in 2016. He is currently working toward the PhD degree in the School of Electronic Information and Communications, HUST. His main research interests include scene text detection and recognition.

Pengyuan Lyu

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China

Pengyuan Lyu received the BS and MS degrees from the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), China, in 2015 and 2018, respectively. He is currently with Tencent YouTu Lab, Shenzhen, China. His main research interests include scene text detection and recognition.

Minghang He

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China

Minghang He is currently working toward the graduate degree in the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), China. His main research interests include text detection and segmentation.

Cong Yao

Megvii (Face++) Inc., Beijing, China

Cong Yao received the BS and PhD degrees in electronics and information engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2008 and 2014, respectively. He is currently with Megvii Inc., Beijing, China. He was a research intern at Microsoft Research Asia (MSRA), Beijing, China, from 2011 to 2012. He was a visiting research scholar with Temple University, Philadelphia, PA, in 2013. H...

Wenhao Wu

Megvii (Face++) Inc., Beijing, China

Wenhao Wu received the BE degree in mechanical engineering from Tsinghua University. He is senior vice president of Megvii (Face++), in charge of the company's cloud services and product strategy. He started his career in 2002 and has served in a series of roles, from software development and pre-sales to product management and marketing, on large scale-out clusters and cloud-service-based products. He won the 1st class prize...

Xiang Bai

School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China

Xiang Bai received the BS, MS, and PhD degrees from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2003, 2005, and 2009, respectively, all in electronics and information engineering. He is currently a professor with the School of Electronic Information and Communications, HUST. His research interests include object recognition, shape analysis, and OCR. He received IAPR/ICDAR Young Investigator ...