Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes


Abstract:

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network named Mask TextSpotter is presented. Unlike previous text spotters, which follow a pipeline consisting of a proposal generation network and a sequence-to-sequence recognition network, Mask TextSpotter enjoys a simple and smooth end-to-end learning procedure, in which both detection and recognition are achieved directly in two-dimensional space via semantic segmentation. Further, a spatial attention module is proposed to enhance performance and universality. Benefiting from the proposed two-dimensional representation for both detection and recognition, it easily handles text instances of irregular shapes, for instance, curved text. We evaluate it on four English datasets and one multi-language dataset, achieving consistently superior performance over state-of-the-art methods in both detection and end-to-end text recognition tasks. Moreover, we investigate the recognition module of our method separately; it significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 43, Issue: 2, 01 February 2021)
Page(s): 532 - 548
Date of Publication: 26 August 2019

PubMed ID: 31449005

Minghui Liao
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
Minghui Liao received the BS degree from the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), China, in 2016. He is currently working toward the PhD degree in the School of Electronic Information and Communications, HUST. His main research interests include scene text detection and recognition.
Pengyuan Lyu
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
Pengyuan Lyu received the BS and MS degrees from the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), China, in 2015 and 2018, respectively. He is currently with Tencent YouTu Lab, Shenzhen, China. His main research interests include scene text detection and recognition.
Minghang He
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
Minghang He is currently working toward the graduate degree in the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), China. His main research interests include text detection and segmentation.
Cong Yao
Megvii (Face++) Inc., Beijing, China
Cong Yao received the BS and PhD degrees in electronics and information engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2008 and 2014, respectively. He is currently with Megvii Inc., Beijing, China. He was a research intern at Microsoft Research Asia (MSRA), Beijing, China, from 2011 to 2012. He was a visiting research scholar with Temple University, Philadelphia, PA, in 2013. His research has focused on computer vision and machine learning, in particular, the area of text detection and recognition in natural images.
Wenhao Wu
Megvii (Face++) Inc., Beijing, China
Wenhao Wu received the BE degree in mechanical engineering from Tsinghua University. He is senior vice president of Megvii (Face++), in charge of the company's cloud services and product strategy. He started his career in 2002 and has served in a series of roles, from software development and pre-sales to product management and marketing, on large scale-out clusters and cloud-service-based products. He won the first-class prize in the 1994 China Mathematics Olympics (CMO).
Xiang Bai
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
Xiang Bai received the BS, MS, and PhD degrees from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2003, 2005, and 2009, respectively, all in electronics and information engineering. He is currently a professor with the School of Electronic Information and Communications, HUST. His research interests include object recognition, shape analysis, and OCR. He received the IAPR/ICDAR Young Investigator Award in 2019. He is an associate editor for Pattern Recognition, Pattern Recognition Letters, and Frontiers of Computer Science.
