Abstract:
Weakly supervised methods for scene text spotting have recently become popular with researchers because they can significantly reduce dataset annotation effort. The latest progress in this field is text spotting based on single-point or multi-point annotations. However, such methods are sensitive to the precise annotation location and fail to capture the relative positions and shapes of characters, which impairs recognition of text with extensive rotations and flips. To address these challenges, this paper develops a novel method named Coarse-point-supervised Scene Text Spotter (Cps-STS). Cps-STS uses a few approximate points as text location labels and introduces a learnable position modulation mechanism, which eases the accuracy requirements for annotations and enhances model robustness. Additionally, we incorporate a Spatial Compatibility Attention (SCA) module for text decoding to effectively utilize spatial information such as position and shape. This module fuses compound queries with global feature maps, and the fused features serve as a bias in the SCA module to express the spatial morphology of the text. To locate and decode text content accurately, we feed features containing both spatial morphology information and text content into the text decoder. Ablation experiments demonstrate that introducing the spatial morphology features as bias terms enables the model to identify and exploit the relationship between text content and position, which improves recognition performance. One significant advantage of Cps-STS is its ability to reach performance on par with full supervision using only a few imprecise coarse points, at low annotation cost. Extensive experiments validate the effectiveness and superiority of Cps-STS over existing approaches.
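The abstract describes spatial features entering the attention computation as a bias term. As a rough illustration of that general idea only, the sketch below shows scaled dot-product attention with an additive bias on the logits. It is not the paper's SCA module: the function names, the shape of `bias`, and the way a bias would be derived from compound queries and feature maps are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(queries, keys, values, bias):
    """Scaled dot-product attention with an additive bias on the logits.

    queries: n_q vectors of length d
    keys:    n_k vectors of length d
    values:  n_k vectors of length d_v
    bias:    n_q x n_k matrix; a hypothetical stand-in for a spatial
             (position/shape) term, not the paper's actual formulation
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        # Content similarity plus the additive bias for each key.
        logits = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        w = softmax(logits)
        # Weighted sum of value vectors.
        out = [
            sum(w[j] * values[j][c] for j in range(len(values)))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


if __name__ == "__main__":
    q = [[0.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0], [2.0]]
    # With a zero bias both keys are weighted equally; a large bias on
    # the second key shifts almost all of the weight onto it.
    print(biased_attention(q, k, v, [[0.0, 0.0]]))
    print(biased_attention(q, k, v, [[0.0, 10.0]]))
```

In this toy setup the query carries no content preference, so the bias alone decides which key is attended to. This mirrors, in a very simplified way, how a spatial term could steer decoding independently of content similarity.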
Published in: IEEE Transactions on Multimedia (Volume: 27)

FIST LAB, School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, China
Yunnan United Vision Innovations Technology Company Ltd., Kunming, Yunnan, China
Weida Chen received the B.S. degree from the Kunming University of Science and Technology, Kunming, China, in 2019, and the master's degree from Yunnan University, Kunming, in 2023. His research interests include scene text localization, multimodal large models, and embodied intelligence.

Tencent, Shenzhen, Guangdong, China
Jie Jiang received the Ph.D. degree from Peking University, Beijing, China, and has long been devoted to distributed architecture and Big Data computing. He is well known in China for his expertise in data science and is a member of the CCF Task Force on Big Data. He has been invited many times to give keynote speeches at SACC and Hadoop in China.

FIST LAB, School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, China
Yunnan United Vision Innovations Technology Company Ltd., Kunming, Yunnan, China
Linfei Wang received the B.S. degree from the Kunming University of Science and Technology, Kunming, China, in 2017. He is currently working toward the Ph.D. degree with the School of Information Science and Engineering, Yunnan University, Kunming, China. His research interests include object detection, computer vision, robot perception, and machine learning.

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
Huafeng Li received the M.S. degree in applied mathematics from Chongqing University, Chongqing, China, in 2009 and the Ph.D. degree in control theory and control engineering from Chongqing University in 2012. He is currently a Professor with the School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China. He has authored or coauthored more than 50 scientific articles in venues including CVPR, IJCV, AAAI, and ICME. His research interests include image processing, computer vision, and information fusion.

JD Explore Academy, Beijing, China
Yibing Zhan (Member, IEEE) received the bachelor's and doctor's degrees from the School of Information Science and Technology, University of Science and Technology of China, Hefei, China, in 2012 and 2018, respectively. From 2018 to 2020, he was an Associate Researcher with the School of Computer Science, Hangzhou Dianzi University, Hangzhou, China. He is currently with the JD Explore Academy as an Algorithm Scientist. He has authored or coauthored many scientific papers in top conferences and journals such as NeurIPS, CVPR, ACM MM, and ICCV. His research interests include graph generation, foundation models, and graph neural networks.

FIST LAB, School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, China
Yunnan United Vision Innovations Technology Company Ltd., Kunming, Yunnan, China
Dapeng Tao (Member, IEEE) is currently a Professor with the School of Information Science and Engineering, Yunnan University, Kunming, China. He was a Doctoral Advisor of Computer Science and Technology and a Doctoral Advisor of Control Science and Engineering with the University of the Chinese Academy of Sciences, Beijing, China. He is mainly engaged in research in the field of artificial intelligence. Prof. Tao was a Special Reviewer and a Guest Editor for more than ten international academic journals, including IEEE Transactions on Multimedia.