Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene Text Spotter | IEEE Journals & Magazine | IEEE Xplore

Cps-STS: Bridging the Gap Between Content and Position for Coarse-Point-Supervised Scene Text Spotter


Abstract:

Recently, weakly supervised methods for scene text spotter are increasingly popular with researchers due to their potential to significantly reduce dataset annotation eff...Show More

Abstract:

Recently, weakly supervised methods for scene text spotter are increasingly popular with researchers due to their potential to significantly reduce dataset annotation efforts. The latest progress in this field is text spotter based on single or multi-point annotations. However, this method struggles with the sensitivity of text recognition to the precise annotation location and fails to capture the relative positions and shapes of characters, leading to impaired recognition of texts with extensive rotations and flips. To address these challenges, this paper develops a novel method named Coarse-point-supervised Scene Text Spotter (Cps-STS). Cps-STS first utilizes a few approximate points as text location labels and introduces a learnable position modulation mechanism, easing the accuracy requirements for annotations and enhancing model robustness. Additionally, we incorporate a Spatial Compatibility Attention (SCA) module for text decoding to effectively utilize spatial data such as position and shape. This module fuses compound queries and global feature maps, serving as a bias in the SCA module to express text spatial morphology. In order to accurately locate and decode text content, we introduce features containing spatial morphology information and text content into the input features of the text decoder. By introducing features with spatial morphology information as bias terms into the text decoder, ablation experiments demonstrate that this operation enables the model to effectively identify and utilize the relationship between text content and position to enhance the recognition performance of our model. One significant advantage of Cps-STS is its ability to achieve full supervision-level performance with just a few imprecise coarse points at a low cost. Extensive experiments validate the effectiveness and superiority of Cps-STS over existing approaches.
Published in: IEEE Transactions on Multimedia ( Volume: 27)
Page(s): 1652 - 1664
Date of Publication: 04 April 2025

ISSN Information:

Funding Agency:


Contact IEEE to Subscribe

References

References is not available for this document.