Visual-Semantic Refinement Network: Towards Exploring the Capabilities of Decoder in Scene Text Recognition


Abstract:

Traditional scene text recognition (STR) is usually regarded as a visual unimodal recognition task and has made some progress using the encoder-decoder framework. Introducing a language model (LM) that taps into semantic contextual relationships has further advanced the task from the language modality. However, in existing works, the LM heavily relies on the output of the decoder in the vision model (VM), while the vision decoder itself lacks semantic and global context awareness. In this paper, we explore the capability of the vision decoder, which is generally overlooked in previous works. We propose a Visual-Semantic Refinement Network (VSRN) that provides contextual and semantic guidance to the decoder, fully supporting its recognition capability. With the semantic refinement module, the recognition results from the LM can, in return, be fed back to the VM, providing semantic information while further tightening the coupling of the two modalities. In the visual refinement module, we propose an adaptive mask strategy and exploit the global contextual relationships of visual features to further assist the VM. These two complementary cues jointly strengthen the VM and iteratively improve recognition performance. Experimental results on several scene text recognition benchmarks show that our proposed method is effective and achieves state-of-the-art performance.
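The following is a minimal sketch, not the authors' released code, of the iterative VM-LM refinement loop described in the abstract. The gated fusion in the semantic refinement module, the confidence-thresholded adaptive mask in the visual refinement module, the stand-in language model, and all shapes and module names are assumptions made only for illustration.

```python
# Sketch of iterative visual-semantic refinement (assumed design, not the paper's code).
import torch
import torch.nn as nn


class SemanticRefinement(nn.Module):
    """Feed LM predictions back into visual decoder features (assumed gated fusion)."""

    def __init__(self, num_classes, dim):
        super().__init__()
        self.embed = nn.Linear(num_classes, dim)            # embed LM probabilities
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, vis_feat, lm_probs):
        sem = self.embed(lm_probs)                            # (B, T, D)
        g = self.gate(torch.cat([vis_feat, sem], dim=-1))     # per-position fusion gate
        return g * vis_feat + (1 - g) * sem


class VisualRefinement(nn.Module):
    """Self-attention over visual features with an adaptive mask from prediction confidence."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat, confidence, threshold=0.5):
        # Assumed form of the "adaptive mask strategy": ignore low-confidence positions.
        key_padding_mask = confidence < threshold             # (B, T), True = ignored
        # Guard against fully masked rows, which attention cannot handle.
        key_padding_mask[key_padding_mask.all(dim=1)] = False
        out, _ = self.attn(feat, feat, feat, key_padding_mask=key_padding_mask)
        return feat + out


def iterative_recognition(vis_feat, lm, classifier, sem_ref, vis_ref, iters=3):
    """Alternately refine visual features with semantic and global visual cues."""
    logits = classifier(vis_feat)
    for _ in range(iters):
        lm_probs = lm(logits.softmax(dim=-1))                 # language-model refinement
        vis_feat = sem_ref(vis_feat, lm_probs)                # semantic refinement
        conf = logits.softmax(dim=-1).max(dim=-1).values      # per-step confidence
        vis_feat = vis_ref(vis_feat, conf)                    # visual refinement w/ adaptive mask
        logits = classifier(vis_feat)
    return logits


if __name__ == "__main__":
    B, T, D, C = 2, 25, 256, 37                               # batch, steps, feature dim, charset size
    vis_feat = torch.randn(B, T, D)
    lm = nn.Sequential(nn.Linear(C, C), nn.Softmax(dim=-1))   # stand-in language model
    classifier = nn.Linear(D, C)
    logits = iterative_recognition(vis_feat, lm, classifier,
                                   SemanticRefinement(C, D), VisualRefinement(D))
    print(logits.shape)                                       # torch.Size([2, 25, 37])
```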
Date of Conference: 05-07 November 2023
Date Added to IEEE Xplore: 16 January 2024
Conference Location: Wuhan, China
