Visual-Semantic Refinement Network: Towards Exploring the Capabilities of Decoder in Scene Text Recognition


Abstract:

Traditional scene text recognition (STR) is usually regarded as a visual unimodal recognition task and has made some progress using the encoder-decoder framework. Introducing a language model (LM) that taps into semantic contextual relationships has further advanced the task from the language modality. However, in existing works, the LM heavily relies on the output of the decoder in the vision model (VM), while the vision decoder itself lacks semantic and global context awareness. In this paper, we explore the capability of the vision decoder, which is generally overlooked in previous works. We propose a Visual-Semantic Refinement Network (VSRN) that provides contextual and semantic guidance to the decoder, fully supporting its recognition capability. With the semantic refinement module, the recognition results from the LM can, in return, be fed back to the VM, providing semantic information while further tightening the coupling of the two modalities. In the visual refinement module, we propose an adaptive mask strategy and exploit the global contextual relationships of visual features to further assist the VM. These two complementary cues jointly strengthen the VM and iteratively improve recognition performance. Experimental results on several scene text recognition benchmarks show that our proposed method is effective and achieves state-of-the-art performance.
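The following is a minimal sketch, not the authors' released code, of the iterative VM-LM refinement loop described in the abstract. The gated fusion in the semantic refinement module, the confidence-thresholded adaptive mask in the visual refinement module, the stand-in language model, and all shapes and module names are assumptions made only for illustration.

```python
# Sketch of iterative visual-semantic refinement (assumed design, not the paper's code).
import torch
import torch.nn as nn


class SemanticRefinement(nn.Module):
    """Feed LM predictions back into visual decoder features (assumed gated fusion)."""

    def __init__(self, num_classes, dim):
        super().__init__()
        self.embed = nn.Linear(num_classes, dim)            # embed LM probabilities
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, vis_feat, lm_probs):
        sem = self.embed(lm_probs)                            # (B, T, D)
        g = self.gate(torch.cat([vis_feat, sem], dim=-1))     # per-position fusion gate
        return g * vis_feat + (1 - g) * sem


class VisualRefinement(nn.Module):
    """Self-attention over visual features with an adaptive mask from prediction confidence."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat, confidence, threshold=0.5):
        # Assumed form of the "adaptive mask strategy": ignore low-confidence positions.
        key_padding_mask = confidence < threshold             # (B, T), True = ignored
        # Guard against fully masked rows, which attention cannot handle.
        key_padding_mask[key_padding_mask.all(dim=1)] = False
        out, _ = self.attn(feat, feat, feat, key_padding_mask=key_padding_mask)
        return feat + out


def iterative_recognition(vis_feat, lm, classifier, sem_ref, vis_ref, iters=3):
    """Alternately refine visual features with semantic and global visual cues."""
    logits = classifier(vis_feat)
    for _ in range(iters):
        lm_probs = lm(logits.softmax(dim=-1))                 # language-model refinement
        vis_feat = sem_ref(vis_feat, lm_probs)                # semantic refinement
        conf = logits.softmax(dim=-1).max(dim=-1).values      # per-step confidence
        vis_feat = vis_ref(vis_feat, conf)                    # visual refinement w/ adaptive mask
        logits = classifier(vis_feat)
    return logits


if __name__ == "__main__":
    B, T, D, C = 2, 25, 256, 37                               # batch, steps, feature dim, charset size
    vis_feat = torch.randn(B, T, D)
    lm = nn.Sequential(nn.Linear(C, C), nn.Softmax(dim=-1))   # stand-in language model
    classifier = nn.Linear(D, C)
    logits = iterative_recognition(vis_feat, lm, classifier,
                                   SemanticRefinement(C, D), VisualRefinement(D))
    print(logits.shape)                                       # torch.Size([2, 25, 37])
```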
Date of Conference: 05-07 November 2023
Date Added to IEEE Xplore: 16 January 2024
Conference Location: Wuhan, China
