A Hybrid Vision Transformer Approach for Mathematical Expression Recognition

Abstract:

Mathematical expression recognition is one of the crucial challenges in document analysis. Unlike text recognition, which deals only with one-dimensional structure in images, mathematical expression recognition is a much more complicated problem because of its two-dimensional structure and varying symbol sizes. In this paper, we propose using a Hybrid Vision Transformer (HVT) with 2D positional encoding as the encoder to extract the complex relationships between symbols from the image. A coverage attention decoder is used to better track the attention history and handle the under-parsing and over-parsing problems. We also show the benefit of using the [CLS] token of ViT as the initial embedding of the decoder. Experiments on the IM2LATEX-100K dataset demonstrate the effectiveness of our method, which achieves a BLEU score of 89.94 and outperforms current state-of-the-art methods.

Published in: 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Date of Conference: 30 November 2022 - 02 December 2022

Date Added to IEEE Xplore: 10 February 2023

DOI: 10.1109/DICTA56598.2022.10034626

Conference Location: Sydney, Australia

I. Introduction

Mathematical expression recognition is an important process in scientific document analysis [1]. Despite its importance, the task remains very challenging. One reason math recognition is harder than ordinary text recognition is that a math formula usually has a 2-D spatial structure [2], whereas ordinary text is 1-D. This spatial structure is expressed through constructs such as superscripts, subscripts, and fraction symbols. The traditional approach usually solves the problem in two stages. First, a character segmentation stage segments each character in the formula and classifies it against the given vocabulary. Second, a structural analysis stage identifies the spatial relationships between all characters of the formula.
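
The 2-D spatial structure described above is the motivation for the 2D positional encoding named in the abstract. The sketch below illustrates one common way to build such an encoding: half of each vector encodes the row index and the other half the column index, each with the standard sinusoidal scheme. The exact formulation used in the paper is not given in this excerpt, so the function names (`sinusoid_1d`, `positional_encoding_2d`), the row/column split, and the base of 10000 are assumptions made for illustration only.

```python
import math


def sinusoid_1d(pos, dim):
    """Sinusoidal encoding of an integer position into `dim` values (dim must be even)."""
    enc = []
    for i in range(0, dim, 2):
        # Frequencies decrease geometrically across the channel pairs.
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc


def positional_encoding_2d(rows, cols, dim):
    """Return a rows x cols grid of dim-length vectors.

    Assumed layout (hypothetical, not taken from the paper): the first
    dim/2 values encode the row, the last dim/2 values encode the column.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [
        [sinusoid_1d(r, half) + sinusoid_1d(c, half) for c in range(cols)]
        for r in range(rows)
    ]


# Example: a 3 x 5 grid of image patches with 8-dimensional encodings.
pe = positional_encoding_2d(rows=3, cols=5, dim=8)
```

With this layout, two patches in the same row share the first half of their encoding and two patches in the same column share the second half, so vertical relationships such as superscripts and fractions stay distinguishable from horizontal ones.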
