A Hybrid Vision Transformer Approach for Mathematical Expression Recognition


Abstract:

Mathematical expression recognition is one of the crucial challenges in document analysis. Unlike text recognition, which deals only with images of one-dimensional structure, mathematical expression recognition is considerably more complicated because expressions have a two-dimensional structure and their symbols vary in size. In this paper, we propose using a Hybrid Vision Transformer (HVT) with 2D positional encoding as the encoder to extract the complex relationships between symbols from the image. A coverage attention decoder is used to better track the attention history and thereby handle the under-parsing and over-parsing problems. We also show the benefit of using the [CLS] token of ViT as the initial embedding of the decoder. Experiments on the IM2LATEX-100K dataset demonstrate the effectiveness of our method, which achieves a BLEU score of 89.94 and outperforms current state-of-the-art methods.
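
To illustrate what a 2D positional encoding can look like, the following is a minimal sketch of one common construction: a 1-D sinusoidal encoding of the row index concatenated with one of the column index. The abstract does not state which variant the paper uses, so the sinusoidal form, the function names `sinusoid_1d` and `positional_encoding_2d`, and the grid and embedding sizes are all assumptions made for illustration.

```python
import math


def sinusoid_1d(pos, dim):
    """1-D sinusoidal encoding of length `dim` (assumed even) for index `pos`."""
    enc = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc


def positional_encoding_2d(rows, cols, dim):
    """Return a rows x cols grid of vectors of length `dim`.

    Illustrative assumption: the first half of each vector encodes the
    row index and the second half encodes the column index.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [
        [sinusoid_1d(r, half) + sinusoid_1d(c, half) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 4 x 6 grid of image patches with 8-dimensional embeddings.
grid = positional_encoding_2d(rows=4, cols=6, dim=8)
```

With this layout, two patches in the same row share the first half of their encoding and two patches in the same column share the second half, which is one way an encoder can be told about vertical and horizontal position separately.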
Date of Conference: 30 November 2022 - 02 December 2022
Date Added to IEEE Xplore: 10 February 2023
Conference Location: Sydney, Australia

I. Introduction

Mathematical expression recognition is an important step in the analysis of scientific documents [1]. Despite its importance, the task remains very challenging. One reason it is harder than ordinary text recognition is that a math formula usually has a 2-D spatial structure [2], whereas ordinary text is 1-D. This spatial structure is expressed through constructs such as superscripts, subscripts, and fractions. The traditional approach usually solves the problem in two stages. First, a character segmentation stage segments each character in the formula and classifies it against a given vocabulary. Second, a structural analysis stage identifies the spatial relationships between all characters of the formula.
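
As a rough illustration of the second stage of the traditional approach, the sketch below takes already segmented and classified symbols (the assumed output of stage one) and derives superscript, subscript, and same-line relationships from their bounding boxes. The `Symbol` class, the 0.25 and 0.75 thresholds, and the sample coordinates are hypothetical choices for this example and are not taken from the paper or from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    label: str  # class assigned by the (assumed) segmentation stage
    x: float    # left edge of the bounding box
    y: float    # top edge (y grows downward, as in image coordinates)
    w: float    # width
    h: float    # height

    @property
    def cy(self):
        """Vertical centre of the bounding box."""
        return self.y + self.h / 2


def relation(base, other):
    """Classify `other` relative to `base` by comparing vertical centres."""
    if other.cy < base.y + 0.25 * base.h:
        return "superscript"
    if other.cy > base.y + 0.75 * base.h:
        return "subscript"
    return "right"


def to_latex(symbols):
    """Emit a LaTeX string from left-to-right sorted symbols."""
    if not symbols:
        return ""
    ordered = sorted(symbols, key=lambda s: s.x)
    base = ordered[0]
    out = [base.label]
    for s in ordered[1:]:
        rel = relation(base, s)
        if rel == "superscript":
            out.append("^{" + s.label + "}")
        elif rel == "subscript":
            out.append("_{" + s.label + "}")
        else:
            out.append(s.label)
            base = s  # a symbol on the baseline becomes the new base
    return "".join(out)


# Hypothetical stage-one output for an image of "x^2 + y_i".
example = [
    Symbol("x", 0, 10, 10, 10),
    Symbol("2", 10, 5, 5, 5),
    Symbol("+", 16, 12, 8, 8),
    Symbol("y", 26, 10, 10, 10),
    Symbol("i", 36, 18, 5, 5),
]
print(to_latex(example))  # x^{2}+y_{i}
```

A rule-based sketch like this handles only a single level of scripts and relies on correct segmentation, which hints at why the 2-D structure makes the problem hard.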
