I. Introduction
Document Understanding is a broad area covering many tasks that extract information from documents. Recent research has focused on learning a robust representation in an unsupervised pre-training phase and then fine-tuning the model with supervision for downstream tasks [1], [2]. Pre-training is usually done on very large datasets, such as [3], which contains millions of documents, a practice borrowed from NLP [4], [5].