I. Introduction
Contextual Large Language Models (LLMs), such as BERT [1], represent a significant breakthrough in Natural Language Processing (NLP): they generate contextualized word representations that disambiguate words with multiple meanings. However, training these models is time-consuming and computationally intensive, so the largest publicly available models are trained on general-knowledge sources such as Wikipedia and BookCorpus. Some of these models have subsequently been fine-tuned on domain-specific corpora containing specialized terminology. For instance, the bert-base-cased model was fine-tuned on medical insurance text documents to produce the bert-fine-tuned-medical-insurance-ner model [2].
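As a brief illustration (a sketch, not part of the cited work), such a domain-adapted NER model might be loaded with the Hugging Face transformers library as shown below; the model identifier is hypothetical and stands in for the actual hub path of the model cited in [2].

    from transformers import pipeline

    # Load a token-classification (NER) pipeline from a fine-tuned BERT
    # checkpoint. The model id below is hypothetical; substitute the real
    # Hugging Face Hub path of the model from [2].
    ner = pipeline(
        "token-classification",
        model="bert-fine-tuned-medical-insurance-ner",  # hypothetical id
        aggregation_strategy="simple",  # merge sub-word pieces into entities
    )

    print(ner("The claim for the MRI scan was denied by the insurer."))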