Abstract:
Information extraction is a cornerstone of the digitization of office data, which requires converting unstructured data into structured data. However, in actual business applications, adapting common extraction systems to domain-specific documents is a major obstacle because training data is difficult to prepare. To overcome this issue, we introduce a model that combines pre-trained language models with a customized CNN layer for domain adaptation. The model is validated on three Japanese domain-specific data sets and two benchmark machine reading comprehension data sets (SQuADs). Experimental results confirm that our model achieves promising results that are applicable to actual business scenarios.
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 21 September 2021
CINNAMON LAB, Dong Da, Hanoi, Vietnam
Hung Yen University of Technology and Education, Hung Yen, Vietnam
CINNAMON LAB, Dong Da, Hanoi, Vietnam
CINNAMON LAB, Dong Da, Hanoi, Vietnam
CINNAMON LAB, Dong Da, Hanoi, Vietnam
CINNAMON LAB, Dong Da, Hanoi, Vietnam
CINNAMON LAB, Dong Da, Hanoi, Vietnam
CINNAMON LAB, Dong Da, Hanoi, Vietnam