Pre-Trained or Adversarial Training: A Comparison of NER Methods on Chinese Drug Specifications | IEEE Conference Publication | IEEE Xplore

Abstract:

Named Entity Recognition (NER) is widely used in Natural Language Processing (NLP), but most current work focuses on English text. This paper compares different NER models for extracting key content from Chinese drug specifications. This key content helps users identify important information about the drugs. Three models were initially chosen for this research: BiLSTM-CRF, MiniRBT-BiLSTM-CRF, and MiniRBT-CRF. Experimental results show that MiniRBT-CRF outperforms the other two models, achieving high precision and F1 scores. We then worked on optimizing this model with word embedding and adversarial training. First, we replaced MiniRBT with the BERT-Base-Chinese model; the resulting BERT-CRF improves the F1 score by 2% over MiniRBT-CRF. Next, we added adversarial training to the BERT-CRF model. However, the resulting BERT-CRF-Adv increased the precision score by only 1% and did not improve the F1 score. The results thus suggest that, to enhance an NER model for Chinese text, optimizing the underlying model is the better choice.
Date of Conference: 25-27 July 2024
Date Added to IEEE Xplore: 01 October 2024
Conference Location: Bali, Indonesia
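
The abstract compares models by precision and F1 score. As a minimal sketch of how such entity-level scores are commonly computed for NER, the Python below extracts entity spans from BIO tag sequences and scores exact span matches. The BIO scheme, the `DRUG` and `DOSE` labels, and both function names are assumptions for illustration only; the paper's actual label set and evaluation script are not given here.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only on an exact span and type match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags only; not data from the paper.
gold = ["B-DRUG", "I-DRUG", "O", "B-DOSE", "I-DOSE", "O", "B-DRUG"]
pred = ["B-DRUG", "I-DRUG", "O", "B-DOSE", "O", "O", "O"]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

Because F1 is the harmonic mean of precision and recall, a gain in precision alone does not necessarily raise F1 if recall falls or stays flat, which is consistent with the pattern the abstract reports for BERT-CRF-Adv.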

I. Introduction

The use of information technology in the pharmaceutical industry is evolving at an unprecedented pace. This evolution has resulted in the creation and availability of a wide array of pharmaceutical-related documents. Pharmaceutical instructions, especially drug specifications, have their own unique jargon, abbreviations, and technical terms, which make it difficult for individuals to understand the proper usage and application of the drugs. This complexity is even more pronounced in Chinese drug specifications because of the thousands of distinct Chinese characters. Deciphering the often complex pharmaceutical verbiage is a daunting task. Consequently, there is a heightened need for an automatic, intelligent information extraction system to help these individuals understand the drugs that they will most likely be taking.
