Abstract:
State-of-the-art biomedical named entity recognition (BioNER) systems apply supervised machine learning models, which rely on human effort for training data annotation and therefore do not generalize easily to new entity types and datasets. We propose a distantly supervised approach, AutoBioNER, that automatically recognizes biomedical entities in massive corpora using user-input dictionaries. AutoBioNER does not need any human-annotated data. It relies on incomplete entity dictionaries to provide seeds for each entity type and performs a novel entity set expansion step for corpus-level new entity recognition and dictionary completion. The expanded dictionaries are then used as distant supervision to train a neural model for BioNER. Experimental results on BioNER benchmark datasets show that AutoBioNER achieves the best performance among methods that use only dictionaries and no additional human effort. The results also show that the dictionary expansion step plays an important role in this performance.
Date of Conference: 18-21 November 2019
Date Added to IEEE Xplore: 06 February 2020