Corpus Development for Indonesian Product Named Entity Recognition Using Semi-supervised Approach | IEEE Conference Publication | IEEE Xplore

Abstract:

PRONER is a NER system that aims to recognize product entities in text. Developing a PRONER system with a supervised approach requires a labeled dataset. However, building a labeled dataset requires considerable effort and is expensive. In this paper, we propose an approach for building a PRONER dataset in the beauty product domain using a semi-supervised method. The dataset was built from posts on an Indonesian online beauty forum. We implemented a semi-supervised learning approach that uses a Conditional Random Field (CRF) classifier to expand a small labeled dataset. We used list lookup and handcrafted features for our CRF classifier. The overall process consists of two steps: initial manual labeling and automatic labeling. We started with 2,2759 manually labeled sentences and expanded the dataset to 9,165 sentences using the automatic labeling process. The experimental results show that the automatically labeled dataset has 73.01% accuracy. Based on our analysis of mislabeled tokens, other features from the text should be explored to improve the accuracy.
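
The two-step process described in the abstract (a small manually labeled seed set, then automatic labeling) can be illustrated with a generic self-training loop. The sketch below is not the authors' implementation: the brand list, the specific features, the confidence threshold, and the names token_features, sentence_features, and self_train are all hypothetical, and the model is passed in as a plain callable so that any CRF library could be plugged in.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical gazetteer for the list-lookup feature; the paper's lists are not given.
BRAND_LIST = {"wardah", "emina", "maybelline"}

Features = Dict[str, object]
# A tagger returns predicted labels plus one sentence-level confidence in [0, 1].
Tagger = Callable[[List[Features]], Tuple[List[str], float]]
# A trainer fits a model on feature sequences and label sequences and returns a tagger.
Trainer = Callable[[List[List[Features]], List[List[str]]], Tagger]


def token_features(sent: Sequence[str], i: int) -> Features:
    """Illustrative handcrafted and list-lookup features for token i."""
    word = sent[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
        "in_brand_list": word.lower() in BRAND_LIST,  # list lookup
        "prev_lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }


def sentence_features(sent: Sequence[str]) -> List[Features]:
    return [token_features(sent, i) for i in range(len(sent))]


def self_train(
    train: Trainer,
    labeled: List[Tuple[List[str], List[str]]],
    unlabeled: List[List[str]],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
    max_rounds: int = 5,
) -> Tuple[List[Tuple[List[str], List[str]]], List[List[str]]]:
    """Grow the labeled set with confidently auto-labeled sentences."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        X = [sentence_features(sent) for sent, _ in labeled]
        y = [tags for _, tags in labeled]
        tagger = train(X, y)
        accepted, remaining = [], []
        for sent in pool:
            tags, confidence = tagger(sentence_features(sent))
            if confidence >= threshold:
                accepted.append((sent, tags))
            else:
                remaining.append(sent)
        if not accepted:  # nothing new was labeled, so stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool
```

In practice the train callable would wrap a CRF trainer, and the confidence could be derived from the model's sequence or marginal probabilities. How candidates were actually selected in the paper is not stated in the abstract.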
Date of Conference: 05-06 August 2020
Date Added to IEEE Xplore: 06 October 2020
Conference Location: Bandung, Indonesia
