Abstract:
Entities of an article play a vital role in understanding the essence of the article. In recent years, there have been many advances in Natural Language Processing for identifying the entities of a document. However, only a few entities contribute to the central topic of the document. These entities are termed the salient entities of the document. Salient entities are the most noticeable or important topics that an article is fundamentally about. In this paper, we propose a novel supervised Binary Entity Salience Classifier (BESC) that effectively identifies the salient Wikipedia entities occurring in a document using entity and document features. Our experiments on three different manually annotated datasets show our classifier's effectiveness in determining the salient entities of a document over the baseline method. We further validate the effectiveness of BESC by running an A/B test on Yahoo media properties that used our classifier to surface salient article topics. Online A/B results show a substantial improvement in user engagement. BESC has not only improved the entity relevance of our news articles but has also helped mitigate the risks involved in misidentifying entities.
Date of Conference: 15-18 December 2021
Date Added to IEEE Xplore: 13 January 2022