Research on Chinese Text Summarization Methods Based on Topic Filtering | IEEE Conference Publication | IEEE Xplore


Abstract:

A good summary should accurately express the meaning of a document and fully retain its topic information, while generalizing well and keeping redundancy low. To meet these requirements, we propose a summary sentence extraction algorithm based on topic filtering. Sentence-BERT is combined with entity features and sentence position features to obtain more comprehensive text feature information at both the sentence and word levels, making the extracted sentences more representative. The fused features are then used to rank the sentences by importance. Because an extracted summary should comprehensively and accurately cover multiple aspects of the original text, we obtain the topics of the source document through topic modeling and extract the ranked sentences according to topic, so that the most representative and comprehensive sentences form the summary.
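The abstract describes the selection stage only in outline, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes each sentence already carries a semantic score (for example, derived from a Sentence-BERT embedding), an entity score, and a topic id from some topic model. The weighted-sum fusion, the weights, the linear position feature, and the round-robin selection over topics are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    position: int    # index of the sentence in the source document
    semantic: float  # assumed precomputed, e.g. from a Sentence-BERT embedding
    entity: float    # assumed precomputed, e.g. a normalized entity count
    topic: int       # assumed topic id assigned by a topic model


def position_score(position: int, total: int) -> float:
    """Hypothetical position feature: earlier sentences score higher."""
    return 1.0 - position / max(total, 1)


def fused_score(s: Sentence, total: int, weights=(0.6, 0.2, 0.2)) -> float:
    """Weighted sum of the three features; the weights are illustrative."""
    w_sem, w_ent, w_pos = weights
    return (w_sem * s.semantic
            + w_ent * s.entity
            + w_pos * position_score(s.position, total))


def topic_filtered_summary(sentences, budget):
    """Rank sentences by fused score, then pick them topic by topic."""
    total = len(sentences)
    ranked = sorted(sentences, key=lambda s: fused_score(s, total), reverse=True)

    # Group the ranked sentences by topic; dict order follows the rank of
    # each topic's best sentence.
    by_topic = {}
    for s in ranked:
        by_topic.setdefault(s.topic, []).append(s)

    # Round-robin over topics so every topic is represented before any
    # topic contributes a second sentence.
    chosen = []
    while len(chosen) < budget and any(by_topic.values()):
        for queue in by_topic.values():
            if queue and len(chosen) < budget:
                chosen.append(queue.pop(0))

    # Restore document order for readability.
    return sorted(chosen, key=lambda s: s.position)
```

With a budget of two sentences, this sketch prefers the best sentence from each of two topics over the two highest-scoring sentences overall, which is one way to read "extracting the ranked sentences according to topic".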
Date of Conference: 29-31 March 2024
Date Added to IEEE Xplore: 11 July 2024
Conference Location: Nanjing, China

I. Introduction

With the rapid development of the Internet, Chinese news articles have become abundant, greatly facilitating people's access to information. However, information overload means that readers often have to invest a significant amount of time to find the information they need, and manual text summarization is time-consuming. Consequently, automatic text summarization has emerged as a solution. Text summarization is the process of extracting the salient features of the original document and combining them into a meaningful summary [1]. According to Gambhir [2], automatic text summarization techniques can be classified into abstractive and extractive text summarization. Abstractive summarization involves complex semantic understanding and relies heavily on natural language generation techniques, making it the more challenging of the two. In contrast, extractive summarization selects a subset of existing words, phrases, or sentences from the original text that best represent the core content of the document, making it a more direct and relatively straightforward approach. Because of the current limitations of natural language processing methods, extractive summarization is the dominant approach in this field.

Gong et al. [3] propose the SeburSum method, which selects summaries by checking the semantic similarity between the summaries and the mutually exclusive candidate summaries instead of their similarity with the source document; its disadvantage is that it leads to the loss of important topic information. Srivastava et al. [4] explored an unsupervised extractive summarization method that combines clustering and topic modeling to reduce topic bias. They used latent Dirichlet allocation for topic modeling and the K-Medoids clustering algorithm to generate summaries, but did not focus on semantic relevance.
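To make the notion of extractive summarization concrete, here is a small self-contained sketch that selects existing sentences from a text without generating any new wording. It is not the method of this paper or of any cited work: the average-term-frequency scoring is a deliberately simple stand-in, and the tokenizer (single Chinese characters, whole Latin-script words) is a naive assumption that a real system would replace with proper word segmentation.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split after Chinese or Western sentence-ending punctuation."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Naive tokens: each Chinese character, or each run of Latin letters/digits."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", sentence.lower())


def extractive_summary(text, k):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]

    # Term frequency over the whole document.
    freq = Counter(tok for toks in tokens for tok in toks)

    def score(i):
        # Average document frequency of the sentence's tokens.
        return sum(freq[t] for t in tokens[i]) / len(tokens[i]) if tokens[i] else 0.0

    # Highest score first; ties are broken by earlier position.
    order = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    keep = sorted(order[:k])
    return [sentences[i] for i in keep]
```

Because every output sentence is copied verbatim from the input, the summary stays faithful to the source wording, but nothing in this scoring guards against redundancy or lost topics, which is the gap the approaches discussed above try to close.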

