A Chi-Square Statistics Based Feature Selection Method in Text Classification | IEEE Conference Publication | IEEE Xplore

Abstract:

Text classification is the process of automatically assigning texts to the categories of a given classification system based on their content. It typically involves several steps: word segmentation, feature selection, weight calculation, and classification performance evaluation. Feature selection is a key step, because the selected features determine how well the relevant content of each text can be captured, and they therefore have a strong influence on classification accuracy. Text classification is an important module in text processing and is widely applied in areas such as spam filtering, news classification, sentiment classification, and part-of-speech tagging. This paper proposes a method for extracting feature words based on chi-square statistics. Because feature words may behave differently depending on whether they appear together or separately, we classify texts using both single words and double words (two-word combinations) as features at the same time. Using this method, we performed experiments with the classical Naive Bayes and Support Vector Machine classification algorithms. A comparison and analysis of the experimental results demonstrates the effectiveness of our method.
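The abstract does not give the paper's exact formulas, so the following is only a minimal Python sketch of chi-square feature selection over single words and word pairs, under stated assumptions. It uses the standard 2×2 contingency form common in text categorization: for a term t and category c, let A be the number of documents in c that contain t, B the number outside c that contain t, C the number in c that lack t, and D the number outside c that lack t, with N = A + B + C + D; then χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). Assumptions not taken from the paper: documents are already word-segmented token lists, a "double word" is modeled as an adjacent word pair joined by `_`, each feature's score is its maximum χ² over all categories, and the helper names `extract_features`, `chi_square`, and `select_features` are hypothetical.

```python
from collections import Counter


def extract_features(tokens):
    """Return single-word and double-word features for one tokenized document.

    Assumption (not from the paper): a "double word" is an adjacent word
    pair, joined with '_' so it cannot collide with a single word.
    """
    unigrams = set(tokens)
    bigrams = {f"{a}_{b}" for a, b in zip(tokens, tokens[1:])}
    return unigrams | bigrams


def chi_square(a, b, c, d):
    """Chi-square statistic for one term/category 2x2 contingency table.

    a: docs in the category that contain the term
    b: docs outside the category that contain the term
    c: docs in the category that lack the term
    d: docs outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table: the term occurs in every or no document,
        # or the category covers every or no document.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Rank features by their maximum chi-square over categories; keep the top k.

    Assumption (not from the paper): max is used to combine per-category
    scores; averaging over categories is another common choice.
    """
    feature_sets = [extract_features(doc) for doc in docs]
    n_docs = len(docs)
    class_sizes = Counter(labels)
    term_class_df = Counter()  # (term, label) -> docs of that label containing term
    term_df = Counter()        # term -> docs containing term
    for feats, label in zip(feature_sets, labels):
        for t in feats:
            term_class_df[(t, label)] += 1
            term_df[t] += 1

    scores = {}
    for t, t_count in term_df.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = term_class_df[(t, label)]
            b = t_count - a
            c = size - a
            d = (n_docs - size) - b
            best = max(best, chi_square(a, b, c, d))
        scores[t] = best
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Toy, already-segmented corpus (hypothetical data for illustration only).
docs = [
    ["free", "prize", "click", "now"],
    ["free", "prize", "inside"],
    ["meeting", "at", "noon"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
print(select_features(docs, labels, 4))
# [('free', 4.0), ('free_prize', 4.0), ('meeting', 4.0), ('prize', 4.0)]
```

The selected features would then serve as the input vocabulary for a Naive Bayes or SVM classifier. For comparison, scikit-learn's `CountVectorizer(ngram_range=(1, 2))` combined with `SelectKBest(chi2, k=k)` builds a similar pipeline, although its `chi2` function computes the statistic from per-class feature totals rather than from the document-level 2×2 table used here.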
Date of Conference: 23-25 November 2018
Date Added to IEEE Xplore: 10 March 2019
Conference Location: Beijing, China