A Turkish Text Classification Based Feature Selection and Density Peaks Clustering | IEEE Conference Publication | IEEE Xplore

Abstract:

Text classification, a well-known Natural Language Processing (NLP) task, is the process of categorizing documents according to their content. The choice of classification algorithm and of the features used for classification strongly affects how well the classification performs. In this study, features are first selected from the texts with the Information Gain (IG) method, taking into account term frequency (TF) and inverse document frequency (IDF) values; the texts are then assigned to categories with the Density Peaks Clustering (DPC) algorithm, used here as a semi-supervised method. Experiments use the TTC-3600 dataset, which contains texts from six well-known Turkish news portals covering six categories. The proposed approach outperforms previously reported results on this dataset.
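To make the clustering step concrete, the following is a minimal sketch of the basic, unsupervised density peaks procedure. It is not the semi-supervised variant used in the study, whose details the abstract does not give. Toy 2-D points stand in for TF-IDF document vectors, and the function name `density_peaks` and the parameters `d_c` (cutoff distance) and `n_clusters` are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Basic density peaks clustering with a cutoff kernel (illustrative sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that precedes i in the density order.
    # The densest point has no predecessor and gets the largest distance overall.
    delta = [0.0] * n
    parent = [-1] * n
    delta[order[0]] = max(max(row) for row in dist)
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Cluster centers: points with the largest rho * delta.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Every remaining point inherits the label of its denser nearest neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Two well-separated toy groups (hypothetical data, not from TTC-3600).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peaks(points, d_c=1.2, n_clusters=2)
print(labels)
```

In a text setting, the points would be the IG-filtered TF-IDF vectors and the Euclidean distance could be replaced by a cosine-based one; both choices are assumptions here rather than details confirmed by the abstract.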
Date of Conference: 05-08 July 2023
Date Added to IEEE Xplore: 28 August 2023
ISBN Information:
Print on Demand(PoD) ISSN: 2165-0608
Conference Location: Istanbul, Turkiye
