Lifting the Curse: Exploring Dimensionality Reduction on Text Clustering Applications | IEEE Conference Publication | IEEE Xplore

Abstract:

Today, huge amounts of text are generated on the Web by a vast number of applications, including instant messengers, social networks, e-mail clients, news portals, blog communities, and commercial platforms. The need to effectively identify documents with similar content in these services has made text clustering one of the most prominent emerging problems in machine learning. Nevertheless, the high dimensionality and natural sparseness of text introduce significant challenges that threaten the feasibility of even the most successful algorithms. Consequently, dimensionality reduction techniques play a crucial role in this particular problem. Motivated by these challenges, in this article we investigate the impact of dimensionality reduction on the performance of text clustering algorithms. More specifically, we experimentally analyze its effects on the effectiveness and running times of eight clustering algorithms using six high-dimensional text datasets. The results indicate that, in most cases, dimensionality reduction may significantly reduce algorithm execution times while sacrificing only a small amount of clustering quality.
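
The abstract does not name the reduction techniques or clustering algorithms that were evaluated. As a rough illustration of the kind of pipeline being studied, the sketch below assumes a Gaussian random projection as the reduction step and plain k-means as the clustering algorithm; the toy corpus, the function names (`tf_vectors`, `random_projection`, `kmeans`, `purity`, `compare`), the `k_dims` parameter, and the purity metric are hypothetical choices for illustration, not the authors' experimental setup. It clusters term-frequency vectors once in the full vocabulary space and once after projecting them to `k_dims` dimensions, then reports dimensionality, cluster purity, and wall-clock time for each.

```python
import math
import random
import time
from collections import Counter

# Hypothetical sketch: Gaussian random projection + k-means, not the paper's setup.
# Toy corpus with two made-up topics (sports vs. finance); illustrative only.
DOCS = [
    "the striker scored a late goal in the football match",
    "the goalkeeper saved a penalty and the match ended in a draw",
    "fans cheered as the team won the league match",
    "the central bank raised interest rates to curb inflation",
    "stock markets fell as investors feared higher interest rates",
    "the bank reported strong profits despite market volatility",
]
TRUTH = [0, 0, 0, 1, 1, 1]  # ground-truth topic of each document

def tf_vectors(docs):
    """Term-frequency vectors over the corpus vocabulary (one dimension per word)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0.0] * len(vocab)  # mostly zeros: text vectors are sparse
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = float(c)
        vecs.append(v)
    return vecs, vocab

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def random_projection(vecs, k, seed=0):
    """Map d-dim vectors to k dims with a Gaussian matrix, entries ~ N(0, 1/k)."""
    rng = random.Random(seed)
    d = len(vecs[0])
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    # Skip zero coordinates so the cost follows the number of non-zero terms.
    return [[sum(v[i] * R[i][j] for i in range(d) if v[i]) for j in range(k)]
            for v in vecs]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; every distance computation costs O(dimensions)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(pred, truth):
    """Fraction of documents that fall in their cluster's majority class."""
    total = 0
    for c in set(pred):
        members = [t for p, t in zip(pred, truth) if p == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)

def compare(docs, truth, n_clusters=2, k_dims=8, seed=0):
    vecs, vocab = tf_vectors(docs)
    vecs = [l2_normalize(v) for v in vecs]
    t0 = time.perf_counter()
    full_labels = kmeans(vecs, n_clusters, seed=seed)
    t1 = time.perf_counter()
    reduced = random_projection(vecs, k_dims, seed=seed)
    red_labels = kmeans(reduced, n_clusters, seed=seed)
    t2 = time.perf_counter()
    return {
        "original_dims": len(vocab),
        "reduced_dims": len(reduced[0]),
        "original_purity": purity(full_labels, truth),
        "reduced_purity": purity(red_labels, truth),
        "original_seconds": t1 - t0,
        "reduced_seconds": t2 - t1,  # includes the projection itself
    }

if __name__ == "__main__":
    for key, value in compare(DOCS, TRUTH).items():
        print(f"{key}: {value}")
```

On a six-document corpus the timings are too small to mean anything. The design point is that each distance computation in k-means costs time proportional to the number of dimensions, so replacing a vocabulary-sized space with a small `k_dims`-dimensional one cuts the per-iteration cost; because a random projection preserves distances only approximately, it is also one way some clustering quality can be lost.
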
Date of Conference: 18-20 July 2022
Date Added to IEEE Xplore: 30 September 2022
Conference Location: Corfu, Greece