Abstract:
In recent document clustering methods, word embeddings play the primary role in capturing semantics, taking into account how often a specific word appears in its context. Word2vec and GloVe are the two most widely used word embeddings in document clustering. Previous work has not considered combining GloVe word embeddings with the DBSCAN clustering algorithm for document clustering. In this work, the Wikipedia and IMDB datasets are preprocessed with and without stemming and passed to the GloVe word embedding algorithm; the resulting word vectors are then fed to the DBSCAN clustering algorithm. Seven metrics are used to evaluate the experiments: average silhouette, purity, accuracy, F1, completeness, homogeneity, and NMI score. The experimental results are compared with those of the TF-IDF and K-means algorithms on six datasets. The proposed approach outperforms the TF-IDF and K-means approach on the four main evaluation metrics and in CPU time.
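As a rough illustration of the kind of pipeline the abstract describes (embed documents, cluster with DBSCAN, score with purity), the following is a minimal sketch and not the authors' implementation. The 2-dimensional `TOY_VECTORS` are hypothetical stand-ins for pretrained GloVe vectors, averaging word vectors into a document vector is an assumption (the abstract does not say how documents are represented), and the `eps` and `min_pts` values are chosen only to suit the toy data.

```python
from collections import Counter
from math import dist  # Python 3.8+

# Hypothetical 2-d vectors standing in for pretrained GloVe embeddings.
TOY_VECTORS = {
    "film": (1.0, 0.1), "actor": (0.9, 0.2), "movie": (1.0, 0.0),
    "planet": (0.0, 1.0), "orbit": (0.1, 0.9), "star": (0.2, 1.0),
}

def doc_vector(tokens):
    """Average the vectors of known tokens (an assumed document representation)."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        raise ValueError("document has no tokens with a known vector")
    return tuple(sum(component) / len(known) for component in zip(*known))

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned; do not expand from it again
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

def purity(true_labels, cluster_labels):
    """Fraction of points that carry the majority true label of their cluster.

    Noise points (label -1) are treated as one extra group here, which is a
    simplification made for this sketch.
    """
    groups = {}
    for truth_label, cluster_label in zip(true_labels, cluster_labels):
        groups.setdefault(cluster_label, []).append(truth_label)
    majority = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return majority / len(true_labels)

docs = [
    ["film", "actor"], ["movie", "film"], ["actor", "movie"],
    ["planet", "orbit"], ["star", "orbit"], ["planet", "star"],
]
truth = ["imdb", "imdb", "imdb", "wiki", "wiki", "wiki"]

vectors = [doc_vector(d) for d in docs]
labels = dbscan(vectors, eps=0.3, min_pts=2)
score = purity(truth, labels)
```

On this toy data the two topics are far apart in vector space, so the sketch separates them cleanly; with real embeddings, `eps` and `min_pts` would need tuning per dataset.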
Date of Conference: 23-24 December 2020
Date Added to IEEE Xplore: 31 May 2021