Abstract:
To extract key topics from news articles, this paper investigates an efficient way to construct text vectors, with the aim of improving the efficiency and accuracy of document clustering based on the Word2Vec model. It proposes a novel algorithm that combines the Jaccard similarity coefficient with inverse dimension frequency to calculate the importance degree of each dimension of a text vector with respect to the corresponding document. Text vectors are then constructed from these importance degrees, which improves the accuracy of text clustering and key topic extraction. The algorithm is also implemented on MapReduce, which improves its efficiency.
Date of Conference: 26-29 June 2017
Date Added to IEEE Xplore: 10 August 2017