Extracting Topics Based on Word2Vec and Improved Jaccard Similarity Coefficient | IEEE Conference Publication | IEEE Xplore

Abstract:

To extract key topics from news articles, this paper investigates an efficient way to construct text vectors based on the Word2Vec model, with the aim of improving the efficiency and accuracy of document clustering. It proposes a novel algorithm that combines the Jaccard similarity coefficient with inverse dimension frequency to calculate the importance degree between each dimension of a text vector and the corresponding document. Text vectors are then constructed from these importance degrees, which improves the accuracy of text clustering and key topic extraction. The algorithm is also implemented on MapReduce to improve its efficiency.
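The abstract does not give the exact formulas, so the following Python sketch only illustrates the idea under stated assumptions. Each text-vector dimension is assumed to be a set of related words (for example, a cluster of Word2Vec word vectors, which is not computed here). The importance degree is assumed to be the Jaccard coefficient between that word set and a document's word set, multiplied by a hypothetical IDF-style inverse dimension frequency, log((N + 1) / (df + 1)). The function names `jaccard`, `inverse_dimension_frequency`, `build_text_vectors` and `cosine`, as well as the toy data, are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inverse_dimension_frequency(dimensions: List[Set[str]], docs: List[Set[str]]) -> List[float]:
    """Hypothetical IDF analogue for dimensions: log((N + 1) / (df_k + 1)),
    where df_k counts documents sharing at least one word with dimension k."""
    n = len(docs)
    weights = []
    for dim in dimensions:
        df = sum(1 for doc in docs if dim & doc)
        weights.append(math.log((n + 1) / (df + 1)))
    return weights


def build_text_vectors(dimensions: List[Set[str]], docs: List[Set[str]]) -> List[List[float]]:
    """Assumed importance degree: Jaccard(dimension, document) * inverse dimension frequency."""
    idf = inverse_dimension_frequency(dimensions, docs)
    return [[jaccard(dim, doc) * w for dim, w in zip(dimensions, idf)] for doc in docs]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, a common choice for comparing vectors before clustering."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Toy dimensions standing in for Word2Vec word clusters (hypothetical data).
dimensions = [
    {"stock", "market", "shares"},
    {"match", "goal", "league"},
    {"said", "report"},
]
# Each document reduced to its set of tokens (hypothetical data).
docs = [
    {"stock", "market", "fell", "said"},
    {"goal", "league", "match", "said"},
    {"shares", "market", "rose", "report"},
]

vectors = build_text_vectors(dimensions, docs)
for i, vec in enumerate(vectors):
    print(i, [round(x, 3) for x in vec])
```

With these toy inputs, the "said"/"report" dimension touches every document, so its weight is zero, mirroring the intuition behind IDF. Documents 0 and 2 score only on the finance dimension, so their cosine similarity is essentially 1 and a similarity-based clustering step would group them together. Splitting the df counts and the per-document vectors into separate MapReduce passes would be one natural way to parallelize this, but that split is likewise an assumption rather than the paper's design.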
Date of Conference: 26-29 June 2017
Date Added to IEEE Xplore: 10 August 2017
Conference Location: Shenzhen, China
