Big data topic modeling with mahout for managing business analysis services | IEEE Conference Publication | IEEE Xplore

Abstract:

Topic modeling for big data offers a key opportunity to address the needs of data-driven businesses and to deliver genuine value to business users by simplifying search and summarization over vast amounts of information. Many businesses already use the Hadoop paradigm to rapidly merge data from several operational systems and to analyze large volumes of multi-structured data. In this paper, we extend the collapsed variational Bayesian (CVB) inference algorithm for Latent Dirichlet Allocation (LDA) to discover hidden topical patterns through statistical regularities and to eliminate noise on the Hadoop framework. The approach captures the evolution of topics in a sequentially organized corpus of documents in two main phases: mapping and reducing. In the mapping phase, the probability for each word in the collected documents is calculated using the collapsed space of latent variables and parameters, summarizing the words in each topic. The reducing phase combines the results of the map phase to predict a new topic model from the given trained models. Experiments are conducted on the Reuters-21578 text categorization collection on a 64-node Hadoop cluster, with the aim of making the computation more efficient and more accurate.
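The map and reduce phases described in the abstract can be illustrated with a small single-process sketch. This is not the paper's extended algorithm, which the abstract does not specify. It assumes a simplified CVB0-style update (the zeroth-order variant of CVB, without removing the current token's own contribution), and the names `map_document`, `reduce_counts`, and `train`, as well as the hyperparameters `alpha` and `beta`, are hypothetical choices made for illustration.

```python
import random


def map_document(doc, topic_word, alpha=0.1, beta=0.01, inner_iters=10):
    """Map step (illustrative): per-word topic probabilities for one document.

    doc        -- dict mapping word id -> count in this document
    topic_word -- topic_word[k][w] = expected count of word w in topic k
    Returns (topic, word id, expected count) records for the reducer.
    """
    num_topics = len(topic_word)
    vocab_size = len(topic_word[0])
    topic_totals = [sum(row) for row in topic_word]
    doc_len = sum(doc.values())
    # Start from uniform expected document-topic counts.
    doc_topic = [doc_len / num_topics] * num_topics
    gamma = {}
    for _ in range(inner_iters):
        new_doc_topic = [0.0] * num_topics
        for w, count in doc.items():
            # Simplified CVB0-style weight: p(word | topic) * p(topic | doc).
            weights = [
                (topic_word[k][w] + beta)
                / (topic_totals[k] + vocab_size * beta)
                * (doc_topic[k] + alpha)
                for k in range(num_topics)
            ]
            norm = sum(weights)
            gamma[w] = [x / norm for x in weights]
            for k in range(num_topics):
                new_doc_topic[k] += count * gamma[w][k]
        doc_topic = new_doc_topic
    return [
        (k, w, count * gamma[w][k])
        for w, count in doc.items()
        for k in range(num_topics)
    ]


def reduce_counts(records, num_topics, vocab_size):
    """Reduce step (illustrative): sum expected counts into a new model."""
    new_model = [[0.0] * vocab_size for _ in range(num_topics)]
    for k, w, expected in records:
        new_model[k][w] += expected
    return new_model


def train(docs, num_topics, vocab_size, iterations=20, seed=0):
    """Alternate map and reduce passes, starting from a random model."""
    rng = random.Random(seed)
    topic_word = [
        [rng.random() for _ in range(vocab_size)] for _ in range(num_topics)
    ]
    for _ in range(iterations):
        records = []
        for doc in docs:  # on Hadoop, each mapper would handle a data split
            records.extend(map_document(doc, topic_word))
        topic_word = reduce_counts(records, num_topics, vocab_size)
    return topic_word


if __name__ == "__main__":
    toy_docs = [{0: 2, 1: 1}, {2: 3, 3: 1}, {0: 1, 3: 2}]
    for k, row in enumerate(train(toy_docs, num_topics=2, vocab_size=4)):
        print(k, [round(x, 3) for x in row])
```

In this sketch the only state shared between passes is the topic-word count matrix, which is what makes the per-document work independent and therefore suitable for distribution across mappers. A real Hadoop job would partition the reduce step by topic rather than collecting all records in one list.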
Date of Conference: 13-15 December 2014
Date Added to IEEE Xplore: 02 February 2015
Conference Location: Tokyo, Japan
