A smoothed Latent Dirichlet Allocation model with application to Business Intelligence

5 Author(s)
Zhihua Wei (Department of Computer Science and Technology, Tongji University, Shanghai, China); Rui Zhao; Ying Wang; Duoqian Miao

As an intelligent component, text classification plays an important role in Business Intelligence (BI) applications such as client opinion classification and market feedback analysis. The Latent Dirichlet Allocation (LDA) model, an effective text representation model, has been widely used in document processing applications. However, its performance suffers from the data sparseness problem. Existing smoothing techniques usually rely on statistical theory to assign a uniform distribution to absent words; they neither take the real word distribution into account nor distinguish between words. This paper proposes a method based on Tolerance Rough Set Theory (TRST), which uses the upper and lower approximations from rough set theory to assign different values to absent words in different approximation regions. In theory, the algorithms can estimate a smoothing value for each absent word according to its relation to the words that are present. Text classification experiments on public corpora show that the algorithms greatly improve the performance of the LDA model, especially on unbalanced corpora.
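The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes the commonly used tolerance rough set definitions for documents: a term's tolerance class contains the terms that co-occur with it in at least `theta` documents, the upper approximation of a document contains every term whose tolerance class overlaps the document, and the lower approximation contains every term whose tolerance class lies entirely inside it. The function names, the threshold `theta`, and the pseudo-count values `near` and `far` are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus the terms co-occurring with it in >= theta documents."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc_terms, classes):
    """Terms whose tolerance class shares at least one term with the document."""
    return {t for t, cls in classes.items() if cls & doc_terms}


def lower_approximation(doc_terms, classes):
    """Terms whose tolerance class is fully contained in the document."""
    return {t for t, cls in classes.items() if cls <= doc_terms}


def smoothed_counts(doc, classes, near=0.5, far=0.01):
    """Region-dependent pseudo-counts (hypothetical values, for illustration only).

    Present words keep their observed count; absent words inside the upper
    approximation get `near`; all other absent words get `far`.
    """
    present = set(doc)
    upper = upper_approximation(present, classes)
    counts = {}
    for t in classes:
        if t in present:
            counts[t] = float(doc.count(t))
        elif t in upper:
            counts[t] = near
        else:
            counts[t] = far
    return counts


corpus = [
    ["price", "service", "good"],
    ["price", "service", "bad"],
    ["delivery", "late"],
    ["price", "good"],
]
classes = tolerance_classes(corpus, theta=2)
counts = smoothed_counts(["service", "bad"], classes)
```

In this toy corpus, "price" is absent from the document `["service", "bad"]` but frequently co-occurs with "service", so it receives the larger pseudo-count, whereas a uniform scheme would treat it the same as unrelated absent words such as "delivery". Because every term belongs to its own tolerance class, an absent word can never fall in the lower approximation, so under these assumed definitions absent words split into two regions: inside the upper approximation or outside it.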

Published in:

2011 International Conference on Business Management and Electronic Information (BMEI), Volume 5

Date of Conference:

13-15 May 2011