Topic discovery based on dual EM merging

3 Author(s)
Jianping Zeng (School of Computer Science, Fudan University, Shanghai, China); Jiangjiao Duan; Chengrong Wu

Given the enormous volume of text on the Internet, automatic topic discovery from large text corpora has become an important task for advanced intelligence information analysis, such as opinion recognition and Web user interest analysis. Although many topic mining methods have shown great success in topic-based analysis tasks, informatics analysis also calls for meaningful topic descriptions. To avoid describing a topic with words of differing granularity, a mechanism is proposed for separating the text corpus into two subsets with the same semantic topics. The EM algorithm is employed to infer topic models for the subsets, and a merging process is then devised to generate topic descriptions from the EM output. Experiments on the standard AP text corpus show that the proposed topic discovery method achieves better perplexity, which indicates better ability in predicting topics. Furthermore, a topic extraction test on a collection of news documents about the recent Expo 2010 Shanghai China shows that the descriptive keywords in the topics are more meaningful and reasonable than those produced by a traditional topic mining method.
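The abstract uses perplexity as its evaluation measure without defining it. As a rough illustration only, the sketch below computes the usual held-out perplexity of a generic mixture-of-topics model, exp(-log-likelihood / token count), where lower is better. It is not the authors' dual EM merging procedure; the function name `perplexity`, the `doc_topic` and `topic_word` structures, and the probability floor are assumptions made for this example.

```python
import math

def perplexity(docs, doc_topic, topic_word, floor=1e-12):
    """Perplexity of tokenized documents under a mixture-of-topics model.

    docs:       list of documents, each a list of word tokens
    doc_topic:  doc_topic[d][k] = P(topic k | document d)   (assumed given)
    topic_word: topic_word[k][w] = P(word w | topic k)      (assumed given)
    floor:      hypothetical guard so unseen words do not give log(0)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            # Marginalize over topics: P(w | d) = sum_k P(k | d) * P(w | k)
            p = sum(doc_topic[d][k] * topic_word[k].get(w, 0.0)
                    for k in range(len(topic_word)))
            log_likelihood += math.log(max(p, floor))
            n_tokens += 1
    # Perplexity is the exponentiated negative mean log-likelihood per token.
    return math.exp(-log_likelihood / n_tokens)

# Toy check: one topic that is uniform over four words.
uniform = [{"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}]
score = perplexity([["a", "b"]], [[1.0]], uniform)
```

In the toy check every token has probability 0.25, so the perplexity equals the vocabulary size of 4; a model that predicts the held-out words better would score lower.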

Published in:

Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on

Date of Conference:

10-12 July 2011