Scheduled System Maintenance:
Some services will be unavailable Sunday, March 29th through Monday, March 30th. We apologize for the inconvenience.
By Topic

Probabilistic Metrics for Soft-Clustering and Topic Model Validation

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Ramirez, E.H. ; Center for Intell. Syst., Tecnol. de Monterrey, Monterrey, Mexico ; Brena, R. ; Magatti, D. ; Stella, F.

In this paper the problem of performing external validation of the semantic coherence of topic models is considered. The Fowlkes-Mallows index, a known clustering validation metric, is generalized for the case of overlapping partitions and multi-labeled collections, thus making it suitable for validating topic modeling algorithms. In addition, we propose new probabilistic metrics inspired by the concepts of recall and precision. The proposed metrics also have clear probabilistic interpretations and can be applied to validate and compare other soft and overlapping clustering algorithms. The approach is exemplified by using the Reuters-21578 multi-labeled collection to validate LDA models, then using Monte Carlo simulations to show the convergence to the predicted results. Additional statistical evidence is provided to better understand the relation of the metrics presented.

Published in:

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on  (Volume:1 )

Date of Conference:

Aug. 31 2010-Sept. 3 2010