In this paper, we propose a new generalized Multivariate Probabilistic Modeling (MPM) approach that automatically extracts topics from a text collection and attaches them to an existing ontology. Specifically, we first use KeyConcept, a classification system that classifies documents into a set of predefined concepts. Then, by modeling the document clusters with MPM, we extract latent concepts and the corresponding sub-clusters from the document collection. We compare MPM with Probabilistic Latent Semantic Indexing (PLSI) and other clustering algorithms on CiteSeerX data sets. Experimental results show that MPM outperforms PLSI in time efficiency and provides better topic representations. Clustering analysis also shows the advantage of MPM over other clustering techniques in precision.