Topic discovery, as described here, determines the topic that a document or a document segment discusses. It is important for several natural language processing (NLP) applications, such as information retrieval and extraction, summarization, and topic analysis. This paper extracts topic words based on Shannon information, using latent Dirichlet allocation (LDA) to represent the word distribution. Parameter estimation is sped up by fast Gibbs sampling. With the help of background word clustering, words that do not appear in the analyzed document can still be inferred as topic words. Topics are represented as groups of words. Experimental results show that the proposed approach performs far better than the other methods compared.
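The abstract names three ingredients (LDA word distributions, Gibbs sampling, and a Shannon-information criterion) but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the paper's method. It uses plain collapsed Gibbs sampling rather than the paper's fast Gibbs sampler. It scores a word w for document d with the hypothetical criterion p(w|d) log2(p(w|d)/p(w)), where p(w|d) = Σ_k θ_dk φ_kw comes from LDA and p(w) is an add-one-smoothed corpus unigram distribution. The function names `lda_gibbs`, `background_distribution`, and `topic_word_scores` are illustrative. Because p(w|d) is built from the topic mixture, a word absent from the document can still receive a positive score. This loosely mirrors the abstract's point about inferring unseen topic words, although here it happens through LDA topics rather than through the paper's background word clustering.

```python
import math
import random


def lda_gibbs(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain collapsed Gibbs sampling for LDA (assumption: not the paper's fast variant).

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (theta, phi): per-document topic mixtures and per-topic word distributions.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)] for k in range(K)]
    return theta, phi


def background_distribution(docs, vocab_size):
    """Add-one-smoothed corpus unigram distribution p(w)."""
    counts = [0] * vocab_size
    for doc in docs:
        for w in doc:
            counts[w] += 1
    total = sum(counts)
    return [(c + 1) / (total + vocab_size) for c in counts]


def topic_word_scores(theta_d, phi, background):
    """Hypothetical Shannon-information score of every vocabulary word for one document.

    p(w|d) = sum_k theta_dk * phi_kw; the score p(w|d) * log2(p(w|d) / p(w)) is the
    word's term in the KL divergence from the background distribution. Words absent
    from the document still get p(w|d) > 0 through the topics.
    """
    scores = []
    for w, p_w in enumerate(background):
        p_wd = sum(theta_dk * phi[k][w] for k, theta_dk in enumerate(theta_d))
        scores.append(p_wd * math.log2(p_wd / p_w) if p_wd > 0 else 0.0)
    return scores


if __name__ == "__main__":
    # Toy corpus: words 0-2 and 3-5 form two disjoint groups; document 2 never contains word 2.
    docs = [[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 1, 0, 1], [3, 4, 5, 5]]
    theta, phi = lda_gibbs(docs, vocab_size=6, num_topics=2)
    bg = background_distribution(docs, 6)
    scores = topic_word_scores(theta[2], phi, bg)
    top = sorted(range(6), key=lambda w: scores[w], reverse=True)[:3]
    print("top candidate topic words for document 2:", top)
```

Summed over the whole vocabulary, these hypothetical scores equal the KL divergence between p(·|d) and the background, so they are non-negative in total. Taking the highest-scoring words as a group is one simple way to represent a topic as a word group. The paper's actual criterion and its fast sampler may differ.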
Published in:
Artificial Intelligence and Computational Intelligence, 2009. AICI '09. International Conference on (Volume 3)
Date of Conference: 7-8 Nov. 2009