Abstract:
In this paper, we develop a probabilistic approach using probabilistic latent semantic analysis (PLSA) for the discovery and analysis of contextual keyword relevance, based on the distribution of keywords across a training text corpus. We show experimentally that this approach is flexible in classifying keywords into different domains based on their context. We have developed a prototype system that projects keyword queries onto the loaded PLSA model and returns closely correlated keywords. The keyword query is vectorized using the PLSA model in the reduced aspect space, and correlation is derived by calculating a dot product. We also discuss the parameters that control PLSA performance, including (a) the number of aspects, (b) the number of EM iterations, and (c) the weighting functions applied to the term-document matrix (TDM) before training (pre-weighting). We estimate quality by computing precision-recall scores, and we present our experiments on applying PLSA to document classification.
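The pipeline the abstract describes (fit PLSA on a term-document matrix, project a keyword query into the aspect space, rank keywords by dot product) can be illustrated with a minimal sketch. This is not the authors' prototype: the function names `fit_plsa`, `fold_in`, and `related_keywords` are hypothetical, the query projection uses the standard PLSA fold-in procedure as an assumption, and keyword vectors are taken as row-normalized P(w|z), which assumes a uniform prior over aspects. Pre-weighting of the TDM is left to the caller.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def fit_plsa(tdm, n_aspects, n_iter, seed=0):
    """Fit asymmetric PLSA by EM on a (terms x docs) count matrix.

    Returns P(w|z) with shape (terms, aspects) and P(z|d) with shape
    (aspects, docs). n_aspects and n_iter are the two tuning parameters
    named in the abstract.
    """
    tdm = np.asarray(tdm, dtype=float)
    rng = np.random.default_rng(seed)
    n_terms, n_docs = tdm.shape
    p_w_z = rng.random((n_terms, n_aspects))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_aspects, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|w,d) is proportional to P(w|z) * P(z|d)
        post = p_w_z[:, :, None] * p_z_d[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + EPS
        # M-step: re-estimate both factors from expected counts
        weighted = tdm[:, None, :] * post
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True) + EPS
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True) + EPS
    return p_w_z, p_z_d


def fold_in(query_counts, p_w_z, n_iter=20):
    """Project a query (term-count vector) into aspect space.

    Assumption: standard fold-in, i.e. EM over P(z|q) with P(w|z) fixed.
    """
    query_counts = np.asarray(query_counts, dtype=float)
    n_aspects = p_w_z.shape[1]
    p_z_q = np.full(n_aspects, 1.0 / n_aspects)
    for _ in range(n_iter):
        post = p_w_z * p_z_q
        post /= post.sum(axis=1, keepdims=True) + EPS
        p_z_q = (query_counts[:, None] * post).sum(axis=0)
        p_z_q /= p_z_q.sum() + EPS
    return p_z_q


def related_keywords(query_counts, p_w_z, top_k=3):
    """Rank keywords by dot product with the query in aspect space."""
    # Assumption: keyword vectors are row-normalized P(w|z),
    # i.e. P(z|w) under a uniform prior over aspects.
    keyword_vecs = p_w_z / (p_w_z.sum(axis=1, keepdims=True) + EPS)
    scores = keyword_vecs @ fold_in(query_counts, p_w_z)
    return np.argsort(-scores)[:top_k], scores
```

A call such as `related_keywords(q, p_w_z)` returns the indices of the highest-scoring keywords together with the full score vector, where `q` holds the query's term counts in the same vocabulary order as the rows of the TDM.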
Published in: 2006 Annual IEEE India Conference
Date of Conference: 15-17 September 2006
Date Added to IEEE Xplore: 12 February 2007