To retrieve information about Chinese persons on the web accurately, and in particular to distinguish between people who share the same name, this paper proposes a clustering algorithm based on a latent semantic model. By building a library of central words for person attributes, it constructs for every document a latent semantic model in the form of a sentence-word matrix, based on central distance, central segment, document length, and other features. It then clusters similar documents using a dynamic-extending clustering algorithm. Experiments show that the algorithm clusters documents with high accuracy while maintaining the coherence of a person's semantic information and highlighting the importance of semantic information under different sequences.
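The abstract does not specify how the dynamic-extending clustering step works, so the following is only an illustrative sketch of the general idea of grouping documents about namesakes by similarity. It assumes a simple threshold-based incremental clustering over plain word-count vectors, which stands in for the paper's method and is not a reconstruction of it. The latent semantic weighting (central distance, central segment, document length) is omitted, and the names `cosine`, `incremental_cluster`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(docs, threshold=0.3):
    """Assign each tokenized document to the most similar existing cluster,
    or open a new cluster when no centroid reaches the threshold.

    Assumption: this threshold rule is a stand-in for the paper's
    dynamic-extending clustering, which the abstract does not describe.
    """
    centroids = []  # one accumulated word-count vector per cluster
    labels = []
    for tokens in docs:
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # extend the cluster with this document
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # start a new cluster
            labels.append(len(centroids) - 1)
    return labels
```

Under these assumptions, documents mentioning the same name but describing different people (for example, one set about a physicist and another about a singer) would fall into separate clusters because their word vectors share few terms.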
Published in: International Conference on Apperceiving Computing and Intelligence Analysis, 2009 (ICACIA 2009)
Date of Conference: 23-25 Oct. 2009