The aim of topic tracking is to monitor a stream of news stories and find additional stories on a topic that was identified from several sample stories. We propose a method, which we call the multi-vector model, that uses named entity recognition (NER) information to improve topic tracking. We extract proper names, locations and normal terms into distinct sub-vectors of the document representation. The similarity of two documents is measured by comparing their corresponding sub-vectors one pair at a time. Using the TDT4 corpus as the test corpus, we compare the performance of a topic tracking system based on the multi-vector model with that of a system based on the traditional vector space model. We also analyze how the number of features affects topic tracking performance. The experimental results show that the multi-vector model improves tracking performance.
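The abstract does not say how the per-sub-vector similarities are combined or how the tracking decision is made, so the following is only a minimal sketch under stated assumptions. It represents each document as three sparse term-frequency sub-vectors (the field names `person`, `location` and `term` are hypothetical), compares corresponding sub-vectors with cosine similarity, and combines the results with hypothetical fixed weights. The `is_on_topic` rule (mean similarity to the sample stories against a threshold) is likewise an assumed decision rule, not the paper's.

```python
import math
from collections import Counter

# Hypothetical sub-vector names: one each for proper names, locations and
# normal terms. The exact NER categories used in the paper are an assumption.
FIELDS = ("person", "location", "term")

# Hypothetical weights for combining per-field similarities; the abstract
# does not specify the paper's actual combination scheme.
WEIGHTS = {"person": 0.3, "location": 0.2, "term": 0.5}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def multi_vector_similarity(doc_a: dict, doc_b: dict, weights=WEIGHTS) -> float:
    """Compare corresponding sub-vectors one pair at a time, then combine."""
    return sum(
        weights[f] * cosine(doc_a.get(f, Counter()), doc_b.get(f, Counter()))
        for f in FIELDS
    )


def is_on_topic(story: dict, samples: list, threshold: float = 0.2) -> bool:
    """Hypothetical tracking rule: mean similarity to the sample stories."""
    score = sum(multi_vector_similarity(story, s) for s in samples) / len(samples)
    return score >= threshold
```

A traditional vector space model would merge all three fields into a single vector, so a shared location could be outweighed by unrelated normal terms; keeping the sub-vectors separate lets each kind of evidence contribute on its own scale.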
Published in:
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on (Volume 3)
Date of Conference: 26-28 Nov. 2008