This paper uses the Vector Space Model to represent topics and focuses on the construction of that model. Based on an analysis of the characteristics of English news stories, the paper proposes two methods to improve topic representation. First, we propose a news-story-oriented feature extraction algorithm that combines word analysis with the location characteristics of news stories. The basic idea of word analysis is to divide words into capitalized words and common words. The location characteristic determines the importance of a word from its position in the story, following the inverted-pyramid structure of news stories. Second, we present a new method for computing feature weights based on the fusion of several feature extraction methods. This method assigns a larger weight to a feature that is selected by more feature extraction algorithms. Experimental results indicate that the two proposed methods perform well.
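The two ideas can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract gives no formulas, so the function names, the capitalization boost `cap_boost`, the per-sentence `decay` factor, and the vote-fraction weight are all assumptions chosen only to make the ideas concrete.

```python
import re
from collections import Counter, defaultdict


def location_weighted_terms(sentences, cap_boost=2.0, decay=0.8):
    """Score terms by sentence position and capitalization.

    Hypothetical scoring: cap_boost and decay are illustrative values,
    not parameters taken from the paper.
    """
    scores = defaultdict(float)
    for idx, sentence in enumerate(sentences):
        # Inverted-pyramid assumption: earlier sentences matter more.
        position_weight = decay ** idx
        tokens = re.findall(r"[A-Za-z]+", sentence)
        for pos, token in enumerate(tokens):
            # A capitalized token that does not start the sentence is
            # treated as a "capitalized word"; everything else is common.
            is_capitalized = token[0].isupper() and pos > 0
            boost = cap_boost if is_capitalized else 1.0
            scores[token.lower()] += position_weight * boost
    return dict(scores)


def fuse_by_votes(selections):
    """Weight each feature by the fraction of extractors that selected it.

    Assumed fusion rule: a simple vote count, normalized to [0, 1].
    """
    votes = Counter()
    for selected in selections:
        votes.update(set(selected))
    total = len(selections)
    return {feature: count / total for feature, count in votes.items()}
```

With `location_weighted_terms(["Floods hit Paris today", "Officials in Paris respond"])`, the term `paris` outscores `floods` because it is capitalized mid-sentence and appears in both sentences. With `fuse_by_votes`, a feature chosen by every extractor receives weight 1.0, while one chosen by a single extractor out of three receives about 0.33.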