Text representation is a preliminary step in text filtering, and the vector space model (VSM) is the most commonly used method for it. However, the document feature set produced by VSM usually has very high dimensionality, and as a result the distribution of feature values tends to be highly skewed. This paper presents new mechanisms to mitigate these problems. With these mechanisms, document features are extracted from smaller feature windows, such as sentences, graphs and blocks, rather than from the full text, and the correlated texts are finally evaluated by local similarity. The feature windows are obtained by analyzing the document's linguistic structure. As a result, the approach can markedly improve the precision of text filtering.
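The idea of scoring a document by local rather than full-text similarity can be illustrated with a minimal Python sketch. It is not the paper's implementation: the choice of sentences as the only window type, raw term counts as features, cosine as the similarity measure, taking the maximum over windows, and all function names (`term_vector`, `sentence_windows`, `local_similarity`, `global_similarity`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts (assumed feature weighting: raw frequency)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_windows(document):
    """Split a document into sentence-level windows (a naive splitter)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def local_similarity(document, profile):
    """Assumed local score: best cosine match of any single window."""
    profile_vec = term_vector(profile)
    return max(
        (cosine(term_vector(w), profile_vec) for w in sentence_windows(document)),
        default=0.0,
    )


def global_similarity(document, profile):
    """Baseline: cosine over the full-text vector, as in plain VSM."""
    return cosine(term_vector(document), term_vector(profile))


doc = (
    "The weather was pleasant all week. "
    "Text filtering uses a vector space model. "
    "We went hiking on Sunday."
)
profile = "vector space model text filtering"

print(local_similarity(doc, profile))
print(global_similarity(doc, profile))
```

In this toy case the one on-topic sentence matches the profile closely, while the full-text vector is diluted by the unrelated sentences, so the local score is higher than the full-text score.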