In this paper we propose a statistical approach to hot topic detection in Chinese web forums. To address the fundamental obstacles of Chinese web data mining, such as new words, nonstandard syntax, and Chinese word segmentation, we present the longest common segmented consecutive subsequence (LCSCS) and other techniques. The algorithm can run even without prior knowledge. Our experiments show satisfactory results in both performance and quality.