Skip to Main Content
Clustering technology is the core technology of text mining. Through text clustering, a large number of text messages can be divided into several meaningful classes or clusters. According to the features of Chinese documents, this paper designs and implements the Chinese Text Clustering System to perform automatic clustering of Chinese documents. Firstly, this system will carry out Chinese word automatic segmentation for the input Chinese document sets by using reverse maximum matching method. Secondly, further text preprocessing is performed. Finally the K-means clustering algorithm is used to obtain the clustering results. The prototype system can also be used in clustering Chinese Web pages to search for user's interest model by search engines, which will improve the efficiency of searching the target content.