Chinese feature extraction is indispensable in Chinese natural language processing because it benefits knowledge discovery and information retrieval over Chinese text. Chinese word segmentation is a precondition for feature extraction. To overcome the disadvantages of current Chinese segmentation methods, such as lexicon-based schemes, syntax- and rule-based schemes, statistics-based schemes, and methods that integrate these schemes, a maximum matching and frequency statistics (MMFS) segmentation method based on length-descending matching and string frequency statistics was put forward. To extract shorter words and phrases contained in longer ones, a novel Chinese hierarchical feature extraction method based on MMFS and iterative learning was proposed. This method obtains hierarchical features according to morphology and requires no lexicon, no prior estimation of the probabilities between words, and no Chinese character index. Experimental results confirmed the efficiency of this statistical method in extracting Chinese hierarchical features.
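The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way the length-descending, frequency-based idea might be realized. It is not the authors' implementation. The function names `mmfs_pass` and `hierarchy`, the parameters `max_len` and `min_freq`, the masking placeholder `SEP`, and the reading of "iterative learning" as re-running the pass on the strings already extracted are all assumptions made for illustration.

```python
from collections import Counter

SEP = "\0"  # hypothetical placeholder for positions already claimed by a longer string


def mmfs_pass(texts, max_len, min_freq=2):
    """One length-descending pass (assumed design): return {string: frequency}."""
    texts = list(texts)
    found = {}
    for n in range(max_len, 1, -1):  # longest candidates first, down to length 2
        counts = Counter(
            t[i:i + n]
            for t in texts
            for i in range(len(t) - n + 1)
            if SEP not in t[i:i + n]  # skip windows that touch a masked region
        )
        keep = [g for g, c in counts.items() if c >= min_freq]
        for g in keep:
            found[g] = counts[g]
        # Mask accepted strings so their substrings are not counted again in this pass.
        for g in keep:
            texts = [t.replace(g, SEP) for t in texts]
    return found


def hierarchy(texts, max_len=4, min_freq=2):
    """Assumed iteration: re-run the pass on the strings found at the previous level."""
    levels = []
    current = list(texts)
    while max_len >= 2:
        found = mmfs_pass(current, max_len, min_freq)
        if not found:
            break
        levels.append(found)
        current = list(found)  # the next level looks inside the extracted strings
        max_len = max(len(s) for s in found) - 1  # only strictly shorter strings
    return levels


if __name__ == "__main__":
    corpus = ["信息检索很重要", "信息检索的方法", "信息抽取很有用", "信息抽取的应用"]
    for depth, level in enumerate(hierarchy(corpus), start=1):
        print(depth, level)
    # 1 {'信息检索': 2, '信息抽取': 2}
    # 2 {'信息': 2}
```

In this sketch, frequencies at the second and later levels count occurrences among the previously extracted strings, not in the original corpus. A closer reproduction of the paper would need the thresholds and stopping rule from the full text.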