Automatic Segmentation of Hierarchy Feature without Lexicon for Chinese Text Based on Iterative Learning

Authors: Shaohua Jiang (Sch. of Civil & Hydraulic Eng., Dalian Univ. of Technol., Dalian); Yanzhong Dang

Feature extraction is indispensable in Chinese natural language processing because it supports knowledge discovery and information retrieval over Chinese text, and segmentation is a precondition for feature extraction. To overcome the disadvantages of current Chinese segmentation methods (lexicon-based schemes, syntax- and rule-based schemes, statistics-based schemes, and combinations of these), a maximum matching and frequency statistics (MMFS) segmentation method based on descending string length and string frequency statistics is put forward. To extract the shorter words and phrases contained in longer ones, a novel Chinese hierarchy feature extraction method based on MMFS and iterative learning is proposed. The method obtains hierarchy features from morphology alone: it needs no lexicon, no word-to-word probabilities acquired in advance, and no Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy features.
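
The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of lexicon-free, length-descending frequency counting; it is not the authors' MMFS method. The function names, the thresholds, and in particular the rule that a shorter string is kept only when it occurs more often than every longer kept string containing it are assumptions made for illustration.

```python
from collections import Counter


def count_substrings(texts, n):
    """Count every substring of length n across all texts (overlaps included)."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def extract_features(texts, max_len=4, min_len=2, min_freq=2):
    """Collect repeated strings, longest first, without any lexicon.

    Assumed rule (not from the paper): a shorter string is kept only if it
    occurs strictly more often than every longer kept string containing it,
    so fragments that merely echo a longer string are dropped.
    """
    features = {}
    for n in range(max_len, min_len - 1, -1):  # descending length
        for s, c in count_substrings(texts, n).items():
            if c < min_freq:
                continue
            if all(c > fc for f, fc in features.items() if s in f):
                features[s] = c
    return features


def nest(features):
    """Map each feature to the shorter features contained in it."""
    return {f: [s for s in features if s != f and s in f] for f in features}


texts = ["信息检索系统", "信息检索方法", "信息抽取系统"]
features = extract_features(texts)
print(features)        # {'信息检索': 2, '信息': 3, '系统': 2}
print(nest(features))  # {'信息检索': ['信息'], '信息': [], '系统': []}
```

In this toy corpus the longer string 信息检索 is found first, and the shorter 信息 survives as a nested feature because it also appears outside the longer string. A real corpus would need larger frequency thresholds and a longer maximum length.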

Published in:

Computer Science and Software Engineering, 2008 International Conference on (Volume: 1)

Date of Conference:

12-14 Dec. 2008