
A comparison study of candidate generation for Chinese word segmentation

2 Author(s)
Kaixu Zhang (State Key Lab. of Intell. Technol. & Syst., Tsinghua Univ., Beijing, China); Maosong Sun

Chinese word segmentation (CWS) can be implemented in a coarse-to-fine schema. In such a schema, the output of a coarse-grained CWS model is a candidate set containing multiple segmentations of a sentence rather than a single segmentation. A more sophisticated CWS model, or a model for a downstream task, then reconsiders all the segmentations in the candidate set to determine the best one. This paper discusses and compares three candidate generation methods in a unified form: the boundary level method, the word level method and the sentence level method. The oracle F1-measures of the candidate sets produced by these methods were compared, and their performance was also compared on a joint CWS and POS-tagging task. The results showed that the word level method performs best among the three candidate generation methods. The results also showed that the coarse-to-fine schema outperforms both the pipeline schema, in which only one segmentation is passed to the downstream task, and the joint schema, in which all possible segmentations are considered by the downstream task. Moreover, the speed of the coarse-to-fine schema is close to that of the pipeline schema and much higher than that of the joint schema.
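
To make the notions of a candidate set and its oracle F1-measure concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names, the thresholds `low` and `high`, and the idea of enumerating both options for every uncertain boundary are assumptions made for illustration only; the abstract does not specify how the boundary level method selects its candidates. The oracle F1 is taken here to be the best word-level F1 that any segmentation in the candidate set achieves against the gold segmentation.

```python
from itertools import product


def words_to_spans(words):
    """Convert a segmentation (list of words) into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def f1(candidate, gold):
    """Word-level F1 of one candidate segmentation against the gold segmentation."""
    cand, ref = words_to_spans(candidate), words_to_spans(gold)
    correct = len(cand & ref)
    if correct == 0:
        return 0.0
    precision = correct / len(cand)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


def oracle_f1(candidates, gold):
    """Best F1 reachable by any segmentation in the candidate set."""
    return max(f1(c, gold) for c in candidates)


def boundary_level_candidates(sentence, boundary_probs, low=0.3, high=0.7):
    """Hypothetical boundary-level generator (an assumption, not the paper's method).

    boundary_probs[i] is a coarse model's probability that a word boundary lies
    between sentence[i] and sentence[i + 1]. Confident decisions are fixed;
    uncertain ones (low < p < high) are kept open, and every combination of the
    open decisions yields one candidate segmentation.
    """
    options = []
    for p in boundary_probs:
        if p >= high:
            options.append((True,))
        elif p <= low:
            options.append((False,))
        else:
            options.append((True, False))
    candidates = []
    for choice in product(*options):
        words, start = [], 0
        for i, cut in enumerate(choice):
            if cut:
                words.append(sentence[start:i + 1])
                start = i + 1
        words.append(sentence[start:])
        candidates.append(words)
    return candidates


if __name__ == "__main__":
    # Made-up probabilities for a four-character sentence; only the middle
    # boundary is uncertain, so two candidates are produced.
    cands = boundary_level_candidates("我爱北京", [0.9, 0.5, 0.1])
    print(cands)
    print(oracle_f1(cands, ["我", "爱", "北京"]))
```

A candidate set is useful to a fine-grained model only if it is small and still contains a segmentation close to the gold one, which is what the oracle F1 measures in this sketch.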

Published in:

Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on

Date of Conference:

27-29 Nov. 2011
