An approach to improving the quality of part-of-speech tagging of Chinese text

2 Author(s)
Yi-li Qian ; Dept. of Comput. Sci., Shanxi Univ., Taiyuan, China ; Jia-heng Zheng

Disambiguating multicategory words (words that can take more than one part-of-speech tag) is one of the main difficulties in part-of-speech tagging, and it strongly affects the quality of processed corpora. To address this problem, we describe an approach that automatically corrects the part-of-speech tags assigned to multicategory words. The approach uses rough set theory and data mining to acquire correction rules from correctly tagged corpora, and then applies these rules to correct the tags of multicategory words in a corpus automatically. In closed and open tests on a corpus of 500,000 Chinese characters, tagging accuracy increased by 11.32% and 5.97%, respectively.
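
The two stages described above (acquire correction rules from a correctly tagged corpus, then apply them) can be illustrated with a minimal sketch. This is not the authors' rough-set procedure, which the abstract does not detail; it substitutes a simple frequency-based rule miner with support and confidence thresholds. The function names `learn_rules` and `apply_rules`, the context features (the neighbouring tags), the thresholds, the tag labels, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def context(sentence, i):
    """Return the automatic tags to the left and right of position i."""
    prev_tag = sentence[i - 1][1] if i > 0 else START
    next_tag = sentence[i + 1][1] if i + 1 < len(sentence) else END
    return prev_tag, next_tag


def learn_rules(gold, tagged, min_support=2, min_confidence=0.8):
    """Mine rules (word, auto_tag, prev_tag, next_tag) -> corrected tag.

    `gold` and `tagged` are parallel lists of sentences; each sentence is a
    list of (word, tag) pairs. A rule is kept only if the corrected tag
    differs from the automatic tag and is seen often and consistently enough.
    """
    counts = defaultdict(Counter)
    for gold_sent, auto_sent in zip(gold, tagged):
        for i, ((word, gold_tag), (_, auto_tag)) in enumerate(zip(gold_sent, auto_sent)):
            prev_tag, next_tag = context(auto_sent, i)
            counts[(word, auto_tag, prev_tag, next_tag)][gold_tag] += 1

    rules = {}
    for condition, tag_counts in counts.items():
        best_tag, best_count = tag_counts.most_common(1)[0]
        confidence = best_count / sum(tag_counts.values())
        if (best_tag != condition[1]
                and best_count >= min_support
                and confidence >= min_confidence):
            rules[condition] = best_tag
    return rules


def apply_rules(sentence, rules):
    """Correct one automatically tagged sentence using the learned rules."""
    corrected = []
    for i, (word, tag) in enumerate(sentence):
        prev_tag, next_tag = context(sentence, i)
        corrected.append((word, rules.get((word, tag, prev_tag, next_tag), tag)))
    return corrected


# Toy data, invented for illustration: the word 研究 can be a noun (n) or a verb (v).
gold = [[("我们", "r"), ("研究", "v"), ("语言", "n")]] * 2
tagged = [[("我们", "r"), ("研究", "n"), ("语言", "n")]] * 2

rules = learn_rules(gold, tagged)
corrected = apply_rules(tagged[0], rules)
```

In this sketch the context is always read from the automatic tags, both when learning and when applying, so that a rule fires under the same conditions in which it was observed. A closed test would apply the rules to the corpus they were learned from, and an open test to held-out text.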

Published in:

Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on (Volume: 2)

Date of Conference:

5-7 April 2004