A comparison of part of speech taggers in the task of changing to a new domain

6 Author(s)
L. Boggess; Department of Computer Science, Mississippi State University, MS, USA; J. S. Hamaker; R. Duncan; L. Klimek

Part-of-speech tagging in real-world applications is performed on text from domains that differ from those of the large, publicly available training data sets. The two most successful part-of-speech taggers, the Maximum Entropy Part-of-Speech Tagger (MXPOST) and the Transformation-Based Learning Tagger, are trained on the Wall Street Journal corpus, which contains millions of words. Both taggers are well known and widely used in language research and development systems. We compare their performance on a test set from a different domain, astronomy, drawn from documents available on the World Wide Web. The two taggers were tested in several modes: (1) after training on the Wall Street Journal corpus only; (2) after training on only a small body of text from our astronomy domain; (3) with and without an auxiliary lexicon derived from many astronomy-related Web documents; and (4) after incremental training, that is, after training on the Wall Street Journal followed by additional training on text from the astronomy domain. One conclusion from the experiment is that different taggers exhibit different biases when trained on the same data.
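The four training modes can be made concrete with a small evaluation harness. The sketch below is hypothetical and illustrative only: it substitutes a simple most-frequent-tag (unigram) baseline for MXPOST and the Transformation-Based Learning Tagger, assumes token-level accuracy as the metric (the abstract does not name one), and uses a few invented toy sentences in place of the Wall Street Journal and astronomy corpora, so its numbers say nothing about the paper's results. The function names `train_counts`, `tag`, and `accuracy`, the default tag `NN`, and the dictionary format of the lexicon are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical baseline: tag each word with the tag it received most often in training.
# This is NOT MXPOST or the Transformation-Based Learning Tagger.


def train_counts(tagged_sents, counts=None):
    """Accumulate lowercase word -> Counter(tag) frequencies.

    Passing an existing table continues training on it in place, which is how
    this sketch models incremental training (source corpus, then domain text).
    """
    if counts is None:
        counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, pos in sent:
            counts[word.lower()][pos] += 1
    return counts


def tag(words, counts, lexicon=None, default="NN"):
    """Tag words: training counts first, then the auxiliary lexicon, then a default tag."""
    tags = []
    for word in words:
        key = word.lower()
        if key in counts:
            tags.append(counts[key].most_common(1)[0][0])
        elif lexicon is not None and key in lexicon:
            tags.append(lexicon[key])
        else:
            tags.append(default)
    return tags


def accuracy(tagged_sents, counts, lexicon=None):
    """Token-level accuracy of the baseline tagger against gold tags."""
    correct = total = 0
    for sent in tagged_sents:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        predicted = tag(words, counts, lexicon)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Invented toy data standing in for the real corpora.
wsj = [
    [("The", "DT"), ("market", "NN"), ("rose", "VBD"), (".", ".")],
    [("Shares", "NNS"), ("fell", "VBD"), ("sharply", "RB"), (".", ".")],
]
astro = [[("The", "DT"), ("galaxy", "NN"), ("emits", "VBZ"), ("radiation", "NN"), (".", ".")]]
test_set = [[("The", "DT"), ("quasar", "NN"), ("emits", "VBZ"), ("jets", "NNS"), (".", ".")]]
lexicon = {"jets": "NNS"}  # stand-in for a lexicon harvested from domain Web pages

results = {
    "(1) WSJ only": accuracy(test_set, train_counts(wsj)),
    "(2) astronomy only": accuracy(test_set, train_counts(astro)),
    "(3) WSJ + lexicon": accuracy(test_set, train_counts(wsj), lexicon),
    "(4) WSJ then astronomy + lexicon": accuracy(
        test_set, train_counts(astro, train_counts(wsj)), lexicon
    ),
}
for mode, acc in results.items():
    print(f"{mode}: {acc:.2f}")
# prints:
# (1) WSJ only: 0.60
# (2) astronomy only: 0.80
# (3) WSJ + lexicon: 0.80
# (4) WSJ then astronomy + lexicon: 1.00
```

Passing an existing count table to `train_counts` is one simple way to model incremental training. Falling back from training counts to the auxiliary lexicon and then to a default tag is a design choice of this sketch, not a description of how either tagger handles unknown words.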

Published in:

Proceedings of the 1999 International Conference on Information Intelligence and Systems

Date of Conference: