Evolved Apache Lucene SpanFirst queries are good text classifiers

Author:
Hirsch, L.; Dept. of Comput., Sheffield Hallam Univ., Sheffield, UK

Human-readable text classifiers have a number of advantages over classifiers based on complex and opaque mathematical models. Search queries or rules, constructed either manually or automatically, have long been used for classification. We have performed experiments using genetic algorithms to evolve text classifiers in search-query format, with the combined objectives of classifier accuracy and classifier readability. We found that a small set of Lucene SpanFirst queries combined disjunctively (with OR) effectively meets both goals. A query of this kind evaluates to true for a document if a particular word occurs within the first N words of that document. Previously researched classifiers based on queries that combine words with OR, AND and NOT were found to be generally less accurate and (arguably) less readable. The approach is evaluated on the standard test sets Reuters-21578 and Ohsumed and compared against several classification algorithms.
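To make the matching rule concrete, here is a minimal plain-Python sketch of how a disjunction of SpanFirst-style clauses could classify a single document. It is an illustration under stated assumptions, not the author's implementation and not the Lucene API: Lucene's own SpanFirstQuery works on indexed term positions produced by an analyzer, whereas this sketch assumes a simple hypothetical lowercase tokenizer. The names `tokenize`, `span_first`, `classify` and `grain_rules`, and the example words and limits, are hypothetical.

```python
import re

# A clause is (word, n): "word occurs within the first n words of the document".
Clause = tuple[str, int]


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then split on runs of non-alphanumeric chars.
    # A real Lucene analyzer (stemming, stop words) could produce different tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def span_first(word: str, n: int, tokens: list[str]) -> bool:
    # True if `word` appears at some 0-based position p with p < n,
    # i.e. among the first n tokens of the document.
    return word in tokens[:n]


def classify(text: str, clauses: list[Clause]) -> bool:
    # Disjunction (OR) of SpanFirst-style clauses: the document is assigned
    # to the category if any single clause matches.
    tokens = tokenize(text)
    return any(span_first(word, n, tokens) for word, n in clauses)


# Hypothetical classifier for a "grain" category; words and limits are illustrative only.
grain_rules: list[Clause] = [("wheat", 5), ("grain", 5)]

print(classify("Wheat exports rose sharply this quarter.", grain_rules))  # True
print(classify("Shares fell today as analysts cut forecasts for wheat", grain_rules))  # False
```

The second document mentions "wheat" only as its ninth word, outside the five-word window, so no clause fires. A rule set of this shape stays readable because each clause is just a word and a position limit; in the paper's setting, the genetic algorithm would be responsible for choosing those words and limits.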

Published in: 2010 IEEE Congress on Evolutionary Computation (CEC)

Date of Conference: 18-23 July 2010