This paper introduces methods to discriminatively learn phrase patterns for use as features in text classification. An efficient solution is described using a recursive algorithm with a mutual information selection criterion. The algorithm automatically determines when word classes are useful in specific locations of a phrase pattern, allowing for variable specificity depending on the amount of labeled data available. Experiments are carried out on three text classification tasks in both English and Chinese, resulting in improved performance when adding the phrase patterns to the existing n-gram features.
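To make the selection criterion concrete, the following is a minimal sketch of scoring candidate features by their mutual information with the class label. It is an illustration under stated assumptions, not the paper's recursive algorithm: each candidate is reduced to a binary indicator (pattern present or absent in a document), and the names `mutual_information` and `rank_by_mi` are hypothetical helpers introduced here.

```python
import math
from collections import Counter


def mutual_information(feature_present, labels):
    """Mutual information (in bits) between a binary feature indicator
    and the class labels, estimated from empirical counts.

    Assumption: one indicator value and one label per document.
    """
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    f_counts = Counter(feature_present)
    c_counts = Counter(labels)

    mi = 0.0
    for (f, c), n_fc in joint.items():
        p_fc = n_fc / n
        p_f = f_counts[f] / n
        p_c = c_counts[c] / n
        # Only observed (f, c) pairs are visited, so p_fc > 0 here.
        mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi


def rank_by_mi(candidates, labels):
    """Return candidate names sorted from most to least informative.

    `candidates` maps a (hypothetical) pattern name to its indicator list.
    """
    scores = {name: mutual_information(ind, labels)
              for name, ind in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    labels = ["pos", "pos", "neg", "neg"]
    candidates = {
        "predictive": [1, 1, 0, 0],  # fires only on "pos" documents
        "noise": [1, 0, 1, 0],       # independent of the label
    }
    print(rank_by_mi(candidates, labels))
```

A feature that perfectly separates two balanced classes scores 1 bit, while a feature independent of the label scores 0, so ranking by this score favors patterns that discriminate between classes.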