The authors present a statistical-heuristic feature selection criterion for constructing multibranching decision trees in noisy real-world domains. Real-world problems often have multivalued features. For these problems, multibranching decision trees provide a more efficient and more comprehensible solution than binary decision trees. The authors propose a statistical-heuristic criterion, the symmetrical τ, and then discuss its consistency with a Bayesian classifier and its built-in statistical test. The combination of a proportional-reduction-in-error measure and a cost-of-complexity heuristic makes the symmetrical τ a powerful criterion with many merits, including robustness to noise, fairness to multivalued features, the ability to handle a Boolean combination of logical features, and middle-cut preference. The τ criterion also provides a natural basis for prepruning and dynamic error estimation. Illustrative examples are also presented.
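The abstract does not state the formula for the criterion. As a rough illustration only, the sketch below assumes the symmetrical τ takes the form of the symmetric Goodman-Kruskal τ, a proportional-reduction-in-error measure computed from a feature-by-class contingency table. The function name `symmetric_tau` and the example tables are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def symmetric_tau(table: Sequence[Sequence[float]]) -> float:
    """Symmetric Goodman-Kruskal tau for a contingency table of counts.

    Rows are feature values and columns are classes. This form is an
    assumption; the abstract does not give the paper's exact definition.
    """
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("table must contain at least one observation")

    # Joint probabilities and their row/column marginals.
    p = [[count / total for count in row] for row in table]
    row_marg = [sum(row) for row in p]
    col_marg = [sum(col) for col in zip(*p)]

    sum_row_sq = sum(r * r for r in row_marg)
    sum_col_sq = sum(c * c for c in col_marg)

    # Sum of P(ij)^2 / P(+j) and of P(ij)^2 / P(i+), skipping empty marginals.
    by_col = sum(
        p[i][j] ** 2 / col_marg[j]
        for i in range(len(p))
        for j in range(len(col_marg))
        if col_marg[j] > 0
    )
    by_row = sum(
        p[i][j] ** 2 / row_marg[i]
        for i in range(len(p))
        for j in range(len(col_marg))
        if row_marg[i] > 0
    )

    denom = 2.0 - sum_row_sq - sum_col_sq
    if abs(denom) < 1e-12:
        # Degenerate table (a single non-empty cell): no error left to reduce.
        return 0.0
    return (by_col + by_row - sum_row_sq - sum_col_sq) / denom


# Hypothetical candidate features, each summarised as a feature-by-class table.
candidates = {
    "perfect": [[10, 0], [0, 10]],
    "noisy": [[8, 2], [3, 7]],
    "useless": [[5, 5], [5, 5]],
}
best_feature = max(candidates, key=lambda name: symmetric_tau(candidates[name]))
```

Under this assumed form, a feature that separates the classes perfectly scores 1, a feature independent of the class scores 0, and a tree builder would split on the candidate with the largest value.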
Published in: Pattern Analysis and Machine Intelligence, IEEE Transactions on (Volume: 13, Issue: 8)
Date of Publication: Aug 1991