Abstract:
Continuous attributes are hard to handle and require special treatment in decision tree induction algorithms. In this paper, we present RCAT, a multisplitting algorithm for continuous attributes based on statistical information. To calculate the information gain of a continuous attribute, RCAT first splits the attribute's value range into a number of initial intervals and computes a probability estimate for every class in each interval. It then finds the best threshold in the probability space, uses this threshold to separate the initial intervals into two sets, combines adjacent intervals that fall in the same set, optimizes the boundary of every combined interval, and finally obtains the information gain of the continuous attribute. We also provide a pruning method to simplify the decision trees. Empirical results show that RCAT produces decision trees with much higher intelligibility than C4.5 while retaining accuracy.
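The interval-merging idea described above can be illustrated with a minimal sketch. It is not the RCAT implementation: the abstract does not specify how the initial intervals are formed or how the best threshold is scored, so the sketch assumes equal-width intervals, a two-class problem identified by a hypothetical `positive` label, and a threshold chosen to maximise the information gain of the merged partition. The boundary-optimization and pruning steps are omitted, and all function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition_gain(groups):
    """Information gain from splitting the pooled labels into `groups`."""
    pooled = [y for g in groups for y in g]
    remainder = sum(len(g) / len(pooled) * entropy(g) for g in groups)
    return entropy(pooled) - remainder


def multisplit_sketch(values, labels, positive, n_bins=4):
    """Return (information gain, merged interval spans as bin-index pairs)."""
    # Step 1 (assumption): equal-width initial intervals over the value range.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for v, y in zip(values, labels):
        bins[min(int((v - lo) / width), n_bins - 1)].append(y)

    # Step 2: estimate P(positive) in each non-empty interval.
    occupied = [(i, b) for i, b in enumerate(bins) if b]
    probs = [sum(y == positive for y in b) / len(b) for _, b in occupied]

    best_gain = 0.0
    best_spans = [(occupied[0][0], occupied[-1][0])]  # fallback: no split
    # Step 3 (assumption): try each distinct probability as the threshold
    # and keep the one whose merged partition has the highest gain.
    for t in sorted(set(probs))[:-1]:
        spans, groups, prev_side = [], [], None
        for (i, b), p in zip(occupied, probs):
            side = p > t  # Step 4: which of the two sets the interval joins
            if side == prev_side:
                # Step 5: adjacent intervals in the same set are combined.
                spans[-1] = (spans[-1][0], i)
                groups[-1] = groups[-1] + b
            else:
                spans.append((i, i))
                groups.append(list(b))
            prev_side = side
        gain = partition_gain(groups)
        if gain > best_gain:
            best_gain, best_spans = gain, spans
    return best_gain, best_spans
```

Because adjacent intervals are merged only when they fall on the same side of the threshold, a class distribution that alternates along the attribute yields more than two final intervals, which is what makes the split a multisplit rather than a binary one.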
Date of Conference: 04-05 November 2002
Date Added to IEEE Xplore: 19 February 2003
Print ISBN: 0-7803-7508-4