A shortcoming of univariate decision tree learners is that they do not learn intermediate concepts: the branching decision at each internal tree node tests only one of the input features. It has been empirically demonstrated that cascading other classification methods, which do learn intermediate concepts, with decision tree learners can alleviate this representational bias and potentially improve classification performance. However, a more complex model that fits the training data better does not necessarily perform better on unseen data, a problem commonly referred to as overfitting. To find the most appropriate degree of such cascade generalization, a decision forest (i.e., a set of decision trees with other classification models cascaded to different degrees) needs to be generated, from which the best decision tree can then be identified. In this paper, the authors propose an efficient algorithm for generating such decision forests. The algorithm uses an extended decision tree data structure and constructs any node that is common to multiple decision trees only once. The authors have empirically evaluated the algorithm on 32 classification data sets from the University of California, Irvine (UCI) machine learning repository and report results demonstrating its efficiency.
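The idea of constructing a node shared by several trees only once can be illustrated with a minimal Python sketch. It is not the authors' algorithm or data structure; it only assumes that a subtree is fully determined by the training rows reaching it, its depth, and the number of cascaded levels still remaining, and memoizes nodes on that key. The names `SharedForestBuilder`, `cascade_depth`, and `requests` are hypothetical, the sum of the inputs is a stand-in for the output of a cascaded classifier, and the split criterion (fewest misclassified rows) is chosen for brevity.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


class Node:
    """A tree node; it is a leaf when `feature` is None."""

    def __init__(self, label: int) -> None:
        self.label = label
        self.feature: Optional[int] = None
        self.threshold: Optional[float] = None
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class SharedForestBuilder:
    """Hypothetical builder that memoizes nodes across the trees of a forest."""

    def __init__(self, X: List[Tuple[float, ...]], y: List[int], max_depth: int) -> None:
        self.X, self.y, self.max_depth = X, y, max_depth
        self.cache: Dict[Tuple[FrozenSet[int], int, int], Node] = {}
        self.requests = 0  # number of node requests, including cache hits

    def _features(self, i: int, cascaded: bool) -> Tuple[float, ...]:
        # Assumption: the sum of the inputs stands in for a cascaded model's output.
        x = self.X[i]
        return x + (sum(x),) if cascaded else x

    def _majority(self, idx: FrozenSet[int]) -> int:
        labels = [self.y[i] for i in idx]
        return max(sorted(set(labels)), key=labels.count)

    def _errors(self, idx: FrozenSet[int]) -> int:
        labels = [self.y[i] for i in idx]
        return len(labels) - max(labels.count(v) for v in set(labels))

    def build(self, idx: FrozenSet[int], depth: int, cascade_depth: int) -> Node:
        self.requests += 1
        cascade_left = max(cascade_depth - depth, 0)
        key = (idx, depth, cascade_left)
        if key in self.cache:  # node already built for another tree
            return self.cache[key]
        node = Node(self._majority(idx))
        if depth < self.max_depth and self._errors(idx) > 0:
            feats = {i: self._features(i, cascade_left > 0) for i in idx}
            n_feat = len(next(iter(feats.values())))
            best = None
            for f in range(n_feat):
                values = sorted({feats[i][f] for i in idx})
                for t in values[:-1]:  # both sides of the split stay non-empty
                    left = frozenset(i for i in idx if feats[i][f] <= t)
                    right = idx - left
                    err = self._errors(left) + self._errors(right)
                    if best is None or err < best[0]:
                        best = (err, f, t, left, right)
            if best is not None:
                _, node.feature, node.threshold, left, right = best
                node.left = self.build(left, depth + 1, cascade_depth)
                node.right = self.build(right, depth + 1, cascade_depth)
        self.cache[key] = node
        return node


# Toy XOR data; one tree per cascade depth 0, 1, 2.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
builder = SharedForestBuilder(X, y, max_depth=2)
all_rows = frozenset(range(len(X)))
forest = [builder.build(all_rows, 0, d) for d in range(3)]
print(builder.requests, len(builder.cache))
```

On this toy data the number of distinct nodes stored is smaller than the number of node requests, because some leaves are reached by more than one tree and are returned from the cache instead of being rebuilt. Selecting the best tree from `forest` (for example, by accuracy on held-out data) is left out of the sketch.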