Skip to Main Content
A complexity-based pruning procedure for classification trees is described, and bounds on its finite sample performance are established. The procedure selects a subtree of a (possibly random) initial tree in order to minimize a complexity penalized measure of empirical risk. The complexity assigned to a subtree is proportional to the square root of its size. Two cases are considered. In the first, the growing and pruning data sets are identical, and in the second, they are independent Using the performance bound, the Bayes risk consistency of pruned trees obtained via the procedure is established when the sequence of initial trees satisfies suitable geometric and structural constraints. The pruning method and its analysis are motivated by work on adaptive model selection using complexity regularization.