Skip to Main Content
Some of the most influential factors in the quality of the solutions found by an evolutionary algorithm (EA) are a correct coding of the search space and an appropriate evaluation function of the potential solutions. EAs are often used to learn decision rules from datasets, which are encoded as individuals in the genetic population. In this paper, the coding of the search space for the obtaining of those decision rules is approached, i.e., the representation of the individuals of the genetic population and also the design of specific genetic operators. Our approach, called "natural coding," uses one gene per feature in the dataset (continuous or discrete). The examples from the datasets are also encoded into the search space, where the genetic population evolves, and therefore the evaluation process is improved substantially. Genetic operators for the natural coding are formally defined as algebraic expressions. Experiments with several datasets from the University of California at Irvine (UCI) machine learning repository show that as the genetic operators are better guided through the search space, the number of rules decreases considerably while maintaining the accuracy, similar to that of hybrid coding, which joins the well-known binary and real representations to encode discrete and continuous attributes, respectively. The computational cost associated with the natural coding is also reduced with regard to the hybrid representation. Our algorithm, HlDER*, has been statistically tested against C4.5 and C4.5 Rules, and performed well. The knowledge models obtained are simpler, with very few decision rules, and therefore easier to understand, which is an advantage in many domains. The experiments with high-dimensional datasets showed the same good behavior, maintaining the quality of the knowledge model with respect to prediction accuracy.