Skip to Main Content
Symbolic regression is aimed at discovering mathematical expressions, in symbolic form, that fit a given sample of data points. While genetic programming (GP) constitutes a powerful tool for solving this class of problems, its effectiveness is still severely limited when the data sample requires different expressions in different regions of the input space - i.e., when the approximating function should be discontinuous. In this paper we present a new GP-based approach for symbolic regression of discontinuous functions in multivariate data-sets. We identify the portions of the input space that require different approximating functions by means of a new algorithm that we call hyper-volume error separation (HVES). To this end we run a preliminary GP evolution and partition the input space based on the error exhibited by the best individual across the data-set. Then we partition the data-set based on the partition of the input space and use each such partition for driving an independent, preliminary GP evolution. The populations resulting from such preliminary evolutions are finally merged and evolved again. We compared our approach to the standard GP search and to a GP search for discontinuous functions in univariate data-sets. Our results show that coupling HVES with GP is an effective approach and provides significant accuracy improvements while requiring less computational resources.