Variable selection is the problem of choosing the subset of explanatory variables for a regression or classification model such that the resulting model is best according to some criterion. Here we consider the use of population-based incremental learning (PBIL) to select the variables for a linear regression model that predicts a quantitative trait in living organisms. The data are simulated to represent a genome-wide association study (GWAS), with single-nucleotide polymorphisms (SNPs) as explanatory variables and height as an example trait. PBIL was effective in optimizing a variety of model fitness criteria. The resulting models were found to have true positive and false negative rates comparable to those of competing methods.
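
To make the idea concrete, the following is a minimal sketch of PBIL applied to variable selection for linear regression. It is not the authors' implementation. The fitness criterion (BIC), the update rule (shift the probability vector toward the best mask of each generation, with no mutation step), the parameter values, and the 0/1/2 genotype coding of the simulated SNPs are all assumptions chosen for illustration. The names `bic` and `pbil_select` are hypothetical.

```python
import numpy as np

def bic(X, y, mask):
    """BIC of an ordinary least-squares fit on the columns selected by mask."""
    n = len(y)
    # Intercept column plus the selected predictors (possibly none).
    design = np.column_stack([np.ones(n), X[:, mask]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    return n * np.log(rss / n) + design.shape[1] * np.log(n)

def pbil_select(X, y, pop_size=50, generations=100, lr=0.1, seed=0):
    """Search for a low-BIC variable subset with a basic PBIL loop (assumed variant)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    prob = np.full(p, 0.5)  # inclusion probability for each variable
    best_mask, best_score = None, np.inf
    for _ in range(generations):
        # Sample a population of boolean inclusion masks from the probability vector.
        pop = rng.random((pop_size, p)) < prob
        scores = np.array([bic(X, y, m) for m in pop])
        i = int(np.argmin(scores))
        if scores[i] < best_score:
            best_score, best_mask = float(scores[i]), pop[i].copy()
        # Move the probability vector toward the best mask of this generation.
        prob = (1 - lr) * prob + lr * pop[i]
        # Keep probabilities away from 0 and 1 so sampling stays diverse.
        prob = np.clip(prob, 0.02, 0.98)
    return best_mask, best_score, prob

# Illustrative simulated data: 200 individuals, 20 SNPs coded 0/1/2,
# three of which (hypothetical indices) have a true effect on the trait.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 20)).astype(float)
causal = np.array([2, 7, 11])
beta = np.zeros(20)
beta[causal] = 2.0
y = X @ beta + rng.normal(size=200)

best_mask, best_score, prob = pbil_select(X, y)
print("selected SNP indices:", np.flatnonzero(best_mask))
```

Other fitness criteria can be substituted by replacing `bic` with any function that scores a boolean mask, as long as lower values mean a better model.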