I. Introduction
Due to the demands of computational feasibility and interpretability of results, variable selection for high-dimensional data has attracted increasing attention in the statistics and machine learning communities [1], [2], [3], [4]. The wide spectrum of variable selection methods can be divided mainly into those based on linear models, nonlinear additive models, and partial linear models (PLMs). Under the linear model assumption, active variables are selected either directly by information criteria on the covariates (e.g., the Bayesian information criterion (BIC) [5] and the Akaike information criterion (AIC) [6]), or by regularization schemes with sparsity-inducing penalties on the regression coefficients (e.g., the least absolute shrinkage and selection operator (LASSO) [1], the smoothly clipped absolute deviation (SCAD) [7], and least angle regression (LARS) [8]). As a natural extension of linear models, additive models have been proposed for nonlinear approximation and variable selection [9], [10], [11]; popular algorithms include the component selection and smoothing operator (COSSO) [12], nonparametric independence screening (NIS) [13], sparse additive models (SpAM) [14], GroupSpAM [15], and the sparse modal additive model (SpMAM) [16], [17]. As a tradeoff between the linear and nonlinear models, PLMs assume that some covariates are linearly related to the response while the others enter nonlinearly [18]. Several efforts have been made toward PLM-based variable selection and function estimation, such as the linear and nonlinear discoverer (LAND) [19], the model pursuit approach [20], and sparse PLMs [21].
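To fix ideas, the three model classes above can be written schematically as follows, where the symbols $Y$, $X_j$, $\beta_j$, $f_j$, $\mathcal{L}$, $\mathcal{N}$, and $\varepsilon$ are introduced here only for illustration:
\begin{align*}
\text{linear:} \quad & Y = \sum_{j=1}^{p} \beta_j X_j + \varepsilon, \\
\text{additive:} \quad & Y = \sum_{j=1}^{p} f_j(X_j) + \varepsilon, \\
\text{partially linear:} \quad & Y = \sum_{j \in \mathcal{L}} \beta_j X_j + \sum_{j \in \mathcal{N}} f_j(X_j) + \varepsilon,
\end{align*}
so that variable selection amounts to identifying the covariates with nonzero coefficients $\beta_j$ or nonzero components $f_j$. For instance, given samples $\{(x_i, y_i)\}_{i=1}^{n}$, the LASSO estimator [1] solves
\[
\min_{\beta \in \mathbb{R}^{p}} \ \frac{1}{2n} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^{2} + \lambda \sum_{j=1}^{p} |\beta_j|,
\]
where the tuning parameter $\lambda > 0$ controls the sparsity of the selected set.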