Skip to Main Content
Given a large number of basis functions that can be potentially more than the number of samples, we consider the problem of learning a sparse target function that can be expressed as a linear combination of a small number of these basis functions. We are interested in two closely related themes: · feature selection, or identifying the basis functions with nonzero coefficients; · estimation accuracy, or reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. For least squares regression, we develop strong theoretical results for the new procedure showing that it can effectively solve this problem under some assumptions. Experimental results support our theory.