
Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block ℓ1/ℓ∞-Regularization


Authors: Negahban, S. N.; Wainwright, M. J. (Dept. of Electr. Eng. & Comput. Sci., Univ. of California, Berkeley, CA, USA)

Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports of size at most s. This set-up suggests the use of ℓ1/ℓ∞-regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of ℓ1/ℓ∞-regularized quadratic programming, considering both consistency rates in ℓ∞-norm and how the minimal sample size n required for consistent variable selection scales with the model dimension, sparsity, and overlap between the supports. We first establish bounds on the ℓ∞-error as well as sufficient conditions for exact variable selection, both for fixed design matrices and for designs drawn randomly from general Gaussian distributions. Specializing to the case of r = 2 linear regression problems with standard Gaussian designs whose supports overlap in a fraction α ∈ [0, 1] of their entries, we prove that the ℓ1/ℓ∞-regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞(n, p, s, α) = n / {(4 − 3α) s log(p − (2 − α)s)}. An implication is that ℓ1/ℓ∞-regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but has worse statistical efficiency than a naive Lasso-based approach for moderate to small overlap (α < 2/3). Empirical simulations illustrate close agreement between the theory and actual behavior in practice. These results show that caution must be exercised in applying ℓ1/ℓ∞ block regularization: if the data do not match its structure closely, it can impair statistical performance relative to computationally less expensive schemes.
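To make the estimator concrete, below is a minimal NumPy sketch, not the authors' implementation: it solves the ℓ1/ℓ∞-regularized least-squares problem by proximal gradient descent rather than by a generic quadratic-programming solver, and computes the row-wise ℓ∞ prox via Moreau decomposition as the residual of an ℓ1-ball projection. All names (project_l1_ball, linf_prox, block_linf_regression), the regularization level, and the support-recovery threshold in the demo are illustrative assumptions.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius
    (sorting-based algorithm of Duchi et al., 2008)."""
    if np.sum(np.abs(v)) <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]              # magnitudes, descending
    cssv = np.cumsum(u)
    ks = np.arange(1, u.size + 1)
    rho = np.nonzero(u * ks > cssv - radius)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def linf_prox(v, lam):
    """Prox of lam * ||.||_inf via Moreau decomposition:
    prox(v) = v - (projection of v onto the l1 ball of radius lam)."""
    return v - project_l1_ball(v, lam)

def block_linf_regression(X, Y, lam, n_iter=1000):
    """Proximal-gradient solver for
        min_B (1/2n) ||Y - X B||_F^2 + lam * sum_j ||B_{j,:}||_inf,
    where B is the p x r matrix of regression coefficients."""
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz const.
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        Z = B - step * grad
        B = np.apply_along_axis(linf_prox, 1, Z, lam * step)
    return B

# Toy instance: r = 2 problems with supports of size s overlapping in a
# fraction alpha of their entries, standard Gaussian design.
rng = np.random.default_rng(0)
n, p, s, alpha = 200, 500, 10, 0.8
m = int(round(alpha * s))                      # number of shared indices
supp = [np.arange(s), np.concatenate([np.arange(m), s + np.arange(s - m)])]
B_true = np.zeros((p, 2))
for k in (0, 1):
    B_true[supp[k], k] = rng.choice([-1.0, 1.0], size=s)
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.5 * rng.standard_normal((n, 2))

# Rescaled sample size from the abstract; the phase transition for exact
# variable selection is located in terms of this quantity.
theta = n / ((4 - 3 * alpha) * s * np.log(p - (2 - alpha) * s))
print(f"theta_(1,inf) = {theta:.2f}")

B_hat = block_linf_regression(X, Y, lam=0.1)
est = np.nonzero(np.abs(B_hat).max(axis=1) > 0.25)[0]  # ad hoc threshold
print("union support recovered:", set(est) == set(supp[0]) | set(supp[1]))
```

The crossover at α = 2/3 in the abstract can be read off the formula: at that overlap the constant 4 − 3α equals 2, matching the 2s log(p − s) sample complexity known for running the Lasso on each problem separately, so larger overlap makes the block method cheaper in samples and smaller overlap makes it more expensive.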

Published in:

IEEE Transactions on Information Theory (Volume: 57, Issue: 6)