
A Dirty Model for Multiple Sparse Regression

Authors: A. Jalali, P. Ravikumar, and S. Sanghavi (Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA)

The task of sparse linear regression consists of finding an unknown sparse vector from linear measurements. Solving this task even under “high-dimensional” settings, where the number of samples is fewer than the number of variables, is now known to be possible via methods such as the LASSO. We consider the multiple sparse linear regression problem, where the task consists of recovering several related sparse vectors at once. A simple approach to this task would involve solving independent sparse linear regression problems, but a natural question is whether one can reduce the overall number of samples required by leveraging partial sharing of the support sets, or nonzero patterns, of the signal vectors. A line of recent research has studied the use of ℓ1/ℓq norm block-regularizations with q > 1 for such problems. However, depending on the level of sharing, these could actually perform worse in sample complexity when compared to solving each problem independently. We present a new “adaptive” method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. We show how to achieve this using a very simple idea: decompose the parameters into two components and regularize these differently. We show, theoretically and empirically, that our method strictly and noticeably outperforms both ℓ1 and ℓ1/ℓq methods, over the entire range of possible overlaps (except at boundary cases, where we match the best method), even under high-dimensional scaling.
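The decomposition idea in the abstract can be sketched as a convex program: split the parameter matrix into a task-specific component penalized entrywise by ℓ1 and a shared component penalized by an ℓ1/ℓq block norm. The following is a minimal illustrative sketch (not the authors' code), written with cvxpy; the problem sizes, the regularization weights lambda_s and lambda_b, and the choice q = ∞ for the block norm are assumptions made for the example.

```python
# Minimal sketch of a "dirty model" style estimator: split the p x r parameter
# matrix into S (task-specific, entrywise l1 penalty) plus B (shared support,
# l1/l-infinity block penalty). All sizes and weights below are illustrative.
import numpy as np
import cvxpy as cp

n, p, r = 50, 100, 4                                   # samples per task, variables, tasks
rng = np.random.default_rng(0)
X = [rng.standard_normal((n, p)) for _ in range(r)]    # per-task design matrices (synthetic)
y = [rng.standard_normal(n) for _ in range(r)]         # per-task responses (synthetic)

S = cp.Variable((p, r))            # task-specific component
B = cp.Variable((p, r))            # shared-support component
lambda_s, lambda_b = 0.1, 0.2      # illustrative regularization weights

# Squared-error loss over all tasks, with parameters Theta = S + B.
loss = sum(cp.sum_squares(X[k] @ (S[:, k] + B[:, k]) - y[k]) for k in range(r)) / (2 * n)

# Entrywise l1 on S; l1/l-infinity block norm on B (sum over rows of the max
# absolute entry), which encourages rows of B to be shared across tasks.
penalty = lambda_s * cp.sum(cp.abs(S)) + lambda_b * cp.sum(cp.max(cp.abs(B), axis=1))

prob = cp.Problem(cp.Minimize(loss + penalty))
prob.solve()

theta_hat = S.value + B.value      # estimated parameter matrix, one column per task
```

With lambda_b large the shared component is suppressed and the program behaves like independent LASSO fits; with lambda_s large it behaves like a pure block-regularized fit, so tuning the two weights lets the estimator adapt to the actual amount of support overlap.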

Published in:

IEEE Transactions on Information Theory (Volume: 59, Issue: 12)