Stable Recovery and Separation of Structured Signals by Convex Geometry

We consider the ill-posed problem of recovering a signal that is structured in some general dictionary (i.e., a possibly redundant or over-complete matrix) from corrupted measurements, where the corruption is structured in another general dictionary. We formulate the problem by imposing appropriate convex constraints on the signal and the corruption according to their structures, and we provide conditions for exact recovery from structured corruption and for stable recovery from structured corruption with additional stochastic bounded noise. In addition, this paper provides estimates of the number of measurements needed for recovery. These estimates are based on computing the Gaussian complexity of a tangent cone and the Gaussian distance to a subdifferential. Applications covered by the proposed programs include the recovery of signals disturbed by impulse noise, missing entries, sparse outliers, and random bounded noise, as well as signal separation. Numerical simulation results are presented to verify and complement the theoretical results.


I. INTRODUCTION
We study the problem of recovering a signal x ∈ R^p from the corrupted measurements

y = Φx + Ψv + e, (1)

where e ∈ R^n is a bounded noise with ‖e‖₂ ≤ δ, and Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} are general dictionaries, such as bases or redundant/over-complete matrices. In corrupted sensing, x is the structured signal that we wish to recover and v ∈ R^q represents structured noise. The model (1) also allows us to consider signal separation: in that setting, x and v are two distinct features that must be recovered simultaneously from the noisy measurements y.
In classical compressed sensing, the fundamental problem is the recovery of a sparse signal from a relatively small number of linear measurements

y = Φx + e, (2)
where Φ ∈ R^{n×p} is a known sensing matrix, x is an unknown sparse signal, and e is a measurement error with ‖e‖₂ ≤ δ. Recovery guarantees for the model (2) have been successfully developed in [7], [8], [10], [12], [15], [19], [27], [46], [47], [49]. A well-studied example in the literature is the recovery of sparse vectors via ℓ₁-norm minimization; see [6], [11], [16], [18], [20], [21], [39]. The work in [13] provided a general framework for recovering an arbitrary structured vector x ∈ R^p from n unreliable measurements y = Φx + e via a convex regularizer, where Φ is a Gaussian matrix and ‖e‖₂ ≤ δ. The convex regularizer formulation is based on minimizing the norm induced by the convex hull of the atomic set, referred to as the atomic norm. They also obtained estimates of the number of generic measurements required for exact and robust recovery; these estimates are based on the geometric properties of the norm used for x.
In practical settings, the signal can be impaired by impulse noise or narrow-band interference, or contaminated during transmission, so the noise need not be bounded. The authors in [10] proved that as the noise energy grows, the solution may be very different from the ground truth.
To overcome this limitation, many specific corrupted sensing instances have been studied in recent years, for example, signal recovery and low-rank matrix recovery from corrupted observations [9], [14], [34], [36], [37], [40], [48]. In the context of signal contamination, we have the observation

y = Φx + v, (3)

where v is an unknown but structured noise vector. For measurements y = Φx + v, Donoho and Stark [22] gave a guarantee for the exact recovery of x when Φ is the discrete Fourier transform matrix and the support of the sparse corruption v is known. Li [36] showed that if the m × n sensing matrix Φ has independent Gaussian entries, then one can recover a sparse signal x exactly by the ℓ₁ minimization

min_{x,v} ‖x‖₁ + λ‖v‖₁ subject to y = Φx + v, (4)

provided the number of nonzero elements in x is O(m/(log(n/m) + 1)). Foygel and Mackey [26] extended the work of [13] to a more challenging setting, taking into account unreliable measurements y = Φx + v + e. They used a convex programming method to recover x and v with or without prior information and provided new bounds on the Gaussian complexity of structured signals, leading to a sharper restoration guarantee. In addition, a generalized corrupted model is

y = Φx + Ψv, (5)

where Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} are general dictionaries and v ∈ R^q is a structured corruption. The model (5) enables us to consider signal separation, the recovery of saturated or clipped signals [1], [2], [33], sparsity-based super-resolution and in-painting [23], [38], etc. In signal separation problems, the dictionaries Φ and Ψ are chosen such that they allow sparse or approximately sparse representations of the two distinct features. Such problems arise in applications including the separation of texture from cartoon parts in images [4], [24] and the separation of neuronal calcium transients from smooth signals caused by astrocytes in calcium imaging [30]. Recovery guarantees in the deterministic setting for noiseless measurements y = Φx + Ψv have been studied in [3], [32], [41], [44], [45]. The work in [32], [45] considered the problem of recovering the sparse signal x from n linear corrupted measurements y in (5); they provided deterministic recovery guarantees based on an uncertainty relation for pairs of general dictionaries and presented corresponding practicable recovery algorithms. Their recovery guarantees depend on the coherence parameters of Φ and Ψ and on prior knowledge about the support sets of the signal and the noise. Studer and Baraniuk [44] presented efficient algorithms for signal restoration and separation and derived recovery guarantees that depend on knowledge of the supports of x and v and on the coherence parameters of the dictionaries Φ and Ψ.
In this paper, we consider the model (1), which is more general than that in [26]: x is structured under a general dictionary, the unbounded noise vector v is structured under another general dictionary, and e is a stochastic bounded noise. We use a convex programming approach to recover both x and v with or without prior information and provide recovery conditions through geometric measures of the signal and the corruption.
The rest of the paper is organized as follows. In Section II, we recall some concepts from convex analysis needed in this paper. We state the main results with and without prior information and give some applications for special signals and corruptions in Section III. In Section IV, we conduct a series of simulation experiments to reinforce our theoretical results. The proofs of the main results are detailed in Section V. Conclusions and future work are presented in Section VI.

A. NOTATIONS
Boldface lowercase and uppercase letters stand for vectors and matrices, respectively. We use µ_m to denote the expected length of an m-dimensional standard Gaussian random vector. This expectation is tightly bounded as √(m − 1/2) < µ_m < √m for all m; it is known that µ₁ = √(2/π), while µ_m ≈ √m for large m [17], [26]. B^m and S^{m−1} denote the unit ball and the unit sphere in R^m under the ℓ₂ norm. (c)₊ denotes the positive part of c.
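As a quick numerical check (ours, not part of the original analysis), µ_m admits the closed form µ_m = √2 Γ((m+1)/2)/Γ(m/2), and the following Python snippet verifies the bounds above:

```python
# Verify sqrt(m - 1/2) < mu_m < sqrt(m) and mu_1 = sqrt(2/pi), using the
# closed form mu_m = sqrt(2) * Gamma((m+1)/2) / Gamma(m/2).
import numpy as np
from scipy.special import gammaln

def mu(m: int) -> float:
    # log-Gamma avoids overflow for large m
    return np.sqrt(2.0) * np.exp(gammaln((m + 1) / 2) - gammaln(m / 2))

assert np.isclose(mu(1), np.sqrt(2 / np.pi))        # mu_1 = sqrt(2/pi)
for m in [1, 2, 10, 100, 10000]:
    assert np.sqrt(m - 0.5) < mu(m) < np.sqrt(m)    # stated bounds
    print(m, mu(m), np.sqrt(m))                     # mu_m approaches sqrt(m)
```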

II. PRELIMINARIES

Let Φ : R^p → R^n be a random map with i.i.d. zero-mean Gaussian entries having variance 1/n, and let Ψ : R^q → R^n be another random map with i.i.d. zero-mean Gaussian entries having variance 1/n. We use appropriate norms ‖·‖_sig and ‖·‖_cor for the vectors x and v, respectively, according to their structures. Any convex complexity measure can be used for ‖·‖_sig and ‖·‖_cor, for example the ℓ₁ norm for sparse vectors, the trace norm ‖·‖_* (the sum of the singular values of a matrix) for low-rank matrices, or the ℓ_∞ norm for binary vectors.
When there is no prior knowledge about the ground truth x* and v* in model (1), we consider the following minimization program:

min_{x∈R^p, v∈R^q} ‖x‖_sig + λ‖v‖_cor subject to ‖y − Φx − Ψv‖₂ ≤ δ, (6)

where λ > 0 is a regularization parameter that provides a trade-off between ‖x‖_sig and ‖v‖_cor.
When some prior knowledge of x* or v* is available, we take the prior information into account in the following optimization programs:

min_{x∈R^p, v∈R^q} ‖x‖_sig subject to ‖v‖_cor ≤ ‖v*‖_cor, ‖y − Φx − Ψv‖₂ ≤ δ, (7)

and

min_{x∈R^p, v∈R^q} ‖v‖_cor subject to ‖x‖_sig ≤ ‖x*‖_sig, ‖y − Φx − Ψv‖₂ ≤ δ. (8)
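For illustration, the following CVXPY sketch (ours; the experiments in Section IV use the equivalent CVX Matlab package) implements programs (6) and (7) for the sparse/sparse case, with the ℓ₁ norm standing in for ‖·‖_sig and ‖·‖_cor:

```python
# Illustrative CVXPY sketch (ours) of programs (6) and (7) with l1 norms;
# the problem data Phi, Psi, y, delta are assumed given.
import cvxpy as cp

def recover(Phi, Psi, y, delta, lam=1.0, v_budget=None):
    p, q = Phi.shape[1], Psi.shape[1]
    x, v = cp.Variable(p), cp.Variable(q)
    constraints = [cp.norm(y - Phi @ x - Psi @ v, 2) <= delta]
    if v_budget is None:
        # program (6): no prior information, lambda trades off the two norms
        objective = cp.Minimize(cp.norm(x, 1) + lam * cp.norm(v, 1))
    else:
        # program (7): the prior bound ||v*||_cor = v_budget is known
        constraints.append(cp.norm(v, 1) <= v_budget)
        objective = cp.Minimize(cp.norm(x, 1))
    cp.Problem(objective, constraints).solve()
    return x.value, v.value
```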
Our results depend on two notions of convex geometry. In order to present them, we recall some concepts from convex analysis with respect to a generic norm ‖·‖ on R^m, as in [25], [26]. The subdifferential of ‖·‖ at x ∈ R^m \ {0} is the set of vectors

∂‖x‖ = {s ∈ R^m : ‖z‖ ≥ ‖x‖ + ⟨s, z − x⟩ for all z ∈ R^m}.

Given x ∈ R^m \ {0}, we define the tangent cone of ‖·‖ at x as

T = cone{d ∈ R^m : ‖x + d‖ ≤ ‖x‖}.

The tangent cone T is equal to the set of descent (or non-ascent) directions of the norm ‖·‖ at the point x. The normal cone of ‖·‖ at x is defined to be the set of all directions s that form obtuse angles with each element of the tangent cone at x, given by

N = {s ∈ R^m : ⟨s, d⟩ ≤ 0 for all d ∈ T}.

In this paper, we use T_sig and T_cor to denote the tangent cones of the norms ‖·‖_sig and ‖·‖_cor at the corresponding points, respectively.
To quantify the complexity of a structured vector, we review two geometric measures of structure: the Gaussian complexity and the Gaussian distance. For a set C ⊂ R^m and a standard Gaussian vector g ∼ N(0, I_m), the Gaussian complexity of C is ω(C) = E max_{a∈C} ⟨g, a⟩, and the Gaussian distance of C is η(C) = E min_{s∈C} ‖g − s‖₂.
There is a relationship between the Gaussian complexity of the tangent cone and the Gaussian distance to the scaled subdifferential of ‖x‖, from [42], [43]:

ω(T ∩ B^m) ≤ η(t · ∂‖x‖) for any t ≥ 0.

So we can bound the Gaussian complexity by bounding η(t · ∂‖x‖) at a single value of t, and different choices of t produce different bounds.
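For the ℓ₁ norm, the distance from a Gaussian vector to the scaled subdifferential has a closed form (on the support the subgradient is fixed at t · sign(x_i); off the support it ranges over [−t, t]), so η(t · ∂‖x‖₁) is easy to estimate by Monte Carlo. The following sketch (ours) compares such an estimate against the bound 2s log(p/s) + (3/2)s from [26]:

```python
# Monte Carlo estimate of eta(t * subdifferential of ||x||_1) for an
# s-sparse x in R^p, compared with the estimate 2s log(p/s) + (3/2)s.
import numpy as np

rng = np.random.default_rng(0)
p, s = 400, 20
t = np.sqrt(2 * np.log(p / s))      # a standard near-optimal choice of t

def dist_to_scaled_subdiff(g, t, s):
    # take the support to be the first s coordinates; by symmetry of g
    # we may assume sign(x_i) = +1 there
    on = (g[:s] - t) ** 2                         # subgradient fixed at t
    off = np.maximum(np.abs(g[s:]) - t, 0) ** 2   # nearest point in [-t, t]
    return np.sqrt(on.sum() + off.sum())

d = [dist_to_scaled_subdiff(rng.standard_normal(p), t, s) for _ in range(2000)]
eta = np.mean(d)
print("eta^2 estimate:", eta ** 2)
print("bound 2s log(p/s) + 1.5s:", 2 * s * np.log(p / s) + 1.5 * s)
```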

III. MAIN RESULTS
In this section, we present the main results with and without prior information and give some applications for special signals and corruptions. The proofs of the theorems can be found in Section V.

A. RECOVERY GUARANTEE WITH PRIOR INFORMATION
The following result gives sufficient conditions for exact and stable recovery whenever prior information about x and v is available.

B. RECOVERY GUARANTEE WITHOUT PRIOR INFORMATION
Next, we present sufficient conditions for exact and stable recovery when prior information about x or v is unavailable.

C. COMPARISON WITH RELATED LITERATURE
For Gaussian matrices Φ ∈ R^{n×p} and Ψ ∈ R^{n×n}, we can write model (1) as

y = [Φ Ψ] (x^T, v^T)^T + e. (9)

In this sense, [Φ Ψ] ∈ R^{n×(p+n)} is a large Gaussian matrix and (x^T, v^T)^T ∈ R^{p+n} is a large vector. When x is s₁-sparse and v is s₂-sparse, (x^T, v^T)^T is (s₁ + s₂)-sparse. In this subsection, we use T₁, T₂ and T_U to denote the tangent cones of the norms ‖x‖₁, ‖v‖₁ and ‖(x^T, v^T)^T‖₁ at the corresponding points, respectively.
According to the work in [13], we can recover (x^T, v^T)^T from (9) provided n ≳ ω²(T_U ∩ B^{p+n}) (ignoring the small constant).
On the other hand, the theoretical recovery bounds obtained from our models can all be approximated as n ≳ ω²(T₁ ∩ B^p) + ω²(T₂ ∩ B^n). We will show that the theoretical recovery threshold obtained by our results for ‖·‖_sig = ‖·‖_cor = ‖·‖₁ is smaller than the threshold obtained by the work [13] for ‖·‖_A = ‖·‖₁ in some cases, although the main contribution of our results lies in allowing different measures for the signal and the corruption.
For an s-sparse p-dimensional vector, there are two ways to estimate its Gaussian squared complexity: 2s log(p/s) + (3/2)s and p(1 − (2/π)(1 − s/p)²), according to [26]. We denote by b_o the threshold required to recover x from model (9) according to [13], and by b_n the threshold needed by our methods. Next, we compare b_o and b_n in some cases.
First, we take the Gaussian squared complexity estimate p(1 − (2/π)(1 − s/p)²). In the cases considered below, b_n is always less than b_o; this means that our condition is weaker than that in [13].
When n = p, s₁ = αp and s₂ = βn, we cannot directly determine the relative size of b_o and b_n from their expressions. However, for some special values of α and β, we can compare b_o and b_n visually. For example, letting α = 0.01 and β = 0.4 in Figure 1, we can see that b_n is always less than b_o.

Second, we take the Gaussian squared complexity estimate 2s log(p/s) + (3/2)s. When n = p, s₁ = αp, s₂ = βn and s₁ ≠ s₂, we again compare the two thresholds for special values of α and β. For α = 0.1, β = 0.01, Figure 2 shows that b_n is always less than b_o; for α = 0.01, β = 0.1, Figure 3 shows the same.

D. ERROR BOUNDS
For problems with prior information, we have the following two tangent cones:

T_sig = cone{a ∈ R^p : ‖x* + a‖_sig ≤ ‖x*‖_sig}, T_cor = cone{b ∈ R^q : ‖v* + b‖_cor ≤ ‖v*‖_cor}.

If (x̂, v̂) is a solution of program (7) or (8), then (x̂ − x*, v̂ − v*) ∈ T_sig × T_cor by the definitions of T_sig and T_cor. For problems without prior information, we have the joint tangent cone from [26],

T_λ = cone{(a, b) ∈ R^p × R^q : ‖x* + a‖_sig + λ‖v* + b‖_cor ≤ ‖x*‖_sig + λ‖v*‖_cor}.

Then the lower bounds of (10) and (11) can be applied to the error vector (a, b) = (x̂ − x*, v̂ − v*) to bound its ℓ₂ norm.

E. SOME EXAMPLES
For a binary signal and sparse corruption, we can formulate the recovery problem as

min_{x∈R^p, v∈R^q} ‖v‖₁ subject to ‖x‖_∞ ≤ 1, ‖y − Φx − Ψv‖₂ ≤ δ. (12)

Following from Theorem 1, we obtain the exact recovery of a binary signal x* via program (12).
Here T_sparse(q,s) denotes the tangent cone of an s-sparse q-dimensional vector under the ℓ₁ norm, which satisfies ω²(T_sparse(q,s) ∩ B^q) ≤ 2s log(q/s) + (3/2)s. The binary signal x* is then exactly recovered as the sign of x̂ with probability at least 1 − e^{−(µ_n − 2δ√n − τ)²/2}, where (x̂, v̂) is any solution of optimization problem (12).
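A minimal CVXPY sketch (ours) of program (12) as formulated above; the final sign step implements the recovery of x* described in the corollary:

```python
# Illustrative sketch (ours) of program (12): exploit the prior knowledge
# ||x*||_inf = 1 of a binary signal and minimize the l1 norm of the
# sparse corruption; Phi, Psi, y, delta are assumed given.
import cvxpy as cp
import numpy as np

def recover_binary(Phi, Psi, y, delta):
    p, q = Phi.shape[1], Psi.shape[1]
    x, v = cp.Variable(p), cp.Variable(q)
    cp.Problem(cp.Minimize(cp.norm(v, 1)),
               [cp.norm_inf(x) <= 1,          # prior information on x*
                cp.norm(y - Phi @ x - Psi @ v, 2) <= delta]).solve()
    return np.sign(x.value), v.value          # read off x* as sign(x_hat)
```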
If both the signal x and the corruption v are sparse vectors, we consider recovery via the program

min_{x∈R^p, v∈R^q} ‖x‖₁ + ‖v‖₁ subject to ‖y − Φx − Ψv‖₂ ≤ δ, (14)

where we set λ = 1.
Corollary 2: Suppose that x ∈ R^p has at most s_sig nonzero entries, that v ∈ R^q has at most s_cor nonzero entries, and that (x̂, v̂) is a solution of program (14). Then the stable recovery bound of Theorem 2 holds with the Gaussian distance estimates for s_sig-sparse and s_cor-sparse vectors.
Proof: Using the upper bounds on the Gaussian distance in [26] together with Theorem 2, we easily obtain the result.
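The thresholds appearing in these results can be evaluated numerically. The following sketch (ours) reproduces the comparison of Section III-C using the two sparse complexity estimates; the sparsity levels are illustrative:

```python
# Evaluate the two Gaussian squared complexity estimates for an s-sparse
# p-dimensional vector and compare the resulting recovery thresholds:
# b_n (two separate cones, our bound) vs b_o (one concatenated cone, [13]).
import numpy as np

def omega_sq_log(p, s):      # 2s log(p/s) + (3/2)s
    return 2 * s * np.log(p / s) + 1.5 * s

def omega_sq_dense(p, s):    # p(1 - (2/pi)(1 - s/p)^2)
    return p * (1 - (2 / np.pi) * (1 - s / p) ** 2)

def omega_sq(p, s):          # both are upper bounds, so take the smaller
    return min(omega_sq_log(p, s), omega_sq_dense(p, s))

p = q = 200
s_sig, s_cor = 10, 40
b_n = omega_sq(p, s_sig) + omega_sq(q, s_cor)
b_o = omega_sq(p + q, s_sig + s_cor)
print("b_n =", b_n, " b_o =", b_o)   # here b_n < b_o
```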

IV. NUMERICAL EXPERIMENTS
In this section, we verify and reinforce the theoretical results of Section III with a series of simulations. We present experiments for the problem models both with and without prior information. In each experiment, we used the CVX Matlab package [28], [29] to specify and solve the convex recovery programs. Figures 4-7 report the empirical probability of success for different structured vectors and different problem models. Figure 8 plots the recovery error ‖x̂ − x*‖₂ + ‖v̂ − v*‖₂ for sparse signal recovery under noise and sparse corruption with prior information.

In Figure 4, we consider a binary signal and sparse corruption in the absence of noise (δ = 0). Fixing the message length p = 300, we vary the number of measurements n ∈ [200, 300] and the number of corruptions s_cor = ‖v‖₀ ∈ [2, 82], and perform the following experiment 10 times for each (n, s_cor) pair:
1): Generate matrices Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} with independent N(0, 1/n) entries.
2): Generate a binary signal vector x* uniformly from {−1, 1}^p.
3): Generate a corruption vector v* whose s_cor nonzero entries are generated from a standard Gaussian distribution.
4): Solve the optimization problem (12) with y = Φx* + Ψv* and δ = 0.
5): Declare success if ‖x̂ − x*‖₂/‖x*‖₂ < 0.001.

In Figure 5, we consider a sparse signal and sparse corruption in the absence of noise (δ = 0). Fixing the signal length and the measurement size n = p = 200, we vary the sparsity levels (s_sig, s_cor) ∈ [1, 101]². We then perform the following experiment 10 times for each (s_sig, s_cor) pair:
1): Generate matrices Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} with independent N(0, 1/n) entries.
2): Generate a sparse signal vector x* whose s_sig nonzero entries are generated from a standard Gaussian distribution.
3): Generate a corruption vector v* whose s_cor nonzero entries are generated from a standard Gaussian distribution.
4): For y = Φx* + Ψv* and δ = 0, solve the optimization problem (14).
5): Declare success if ‖x̂ − x*‖₂/‖x*‖₂ < 0.001.

In Figure 6 and Figure 7, we consider recovery problem (6) in the absence of noise (δ = 0) and without any prior information. We focus on the case where both the signal and the corruption vector have sparse structure, i.e., ‖x‖_sig = ‖x‖₁ and ‖v‖_cor = ‖v‖₁. Fixing the signal length and the measurement size n = p = 200, we vary the sparsity levels (s_sig, s_cor) ∈ [1, 101]². We set λ = λ_den = (1 − s_cor/n)/(1 − s_sig/p) in Figure 6 and λ = λ_spa = √(log(n/s_cor)/log(p/s_sig)) in Figure 7. We then perform the following experiment 10 times for each (s_sig, s_cor) pair:
1): Generate matrices Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} with independent N(0, 1/n) entries.
2): Generate a sparse signal vector x* whose s_sig nonzero entries are generated from a standard Gaussian distribution.
3): Generate a corruption vector v* whose s_cor nonzero entries are generated from a standard Gaussian distribution.
4): For y = Φx* + Ψv* and δ = 0, solve the optimization problem (6) with the chosen λ.

5): Declare success if ‖x̂ − x*‖₂/‖x*‖₂ < 0.001.

In Figure 8, we consider recovery problem (7) in the presence of noise (δ ≠ 0) and with prior information. We again focus on the case where both the signal and the corruption vector have sparse structure. For a fixed noise level δ = 0.1 and sparsity fractions (γ_sig, γ_cor) = (0.01, 0.4), we vary the message length p ∈ {200, 300, 400} and the number of measurements n ∈ [50, 350]. We then perform the following experiment 50 times for each (p, n) pair:
1): Generate matrices Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} with independent N(0, 1/n) entries.
2): Generate a signal vector x* whose p · γ_sig nonzero entries are generated from a standard Gaussian distribution.
3): Generate a corruption vector v* whose n · γ_cor nonzero entries are generated from a standard Gaussian distribution.
4): Generate a dense noise vector e from a standard Gaussian distribution, rescaled so that ‖e‖₂ = 0.1.
5): For y = Φx* + Ψv* + e, solve the optimization problem (7).

The red lines in Figures 4-7 are the theoretical thresholds from the theorems in Section III; the values of the Gaussian squared complexity and Gaussian squared distance used here come from Chandrasekaran et al. [13] and Foygel and Mackey [26]. These theoretical recovery thresholds reflect the observed empirical phase transitions accurately. In the noisy measurement setting, Figure 8 shows that as n increases, the curves for the three different values of p all converge to a common value.
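For concreteness, here is one trial of the Figure 5 experiment re-implemented in Python with CVXPY (ours; the experiments above use CVX in Matlab), with illustrative sparsity levels:

```python
# One trial of the Figure 5 experiment: sparse signal, sparse corruption,
# noiseless measurements, n = p = q = 200, lambda = 1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n = p = q = 200
s_sig, s_cor = 10, 10

Phi = rng.normal(0.0, 1.0 / np.sqrt(n), (n, p))   # i.i.d. N(0, 1/n) entries
Psi = rng.normal(0.0, 1.0 / np.sqrt(n), (n, q))

x_star = np.zeros(p)
x_star[rng.choice(p, s_sig, replace=False)] = rng.standard_normal(s_sig)
v_star = np.zeros(q)
v_star[rng.choice(q, s_cor, replace=False)] = rng.standard_normal(s_cor)
y = Phi @ x_star + Psi @ v_star                   # delta = 0 (no noise)

x, v = cp.Variable(p), cp.Variable(q)
cp.Problem(cp.Minimize(cp.norm(x, 1) + cp.norm(v, 1)),
           [y == Phi @ x + Psi @ v]).solve()

ok = np.linalg.norm(x.value - x_star) / np.linalg.norm(x_star) < 1e-3
print("success:", ok)
```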

V. PROOFS OF THE MAIN RESULTS
Take any Ω ⊂ B^{p+q}. For a ∈ R^p and b ∈ R^q, we first derive a lower bound on E min_{(a^T, b^T)^T ∈ Ω} ‖Φa + Ψb‖₂ via the following lemma.

Lemma 1: Let Φ ∈ R^{n×p} and Ψ ∈ R^{n×q} have i.i.d. N(0, 1/n) entries, and let g ∈ R^{p+q} and h ∈ R^n have i.i.d. N(0, 1) entries. Then for any Ω ⊂ B^{p+q}, there holds

√n E min_{(a,b)∈Ω} ‖Φa + Ψb‖₂ ≥ E min_{(a,b)∈Ω} ( ‖(a, b)‖₂ ‖h‖₂ − ⟨g, (a, b)⟩ ).

Proof: For (a, b) ∈ Ω and w ∈ B^n, define the Gaussian processes

X_{(a,b),w} = √n ⟨w, Φa + Ψb⟩ + ‖(a, b)‖₂ v, Y_{(a,b),w} = ‖(a, b)‖₂ ⟨h, w⟩ − ⟨g, (a, b)⟩,

where v ∼ N(0, 1) is independent from the other random variables. Then X and Y satisfy the covariance comparison conditions of Gordon's theorem. Therefore, using Theorem 1.1 of Gordon [31] and simplifying, we obtain

P( min_{(a,b)∈Ω} max_{w∈B^n} X_{(a,b),w} ≥ C ) ≥ P( min_{(a,b)∈Ω} max_{w∈B^n} Y_{(a,b),w} ≥ C );

by maximizing over w on each side, we obtain

P( min_{(a,b)∈Ω} ( √n ‖Φa + Ψb‖₂ + ‖(a, b)‖₂ v ) ≥ C ) ≥ P( min_{(a,b)∈Ω} ( ‖(a, b)‖₂ ‖h‖₂ − ⟨g, (a, b)⟩ ) ≥ C ).

Since this is true for all C ∈ R, we can integrate over C ∈ (0, ∞) to obtain the claimed bound on the expectations. This concludes the proof.

Let Ω̃ = (T_sig × T_cor) ∩ S^{p+q−1}. We use ω̄² and ω̃² to denote the relevant Gaussian squared complexities ω²(T_sig ∩ B^p) and ω²(T_cor ∩ B^q), respectively. Next, we relate E min_{(a,b)∈Ω̃} ‖Φa + Ψb‖₂ to the associated Gaussian squared complexities by the following lemma.

Lemma 2: Let Ω̃ = (T_sig × T_cor) ∩ S^{p+q−1}. Then

√n E min_{(a,b)∈Ω̃} ‖Φa + Ψb‖₂ ≥ µ_n − √(ω̄² + ω̃²). (16)

Proof: According to Lemma 1, we obtain

√n E min_{(a,b)∈Ω̃} ‖Φa + Ψb‖₂ ≥ E min_{(a,b)∈Ω̃} ( ‖(a, b)‖₂ ‖h‖₂ − ⟨g, (a, b)⟩ ). (17)

In order to get the desired result, we have to lower bound the right-hand side of (17) for Ω̃ = (T_sig × T_cor) ∩ S^{p+q−1}. Let ω̂_sig = max_{a∈T_sig∩B^p} ⟨−g₁, a⟩ and ω̂_cor = max_{b∈T_cor∩B^q} ⟨−g₂, b⟩ be the observed Gaussian complexities, with E ω̂_sig = ω̄ and E ω̂_cor = ω̃, where g₁ ∈ R^p and g₂ ∈ R^q are independent vectors with i.i.d. standard normal entries. The proof consists of two cases: in each case, after a substitution, the right-hand side of (17) is minimized over a and b for a fixed value of the remaining variable. Combining the two cases completes the proof.
Similarly, we want to derive a lower bound on E min_{(a,b)∈T_λ∩S^{p+q−1}} ‖Φa + Ψb‖₂.

Lemma 3: Let λ = t_cor/t_sig for parameters t_sig > 0, t_cor > 0. Then √n E min_{(a,b)∈T_λ∩S^{p+q−1}} ‖Φa + Ψb‖₂ is lower bounded by µ_n minus a term controlled by the Gaussian distances to the scaled subdifferentials t_sig · ∂‖x*‖_sig and t_cor · ∂‖v*‖_cor.

Proof: According to Lemma 1, there holds

√n E min_{(a,b)∈T_λ∩S^{p+q−1}} ‖Φa + Ψb‖₂ ≥ E min_{(a,b)∈T_λ∩S^{p+q−1}} ( ‖(a, b)‖₂ ‖h‖₂ − ⟨g, (a, b)⟩ ),

and we need to lower bound the right-hand side. For σ = ±1 and g = (g₁^T, g₂^T)^T, where g₁ ∈ R^p and g₂ ∈ R^q, define d(σ) as the distance from σg to the corresponding scaled subdifferential. Maximizing the right-hand side over the signs, with d_sig = max{d(+1), d(−1)} for the signal part and d_cor defined analogously for the corruption part, yields a bound valid for all (a, b) ∈ T_λ ∩ S^{p+q−1}, where the last inequality is due to (19). Taking expectations, we obtain the stated bound, where the last inequality follows from Lemma 8 in [26].

VI. CONCLUSION AND FUTURE WORK
We presented a more general signal recovery model than that in [26]. Our convex programming approaches can recover both x and v with or without prior information, and we give recovery guarantees for both cases. Using geometric measures of the signal and the corruption, the number of measurements required for recovery by each convex program can be computed. Our simulations reinforce the theoretical results. Since constrained models are generally difficult to solve in practice, as pointed out in the future work of [26], it would be of great practical interest to analyze the unconstrained models

min_{x,v} ‖x‖_sig + λ‖v‖_cor + θ‖y − (Φx + Ψv)‖₂ (21)

and

min_{x,v} ‖x‖_sig + λ‖v‖_cor + θ‖y − (Φx + Ψv)‖₂² (22)

when no prior bound δ on the noise level is available, where λ and θ are two balance parameters. Moreover, it is an interesting and challenging problem to analyze the unconstrained model in our general form and characterize its performance in theory. This will be part of our future research.
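For illustration, a CVXPY sketch (ours) of how the unconstrained model (22) could be solved, with the ℓ₁ norm standing in for ‖·‖_sig and ‖·‖_cor and with λ and θ left as tunable balance parameters:

```python
# Illustrative sketch (ours) of the unconstrained model (22); lam and
# theta are unspecified balance parameters that would need to be tuned
# or analyzed, and the l1 norm stands in for ||.||_sig and ||.||_cor.
import cvxpy as cp

def recover_unconstrained(Phi, Psi, y, lam=1.0, theta=1.0):
    p, q = Phi.shape[1], Psi.shape[1]
    x, v = cp.Variable(p), cp.Variable(q)
    objective = cp.Minimize(cp.norm(x, 1) + lam * cp.norm(v, 1)
                            + theta * cp.sum_squares(y - Phi @ x - Psi @ v))
    cp.Problem(objective).solve()
    return x.value, v.value
```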