On the Error Probability of Support Recovery for Orthogonal Matching Pursuit With a Random Measurement Matrix

In this paper, an asymptotic bound on the support recovery error probability of a sparse signal is derived for the orthogonal matching pursuit (OMP) algorithm. The proposed bound is based on a support recovery analysis with a random measurement matrix, and it tightly approaches the empirical error probability in the large-system, high-SNR regime. All signal-related parameters introduced in existing analyses are considered jointly. Furthermore, conditions required by the conventional bound derivation, such as a lower limit on the magnitude of the non-zero coefficients of the sparse signal, are relaxed in the proposed approach. Through numerical evaluations, the theoretical performance bounds are demonstrated to be close to the simulated results, and notably closer than previously derived bounds.


I. INTRODUCTION
In general, signals are sampled at least at the Nyquist sampling rate, corresponding to twice the maximum frequency of interest for signal reconstruction. However, sampling at the Nyquist rate may be inefficient when the signal can be sparsely represented in a transformed domain. For this reason, compressive sensing (CS) has received considerable attention, since a sparse signal can be recovered from fewer samples than the Nyquist requirement. CS has been applied to various applications such as wireless communications and image compression [1], [2].
In most applications of CS, a noisy measurement can be expressed as

y = Φs + n, (1)

where Φ ∈ R^{L×N} is a measurement matrix, n ∈ R^{L×1} is additive noise, and s ∈ R^{N×1} is a Q-sparse vector (i.e., ‖s‖_0 = Q) with a support set Λ = {i | s_i ≠ 0} whose indices indicate the positions of the non-zero coefficients of s. Here, s_i stands for the i-th element of s. The recovery problem in CS is underdetermined because there are fewer measurements than unknowns (L < N). However, the problem can be solved by exploiting the sparsity of the signal to be recovered, since s belongs to the union of the subspaces spanned by Q canonical basis vectors. Although an l_0-optimization finds the sparsest solution, it requires a combinatorial search. Alternatively, the recovery problem of finding the coefficients of the Q basis vectors in the signal decomposition can be formulated as an l_1-optimization problem [3]. Besides, there are low-complexity greedy algorithms for sparse signal recovery, such as orthogonal matching pursuit (OMP) and compressive sampling matching pursuit (CoSaMP) [4].
OMP is a low-complexity iterative algorithm that finds the best matching projection, and it is frequently used for sparse signal recovery in CS because of its efficiency and effectiveness. At each step, OMP selects the column of the measurement matrix that is most closely correlated with the current residual vector, the index of the selected column is added to the estimated support set, and the residual vector is updated by projecting the observation onto the linear subspace spanned by the columns indexed by that set. Detailed procedures are provided in Algorithm 1.

Algorithm 1 OMP Algorithm
Input: y, Φ, and a stopping criterion
Output: ŝ ∈ R^{N×1} and Λ̂
Notation: q is an iteration index; ŝ and Λ̂ denote the estimated sparse signal and the estimated support set, respectively; r^(q) is the q-th residual vector.
Initialization: q = 0, Λ̂^(0) = ∅, and r^(0) = y
1: while the stopping criterion is not met (e.g., q < Q) do
2:   q ← q + 1
3:   i^(q) = arg max_{1≤k≤N} |⟨φ̃_k, r^(q−1)⟩|
4:   Λ̂^(q) = Λ̂^(q−1) ∪ {i^(q)}
5:   ŝ^(q) = arg min_{u: supp(u) ⊆ Λ̂^(q)} ‖y − Φu‖
6:   r^(q) = y − Φŝ^(q)
7: end while

Despite weaknesses such as poor reconstruction performance in the low signal-to-noise ratio (SNR) regime, OMP has been widely used owing to its strengths: simple procedures and easy, fast implementation with low complexity. In particular, it is suitable for CS recovery in wireless communications and data compression for the Internet of things (IoT), because IoT devices have limited hardware resources (e.g., small memory, low battery capacity) and low processing power [5], [6]. It is therefore important to investigate theoretical performance predictions of OMP for such applications, which is the primary motivation of this study.
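The procedure of Algorithm 1 can be sketched in Python/NumPy as follows. This is a minimal illustration using the stopping criterion q < Q; the function and variable names (omp, Phi) are our own and not from the paper.

```python
import numpy as np

def omp(Phi, y, Q):
    """Recover a Q-sparse vector from y = Phi @ s + n via OMP.

    Returns the estimated signal s_hat and the estimated support set."""
    L, N = Phi.shape
    r = y.copy()                      # r^(0) = y
    support = []                      # estimated support set, initially empty
    for _ in range(Q):                # stopping criterion: q < Q
        # Select the column most correlated with the current residual.
        correlations = np.abs(Phi.T @ r)
        correlations[support] = 0.0   # do not reselect chosen indices
        support.append(int(np.argmax(correlations)))
        # Project y onto the span of the selected columns (least squares).
        Phi_S = Phi[:, support]
        s_S, *_ = np.linalg.lstsq(Phi_S, y, rcond=None)
        r = y - Phi_S @ s_S           # update the residual
    s_hat = np.zeros(N)
    s_hat[support] = s_S
    return s_hat, set(support)
```

In the noiseless case with a moderately sparse signal, this sketch typically recovers the support exactly, which matches the well-posedness argument in Section I.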
The accuracy of support recovery is one of the most important issues in CS. Once the support set is correctly reconstructed, (1) becomes well-posed and provides an accurate estimate of s via the least squares approach [7]. Therefore, in some applications, the estimation of a sparse signal refers to the recovery of its support set. For example, the distributed detection problem can be solved using CS, and the estimation performance is directly related to the accuracy of support recovery [8]. Furthermore, the statistical performance of support recovery with a random measurement matrix is required in some applications. For example, in CS encryption, a measurement matrix used as a secret key can be randomly changed to enhance security (e.g., as a one-time pad) [9]-[11]. To evaluate the performance of such applications, the statistical recovery performance over the random ensemble should be analyzed.
Two features of a measurement matrix are used to analyze the support recovery of OMP. One is the restricted isometry property (RIP) and the other is the mutual coherence. For the RIP, the restricted isometry constant (RIC) of order Q is defined as the smallest constant δ_Q such that

(1 − δ_Q)‖s‖² ≤ ‖Φs‖² ≤ (1 + δ_Q)‖s‖² (2)

holds for every Q-sparse vector s. Several studies use the RIP to examine the support recovery performance of OMP [12]-[20], and most of them provide theoretical sufficient conditions for exact support recovery. In [14], a sufficient condition of the RIC for exact recovery in the noiseless case is given by δ_{Q+1} < 1/√Q. In addition, in the noisy case, when δ_{Q+1} < 1/√(Q+1), a sufficient condition on the minimum magnitude of the non-zero coefficients of s for exact support recovery is provided in [20] as

s_min > 2ε / (1 − √(Q+1) δ_{Q+1}), (3)

where ‖Φᵀn‖_∞ ≤ ε. However, as shown in [21], determining whether or not a measurement matrix satisfies the RIP is NP-hard. In addition, most of these analyses focus on deriving sufficient conditions for support recovery. On the other hand, mutual coherence-based analyses have been performed to derive upper bounds on the error probability of the support recovery of OMP [22], [23], where the mutual coherence is defined by

µ_max = max_{1≤i<j≤N} |⟨φ̃_i, φ̃_j⟩|, (4)

with φ̃_i denoting the normalized i-th column of Φ. However, these analyses leave a non-negligible gap between the derived theoretical bounds and empirical results. Besides, the information-theoretic limits of the sparsity recovery problem are investigated in [24].

In this paper, we derive an asymptotic upper bound on the recovery error probability of OMP in the high-SNR, large-system regime for a random ensemble. The proposed analysis is not affected by a single realization of a measurement matrix; rather, it depends on the statistical features of the random measurement matrix. We are motivated to provide an acceptable performance guarantee for identifying the support set of a sparse signal. To this end, our analysis is based on the statistical characteristics of a random measurement matrix in CS. The derivation can be used in applications such as CS encryption [9]-[11] and wireless communications [25]-[27], which can directly exploit sparsity given an acceptable prediction of the support recovery.
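For concreteness, the mutual coherence µ_max of a given matrix can be computed directly from the Gram matrix of its normalized columns. The following sketch is our own (the paper only defines µ_max); it also illustrates the paper's later observation that a Gaussian matrix with L = 128 and N = 256 has E[µ_max] ≈ 0.39, i.e., a non-negligible coherence.

```python
import numpy as np

def mutual_coherence(Phi):
    """Largest absolute inner product between distinct normalized columns."""
    G = Phi / np.linalg.norm(Phi, axis=0)   # unit-norm columns
    gram = np.abs(G.T @ G)
    np.fill_diagonal(gram, 0.0)             # ignore self-correlations
    return float(gram.max())

rng = np.random.default_rng(0)
L, N = 128, 256
Phi = rng.normal(0.0, 1.0 / np.sqrt(L), (L, N))
mu = mutual_coherence(Phi)  # for this size, typically around 0.39 per the paper
```

Such a coherence value is the reason the conventional coherence-based bound of Theorem 1 saturates at one for random Gaussian matrices of this size.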
Through numerical evaluations, it is demonstrated that the proposed theoretical analysis agrees more closely with the empirical results than the conventional mutual coherence-based analysis [23] in terms of the recovery error probability. Furthermore, it is shown that the proposed analysis becomes more accurate in the large-system, high-SNR regime.
Notation: Upper-case and lower-case boldface letters are used for matrices and vectors, respectively. Xᵀ denotes the transpose of X. For a matrix X (a vector x), X_m (x_m) represents the m-th row of X (the m-th element of x), and X_{n,m} represents the element in the n-th row and m-th column of X. The p-norm of a vector a is denoted by ‖a‖_p (if p = 2, the norm is denoted by ‖a‖ without the subscript). N(µ, R) and U(a, b) represent the Gaussian distribution with mean vector µ and covariance matrix R and the uniform distribution on the interval [a, b], respectively. ⟨x, y⟩ denotes the inner product of x and y.

II. SUPPORT RECOVERY ANALYSIS IN OMP
In this section, we first introduce an existing recovery analysis for a support set that utilizes the mutual coherence of a measurement matrix in OMP. Then, the proposed support recovery analysis is presented as the main contribution of this paper and compared with the conventional approach in [23].

A. CONVENTIONAL ANALYSIS OF SUPPORT RECOVERY
In previous studies [22], [23], the support recovery in OMP is analyzed using the mutual coherence of a measurement matrix, defined as the maximum absolute cross-correlation between its columns. The recent results on support recovery in [23] can be summarized in the following theorem, which provides good insight into coherence-based analysis.
Theorem 1 (Miandji et al. [23]): Consider a noisy CS measurement in (1) with n ∼ N(0, σ²I). Let λ = Pr{|⟨φ̃_j, n⟩| ≤ β} for some constant β ≥ 0 and j ∈ {1, 2, · · · , N}. If s_min ≥ β, then an upper bound on the recovery error probability is given by (5), where γ = µ_max s_max. Here, s_min and s_max are the minimum and maximum magnitudes of the non-zero coefficients of s, respectively. Moreover, λ is lower bounded as shown in [23]. In Theorem 1, the mutual coherence µ_max is used to bound the correlation between a column of the measurement matrix and a noisy measurement (for the proof of this theorem, see [23]). Theorem 1 updates the support recovery analysis derived in [22] by taking various parameters (L, N, Q, s_min, s_max, σ², and µ_max) into account and by improving the performance bound. However, it requires the condition s_min ≥ β to hold. In addition, there is a significant difference between the theoretical and empirical results, which makes it difficult to apply the analysis with (5) to actual applications. In particular, if a Gaussian measurement matrix with relatively high mutual coherence (e.g., E[µ_max] = 0.3899 with L = 128 and N = 256) is used, the upper bound on the error probability in (5) approaches one even when the other parameters are set favorably. Motivated by these problems, we present a new analysis of the support recovery in OMP, as follows.

B. PROPOSED ANALYSIS OF SUPPORT RECOVERY
In this subsection, we derive an asymptotic upper bound on the recovery error probability in the high-SNR, large-system regime with a random measurement matrix in OMP.
Theorem 2: Consider a noisy CS measurement in (1), where the measurement matrix Φ is a random matrix whose elements are independent and identically distributed random variables (e.g., Gaussian, Bernoulli) with zero mean and variance 1/L, s_{Λ(q)} ∼ U(s_min, s_max), and n ∼ N(0, σ²I). Here, Λ(q) denotes the index of the q-th non-zero element of s. Then, an asymptotic upper bound on the recovery error probability in the high-SNR, large-system regime is given by

Pr(error) ≲ ∫₀^∞ (N − Q) f_κ(z) F_κ(z)^{N−Q−1} [1 − (1 − F_η(z))^Q] dz,

where f_κ(z) = √(2/(πσ̃²_κ)) exp(−z²/(2σ̃²_κ)) and F_κ(z) = erf(z/(√2 σ̃_κ)) are the probability density function and cumulative distribution function of a half-normal random variable with parameter σ̃²_κ, F_η(z) = (1/2)[erf((z + s_min)/(√2 σ̃_η)) + erf((z − s_min)/(√2 σ̃_η))] is the cumulative distribution function of a folded normal random variable with parameters (s_min, σ̃²_η), σ̃²_η = s²_min(Q − 1)/L + σ², σ̃²_κ = (Q/L)(((s_max + s_min)/2)² + (1/12)(s_max − s_min)²) + σ², and erf(·) denotes the error function.

Proof of Theorem 2: The following two lemmas are used to prove Theorem 2; their proofs are presented in Appendices A and B, respectively.
Lemma 1: Let ε_{k,∅} = ⟨φ̃_k, Φs + n⟩ for k ∉ Λ and ε_{j,•} = ⟨φ̃_j, Φ_• s_• + n⟩ for j ∈ Λ, where Φ_• and s_• are restricted to Λ_• = Λ \ {j}. Then, the distributions of ε_{k,∅} and ε_{j,•} become N(0, σ̃²_κ) and N(0, σ̃²_η), respectively, in a large system.

Lemma 2: Pr(min_{j∈Λ} |⟨φ̃_j, Φs + n⟩| > min_{j∈Λ} |s_min + ε_{j,•}|) approaches zero in the large-system, high-SNR regime.
In OMP, a support set cannot be recovered exactly when

max_{k∉Λ} |⟨φ̃_k, r^(q)⟩| ≥ min_{j∈Λ} |⟨φ̃_j, r^(q)⟩|, (8)

where r^(q) is the residual vector in the q-th iteration and 1 ≤ k ≤ N hereinafter. Here, r^(1) = y and r^(q) = y − Φ̂^(q−1) ŝ^(q−1), where Φ̂^(q−1) and ŝ^(q−1) are the measurement matrix restricted to the estimated support set and the estimated sparse vector in the (q − 1)-th iteration, respectively. However, it is difficult to derive the probability of this event because the residual vector is computed recursively. Alternatively, in [22], [23], a recovery error event is approximated as

max_{k∉Λ} |⟨φ̃_k, y⟩| ≥ min_{j∈Λ} |⟨φ̃_j, y⟩|. (9)

The error event of (9) is defined using the measurement vector y in [22], [23], while the indices of the support set are found using the residual vectors in OMP. Using the measurement vector instead of the residual vectors is unfavorable for the support recovery, since the residual updates reduce the interference caused by the superposition of the other atoms of the support set in y. However, the probability of (9) can be used as an asymptotic upper bound on the recovery error probability (i.e., the probability of the error event (8) over Q iterations) because this interference becomes negligibly small for large L. In addition, since Pr(min_{j∈Λ} |⟨φ̃_j, Φs + n⟩| > min_{j∈Λ} |s_min + ε_{j,•}|) approaches zero for large L and high SNR (from Lemma 2), the recovery error event in (9) can be modified to easily derive an upper bound on the recovery error probability in the large-system, high-SNR regime as follows:

min_{j∈Λ} |⟨φ̃_j, s_min φ_j + Φ_• s_• + n⟩| ≤ max_{k∉Λ} |⟨φ̃_k, Φs + n⟩|. (10)

The term on the left-hand side of (10) can be rewritten as

⟨φ̃_j, s_min φ_j + Φ_• s_• + n⟩ = s_min ‖φ_j‖ + ε_{j,•} ≈ s_min + ε_{j,•}, (11)

since ‖φ_j‖ → 1 for large L. Then, from (10), (11), and Lemma 1, we can rewrite the recovery error event as follows:

min_{j∈Λ} |s_min + ε_{j,•}| ≤ max_{k∉Λ} |ε_{k,∅}|. (12)

For convenience, let η = |s_min + ε_{j,•}| and κ = |ε_{k,∅}|. Thus, to obtain the recovery error probability, the distributions of η and κ have to be derived. From Lemma 1, the distribution of ε_{k,∅} becomes N(0, σ̃²_κ) in a large system.
Thus, κ follows the half-normal distribution, whose probability density function and cumulative distribution function can be approximated by

f_κ(x) = √(2/(πσ̃²_κ)) exp(−x²/(2σ̃²_κ)) (13)

and

F_κ(x) = erf(x/(√2 σ̃_κ)). (14)

Then, the mean and variance of κ are given by, respectively,

µ_κ = σ̃_κ √(2/π) (15)

and

σ²_κ = σ̃²_κ (1 − 2/π). (16)

In the same manner, from Lemma 1, the distribution of s_min + ε_{j,•} becomes a Gaussian distribution with mean s_min and variance σ̃²_η in a large system (i.e., for large L). Thus, η follows the folded normal distribution, whose probability density function and cumulative distribution function can be approximated by

f_η(x) = (1/√(2πσ̃²_η)) [exp(−(x − s_min)²/(2σ̃²_η)) + exp(−(x + s_min)²/(2σ̃²_η))] (17)

and

F_η(x) = (1/2) [erf((x + s_min)/(√2 σ̃_η)) + erf((x − s_min)/(√2 σ̃_η))]. (18)

Then, the mean and variance of η are given by, respectively,

µ_η = σ̃_η √(2/π) exp(−s²_min/(2σ̃²_η)) + s_min erf(s_min/(√2 σ̃_η)) (19)

and

σ²_η = s²_min + σ̃²_η − µ²_η. (20)

Let Δ = min_{j∈Λ} η − max_{k∉Λ} κ. Then, according to order statistics [28], the probability density function of max_{k∉Λ} κ is (N − Q) f_κ(z) F_κ(z)^{N−Q−1}. In addition, the cumulative distribution function of min_{j∈Λ} η is given by 1 − (1 − F_η(z))^Q, where |Λ| = Q. From these distributions, an asymptotic upper bound on the recovery error probability in the high-SNR regime can be approximated as

Pr(Δ ≤ 0) = ∫₀^∞ (N − Q) f_κ(z) F_κ(z)^{N−Q−1} [1 − (1 − F_η(z))^Q] dz. (21)

Then, our proof can be completed by substituting (13), (14), and (18) into (21).
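Since the bound in (21) is an integral with no closed form, it can be evaluated numerically. The following Python sketch (stdlib only, with our own function name and illustrative parameters) assembles σ̃²_η, σ̃²_κ, the half-normal and folded-normal distribution functions from (13), (14), and (18), and a trapezoidal approximation of (21).

```python
from math import erf, exp, pi, sqrt

def recovery_error_bound(L, N, Q, s_min, s_max, sigma2, grid=20000):
    """Numerically evaluate the integral upper bound (21) on the
    OMP support recovery error probability."""
    var_eta = s_min ** 2 * (Q - 1) / L + sigma2
    var_kappa = (Q / L) * (((s_max + s_min) / 2) ** 2
                           + (s_max - s_min) ** 2 / 12) + sigma2
    sk, se = sqrt(var_kappa), sqrt(var_eta)

    def f_kappa(z):   # half-normal pdf, eq. (13)
        return sqrt(2 / (pi * var_kappa)) * exp(-z * z / (2 * var_kappa))

    def F_kappa(z):   # half-normal cdf, eq. (14)
        return erf(z / (sk * sqrt(2)))

    def F_eta(z):     # folded-normal cdf, eq. (18)
        return 0.5 * (erf((z + s_min) / (se * sqrt(2)))
                      + erf((z - s_min) / (se * sqrt(2))))

    def integrand(z):  # pdf of max kappa times cdf of min eta
        return ((N - Q) * f_kappa(z) * F_kappa(z) ** (N - Q - 1)
                * (1 - (1 - F_eta(z)) ** Q))

    z_max = 10 * max(sk, se + s_min)  # integrand is negligible beyond this
    h = z_max / grid
    total = 0.5 * (integrand(0.0) + integrand(z_max))   # trapezoidal rule
    for i in range(1, grid):
        total += integrand(i * h)
    return min(total * h, 1.0)
```

As discussed in Section II-C, the integrand vanishes for large z, so a finite upper limit suffices; the bound also behaves as expected, shrinking with larger L and growing with larger σ².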

C. DISCUSSIONS ON ANALYSIS
In this subsection, detailed descriptions for the analysis of the support recovery are presented with a particular focus on the features of Theorem 2.
Correlation is an important factor in the support recovery of OMP, since the index with the highest correlation coefficient between the measurement vector y and a column of the measurement matrix Φ is selected in each iteration. Note that η and κ are the correlation magnitudes for the correct and incorrect supports, associated with j ∈ Λ and k ∉ Λ, respectively. Then, from (13) and (17), it is shown that η follows a different distribution from κ. In detail, η is the absolute value of a Gaussian random variable with mean s_min and variance s²_min(Q − 1)/L + σ², while κ is the absolute value of a zero-mean Gaussian random variable with variance (Q/L)(((s_max + s_min)/2)² + (1/12)(s_max − s_min)²) + σ². Intuitively, as s_min and L increase (or as Q and σ² decrease), the recovery error probability decreases, because Pr(η > κ) grows with a large |µ_η − µ_κ| and small σ²_η and σ²_κ. The upper bound on the recovery error probability derived in (21) is an integral expression. Although it is difficult to transform the upper bound into a closed-form expression because of the error function, the stability of the integral can be ensured: in (21), as z → ∞, the integrand approaches zero because f_κ(∞) = 0, F_κ(∞) = 1, and F_η(∞) = 1. Therefore, with a sufficiently large number for the integral upper limit, a numerical technique can be used to perform the integral in (21).
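The variance expression for the incorrect-support correlation can be checked by Monte Carlo simulation. The sketch below (our own, with illustrative parameter values) draws the Q support columns, uniform non-zero coefficients, and noise, and compares the empirical variance of the off-support correlation ⟨φ̃_k, Φs + n⟩ with σ̃²_κ.

```python
import numpy as np

rng = np.random.default_rng(1)
L, Q = 256, 8
s_min, s_max, sigma = 0.5, 1.0, 0.1
trials = 20000

samples = np.empty(trials)
for t in range(trials):
    # Q support columns of an i.i.d. N(0, 1/L) measurement matrix.
    Phi_support = rng.normal(0.0, 1.0 / np.sqrt(L), (L, Q))
    s_support = rng.uniform(s_min, s_max, Q)   # non-zero coefficients
    n = rng.normal(0.0, sigma, L)              # measurement noise
    phi_k = rng.normal(0.0, 1.0 / np.sqrt(L), L)
    phi_k /= np.linalg.norm(phi_k)             # normalized off-support column
    samples[t] = phi_k @ (Phi_support @ s_support + n)

# Predicted variance: (Q/L) * E[s^2] + sigma^2, with E[s^2] written as
# mean^2 + variance of U(s_min, s_max), matching the paper's expression.
var_pred = (Q / L) * (((s_max + s_min) / 2) ** 2
                      + (s_max - s_min) ** 2 / 12) + sigma ** 2
```

The empirical mean of the samples should be near zero and the empirical variance near var_pred, consistent with the CLT argument of Lemma 1.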
Let us compare Theorems 1 and 2 theoretically. A recovery error in CS is defined as Λ̂ ≠ Λ, where Λ̂ is the estimated support set. Instead of Λ̂ ≠ Λ, the recovery error event in (9) is alternatively used to derive an upper bound on the recovery error probability in both analyses. In addition, the same parameters (i.e., L, N, Q, s_min, s_max, σ²) are considered in both analyses. However, statistical features of a random measurement matrix are used to derive the upper bound in the proposed derivation, while the conventional derivation in [23] is based on the mutual coherence of a measurement matrix (i.e., µ_max). Above all, Theorem 2 ensures a significantly improved bound compared to the conventional bound of Theorem 1 (a detailed performance evaluation is presented in Section III).

III. NUMERICAL RESULTS
In this section, we present numerical results to compare the theoretical results of Theorems 1 and 2 with empirical results. For the simulations, it is assumed that s_min and s_max are included in the non-zero coefficient set of s, the remaining non-zero coefficients follow a uniform distribution on the interval [s_min, s_max], and the elements of the measurement matrix follow the Gaussian distribution (i.e., Φ_{l,n} ∼ N(0, 1/L)). The SNR is defined with respect to the non-zero coefficients s_j for j ∈ Λ and the noise variance σ². The empirical results for the recovery error probability are divided into two cases: 1) empirical results for Λ̂ ≠ Λ obtained by the OMP algorithm, and 2) empirical results for the alternative recovery error event in (9) used to derive Theorems 1 and 2. As mentioned in Section II, the second case has a larger recovery error probability than the first. Meanwhile, since the mutual coherence of a random measurement matrix is relatively high (e.g., E[µ_max] = 0.3899 when L = 128), all theoretical results of Theorem 1 converge to one in the following simulation environment; even if σ² → 0 (which results in SNR → ∞), the upper bound on the recovery error probability in (5) remains close to one.

From the following figures, we can find an improved performance guarantee from Theorem 2 over various parameters (i.e., L, N, Q, σ², s_min, and s_max) in terms of the recovery error probability. Figs. 2 and 3 show that as the SNR and the sparsity, defined as ϕ = (N − Q)/Q, increase (i.e., as Q decreases), the recovery error probability decreases. In addition, comparing Fig. 2 with Fig. 3 shows that Theorem 2 yields theoretical results much closer to the empirical results for large L, because it is easier to discriminate between j ∈ Λ and k ∉ Λ with a large column dimension of Φ in the recovery error event defined in (9). From Fig. 4(a), we can clearly find that the prediction accuracy of Theorem 2 is significantly improved in a large system (i.e., large L and N). Furthermore, as seen in Fig. 4(b), the theoretical results of Theorem 2 are sufficiently close to the empirical results at high sparsity (i.e., small Q), because the superposition interference in the measurement vector in (9) becomes small for small Q. The ratio of s_min to s_max, i.e., ζ = s_min/s_max, is also a crucial parameter that influences the recovery error probability. If ζ is small (e.g., 0.1 ∼ 0.5), the recovery performance of OMP deteriorates significantly; as shown in (3), it is difficult to obtain a moderate recovery performance with a low ζ. Hence, we deal with the recovery performance for a high ζ. From Fig. 5, we can find that a large ζ, which is maintained to obtain a reasonable recovery performance in CS, is favorable for the theoretical results of Theorem 2. Furthermore, Fig. 6 shows the gap between the theoretical and empirical results (i.e., the accuracy of Theorem 2) over various undersampling (L/N) and relative sparsity (Q/L) ratios when ζ = 0.7 and SNR = 18 dB. From the figure, it is found that Theorem 2 is valid for a large undersampling ratio and a small relative sparsity ratio (i.e., a large system).
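The alternative error event (9), whose probability Theorem 2 bounds, can also be estimated empirically. The following sketch (our own, with illustrative parameters rather than the exact figure settings of the paper) counts how often the largest off-support correlation exceeds the smallest on-support correlation.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N, Q = 128, 256, 4
s_min, s_max, sigma = 0.7, 1.0, 0.05
trials = 500

errors = 0
for _ in range(trials):
    Phi = rng.normal(0.0, 1.0 / np.sqrt(L), (L, N))
    support = rng.choice(N, size=Q, replace=False)
    s = np.zeros(N)
    s[support] = rng.uniform(s_min, s_max, Q)
    y = Phi @ s + rng.normal(0.0, sigma, L)
    # Correlations of the noisy measurement with normalized columns.
    corr = np.abs((Phi / np.linalg.norm(Phi, axis=0)).T @ y)
    off = np.setdiff1d(np.arange(N), support)
    # Error event (9): an off-support column beats the weakest support column.
    if corr[off].max() >= corr[support].min():
        errors += 1

p_event = errors / trials   # empirical probability of the event in (9)
```

As noted in Section II, this event probability upper-bounds the actual OMP support recovery error probability in the large-system regime, so it is the quantity to compare against the bound of Theorem 2.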

IV. CONCLUSION
In this paper, we analyzed support recovery with a random measurement matrix in OMP. An asymptotic upper bound on the recovery error probability for the random ensemble was derived using statistical features of a random measurement matrix. The upper bound ensures a more precise performance prediction than the conventional analysis, without any required condition. From the numerical results, we showed that the upper bound is sufficiently close to the empirical results, which is desirable for its applicability. Furthermore, for a large system (i.e., large L and N), the gap between the theoretical and empirical results becomes negligibly small.

APPENDIX A
PROOF OF LEMMA 1
Note that s_• is supported on Λ_•, which has the Q − 1 indices of Λ excluding j. Then, as for ε_{k,∅}, the distribution of ε_{j,•} becomes N(0, σ̃²_η) due to the central limit theorem (CLT) for large L and Q, where σ̃²_η = s²_min(Q − 1)/L + σ².

APPENDIX B
PROOF OF LEMMA 2
To show that Pr(min_{j∈Λ} |⟨φ̃_j, Φs + n⟩| > min_{j∈Λ} |s_min + ε_{j,•}|) approaches zero in the large-system, high-SNR regime, Pr(|⟨φ̃_j, Φs + n⟩| > |s_min + ε_{j,•}|) is alternatively analyzed. To this end, consider the decomposition ⟨φ̃_j, Φs + n⟩ = s_j‖φ_j‖ + φ̃ᵀ_j Φ_• s_• + φ̃ᵀ_j n. For convenience, let ε̃_{j,•} = φ̃ᵀ_j Φ_• s_• + φ̃ᵀ_j n. From Lemma 1, the distribution of s_min + ε_{j,•} becomes N(s_min, σ̃²_η) in a large system. Then, Pr(s_min + ε_{j,•} < 0) approaches zero in the large-system, high-SNR regime because s_min > 0 and σ̃²_η (= s²_min(Q − 1)/L + σ²) → 0. In the same manner as for ε_{k,∅} and ε_{j,•}, the distribution of ε̃_{j,•} also becomes Gaussian for large L, and the distribution of s_j + ε̃_{j,•} becomes N(s_j, σ̃²). Then, in the large-system, high-SNR regime, Pr(s_j + ε̃_{j,•} < 0) approaches zero, because Pr(s_j + ε̃_{j,•} < 0) < Pr(s_min + ε_{j,•} < 0) and σ̃² → 0 when L → ∞ and σ² → 0. It can then be shown that the probability of interest is bounded in terms of erf(·) as in (27), where α = s_max − s_min. For convenience, let a = α/(√2 σ̃_s). When L → ∞, σ̃²_s approaches zero; furthermore, a = ∞, b = −∞, erf(a) = 1, and exp(−a²) = 0 when σ̃²_s = 0. Then, from (27), the probability approaches zero in the large-system, high-SNR regime, which completes the proof.