Incoherent and Robust Projection Matrix Design Based on Equiangular Tight Frame

Designing a projection matrix to optimally select the informative samples from high-dimensional data is a challenging task. Several approaches have been proposed for this task; however, conventional methods obtain the projection matrix from the corresponding Gram matrix without considering the underlying structure of the equiangular frame. This study proposes a framework to optimize the projections based on an equiangular tight frame (ETF), which is in turn constructed from the target Gram matrix. The proposed work optimizes the projection matrix by restricting the eigenvalues of the corresponding Gram matrix to ensure reduced pairwise correlation and tightness of the frame. Additionally, an $\ell_{2,1}$-norm based regularization term and a projection matrix energy constraint are incorporated to reduce the effect of outliers and noisy data. This unified optimization problem results in an incoherent and robust projection matrix. Experiments are performed on synthetic data as well as real images. The performance evaluation is carried out in terms of mutual coherence, signal reconstruction accuracy, and peak signal-to-noise ratio (PSNR). The results show that the sensing error constraint enables the design of optimized projections, especially when the signals are noisy and not exactly sparse, which is the case in real-world scenarios.


I. INTRODUCTION
In many areas of science and technology, one of the key tasks is to deduce quantities of interest from the measured or sensed data. For instance, in signal and image processing applications, we would like to obtain the signal of interest from the observed measurements. In mathematical terms, the measurement vector $y \in \mathbb{R}^{M \times 1}$ can be related to the signal of interest $x \in \mathbb{R}^{N \times 1}$ via a projection matrix $\Phi \in \mathbb{R}^{M \times N}$ as:
$$ y = \Phi x. \tag{1} $$
Conventional techniques suggest that the number of measurements $M$ must be at least as large as the length of the underlying signal. This concept is related to the Nyquist-Shannon sampling theorem. However, using compressed sensing (CS) techniques, it is possible to recover signals of length $N$ with the number of measurements being much smaller than $N$, provided the signal is sparse [1]-[6]. In the CS framework, we can reconstruct the signal $x$ from the measurement $y$ assuming the relation in (1) with $M \ll N$.
The associate editor coordinating the review of this manuscript and approving it for publication was Larbi Boubchir.
This under-determined system has an infinite number of solutions for x and the sparsity constraint enables us to determine a unique solution.
A signal is said to be sparse if a large number of its values are zero or can be discarded without significant loss of information. Many real signals are compressible and can be approximated using sparse signals either directly or after application of an appropriate transform. If $x$ is a real compressible signal, then it can be expressed as:
$$ x = \Psi\theta + e, \tag{2} $$
where $\Psi = [\psi_1, \ldots, \psi_L] \in \mathbb{R}^{N \times L}$ is the (sparsifying) transform basis, $\theta \in \mathbb{R}^{L}$ is the vector of sparse coefficients, and $e \in \mathbb{R}^{N}$ is the sparse encoding error (SEE). If $e = 0$, then the signal $x$ is said to be exactly sparse in $\Psi$ with sparsity level $s = \|\theta\|_0$, whereas if $e$ is non-zero but has relatively low energy, then $x$ is said to be approximately $s$-sparse with $s = \|\theta\|_0$. Here $\|\theta\|_0$ denotes the number of non-zero elements in $\theta$, and $0$ represents a vector with all the elements being zero.
A well-approximated representation of the signal $x$ requires appropriately designed projections. From (1) and (2), $y = \Phi\Psi\theta + \Phi e$, where $\Phi$ is the projection (or sensing$^{1}$) matrix, $\Phi e$ is the projected noise as a result of the SEE, and $D = \Phi\Psi$ is the equivalent dictionary of the CS system. Let $D \in \mathbb{R}^{M \times L}$ be a matrix with unit norm columns $d_1, d_2, \ldots, d_L$, also known as atoms of the frame $D$. Recovery of the underlying signal $x$ requires the equivalent dictionary $D$ to be as incoherent as possible. The coherence measure $\mu$ is given by:
$$ \mu(D) = \max_{1 \le i \ne j \le L} |\langle d_i, d_j \rangle|, \tag{3} $$
which refers to the largest (or worst-case) absolute correlation between any two element vectors of $D$ [7]-[9]. The Gram matrix of $D$ is defined as $G = D^T D$, which is a positive semidefinite and symmetric matrix with unit diagonal elements and rank $M$. The off-diagonal elements of $G$ correspond to the inner products of distinct atoms of $D$. To recover the signal from under-sampled measurements, CS requires sparsity of $x$ and incoherence of $D$.
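As a minimal sketch of the quantities defined above, the following NumPy snippet builds an equivalent dictionary from a random projection matrix and a random stand-in for the transform basis, and evaluates its Gram matrix and mutual coherence. The function names, the seed, and the Gaussian stand-ins are illustrative assumptions and are not taken from the paper; the sizes are borrowed from the experiments section.

```python
import numpy as np

def normalize_columns(D):
    """Scale every column (atom) of D to unit l2-norm."""
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def gram_matrix(D):
    """Gram matrix G = D^T D of a real dictionary D."""
    return D.T @ D

def mutual_coherence(D):
    """Largest absolute inner product between two distinct unit-norm atoms."""
    G = np.abs(gram_matrix(D))
    np.fill_diagonal(G, 0.0)          # ignore the unit diagonal
    return float(G.max())

rng = np.random.default_rng(0)        # illustrative seed
M, N, L = 20, 60, 120                 # sizes borrowed from the experiments section
Phi = rng.standard_normal((M, N))     # random projection matrix (assumption)
Psi = rng.standard_normal((N, L))     # random stand-in for a learned basis (assumption)
D = normalize_columns(Phi @ Psi)      # equivalent dictionary with unit-norm atoms
G = gram_matrix(D)
mu = mutual_coherence(D)
```

For an orthonormal set of atoms the coherence is zero; for an over-complete dictionary with $L > M$ it is strictly positive.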
Assuming $y$ and $D$ are known, $x$ can be obtained by solving:
$$ \min_{\theta} \|\theta\|_0 \quad \text{subject to} \quad y = D\theta, \tag{4} $$
which is NP-hard. Greedy algorithms, such as orthogonal matching pursuit (OMP) [10], matching pursuit (MP) [11], generalized OMP [12], and others [13], can be employed to solve (4) for $\theta$ under certain conditions (with theoretical guarantees) and recover $x$. For a fixed transform basis $\Psi$, the projection matrix $\Phi$ is a key determinant in the performance of the recovery algorithms, both convex relaxation-based and greedy. Optimized projections allow improved signal recovery performance from under-sampled measurements [14]. If $y$ is captured using the frame $D$, the signal $x$ with sparsity $s$ can be recovered by solving (4), provided the following constraint is satisfied:
$$ s < \frac{1}{2}\left(1 + \frac{1}{\mu(D)}\right). \tag{5} $$
The largest absolute value of the pairwise correlation is lower bounded by the Welch bound $\mu_W$ [15], which, when the signal dimension $N \le M^2$, is given by:
$$ \mu_W = \sqrt{\frac{L - M}{M(L - 1)}}. \tag{6} $$
The $t$-averaged mutual coherence $\mu_t$ [14], for a given coherence threshold $t$, is an alternative metric for evaluating the recovery performance of the projection matrix and is given by:
$$ \mu_t(D) = \frac{\sum_{i \ne j} \mathbb{1}(|g_{ij}| \ge t)\,|g_{ij}|}{\sum_{i \ne j} \mathbb{1}(|g_{ij}| \ge t)}, \tag{7} $$
where $g_{ij}$ denotes the $(i,j)$th element of $G$ and $\mathbb{1}(\cdot)$ is the indicator function.
$^{1}$ We use the terms projection matrix and sensing matrix interchangeably.
These coherence measures will be used as the performance metrics for evaluating the efficacy of the optimized projections.
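The coherence metrics above can be sketched in a few lines of NumPy. The snippet assumes the standard forms of the Welch bound, of the $t$-averaged mutual coherence (mean of the off-diagonal Gram magnitudes that are at least $t$), and of the coherence-based sparsity condition; the helper names and the random dictionary are hypothetical.

```python
import numpy as np

def welch_bound(M, L):
    """Lower bound on the mutual coherence of L unit-norm atoms in M dimensions."""
    return np.sqrt((L - M) / (M * (L - 1)))

def t_averaged_coherence(D, t):
    """Mean of the off-diagonal Gram magnitudes that are at least t."""
    G = np.abs(D.T @ D)
    off_diag = G[~np.eye(G.shape[0], dtype=bool)]
    selected = off_diag[off_diag >= t]
    return float(selected.mean()) if selected.size else 0.0

def max_sparsity(mu):
    """Largest integer s satisfying s < (1 + 1/mu) / 2."""
    bound = 0.5 * (1.0 + 1.0 / mu)
    return int(np.ceil(bound)) - 1

rng = np.random.default_rng(0)                    # illustrative seed
M, L = 20, 120
D = rng.standard_normal((M, L))                   # random dictionary (assumption)
D /= np.linalg.norm(D, axis=0, keepdims=True)     # unit-norm atoms

mu_w = welch_bound(M, L)
mu_t = t_averaged_coherence(D, mu_w)              # threshold t set to the Welch bound
mu_max = float(np.abs(D.T @ D - np.eye(L)).max()) # mutual coherence of D
```

By construction the $t$-averaged value lies between the threshold $t$ and the worst-case coherence, which is why it is less pessimistic than the mutual coherence alone.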
The restricted isometry property (RIP) and the spark of the projection matrix can be effective in evaluating the effect of optimized projections [3], [6], [9], [16]. The restricted isometry constant (RIC) $\delta_s \in (0, 1)$ is the smallest value of $\delta$ that satisfies [8]:
$$ (1 - \delta)\|\theta\|_2^2 \le \|D\theta\|_2^2 \le (1 + \delta)\|\theta\|_2^2, \tag{8} $$
where $\theta \in \mathbb{R}^{L}$ is an arbitrary $s$-sparse signal. The spark of a matrix is defined as the smallest number of linearly dependent columns. The projection matrix plays two key roles. Firstly, it determines how close the sparsest solution obtained using (4) would be to the original signal. For a noise-free CS system, the RIC $\delta_{2s} < 1$ guarantees that the solution of (4) will equal the original signal [4]. The condition for successful recovery is expressed as $\mu < \frac{1}{2s-1}$, or $\mu(s) + \mu(s-1) < 1$ [17]. Secondly, the projection matrix affects the recovery performance: an RIC satisfying $\delta_s < \frac{\sqrt{4s+1}-1}{2s}$ ensures the successful recovery of a sparse signal using OMP when there is no measurement noise [18].
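Since OMP is the recovery algorithm used throughout, a compact textbook-style sketch may help fix ideas. It is a generic greedy implementation written for illustration, not the code used in the experiments; the function name, the random dictionary, and the test signal are assumptions.

```python
import numpy as np

def omp(D, y, s):
    """Greedy OMP sketch: returns a coefficient vector with at most s non-zeros."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(s):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # least-squares fit of y on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)                   # illustrative seed
D = rng.standard_normal((20, 40))                # random dictionary (assumption)
D /= np.linalg.norm(D, axis=0, keepdims=True)
theta_true = np.zeros(40)
theta_true[[3, 17]] = [1.5, -2.0]                # hypothetical 2-sparse signal
y = D @ theta_true
theta_hat = omp(D, y, s=2)
recovery_error = np.linalg.norm(theta_hat - theta_true)
```

Whether `recovery_error` is zero depends on the coherence of `D`; the conditions quoted above describe when exact recovery is guaranteed.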
In this paper, we apply concepts from frame theory for designing incoherent tight frames. A frame $D$ in a real or complex $M$-dimensional Hilbert space $\mathcal{H}^M$ can be represented as a sequence of $L$ ($\ge M$) vectors $\{d_k\}_{k=1}^{L}$, $d_k \in \mathcal{H}^M$, satisfying the generalized Parseval condition [19], [20]:
$$ \alpha \|v\|_2^2 \le \sum_{k=1}^{L} |\langle v, d_k \rangle|^2 \le \beta \|v\|_2^2 \quad \forall v \in \mathcal{H}^M, \tag{9} $$
where $\alpha$ and $\beta$ are positive constants. If all the frame vectors have unit $\ell_2$-norm, i.e., $\|d_k\|_2 = 1\ \forall k$, the frame is known as a unit norm frame (UNF). A frame is an equiangular frame if there exists a constant $c \in (0, 1)$ such that $|\langle d_i, d_j \rangle| = c$ for all $i \ne j$. A frame with the smallest maximum correlation among all UNFs of size $M \times L$ in a finite dimension is said to be a Grassmannian frame [21]. The measure of over-completeness or redundancy of a frame $D$ is given by $\rho = L/M$ [21]. If $\alpha = \beta$, then $D$ is said to be an $\alpha$-tight frame, which satisfies the relation $\sum_{k=1}^{L} \|d_k\|_2^2 = \alpha M$. In CS, the Welch bound for mutual coherence can be achieved with an equiangular tight frame (ETF), as it has low averaged mutual coherence, which is essential for the minimal dependency property. An equiangular tight frame is a unit-norm tight frame (UNTF) in which each pair of column vectors has the same absolute inner product [21].
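A small worked example makes these definitions concrete. The three unit vectors in the plane separated by 120 degrees (often called the Mercedes-Benz frame, a standard example that is not discussed in the paper) form an ETF with $M = 2$ and $L = 3$; the sketch below checks tightness, equiangularity, and the Welch bound numerically.

```python
import numpy as np

# Three unit vectors in R^2 spaced by 120 degrees (columns of D)
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
D = np.vstack([np.cos(angles), np.sin(angles)])      # shape (M, L) = (2, 3)
M, L = D.shape

frame_operator = D @ D.T                             # equals alpha * I for a tight frame
alpha = L / M                                        # tightness constant of a unit-norm tight frame
G = D.T @ D                                          # Gram matrix
off_diag = np.abs(G[~np.eye(L, dtype=bool)])         # pairwise |<d_i, d_j>|, i != j
welch = np.sqrt((L - M) / (M * (L - 1)))             # Welch bound for (M, L)
energy = np.sum(np.linalg.norm(D, axis=0) ** 2)      # sum of squared atom norms = alpha * M
```

Here every off-diagonal Gram magnitude equals the Welch bound, which is exactly the property that the frame design in Section III tries to approach for larger $M$ and $L$.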

A. OUR CONTRIBUTION
VOLUME 9, 2021
We propose an incoherent and robust projection matrix design (IR-PMD) method. In IR-PMD, an alternating minimization approach is used for constructing an ETF and designing a robust projection matrix using the designed frame, based on an $\ell_{2,1}$-norm regularization and a weighted penalty term. Our main contributions are as follows:
• We propose a method for constructing an ETF based on structural and spectral constraints, which yields reduced mutual coherence and improved tightness by restricting the eigenvalues of the Gram matrix to achieve a smaller RIC.
• We formulate the projection matrix design as an $\ell_{2,1}$-norm based optimization problem with a sensing matrix energy constraint. The $\ell_{2,1}$-norm not only ensures robustness to outliers and noisy data but also promotes row sparsity in order to identify the informative samples. To the best of our knowledge, this is the first instance of the $\ell_{2,1}$-norm being employed for projection matrix design.
• An adaptive weighted penalty term is introduced to minimize the sensing matrix energy together with the error term for optimizing the projection matrix.
• We present an alternating minimization algorithm to optimize the ETF and design a robust projection matrix using the designed ETF.
We will demonstrate the performance of the proposed approach through a set of experiments on synthetic data and real images. The visual recovery results will be shown for some standard images.

B. OUTLINE OF THE PAPER
Section II describes related work in the area of optimized projection matrix design. In Section III, the design of an ETF based on the structural and spectral constraints is presented. Section IV describes the proposed robust projection matrix design approach. We discuss the sparse encoding error, followed by the proposed framework for designing optimized projections using the target frame designed with the method in Section III. In Section V, we present simulation results using synthetic data and real images to demonstrate the performance of the proposed method and compare it with existing approaches. Concluding remarks are presented in Section VI.

C. NOTATIONS
Throughout this paper, lowercase letters denote scalars, lowercase bold letters denote vectors, and uppercase bold letters denote matrices. $A^T$, $A^{-1}$, and $A^*$ denote the transpose, inverse, and conjugate transpose of $A$, respectively, and $\mathrm{diag}[a]$ denotes a diagonal matrix with the vector $a$ as its diagonal elements. The $\ell_p$-norm is denoted by $\|\cdot\|_p$. For a matrix $A$, $a_{ij}$ denotes the element at the intersection of the $i$th row and $j$th column, $a_i$ denotes the $i$th column, and $a^j$ denotes the $j$th row of $A$. The Frobenius norm of $A$ is $\|A\|_F = \sqrt{\sum_j \|a_j\|_2^2}$, the row-wise $\ell_{2,1}$-norm is given by $\|A\|_{2,1} = \sum_i \sqrt{\sum_j |a_{ij}|^2}$, and the spectral norm is defined as the square root of the largest eigenvalue $\lambda_{\max}$ of the positive semi-definite matrix $A^T A$. $[N]$ denotes the set $\{1, 2, \ldots, N\}$, $\mathrm{card}(P)$ represents the cardinality of the set $P$, and $A_P$ is the sub-matrix of $A$ containing the columns indexed by $P$. The support of $x \in \mathbb{R}^{N}$ is the index set of its non-zero elements, i.e., $\mathrm{supp}(x) = \{j \in [N] : x_j \ne 0\}$.
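The three matrix norms in this notation can be written directly in NumPy. The sketch below follows the definitions as stated (Frobenius norm, row-wise $\ell_{2,1}$-norm, and spectral norm via the largest eigenvalue of $A^T A$); the function names and the small example matrix are illustrative.

```python
import numpy as np

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return float(np.sqrt(np.sum(np.abs(A) ** 2)))

def l21_norm(A):
    """Sum of the l2-norms of the rows of A."""
    return float(np.sum(np.sqrt(np.sum(np.abs(A) ** 2, axis=1))))

def spectral_norm(A):
    """Square root of the largest eigenvalue of A^T A."""
    lam_max = np.linalg.eigvalsh(A.T @ A).max()
    return float(np.sqrt(max(lam_max, 0.0)))

A = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])       # row norms are 5, 0 and 1

fro = frobenius_norm(A)          # sqrt(9 + 16 + 1)
l21 = l21_norm(A)                # 5 + 0 + 1
spec = spectral_norm(A)
```

Because row norms enter the $\ell_{2,1}$-norm linearly rather than squared, a single row with a large error contributes less, relative to the other rows, than it does in the squared Frobenius norm.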

II. RELATED WORK
Next we discuss some of the key frameworks in the literature for the design of an ETF and optimization of the corresponding projection matrix. Tropp et al. [22] proposed a general alternating projection method that is capable of solving a large class of inverse eigenvalue problems, which includes the construction of tight frames. For constructing a tight frame, they solve a matrix nearness problem based on the application requirements. In [23], the tight frames are constructed based on mutual coherence. The achieved mutual coherence is found to be close to the lower bound when the frame redundancy is not very high, and the method can be employed for constructing frames of any dimension. In this paper, in addition to mutual coherence, we focus on restricting the eigenvalues of the Gram matrix in order to reduce the RIC, which in turn ensures tightness.
Elad [14] initiated the work on projection matrix optimization based on the averaged mutual coherence measure using a shrinkage scheme. Since the mutual coherence (3) considers the worst-case correlation, it is not representative of the average reconstruction performance, and the $t$-averaged mutual coherence is more appropriate. However, this shrinkage scheme is computationally intensive and introduces a few large off-diagonal elements in the Gram matrix. As a result, the worst-case guarantees for the recovery algorithms do not hold. To address the issue of large off-diagonal elements and to ensure improved tightness of the frame, we apply a shrinkage function followed by spectral constraints. Xu et al. [24] proposed a shrinkage function that projects the Gram matrix onto a non-empty convex set and reduces the off-diagonal elements towards the Welch bound.
The authors in [25] proposed a method for designing $D$ by making a subset of its columns orthogonal, which is equivalent to minimizing the difference between the Gram matrix $G$ and the identity matrix (target ETF). The design of the optimal projection matrix for a given transform basis $\Psi$ is formulated as the optimization problem (10), where $V \Lambda V^T$ is the eigen decomposition of $\Psi\Psi^T$ and $\Gamma = \Phi V$. From (10), $V \Lambda V^T \Phi^T \Phi V \Lambda V^T \approx V \Lambda V^T$ and thus $\Lambda V^T \Phi^T \Phi V \Lambda \approx \Lambda$, with the optimization being over $\Gamma$ (equivalently, $\Phi$). A major advantage of this method is that it is non-iterative and offers significant computational improvement compared to [14]. Similarly, Hong and Zhu [26] optimize the projection matrix based on the equivalent Gram matrix with a penalty term to minimize the sensing matrix energy. In [27], an optimized sparse projection matrix is constructed based on the target Gram matrix with a sensing matrix energy constraint. However, this results in only a slight reduction in the reconstruction error, as the projection matrix is obtained directly from the designed Gram matrix.
In our proposed framework, we optimize the frame matrix based on the target Gram matrix, followed by the optimization of the projection matrix using the designed tight frame. To ensure robustness, we incorporate the $\ell_{2,1}$-norm to reduce the effect of outliers, and a weighted penalty term is used for the sensing matrix energy. Furthermore, Zelnik-Manor et al. [28] proposed an optimized projection matrix design based on block-sparse representation, which finds application in block-sparse decoding. The objective function is transformed into a weighted surrogate function, given by (11), where $D$ is represented as a concatenation of $B$ column blocks $D[j]$. The authors in [29] focus on directly reducing the mutual coherence of the frame, and the optimized projection matrix is designed by solving a non-smooth optimization problem with a convergence guarantee. Similarly, the authors in [30], [31], and [32] focus on the optimization of the sensing matrix based on an incoherent UNTF. In [31], a gradient-based alternating optimization approach is exploited to iteratively optimize the sensing matrix while decreasing the mutual coherence. Similarly, in [32], the goal of the optimization is to minimize the mutual coherence between the projection and the sparsifying matrix. An optimal sensing matrix is designed based on incoherent frame vectors by solving a matrix nearness problem with initialization using a partial Fourier basis. Bai et al. [30] optimize the projection matrix such that the corresponding frame matrix $D$ approximates the target ETF. The method includes the sparse representation error to ensure robustness; however, it employs the more sensitive Frobenius norm. Similarly, in [33] and [34], the authors optimize the projection matrix based on the Gram matrix using alternating optimization. A sparse representation error penalty is included to ensure robustness of the optimized projections.
These methods focus on optimizing the projection matrix based on the equivalent Gram matrix rather than using the ETF. Moreover, unlike the proposed approach, they require the training data matrix as well as the sparse encoding error for the optimization of the projection matrix.
In the proposed approach, we optimize the sensing matrix with respect to an ETF, which provides good interpretability to the sensing matrix. The ETF is designed by transforming the eigenvalues to ensure tightness and incoherence. Further, we incorporate the $\ell_{2,1}$-norm in place of the Frobenius norm and an adaptive weighted penalty in the cost function to ensure robustness to noise and outliers while minimizing the sensing matrix energy.

III. PROPOSED ETF DESIGN METHOD
From the preceding discussion on frame design, we note the need for constructing a tight frame that is almost equiangular. We formulate an optimization problem that considers the RIP by targeting a smaller value of the RIC and a projection matrix with lower mutual coherence.
Consider the $\ell_1$-coherence function $\mu_1$, a more general concept of coherence, of a frame $D$ defined as:
$$ \mu_1(s) = \max_{i \in [L]} \max \Big\{ \sum_{j \in P} |\langle d_i, d_j \rangle| \,:\, P \subset [L],\ \mathrm{card}(P) = s,\ i \notin P \Big\}, \tag{12} $$
where $P \subset [L]$ and $1 \le s \le L - 1$. It can be shown that $\mu \le \mu_1(s) \le s\mu$ [20]. Next consider Theorem 5.3 from [20] stated below.
Theorem 1: Let $D$ be a frame with $\ell_2$-normalized columns and assume $s \in [L]$. For all $s$-sparse vectors $\theta$:
$$ (1 - \mu_1(s-1))\|\theta\|_2^2 \le \|D\theta\|_2^2 \le (1 + \mu_1(s-1))\|\theta\|_2^2. \tag{13} $$
In other words, for each set $P \subset [L]$ with $\mathrm{card}(P) \le s$, the eigenvalues of the Gram matrix $D_P^T D_P$ lie in the interval $[1 - \mu_1(s-1),\ 1 + \mu_1(s-1)]$.
Proof: Refer to [20] for the proof.
In this paper, the primary objective is to construct a frame matrix $D$ which not only satisfies the equiangular property but also exhibits tightness in terms of the spectral constraint. For designing such a frame, we need to constrain the spectral and structural properties of the Gram matrix. Next, we discuss these constraints for frame design and formulate the iterative optimization problem.
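The $\ell_1$-coherence function and the eigenvalue interval of Theorem 1 can be checked numerically. The following sketch computes $\mu_1(s)$ by summing the $s$ largest off-diagonal Gram magnitudes in each row, then verifies the interval for one randomly chosen support; the function name, the seed, and the random dictionary are assumptions made for illustration.

```python
import numpy as np

def l1_coherence(D, s):
    """mu_1(s): largest sum of s off-diagonal Gram magnitudes taken within one row."""
    if s == 0:
        return 0.0
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)                  # exclude the atom itself
    top_s = np.sort(G, axis=1)[:, -s:]        # s largest entries of every row
    return float(top_s.sum(axis=1).max())

rng = np.random.default_rng(1)                # illustrative seed
M, L, s = 20, 120, 4
D = rng.standard_normal((M, L))               # random dictionary (assumption)
D /= np.linalg.norm(D, axis=0, keepdims=True)

mu = l1_coherence(D, 1)                       # mu_1(1) equals the mutual coherence
mu1_s = l1_coherence(D, s)
mu1_prev = l1_coherence(D, s - 1)

# Eigenvalues of a sub-Gram matrix built from s randomly chosen atoms
P = rng.choice(L, size=s, replace=False)
eigs = np.linalg.eigvalsh(D[:, P].T @ D[:, P])
```

The bound follows from Gershgorin's theorem, so it holds for any support; for a poorly conditioned random dictionary the interval can be wide, which is what the spectral constraint in this section is meant to tighten.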

A. STRUCTURAL CONSTRAINT
For reduced pairwise mutual correlation, the off-diagonal elements of the Gram matrix should be close to zero and the diagonal elements should be close to unity. The structural constraint set $H$ for an ETF collects the Gram matrices of frames with unit norm vectors and pairwise inner product not larger than a prescribed threshold, where the threshold, chosen in $(0, 1)$, controls the search space for the desired ETF. Projecting onto the convex set $H$ bounds the off-diagonal elements, which in turn implies reduced mutual coherence [14]; this projection is the shrinkage function (15). The structural constraint helps in reducing the mutual correlation, but such frames are far from being tight. The shrinkage function (15) results in a Gram matrix that is full-rank.
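One simple way to realize such a structural projection is to clip the off-diagonal Gram entries to the threshold and reset the diagonal to one. The sketch below uses this clipping rule as an assumption; it is not claimed to be the exact shrinkage function (15), and the function name, the threshold value, and the random frame are hypothetical.

```python
import numpy as np

def project_structural(G, threshold):
    """Clip off-diagonal entries to [-threshold, threshold] and set the diagonal to one."""
    G_new = np.clip(G, -threshold, threshold)
    np.fill_diagonal(G_new, 1.0)
    return G_new

rng = np.random.default_rng(0)                   # illustrative seed
M, L = 20, 120
D = rng.standard_normal((M, L))                  # random initial frame (assumption)
D /= np.linalg.norm(D, axis=0, keepdims=True)
G = D.T @ D

threshold = np.sqrt((L - M) / (M * (L - 1)))     # Welch bound used as an illustrative threshold
G_shrunk = project_structural(G, threshold)

rank_before = np.linalg.matrix_rank(G)           # at most M for G = D^T D
rank_after = np.linalg.matrix_rank(G_shrunk)     # clipping generally raises the rank
```

Comparing `rank_before` and `rank_after` illustrates why a separate spectral step is needed afterwards to bring the rank back to $M$.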
To ensure tightness, we use a spectral constraint to reduce the rank of the Gram matrix to $M$ based on Theorem 1. From (13) and the mutual coherence measure (3), the eigenvalues of the target Gram matrix should lie in the range $[1 - \mu_1(s-1),\ 1 + \mu_1(s-1)]$. Thus, we design an ETF based on the sparsity level of the underlying signal and the desired mutual coherence. Let the eigen decomposition of the Gram matrix obtained after shrinkage be $Q \Lambda Q^*$, where the diagonal elements of $\Lambda$ (i.e., the eigenvalues) are sorted in descending order. For the spectral constraint, which restricts the eigenvalues of the Gram matrix, consider the set $H_T$ of Gram matrices corresponding to the set of tight frames, where the eigenvalues $\lambda_{ii}$ of the Gram matrix are sorted in descending order and the lower and upper thresholds are $1 - \mu_1(s-1)$ and $1 + \mu_1(s-1)$, respectively. The sorted eigenvalues are thresholded as in (17), resulting in a spectrally constrained eigenvalue matrix, and the corresponding frame $D_{ETF}$ is obtained from the thresholded eigenvalues and the eigenvectors as in [22]. Thus, $D_{ETF}$ is optimized with respect to the Gram matrix by minimizing the off-diagonal elements of the Gram matrix, followed by the spectral constraint on the eigenvalues. An iterative projection algorithm for incoherent frame design (IFD) is presented in Algorithm 1.

Algorithm 1 Incoherent Frame Design (IFD)
Require: $D_0$ as initialized frame matrix, $N_{frame}$
$t = 0$
while $t \le N_{frame}$ do
    Project the Gram matrix $G_t = D_t^T D_t$ onto the structural constraint set $H$ using (15) to obtain the new Gram matrix.
    Compute the eigen decomposition $Q \Lambda Q^*$ of the new Gram matrix.
    Project the new Gram matrix onto the spectral constraint set $H_T$ by thresholding the eigenvalues using (17).
    Update the Gram matrix and the frame using the thresholded eigenvalues.
    $t = t + 1$
end while
Remarks:
• The "shrinkage" process (15) is similar to that in [14], as the structural constraint set $H$ limits the pairwise correlation.
• In this work, the tightness constraint for the frame is characterized by restricting the eigenvalues using the set $H_T$ rather than being defined by a tightness constant such as $L/M$ [22], [23], [30], which limits the search space for the best solution. We define the search space for the frame such that the corresponding Gram matrix has eigenvalues restricted to the range $[1 - \mu_1(s-1),\ 1 + \mu_1(s-1)]$, with each eigenvalue being equal to the lower or the upper bound as given in (17).
• For constructing an ETF, IFD uses two constraint sets: $H$ for the shrinkage process and $H_T$ for restricting the eigenvalues. In addition, the Gram matrix of the ETF is required to have unit diagonal elements and a constrained rank, which are captured by the constraint sets $H_{diag}$ and $H_{rank}$, respectively.
• Alternating projections do not increase the distance between consecutive iterates. However, they may not result in a solution which is closest to the constraint sets [22], [35]. The solution w.r.t. these constraint sets results in an ETF, and the non-increasing nature of the cost function $\|D_t^T D_t - G_T\|_F^2$ is guaranteed, ensuring convergence of the IFD algorithm. Thus, IFD will converge with a random initialization $D_0$, but it does not guarantee convergence to a globally optimal solution.
• There are extensions of the alternating projection method that have considered non-convex constraint sets such as $H_{rank}$ and $H_T$. In [36], alternating projections on manifolds have been studied, and convergence is proved for two smooth manifolds which intersect transversally. In [35], the authors considered alternating projections on two non-convex sets, and the method converges locally to a point of intersection at a linear rate.
• The constraint sets $H$ and $H_{diag}$ are convex, and $H_{rank}$ and $H_T$ are smooth manifolds. Hence, the convergence results for the alternating minimization approach cannot be applied directly here [37]. The convergence proof of alternating projections with more than two constraint sets, some of which are non-convex, is still an open problem. Thus, in Section V, we discuss the convergence of the IFD algorithm mainly based on simulations.
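The alternating projections described in Algorithm 1 and the remarks can be sketched as a short loop. This is an illustration under stated assumptions, not the authors' implementation: the structural step is plain clipping, the spectral step keeps the $M$ largest eigenvalues and clips them to a fixed interval, and a column renormalization is added to keep unit-norm atoms. The function name `ifd_sketch`, the threshold 0.25, and the eigenvalue bounds 0.4 and 1.6 are hypothetical values chosen only for the demonstration.

```python
import numpy as np

def ifd_sketch(D0, threshold, eig_low, eig_high, n_iter=50):
    """Alternate between a structural (clipping) step and a spectral (eigenvalue) step."""
    M, L = D0.shape
    D = D0 / np.linalg.norm(D0, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Structural step: bound the off-diagonal entries and restore the unit diagonal
        G = np.clip(D.T @ D, -threshold, threshold)
        np.fill_diagonal(G, 1.0)
        # Spectral step: keep the M largest eigenvalues and clip them (assumed rule)
        eigvals, Q = np.linalg.eigh(G)                  # eigenvalues in ascending order
        idx = np.argsort(eigvals)[::-1][:M]             # indices of the M largest
        lam = np.clip(eigvals[idx], eig_low, eig_high)
        # Factor the rank-M Gram matrix back into an M x L frame
        D = np.sqrt(lam)[:, None] * Q[:, idx].T
        # Renormalize the atoms (assumption, keeps the frame unit-norm)
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D = D / np.maximum(norms, 1e-12)
    return D

rng = np.random.default_rng(0)                          # illustrative seed
M, L = 20, 120
D0 = rng.standard_normal((M, L))                        # random initialization
D_ifd = ifd_sketch(D0, threshold=0.25, eig_low=0.4, eig_high=1.6, n_iter=30)

G_ifd = np.abs(D_ifd.T @ D_ifd)
np.fill_diagonal(G_ifd, 0.0)
mu_ifd = float(G_ifd.max())                             # coherence of the designed frame
```

Consistent with the remarks above, such a loop carries no guarantee of reaching a global optimum; tracking `mu_ifd` over iterations is a practical way to monitor its behaviour.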

IV. ROBUST PROJECTION MATRIX DESIGN
In this section, we present the proposed robust projection matrix design algorithm. Using the IFD algorithm discussed in the previous section, we can construct an ETF $D_{ETF}$.

A. PROBLEM FORMULATION
The underlying signal or data $X$ is represented in terms of the sparsifying basis $\Psi$ by the sparse representation coefficients $\Theta$, with the sparse encoding error (SEE) given by $E = X - \Psi\Theta$. Depending on the sparsity level and its relationship with mutual coherence (5), optimized projections in [14], [24], [25], [28] are designed to reduce pairwise correlation. Projection matrices designed in this manner have very small correlation with the transform basis, and the CS system is able to perform better than with a random sensing matrix, with $E = 0$ for exactly sparse signals. In the real world, signals are not exactly sparse and are only compressible. Thus it is not possible to learn the ground truth dictionary, and even for a learned transform basis, a non-zero SEE $E$ exists. In such scenarios, if $\Phi$ is not designed appropriately, then the recovery performance can be adversely affected when the projection matrix projects $E$ onto the measurement domain. Therefore, the SEE must be taken into account while optimizing the projection matrix. A robust CS system, where the objective is to reduce the mutual coherence and the sensing matrix energy, can be designed using the formulation (19), where $\gamma$ is the weight parameter for the encoding error and $\|\Phi\Psi - D\|_{2,1}$ is the fidelity term. In conventional methods, the squared $\ell_2$-norm and the Frobenius norm are usually employed, which tend to amplify the effect of outliers and noisy data. The $\ell_{2,1}$-norm is more robust than the Frobenius norm and the squared $\ell_2$-norm. The $\ell_{2,1}$-norm promotes row sparsity with a few non-zero rows. Thus, (19) optimizes the projection matrix based on the ETF while being robust to noise and outliers.
However, (19) optimizes the projection matrix based on the SEE $E$, which in turn can be obtained from a large training data set $X$. This would require large memory storage and increase the computational complexity [26]. For a learned transform basis which represents the data sparsely, the SEE energy $\|E\|_F^2$ is small. However, when projecting the sparse error onto the measurement domain using the projection matrix $\Phi$, $\|\Phi E\|_F$ can become very large if the projection matrix is not designed appropriately. This in turn affects the reconstruction accuracy of the CS system adversely. Based on the norm consistency property, we have $\|\Phi E\|_F \le \|\Phi\|_F \|E\|_F$. This implies that if $\Phi$ is optimized appropriately, then a small $\|\Phi\|_F$ would result in a smaller projected error $\|\Phi E\|_F$ [26]. Moreover, when a large training data set is used to represent the class of underlying signals, the energy in the corresponding SEE $E$ is expected to be spread evenly across the columns. Therefore, the resultant SEE $E$ can be viewed as additive white Gaussian noise. Based on the above discussion, we can minimize $\|\Phi\|_F$ while targeting the SEE and construct an optimized projection matrix using the reformulated objective function (20). Minimizing $\|\Phi\|_F$ directly not only makes the algorithm independent of the training data $X$ and the SEE $E$ but also makes it applicable to any CS system as long as a learned dictionary is available. Further, to allow the sensing matrix energy to be adaptively minimized w.r.t. the reconstruction fidelity, we introduce a weight matrix $W$. The proposed incoherent and robust projection matrix design (IR-PMD) is reformulated as (21), where $\sqrt{W}$ is a diagonal matrix whose elements are adapted based on $\Phi$ and are given by (22). Here $[\Phi\Psi - D]_i$ is the $i$th row of $(\Phi\Psi - D)$. The term $\|\sqrt{W}\Phi\|_F^2$ minimizes the sensing matrix energy weighted by $(\Phi\Psi - D)$ and also aids in obtaining a stable optimal solution. A large error term $\|[\Phi\Psi - D]_i\|_2$ results in a small weight $w_{ii}$ for the corresponding row of $\Phi$, and vice versa. Hence, the elements of $\Phi$ with lower fidelity error have a greater weight or impact on the projection matrix optimization.
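To illustrate the two ingredients discussed above, the sketch below checks the norm consistency bound numerically and computes row-wise weights that shrink as the row residual grows. The weight formula used here, $1 / (2\|[\Phi\Psi - D]_i\|_2 + \varepsilon)$, is the usual reweighting associated with the $\ell_{2,1}$-norm and is an assumption; it is not claimed to be the exact expression (22). All names, sizes, and the seed are illustrative.

```python
import numpy as np

def row_weights(Phi, Psi, D, eps=1e-8):
    """Diagonal of W: one weight per row of the residual Phi @ Psi - D (assumed form)."""
    residual = Phi @ Psi - D
    row_norms = np.linalg.norm(residual, axis=1)
    return 1.0 / (2.0 * row_norms + eps)        # large residual row -> small weight

rng = np.random.default_rng(0)                  # illustrative seed
M, N, L = 20, 60, 120
Phi = rng.standard_normal((M, N))               # projection matrix (random stand-in)
Psi = rng.standard_normal((N, L))               # transform basis (random stand-in)
D = rng.standard_normal((M, L))                 # target frame (random stand-in)
D /= np.linalg.norm(D, axis=0, keepdims=True)
E = 0.01 * rng.standard_normal((N, 50))         # small sparse encoding error (assumption)

w = row_weights(Phi, Psi, D)
residual_norms = np.linalg.norm(Phi @ Psi - D, axis=1)

# Norm consistency: ||Phi E||_F <= ||Phi||_F ||E||_F
projected_error = np.linalg.norm(Phi @ E)       # Frobenius norm for 2-D input
error_bound = np.linalg.norm(Phi) * np.linalg.norm(E)

# With these weights, the weighted squared residual equals half the l2,1-norm
weighted_fit = float(np.sum(w * residual_norms ** 2))
half_l21 = 0.5 * float(residual_norms.sum())
```

The last two quantities agree up to the small constant `eps`, which is why a weighted least-squares problem can stand in for the $\ell_{2,1}$ fidelity term at each iteration.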

B. OPTIMIZATION
We propose an iterative procedure to construct the ETF $D$ and optimize the projection matrix $\Phi$ alternately. At each iteration, the ETF is constructed using the projection matrix from the previous iteration via the IFD algorithm, and then $\Phi$ is optimized for the designed ETF. Using the frame $D$ obtained from the IFD algorithm, the solution to (21) gives an optimized projection matrix. For solving (21), we transform the problem into the form (23), where $W$ is a diagonal matrix defined in (22) and $\varepsilon$ is a small constant.
Next, let $\sqrt{W}\Phi = \tilde{\Phi} \in \mathbb{R}^{M \times N}$ and $\sqrt{W}D = \tilde{D} \in \mathbb{R}^{M \times L}$. Then, we can rewrite (23) as (24). To update $\tilde{\Phi}$ and eventually $\Phi$, we differentiate (24) w.r.t. $\tilde{\Phi}$; the derivative is given by (25). We can update $\tilde{\Phi}$ by setting the derivative to zero to obtain the closed-form expression (26), in which $I_N \in \mathbb{R}^{N \times N}$ is the identity matrix. Finally, the optimized projection matrix $\Phi$ is obtained from $\tilde{\Phi}$ as in (27). The main steps of the incoherent and robust projection matrix design (IR-PMD) algorithm are summarized in Algorithm 2.
Algorithm 2 IR-PMD (main steps at iteration $t$)
    Compute the frame $D_{t-1} = \Phi_{t-1}\Psi$ and the corresponding Gram matrix $G_{t-1}$.
    Construct the ETF $D_t$ using the IFD algorithm (Algorithm 1).
    Update $W_t$ using (22).
    Update $\tilde{\Phi}_t$ using (26).
    Update $\Phi_t$ using (27).
Remarks:
• Different from the approaches in [26], [27], which optimize $\Phi$ based on the Gram matrix, the IR-PMD algorithm optimizes the projection matrix based on the ETF designed via the IFD algorithm to ensure it is close to the target frame. This approach is more intuitive and computationally efficient.
• The IR-PMD algorithm employs the $\ell_{2,1}$-norm to ensure robustness of the optimized projection matrix to outliers. The $\sqrt{W}$ factor accounts for the reconstruction error while minimizing the sensing matrix energy.
• The term $\|\sqrt{W}\Phi\|_F$ leads to reduced sensing matrix energy, while $\|\Phi\Psi - D\|_{2,1}$ determines the fidelity of the approximation. The projection matrix is optimized based on a target frame that possesses reduced mutual coherence and tightness properties, while also ensuring robustness to the encoding error.
• The key difference between the proposed IR-PMD algorithm and the approaches in [26], [27], [30] is the introduction of the weighted penalty term and the $\ell_{2,1}$-norm, which ensures robustness to the sparse encoding error. In addition, the cost function in [30] depends on the training data matrix $X$, which tends to be very large, imposing a large memory (or storage) and computational cost on the method. The cost function in our proposed method depends only on $\Psi$ and $D$, which is much more efficient in terms of memory and computations. We will show that optimized projections using IR-PMD are robust for real-life applications such as image compression and result in improved recovery performance even in the presence of noise. Simulation results on synthetic and real data sets presented in the next section will confirm these gains.
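The update steps summarized above can be sketched as follows, under explicitly assumed forms: the reweighted problem is taken to be $\|\sqrt{W}(\Phi\Psi - D)\|_F^2 + \gamma\|\sqrt{W}\Phi\|_F^2$, the weights are the standard $\ell_{2,1}$ reweighting with a small constant, and the closed-form step is the resulting regularized least-squares solution. These forms are inferred from the substitutions described in the text and are not guaranteed to match (22)-(27) exactly; `irpmd_update` and all inputs are hypothetical.

```python
import numpy as np

def irpmd_update(Phi, Psi, D, gamma, eps=1e-8):
    """One projection-matrix update for a fixed target frame D (assumed forms)."""
    residual = Phi @ Psi - D
    w = 1.0 / (2.0 * np.linalg.norm(residual, axis=1) + eps)   # assumed row weights
    sqrt_w = np.sqrt(w)[:, None]
    D_tilde = sqrt_w * D                                       # sqrt(W) D
    A = Psi @ Psi.T + gamma * np.eye(Psi.shape[0])             # N x N, symmetric positive definite
    # Phi_tilde = D_tilde Psi^T A^{-1}, computed with a linear solve instead of an inverse
    Phi_tilde = np.linalg.solve(A, Psi @ D_tilde.T).T
    return Phi_tilde / sqrt_w                                  # undo the row scaling

rng = np.random.default_rng(0)                  # illustrative seed
M, N, L = 20, 60, 120
Psi = rng.standard_normal((N, L))               # transform basis (random stand-in)
Phi0 = rng.standard_normal((M, N))              # initial projection matrix
D = rng.standard_normal((M, L))                 # stand-in for the designed frame
D /= np.linalg.norm(D, axis=0, keepdims=True)
gamma = 0.1

Phi1 = irpmd_update(Phi0, Psi, D, gamma)
# Stationarity of the assumed smooth objective at the new iterate
gradient = (Phi1 @ Psi - D) @ Psi.T + gamma * Phi1
```

Under these assumed forms the row weights multiply both terms of the objective, so they cancel after the final rescaling and the update coincides with a ridge-regression solution; the intermediate variables are kept only to mirror the steps listed in the text.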

C. COMPUTATIONAL COMPLEXITY
The computational complexity of the proposed projection matrix design algorithm can be analyzed in two parts: the IFD algorithm for designing the ETF and the IR-PMD algorithm for optimized projections. The computationally dominant step of the IFD algorithm is the eigen decomposition of the Gram matrix, which has complexity $O(L^3)$. For IR-PMD, two steps involve matrix inversion. For updating $\tilde{\Phi}$ using (26), the computational complexity of the matrix inversion is $O(N^3)$ and the cost of the matrix multiplication is $O(MN^2)$. Hence the overall cost for updating $\tilde{\Phi}$ is $O(N^3)$. To update $\Phi$, the cost of inverting $\sqrt{W}$ is $O(M^3)$. In [30], the computational cost of updating $\Phi$ with $J$ signals in the training data set is dominated by the eigen decomposition, whose complexity is $O(\min\{N(L+J)^2, N^2(L+J)\})$, which is larger than the cost of the proposed approach.

V. SIMULATION RESULTS
In this section, we present the results of experiments we have carried out on both synthetic and real data sets. The first set of experiments are performed on synthetic data to demonstrate the recovery performance of the various CS systems being compared. We generate a set of training signals based on the learned dictionary [38] and white Gaussian noise is added for understanding robustness of the CS systems. We also illustrate the effectiveness of optimized projections in real-world applications by considering image compression.
In the proposed scheme, the IFD algorithm allows us to design an ETF which is used as the target frame for optimizing the projection matrix using the IR-PMD algorithm. We will demonstrate the significant properties of the designed ETF, such as the absolute and averaged mutual coherence. Next, we demonstrate the performance of incoherent and robust projections through reconstruction results under different conditions. In each CS system, we consider signals that are sparse with respect to a learned transform basis $\Psi$. The performance of the proposed method will be compared with the techniques presented in [14], [23], [25], [28]-[30].

A. PERFORMANCE EVALUATION ON SYNTHETIC DATA
We generate an $N \times L$ transform basis $\Psi$ with Gaussian distributed entries, and initialize the projection matrix to an $M \times N$ random matrix $\Phi_0$. This initial setup is used in each of the following CS systems for optimizing the projection matrix: $CS_{DCS}$ [25], $CS_{ZMRE}$ [28], $CS_{XPC}$ [24], $CS_{CHZ}$ [29], $CS_{BLH}$ [30], $CS_{HZ}$ [26], and $CS_{HLZL}$ [27]. We use $\Phi_0$ for $CS_{rand}$, and for $CS_{Elad}$ [14] we use $\gamma = 0.95$ to optimize the projections.
First, we compare the performance of frame design using the IFD algorithm, $CS_{rand}$, $CS_{Elad}$, and $CS_{TKK}$ [23]. For training and testing, synthetic data is generated using $\Psi$. Let $\{\theta_i\}_{i=1}^{2J}$ be $s$-sparse vectors of dimension $L \times 1$, whose non-zero entries are normally distributed with zero mean and unit variance. The signals $\{x_i\}_{i=1}^{2J}$ are generated as:
$$ x_i = \Psi\theta_i + n_i, $$
where $n_i$ is zero-mean white Gaussian noise. The corresponding signal-to-noise ratio is denoted by SNR. Let $X = [x_1, x_2, \ldots, x_{2J}]^T$ be the noisy signal and $X^* = [x_1^*, x_2^*, \ldots, x_{2J}^*]^T$ be the original noise-free signal. The training and test data sets are obtained by dividing $X$ into two equal parts, i.e., $X_{test}$ and $X_{train}$, each of size $N \times J$. The measurement matrix $Y = \Phi X_{test}$ is computed using the optimized projection matrix obtained via the CS algorithms. In each case, the OMP algorithm is used to recover the sparse signals from the measurement vectors by solving (4). The normalized mean square reconstruction error (NMSE) and the mean square error (MSE) are computed between the test signals and their reconstructions, where $x_j^{test}$ is the recovered sparse signal. The signal $x_j^{test}$ can be expressed in terms of the corresponding sparse coefficients as $x_i = \Psi\theta_i$, and the sparse coefficients $\theta_i$ can be obtained by solving the OMP problem [10]. The peak signal to noise ratio (PSNR) is defined as [38]:
$$ \mathrm{PSNR} = 10 \log_{10} \frac{(2^r - 1)^2}{\mathrm{MSE}}, $$
where $r = 8$ is the number of bits per pixel.
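The synthetic data pipeline described above can be sketched as follows. The snippet draws $s$-sparse coefficient vectors, forms signals through a Gaussian basis, adds white Gaussian noise at a prescribed SNR, splits the columns into training and test halves, and evaluates the PSNR formula. The NMSE helper uses the common definition (squared error normalized by signal energy), which is an assumption, and all function names, sizes, and the seed are illustrative.

```python
import numpy as np

def sparse_coefficients(L, s, count, rng):
    """Return an L x count matrix whose columns are s-sparse with N(0, 1) non-zeros."""
    Theta = np.zeros((L, count))
    for i in range(count):
        support = rng.choice(L, size=s, replace=False)
        Theta[support, i] = rng.standard_normal(s)
    return Theta

def add_noise(X_clean, snr_db, rng):
    """Add zero-mean white Gaussian noise at the requested SNR (in dB)."""
    signal_power = np.mean(X_clean ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    return X_clean + np.sqrt(noise_power) * rng.standard_normal(X_clean.shape)

def nmse(X_ref, X_hat):
    """Squared reconstruction error normalized by the signal energy (assumed definition)."""
    return float(np.sum((X_ref - X_hat) ** 2) / np.sum(X_ref ** 2))

def psnr(mse, r=8):
    """PSNR in dB for r bits per pixel."""
    return float(10 * np.log10((2 ** r - 1) ** 2 / mse))

rng = np.random.default_rng(0)                  # illustrative seed
N, L, s, J = 60, 120, 4, 100
Psi = rng.standard_normal((N, L))               # Gaussian transform basis
Theta = sparse_coefficients(L, s, 2 * J, rng)
X_clean = Psi @ Theta                           # noise-free signals, one per column
X = add_noise(X_clean, snr_db=10, rng=rng)      # noisy signals
X_train, X_test = X[:, :J], X[:, J:]            # two halves, each N x J

measured_snr = 10 * np.log10(np.mean(X_clean ** 2) / np.mean((X - X_clean) ** 2))
```

The signals are stored as columns here so that each half has size $N \times J$, matching the dimensions quoted in the text.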

1) CONVERGENCE
For performance evaluation of the IFD algorithm, experiments are carried out with $M = 20$, $L = 120$, and sparsity $s = 4$. We start with an arbitrary full-rank $M \times L$ matrix $D$, and apply the shrinkage function followed by the spectral constraint iteratively, resulting in a tight frame that is closest to being an incoherent matrix. We assess convergence of the alternating projections in the IFD algorithm numerically. The distance measure $d_F(j)$ between the Gram matrix $G_j$ designed at the $j$th iteration and the target optimal Gram matrix $G_T$, which belongs to the sets satisfying the structural and spectral constraints, is used to study convergence of the iterative algorithm. Fig. 1(a) shows the evolution of $d_{MS}$ for the IFD algorithm. It is seen that the algorithm eventually results in a frame that lies at the intersection of the constraint sets. Convergence of $d_{MS}$ indicates that the frame is close to being an incoherent UNTF.
To demonstrate convergence of the IR-PMD algorithm, we set $M = 20$, $L = 120$, $N = 60$, and $N_{SE} = 100$, with each experiment being repeated 1000 times. Fig. 1(b) shows the convergence of the objective function (21), evaluated at $(\Phi_t, D_t)$, for different values of $\gamma$ over 10000 iterations. Fig. 2(a) shows the evolution of MSE for different values of $\gamma$. Although the objective function converges faster for smaller values of $\gamma$, the MSE attains its lowest value for $\gamma = 0.1$. Note that this is the noiseless case and the signals are perfectly sparse for the experiment with synthetic data; thus smaller values of $\gamma$ are able to achieve reduced pairwise correlation and improved reconstruction accuracy. Fig. 2(b) shows the evolution of the iterate $\Phi_t$. It is seen that $\Phi_t$ converges well for different values of $\gamma$.
Next, we run a similar experiment using a real data set, namely the Caltech 101 data set. Fig. 3(a) shows the evolution of MSE for different values of $\gamma$.
For a real data set, the sparse encoding error can be significant as the underlying signal is not exactly sparse. Hence, a larger value of $\gamma$ minimizes the sensing matrix energy and leads to reduced MSE. In Fig. 3(b), both MSE and peak signal-to-noise ratio (PSNR) are shown as a function of $\gamma$. It is noted that $\gamma > 0.2$ results in reduced MSE and increased PSNR. In summary, for real images with no additional noise the MSE decreases for larger values of $\gamma$, whereas for exactly sparse synthetic signals $\gamma = 0.1$ gives improved performance. However, when the synthetic signals have a non-zero sparse encoding error, higher values of $\gamma$ minimize the sensing matrix energy, with increased PSNR and reduced MSE.
FIGURE 3. Performance with Caltech 101 data set.
Fig. 4 shows the evolution of the averaged mutual coherence for the different CS techniques with SNR = 10 dB and $s = 4$. IFD results in an averaged mutual coherence that is closer to the Welch bound than that of the other algorithms. Fig. 5(a) shows the averaged mutual coherence $\mu_t$ (with $t =$ ) for the different frame design methods as a function of the redundancy of the frame. As the number of measurements $M$ increases, all the methods achieve lower mutual coherence. The superiority of the IFD algorithm is clearly evident for smaller values of $M$. The reduced mutual coherence achieved by IFD results in a tighter frame, which is then used in the IR-PMD algorithm. Similarly, for the projection matrix design approaches, Fig. 5(b) shows the averaged mutual coherence $\mu_t$ as a function of the number of measurements. It is seen that the IR-PMD algorithm is superior to the other approaches. In Figs. 6(a) and 6(b), the evolution of the averaged mutual coherence $\mu_t$ is shown for different values of $\gamma$ for the synthetic and Caltech data sets, respectively. It is seen that a smaller value of $\gamma$ results in reduced mutual coherence, as $\mu_t$ is then determined mainly by the term of the form $\| \cdot - D \|_F^2$, with the other squared Frobenius-norm term being given less weight.
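To make the idea of alternating a shrinkage step with a spectral step concrete, the following is a generic alternating-projection sketch for building an incoherent tight frame, together with the Welch bound used as the reference level above. It is an assumption-laden illustration and not the IFD algorithm itself: the function names are hypothetical, the shrinkage simply clips off-diagonal Gram entries at the Welch bound, and the spectral step keeps the $M$ leading eigenvectors with equal eigenvalues $L/M$.

```python
import numpy as np

def welch_bound(M, L):
    """Lower bound on the mutual coherence of an M x L unit-norm frame (L > M)."""
    return np.sqrt((L - M) / (M * (L - 1)))

def shrink_gram(G, mu):
    """Structural step (assumed form): clip entries to [-mu, mu], then reset the diagonal to 1."""
    G_s = np.clip(G, -mu, mu)
    np.fill_diagonal(G_s, 1.0)
    return G_s

def spectral_step(G, M):
    """Spectral step: keep the M leading eigenvectors and assign them equal eigenvalues L/M.

    Returns an M x L frame D with D @ D.T = (L / M) * I, i.e. a tight frame.
    """
    L = G.shape[0]
    _, eigvecs = np.linalg.eigh(G)   # eigenvalues returned in ascending order
    V = eigvecs[:, -M:]              # eigenvectors of the M largest eigenvalues
    return np.sqrt(L / M) * V.T

def alternating_projection(M, L, num_iters=200, seed=0):
    """Alternate between the structural and spectral steps from a random start."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((M, L))  # arbitrary full-rank starting matrix
    mu = welch_bound(M, L)
    for _ in range(num_iters):
        D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm columns
        G = shrink_gram(D.T @ D, mu)
        D = spectral_step(G, M)
    return D
```

For the setting used here, `welch_bound(20, 120)` is approximately 0.205, which is the level an ideal ETF of that size would attain. The frame returned by `alternating_projection` is tight by construction, while its coherence depends on the number of iterations and the starting point.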

2) RECONSTRUCTION PERFORMANCE
Next we compare the different CS techniques for optimizing the projection matrix in terms of their ability to reconstruct the original signal from under-sampled measurements. We consider a set of synthetic signals generated as earlier.
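Since every CS system in this comparison relies on OMP for recovery, a compact textbook version of OMP is sketched below for reference. It is a minimal sketch, assuming the sparsity $s$ is known and the recovery problem has the usual form $y \approx A\theta$ with $A = \Phi\Psi$. The function name `omp` and its signature are hypothetical, and the exact problem solved in (4) may include details not reproduced here.

```python
import numpy as np

def omp(A, y, s):
    """Greedy s-sparse approximation of y ~ A @ theta (textbook OMP sketch)."""
    residual = y.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    col_norms = np.linalg.norm(A, axis=0)
    for _ in range(s):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(A.T @ residual) / col_norms
        if support:
            correlations[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected atoms, then update the residual.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    theta = np.zeros(A.shape[1])
    theta[support] = coeffs
    return theta
```

In the notation of this section, one would form the equivalent dictionary `A = Phi @ Psi`, compute `y = Phi @ x` for a test signal, and reconstruct it as `Psi @ omp(A, y, s)`.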
The performance of the algorithms is evaluated in terms of mutual coherence and reconstruction error. The maximum number of iterations is set to 100 for each algorithm, except for CS$_{\text{DCS}}$ [25], which is non-iterative, and CS$_{\text{HLZL}}$ [27], for which 1000 iterations are used. Each experiment is repeated 500 times and the averaged results are presented. For CS$_{\text{rand}}$, a normally distributed random sensing matrix is employed, and this random matrix is used as the initial sensing matrix for all algorithms except CS$_{\text{ZMRE}}$, which employs the optimized matrix from CS$_{\text{DCS}}$ as the initial matrix. The parameter values for the different algorithms are taken from the corresponding references. Where the optimal values were not reported, we carried out parameter tuning to determine them.
The MSE is computed over a range of parameter values for these methods, and the parameter value resulting in the minimum MSE is selected as the optimal value. The parameter values selected are as follows: (i) CS$_{\text{Elad}}$: $t = 0.4$, $\gamma = 0.95$, (ii) CS$_{\text{ZMRE}}$: $\alpha = 0.99$, (iii) CS$_{\text{BLH}}$: $\alpha = 0.8$ [30], (iv) CS$_{\text{XPC}}$: $\alpha = 0.7$ [24], (v) CS$_{\text{CHZ}}$: $\rho = 0.5$, $\eta = 1.2$, $\alpha = 0.99\rho$, $\beta = 2$, $\text{Iter}_{\text{in}} = 100$ (number of inner iterations), and $\text{Iter}_{\text{out}} = 15$ (number of outer iterations) [29], (vi) CS$_{\text{HZ}}$: $\lambda = 0.1$ for synthetic data and $\lambda = 0.9$ for real data sets [26], and (vii) CS$_{\text{HLZL}}$: $\lambda = 0.25$, $\kappa = 20$ for synthetic data and $\lambda = 0.5$, $\kappa = 30$ for real data sets [27].
Fig. 7 shows the histogram of the absolute off-diagonal elements of the normalized Gram matrix corresponding to the optimized projection matrix. Compared to the other algorithms, the proposed IR-PMD algorithm results in a histogram that is concentrated closer to the origin, which implies smaller local correlations and thus improved recovery performance. Figs. 8 and 9 show the NMSE ($e_r$) as a function of the sparsity $s$ for the frame design and projection matrix design algorithms, respectively. It is seen that the proposed IFD and IR-PMD algorithms result in reduced NMSE compared to the other algorithms. The IR-PMD algorithm performs better than the mutual coherence-based methods such as [14], [25], [28], [29]. This indicates that incoherent projections are suitable for exactly sparse signals; however, for signals that contain a significant amount of noise, it is also important to minimize the sensing matrix energy.
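The coherence quantities behind such a histogram can be computed directly from the equivalent dictionary. The sketch below is illustrative only: the helper names are hypothetical, and the $t$-averaged mutual coherence follows the commonly used definition (the mean of the absolute off-diagonal Gram entries that are at least $t$), which is assumed rather than confirmed to match the definition used in this paper.

```python
import numpy as np

def normalized_gram(D):
    """Gram matrix of D after scaling every column to unit norm."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    return Dn.T @ Dn

def offdiag_abs(G):
    """Absolute values of all off-diagonal entries of a square matrix."""
    mask = ~np.eye(G.shape[0], dtype=bool)
    return np.abs(G[mask])

def mutual_coherence(D):
    """Largest absolute inner product between two distinct normalized columns."""
    return offdiag_abs(normalized_gram(D)).max()

def averaged_mutual_coherence(D, t):
    """Assumed definition: mean of the absolute off-diagonal entries that are >= t."""
    g = offdiag_abs(normalized_gram(D))
    selected = g[g >= t]
    return float(selected.mean()) if selected.size else 0.0

def offdiag_histogram(D, bins=20):
    """Histogram of the absolute off-diagonal Gram entries over [0, 1]."""
    g = np.clip(offdiag_abs(normalized_gram(D)), 0.0, 1.0)
    return np.histogram(g, bins=bins, range=(0.0, 1.0))
```

Given an optimized projection matrix `Phi` and dictionary `Psi`, calling `offdiag_histogram(Phi @ Psi)` yields the bin counts and edges for a plot of this kind, and a histogram with more mass near zero corresponds to weaker pairwise correlations.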
In Tables 1 and 2, for each CS technique, we study the performance of the optimized sensing matrix in terms of $\mu$, $\mu_t$, $\|I - G\|_F^2$, and MSE under noiseless and noisy scenarios. For the noiseless case, a projection matrix with reduced mutual coherence leads to improved signal reconstruction; however, this is not true for the noisy scenario. Even though the mutual coherence obtained using the IR-PMD algorithm is not significantly reduced, the use of the $\ell_{2,1}$-norm results in improved signal reconstruction at low SNR, as shown in Table 3. The improved MSE is due to the robustness of the $\ell_{2,1}$-norm to outliers. However, for the noiseless case, CS$_{\text{BLH}}$ achieves a lower MSE than IR-PMD due to the Frobenius norm used to minimize the mutual correlation. This shows that mutual coherence is not the best measure for estimating signal recovery performance when the signals contain a significant amount of noise. To achieve a robust CS system, we consider the encoding error in order to reduce the effect of noise on the signal recovery process.
In Fig. 10, we study the recovery performance of the CS systems as a function of the number of measurements $M$ with $s = 4$, for two different SNRs. Figs. 9 and 10 clearly demonstrate the superior recovery performance of the proposed IR-PMD algorithm. In Fig. 11, we show the recovery performance as a function of the number of dictionary atoms $L$. For higher SNR, the performance of the mutual coherence-based methods and that of the proposed algorithm are comparable. The $\ell_{2,1}$-norm based IR-PMD algorithm is found to be more robust against the encoding error and is therefore well suited for applications such as image compression, where the sparse encoding error is significant and cannot be ignored.
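The robustness argument can be illustrated with a small numerical comparison of how much a single outlying sample contributes to each norm. This is a toy sketch under stated assumptions: the $\ell_{2,1}$-norm is taken here as the sum of the column-wise $\ell_2$ norms (one column per sample), which is one of the two common conventions, and the residual matrix `E` is made up purely for illustration.

```python
import numpy as np

def l21_norm(E):
    """Assumed convention: sum of the l2 norms of the columns of E."""
    return float(np.sum(np.linalg.norm(E, axis=0)))

def frobenius_sq(E):
    """Squared Frobenius norm: sum of all squared entries."""
    return float(np.sum(E ** 2))

# Toy residual matrix: nine ordinary columns of norm 1 and one outlier of norm 10.
E = np.zeros((4, 10))
E[0, :9] = 1.0
E[0, 9] = 10.0

outlier_share_l21 = np.linalg.norm(E[:, 9]) / l21_norm(E)        # 10 / 19
outlier_share_fro = np.sum(E[:, 9] ** 2) / frobenius_sq(E)       # 100 / 109
print(f"outlier share, l2,1 norm:          {outlier_share_l21:.3f}")
print(f"outlier share, squared Frobenius:  {outlier_share_fro:.3f}")
```

In this toy case the outlier accounts for roughly 53% of the $\ell_{2,1}$ value but roughly 92% of the squared Frobenius value, which is the sense in which a column-wise $\ell_{2,1}$ penalty is less dominated by a few badly corrupted samples.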

B. OPTIMIZED PROJECTIONS FOR IMAGE COMPRESSION
To demonstrate the practical applicability of the proposed method, we perform experiments on standard real images and investigate the use of the proposed CS system for image compression. The objective is to recover the original images from their compressed versions via optimized projections. For sparse representation of the images, we learn the transform basis using KSVD [38]. It is difficult to learn the ground truth dictionary for a set of real images, and there will always be a non-zero encoding error even without any additional noise. Similar to the experiments with synthetic data, we set the parameters as $s = 4$, $M = 20$, and $L = 100$. We evaluate the recovery performance of the proposed method via two experiments involving standard images and the Caltech 101 data set [39].

1) STANDARD IMAGES
In this experiment, we use a collection of 40 well-known standard images, including cameraman, Elaine, Lena, and pirate. From these images, we obtain 60000 patches, each of size $8 \times 8$ pixels. Of these, 50000 patches are used for training and the remaining 10000 are used for testing the recovery performance of the optimized projections and their generalization capability. Each patch is represented as a $64 \times 1$ vector, with the set of patches expressed as a matrix of size $64 \times 60000$. We evaluate the recovery performance on the test data by adding white Gaussian noise, with the resulting SNR ranging from 5 dB to 45 dB. The recovery performance for image compression is typically measured in terms of the peak signal-to-noise ratio (PSNR) given in (31) and the MSE given in (30). The MSE performance of the different CS systems is shown in Fig. 12. It is seen that at low SNR even random projections perform better than some of the mutual coherence-based approaches. The proposed IR-PMD algorithm has superior performance and is able to reduce the effect of noise. Fig. 13 shows the PSNR performance as a function of the SNR. Note that IR-PMD outperforms the approaches in [26], [27], [30], which shows the impact of the $\ell_{2,1}$-norm and the adaptive weight penalty in achieving improved robustness.
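The patch-based data preparation described above can be sketched as follows. The text does not state how patch locations are chosen, so this sketch assumes uniformly random locations within a grayscale image. The function name `extract_patches` is hypothetical, and the split sizes in the usage note are scaled-down stand-ins for the 50000/10000 split.

```python
import numpy as np

def extract_patches(image, patch_size, num_patches, rng):
    """Sample square patches at random locations and stack them as columns.

    Each patch_size x patch_size patch is flattened row by row into a vector,
    so an 8 x 8 patch becomes one 64 x 1 column of the returned matrix.
    """
    H, W = image.shape
    rows = rng.integers(0, H - patch_size + 1, size=num_patches)  # top-left row indices
    cols = rng.integers(0, W - patch_size + 1, size=num_patches)  # top-left column indices
    patches = np.empty((patch_size * patch_size, num_patches))
    for k, (r, c) in enumerate(zip(rows, cols)):
        patches[:, k] = image[r:r + patch_size, c:c + patch_size].reshape(-1)
    return patches
```

For example, `P = extract_patches(image, 8, 120, rng)` gives a $64 \times 120$ matrix, and `P[:, :100]` and `P[:, 100:]` can then play the roles of the training and test sets.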

2) CALTECH DATA SET
In this experiment, we use the Caltech 101 data set [39], which contains a total of 9146 images in 101 different object categories, with each category having 40-800 images. As in the previous experiment, these images are converted into training and testing patches. We obtain 60000 patches, each of size $8 \times 8$ pixels, with 50000 patches used for training and the remaining 10000 used for testing the recovery performance. The training data is used to train the sparsifying basis. For experimenting with different values of SNR, 1000 samples are randomly selected from $X_{\text{train}}$ for each run. Then, white Gaussian noise is added to each subset of the training samples with SNR varying from 5 dB to 45 dB. The test data set $X_{\text{test}}$ is divided into 10 testing subsets, each with 1000 samples, which are used to evaluate the recovery performance of the different CS systems.
In Fig. 14, the output PSNR is shown for SNR ranging from 5 dB to 30 dB. The results show that CS$_{\text{IR-PMD}}$ outperforms the other techniques. The performance of the mutual coherence-based methods is poor even compared to CS$_{\text{rand}}$ when testing on real images with additional noise. The proposed approach outperforms the robust projection matrix design methods such as [26], [27], [30] and is robust even in low SNR conditions, as the sensing matrix energy is minimized adaptively based on the reconstruction fidelity. In Fig. 15, MSE is shown as a function of the SNR, and CS$_{\text{IR-PMD}}$ achieves the best performance among the techniques being compared. Fig. 16 compares the performance of the CS$_{\text{HLZL}}$ and CS$_{\text{IR-PMD}}$ algorithms using a projection matrix trained at SNR = 5 dB. Additional noise is not added to these images, as real images already contain noise resulting in a sparse encoding error. As seen in the inset zoom-in views, the edges are much better recovered using CS$_{\text{IR-PMD}}$. The proposed method achieves the best qualitative results in these tests. Using the optimized projections designed by the IR-PMD algorithm, the PSNR of the reconstructed images is higher than that of the images reconstructed using the other CS systems. Experimental results on synthetic data, real standard images, and the Caltech 101 data set have demonstrated the effectiveness of the proposed method.
Finally, in order to demonstrate the effectiveness of the $\ell_{2,1}$-norm regularization, we compare the performance of CS$_{\text{IR-PMD}}$ with CS$_{\text{WF}}$, which employs the Frobenius norm in its objective. Fig. 17 shows that CS$_{\text{IR-PMD}}$, based on the $\ell_{2,1}$-norm, consistently outperforms the model based on the Frobenius norm.

VI. CONCLUSION
In this paper, we have proposed an algorithm for the design of an incoherent and robust projection matrix. We have demonstrated that the $\ell_{2,1}$-norm leads to a more robust formulation while effectively minimizing the sensing matrix energy. In practice, the equivalent dictionary should be close to being a tight frame in order to achieve good recovery performance.
In this work, we proposed a novel ETF design method based on structural and spectral constraints on the eigenvalue characteristics. The resulting tight frame is used to design optimized projections. In addition to the mutual coherence and tightness constraints, the weighted sensing matrix energy term acts as a penalty to avoid amplification of the sparse encoding error in the measurement domain. A closed-form solution set for an optimal projection matrix with weighted penalty has been derived. The superiority of the proposed algorithm has been demonstrated through extensive experiments on synthetic data and real images.