Measurement-Driven Framework With Simultaneous Sensing Matrix and Dictionary Optimization for Compressed Sensing

This paper presents a measurement-driven framework for the compressed sensing (CS) system design problem. Under this novel framework, the sparse coefficient matrix is calculated from the low-dimensional measurements, rather than updated along with the dictionary as in traditional approaches. Moreover, a new cost function is proposed to optimize the sensing matrix and dictionary simultaneously. To minimize this cost function, an iterative algorithm is developed in which, at every iteration, the solutions for the sensing matrix and dictionary are derived analytically. Experiments are conducted on real images, especially medical images. The results demonstrate the superiority of the designed CS system, composed of the optimized sensing matrix and dictionary, with improved performance for image compression and reconstruction.


I. INTRODUCTION
Compressed sensing (CS) has attracted vast attention since its introduction around 2006 [1]-[3], and it has become a useful tool in many engineering fields, such as wireless sensor networks [4]-[6] and medical image processing [7], [8]. As a necessary condition of CS, signal sparse representation has a much longer history [9]. In this theory, the original signal x ∈ ℝ^{N×1} is assumed to be of the form

x = Ψs (1)

where Ψ ∈ ℝ^{N×L} is called a dictionary, with its columns referred to as atoms, and s is the sparse coefficient vector.
Let v ∈ ℝ^{L×1} with v(l) being its l-th element. The ℓ_p-norm of vector v is defined as ‖v‖_p ≜ (Σ_{l=1}^{L} |v(l)|^p)^{1/p}, p ≥ 1. Note that ‖v‖_p is not a norm in the strict sense for 0 ≤ p < 1. For convenience, ‖v‖_0 is used to denote the number of non-zero elements in v. Then, x given by (1) is said to be K-sparse in Ψ if ‖s‖_0 = K.
Sparse representation theory leads to a widely investigated problem named dictionary design: finding a dictionary that represents a class of signals for a given sparsity level K. Typical examples include the Fourier dictionary for frequency-sparse signals, a multiband modulated Discrete Prolate Spheroidal Sequences (DPSS) dictionary for sampled multiband signals [10], [11], and a dictionary learned from a training dataset [12], [13]. Considering learning a dictionary, let {x_k}_{k=1}^{P} be a set of training samples with x_k ∈ ℝ^{N×1}, stacked as X = [x_1 · · · x_P]. The basic problem, usually referred to as dictionary learning, can be formulated as

min_{Ψ, S} ‖X − ΨS‖_F^2 s.t. ‖s_k‖_0 ≤ K, ∀ k = 1, 2, · · · , P (2)

where S = [s_1 · · · s_P]. Problem (2) is difficult to solve directly, so a two-stage alternating minimization procedure is usually adopted. The first stage is sparse coding, which finds the (column-)sparse matrix S with Ψ fixed, i.e.,

min_{s_k} ‖s_k‖_0 s.t. x_k = Ψs_k, ∀ k = 1, 2, · · · , P (3)

or

min_{s_k} ‖x_k − Ψs_k‖_2^2 s.t. ‖s_k‖_0 ≤ K, ∀ k = 1, 2, · · · , P (4)

This problem can be solved by orthogonal matching pursuit (OMP) based techniques [14]-[16]. Algorithms differ from each other mainly in the second stage, named dictionary updating [12], [13], which updates Ψ with the obtained S fixed. In the method of optimal directions (MOD) [12], the dictionary is simply taken as the least-squares solution Ψ = XS^T(SS^T)^{−1} of min_Ψ ‖X − ΨS‖_F^2, followed by an extra normalization procedure; here T denotes the transpose operator. Taking the sparse structure of the given S into consideration, Aharon et al. proposed an algorithm named K-singular value decomposition (K-SVD) [13], in which the atoms of the dictionary are updated one by one. Such an algorithm usually yields better performance than the MOD, as the non-zero elements in S are simultaneously updated. In general, the sparse coefficients are always updated along with the corresponding dictionary under the same model, such as (2) discussed above.
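The two-stage alternation described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's code: the function names (`omp`, `mod_step`, `dictionary_learning`) and all parameter choices are ours, the OMP is a bare-bones variant, and the K-SVD atom-by-atom update is omitted in favor of the simpler MOD step.

```python
import numpy as np

def omp(D, y, K):
    """Greedy orthogonal matching pursuit: select up to K atoms of D for y."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(K):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        # re-fit coefficients over the current support (orthogonal step)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    s = np.zeros(D.shape[1])
    s[support] = coef
    return s

def mod_step(X, S):
    """MOD dictionary update: Psi = X S^T (S S^T)^{-1}, then column normalization."""
    Psi = X @ S.T @ np.linalg.pinv(S @ S.T)
    return Psi / np.linalg.norm(Psi, axis=0, keepdims=True)

def dictionary_learning(X, L, K, iters=10, seed=0):
    """Alternate sparse coding (4) and the MOD dictionary update."""
    rng = np.random.default_rng(seed)
    Psi = rng.standard_normal((X.shape[0], L))
    Psi /= np.linalg.norm(Psi, axis=0, keepdims=True)
    for _ in range(iters):
        S = np.column_stack([omp(Psi, x, K) for x in X.T])  # sparse coding
        Psi = mod_step(X, S)                                # dictionary update
    return Psi, S
```

Note that both stages decrease the representation error ‖X − ΨS‖_F, which is why the alternation is well behaved even though (2) itself is non-convex.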
If signal sparse representation holds, CS provides a mathematical framework for the accurate reconstruction of a high-dimensional signal x ∈ ℝ^{N×1} from its corresponding low-dimensional measurement y ∈ ℝ^{M×1} with M ≪ N. The measuring process is implemented by projecting the original signal via a well-chosen sensing matrix Φ ∈ ℝ^{M×N}:

y = Φx (5)

Substituting x in (5) by (1), y can be rewritten as

y = ΦΨs ≜ Ds (6)

where D = ΦΨ is sometimes named the equivalent dictionary. With both y and D given, reconstruction is to find s through

min_s ‖s‖_0 s.t. y = Ds (7)

or

min_s ‖y − Ds‖_2^2 s.t. ‖s‖_0 ≤ K (8)

Similar to (3) and (4), such problems can also be addressed using greedy algorithms, such as OMP based techniques [14]-[16]. The estimated signal is then simply obtained as x̂ = Ψŝ.
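The measure-then-reconstruct pipeline of (5)-(8) can be prototyped directly. The sketch below uses a random Gaussian sensing matrix and a bare-bones OMP; all names and dimension choices are ours, picked to match the paper's later synthetic experiment (M = 20, N = 80, L = 120, K = 4).

```python
import numpy as np

def omp(D, y, K):
    """Orthogonal matching pursuit over the equivalent dictionary D = Phi @ Psi."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(K):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    s = np.zeros(D.shape[1])
    s[support] = coef
    return s

rng = np.random.default_rng(0)
M, N, L, K = 20, 80, 120, 4
Psi = rng.standard_normal((N, L))
Psi /= np.linalg.norm(Psi, axis=0)                 # unit-norm atoms
Phi = rng.standard_normal((M, N)) / np.sqrt(M)     # random sensing matrix
s = np.zeros(L)
s[rng.choice(L, K, replace=False)] = rng.standard_normal(K)
x = Psi @ s            # K-sparse signal, eq. (1)
y = Phi @ x            # low-dimensional measurement, eq. (5)
D = Phi @ Psi          # equivalent dictionary, eq. (6)
s_hat = omp(D, y, K)   # sparse recovery, eq. (8)
x_hat = Psi @ s_hat    # reconstructed signal
```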
A key notion for signal reconstruction is the restricted isometry property (RIP) [1], [3]. A matrix D is said to be (K, δ)-RIP if there exists a δ with 0 ≤ δ < 1 such that

(1 − δ)‖s‖_2^2 ≤ ‖Ds‖_2^2 ≤ (1 + δ)‖s‖_2^2

holds for all K-sparse vectors s. Clearly, a small δ ensures that any subset of columns of D with cardinality no more than K is nearly orthogonal. It has been shown that when the equivalent dictionary D is (2K, δ)-RIP, a K-sparse x in Ψ can be reconstructed exactly from its low-dimensional measurement [1]-[3], [9]. Another significant property for exact signal reconstruction is the spark. The spark of a matrix D is defined as the smallest number of columns in D that are linearly dependent. It has been shown that as long as the spark of D is greater than 2K, a K-sparse signal s can be exactly recovered via (7) or (8) [9]. Since neither RIP nor spark is tractable, it is preferable to use alternative properties of D that can be easily manipulated. One such property is the mutual coherence [9], [17], defined as

µ(D) ≜ max_{i≠j} |d_i^T d_j| / (‖d_i‖_2 ‖d_j‖_2) (9)

where d_i denotes the i-th column of D. µ(D) measures the maximum linear dependency possibly achieved by any two columns of the matrix D. As shown in [9], a K-sparse signal can be exactly recovered from the measurement as long as

K < (1 + 1/µ(D)) / 2 (10)

With the dictionary Ψ learned from the training data via (2), the property of the equivalent dictionary D depends only on the sensing matrix Φ. (10) implies that if Φ is well designed such that µ(D) is as small as possible, the CS system will allow a wider set of candidate signals to be successfully recovered.
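Unlike RIP and spark, the mutual coherence (9) and the resulting sparsity bound (10) are cheap to compute. A small helper, with names of our choosing, makes both concrete:

```python
import numpy as np

def mutual_coherence(D):
    """mu(D): largest absolute normalized inner product between distinct columns."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm columns
    G = np.abs(Dn.T @ Dn)                              # absolute Gram matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-correlations
    return float(G.max())

def coherence_sparsity_bound(D):
    """Sparsity levels K strictly below this value admit exact recovery, per (10)."""
    return 0.5 * (1.0 + 1.0 / mutual_coherence(D))
```

For example, two identical columns give µ(D) = 1 and a bound of 1, i.e., only trivial sparsity is guaranteed, which matches the intuition that duplicated atoms are ambiguous.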
Roughly speaking, many approaches to designing the sensing matrix are based on a similar criterion:

min_Φ ‖G_t − (ΦΨ)^T(ΦΨ)‖_F^2, G_t ∈ S_t (11)

where the dictionary Ψ is assumed given and S_t is a class of target Gram matrices possessing certain properties; e.g., the Gram matrix of an equiangular tight frame leads to good coherence properties, see [17], [18]. However, it should be pointed out that (10) is just a worst-case bound, and the mutual coherence cannot accurately reflect the actual performance of the system. A sensing matrix designed from the coherence property as in (11) yields impressive performance when the signal under consideration is exactly sparse in the dictionary, but this is no longer true when sparse representation error exists [19]-[26].
Note that the sensing matrix Φ and the dictionary Ψ in a CS system are coupled as a pair; therefore, it is better to optimize these two matrices simultaneously [22]-[28]. Moreover, in a practical CS scenario, what we receive is the low-dimensional measurement y rather than the original signal x. So it is crucial to recover the sparse coefficient vector s accurately from y and the well-designed sensing matrix Φ and dictionary Ψ.
The main objective of this paper is to propose a novel framework for CS system optimization, and the contribution is three-fold:
• A novel measurement-driven framework is presented to design the CS system in order to improve the reconstruction accuracy of the sparse coefficients from the low-dimensional measurements. In this design framework, the sparse coefficients and the dictionary are updated with different measures.
• Under the measurement-driven framework, a new cost function is proposed for optimizing the sensing matrix and dictionary simultaneously. Unlike the work in [22] and [23], the sensing matrix and the dictionary are designed jointly under this single cost function.
• An iterative algorithm that updates the sensing matrix and dictionary alternately is put forward to solve the design problem, in which the solutions for the sensing matrix and dictionary are derived analytically; hence the convergence of the cost function can be ensured.
In addition, the potential of the designed system is demonstrated by a great many experiments based on real images, especially medical images. The remainder of this paper is arranged as follows. In Section II, some preliminaries are provided and the work in [22] and [23] is introduced. Section III is devoted to formulating the problem of a novel measurement-driven CS system; besides, a new cost function for simultaneous sensing matrix and dictionary optimization is presented. The main algorithm for updating the sensing matrix and dictionary is investigated in Section IV, where in each iteration the solutions for the sensing matrix and dictionary are derived analytically. Simulations are carried out in Section V to examine the performance of the proposed system and to compare it with existing works. Some concluding remarks are given in Section VI.
In this paper, we use lowercase and capital bold characters to indicate vectors and matrices, respectively. All the parameters are real-valued.

II. PRELIMINARIES AND RELATED WORK
In this section, we review some preliminaries on sensing matrix and dictionary optimization based on the work in [22] and [23].
There is a large body of work on designing optimal CS systems. In most of these papers, the sensing matrix and dictionary are optimized semi-independently, i.e., the dictionary is learned from the signal data under model (2) [12], [13], while the sensing matrix is designed to make the equivalent dictionary possess certain properties with the corresponding dictionary fixed, as in criterion (11) [17]-[21]. From the reconstruction procedure (7) or (8), one can observe that the performance of a CS system is determined by the equivalent dictionary D = ΦΨ. Therefore, it is better to optimize the two matrices Φ and Ψ simultaneously under a unified framework.
To the best of our knowledge, the first work dealing with simultaneous sensing matrix and dictionary optimization was probably [22]. In general, dictionary learning designs a dictionary Ψ along with the sparse coefficient matrix S such that the sparse representation error E ≜ X − ΨS is minimized. In practical applications, this error E is not nil, for example in the case of image compression, as we will show in Section V. In the design of a CS system, besides Ψ, one has to consider how to choose the sensing matrix Φ such that the original signals can be reconstructed with high accuracy from the corresponding measurements Y = ΦX = ΦΨS + ΦE, where ΦE is the amplified version of E. This error may seriously degrade the reconstruction accuracy of S, as Y may be far from sparse in ΦΨ. This observation is the main motivation of [22] and [23], which take the sensing matrix into account in dictionary learning, leading to the following optimal dictionary learning problem with the sensing matrix embedded:

min_{Ψ, S} ‖X − ΨS‖_F^2 + α‖ΦX − ΦΨS‖_F^2 s.t. ‖s_k‖_0 ≤ K, ∀ k; Φ = f(Ψ) (12)

The parameter α is used to balance E and its amplified version ΦE. Note that the sensing matrix Φ is a function of the dictionary Ψ. Such a function, denoted Φ = f(Ψ), is generally implicit and determined by the method used for sensing matrix design.
In [22], the sensing matrix was designed using (11) with G_t = I_L fixed, where I_L denotes the identity matrix of dimension L. With some algebraic operations, the design problem was converted into an equivalent one expressed through an SVD, in which the diagonal singular value matrix plays a central role, and the optimal singular vectors {v_k} were searched one by one using an iterative algorithm. This algorithm is based on eigen-decomposition and is computationally efficient, but it cannot guarantee a globally optimal solution.
In [23], the following modified cost function was adopted:

min_Φ ‖G_t − (ΦΨ)^T(ΦΨ)‖_F^2 (13)

where G_t is a target Gram matrix possessing a certain property, e.g., a coherence property. One main contribution of [23] is the derivation of a closed-form solution to problem (13). Since then, several works on jointly optimizing the sensing matrix and dictionary have been published. For example, [24] extended the work of [23] to tensor compressive sensing, and experiments on multi-dimensional signals verified its superiority; [25] mainly concerned the normalization of the incoherent dictionary and the equivalent dictionary for a clear physical meaning when optimizing the two matrices. Although these works have tried to optimize the sensing matrix and dictionary of a CS system jointly, it should be pointed out that the model (12) is still far from simultaneously optimizing Φ and Ψ. An iterative procedure is used to solve (12), in which the sensing matrix and dictionary are alternately updated by minimizing different cost functions; i.e., when updating the sensing matrix, the authors step outside problem (12) and turn to a new measure Φ = f(Ψ), e.g., (13) in [23]. The key to simultaneous optimization is to derive a cost function of Φ and Ψ that is tractable.

III. PROBLEM FORMULATION
In this section, the measurement-driven framework for designing a CS system with simultaneous sensing matrix and dictionary optimization is introduced in detail. Before that, a preprocessing procedure on the sensing matrix is investigated, and the resulting constraint on the sensing matrix will be applied in our method throughout this paper.

A. PREPROCESSING ON SENSING MATRIX
Let us take a close look at the signal reconstruction problem (8).¹ Denote e ≜ x − Ψs and ε ≜ Φe. From a statistical point of view, assume that e follows a normal distribution N(0, νI_N), so the probability density function (PDF) of e is

f_e(ξ) = [(2π)^N det(νI_N)]^{−1/2} exp(−ξ^Tξ / (2ν))

in which det(·) denotes the determinant operator [29]. Then ε = Φe has a multivariate normal distribution N(0, νΦΦ^T), which means that the PDF of ε obeys

f_ε(ξ) = [(2π)^M det(νΦΦ^T)]^{−1/2} exp(−ξ^T(ΦΦ^T)^{−1}ξ / (2ν))

According to the maximum likelihood estimation (MLE) principle [26], [30], the best estimate of the sparse coefficient vector s is the one that maximizes the PDF f_ε(ξ) at ξ = y − Ds, which leads to

min_s (y − Ds)^T(ΦΦ^T)^{−1}(y − Ds) s.t. ‖s‖_0 ≤ K

since, from a statistical point of view, the variance ν ≥ 0 is independent of s.
With the above discussions, we modify (8) as

min_s ‖(ΦΦ^T)^{−1/2}y − Φ̄Ψs‖_2^2 s.t. ‖s‖_0 ≤ K (14)

where Φ̄ ≜ (ΦΦ^T)^{−1/2}Φ. When x − Ψs ≠ 0, the MLE principle indicates that the reconstruction procedure carried out by (14) is better than (8) [26].
¹ (7) is also feasible. In the following, we will set the sparsity K for the sparse coding procedure, i.e., use an expression of the form (8).
Let a general M × N full-row-rank matrix Φ have the singular value decomposition (SVD)

Φ = U[Λ 0]V^T

with U and V two orthonormal matrices of dimensions M × M and N × N, respectively, and Λ the diagonal singular value matrix. It is easy to show that

Φ̄ = (ΦΦ^T)^{−1/2}Φ = U[I_M 0]V^T

Our model is (14), so for convenience, in the following, the optimal CS system design is considered with the sensing matrix constrained to the form

Φ = U[I_M 0]V^T, i.e., ΦΦ^T = I_M (15)

Remark 1: Hereinafter, all the sensing matrices we consider in our method are constrained by (15). A matrix of the form (15) is actually a tight frame, which has been proved valid when applied to wireless sensor networks [31], [32]. Comparing (14) with (8), this preprocessing procedure on Φ just constrains the search space for the optimal sensing matrix; it does not affect the process for calculating sparse coefficients, e.g., OMP based techniques can be employed as usual.
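The preprocessing step Φ̄ = (ΦΦ^T)^{−1/2}Φ amounts to replacing the singular values of Φ by ones, so it can be implemented with a single SVD. A small sketch (function name ours):

```python
import numpy as np

def tighten(Phi):
    """Project a full-row-rank Phi onto the constraint (15):
    Phi_bar = (Phi Phi^T)^{-1/2} Phi, i.e., set all singular values to 1."""
    U, _, Vt = np.linalg.svd(Phi, full_matrices=False)  # Phi = U diag(sv) Vt
    return U @ Vt

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 80))   # generic full-row-rank sensing matrix
Phi_bar = tighten(Phi)
# Phi_bar @ Phi_bar.T is (numerically) I_M, so the weighted residual of (14)
# reduces to the ordinary least-squares residual of (8) with Phi_bar in place of Phi.
```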

B. MEASUREMENT-DRIVEN FRAMEWORK
Now we turn to our main contribution, the measurement-driven framework. As seen from the discussions on sparse coefficients, whether in model (2) of sparse representation theory or in (12) for CS, the sparse coefficient matrix S is always updated along with the dictionary Ψ, i.e., alternating between sparse coding for S and dictionary updating for Ψ.² However, in a practical CS scenario, what we receive are the low-dimensional measurements Y, and our ultimate task is to reconstruct S from Y, which is affected by both Φ and Ψ, as indicated in (8) or (14). From this point of view, it is important to recover S adapted to the measurement matrix Y rather than to the high-dimensional signal matrix X. We name this the measurement-driven framework to distinguish it from the data-driven or task-driven frameworks popularly used in dictionary learning [33]: we calculate the sparse coefficient matrix S from the measurements Y without considering the original data X.
In our system design strategy, the independent update procedure of S can be formulated mathematically as

min_{s_k} ‖y_k − ΦΨs_k‖_2^2 s.t. ‖s_k‖_0 ≤ K, ∀ k = 1, 2, · · · , P (16)

where Y is measured by the CS system composed of the pair (Φ, Ψ), whose design will be discussed in the next subsection, and S(:, k) = s_k. (16) ties the sparse coefficients entirely to the measurements and to the measuring process (i.e., to the sensing matrix Φ and the dictionary Ψ). Remark 2: It should be noted that (14) describes the signal reconstruction process, and this is exactly the reason why we choose (16) to update the sparse coefficients in the system design strategy. By making this updating procedure fully consistent with signal reconstruction and then utilizing the obtained sparse coefficients to optimize the sensing matrix and dictionary, Φ and Ψ become implicitly adapted to the reconstruction accuracy. As far as the authors know, this is the first work to separate the updates of the sparse coefficients and the dictionary into different cost functions: calculate S from the measurements, then design the system pair (Φ, Ψ) with S fixed. This is the core idea of our measurement-driven framework.

C. SIMULTANEOUS SENSING MATRIX AND DICTIONARY OPTIMIZATION
Besides the measurement-driven framework, another main contribution of this paper is to derive one cost function that can optimize sensing matrix and dictionary simultaneously.
As sparse representation is a prerequisite for CS theory, a penalty term for the representation error should naturally be included in the design model when designing the dictionary:

J_1(Ψ) ≜ ‖X − ΨS‖_F^2

Once again, in our measurement-driven framework the sparse coefficients are updated by (16), so J_1(Ψ) is a cost function of Ψ only, and S is viewed as a constant matrix when minimizing J_1(Ψ). This is totally different from the traditional dictionary learning problem (2).
Considering the sensing matrix, a better Φ should result in less information loss when reducing the original signal to a lower dimension. Define X* ≜ ΨS as the ideal signal set without sparse representation error. The desired sensing matrix is expected to sense the key information in X* as precisely as possible. Denote the clean measurements Y* = ΦX*. The best linear estimate of X* from the clean measurements Y* follows from least squares:

X̂ = Φ^T(ΦΦ^T)^{−1}Y* = Φ^TY*

since Φ is constrained by (15) in our work, which gives ΦΦ^T = I_M [29].
With the best linear estimate X̂ obtained above and Y* = ΦX*, we propose the following penalty term for designing a CS system with preferable performance:

J_2(Φ, Ψ) ≜ ‖X* − Φ^TΦX*‖_F^2 = ‖ΨS − Φ^TΦΨS‖_F^2

Regarding this penalty term, S is again treated as a constant matrix, calculated in an independent step by (16). Based on the discussions above, the following multi-objective cost function for simultaneously optimizing the sensing matrix and dictionary is proposed:

min_{Φ, Ψ} J(Φ, Ψ) = ‖X − ΨS‖_F^2 + ω‖ΨS − Φ^TΦΨS‖_F^2 (18)

where ω is the weighting factor controlling the importance of each term. Remark 3:
• The system design strategy of this paper can be summarized as: I. update the sparse coefficients S with (16); II. update the system pair (Φ, Ψ) with (18). Iterating these two steps yields our desired CS system composed of the optimal (Φ, Ψ). The first step is directly associated with the accuracy of signal reconstruction from the measurements, while the second step focuses on the properties of the system matrices Φ and Ψ. On one hand, the cost function (18) couples the sensing matrix and dictionary closely, so that information can be fully exchanged between them. On the other hand, the dictionary is learned mainly from the signals, which returns it to the original intention of dictionary training in sparse representation theory.³
• In many existing works on designing a sensing matrix robust to sparse representation error, the penalty term ‖ΦE‖_F^2 = ‖Φ(X − ΨS)‖_F^2 was taken into consideration [19]-[21], [26]. In our opinion, if this term were included in designing the system pair (Φ, Ψ), the ability of Ψ to represent the original signals sparsely could be destroyed when the tradeoff is not properly made.
Besides, as illustrated in [21], in cases where there is not a sufficient amount of data available to obtain the error matrix E, where a dictionary must be trained on huge datasets with millions of training samples, or in a dynamic CS system for streaming signals, it is usually prohibitive to compute and store E for designing Φ. Moreover, this penalty term is in fact exactly the one used for calculating the sparse coefficients S in our first step (16), and its effect on Φ is implicitly reflected via J_2(Φ, Ψ) with the obtained S. This also avoids optimizing ‖Φ(X − ΨS)‖_F^2 repeatedly.
• Compared with [22] and [23], besides an independent step for updating the sparse coefficients, another main improvement of this paper is the cost function (18) for simultaneously optimizing the sensing matrix and dictionary. In the two mentioned references, although model (12) embeds the sensing matrix in the design of the dictionary, the convergence of (12) cannot be strictly guaranteed, as the update procedure for the sensing matrix does not follow the same cost function. In this paper, we design the pair (Φ, Ψ) strictly under (18), so better convergence behavior can be expected.
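Under our reading of (18), the cost is straightforward to evaluate once S is fixed by the measurement-driven step (16). The helper below (names ours) makes the two terms explicit and is handy for monitoring convergence:

```python
import numpy as np

def cs_cost(Phi, Psi, S, X, omega=1.0):
    """Cost (18): sparse-representation error plus the information lost when
    projecting the ideal signals X* = Psi @ S through the tight frame Phi."""
    Xstar = Psi @ S
    J1 = np.linalg.norm(X - Xstar, 'fro') ** 2                    # ||X - Psi S||_F^2
    J2 = np.linalg.norm(Xstar - Phi.T @ (Phi @ Xstar), 'fro') ** 2  # ||X* - Phi^T Phi X*||_F^2
    return J1 + omega * J2
```

When Φ is square and orthonormal (no dimensionality reduction) and X = ΨS (no representation error), both terms vanish, which is a convenient sanity check.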

IV. ITERATIVE ALGORITHM FOR CS SYSTEM DESIGN
This section is devoted to a detailed description of the CS system design algorithm. The pseudo-code is given below:

Algorithm
Initialization: Φ^0 — initial sensing matrix constrained by (15); Ψ^0 — initial dictionary; X — training data; K — sparsity level; Iter_1 — iteration number for system design; Iter_2 — iteration number for sensing matrix and dictionary update. Set i = 1.
Begin: For 1 ≤ i ≤ Iter_1, repeat Step I - Step III:
• Step I: With Φ^{i−1} and Ψ^{i−1}, compute the measurement matrix Y = Φ^{i−1}X and solve the following to get the sparse coefficients S^i by the OMP algorithm:

min_{s_k} ‖y_k − Φ^{i−1}Ψ^{i−1}s_k‖_2^2 s.t. ‖s_k‖_0 ≤ K, ∀ k (19)

Set S = S^i, j = 1, Ψ^{(0)} = Ψ^{i−1} and Φ^{(0)} = Φ^{i−1}.
• Step II: With S and Φ^{(j−1)} fixed, update the dictionary by solving

Ψ^{(j)} = arg min_Ψ J(Φ^{(j−1)}, Ψ) (20)

• Step III: With S and Ψ^{(j)} fixed, update the sensing matrix under the constraint (15) by solving

Φ^{(j)} = arg min_{Φ: ΦΦ^T = I_M} J(Φ, Ψ^{(j)}) (21)

If j < Iter_2, go to Step II with j → j + 1; otherwise set Ψ^i = Ψ^{(j)} and Φ^i = Φ^{(j)}, then go to Begin with i → i + 1 to continue the iterative procedure.
End: End the algorithm; the designed CS system is composed of (Φ^{Iter_1}, Ψ^{Iter_1}).
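The whole design loop can be prototyped in numpy as below. This is a sketch under our reading of (18)-(21), not the paper's code: the dictionary step uses a direct normal-equation minimizer of (18) rather than the SVD-based closed form of Theorem 1 (both are minimizers of the same cost), the sensing-matrix step takes U = I in (32), and all function and parameter names are ours.

```python
import numpy as np

def omp(D, y, K):
    """Bare-bones OMP used for the sparse-coding step (19)."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(K):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    s = np.zeros(D.shape[1])
    s[support] = coef
    return s

def design_cs_system(X, M, L, K, omega=1.0, iters=5, inner=3, seed=0):
    N, _ = X.shape
    rng = np.random.default_rng(seed)
    Psi = rng.standard_normal((N, L))
    Psi /= np.linalg.norm(Psi, axis=0)
    U, _, Vt = np.linalg.svd(rng.standard_normal((M, N)), full_matrices=False)
    Phi = U @ Vt                                  # initial tight frame, (15)
    for _ in range(iters):
        # Step I: measurement-driven sparse coding, (16)/(19)
        Y, D = Phi @ X, Phi @ Psi
        S = np.column_stack([omp(D, y, K) for y in Y.T])
        for _ in range(inner):
            # Step II: dictionary update; stationarity of (18) in Psi gives
            # (I + omega (I - Phi^T Phi)) Psi S S^T = X S^T
            P = Phi.T @ Phi
            A = np.eye(N) + omega * (np.eye(N) - P)
            Psi = np.linalg.solve(A, X @ S.T) @ np.linalg.pinv(S @ S.T)
            # Step III: rows of Phi = top-M left singular vectors of Psi @ S,
            # the closed-form minimizer of (21) under (15)
            Uq, _, _ = np.linalg.svd(Psi @ S, full_matrices=False)
            Phi = Uq[:, :M].T
    return Phi, Psi
```

Each inner pass cannot increase the cost (18), mirroring the monotonicity argument of Remark 4.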
As (19) is handled by the OMP method, the remaining problems in the proposed algorithm are to update the dictionary and the sensing matrix, i.e., to solve (20) and (21). For convenience, the superscripts and subscripts are omitted in the following.

A. UPDATE DICTIONARY Ψ
With some algebraic operations, J(Φ, Ψ) in (18) can be rewritten as

min_Ψ ‖A − BΨS‖_F^2 (23)

where A is the vertical stack of X and a zero matrix, and B is the vertical stack of I_N and √ω(I_N − Φ^TΦ). This problem can be solved by the following theorem proposed in [23].
Theorem 1: Let A, B and S be known, and let the SVDs of B and S be given by

B = U_B Σ_B V_B^T, S = U_S Σ_S V_S^T

Then the solutions to (23) are characterized by

Ψ = V_B [Ã_11 Ψ̂_12; Ψ̂_21 Ψ̂_22] U_S^T (24)

where Ã_11 is the first block, of dimension rank(B) × rank(S), of a matrix determined by A, B and S (see [23] for its explicit form), and Ψ̂_12, Ψ̂_21 and Ψ̂_22 are all arbitrary matrices of proper dimensions. In the experimental part, Ψ̂_12, Ψ̂_21 and Ψ̂_22 are all generated with normally distributed entries.

B. UPDATE SENSING MATRIX Φ
When updating the sensing matrix Φ, we can rewrite J(Φ, Ψ) as

J(Φ, Ψ) = ‖X − ΨS‖_F^2 + ω tr[(I_N − Φ^TΦ)ΨS(ΨS)^T] (25)

with tr[·] denoting the trace operator and the first term independent of Φ. Note that the second equality of (25) holds because of (15), which leads to

max_{Φ: ΦΦ^T = I_M} tr[ΦΨS(ΨS)^TΦ^T] (26)

Define Q ≜ ΨS(ΨS)^T, a positive semi-definite matrix with eigenvalues σ_1 ≥ σ_2 ≥ · · · ≥ σ_N arranged in descending order. Expanding tr[ΦQΦ^T] in the eigenbasis of Q and using ΦΦ^T = I_M, one can conclude that the maximum of (26) is attained when the rows of Φ span the eigen-subspace associated with the M largest eigenvalues of Q. So the solution of (26), as well as of (21), can be expressed as

Φ = U[I_M 0]V_Q^T (32)

with V_Q defined as the singular vector matrix of Q = ΨS(ΨS)^T and U ∈ ℝ^{M×M} an arbitrary orthonormal matrix. In the experimental part, U and the free blocks V_11 and V_22 are all set to identity matrices of proper dimensions for convenience. With the analytical solutions (24) and (32), after several iterations we reach the optimal dictionary and sensing matrix of (18).

Remark 4:
• Take a close look at the convergence behavior of the new cost function for simultaneously optimizing the sensing matrix and dictionary, i.e., Step II in the proposed algorithm concerning index j. This iterative procedure produces a sequence of pairs {(Φ^{(j)}, Ψ^{(j)})}. With Φ fixed, the dictionary update problem is solved analytically as (24), so the cost cannot increase in this step. Then consider (21) under the constraint (15); (15) just constrains the search space for the optimal sensing matrix and does not change the cost function, and with the closed-form solution (32) the sensing matrix update cannot increase the cost either. We can therefore conclude that, under the constraint (15), {J(Φ^{(j)}, Ψ^{(j)})} forms a monotonically nonincreasing sequence, so the convergence of the process for optimizing the sensing matrix and dictionary is guaranteed.
• For the iteration indexed by i in the algorithm, it is hard to say whether it converges or not, as the update of the sparse coefficients is a procedure independent of the cost function J(Φ, Ψ). We regard this as a specialty of our paper. In some existing works such as [22], [23] and [26], S is included in the design with J(Φ, Ψ); when discussing convergence, the authors always assume that the OMP based techniques perform perfectly and the best sparse approximations of the signals are obtained. However, there is no guarantee that this assumption holds in all cases.
• Regarding the results (24) and (32), several degrees of freedom remain, such as Ψ̂_12, Ψ̂_21, Ψ̂_22 and the orthonormal matrices U, V_11, V_22. If meaningful criteria can be found to further shape Φ and Ψ, these degrees of freedom may be optimized.

V. EXPERIMENT RESULTS
In this section, we evaluate the performance of the proposed system design method using real image data, especially medical images. First, the convergence of minimizing the cost function J(Φ, Ψ) is assessed with synthetic data. Then the choice of the weighting factor ω is discussed with image data. Based on these two results, extensive experiments on both image data and full real images are executed.
A. CONVERGENCE BEHAVIOR OF (18)
In the first place, we carry out several simulations to examine the convergence behavior of optimizing the cost function J(Φ, Ψ), i.e., Step II in the proposed algorithm. An N × L matrix with normally distributed entries is generated as the ground-truth dictionary Ψ. One more random N × L dictionary Ψ^{(0)} and an M × N matrix Φ^{(0)} of the structure (15) are set as the initial conditions.
The data for training and testing are generated as follows. First, we produce 2P K-sparse L × 1 vectors {s_k}_{k=1}^{2P}, where the non-zero elements of each s_k are randomly positioned, with values drawn i.i.d. from a zero-mean, unit-variance Gaussian distribution. With the ground-truth dictionary Ψ, the set of signal vectors {x_k}_{k=1}^{2P} is generated by (1). We then divide X into two equal parts, i.e., X = [X_1 X_2], with X_1 and X_2 both of dimension N × P. X_1 is used for training (Φ, Ψ) and X_2 for testing.
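This data-generation protocol is easy to mirror in code. The snippet below follows the dimensions stated just after (K = 4, N = 80, L = 120, P = 1000); the variable names are ours.

```python
import numpy as np

K, N, L, P = 4, 80, 120, 1000
rng = np.random.default_rng(0)
Psi_true = rng.standard_normal((N, L))           # ground-truth dictionary
Psi_true /= np.linalg.norm(Psi_true, axis=0)     # unit-norm atoms
S = np.zeros((L, 2 * P))
for k in range(2 * P):                           # 2P K-sparse coefficient vectors
    idx = rng.choice(L, size=K, replace=False)   # random supports
    S[idx, k] = rng.standard_normal(K)           # i.i.d. N(0,1) amplitudes
X = Psi_true @ S                                 # signal set {x_k}, via (1)
X1, X2 = X[:, :P], X[:, P:]                      # training / testing split
```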
The measurements Y = {y_k} are obtained as Y = ΦX_2 with the Φ of each iteration. The reconstruction accuracy is quantified with the mean square error (MSE), defined as in [13]:

σ_mse = ‖X_2 − X̂_2‖_F^2 / (N · P) (33)

where X̂_2 is the recovered version of X_2, with its k-th column given by x̂_k = Ψŝ_k for the updated Ψ, and ŝ_k the solution of

min_s ‖y_k − ΦΨs‖_2^2 s.t. ‖s‖_0 ≤ K

which is solved using the OMP algorithm. Set K = 4, M = 20, N = 80, L = 120, Iter_2 = 1000 and P = 1000.⁴ Figure 1 depicts the evolution of the cost function J(Φ^{(j)}, Ψ^{(j)}). The corresponding evolutions of the iterates Φ^{(j)} and Ψ^{(j)}, measured by the changes δ_Φ and δ_Ψ between consecutive iterations, are shown in Figures 2 and 3, respectively. The evolution of σ_mse for the CS system composed of (Φ^{(j)}, Ψ^{(j)}) is given in Figure 4.
Remark 5: As seen from Figures 1 to 4, it is clear that the cost function J(Φ, Ψ), the iterate metrics δ_Ψ and δ_Φ, and the MSE performance all converge well within 100 iterations for the three choices of ω. In particular, the monotonic decrease of J(Φ, Ψ) verifies the statement in the first point of Remark 4 very well. It is also observed that the convergence of the iterate metric δ_Φ fluctuates a little. This phenomenon can be explained by the constraint (15), which limits the search space for the sensing matrix; broadly speaking, δ_Φ converges anyhow. Moreover, a larger ω leads to slower convergence. This is also caused by (15), as the penalty term J_2(Φ, Ψ) constrained by (15) is de-emphasized by taking a small ω.
⁴ The dimension of the measurement vectors M is usually set such that M ≥ 4K to ensure that a K-sparse signal can be recovered from its measurement with high probability [1]-[3].

B. CHOICE OF THE WEIGHTING FACTOR ω IN (18)
From here on, all the experiments will be carried out with image data or real images. The parameters K = 4, N = 64 and L = 256 are fixed.
The training and testing data are real image patches extracted from the LabelMe database [34], as in [23]. The training and testing sets each consist of 6000 64 × 1 samples, i.e., P = 6000. Denote the clean testing set as {x*_k} with each x*_k ∈ ℝ^{64×1}, ∀k = 1, · · · , P. A white Gaussian noise set {e_k} is added to the clean testing data x*_k at a certain signal-to-noise ratio (SNR) σ_snr. The set of signals {x*_k + e_k} is used to evaluate the signal reconstruction accuracy of the CS systems.
For image applications, the recovery accuracy is assessed in terms of the peak signal-to-noise ratio (PSNR), defined as [13]

σ_psnr = 10 × log_10[(2^r − 1)^2 / σ_mse]

with r = 8 bits per pixel and σ_mse as in (33). With M = 20, Iter_1 = 50 and Iter_2 = 50 (Iter_1 and Iter_2 are fixed from here on), every experiment is executed 100 times and the average results are recorded. Figure 5 shows the effect of different values of ω on the performance indicator σ_psnr with σ_snr = ∞ (i.e., no additional noise is added to the image data).
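The PSNR metric above is a one-liner once the MSE is available. A small helper (names ours; the per-entry MSE convention is one common reading of (33)):

```python
import numpy as np

def mse(X_ref, X_rec):
    """Per-entry mean square error between reference and recovered signal sets."""
    return np.mean((X_ref - X_rec) ** 2)

def psnr(X_ref, X_rec, r=8):
    """sigma_psnr = 10 log10((2^r - 1)^2 / sigma_mse) for r-bit images."""
    return 10.0 * np.log10((2 ** r - 1) ** 2 / mse(X_ref, X_rec))
```

For 8-bit images the peak value is 255, so an MSE of 255² yields a PSNR of exactly 0 dB, and smaller errors give higher PSNR.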
Remark 6: Though there is no systematic way to find the best factor ω, experiments show that ω around 1 usually leads to high reconstruction accuracy. So ω = 1 is set in the sequel.

C. EXPERIMENTS WITH IMAGE DATA
Now we evaluate the performance of our designed system against several existing works. The system composed of the sensing matrix and dictionary optimized by the proposed algorithm is denoted CS_NEW. For comparison, the system composed of the K-SVD dictionary [13] with a random sensing matrix, and the systems designed by [22], [23] and [26], are tested; they are denoted CS_KSVD, CS_DCS, CS_BLL and CS_LZWH, respectively. The training and testing data are kept the same as those used in the previous subsection.
We generate an initial sensing matrix Φ_0 ∈ ℝ^{M×64} with all entries i.i.d. uniformly distributed (also used for CS_KSVD), then constrain its structure with (15) for CS_NEW. An overcomplete discrete cosine transform (DCT) dictionary Ψ_0 of size 64 × 256 is created by sampling cosine waves at different frequencies. The iteration numbers for CS_KSVD, CS_DCS, CS_BLL and CS_LZWH are all kept the same as in the corresponding literature.
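One common way to build such a 64 × 256 overcomplete DCT dictionary is as the Kronecker product of two 8 × 16 one-dimensional cosine frames. The construction below is our sketch of that recipe (the DC-removal and frequency-grid details are one conventional choice, not necessarily the paper's exact one):

```python
import numpy as np

def overcomplete_dct(n=8, l=16):
    """1-D overcomplete DCT frame of size n x l: cosines sampled at l frequencies."""
    D = np.cos(np.outer(np.arange(n), np.arange(l)) * np.pi / l)
    D[:, 1:] -= D[:, 1:].mean(axis=0)         # remove the DC component of non-constant atoms
    return D / np.linalg.norm(D, axis=0)      # unit-norm atoms

# 64 x 256 dictionary for 8x8 patches via the Kronecker product of two 1-D frames
Psi0 = np.kron(overcomplete_dct(8, 16), overcomplete_dct(8, 16))
```

Column norms multiply under the Kronecker product, so Ψ_0 inherits unit-norm atoms from the two 1-D frames.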
When σ snr = ∞, the PSNR performance of the five compared CS systems versus measurement dimension M is depicted in Figure 6.
Remark 7: It can be observed from Figure 6 that the superiority of the proposed system CS_NEW is most notable for small M values. When M increases beyond 30, the improvements of CS_BLL and CS_LZWH become prominent and their performance gets close to CS_NEW. It should be emphasized that CS is a theory focusing on both compression and sensing, so the stable performance of CS_NEW at a high compression ratio (N/M) makes it remarkable.
The above experiment was carried out with σ_snr = ∞. To demonstrate the robustness of the systems, varying-SNR cases are executed. For M = 10, 20, 30 and 40, with the input SNR of the testing data varying from 10 dB to 28 dB, the PSNR results are shown in Figures 7-10.
Remark 8: As can be observed, in high compression ratio scenarios, e.g., M = 10, the proposed CS_NEW performs much better than the other methods, achieving PSNR values almost 2 dB higher than CS_LZWH and 4 dB higher than CS_BLL. For M ≥ 20, the phenomena are consistent with the previous experiment: the performance of CS_BLL and CS_LZWH improves dramatically and gets close to CS_NEW. Still, CS_NEW is more robust than the others in all these cases.
The simulations are conducted with MATLAB R2018a on a laptop with an Intel Core i5-8250U CPU and 8.00 GB RAM. In our setting, designing the system is an off-line task, and such an increase in design complexity (compared with CS_LZWH) is acceptable. As will be seen in the follow-up experiments, the superiority of CS_NEW over CS_LZWH is notable, especially for the medical image cases.

D. EXPERIMENTS WITH REAL IMAGES
In addition to the image patches, we carry out a large number of experiments with full images to evaluate the performance of each system.
A frequently used image quality assessment index for the perceptual recovery of a full image is the mean structural similarity (MSSIM) defined in [35]. As clarified by the authors of [35], the MSSIM value, denoted σ_mssim, is bounded above by 1, with equality if and only if the two compared images are exactly the same; the better the recovery, the larger σ_mssim. Twelve popularly tested images, shown in Figure 11, are adopted. Every 8×8 non-overlapping patch in each image is re-arranged into a 64×1 vector; for instance, the 'Lena' image of size 512×512 yields P = 4096 testing vectors. Tables 2 and 3 summarize the recovery PSNR and MSSIM results for these twelve images with σ_snr = ∞ and M = 20. For each test, the best result is highlighted in bold.
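The patch re-arrangement described above (8×8 non-overlapping patches, each flattened into one 64×1 column) can be sketched as:

```python
import numpy as np

def image_to_patches(img, p=8):
    """Split an image into non-overlapping p x p patches and stack each
    patch (flattened in row-major order) as one column of a (p*p) x P matrix."""
    H, W = img.shape
    assert H % p == 0 and W % p == 0, "image dimensions must be multiples of p"
    # Reshape into (block-row, row-in-block, block-col, col-in-block),
    # then bring the in-block axes to the front and flatten.
    blocks = img.reshape(H // p, p, W // p, p).transpose(1, 3, 0, 2)
    return blocks.reshape(p * p, -1)

X = image_to_patches(np.zeros((512, 512)))  # 'Lena'-sized input: 64 x 4096
```

For the 512×512 'Lena' image this yields exactly the P = 4096 testing vectors mentioned above.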
Remark 9: As seen from Tables 2 and 3, although the gaps are not large, CS_NEW achieves the best results in all cases. Compared with PSNR, MSSIM is a more illustrative indicator of the superiority in visual quality.

We now apply the new system design method to medical image compression and reconstruction tasks. Diagnosing certain diseases can be difficult owing to a lack of medical resources or limited diagnostic experience among clinicians. Sending a patient's case report, together with the corresponding images, over the network to experts in a central hospital is an effective way to improve diagnostic accuracy. Gastrointestinal endoscopy is an important technique for diagnosing gastrointestinal diseases, and it generates a huge number of images in each operation. For such a heavy transmission burden in remote diagnosis, CS can be employed to improve both transmission efficiency and reconstruction quality.
The images selected for testing in this part include four upper digestive tract images (UDTI) of size 144×168 [36] and eight gastric images (GI) of size 392×424, as shown in Figure 12. Each medical image is preprocessed into a 64×P testing matrix in the same way as above.
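For completeness, the per-patch compression/reconstruction pipeline used in all these tests (measure y = Φx, recover the sparse code by OMP [14]-[16] over the effective dictionary ΦΨ, then synthesize x̂ = Ψŝ) can be sketched as follows; the function names and the sparsity level K are illustrative:

```python
import numpy as np

def omp(A, y, K):
    """Orthogonal matching pursuit: greedily select K atoms of A
    (assumed to have unit-norm columns) to approximate y."""
    residual, support = y.astype(float), []
    for _ in range(K):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s = np.zeros(A.shape[1])
    s[support] = coef
    return s

def cs_reconstruct(Phi, Psi, x, K):
    """Compress x to the low-dimension measurement y = Phi @ x,
    then reconstruct via sparse coding over the effective dictionary."""
    y = Phi @ x
    A = Phi @ Psi                           # effective dictionary
    norms = np.linalg.norm(A, axis=0)
    s_hat = omp(A / norms, y, K) / norms    # undo column normalization
    return Psi @ s_hat
```

Normalizing the effective dictionary's columns before the greedy atom selection keeps the correlation step unbiased toward high-energy atoms.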
Tables 4 and 5 summarize the recovery PSNR and MSSIM results for these medical images with M = 10. For each test, the best result is again highlighted in bold.
Remark 10: For the medical image applications, the superiority of CS_NEW over all the other compared methods is clearly demonstrated. In terms of PSNR, CS_NEW is on average 2.5 dB higher than CS_LZWH, 3 dB higher than CS_BLL, and 3.5 dB higher than CS_DCS; in terms of MSSIM, CS_NEW is on average 0.04 higher than CS_LZWH and 0.06 higher than CS_BLL and CS_DCS. Figures 13 and 14 present the visual results for two gastric images; the corresponding PSNR and MSSIM values can be found in Tables 4 and 5.
Remark 11: • The two gastric images chosen for display contain clinically relevant information: GI-3 and GI-7 show a stromal tumor in a gastric remnant and a small leiomyoma of the gastric fundus, respectively. As seen from the reconstructed images, CS_NEW always achieves the best visual quality, especially at the edges of the region of interest, e.g., the tumor. This is also confirmed by the PSNR and MSSIM results in Tables 4 and 5. The visual quality of a medical image is highly important for diagnosing certain diseases.
• CS can save storage space and speed up the transmission of medical images over the network, especially for huge amounts of data. The potential of CS in medical image processing is therefore promising.

VI. CONCLUSION
In this paper, we have investigated the problem of designing an optimal CS system. The contribution is three-fold. First, we have presented a measurement-driven framework in which the sparse coefficient matrix is computed from the low-dimension measurements. Second, under this novel framework, we have proposed a new cost function that optimizes the sensing matrix and dictionary simultaneously. Third, we have developed an iterative algorithm that solves for the optimal sensing matrix and dictionary. In addition, the superiority of the designed CS system, composed of the optimized sensing matrix and dictionary, has been demonstrated through extensive experiments on real images, especially medical images.