Dynamic Orthogonal Matching Pursuit for Sparse Data Reconstruction

The orthogonal matching pursuit (OMP) is one of the mainstream algorithms for sparse data reconstruction or approximation. It acts as a driving force for the development of several other greedy methods for sparse data reconstruction, and it also plays a vital role in the development of compressed sensing theory for sparse signal and image reconstruction. In this article, we propose the so-called dynamic orthogonal matching pursuit (DOMP) and enhanced dynamic orthogonal matching pursuit (EDOMP) algorithms which are more efficient than OMP for sparse data reconstruction from a numerical point of view. We carry out a rigorous analysis to establish the reconstruction error bound for DOMP under the restricted isometry property of the measurement matrix. The main result claims that the reconstruction error via DOMP can be controlled and measured in terms of the number of iterations, sparsity level of data, and the noise level of measurements. Moveover, the finite convergence of DOMP for a class of large-scale compressed sensing problems is also shown.


I. Introduction
D ATA reconstruction/recovery is a common request in such areas as signal denoising, imaging reconstruction, statistical model selection, pattern recognition, streaming data tracking and wireless channel estimation (see, e.g., [1]- [5]). For instance, recovering a sparse signal from limited measurements (observations) is a central task in compressedsensing-based signal processing [1], [2], [6]- [8]. Thus developing fast and robust algorithms for data reconstruction is significantly important in these scenarios. Without loss of generality, let x ∈ R n denote the data (e.g., signal or image) which admits some sparsity structure in the sense that x is either k-sparse (i.e., x has at most k ≪ n nonzero components) or k-compressible (i.e., x can be approximated by a k-sparse vector). In this paper, the unknown data x ∈ R n to reconstruct is called the target data. To reconstruct the data x ∈ R n , one may first collect a few measurements where m ≪ n, a i 's are given measurement vectors, and ν i 's denote the measurement errors. Let A be the m × n matrix whose row vectors consist of (a i ) T , i = 1, . . . , m, and let y = (y 1 , . . . , y m ) T and ν = (ν 1 , . . . , ν m ) T be the vectors of measurements and errors, respectively. Then the system (1) can be written as y := Ax + ν. Under the assumption that the target data x is k-sparse or k-compressible, the reconstruction of x ∈ R n from the measurements y can be formulated as the sparse optimization problem where z 0 denotes the number of nonzero entries of z. Essentially, the model (2) is a combinatorial optimization problem. The vectors satisfying the constraint of (2) are ksparse. In this paper, we assume that k ≪ n which is a typical assumption in the scenario of compressed-sensingbased signal and image reconstruction [1], [2], [8], [9]. The algorithms for (2) and similar problems have experienced a significant development over the past decades. The most widely used algorithms are based on convex/nonconvex optimization (e.g., [10]- [15]), hard thresholding and greedy methods (e.g., [1], [2], [8], [16]), and the recent technique based on the concept of optimal k-thresholding [17]- [19]. The orthogonal matching pursuit (OMP) is one of the popular greedy methods for the problem (2). Before recalling such a method, let us first introduce some notations used in the paper.

A. Notation
Throughout the paper, all vectors are column vectors unless otherwise specified. The transpose of the vector z and matrix A are denoted by z T and A T , respectively. The support of the vector z is denoted by supp(z) = {i : z i = 0}. We use H k (·) to denote the hard thresholding operator which retains the k largest entries in magnitude and zeros out the other entries of a vector, and we use L q (·) to denote the index set of the q largest entries in magnitude of a vector. When two entries of z are equal in absolute value, H k (z) and L q (z) might not be uniquely determined, in which case we select the entry with the smallest index. The complement set of S ⊆ {1, 2, . . . , n} with respect to {1, . . . , n} is denoted by S = {1, 2, . . . , n}\S, and the cardinality of S is denoted by |S|. Given x ∈ R n and S ⊆ {1, 2, . . . , n}, the vector x S ∈ R n denotes the vector obtained from x by retaining the entries of x supported on S and setting other entries of x to be zeros. x 2 = √ x T x and x ∞ = max 1≤i≤n |x i | are the ℓ 2 -norm and ℓ ∞ -norm, respectively.

B. Traditional OMP algorithm
Referred to as stagewise regression, the OMP algorithm first appeared in the area of statistics several decades ago. It was introduced to signal processing community around 1993 (see, e.g., [20]- [22]) and later to approximation theory (e.g., [23]- [25]). The OMP algorithm is stated as follows, which iteratively selects a vector basis that is best correlated to the residual at the current iterate.
OMP Algorithm. Perform the following steps until a certain stopping criterion is satisfied: S1 Initialization: x (0) = 0 and S (0) = ∅. S2 Given x (p) and S (p) , let S (p+1) = S (p) ∪ L 1 (A T (y − Ax (p) )), The advantage of OMP lies in its simple structure allowing a fast implementation at a low computational cost. OMP has stimulated the development of several other compressed sensing algorithms, including the subspace pursuit (SP) [26], compressive sampling matching pursuit (CoSaMP) [27], multipath matching pursuit [28], stagewise orthogonal matching pursuit (StOMP) [29], regularized orthogonal matching pursuit (ROMP) [30], [31], general orthogonal matching pursuit (gOMP) [32], weak orthogonal matching pursuit (WOMP) [33], stagewise weak orthogonal matching pursuit (SWOMP) [34] and constrained matching pursuit (CMP) [35]. In each iteration, the OMP algorithm only select one index corresponding to the largest entry in magnitude of A T (y − Az), the gradient of the error metric y − Az 2 2 /2. Thus the OMP algorithm does not utilize the gradient information efficiently when the gradient vector possesses several large entries in magnitude whose absolute values are close to each other. The gOMP algorithm is a direct generalization of OMP, allowing N indices to be selected simultaneously in every iteration, where 1 ≤ N < k is a prescribed integer number. However, such a selection rule remains very rigid, and it might significantly increase the chance for a wrong index being selected. For instance, if the gradient vector of error matric is τ -compressible, where τ < N, then the N indices selected by gOMP would include an index corresponding to a vanished or insignificant entry of the gradient vector. The SP and CoSaMP in their nature are not close family members of the OMP-type methods. At each step, they iteratively and simultaneously search k indices as the approximation to the support of signal. The StOMP, ROMP, WOMP and SWOMP adopt other different index selection rules to possibly enhance the performance of the standard OMP procedure. A common feature of these methods is that they allow the algorithm to select more than one indices in each iteration. WOMP is remarkably different from OMP in the sense that the index for the largest absolute entry of A T (y − Az) might be excluded during its iterations. StOMP and SWOMP might select too many indices in a single iteration which admits a high risk for wrong indices to be selected. Recently, the CMP method, as a modification of OMP, was proposed for solving more complicated recovery problems with certain constraints. In this paper, we propose further modifications of OMP, which dynamically select a few indices among the k largest components in magnitude of A T (y−Az). We also allow the algorithms to adjust the selected index sets at every iteration by performing a shrinkage step when the total number of selected indices goes beyond the sparsity level of the signal (see the description of the proposed algorithm in section II).

C. Existing results for OMP
The theoretical performance of OMP-type methods can be analyzed in terms of coherence tools (e.g., [1], [36]- [39]). However, the restricted isometry property (RIP) of the measurement matrix becomes relatively more popular than coherence for the analysis of various compressed sensing algorithms including OMP. The RIP is defined as follows.
Definition 1.1: [11] Let A be an m × n matrix. The restricted isometry constant (RIC) denoted by δ q is the smallest number δ ≥ 0 such that for any q-sparse vector z ∈ R n . A is said to satisfy the restricted isometry property (RIP) of order q if δ q < 1.
It was shown in [40] that OMP can identify the support of a k-sparse signal in exact k iterations if δ k+1 < 1/(3 √ k).
This condition was relaxed to δ k+1 < 1/(1 + √ 2k) in [41] and δ k+1 < 1/( √ k + 1) in [42] and [43]. This condition was further improved to δ k+1 < ( √ 4k + 1 − 1)/(2k) in [44], which is situated between 1/( √ k + 1) and 1/ √ k. Examples are given in [42], [43] to show that the OMP algorithm might fail to achieve the correct support identification in exact k iterations when δ k+1 = 1/ √ k. In noiseless situations (i.e., the measurements are accurate), it was shown that the condition δ k+1 < 1/ √ k + 1 is sharp to guarantee the success of OMP for k-sparse data reconstruction in exact k iterations [45]. In noisy settings (the measurements are inaccurate), the guaranteed performance of OMP has been shown in [46] and [47] under the condition δ k+1 < 1/( √ k + 1) and a certain assumption on the target signal. Another improvement was achieved recently in [48], and it was shown that the condition δ k+1 < 1/ √ k + 1 together with a condition on the signal is sufficient for OMP to correctly identify the support of a ksparse vector in exact k iterations. Moreover, the stability of OMP was also studied under the RIP in [49].
While CoSaMP and SP are also motivated by OMP, the two algorithms equipped with a hard thresholding operator are very different from OMP. The interested readers can find the latest development of CoSaMP and SP algorithms in such reference as [8], [50]- [52], and can also find the results for other modifications of OMP such as StOMP, MMP, ROMP, gOMP and SWOMP. Here we only mention a few results for gOMP since it includes OMP as a special case and it is the closest kin to OMP. The gOMP algorithm was introduced first in [32] (see also [53], [54]). It was shown in [32] is sufficient for the gOMP algorithm to reconstruct the k-sparse signal in k iterations. This condition was relaxed in [53] and [55] and was further relaxed to δ N k < √ N /( √ k + 1.27 √ N ) in [56]. The reconstruction error via gOMP in noisy scenarios has been investigated in [48], [57], [58].

D. Contribution of the paper
The main purpose of this paper is to propose an enhanced modification of OMP for sparse data reconstruction called dynamic orthogonal matching pursuit (DOMP) and its enhanced counterpart called enhanced dynamic orthogonal matching pursuit (EDOMP). The proposed algorithms dynamically select vector bases to reconstruct the target data according to the major gradient information of the error metric at every iteration. As a result, the gradient information is sufficiently and efficiently exploited to identify the support of the target data. The reconstruction error bound via DOMP is established under the RIP assumption. This bound also implies the finite convergence of DOMP in largescale compressed sensing scenarios. Moreover, the numerical performance of the proposed algorithms is evaluated through random problem instances. The results indicate that due to the efficient usage of significant gradient information, the proposed algorithms are fast and robust for sparse data reconstruction and are very comparable to several state-of-art algorithms in this field.
The paper is organized as follows. The proposed algorithms are described in Section II. An approximation counterpart of the projection problem in the proposed algorithm is give in Section III, which is used to facilitate the theoretical analysis of the algorithm. Section IV is devoted to such a theoretical analysis which leads to the reconstruction error bound for the DOMP algorithm. The finite convergence of DOMP for large-scale compressed sensing problems is also discussed is Section IV. Numerical results are demonstrated in Section V.

II. Dynamic orthogonal matching pursuit
The OMP algorithm only select one index corresponding to the largest magnitude of the gradient of error metric at every iteration. To increase the chance for the correct support of the target data being identified, it would be helpful to select indices at every iteration in a dynamic and adaptive manner. It is worth pointing out that this idea has been exploited in StOMP and SWOMP, but the total number of indices being selected at every iteration of these methods might increase too fast and exceed the sparsity level of the signal. This renders the algorithm either requires extra effort to reconstruct the signal or fails to reconstruct the signal. Thus we propose the following dynamic orthogonal matching pursuit (DOMP) method, which may efficiently utilize the underlying gradient information and control the number of selected indices at each step so that it does not exceed the sparsity level of signal. When the cardinality of selected indices does exceed the sparsity level, we may further enhance the algorithm by including a shrinkage step to bring the cardinality down to the sparsity level.

VOLUME , 3
Since x (p) is an optimal solution to the convex optimization problem min{ y − Az 2 : supp(z) ⊆ S (p) }, by the first-order optimality condition, the iterates generated by DOMP satisfy that (r (p) ) S (p) = 0 for any p ≥ 1. ( This property will be frequently used in later analysis of the algorithm. The DOMP algorithm dynamically select the vector basis according to the criterion |(r (p) ) i | ≥ γ r (p) ∞ for i ∈ L k (r (p) ). This criterion guarantees that only the indices corresponding to the significant entries of r (p) can be used to identify the support of target data. In general, Θ (p) contains more than one elements corresponding to the indices of a few largest magnitudes of r (p) . Clearly, even when γ = 1, the DOMP algorithm may not necessarily be the same as OMP since the set Θ (p) may still contain more than one indices if r (p) possesses multiple entries whose absolute values are equal to r (p) ∞ . The cardinality of Θ (p) can be any value between 1 and k, depending on the prescribed value of γ and the number of significant entries of r (p) . Such a dynamic selection of indices efficiently utilizes the gradient information of the error metric, and thus it might speed up the algorithm by increasing the chance for the correct support of target data being identified at every iteration. After a few iterations are performed, the cardinality of S (p+1) in DOMP might be larger than k. In this case, it is useful to perform certain shrinkage on the vector generated by (DOMP2) in order to maintain the k-sparsity of iterates so that they are feasible to the problem (2). This consideration leads to the following enhanced dynamic orthogonal matching pursuit (EDOMP), which is different from existing modifications of OMP.
EDOMP Algorithm. Perform the following steps until a certain stopping criterion is satisfied: Numerical results in Section V indicate that the capability of DOMP and EDOMP in sparse data reconstruction is generally stronger than OMP (see section V for details). In the next section, we perform a theoretical analysis to establish a reconstruction error bound for the DOMP algorithm. Such an error bound measures the distance between the target data and the iterates generated by the algorithm. To facilitate this analysis, let us first define an auxiliary optimization model.

III. Approximation counterpart
Denote by S = L k (x), and let x (p) , S (p) , r (p) , Θ (p) and S (p+1) be defined in DOMP. Define We now introduce an approximation counterpart (AC) to the projection problem (DOMP2), which generates the index set S (p+1) and the vector x (p+1) .
Approximation counterpart of (DOMP2): Let x (p) , S (p) , Θ (p) and S (p+1) be given in DOMP, and let Λ (p) be defined by (4). Let i and σ is a given large positive parameter.
When r (p) = 0, x (p) is already a global solution to the problem (2). Thus, to analyze the DOMP algorithm, we assume without loss of generality that r (p) = 0, in which case we can see from the definition of Θ (p) that (r (p) ) i = 0 for any i ∈ Θ (p) . This together with (3) implies that Since S (p+1) = S (p) ∪ Θ (p) , we also see from (4) that Thus The solution of the optimization problem in (AC2) depends on σ, and thus x (p+1) should be written as x (p+1) (σ). For notational convenience, however, we simply use x (p+1) to denote the solution of (AC2). The above-described approximation counterpart is only used as an auxiliary tool to support the analysis of DOMP in the next section. From (AC2), one can see that for a sufficiently large σ, the components of x (p+1) supported on Λ (p) would vanish completely or be sufficiently small. Thus x (p+1) can be made arbitrarily close to x (p+1) (the iterate generated by DOMP) provided that σ is sufficiently large. The following lemma makes this rigorous.

Lemma 3.1:
At the iterate x (p) , let x (p+1) and S (p+1) be generated by DOMP, and let S (p+1) and x (p+1) be defined in (AC1) and (AC2) accordingly. Then one has Proof. Suppose that x (p) , Θ (p) and x (p+1) are generated by DOMP. Let Λ (p) be defined by (4) and x (p+1) be a solution to the minimization problem (AC2). Note that x (p+1) is a feasible solution to the minimization problem (AC2), by the optimality of x (p+1) , we have where the equality follows from (6) which implies that By (AC1) and (6), we see that Thus by the triangular inequality, we have Merging (12) and (13) leads to By Taylor's expansion, we also have that where the last equality follows from (3) and the fact that Combining (14) and (15) yields (16) By (AC1) and (6) , where the last equality follows from the fact that ( x (p+1) ) S (p+1) = 0. This together with (11), (16) and which is the inequality (8).
A basic property of the optimization problem (AC2) is given in the next lemma which follows directly from the optimality condition of convex optimization. The proof of the lemma is omitted.

IV. Reconstruction error bounds
In this section, we establish the reconstruction error bound for DOMP. The efficiency of the algorithm for sparse data reconstruction can be interpreted via such an error bound which measures the distance of the generated solution and target data. The error bound may also imply the stability and convergence of the algorithm under certain conditions. Let us first recall some properties of δ t that will be used to establish the desired error bound.

VOLUME , 5
Lemma 4.1: [8] Let u ∈ R n and Λ ⊆ {1, 2, . . . , n}. Then the following three statements are true: It should be pointed out that δ t is not the only tool that can be used to analyze algorithms. Other tools such as the spark [1], mutual coherence [1], null space property [8] and range space property of A T [15], [59] might be also used to achieve this goal. However, we only use δ t to establish the main result in this paper.

A. Main result for DOMP
The purpose of this section is to answer the question: How does the reconstruction error decay in the course of iterations? In other words, we would like to measure the quality of the iterate, generated by the algorithm, as an approximation to the target data. The result for DOMP is summarized in Theorem 4.2 below, whose proof is postponed to the end of the next subsection after we establish some helpful technical results.

Theorem 4.2:
Suppose that y := Ax + ν are the measurements of the target data x ∈ R n where ν are the measurement errors. Let γ ∈ (0, 1] be a given constant. If the RIC of the measurement matrix A satisfies where c > 2 is an integer number, then at the pth iteration, provided that |S p+1 | ≤ (c− 2)k, the iterate x (p+1) generated by DOMP approximates x S with error where ν ′ = Ax S + ν and S = L k (x). The constants β and ̺ are given respectively as and τ is a constant dependent only of δ ck , k and γ.

Remark 4.3:
If x is k-compressible with very small tail x S 2 and the measurements are accurate enough, then ν ′ 2 would be very small. In particular, if x is k-sparse and the measurements are accurate, then ν ′ 2 = 0. From (19), the reconstruction quality in these cases would completely depend on the iterations being executed. We also note that |S (p+1) | is the total number of indices being selected after p iterations. At the pth iteration, the indices in Θ (p) are added to S (p) to form the set S (p+1) . By (5), . This means when c ≫ 2 the error bound (19) is satisfied for a large number p = p * and thus x (p) − x S 2 ≈ 0. Thus x (p) is a quality reconstruction of x S after the algorithm is performed enough iterations.
The lemma below follows immediately from Theorem 4.2.

Corollary 4.4:
Let y := Ax + ν be the measurements of the target data x ∈ R n where ν are the measurement errors. Let γ ∈ (0, 1] be a given constant. If the matrix A satisfies the RIP of order t * > 2k where then at the pth iteration, provided |S p+1 | ≤ t * − 2k, the iterate x (p+1) generated by DOMP approximates x S with error where ν ′ = Ax S + ν and S = L k (x). The constantsβ and ̺ are given respectively aŝ andτ is a constant dependent only of δ t * , k and γ.
The results above indicate that the quality of the iterate as an approximation to x S can be measured provided that the total number of selected indices is lower than t * − 2k. The RIC bounds (18) and (20) depends on the parameter γ. In theory, a small parameter γ can alleviate the requirement on measurement matrices since the smaller the parameter γ, the larger the right-hand sides of (18) and (20).

B. Proof of the main result for DOMP
We first establish several useful technical results. The first one below is partially shown in [52].

Lemma 4.5:
6 VOLUME , Proof. The first inequality in (22) was shown in [52] (see Lemma 4.1 therein). Thus we only need to show the second inequality in (22), which follows from the first inequality and the fact that (a + c) 2 + b 2 ≤ √ a 2 + b 2 + c for any a, b, c ≥ 0.
Some fundamental properties of DOMP and the optimization problem (AC2) are given in the next two lemmas.

Proof. By Lemma 3.2, we have that
the equality above can be written as By the definition of Λ (p) in (4), we see that S ∩ Λ (p) = ∅.

Proof.
Let x (p) be the current iterate and r (p) , S (p) , Θ (p) , S (p+1) be defined in DOMP. Suppose that Θ (p) and S are disjoint, i.e., We denote by L := L k (r (p) ) for notational convenience. Note that | L| = k = |S|. Thus This means the number of elements in S\ L is equal to the number of elements in L\S. By the definition of L, the entries of r (p) = A T (y − Ax (p) ) supported on L\S are among the k largest magnitudes of r (p) . This together with (29) implies that (r (p) ) S\ L 2 ≤ (r (p) ) L\S 2 .
We now estimate the last term (r (p) ) S∩ L 2 2 in (36). Note that the elements in Θ (p) are added to S (p) to form the next set S (p+1) . From the definition of Θ (p) , it contains the indices in L = L k (r (p) ) such that |(r (p) ) i | ≥ γ r (p) ∞ . Clearly, we have either |Θ (p) | < k or |Θ (p) | = k. We now consider the two cases separately.
Case 2: |Θ (p) | < k. In this case, by the structure of DOMP, not all indices in L are selected into Θ (p) . Without loss of generality, we assume that S ∩ L = ∅ since otherwise (r (p) ) S∩ L 2 2 = 0. Under the condition (28), we see that (S ∩ L) ∩ Θ (p) = (S ∩ Θ (p) ) ∩ L = ∅. So the indices in S ∩ L are not in Θ (p) , and hence |(r (p) ) i | < γ r (p) ∞ for all i ∈ S ∩ L, which implies that Again, as Θ (p) ⊆ L and the elements in S ∩ L are not in Θ (p) , we see that Θ (p) ⊆ L\( L ∩ S) = L\S. Since the index of the largest magnitude of r (p) is in Θ (p) , one has r (p) ∞ = (r (p) ) Θ p ∞ ≤ (r (p) ) Θ p 2 ≤ (r (p) ) L\S 2 .
We have all ingredients needed to show the main result stated in Theorem 4.2.
The proof of Theorem 4.2: Let x (p) be the current iterate and S (p) , Θ (p) , S (p+1) and x (p+1) are defined by DOMP. Without loss of generality, we assume r (p) = 0. Let S (p+1) be defined as (AC1) and x (p+1) be a solution to the auxiliary minimization problem in (AC2). Still, S = L k (x) is the index set for the k largest magnitudes of the target vector x. At the pth iteration, there are only two cases: either Case 1: S ∩ Θ (p) = ∅. In this case, at least one of the indices in S is added to S (p+1) = S (p) ∪ Θ p . By (5), Θ (p) ∩ S (p) = ∅. Thus set S (p+1) contains more elements of S than S (p) in this case. Note that supp(x (p) ) ⊆ S (p) ⊆ S (p+1) . Thus x (p) is a feasible vector to the optimization problem (DOMP2), to which x (p+1) is a minimizer. Thus by optimality, one has y − Ax (p+1) 2 ≤ y − Ax (p) 2 . Note that y = Ax S + ν ′ where ν ′ = Ax S + ν. By the triangular inequality, it follows from the inequality above that By noting that |supp(x S − x (p+1) )| ≤ k + |S (p+1) | and |supp(x S − x (p) )| ≤ k + |S (p) | and by the definition of RIC of A, the inequality above implies that Since δ k+|S (p) | ≤ δ k+|S (p+1) | , one has The first relation in (7) implies that |S (p+1) | = p i=1 |Θ (i) |, which means {|S (p) |} is nondecreasing sequence. Let p * denote the largest integer number p such that |S (p+1) | ≤ (c − 2)k. Then for any p ≤ p * , the coefficients on the right-hand side of (45) are bounded by where ̺ := 1+δ ck 1−δ ck and c 1 : Thus it follows from (45) and (46) that (47) for any p ≤ p * . We now analyze the second case.

VOLUME , 9
Then using the triangular inequality and (48) and (50) yields where the constants β and c 2 are given by It is easy to verify that β < 1 under the assumption of the theorem. In fact, β = φδ ck √ Suppose that the algorithm DOMP has performed p iterations where p ≤ p * . Without loss of generality, suppose that within these p iterations the aforementioned Case 1 appears τ * times, and thus Case 2 appears p−τ * times. Clearly, Case 1 appears at most k times since |S| = k, so τ * ≤ k. The error bound for this case is given by (47), and the error bound for Case 2 is given by (51). Therefore, it is not difficult to verify that after p iterations, one has where all constants (β, ̺, c 1 , c 2 ) are determined only by δ ck and/or k and γ. To see (53), without loss of generality, we assume that the first τ * iterations correspond to Case 1, and the remaining p − τ * iterations correspond to Case 2. Then using the fact Also, by (47) and the fact Merging (54) and (55) yields which is exactly the error bound given in (53). Note that the coefficients (̺/β) τ * ≤ (̺/β) k and c 1 β p−τ * ̺ τ * −1 since β < 1, ̺ > 1 and τ * ≤ k. Thus we deduce that where τ = c 1 (56) is a fixed quantity. In (56), only the last term ǫ/(1 − β) depends on σ, and we see from the definition of ǫ that ǫ → 0 as σ → ∞. Therefore, the desired estimation (19) is obtained.
The main result established in this section provides an estimation for the distance between the significant components of the target data and the solution generated by the DOMP algorithm. The result claims that the reconstruction quality becomes better and better as the iteration proceeds provided that |S (p) | ≤ (c − 2)k, where k is the sparsity level of the target data and ck is the order of the RIC of the measurement matrix. In this process, the algorithm gradually identifies the correct support of the sparse data and the iterate x (p) gets closer and closer to the significant components of the target data. When m, n are particularly large with n ≫ m and the highest order RIC of A is much larger than 2k, the proposed algorithms can perform enough iterations to guarantee that the reconstruction error becomes so small that the finite convergence of the algorithms can be achieved, as discussed in the next subsection.

C. Finite convergence
From Definition 1.1, we see that δ s ≤ δ t for any positive integer number s ≤ t. Thus if A has the RIP of order t, i.e., δ t < 1, then it has the RIP of any order s ≤ t. In this section, we define the highest order of the RIP of A as The basic assumption in compressed sensing is that the number of measurements m is much lower than the signal length n, i.e., m ≪ n. It is also well known that to ensure a k-sparse signal being exactly reconstructed, the signal should be sparse enough so that it is the unique sparsest solution to the underdetermined linear system y = Ax (see [1]). We now consider the reconstruction problems satisfying the following assumption.
Assumption 4.8: k ≪ m ≪ n and 2k ≪ t max , where t max is the highest order of RIP of the measurement matrix A.
The problem in such settings may be referred to as a large-scale compressed sensing problem for which the DOMP algorithm can maintain the error bound (19) with enough iterations so that the significant information of the target data can be guaranteed to reconstruct. Let c be the largest integer number such that 2k ≪ ck ≤ t max , i.e., c = ⌊ t max k ⌋. Recall that p * (defined in the proof of Theorem 4.2) denotes the largest number of iterations such that |S (p * +1) | ≤ (c − 2)k, i.e., Thus p * would be a large number when c ≫ 2 which is guaranteed by t max ≫ 2k.
We now point out that for large-scale compressed sensing problems, the DOMP algorithm can exactly reconstruct the support of x S in a finite number of iterations as long as Ax S + ν 2 is small enough. We say that the vector x is nondegenerated k-compressible if it has at least k nonzero components and x S 2 is sufficiently small, where S = L k (x). For such a vector, x i = 0 for all i ∈ S. Clearly, any k-sparse vector with k nonzeros must be nondegenerated k-compressible.

Theorem 4.9:
Assume that the data x ∈ R n is nondegenerated kcompressible. Let y := Ax + ν be the slightly inaccurate measurements of x in the sense that the errors ν are sufficiently small. Let γ ∈ (0, 1] be a given constant in DOMP. Suppose that Assumption 4.8 holds and that A has the RIP of order ck with t max ≥ ck ≫ 2k and δ ck satisfies (19). Then when p * = max{p : |S (p+1) | ≤ (c − 2)k} is large enough, there must exist p such that for any p with p ≤ p ≤ p * , the iterate x (p) generated by DOMP satisfies that L k (x (p) ) = L k (x).
Proof. (i) Under (19), Theorem 4.2 claims that where S = L k (x) and the constants (β, ̺, τ ) are given in Theorem 4.2. By assumption, the right-hand side of the above bound is sufficiently small when p is large enough.
Note that It follows from (57) that both terms (x (p) ) S − x S 2 and (x (p) ) S 2 are sufficiently small when p is large enough, which is ensured under the assumption of the theorem. Since x is 'nondegenerated k-compressible', this implies that the largest k magnitudes of x (p) are concentrated on S and sufficiently close to x S . This means L k (x (p) ) = S. Thus after performing enough iterations, the support of x S is correctly reconstructed by DOMP.
The error bound established in this paper measures how close the iterates generated by the algorithm to the target data in terms of δ ck , the number of iterations p, and the measurement error ν 2 . The bound (19) indicates that as p increases, the reconstruction error decays provided that p ≤ p * . In the setting of large-scale compressed sensing, Theorem 4.9 further claims that the exact reconstruction of the sparse data can be also possible under the assumption of the theorem.

Remark 4.10:
The EDOMP algorithm shares the same steps of DOMP when |S (p) | ≤ k, and performs hard thresholding to restore the k-sparsity of iterates when |S (p) | > k. The iterate x (p+1) , generated by EDOMP when |S (p) | > k, is very different from the one generated by DOMP. Clearly, the relations S (p) ⊆ S (p+1) and |S (p) | = p i=1 |Θ (i) | in DOMP are lost in EDOMP. Thus the analysis of DOMP presented in this section would not be generalized straightforwardly to EDOMP. While we had attempted to show the reconstruction error bound for EDOMP in the earlier draft of this report, the argument therein for EDOMP remains incomplete. It is still not completely clear at the moment about the error bound for EDOMP. This is an interesting and worthwhile future work.

V. Numerical experiments
The performance of the proposed algorithms is clearly related to three main factors: (a) the choice of parameter γ, (b) the relative sparsity level k/m or k/n, and (c) the total number of iterations executed. We first carry out experiments to demonstrate how these factors might affect the performance of the proposed algorithms and thus to suggest the suitable choice of γ as well as the number of iterations to perform. All sparse vectors and measurement matrices in experiments are randomly generated. The entries of sparse vectors and matrices are assumed to be independent and identically distributed and follow the standard normal distribution, and the positions of nonzero entries of sparse vectors are uniformly distributed. All iterative algorithms in our experiments use x (0) = 0 as the initial point. Unless otherwise stated, we adopt the following reconstruction criterion: where x (p) is the iterate produced by the algorithm and x is the target data to reconstruct. If x (p) satisfies (58), we say that the algorithm succeeds in reconstructing x.

A. Effect of the parameter γ
To illustrate how the value of γ affects the reconstruction ability of the proposed algorithms, we compare the success rates of the proposed algorithms in k-sparse data reconstruction using the following 20 different values: γ = t/20, where t = 1, 2, . . . , 20. The size of matrix is 500 × 2000, and the sparsity levels k = 120, 140, 150, 160, 170 and 180 of the sparse vectors x ∈ R 2000 are examined in this experiment. For every given sparsity level k, the success rates for ksparse data reconstruction via DOMP and EDOMP with each given γ are obtained respectively by attempting 500 random pairs (A, x) with accurate measurements y := Ax.
The algorithms are performed a total of k iterations, where k is the sparsity level of the vector to reconstruct. The results are given in Fig. 1. Intuitively, a small value of γ allows more indices to be added to Θ (p) at every iteration, but it may also increase the chance for an incorrect index being added to Θ p , and thus a small γ may lower the success rates of DOMP for data reconstruction as shown in Fig. 1(a). However, as shown in Fig. 1 (b), it seems that the performance of EDOMP is less sensitive to the change of γ, compared with DOMP. On the other side, the results in Fig. 1 indicate that the success rates of the proposed algorithms with γ = 1 are not the highest ones. This does indicate that the algorithm with γ = 1 (including OMP) are not working to their potentials to identify the support of the target data. One can also see from Fig. 1(a) that the sparser the target data, the wider range of values for γ can be used in DOMP. The numerical experiments indicate that the values of γ situated between 0.7 and 1 are generally good for DOMP, while EDOMP admits more freedom in the choice of γ. As a result, we may use γ = 0.9 as a default choice for two algorithms in the remaining experiments.

B. Effect of the number of iterations
The main feature of OMP is that the support of the target data is gradually identified as the iteration proceeds. This means that the support of the target data (which is usually unknown beforehand) cannot be fully obtained if the number of iterations performed is too low. Similarly, the proposed algorithms need to perform enough iterations before being terminated. However, after the necessary iterations have been performed, can any further iterations continue to remarkably improve the performance of the proposed algorithms? To find a possible answer, it is useful to carry out an experiment to test the performance of algorithms against the number of iterations executed. According to our theory, under the  RIP assumption, the DOMP and EDOMP algorithms keep finding the correct support of the target data provided that the performed iterations are lower than p * , the largest integer number p such that |S (p+1) | ≤ (c − 2)k where c > 2 is the constant associated with the order of RIP. But when iterations go beyond p * , there would be no guarantee for the algorithms to continue to find the correct support of x S . With the same random matrices and sparse vectors used in first experiment, we compare the performance of DOMP and EDOMP when they are allowed to perform different numbers of iterations: p = 1+3j where j = 0, 1, . . . , 59. For every given sparsity level k = 120, 140, 150, 160, 170, 180 and the prescribed number for iterations, the success rates for data reconstruction via the two algorithms with γ = 0.9 are obtained by 500 random tries of (A, x) with accurate measurements. The results are given in Fig. 2. As expected, the algorithms need to perform necessary iterations in order to possibly reconstruct the data. The figure indicates that with the increment of iterations, the success frequencies of the two algorithms for data reconstruction increase to some highest point, after that the performance of algorithms would generally not be improved no matter how many further iterations are performed. From Fig. 2, it can be easily seen that the highest point of the reconstruction curve appears after a certain number of iterations that is much lower than the sparsity level k of the vector. This means unlike the classic OMP, the DOMP and EDOMP algorithms do not need to perform at least k iterations in order to reconstruct a k-sparse signal. Thus we carry out further simulations to examine the average number of iterations as well as the average runtime needed for the proposed algorithms to reconstruct sparse data.

C. Average number of iterations and average runtime
Experiments were performed to demonstrate the average number of iterations and runtime needed for the proposed algorithms to achieve the reconstruction criterion (58). The random matrices in 10 different sizes ranging from (m, n) = (200, 1000) to (2000,10000) are used in this experiments. Specifically, n = 5m and m = 200j, where j = 1, . . . , 10. For every m × n matrix, the sparsity level of x ∈ R n is set as k = 0.3m. The OMP algorithm always performs at least k iterations in order to possibly reconstruct the ksparse vector. Thus the total number of iterations required by OMP to reconstruct the k-sparse vectors (k = 0.3m in this experiment) is increasing linearly with respect to the size of the matrices, as shown in Fig. 3(a). However, Fig. 3(a) indicates that the average numbers of iterations required by DOMP and EDOMP to meet the reconstruction criterion (58) are slowly increasing to the size of problems, remarkably lower than that of CoSaMP and OMP, and very comparable to gOMP (N = 5), but slightly more than that of SP and StOMP (with threshold parameter t s = 3). The average runtime for these algorithms to achieve the same criterion (58) is summarized in Fig. 3(b). It can be seen that the proposed algorithms require much less computational time than OMP and CoSaMP to achieve the same reconstruction criterion. The proposed algorithms take very similar amount of time as SP, StOMP and gOMP. As the problem size goes up, the runtime of the proposed algorithms increases very slowly in a linear-like manner, while the runtime for OMP and CoSaMP increases dramatically with respect to the size of problems. The simulation does indicate that the total iterations as well as the runtime required by DOMP and EDOMP to reconstruct the random sparse data can be remarkably lower than that of OMP and CoSaMP.

D. Comparison to some existing algorithms
Finally, we compare the reconstruction success rates of the proposed algorithms and a few existing methods including OMP, SP CoSaMP, gOMP, StOMP and ℓ 1 -minimization. The parameter γ = 0.9 is still set in DOMP and EDOMP which are performed the same number of iterations as OMP and gOMP, which is set to be the sparsity level k of the target signal. In our experiments, N = 5 is set in gOMP, and the threshold parameter t s = 3 is set in StOMP. CoSaMP and SP are allowed to perform a total of 500 iterations for every problem instance in this experiment. For accurate measurements, all algorithms use the stopping criterion (58). For noisy measurements, however, we adopt the following criterion: In this comparison, the size of matrices is 500×2000, and the sparsity level k of the random data x ∈ R 2000 is ranged from 1 to 300 with stepsize 3, i.e., k = 1, 4, 7, . . . , 299. For each given sparsity level, 200 random pairs of (A, x) were realized and used to estimate the success rates of the algorithms for data reconstruction. All matrices A are normalized. The accurate measurements are taken as y := Ax and inaccurate measurements are taken as y := Ax + 0.001h, where h is a normalized Gaussian random vector. When an algorithm terminates, if x (p) satisfies the reconstruction criterion, a 'success' is counted for the algorithm. The success rates  of the proposed algorithms using accurate measurements are shown in Fig. 4(a), and the results with inaccurate measurements are given in Fig. 4(b). One can see that in noiseless setting, DOMP and EDOMP outperform OMP, CoSaMP, gOMP and ℓ 1 -minimization, and they perform very comparably to SP and StOMP. The similar results for EDOMP can be observed when the measurements are slightly inaccurate, while DOMP and gOMP are clearly subject to the effect of noises. In summary, EDOMP is very competitive to the existing mainstream algorithms for sparse data reconstruction. It is fast, stable, robust and very easy to implement like CoSaMP and SP.

VI. Conclusions and future work
The reconstruction error bound for the proposed DOMP algorithm is established in this paper under the RIP assumption. Finite convergence of the algorithm is also discussed in large-scale compressed sensing scenarios. Simulations indicate that the proposed methods (especially, EDOMP) are more efficient than OMP in sense that they may take much less average computational times and iterations to reconstruct the sparse data, and they can compete against several state-of-art methods in this field. Distinguished from the existing OMP-type methods, the main feature of the proposed algorithms is that they can dynamically and efficiently exploit the gradient information of the error metric at every iteration to recognize the support of the target data.