A Modified Spectral Gradient Projection Method for Solving Non-linear Monotone Equations with Convex Constraints and Its Application

In this paper, we propose a derivative-free algorithm for solving non-linear monotone equations with convex constraints. The proposed algorithm combines the spectral gradient method with the projection method, and it also modifies the backtracking line search technique. The global convergence of the proposed method is guaranteed under mild conditions. Further, the numerical experiments show that large-scale non-linear equations with convex constraints can be solved effectively with our method. Finally, L_1-norm regularized problems in signal reconstruction are studied by using our method.


I. INTRODUCTION
In this paper, we focus on finding solutions of the following system of non-linear equations:

F(x) = 0,  x ∈ Ω,   (1)

where F : R^n → R^n is a given function and Ω ⊆ R^n is a non-empty convex set. If Ω = R^n, Eq. (1) is an unconstrained problem, which can be solved by many methods, such as the Newton method, quasi-Newton methods, the conjugate gradient method, fixed point methods and their variants; for details, we refer to [1]-[5].
If Ω ≠ R^n and Ω is a non-empty closed convex set, the problem is a constrained problem. Great efforts have been made to find solutions, including the projected Newton method [6], trust region methods [7], [8], and Lagrangian methods [9]. These methods perform well both in numerical experiments and in practical applications, relying on first- and second-derivative information of F. However, many problems arising in applications are not differentiable, and the above-mentioned methods are not directly applicable.
In recent years, many efforts have been devoted to finding solutions of non-smooth or semi-smooth systems of the form (1), in which F is non-differentiable; see, e.g., [10]-[13]. By approximating the secant condition of quasi-Newton methods, Barzilai and Borwein proposed the BB step size in [14], which has been widely used for unconstrained problems. Many modified BB step sizes have followed, such as [15]-[18]. In [15], Li and Wan proposed an adaptive and composite BB step size by integrating the advantages of existing step sizes, and in [16], Huang et al. extended the nonmonotone line search method and developed a class of spectral gradient algorithms by combining it with the spectral step size. The Spectral Gradient (SG) method is one extension of the BB method; it is not a descent method, so its convergence analysis is a non-trivial task. To overcome this problem, Huang and Wan proposed a spectral residual method in [19] for non-smooth, non-linear equations, and Svaiter and Solodov [6] proposed a hybrid projection method for solving unconstrained monotone problems. By constructing an appropriate separating hyperplane along the Newton direction, their method is sufficiently descent and globally convergent without regularity assumptions, but it is suitable only for smooth equations, as it depends on the gradient information of the equations. Consequently, Zhang and Zhou [20] combined the SG method with the projection method to propose a new derivative-free framework for solving non-linear unconstrained monotone equations. Inspired by the work in [21], Li and Guo [22] proposed another derivative-free method to solve the non-monotone, non-linear problem without constraints. Exploiting the non-expansiveness of the projection operator, Wang et al. [23] proposed a method to solve the convex constrained problem (1) by projecting the predictor-corrector point onto the constraint set. Yu et al. [24] then extended the works of [20] and [23] to solve monotone equations with convex constraints; by modifying the search direction to be sufficiently descent, their method performs more effectively in numerical experiments. Subsequently, Awwal et al. [25] proposed a modified Hestenes-Stiefel (HS) spectral conjugate gradient (CG) method to solve monotone non-linear equations with convex constraints efficiently. Xiao and Zhu [26] combined the CG-Descent method [27] with the projection method to improve numerical performance and applied their work to solve the L_1-norm problem in image deconvolution. More recently, many related works have addressed the constrained or unconstrained problem (1); we refer to [28]-[35] for details. In addition, it is noted that emerging neural networks [36]-[42] provide different approaches for solving this kind of problem, which deserve deeper exploration.
In this paper, inspired by the works of [24] and [25], we further investigate the projection method with a modified SG parameter and propose a derivative-free algorithm for solving non-linear monotone equations with convex constraints. The aim of this work is to improve the efficiency of the projection method and to expand the scope of its applications. The proposed derivative-free method is suitable for large-scale non-linear equations, since no sub-problem needs to be solved, which differs from the classical projection method. In our proposed algorithm, the line search technique is also modified to obtain the descent property. Global convergence is guaranteed with the new line search technique under mild conditions. Furthermore, we present some preliminary numerical experiments to illustrate the efficiency of the proposed method. In addition, we apply our proposed method to solve L_1-norm regularized problems in signal reconstruction.
This paper is organized as follows. In Section 2, we introduce related preliminary knowledge and propose our method. In Section 3, the global convergence of the proposed method is established. In Section 4, numerical experiments comparing our proposed method with two other methods are reported, and an application of the method to a signal reconstruction problem is presented. The conclusion is given in Section 5. Throughout this paper, ||·|| denotes the Euclidean norm of a vector and ⟨·,·⟩ denotes the inner product. Moreover, for convenience, F_k abbreviates F(x_k).

II. ALGORITHM
Let Ω be a closed convex subset of R^n. The projection operator P_Ω(·) is the map that assigns to any x ∈ R^n its closest point in Ω, i.e.,

P_Ω(x) = argmin { ||x − y|| : y ∈ Ω }.

We first recall some basic properties of the projection operator from [43] and [44].
Proposition 1: Let Ω be a closed convex subset of R^n. For any x ∈ R^n and y ∈ Ω, the following inequality holds:

⟨P_Ω(x) − x, y − P_Ω(x)⟩ ≥ 0.   (2)

Proposition 2: Let Ω ⊂ R^n be a closed convex subset. Then, for any x, y ∈ R^n,

||P_Ω(x) − P_Ω(y)||^2 ≤ ⟨P_Ω(x) − P_Ω(y), x − y⟩.   (3)

From (2) and (3), it is easy to find that

||P_Ω(x) − P_Ω(y)|| ≤ ||x − y|| for any x, y ∈ R^n,   (4)

which indicates that the projection operator is nonexpansive. In the spectral gradient method, the iterative scheme for solving (1) has the general form x_{k+1} = x_k + t_k d_k, where t_k is the step size and d_k is the search direction obtained by some formula.
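To make the projection operator concrete, the following minimal Python sketch (our own illustration, not part of the original algorithm) implements P_Ω for a box constraint Ω = {x : lo ≤ x ≤ hi}, for which the projection reduces to a componentwise clip, and numerically checks the nonexpansiveness property (4).

```python
import numpy as np

def project_box(x, lo, hi):
    # Projection onto the box {x : lo <= x <= hi} is a componentwise clip.
    return np.clip(x, lo, hi)

# Numerical check of nonexpansiveness (4): ||P(x) - P(y)|| <= ||x - y||.
rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
lo, hi = -np.ones(5), np.ones(5)
assert (np.linalg.norm(project_box(x, lo, hi) - project_box(y, lo, hi))
        <= np.linalg.norm(x - y) + 1e-12)
```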
In [24], the search direction d_k is given in terms of a direction scalar parameter θ̃_k. In [25], a bounded parameter θ_k is used in the search direction instead; its value depends on the angle between F_k and d_{k−1}, and it satisfies 0 ≤ θ_k ≤ 1.
Motivated by [24] and [25], we propose a new method in which the search direction d_k is given by (6), with the scalar parameter θ_k defined in (7), where ν and µ are constants. Here, z_k is a point depending on x_k and d_k. Notice that if we set µ = 0 and replace z_{k−1} by x_k, the parameter θ_k reduces to θ̃_k of [24]. Owing to this modified direction, we denote our method as Algorithm MSGP, with the following steps.
Step 1 (Initialization): Choose an initial point x_0 ∈ Ω, parameters σ > 0, λ > 0, ρ ∈ (0, 1), µ ∈ [0, 1), ν > 0 and a tolerance ε > 0. Set k := 0.
Step 2 (Stopping criterion): If ||F_k|| ≤ ε, stop the iteration. Otherwise, go to the next step.
Step 3 (Direction): Compute the search direction d_k by (6) and (7).
Step 4 (Line search): Compute the point z_k = x_k + α_k d_k, where α_k = λρ^{m_k} and m_k is the smallest non-negative integer m such that

−F(x_k + λρ^m d_k)^T d_k ≥ σ λρ^m ||d_k|| ||F(x_k + λρ^m d_k)||.

Step 5 (Projection): Compute the predictor-corrector point x̃_k = x_k − ζ_k F(z_k), where ζ_k = ⟨F(z_k), x_k − z_k⟩ / ||F(z_k)||^2, and then compute the next iterate by the projection x_{k+1} = P_Ω(x̃_k).
Step 6 (Update): Set k := k + 1 and go to Step 2.
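The steps above can be summarized in the following schematic Python sketch. It is a simplified sketch rather than the exact MSGP method: the direction is taken as the plain choice d_k = −F_k, since the modified parameter θ_k of (7) is not reproduced here; the line search and hyperplane projection follow our reconstruction of Steps 4 and 5.

```python
import numpy as np

def msgp_like_sketch(F, project, x0, sigma=0.001, lam=1.0, rho=0.8,
                     eps=1e-6, max_iter=1000):
    """Schematic derivative-free projection iteration (a sketch; the exact
    MSGP direction (6)-(7) is replaced by the plain choice d = -F(x))."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= eps:      # Step 2: stopping criterion
            return x
        d = -Fx                            # simplified stand-in for Step 3
        alpha = lam                        # Step 4: backtracking line search
        while True:
            z = x + alpha * d
            Fz = F(z)
            if -Fz @ d >= sigma * alpha * np.linalg.norm(d) * np.linalg.norm(Fz):
                break
            alpha *= rho
        # Step 5: hyperplane projection step with
        # zeta = <F(z), x - z> / ||F(z)||^2, then project onto Omega.
        zeta = (Fz @ (x - z)) / (Fz @ Fz)
        x = project(x - zeta * Fz)         # Step 6: update x_{k+1}
    return x
```

Applied with the box projection of the previous sketch and any monotone F, the loop uses only evaluations of F and the projection; no derivative of F is required.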

III. CONVERGENCE ANALYSIS
In this section, the global convergence of the proposed method is analyzed. Let S denote the solution set of (1), and assume that F_k ≠ 0 for every k; otherwise, a solution has already been obtained. First of all, some assumptions on the function F are given.

Assumption 3: The function F : R^n → R^n is Lipschitz continuous, i.e., there exists a constant L > 0 such that ||F(x) − F(y)|| ≤ L||x − y|| for all x, y ∈ R^n.

Assumption 4: F is monotone, i.e., for all x, y ∈ R^n it holds that

⟨F(x) − F(y), x − y⟩ ≥ 0.   (12)

Lemma 5: Suppose Assumption 3 holds and the sequence {d_k} is generated by Algorithm MSGP. Denote c_1 = (1 − µ)/(ν + L) and c_2 = 1/ν. Then the bounds in (13) on the search direction hold. The proof of Lemma 5 is given in Section VI.

Lemma 6: Suppose the sequence {d_k} is generated by (6). In each iteration, the search direction d_k is a descent direction; i.e., there exists a positive constant M = min{1, c_1} such that

F_k^T d_k ≤ −M ||F_k||^2.

Proof: If k = 0, then by the definition of d_k we have d_0 = −F_0, so F_0^T d_0 = −||F_0||^2. For k ≥ 1, the first part of the inequality follows from (25). The proof is complete.

From Lemma 6, we have −F_k^T d_k ≥ M||F_k||^2 > 0, which implies that the backtracking line search in Step 4 terminates finitely, so the point z_k obtained in Step 4 of Algorithm MSGP exists at each step.
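As a concrete illustration of Assumptions 3 and 4 (our own toy example, not from the paper), the map F(x) = Ax + tanh(x) with A = M^T M positive semidefinite is both monotone and Lipschitz, with the simple Lipschitz bound ||A||_2 + 1; the sketch below checks both properties numerically on random pairs.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
A = M.T @ M                      # positive semidefinite, so x -> Ax is monotone

def F(x):
    # Sum of two monotone maps; tanh is 1-Lipschitz componentwise,
    # so F is Lipschitz with constant at most ||A||_2 + 1.
    return A @ x + np.tanh(x)

L = np.linalg.norm(A, 2) + 1.0
for _ in range(100):
    x, y = rng.normal(size=5), rng.normal(size=5)
    assert (F(x) - F(y)) @ (x - y) >= -1e-10                      # Assumption 4
    assert np.linalg.norm(F(x) - F(y)) <= L * np.linalg.norm(x - y) + 1e-10
```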
The proof of Theorem 8 is given in Section VI.

IV. NUMERICAL EXPERIMENTS AND APPLICATION
In this section, we consider 8 test problems to evaluate our proposed method and compare it with the MHSP method [25] and the PDY method [45]. We then apply the proposed method to a signal reconstruction problem as an application.

A. NUMERICAL EXPERIMENTS
In this subsection, we evaluate our proposed method on the following 8 examples, which are taken from the related literature, and make a numerical comparison with the MHSP method [25] and the PDY method [45]. Both methods perform well in numerical experiments; the PDY method in particular is well known for its stable performance on many problems. In the numerical results, the number of iterations is denoted by NT; the number of evaluations of F(x) is denoted by NF; the CPU running time, in seconds, is denoted by Time; and the norm of the residual F(x) at the final iterate is denoted by Norm. To make the comparisons relatively fair, we test 8 different initial points for each problem. The parameters of the MHSP method and the PDY method are chosen as in [25] and [45], respectively. In the MSGP method, we set σ = 0.001, λ = 1, ρ = 0.8, µ = 0.85 and ν = 0.001. The symbol '-' denotes that a method fails on the relevant problem when NF ≥ 5000. The numerical results of the three methods are shown in Tables 1 and 2.

Problem 9: A strictly convex function taken from [23].
Problem 10: A trigonometric function taken from [46].
Problem 11: A strictly convex function modified from [23].
Problem 12: A tridiagonal function taken from [47].
Problem 13: A modified logarithmic function taken from [21], where Ω = (−1, +∞)^n.
Problem 14: A function taken from [45].
Problem 15: A non-smooth function taken from [48].
Problem 16: An exponential function taken from [49].

From a practical point of view, both stability and effectiveness are important for a good algorithm. In our numerical experiments, although the MHSP method achieves good results on some specific problems, such as Problems 10, 12 and 13, the PDY method is more stable. In comparison, it can be seen from Tables 1 and 2 that our proposed MSGP method not only matches the good performance that the MHSP method shows on some problems, but also maintains stability comparable to the PDY method. Furthermore, the MSGP method achieves this effectiveness and stability with very few iterations and little running time, and the problems on which it fails are rare. To compare the performance of the three methods more intuitively, Fig. 1 is plotted with the benchmarking software proposed in [50]. In the sub-figures, χ_s denotes the ratio of the performance of each method to that of the best method on the same problem, and the vertical axis is the probability that log_2 χ_s ≤ τ; thus, a probability closer to 1 indicates a better method. Overall, our proposed MSGP method is applicable to a wider range of problems.
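Fig. 1 uses the performance-profile methodology of [50] (Dolan and Moré). As a rough sketch of how such profiles are computed (with hypothetical cost data, not our actual results), one can proceed as follows.

```python
import numpy as np

def performance_profile(T, taus):
    """T[p, s] is the cost (e.g. Time or NF) of solver s on problem p,
    with np.inf marking a failure. Returns, for each solver, the fraction
    of problems with log2(T[p, s] / best cost on p) <= tau."""
    chi = np.log2(T / T.min(axis=1, keepdims=True))   # performance ratios
    return np.array([[np.mean(chi[:, s] <= t) for t in taus]
                     for s in range(T.shape[1])])

# Hypothetical costs: 4 problems x 3 solvers (np.inf = failure).
T = np.array([[1.0, 2.0, 1.5],
              [0.5, 0.6, np.inf],
              [3.0, 1.0, 1.2],
              [2.0, 2.0, 2.5]])
rho = performance_profile(T, taus=np.linspace(0.0, 4.0, 9))
```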

B. APPLICATION IN SIGNAL RECONSTRUCTION
Sparse representation of signals has attracted much attention from researchers in signal processing, image processing, computer vision and machine learning, especially for sparse signal reconstruction problems. The standard sparse representation problem is closely related to solving the following l_2-l_1 problem:

min_x (1/2)||Ax − b||_2^2 + τ||x||_1,   (18)

where A is a k × n matrix, b is a vector and τ is a non-negative parameter. Many different algorithms have been proposed to solve (18). In [51], these algorithms are divided into four categories: greedy strategy approximation, homotopy algorithms, proximal algorithms and constrained optimization. One of the earliest greedy algorithms is the Matching Pursuit (MP) algorithm [52], a simple method that computes inner products between the signal and the dictionary columns (or basis elements) to decompose the signal over the dictionary. The homotopy algorithm [53], which requires at least as many iterations as the number of nonzero components of x, is a competitive method when the signal is sparse enough. There are various proximal-type algorithms, such as the Iterative Shrinkage/Thresholding (IST) method [54] and the Sparse Reconstruction by Separable Approximation (SpaRSA) method [55]. The main idea of these algorithms is to approximate the problem by solving sub-problems iteratively, and they are usually used to solve non-smooth convex problems.
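For concreteness, the core update of an IST-type method [54] is the soft-thresholding (shrinkage) step below; this is a textbook sketch of the standard scheme, with the step size eta assumed to satisfy eta ≤ 1/||A^T A||_2.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: componentwise shrinkage toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ist(A, b, tau, eta, x0, n_iter=500):
    """Iterative shrinkage/thresholding for min 0.5*||Ax - b||^2 + tau*||x||_1.
    A textbook sketch: gradient step on the smooth part, then shrinkage."""
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)            # gradient of the smooth part
        x = soft_threshold(x - eta * grad, eta * tau)
    return x
```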
Our proposed MSGP method can be viewed as a modified Gradient Projection for Sparse Reconstruction (GPSR) method, which is a standard approach for the constrained optimization problem. Compared with the original GPSR method, instead of searching along the negative gradient, our iterations are obtained in two steps. First, a point is obtained by searching along the direction built from the negative value of the function F. Second, the obtained point is projected onto the separating hyperplane. To account for the robustness of the signal reconstruction problem, the observed signal is contaminated with noise or error, and the strategy of our method is to reconstruct a signal of length n from the k observations.
As in [56], we split the vector x into two parts,

x = u − v, u ≥ 0, v ≥ 0,

where u and v are the positive and negative parts of x, respectively, i.e., u_i = max{x_i, 0} and v_i = max{−x_i, 0}. Then, from (18), we have

min_{u,v} (1/2)||A(u − v) − b||_2^2 + τ E_n^T u + τ E_n^T v, s.t. u ≥ 0, v ≥ 0,   (19)

where E_n = (1, 1, ..., 1)^T ∈ R^n. By the properties in [26] and [57], the above problem is equivalent to solving

F(z) = Hz + c, z ∈ Ω = {z ∈ R^{2n} : z ≥ 0},   (20)

where z = (u; v), H = [A^T A, −A^T A; −A^T A, A^T A] and c = τ E_{2n} + (−A^T b; A^T b). By Lemma 3 in [58] and Lemma 2 in [59], the mapping in (20) is Lipschitz continuous and monotone; thus, our method is applied to solve (20) in this paper. Furthermore, Wan et al. [60] proposed a new reformulation of (18) that takes the constraint u^T v = 0 into account in (19). In our numerical experiments, A is a random Gaussian matrix with k = 1024 and n = 16384. We denote the original signal by x̂, a random signal with n̂ = 256 non-zero elements. We reconstruct the signal x̂ from b by the k = 1024 observations and set f(x) = (1/2)||Ax − b||_2^2 + τ||x||_1, where b = Ax̂ + b̂, b̂ is white Gaussian noise with parameter 0.05, and τ = 0.0197762.
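The splitting above can be sketched as follows (our reconstruction of (19)-(20)); the matrix H is never formed explicitly, since Hz only requires one product with A^T A applied to u − v, which matters at the stated problem size (n = 16384).

```python
import numpy as np

def make_l1_reformulation(A, b, tau):
    """Build F(z) = Hz + c and the projection onto Omega = {z >= 0} for the
    u/v splitting of min 0.5*||Ax - b||^2 + tau*||x||_1, with z = (u; v)."""
    n = A.shape[1]
    Atb = A.T @ b
    ones = np.ones(n)
    def F(z):
        u, v = z[:n], z[n:]
        w = A.T @ (A @ (u - v))          # A^T A (u - v), computed matrix-free
        return np.concatenate([w - Atb + tau * ones,
                               -w + Atb + tau * ones])
    def project(z):
        return np.maximum(z, 0.0)        # projection onto the orthant z >= 0
    return F, project
```

Since H is positive semidefinite, this F is monotone and Lipschitz, so a projection iteration of the kind sketched in Section II applies directly with this F and project.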
We compare the efficiency of our proposed method against the PCG method [29] and the IST method [61], which is well known and widely applied to the Lasso problem (18). To measure efficiency, we consider three quantities: the MSE, defined as (1/n)||x − x̂||^2; the CPU time, counted in seconds; and the total number of iterations, abbreviated "Iter". We set ν = 0.05, µ = 0.8, λ = 10, ρ = 0.6 and σ = 0.001 in the MSGP method. The parameters of the PCG method follow [29]. The initial guess x_0 is set to 0 in all methods. The iterations terminate when the relative change of the objective value falls below a given tolerance. From the perspective of sparse reconstruction, our method and the PCG method can be viewed as special cases of the GPSR method. A large number of experiments show that the GPSR method benefits from a so-called warm start [62]. Motivated by this, we employ the regularization parameter τ as a descending sequence {800τ, 800^{3/4}τ, 800^{1/2}τ, 800^{1/4}τ, τ} when applying these methods to problem (18). Fig. 2 shows, from top to bottom, the original signal, the noisy signal, and the signals reconstructed by the PCG method, the MSGP method and the IST method, respectively. Fig. 3 shows iterations vs. MSE*n in (a) and CPU time vs. MSE*n in (b). From Figs. 2 and 3, we can clearly see that, at the same order of magnitude of MSE, our method reconstructs the original signal in fewer iterations than the other two methods, which means a lower cost in CPU running time.
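The warm-start continuation described above can be sketched as follows; solve stands for any of the compared solvers for (18) (a hypothetical signature introduced here for illustration), and the schedule mirrors {800τ, 800^{3/4}τ, 800^{1/2}τ, 800^{1/4}τ, τ}.

```python
def continuation(solve, A, b, tau, x0):
    """Warm-start strategy: solve a sequence of problems (18) with a
    decreasing regularization parameter, initializing each run at the
    previous solution. `solve(A, b, tau_k, x_init)` returns a minimizer."""
    x = x0
    for scale in (800.0, 800.0 ** 0.75, 800.0 ** 0.5, 800.0 ** 0.25, 1.0):
        x = solve(A, b, scale * tau, x)   # warm start from the previous x
    return x
```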

V. CONCLUSION
In this paper, a new derivative-free algorithm, MSGP, for solving non-linear monotone equations with convex constraints is proposed by combining the spectral gradient method with the projection method. In our method, no sub-problem needs to be solved, which differs from the classical projection method. More specifically, the line search technique is modified in our method. The global convergence of the MSGP method is established analytically under mild conditions. The numerical experiments show that large-scale non-linear equations with convex constraints can be solved effectively with our method. Furthermore, our proposed MSGP method is applicable to L_1-norm regularized problems in signal reconstruction. As future work, we will explore the application of our method in the area of neural networks. It is noted that the projection neural networks proposed in [36], [37] can also be used to solve the problem considered in this paper; we will report a comparison between our method and those projection neural networks in a forthcoming paper.

A. A PROOF OF LEMMA 5
Proof: From Assumption 3 and the definition of d_k, we obtain (21). In addition, since F is monotone, (12) holds, and then we obtain (22). From (9), (21) and (22), the spectral parameter γ_k satisfies (23). We then discuss the value of θ_k in two cases. If ||F(x_{k−1})|| ≥ ||F(x_k)||, then from the definition of θ_k and the Cauchy-Schwarz inequality we obtain one bound; in the opposite case we obtain the other. Both of these inequalities imply (24). Here, we denote c_1 = (1 − µ)/(ν + L) and c_2 = 1/ν. From (7), (23) and (24), we consequently obtain (25). Combining (25) with the definition of d_k, we conclude that (13) holds.

B. A PROOF OF LEMMA 7
Proof: Let x̄ ∈ S. We first deduce the boundedness of {||x_k − x̄||}: a chain of estimates yields (26), where the first and second inequalities follow from (4) and Assumption 4, respectively, and the last equality follows from the definition of ζ_k. The inequality (26) shows that {||x_k − x̄||} is a non-increasing and bounded sequence, and thus convergent; this gives (28). By (26) and (28), we obtain (29), where the last inequality is obtained from (10). Hence, the sequence {α_k d_k} is bounded and α_k ||d_k|| → 0 as k → ∞. From Step 4 of Algorithm MSGP and (30), the following inequality holds:

−F(x_k + ρ^{−1} α_k d_k)^T d_k < σ α_k ρ^{−1} ||d_k|| ||F(x_k + ρ^{−1} α_k d_k)|| ≤ σ α_k ρ^{−1} ||d_k|| M.
By the definition of d_k, we obtain (32), where the inequalities follow from (13), (31) and Assumption 3, respectively. By (32) and the assumption that ||F_k|| > ε, we deduce that α_k ||d_k|| is bounded away from zero, which contradicts (29). The proof is completed.