A new generalized quasi-Newton algorithm based on structured diagonal Hessian approximation for solving nonlinear least-squares problems with application to 3DOF planar robot arm manipulator

Many problems in science and engineering can be formulated as nonlinear least-squares (NLS) problems; thus, the need for efficient algorithms to solve them cannot be overemphasized. In that sense, we introduce a generalized structured diagonal Hessian algorithm for solving NLS problems. The formulation associated with this algorithm is essentially a generalization of a similar result in Yahaya et al. (Journal of Computational and Applied Mathematics, pp. 113582, 2021). However, in this work, the structured diagonal Hessian update is derived under a weighted Frobenius norm; this allows other choices of the weighting matrix, analogous to the Davidon-Fletcher-Powell (DFP) method. Moreover, to fill a theoretical gap in Yahaya et al. (2021), we show that the proposed algorithm is R-linearly convergent under some standard conditions, without recourse to any safeguarding strategy. Furthermore, we experimentally test the proposed scheme on some standard benchmark problems from the literature. Finally, we apply this algorithm to a robotic motion control problem for a planar manipulator with three degrees of freedom (3DOF).


I. INTRODUCTION
In this research article, we propose a generalized structured quasi-Newton algorithm for nonlinear least-squares (NLS) problems of the following form:

min_{x ∈ R^n} f(x) := (1/2) Σ_{i=1}^{m} r_i(x)^2 = (1/2) ∥r(x)∥^2, (1)

where each residual r_i : R^n → R is a smooth function for i = 1, 2, · · · , m. We assume that for higher-dimensional problems, i.e., when n is large, the Jacobian matrix J(x) of r is not stored explicitly; however, we can evaluate matrix-vector products such as J(x)^T v, where v ∈ R^m. Moreover, the gradient g(x) and Hessian H(x) of f are defined as follows:

g(x) = J(x)^T r(x) (2)

and

H(x) = J(x)^T J(x) + Σ_{i=1}^{m} r_i(x) ∇^2 r_i(x). (3)

Algorithms for solving (1) are paramount because of their wide range of applications: problems of the form (1) arise in robotic motion, imaging, parameter estimation, data fitting, and the solution of systems of nonlinear equations (for more information, kindly see [1]-[19]).
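To make the setting concrete, here is a minimal Python sketch of the NLS objective f(x) = (1/2)∥r(x)∥^2 and its gradient g(x) = J(x)^T r(x). The residual below is a hypothetical toy example chosen for illustration, not one of the paper's benchmark problems (the paper's own experiments use MATLAB):

```python
import numpy as np

def residual(x):
    # illustrative residual r : R^2 -> R^3 with known zero at (1, 2)
    return np.array([x[0] - 1.0,
                     x[1] - 2.0,
                     x[0] * x[1] - 2.0])

def jacobian(x):
    # rows are the gradients of the residual components
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]]])

def f(x):
    r = residual(x)
    return 0.5 * (r @ r)

def grad(x):
    # g(x) = J(x)^T r(x); for large n only J^T v products are needed, matrix-free
    return jacobian(x).T @ residual(x)

x_star = np.array([1.0, 2.0])
assert f(x_star) == 0.0
assert np.allclose(grad(x_star), 0.0)
```

At the minimizer the residual vanishes, so both f and g are zero there, matching (1) and (2).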
In recent times, some algorithms have been developed for solving (1) by exploiting its structure. The approach adopted in formulating these algorithms is mostly toward approximating the action of the Hessian of (1) by a structured vector z ∈ R^n, which can be derived through a Taylor series expansion of r_i or its gradient g_i, for i = 1, 2, · · · , m, such that a secant condition, Hs ≈ z, or a weak secant condition, s^T Hs ≈ s^T z, is satisfied, where s is the difference between successive iterates. For instance, in [20] the authors approximate the Hessian in (3) with a scalar multiple of the identity matrix such that the secant condition is satisfied. They incorporated this approximation into the well-known Barzilai and Borwein [21] (BB) spectral parameters and their convex combinations, as reported in [22]. Similarly, although using a different paradigm, Mohammed and Santos [23] came up with diagonal-based approximations of the first and second matrix terms of the Hessian (3). The derived approximations satisfy a modified secant condition. However, despite approximating these matrices in (3), their search directions require several safeguarding techniques before the sufficient descent condition is satisfied.
To mitigate some of the shortcomings of that proposal, Yahaya et al. [24] recently proposed structured quasi-Newton-based algorithms for solving (1). First, two formulations of the structured vector were derived. Both derivations approximate only the second term of (3): the first formulation is estimated using a first-order Taylor series expansion, while the second is obtained from higher-order Taylor series expansions of r_i and its gradient g_i, for each i = 1, 2, · · · , m, using the Richardson extrapolation technique to get rid of the tensor terms. These derived formulations are such that a modified weak secant condition of Dennis and Wolkowicz [25] is satisfied. Thus, they used the formulations to develop two diagonal updating schemes, which are then independently used in generating the search directions. Interestingly, their algorithm requires fewer user-defined parameters in the search direction.
This paper uses the formulation in [24] to derive a generalized diagonal updating mechanism based on the weighted Frobenius norm

∥A∥_W := ∥W^{1/2} A W^{1/2}∥_F = ( tr( (W^{1/2} A W^{1/2})^T (W^{1/2} A W^{1/2}) ) )^{1/2},

where A ∈ R^{n×n}, tr(·) is the trace operator, and W is a weighting matrix that changes at every update; different choices of W lead to different updates. Some well-known updates obtained this way include Davidon-Fletcher-Powell (DFP) and Powell-Symmetric-Broyden (PSB). Motivated by these previous works, this paper also aims to fill a gap in the recent work [24] by giving rate-of-convergence results under some standard assumptions with the aid of an Armijo line search strategy.
Inspired by the work of Yahaya et al. [24], this paper makes the following contributions: 1) We propose a generalized structured diagonal approximation of the Hessian of the objective function. 2) Under some standard assumptions and with the aid of the chosen line search technique, we show the R-linear convergence of the algorithm. 3) We apply the proposed algorithm to a robotic motion control model with 3DOF. The remainder of the article is divided into the following sections: we state the algorithm's formulation and its steps in section 2; we describe the algorithm's convergence under some conditions in section 3; and finally, we present experimental results of the algorithm and its application in section 4. In this article, ∥·∥ denotes the Euclidean norm.

II. DESIGN AND STATEMENT OF THE PROPOSED ALGORITHM
From the second term of (3), we can observe that computing the residuals' second-order derivatives is required. This second term is computationally expensive; thus, approximating it is a reasonable idea, since it allows us to estimate the Hessian of the objective function cheaply.
Suppose that at an iteration, say k, the second term of equation (3) is written as

Q(x_{k+1}) = Σ_{i=1}^{m} r_i(x_{k+1}) K_i(x_{k+1}), (4)

in which r_i(x_{k+1}) and K_i(x_{k+1}) denote the i-th component of the residual vector r(x_{k+1}) and the Hessian of r_i at x_{k+1}, respectively.
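The action of Q(x_{k+1}) on a step can be approximated by residual-weighted gradient differences, as derived next. The following Python sketch checks this numerically on a hypothetical residual chosen to be quadratic, so that each Hessian K_i is constant and the first-order gradient-difference approximation is exact:

```python
import numpy as np

# quadratic residuals: each component Hessian K_i is constant
def residual(x):
    return np.array([x[0]**2 - x[1], x[0] + x[1]**2])

def res_grads(x):
    # rows: gradient g_i of each residual component
    return np.array([[2*x[0], -1.0],
                     [1.0, 2*x[1]]])

K = [np.array([[2.0, 0.0], [0.0, 0.0]]),   # Hessian of r_1
     np.array([[0.0, 0.0], [0.0, 2.0]])]   # Hessian of r_2

x_k  = np.array([1.0, 1.0])
x_k1 = np.array([1.2, 0.9])
s = x_k1 - x_k

r1 = residual(x_k1)
G_diff = res_grads(x_k1) - res_grads(x_k)      # rows: g_i^{k+1} - g_i^k
z = G_diff.T @ r1                              # z_k = sum_i r_i^{k+1} (g_i^{k+1} - g_i^k)
Qs = sum(r1[i] * (K[i] @ s) for i in range(2)) # exact Q(x_{k+1}) s_k

assert np.allclose(z, Qs)
```

For non-quadratic residuals the agreement holds up to O(∥s∥^2), which is the accuracy the derivation below relies on.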
Thus, the goal is to find a diagonal matrix, say B(x_{k+1}), that satisfies the following weak secant condition:

s_k^T B(x_{k+1}) s_k = s_k^T J(x_{k+1})^T J(x_{k+1}) s_k + s_k^T Q(x_{k+1}) s_k, (5)

where B(x_{k+1}) is defined using the least-change secant condition; since the first term on the right-hand side is available via matrix-vector products, we are now left with approximating the term s_k^T Q(x_{k+1}) s_k. Now, post-multiplying (4) by s_k gives the action Q(x_{k+1}) s_k, where for notational simplicity we write Q(x_{k+1}) = Q_{k+1}, r_i(x_{k+1}) = r_i^{k+1}, and K_i(x_{k+1}) = K_i^{k+1}; this is essentially approximating the action of the second-order term K_i^{k+1} on s_k without explicitly computing K_i^{k+1}. Suppose that the gradient of the residual r_i^{k+1} at the i-th component is denoted by g_i^{k+1}. We now use a Taylor series expansion on g_i to approximate the term K_i^{k+1} s_k as follows:

K_i^{k+1} s_k ≈ g_i^{k+1} - g_i^k. (6)

Now, we have

Q_{k+1} s_k = Σ_{i=1}^{m} r_i^{k+1} K_i^{k+1} s_k. (7)

Therefore, plugging equation (6) into equation (7) and summing over all i = 1, 2, · · · , m gives

z_k := Σ_{i=1}^{m} r_i^{k+1} (g_i^{k+1} - g_i^k) ≈ Q_{k+1} s_k. (8)

Hence, we aim to obtain a diagonal matrix B_{k+1} which satisfies the property that s_k^T B_{k+1} s_k approximates s_k^T H_{k+1} s_k. Thus, the requirement is that

s_k^T B_{k+1} s_k = s_k^T J_{k+1}^T J_{k+1} s_k + s_k^T z_k. (9)

The diagonal approximation B_{k+1} of the Hessian H_{k+1} in the above modified weak secant condition is defined as B_{k+1} = B_k + C_k, in which C_k is a diagonal correction matrix, B_k is a diagonal approximation of H_k, and both B_k and B_{k+1} are required to be positive definite. Next, we state a lemma with which we derive the diagonal entries of the correction matrix.
Lemma 1. Let C_k and B_k be two diagonal matrices containing the elements c_k^i and b_k^i for i = 1, 2, · · · , n, respectively.
Then the entries c_k^i of the solution of the following optimization problem

min_{C_k diagonal} (1/2) ∥C_k∥_W^2 + tr(B_k + C_k) subject to s_k^T (B_k + C_k) s_k = s_k^T J_{k+1}^T J_{k+1} s_k + s_k^T z_k, (10)

satisfy

c_k^i = ( β_k (s_k^i)^2 - 1 ) / (w_k^i)^2, with β_k = ( γ_k + Σ_j (s_k^j)^2 / (w_k^j)^2 ) / Σ_j (s_k^j)^4 / (w_k^j)^2, (11)

where γ_k := s_k^T J_{k+1}^T J_{k+1} s_k + s_k^T z_k - s_k^T B_k s_k, ∥·∥_W is a weighted Frobenius norm, and tr(·) is the trace of a matrix.
Proof. The optimization problem (10) can be reformulated as

min (1/2) Σ_i (w_k^i)^2 (c_k^i)^2 + Σ_i (b_k^i + c_k^i) subject to Σ_i c_k^i (s_k^i)^2 = γ_k. (12)

Since the problem (10) is convex, the Lagrangian function of (12) is as follows:

L(C_k, β_k) = (1/2) Σ_i (w_k^i)^2 (c_k^i)^2 + Σ_i (b_k^i + c_k^i) - β_k ( Σ_i c_k^i (s_k^i)^2 - γ_k ), (13)

in which β_k is a Lagrange multiplier. Now, evaluating ∂L/∂c_k^i and setting ∂L/∂c_k^i = 0 gives (w_k^i)^2 c_k^i + 1 - β_k (s_k^i)^2 = 0; this implies

c_k^i = ( β_k (s_k^i)^2 - 1 ) / (w_k^i)^2. (16)

Therefore, substituting (16) into the constraint and solving for β_k from the resulting expression gives

β_k = ( γ_k + Σ_j (s_k^j)^2 / (w_k^j)^2 ) / Σ_j (s_k^j)^4 / (w_k^j)^2. (17)

Thus, plugging equation (17) into equation (16) gives the entries of the correction matrix C_k as

c_k^i = ( ( γ_k + Σ_j (s_k^j)^2 / (w_k^j)^2 ) (s_k^i)^2 / Σ_j (s_k^j)^4 / (w_k^j)^2 - 1 ) / (w_k^i)^2. (18)

VOLUME 4, 2016
Now, by setting S_k = diag(s_k) and W_k = diag(w_k) and substituting these terms in (18), the diagonal correction matrix C_k can simply be written as

C_k = W_k^{-2} ( β_k S_k^2 - I ). (19)

Note: The motivation behind adding the trace operator in equation (10) is that we intend to find a correction matrix that clusters the eigenvalues of the updated diagonal matrix B_{k+1} in such a way that its condition number is improved.
Moreover, in what follows, we look at some possible choices of the weighting matrix W_k in (19). Some apparent choices of W_k are as follows: 1) Choice I: taking W_k = I leads to the standard formulation proposed in [24]. 2) Choice II: another choice of W_k, motivated by the Davidon-Fletcher-Powell (DFP) update, can be obtained by setting W_k = B_k. This yields the following correction matrix:

c_k^i = ( ( γ_k + Σ_j (s_k^j)^2 / (b_k^j)^2 ) (s_k^i)^2 / Σ_j (s_k^j)^4 / (b_k^j)^2 - 1 ) / (b_k^i)^2. (20)

It can be observed that, by choosing a weighting that varies, the denominator of (20) may become too small as the iteration progresses. To remedy this, as was similarly suggested in [26], we use the above correction matrix in the update only if

Σ_j (s_k^j)^4 / (b_k^j)^2 ≥ ν_1, (21)

where ν_1 is some small value in the interval (0, 1).
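Under our reading of the least-change problem (10) with a diagonal weight W = diag(w) (a sketch of the reconstruction above, not the authors' verified code; the trace term contributes the constant shift in each entry), the correction entries can be computed and the weak secant constraint verified numerically for both choices of W:

```python
import numpy as np

def correction(b, s, gamma, w):
    # entries of C_k from (18): minimize 0.5*sum(w_i^2 c_i^2) + sum(b_i + c_i)
    # subject to sum(c_i s_i^2) = gamma
    beta = (gamma + np.sum(s**2 / w**2)) / np.sum(s**4 / w**2)
    return (beta * s**2 - 1.0) / w**2

rng = np.random.default_rng(0)
n = 5
b = rng.uniform(0.5, 2.0, n)       # current diagonal of B_k (positive)
s = rng.uniform(-1.0, 1.0, n)      # step s_k
target = 3.7                       # stands in for s^T J^T J s + s^T z
gamma = target - np.sum(b * s**2)

for w in (np.ones(n), b):          # Choice I (W = I) and Choice II (W = B_k)
    c = correction(b, s, gamma, w)
    # the weak secant condition s^T (B_k + C_k) s = target holds by construction
    assert np.isclose(np.sum((b + c) * s**2), target)
```

Whatever the (positive, diagonal) weight, the constraint in (10) is enforced exactly; the weight only redistributes the correction across the diagonal entries.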
Therefore, the search direction, say d_k, of the proposed method can simply be defined as d_k = -B_k^{-1} g_k, where B_0 = diag(b_0^i) with b_0^i = 1 for all i, and the entries of the diagonal matrix B_{k+1} are given as follows:

b_{k+1}^i = max{ b_k^i + c_k^i, ν_2 }, (22)

where c_k^i is given by (18) and ν_2 ∈ (0, 1). The parameter ν_2 is introduced into (22) just to aid in showing the convergence result.
We employ a monotone line search coupled with a backtracking strategy for selecting a suitable step length. The step length α satisfies the Armijo condition

f(x_k + α d_k) ≤ f(x_k) + ς α g_k^T d_k, ς ∈ (0, 1), (23)

where, starting from α = 1, α is reduced by backtracking until (23) holds (Algorithm 1: Armijo line search with backtracking).
Compute r_k, f_k and g_k; then compute d_k = -B_k^{-1} g_k.
Step 1: If ∥g_k∥ ≤ Tol, stop. Else, go to the next step. Step 2: Compute α_k using Algorithm 1.
Step 3: Evaluate the next iterate using

x_{k+1} = x_k + α_k d_k. (24)

Step 4: Evaluate the update of the entries b_{k+1}^i of the diagonal matrix B_{k+1} using (22), where c_k^i is defined in (18).
Step 5: Update r_{k+1}, f_{k+1} and g_{k+1}. Step 6: Set k := k + 1 and go to Step 1. Remark 1. The above Algorithm 2 comprises two algorithms depending on the choice of W. If W = I for all k, then in evaluating the entries c_k^i, i = 1, 2, · · · , n, of the correction matrix C_k, we take w_k^i = 1 for all k; however, if W = B for all k, then the entries c_k^i are computed using (20).
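The overall iteration of Algorithm 2 can be sketched as follows. This Python mock-up is illustrative only: it fixes W = I, omits the trace term from the correction, and assumes a max-type ν_2 safeguard, so names and parameter values are ours, not the authors' exact implementation:

```python
import numpy as np

def gsda_sketch(residual, jacobian, x0, tol=1e-6, max_iter=500,
                sigma=1e-4, nu2=1e-2):
    x = np.asarray(x0, dtype=float).copy()
    b = np.ones_like(x)                        # diagonal of B_0 = I
    r, J = residual(x), jacobian(x)
    g = J.T @ r
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = -g / b                             # d_k = -B_k^{-1} g_k
        f0 = 0.5 * (r @ r)
        alpha = 1.0                            # Armijo backtracking (Algorithm 1)
        while 0.5 * np.sum(residual(x + alpha * d)**2) > f0 + sigma * alpha * (g @ d):
            alpha *= 0.5
        s = alpha * d
        x_new = x + s
        r_new, J_new = residual(x_new), jacobian(x_new)
        # structured secant data: J+^T J+ s plus z_k from gradient differences
        zhat = J_new.T @ (J_new @ s) + (J_new - J).T @ r_new
        gamma = s @ zhat - np.sum(b * s**2)
        c = gamma * s**2 / np.sum(s**4)        # W = I diagonal correction
        b = np.maximum(b + c, nu2)             # nu2-type safeguard keeps B_k > 0
        x, r, J = x_new, r_new, J_new
        g = J.T @ r
    return x

# mildly nonlinear zero-residual toy problem with solution near (1, -0.5)
res = lambda x: np.array([x[0] - 1.0, x[1] + 0.5, 0.1 * x[0] * x[1] + 0.05])
jac = lambda x: np.array([[1.0, 0.0], [0.0, 1.0], [0.1 * x[1], 0.1 * x[0]]])
sol = gsda_sketch(res, jac, [2.0, 2.0])
assert np.linalg.norm(jac(sol).T @ res(sol)) <= 1e-6
```

The loop only ever uses Jacobian-vector products and a diagonal B_k, which is what makes the method matrix-free.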

III. CONVERGENCE ANALYSIS
For the convergence analysis of the proposed algorithm, we first present the following useful assumption: Assumption 3. There exist positive constants N_1 and N_2 such that

N_1 ∥u∥^2 ≤ u^T ∇^2 f(x) u ≤ N_2 ∥u∥^2 for all u ∈ R^n and x ∈ χ.
Next, we state an underlying assumption on the Jacobian matrix and the residual as follows: Assumption 4. We also assume that the Jacobian J(x) and the residual r(x) are Lipschitz continuous in some neighborhood N of χ with Lipschitz constants l_1 > 0 and l_2 > 0, i.e., ∥J(x) - J(y)∥ ≤ l_1 ∥x - y∥ and ∥r(x) - r(y)∥ ≤ l_2 ∥x - y∥, ∀x, y ∈ χ.
It can be deduced from the above Assumption 4 that there exist positive constants l_3, c_1, c_2, c_3 such that the corresponding bounds on J, r and their products hold for all x, y ∈ χ. Lemma 2. Suppose Assumptions 2, 3 and 4 hold; then there exist positive constants N_1 and N̄ such that (26) holds for all k > 0. Proof. Recall that γ_k is defined in (13), and set L := c_1^2 + l_1 c_2. Now, from (25) and the above inequality, the stated estimate follows, where y_k = g_{k-1} - g_k and z_k is the structured vector. Hence, by setting N̄ = N_2 + L, the inequality (26) holds.
Lemma 3. Suppose that the step size α_k is generated by Algorithm 1, and assume that Assumption 3 is satisfied. Then either α_k = 1 or there exist positive constants p_1 and p_2 such that (27) holds. Proof. If (23) is satisfied by α_k = 1, the first part of the claim is proved. Suppose instead that α_k < 1, which simply means that the relation (23) failed for some step size ᾱ with α_k < ᾱ ≤ 2α_k, i.e., f(x_k + ᾱ d_k) > f(x_k) + ς ᾱ g_k^T d_k. Then, applying the mean-value theorem to this failed condition, for some t ∈ (0, 1), and combining the result with (28) and Assumption 3, the lower bound on α_k is established when we set p_1 = (1 - ς)/(2N_2). The condition in (23) gives the upper bound on α_k: by Taylor's theorem, for some ψ lying on the line segment joining x_{k+1} and x_k, the required inequality in (27) is obtained by setting p_2 accordingly. Furthermore, if ν_2 < 1, then

using Lemma 2, we obtain the upper bound; on the other hand, we obtain the corresponding lower bound. Hence, each diagonal entry b_k^i of B_k is bounded both above and below by positive constants for i = 1, 2, · · · , n.
We now state and prove the theorem that shows the convergence rate of the proposed algorithm.
Theorem 5. Suppose Algorithm 2 generates the sequence {x_{k+1}} using (24), where the search direction is d_k = -B_k^{-1} g_k and the elements of B_k are evaluated using (22), and let Assumptions 2 and 3 hold. Then, for any positive definite matrix B_0 with bounded diagonal entries, the algorithm generates a sequence of iterates which converges to the minimizer, say x*. Moreover, there is a constant ν_3 ∈ [0, 1) for which the R-linear estimate below holds. Proof. Define θ_k to be the angle between the search direction d_k and the negative gradient -g_k, so that cos θ_k = -g_k^T d_k / (∥g_k∥ ∥d_k∥). Therefore, using the line search condition in Algorithm 1, the lower bound on α_k from Lemma 3, and the assumptions on f, we have
Now, from Assumption 3, i.e., the boundedness of ∇^2 f, and using g(x*) = 0 together with the Cauchy-Schwarz inequality applied to the resulting expression, we obtain the corresponding bound. Therefore, using (29) and (30), together with the upper bound on α_k and the inequality ∥B_k s_k∥ / ∥s_k∥ ≤ tr(B_k), we have
Now, applying the geometric/arithmetic mean inequality to det(B_{k+1}), and similarly using the upper bound for B_k together with the bounds on α_k, we obtain the relations (33) and (34). Thus, using the relations (33) and (34), and applying induction to (31), we obtain (35). Re-applying the geometric/arithmetic mean inequality and using (35), we get the R-linear estimate, where ν_3 < 1 is a constant determined by N_1, ς, p_1, p_2, ∆_2 and ∆_3. Furthermore, with the aid of Assumption 3, we easily obtain (38), which, together with (37), gives the desired bound. Hence, the sequence {x_k} is convergent.

IV. NUMERICAL EXPERIMENTS
This section explores the proposed algorithm's numerical performance compared to other recent structured algorithms. We segmented the experiment into two components. The first part consists of testing the algorithm on some benchmark test problems; the second comprises applying the proposed algorithm to solve some data fitting problems from the literature. We conducted these experiments in the MATLAB R2019b programming package installed on a PC with a 1.60 GHz Intel Core i5-8265U processor and 8 GB of RAM.

A. EXPERIMENTATION ON SOME BENCHMARK TEST PROBLEMS
This subsection presents some numerical results on solving a set of benchmark test problems. These results verify the numerical efficiency of the proposed algorithm in comparison to the ASDA1 and ASDA2 algorithms developed in [24] (which are essentially the proposed algorithm when W = I). The problems are extracted from various sources in the literature; we cite each problem's reference and its respective standard initial starting point (see Table 1).
The set of problems considered in this experiment comprises twenty (20) large-scale problems, while the remaining three (3) are small-scale. Each of the large-scale problems was run with varying dimensions: 3000, 6000, 9000, 12000, and 15000. The parameters used in implementing the proposed GSDA algorithm are as follows: • Algorithm GSDA: ϵ = 10^{-2}, ς = 10^{-4}, ε = 10^{-3}, Tol = 10^{-4}. On the other hand, we took the parameters of ASDA1 and ASDA2 from [24]. Furthermore, unlike the ASDA1 and ASDA2 algorithms, where a monotone line search strategy is adopted, we used a simple Armijo line search technique based on Algorithm 1. An approximate solution is achieved when the stopping criterion ∥g_k∥ ≤ 10^{-6} is satisfied; a failure by an algorithm, reported as F, occurs when the number of iterations surpasses 1000 without the stopping criterion being satisfied. The standard metrics of comparison are the number of iterations, the number of function evaluations, the number of matrix-vector products, and the computing time, represented by #niter, #nfval, #nmvp and #ncpu, respectively. The results of the numerical experiments are tabulated and made available at this link: https://github.com/MAHMOUDPD/Experimental_Results_of_GSDA_Algorithm. It can be observed from the results that our proposed algorithm GSDA (with W = B) solved all the test problems successfully, followed by the ASDA1 algorithm. However, the ASDA1 and ASDA2 algorithms recorded some failure cases on the problems named P2, P15, and P20. Moreover, for a concrete visual representation of the results, each metric considered for all the problems is summarized using the well-known performance profile of Dolan and Moré [27]. That is, for each algorithm, we plot the fraction ρ of problems for which the algorithm performs within a factor τ of the best. One can easily see from Figs. 1-4 that the performance of GSDA is superior to that of all its competitors.
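The Dolan and Moré profile used in Figs. 1-4 can be computed from a table of solver costs as follows. This is a generic sketch with made-up costs (not the paper's data); failures F are encoded as infinity:

```python
import numpy as np

# t[i, j] = cost (e.g., #niter) of solver j on problem i; np.inf marks failure F
def performance_profile(t, taus):
    ratios = t / np.min(t, axis=1, keepdims=True)   # performance ratios r_{p,s}
    # rho_s(tau): fraction of problems solved within a factor tau of the best
    return np.array([[np.mean(ratios[:, j] <= tau) for j in range(t.shape[1])]
                     for tau in taus])

t = np.array([[10.0, 15.0, 20.0],
              [ 8.0,  8.0, np.inf],   # third solver fails this problem
              [30.0, 10.0, 12.0]])
rho = performance_profile(t, taus=[1.0, 2.0, 4.0])
# at tau = 1, each column is the fraction of problems where that solver is best
assert np.allclose(rho[0], [2/3, 2/3, 0.0])
```

Plotting ρ against τ for each solver reproduces the curves shown in the figures; a curve that sits highest dominates on that metric.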
Since the curves formed by the proposed GSDA topped those of all the other algorithms, these results indicate that the GSDA algorithm could provide a better alternative for solving NLS problems; this further accentuates the efficiency of the GSDA algorithm. Remark 6. To mitigate the possible generation of a non-positive or singular updated diagonal matrix B_{k+1}, we obviously require that the entries b_{k+1}^i > 0 for all i = 1, 2, · · · , n. In practical implementation, the direction is safeguarded using a positive parameter ϵ*.
TABLE 1. Test problems and their standard starting points:
P1: (1/3, 1/3, · · · , 1/3)^T
P2: Trigonometric function [29], (1/n, · · · , 1/n)^T
P3: Discrete boundary value [29]
P4: Linear function full rank [29], (1, 1, · · · , 1)^T
P5: Problem 202 [30], (2, 2, · · · , 2)^T
P6: Problem 206 [30], (1/n, · · · , 1/n)^T
P7: Problem 212 [30], (0.5, · · · , 0.5)^T
P8: Strictly convex function I [31], (1/n, 2/n, · · · , 1)^T
P9: Sine function 2 [32], (1, 1, · · · , 1)^T

B. APPLICATION IN 3DOF MOTION CONTROL OF ROBOTIC MANIPULATOR
In this segment, we apply the proposed GSDA algorithm to solve a real robotic model with three degrees of freedom (3DOF), as described in [36]. We consider a three-joint kinematic model in the plane; the discrete kinematic model equation with 3DOF can be represented as

r(θ) = [ l_1 cos(θ_1) + l_2 cos(θ_1 + θ_2) + l_3 cos(θ_1 + θ_2 + θ_3) ;
         l_1 sin(θ_1) + l_2 sin(θ_1 + θ_2) + l_3 sin(θ_1 + θ_2 + θ_3) ],

where r(·) is the kinematic mapping function relating the active joint displacements θ ∈ R^3 to the position and orientation of the robot's end-effector, l_i (for i = 1, 2, 3) denotes the length of each link, and, in the context of robotic motion control, r(θ) is the end-effector position vector. Suppose δ_{t_k} ∈ R^2 denotes the desired path vector at a given time t_k. We formulate the following least-squares problem, which is solved at every time instant t_k:

min_{θ ∈ R^3} (1/2) ∥r(θ) - δ_{t_k}∥^2, (40)

where δ_{t_k}, as reported in [37], is the desired path at t_k of a Lissajous curve. It can be observed that the above equation (40) resembles the structure of (1); thus, the GSDA algorithm can be used to solve it. Now, to solve the model and subsequently simulate the results, we initialize the joints at time instant t = 0 as θ_{t_0} = [0, π/3, π/2]^T, the link lengths as l_i = 1 (for i = 1, 2, 3), and the maximum duration as t_max = 10 s in the above Algorithm 3.
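The kinematic map and its Jacobian are straightforward to code. The following Python sketch (with l_i = 1, a hypothetical target point, and a plain Gauss-Newton inner solver standing in for GSDA) solves one tracking instance of (40):

```python
import numpy as np

L1 = L2 = L3 = 1.0  # link lengths l_i = 1, as in the simulation setup

def fk(theta):
    # planar 3DOF forward kinematics r(theta)
    t1 = theta[0]
    t12 = theta[0] + theta[1]
    t123 = theta[0] + theta[1] + theta[2]
    return np.array([L1*np.cos(t1) + L2*np.cos(t12) + L3*np.cos(t123),
                     L1*np.sin(t1) + L2*np.sin(t12) + L3*np.sin(t123)])

def jac(theta):
    t1 = theta[0]
    t12 = theta[0] + theta[1]
    t123 = theta[0] + theta[1] + theta[2]
    s1, s12, s123 = np.sin(t1), np.sin(t12), np.sin(t123)
    c1, c12, c123 = np.cos(t1), np.cos(t12), np.cos(t123)
    return np.array([[-L1*s1 - L2*s12 - L3*s123, -L2*s12 - L3*s123, -L3*s123],
                     [ L1*c1 + L2*c12 + L3*c123,  L2*c12 + L3*c123,  L3*c123]])

# the fully extended arm reaches (3, 0)
assert np.allclose(fk(np.zeros(3)), [3.0, 0.0])

# one tracking instance of (40): Gauss-Newton steps toward a reachable target
theta = np.array([0.0, np.pi/3, np.pi/2])   # initial configuration from the text
target = np.array([1.0, 1.5])               # hypothetical desired point delta
for _ in range(50):
    e = fk(theta) - target
    # minimum-norm Gauss-Newton step for the underdetermined 2x3 system
    theta -= np.linalg.lstsq(jac(theta), e, rcond=None)[0]
assert np.linalg.norm(fk(theta) - target) < 1e-6
```

Repeating this inner solve for each sampled point δ_{t_k} of the Lissajous curve yields the joint trajectories plotted in the figures.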
Finally, the results obtained from solving (40) using the proposed algorithm are plotted in Figs. 5-6. It can be seen from Fig. 6 that the task of synthesizing the robot trajectories is successfully achieved, and the error rate of the residuals is about 10^{-6}, as can be observed from Figs. 7 and 8.

V. CONCLUSION
We have proposed an algorithm for computing a minimizer of nonlinear least-squares problems. The developed algorithm is essentially based on a standard quasi-Newton class of algorithms; we call it the 'generalized structured-based diagonal algorithm' (GSDA). It was derived from a structured weak secant condition and a least-change secant updating scheme, coupled with the trace of the correction matrix of the updated matrix, where the least-change problem is posed under a weighted Frobenius norm. Thus, the algorithm is matrix-free and straightforward; this simplicity yields a low computational cost per iteration. Furthermore, it should be noted that this algorithm is a generalization of the algorithms proposed in [24]. We have also presented the convergence result of the proposed algorithm: we have shown that the algorithm with a monotone (Armijo) line search is R-linearly convergent, which fills the gap that existed in [24] for a convex class of NLS problems. Moreover, the proposal was numerically shown to be efficient and comparatively better than the algorithms proposed in [24] when the associated weighting matrix W is taken as the previous diagonal update B_k; if the weighting matrix is the identity I, the proposed structured formulation of the diagonal update reduces to the one presented in [24]. Finally, we have shown that the algorithm can be applied successfully to planar robotic motion control of manipulators with 3DOF; this underscores the applicability of the proposed algorithm. Our GSDA MATLAB codes are available on the first author's GitHub page through this link: https://github.com/MAHMOUDPD/GSDA_for_Robotic_Arm

POOM KUMAM received a PhD degree in mathematics from Naresuan University, Thailand. He is currently a Full Professor with the Department of Mathematics, King Mongkut's University of Technology Thonburi (KMUTT).
He is also the Head of the Fixed Point Theory and Applications Research Group, KMUTT, and is with the Theoretical and Computational Science Center (TaCS-Center), KMUTT. He is also the Director of the Computational and Applied Science for Smart Innovation Cluster (CLASSIC Research Cluster), KMUTT. His research targets fixed point theory, variational analysis, random operator theory, optimization theory, and approximation theory, as well as fractional differential equations, differential games, entropy and quantum operators, fuzzy soft sets, and mathematical modelling for fluid dynamics. His areas of interest include inverse problems, dynamic games in economics, traffic network equilibria, the bandwidth allocation problem, wireless sensor networks, image restoration, signal and image processing, game theory, and cryptology. He has provided and developed many mathematical tools in his fields productively over the past years. Dr. Poom has over 600 scientific papers and projects either presented or published. Moreover, he serves on the editorial boards of more than 50 journals and delivers many invited talks at international conferences around the world every year.
PARIN CHAIPUNYA obtained his PhD at King Mongkut's University of Technology Thonburi (KMUTT) and is now affiliated with the Department of Mathematics, KMUTT. He also works for the Center of Excellence in Theoretical and Computational Science (TaCS) at the same university. He has broad areas of interest, mainly nonlinear functional analysis, general topology, and differential/metric geometry. He has authored and co-authored many high-quality research articles and projects in high-impact journals. Moreover, he loves working with colleagues worldwide.
ALIYU MUHAMMED AWWAL received B.Sc. and M.Sc. degrees from Gombe State University and Bayero University Kano, respectively. He recently received a Ph.D. degree in applied mathematics from King Mongkut's University of Technology Thonburi (KMUTT). He has authored and co-authored a number of research articles in high-impact journals. His current research areas include iterative algorithms for solving nonlinear problems, such as numerical optimization problems, nonlinear least-squares problems, and systems of nonlinear equations, with applications in signal recovery, image deblurring, and motion control.