An efficient Quasi-Newton method for nonlinear inverse problems via learned singular values

Solving complex optimization problems in engineering and the physical sciences requires repeated computation of multi-dimensional function derivatives. Commonly, this requires computationally demanding numerical differentiation, such as perturbation techniques, which ultimately limits the use for time-sensitive applications. In particular, in nonlinear inverse problems Gauss-Newton methods are used that require iterative updates to be computed from the Jacobian. Computationally more efficient alternatives are Quasi-Newton methods, where the repeated computation of the Jacobian is replaced by an approximate update. Here we present a highly efficient, data-driven Quasi-Newton method applicable to nonlinear inverse problems. We achieve this by using the singular value decomposition and learning a mapping from model outputs to the singular values to compute the updated Jacobian. This enables the speed-up expected of Quasi-Newton methods without accumulating roundoff errors, enabling time-critical applications and allowing for the flexible incorporation of prior knowledge necessary to solve ill-posed problems. We present results for the highly nonlinear inverse problem of electrical impedance tomography with experimental data.


I. INTRODUCTION
Historically, derivative-based optimization is at the heart of, and required for, solving complex systems in engineering, applied sciences, and applied mathematics [1], [2]. The computation of suitable update equations can be particularly problematic when the system equations are computationally expensive to evaluate. This is especially a problem when solving (large) nonlinear inverse problems of the form

g = A(f) + δg,    (1)

with a nonlinear model A, typically given by a finite element solver. Given data g with possible noise δg, we can then compute a solution f by minimizing a suitable penalty functional

L(f) = ||g − A(f)||_2^2 + R(f),    (2)

where the regularization term R(f) stabilizes the reconstruction procedure as well as incorporates prior information. For nonlinear problems, second-order methods are typically employed to compute minimizers of eq. (2) due to their fast (quadratic) convergence. However, given potentially expensive evaluations of the forward model A, the computation of second-order derivatives is undesirable. Thus, approximations such as the Gauss-Newton (GN) method are used, where only first-order derivatives are required in each iteration, collected in the Jacobian J = ∂A(f)/∂f. We can then form an approximate Hessian H = J^T J, characterizing the GN approach. Yet, if no efficient/accurate analytical or semi-analytical algorithms are available, computing J is often problematic, requiring either (a) numerical differentiation or (b) updating approaches.
Both (a) and (b) have advantages and disadvantages. Numerical differentiation approaches, namely the perturbation method, can indeed yield reliable derivative approximations. However, perturbation methods require repeated function evaluations, which can be extremely demanding. Therefore, when semi-analytical gradients are not available, conventional numerical differentiation approaches may be infeasible for solving high-dimensional nonlinear inverse problems, if not intractable on accessible computing resources [3], [4]. To address this challenge, a number of researchers have proposed methods for updating gradient terms, rather than, e.g., computing finite differences at each iteration; these approaches are referred to as Quasi-Newton (QN) methods. Possibly the most classical QN derivative approximator is Broyden's method, proposed in 1965 [5], which takes advantage of the secant condition. Since then, a number of successful updating methods have been proposed, including the ever-popular Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [6]-[9], the limited-memory BFGS algorithm [10], and a suite of other QN-family derivative approximators [11]. Nevertheless, incorporation of prior information to regularize and stabilize the reconstruction for ill-posed problems is often either limited or not possible for QN methods [4], [12], effectively limiting their applicability in practice. This is exacerbated in the case of nonlinear problems, where prior information is essential to stabilize reconstructions.
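As a concrete illustration of the secant-based updating idea behind Broyden's method, the sketch below (a minimal NumPy example; the function name and shapes are our own, not taken from any cited implementation) applies the classical rank-one update:

```python
import numpy as np

def broyden_update(J, df, dg):
    """Broyden's rank-one secant update: given the previous Jacobian estimate J,
    a step df = f_{k+1} - f_k, and the observed change dg = A(f_{k+1}) - A(f_k),
    return the minimally changed J_new satisfying the secant condition
    J_new @ df = dg."""
    return J + np.outer(dg - J @ df, df) / (df @ df)
```

Each such update costs only one additional forward evaluation instead of one per parameter, but, as discussed above, the approximation errors it introduces accumulate over successive iterations.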
Motivated by recent advances of data-driven methods for nonlinear inverse problems [13]-[24], we propose a new QN approach for updating J based on the singular value decomposition (SVD). First, we learn a representative decomposition of the Jacobian from a randomly generated set of training data. We then train a neural network to predict the associated singular values from the model outputs, making it possible to compute an approximate Jacobian without the need for numerical differentiation, resulting in the proposed neural network augmented Quasi-Newton method (NN-QN). In related studies, researchers have proposed learned SVD frameworks in applications to regularized inverse problems [25]-[27]. We will first detail the mathematics and framework for the NN-QN approach in section II. We then present an application to the highly nonlinear inverse problem of electrical impedance tomography in section III. The approach is then tested with experimental data in section IV. Lastly, conclusions will be provided.

A. Implementing an approximate Jacobian
Computing minimizers of L(f) in eq. (2) requires an iterative scheme where a search direction ∆f_k needs to be computed, as well as a suitable step length λ_k ∈ (0, 2] [28]. Let us first discuss the unconstrained case without a regularization term, i.e. we are solving only the least squares problem ||g − A(f)||_2^2. When using the GN method, we are required to compute J at each iteration in order to attain the unregularized update ∆f_k = H^{-1} J^T (g − A(f_k)). It is worth mentioning that a defining characteristic of the GN method is that higher-order terms are ignored when computing the Hessian H = J^T J. On the other hand, QN methods generally aim to either approximate/update (i) H or H^{-1} using gradient information, or (ii) J, and use an approximation of the type H = J^T J [29], [30]. Moreover, when statistical noise information is available, we may include noise weighting in the search direction via a weight matrix W.
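To make the cost of option (a) concrete, the NumPy sketch below (function names, tolerances, and the perturbation size are illustrative assumptions) forms J by forward differences, one forward solve per parameter, and computes the unregularized, optionally weighted GN search direction:

```python
import numpy as np

def perturbation_jacobian(A, f, eps=1e-6):
    """Forward-difference Jacobian of a model A: R^n -> R^m.
    Requires one extra forward evaluation per parameter -- the cost
    the GN method pays at every iteration."""
    g0 = np.asarray(A(f))
    J = np.empty((g0.size, f.size))
    for i in range(f.size):
        fp = f.copy()
        fp[i] += eps
        J[:, i] = (np.asarray(A(fp)) - g0) / eps
    return J

def gn_step(A, f, g, W=None):
    """Unregularized (optionally noise-weighted) GN search direction
    df = (J^T W J)^{-1} J^T W (g - A(f))."""
    J = perturbation_jacobian(A, f)
    if W is None:
        W = np.eye(g.size)
    H = J.T @ W @ J                   # GN approximate Hessian
    return np.linalg.solve(H, J.T @ W @ (g - A(f)))
```

For a linear model the step recovers the exact least-squares solution in one iteration; for an expensive nonlinear forward solver, the loop over parameters in `perturbation_jacobian` is precisely what the proposed method avoids.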
In this work, we will utilize option (ii) and consider the regularized problem in eq. (2), which is required for ill-posed problems. Given this choice, we obtain the following weighted update:

∆f_k = (J^T W J + Γ_R)^{-1} (J^T W (g − A(f_k)) − ∂R(f_k)),    (3)

where Γ_R and ∂R correspond to the regularization and the gradient of the regularization term [31]. In our proposed QN method we utilize the SVD for computing the Jacobian, J = U S V^T, with U and V^T containing the left and the right singular vectors, respectively, and S a diagonal matrix containing the singular values [32]. The crucial assumption for our QN method is that the matrices U and V^T can be reasonably approximated, given a suitable initialization. That is, given an initial f_0 we compute the SVD of the corresponding Jacobian, i.e. J(f_0) = U_0 S_0 V_0^T. We then fix V = V_0 and U = U_0 for any approximation of J during the iterative process of eq. (3). Since S contains the nonnegative singular values, it centrally influences the estimation of J, and we aim to estimate the updated Jacobian given the fixed singular vectors as

J ≈ U_0 S V_0^T.    (4)

The challenge is now to obtain a good estimate of the singular values in S in order to compute the Jacobian (the target quantity) for the QN update. To do this, we propose the use of a neural network Λ to find an approximate set of singular values S_Λ ≈ S and obtain the learned estimate for J as

J_Λ = U_0 S_Λ V_0^T.    (5)

The obtained approximation to the Jacobian can now be readily employed to compute the search direction in eq. (3) and forms the basis for the learned QN method proposed here. In the next subsection, we will present the neural network (NN) used for predicting and approximating S_Λ .
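The fixed-singular-vector assumption can be sketched in a few lines of NumPy (the random J0 stands in for a Jacobian computed once at the initialization f_0; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J0 = rng.standard_normal((8, 5))   # Jacobian at the initial f0, computed once
# economy SVD: U0 (8x5), s0 (5,), V0t (5x5)
U0, s0, V0t = np.linalg.svd(J0, full_matrices=False)

def approx_jacobian(s_new):
    """Fixed-singular-vector approximation J = U0 diag(s_new) V0^T:
    only the singular values are updated during the iteration, while
    U0 and V0 stay frozen at their initial values."""
    return U0 @ np.diag(s_new) @ V0t
```

With the initial singular values, the reconstruction reproduces J0 exactly; during the iteration, only the vector of singular values is replaced by a new estimate.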

B. Learned prediction of the singular values
The keystone to the viability of the NN-augmented Jacobian approximation, as proposed here, is an accurate and reliable prediction of the singular values in S_Λ. We note that S_Λ is a diagonal matrix, and hence we are only left with estimating the vector of singular values diag(S_Λ), simplifying the implementation of a learned estimator. What remains unclear at present is the selection of the input to the network Λ used to predict diag(S_Λ). An obvious choice is the model output A(f) itself. While other choices may be viable, for simplicity we choose this mapping and aim to predict

diag(S_Λ) = Λ(A(f)),    (7)

where Λ denotes the functional mapping between NN inputs and predictions. We can then readily form the estimated singular value matrix S_Λ. The proposed neural network augmented Quasi-Newton method (NN-QN) can then be obtained by simply substituting eq. (7) into eq. (4) and using the resulting J_Λ in the update of eq. (3):

∆f_k = (J_Λ^T W J_Λ + Γ_R)^{-1} (J_Λ^T W (g − A(f_k)) − ∂R(f_k)).    (8)

The major advantage of the NN-QN method is computational speed, since A(f) need not be repetitively computed to form the Jacobian after initialization. We also note that the Jacobian updates are not susceptible to the accumulating round-off errors present in other QN gradient estimators [33]; instead, the NN-QN method largely depends on the quality of the U_0 and V_0 estimates. Additionally, it can be expected that S_Λ predictions are primarily valid within the space of the training data; but, as we will show in the following, highly general random fields can be successfully used for network training. Lastly, we present a summarizing pseudocode of the workflow in Algorithm 1 as a reference for solving nonlinear inverse problems using the proposed NN-QN method. Remark: It is sometimes the case that optimization problems do not require regularization. As such, we would like to note that the proposed NN-QN method also holds for unregularized weighted solutions by using ∆f_k = (J_Λ^T W J_Λ)^{-1} J_Λ^T W (g − A(f_k)).
In the case that weighting is not used and assuming J_Λ^T J_Λ is invertible, we can in fact represent the update in the reduced form

∆f_k = V_0 S_Λ^{-1} U_0^T (g − A(f_k)).    (9)

Algorithm 1 (fragment as recovered):
    while stopping criterion > tolerance do
7:      Compute ∆f_k ← solve eq. (9)
10:     Use line search to compute λ_k
11:     f_{k+1} ← f_k + λ_k ∆f_k
12:     k ← k + 1
13:  end while
14:  f_rec ← f_k
15: end function (return f_rec)
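The iteration of Algorithm 1 with the unweighted, unregularized reduced update of eq. (9) can be sketched as follows; the `predict_s` callable stands in for the trained network Λ, the line search is replaced by a fixed unit step, and all names are illustrative rather than the authors' implementation:

```python
import numpy as np

def nn_qn_solve(A, predict_s, U0, V0t, g, f0, tol=1e-6, max_iter=50):
    """NN-QN iteration with the reduced update of eq. (9):
    df_k = V0 S_Lambda^{-1} U0^T (g - A(f_k))."""
    f = f0.copy()
    for _ in range(max_iter):
        s = predict_s(A(f))                     # learned singular values S_Lambda
        df = V0t.T @ ((U0.T @ (g - A(f))) / s)  # reduced NN-QN search direction
        lam = 1.0                               # line search omitted in this sketch
        f_new = f + lam * df
        # stopping criterion ||f_{k+1} - f_k||^2 / ||f_k||^2 <= tolerance
        if np.linalg.norm(f_new - f) ** 2 <= tol * np.linalg.norm(f) ** 2:
            return f_new
        f = f_new
    return f
```

Note that only one forward evaluation A(f) per iteration is required: the same output feeds both the residual and the singular-value prediction.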

A. Electrical Impedance Tomography
Herein we demonstrate the proposed NN-QN method using Electrical Impedance Tomography (EIT) [34], [35], a nonlinear inverse problem aiming to estimate the conductivity σ provided voltage data g. Following notational convention, we write σ instead of f for the unknown. The resultant observation model for EIT is then written as g = A(σ) + δg, where δg is a noise term and A(σ) is the nonlinear forward model characterized by the Complete Electrode Model [36], [37] and solved using finite elements. In addition to the nonlinear relationship between σ and g, EIT is ill-posed and susceptible to measurement noise. As such, regularization is essential to attain suitable reconstructions; to do this, we aim to minimize the following functional characterizing absolute EIT imaging:

L(σ) = ||L_e (g − A(σ))||_2^2 + R(σ),    (10)

where L_e^T L_e = W^{-1} and R(σ) is the regularization term. Lastly, we require a linearization point from which to compute the left and right singular vector matrices U_0 and V_0^T. For this, we simply utilize σ_0 = σ_exp, where σ_exp is a constant homogeneous estimate based on the expected conductivity, providing a good estimate and general starting point.
In the experimental EIT part, data from water tank (nonconductive inclusion localization) and carbon black-modified glass fiber/epoxy laminate (damage localization) tests are considered. In both cases, 256 measurements are collected using adjacent excitation and measurement patterns following [38], [39]. Geometries for the water tank and laminate tests are circular (radius = 14 cm) and square (width = 9.54 cm), respectively. These circular and square geometries have 16 electrodes and are discretized using quadratic triangular finite element meshes consisting of N_el = 2034 and N_el = 2528 elements, respectively. To attain reliable electrode contact impedances for these finite element models, the best homogeneous estimate was used. Lastly, in order to demonstrate the generalizability of EIT reconstruction using the proposed NN-QN method, two different regularization forms are used: total variation regularization for the water tank tests and Laplacian regularization for the laminate tests.

B. Neural network implementation
The NN architectures used herein employ the same structure, since the data sizes for both example problems are identical. Firstly, we note that the input and output data vectors are of size R^256, because the number of singular values is equal to the number of model outputs for the test examples. Given this, we select a fully connected regression network consisting of three hidden layers composed of 300 neurons each. The first two layers utilize LeakyReLU activation functions with a negative-region slope of α = 0.1, while the last hidden layer uses a ReLU activation to ensure non-negative singular values.
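The described architecture can be sketched as a plain-NumPy forward pass (the random initialization below is only a placeholder for trained parameters, and dropout, which is applied only during training, is omitted):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """LeakyReLU with negative-region slope alpha."""
    return np.where(x > 0, x, alpha * x)

def init_layer(rng, n_in, n_out):
    # Placeholder (untrained) parameters; in practice these come from training.
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in), np.zeros(n_out)

rng = np.random.default_rng(0)
sizes = [256, 300, 300, 300, 256]   # input, three hidden layers of 300, output
layers = [init_layer(rng, m, n) for m, n in zip(sizes[:-1], sizes[1:])]

def predict_singular_values(a_f):
    """Forward pass: two LeakyReLU(0.1) hidden layers, a ReLU third hidden
    layer, and a linear output mapping model outputs to singular values."""
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = layers
    h = leaky_relu(W1 @ a_f + b1)
    h = leaky_relu(W2 @ h + b2)
    h = np.maximum(W3 @ h + b3, 0.0)   # ReLU on the last hidden layer
    return W4 @ h + b4
```

The same structure serves both test problems, since the input (model outputs) and output (singular values) are each 256-dimensional.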
To train the networks, N = 5000 randomized training and N_t = 1000 validation data sets are used. The randomized conductivity training samples σ_rand were generated using a spatially-weighted Gaussian kernel G perturbed with a random vector r, i.e. σ_rand = G^{-1} r + σ_exp. We note that this is highly general training data, where we ensure the conductivity mean is centered at σ_exp while the range of the random samples varies within two standard deviations of the mean. To illustrate this, Fig. 1 shows samples for both EIT geometries. In optimizing the network using adaptive learning rates (with initial rate 10^{-5}), the following loss function is minimized:

L(w) = (1/N) Σ_{n=1}^{N} ||diag(S^n) − diag(S_Λ^n)||_2^2 + κ ||w||_2^2,    (11)

where diag(S^n) and diag(S_Λ^n) represent the n-th vector of singular values in the training set and the corresponding output from the NN, w are the network weights, and κ is the regularization hyperparameter. For regularizing the network during training, κ = 0.001 and a dropout rate of 20% between layers are used. To prevent over-fitting, training is terminated when the change in the validation loss is less than 10^{-2} between epochs. In summary, learning singular values requires (i) generating model output and singular value samples, (ii) prescribing the NN architecture/parameters, and finally (iii) minimizing eq. (11).
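The data-generation and loss steps can be sketched as follows (the kernel construction, its length scale, and all helper names are our own illustrative choices, not the authors' exact recipe):

```python
import numpy as np

def gaussian_kernel(pts, ell=0.2):
    """A spatially-weighted Gaussian kernel G over mesh-node coordinates;
    a small diagonal jitter keeps G invertible."""
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell ** 2)) + 1e-6 * np.eye(len(pts))

def random_conductivity(G, sigma_exp, rng):
    """One training sample sigma_rand = G^{-1} r + sigma_exp: a random field
    centred at the homogeneous expected conductivity."""
    r = rng.standard_normal(G.shape[0])
    return np.linalg.solve(G, r) + sigma_exp

def training_loss(S_pred, S_true, w, kappa=1e-3):
    """Mean squared error between predicted and true singular-value vectors
    plus an l2 penalty kappa * ||w||^2 on the network weights (cf. eq. (11))."""
    return float(np.mean((S_pred - S_true) ** 2) + kappa * np.sum(w ** 2))
```

Each generated sample is pushed through the forward solver once to obtain the paired (model output, singular values) training example.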

IV. EXPERIMENTAL RESULTS
In this section we compare absolute EIT reconstructions from experimental water tank and laminate data, as shown in Fig. 2. These reconstructions are computed using (i) the GN method with an accurate perturbation-based J, (ii) the proposed NN-QN method for computing J, and (iii) Broyden's classical method for computing J. We note that Broyden's method was selected as it readily supports the integration of W in the inversion. Additionally, the perturbation method was selected for GN optimization to illustrate the computing demands when a semi-analytical algorithm for calculating J is not available (as is often the case).
Water tank reconstructions in Fig. 2(a) all localize the non-conductive inclusion well. In fact, by visual inspection, NN-QN produces the fewest artifacts near the electrodes but slightly more near the target. We may conclude that NN-QN is comparable to GN, even though GN is based on the more accurate J and H. Meanwhile, it is also apparent that the background water conductivity estimated using Broyden's method has significant fluctuations near the electrodes and is significantly lower than the accurate GN estimate. This observation is likely due to roundoff errors accumulating over successive iterations, consistent with findings in [4]. Such roundoff errors do not occur in the proposed NN-QN method, resulting in a more accurate estimation of the background conductivity.
Images reported in Fig. 2(b) endeavor to quantitatively localize two through-holes in the composite laminate. These absolute images show that the holes are visible in all cases, although the damage is clearly blurred using Broyden's method. Due to the small size of the holes relative to the imaged domain and the diffusive nature of EIT, it is clear that the presence of errors/noise in J and H resulting from the use of Broyden's method significantly impacts the ability of EIT to detect highly localized changes in conductivity. Comparing the proposed NN-QN and GN reconstructions, the drop in conductivity observed near the holes is visually comparable, although the NN-QN reconstruction is somewhat smoother relative to the GN reconstruction. Note that the conductivity of carbon filler-modified polymers is known to vary appreciably even for a set filler concentration [40]. Hence, conductivity fluctuations away from the holes are expected in Fig. 2(b).
Table I reports key variables associated with the computational demand of the reconstructions presented. Owing to the fact that the GN method used forward differencing to compute J, GN reconstructions took approximately 100-200× longer (≈ 15 minutes) than the Broyden- or NN-based approaches (≈ 5 seconds) to reach the same stopping criterion (||f_{k+1} − f_k||_2^2 / ||f_k||_2^2 ≤ 10^{-2}). We point out that the computation times are measured after initialization. Overall, NN-QN and Broyden's method exhibited similar minimization behavior, requiring less time to reach a minimum but more iterations than GN. We illustrate the effectiveness of the singular value predictions, relative to those obtained from an accurate Jacobian J_Accurate, in Fig. 3(a), which shows close agreement. More importantly, the accurately predicted singular values contribute to a significantly improved approximation of J relative to the Broyden approach, as measured by the differences in matrix norms ||J_Accurate − J||_2 shown in Fig. 3(b), which explains the improved reconstruction quality. Specifically, the higher accuracy in early iterates may be beneficial for the final reconstruction quality.

V. CONCLUSIONS
In this work, we presented a new data-driven approach to the Quasi-Newton method intended for application to nonlinear inverse problems. The method uses a neural network to map nonlinear numerical model outputs to the singular values of the Jacobian matrix, which are then used to form an approximation of the (otherwise costly) Jacobian. In doing so, the method falls within the family of Quasi-Newton methods.
The efficacy of the proposed NN-QN method has been demonstrated, in a preliminary fashion, in the context of an ill-posed nonlinear inverse problem: EIT. It was shown that different priors may be effectively incorporated into the approach. While these results showed that the NN-QN method may improve reconstructions relative to other QN methods, further research is required to enable a more concrete evaluation. Lastly, it was found that application-specific prior information may be included in the data-driven Jacobian estimate via the selection of training data. Whereas very general training data was used herein, more specific distributions may improve solutions to nonlinear inverse problems; this is the subject of future research.