Recursive Identification of Time-Varying Hammerstein Systems With Matrix Forgetting

The real-time estimation of a time-varying Hammerstein system using a noniterative learning scheme is considered and extended to incorporate a matrix forgetting factor. The estimation is cast in a variational-Bayes framework so as to best emulate the original posterior distribution of the parameters within the set of distributions with feasible moments. The recursive concept we propose approximates the exact posterior, which carries undistorted information about the estimated parameters. In many practical settings, an incomplete model of the parameter variations is compensated for by forgetting obsolete information. As a rule, the forgetting operation is initiated by including an appropriate prediction alternative in the time update. It is shown that a careful formulation of the prediction alternative, relying on Bayesian conditioning, results in partial forgetting. This article inspects two options that differ in the order of the conditioning in the posterior, which proves vital for successfully localizing the source of inconsistency in the data-generating process. The geometric mean of the discussed alternatives then modifies the recursive learning through the matrix forgetting factor. We adopt a decision-making approach to revisit the posterior uncertainty by dynamically allocating the probability assigned to each of the prediction alternatives to be combined.


I. INTRODUCTION
The Hammerstein model, consisting of a static nonlinear curve followed by a linear filter, provides the capacity to represent a broad class of input-nonlinear systems [1], [2]. The list of existing approaches [3] indicates that Hammerstein model estimation is still dominated by prediction error and maximum likelihood-type methods. The unknown parameters are then obtained by optimizing a certain criterion to best fit the model to the data. This traditional concept is predominantly tied to point estimation. The available recursive solutions mostly accumulate approximation errors by replacing lossless estimation with a one-step approximation, updating the latest approximated posterior via a treated parametric model. As a result, approximation errors may accumulate to an extent that degrades the estimator performance, making these strategies vulnerable to an inaccurate initial guess. This article aims to identify the Hammerstein system by approximating the exact posterior probability density function (pdf). The error accumulation is completely avoided by propagating the sufficient statistics of the overparameterized model, which serve as the information carriers for the posterior pdf approximation. The search for the approximate pdf is made optimal by adopting the variational Bayes (VB) method, factorizing the posterior into a product of independent VB-marginals (for a detailed overview, see [4]). The resulting method is designed to account for a hard constraint imposed on the nonlinear curve parameters to uniquely determine the filter gain. The VB method has proven its efficiency when, for instance, tailored to solve the identification of the nonlinear autoregressive with exogenous input (NARX) system [5], to jointly estimate the state and the measurement noise covariance parameters [6], and to iteratively identify multiple-model-based Hammerstein parameter-varying systems [7].
In this article, the capability of tracking unmodeled changes in the system dynamics is conceptually achieved via forgetting. At the Bayesian level, a form of forgetting arises through combining the posterior pdf with its flattened alternative. The combination strategies prominently involve the nonsymmetric Kullback-Leibler divergence (KLD) [8], with different properties depending on the order of the KLD arguments [9]. There is rich literature on the adaptation of a single forgetting factor, which causes the information about all of the system parameters to be uniformly discounted [10]–[13]. However, the formulation of a matrix forgetting factor capable of providing different forgetting rates for diverse parameter partitions has been neglected. The matrix forgetting algorithms available in [14] and [15] lack any contextualization within an optimization framework, and the solutions thus do not offer an optimal interpretation. Moreover, the authors of [14] and [15] numerically search for a symmetric form of the matrix factor that may result in a generally nonsymmetric covariance matrix. In a recent paper [16], a form of vector forgetting, obtained by modifying the least squares criterion, is proposed. Importantly, the authors of [14]–[16] do not provide any solution on how to apply forgetting differently to various parameters. The Bayesian counterpart to partial forgetting is described in [17], relying on a parallel scheme to localize the parameters that are subject to change. On the basis of the work carried out in [11]–[13] and [17], we develop a data-informed matrix forgetting factor allowing for tracking a particular parameter subset as well as all the parameters.
Briefly, this article is organized as follows. Section II formally states the estimation problem of the time-varying Hammerstein system from the Bayesian perspective. The relationship between the least squares-like method and the model sufficient statistics is explicated, leaving the question of the choice of the matrix forgetting entities and parameter extraction open. The question related to optimizing the matrix forgetting factor is answered in Section III by adopting the decision-making approach; two variants of the matrix forgetting factor are considered. The optimal extraction of the system parameters from the sufficient statistics in the presence of a hard constraint is discussed in Section IV, relying on the VB method. Section V employs simulated examples to provide empirical evidence of the algorithm performance. Finally, Section VI concludes the article.
Notation: 1_n refers to an n-dimensional vector, all of whose components are one; I_n denotes an n × n identity matrix; tr(·) is the matrix trace; ‖·‖_2 defines the Euclidean vector norm; |·| denotes the determinant; ⊗ symbolizes the Kronecker product; • denotes the Hadamard product; x^* symbolizes the range of x; x̊ is used to represent the number of members in a countable set x^* or refers to the dimension of a vector x; x^T is the transpose of x; and f(x) is reserved for the pdf of a random variable x, optionally distinguished by its subscript. Further, E[g(x)] denotes the mathematical expectation of a function g(x) with respect to the pdf f(x), and δ/δf(x) denotes the functional derivative with respect to f(x); vec(·) is the vectorization operator; ≡ stands for equality by definition; and ∝ means equality up to a normalizing factor.

II. PROBLEM STATEMENT AND PRELIMINARIES
Consider a discrete-time SISO Hammerstein system in which a memoryless nonlinear curve is connected in series with a linear ARX subsystem, so that the current output y_k depends on the current input u_k and the set of past data through

y_k = Σ_{i=1}^{n_a} a_{i;k} y_{k−i} + Σ_{i=0}^{n_b} b_{i;k} v_{k−i} + e_k,   v_k = Σ_{i=1}^{n_r} r_{i;k} g_i(u_k).   (1)

The input u_k and output y_k are both measured on the system at the discrete time instants k ∈ k^* ≡ {k_0, k_0 + 1, . . . , k̄} ⊂ Z to form the data record D_{1−n}^k ≡ {u_i, y_i}_{i=1−n}^k, with n ∈ N referring to the longest time lag appearing in the system. The components of θ_{r;k} = [r_{1;k}, . . . , r_{n_r;k}]^T combine the basis functions g_i(·) to modulate the curve shape, and the components of θ_{a;k} = [a_{1;k}, . . . , a_{n_a;k}]^T and θ_{b;k} = [b_{0;k}, . . . , b_{n_b;k}]^T define the dynamics of the ARX subsystem. The unmeasurable model noise e_k is assumed to be white and normally distributed with a zero mean and a nonzero precision d_k, that is, e_k ∼ N(0, 1/d_k). The ordered set Θ_k ≡ {θ_{a;k}, θ_{b;k}, θ_{r;k}, d_k} constitutes the random system parameters to be learned in view of sequential data retrieval. To prevent any information reduction during the data update, the functional form of the dynamic exponential family (DEF) (§6.2.1 in [4]) is adopted as a template for the model parameterization.
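To make the model structure concrete, the following minimal sketch generates data from a Hammerstein system of the form (1), assuming a polynomial basis g_j(u) = u^j and the orders n_a = 2, n_b = 2, n_r = 3 used later in the simulation study; the particular coefficient values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hammerstein(u, theta_a, theta_b, theta_r, d):
    """y_k = sum_i a_i y_{k-i} + sum_i b_i v_{k-i} + e_k, v_k = sum_j r_j u_k**j.
    Polynomial basis g_j(u) = u**j is an assumption; the paper only requires
    some basis g_i(.)."""
    n_a, n_b = len(theta_a), len(theta_b) - 1
    v = sum(r * u ** (j + 1) for j, r in enumerate(theta_r))  # static nonlinearity
    y = np.zeros_like(u)
    for k in range(len(u)):
        ar = sum(theta_a[i] * y[k - 1 - i] for i in range(n_a) if k - 1 - i >= 0)
        x = sum(theta_b[i] * v[k - i] for i in range(n_b + 1) if k - i >= 0)
        y[k] = ar + x + rng.normal(0.0, 1.0 / np.sqrt(d))     # e_k ~ N(0, 1/d)
    return y

u = np.zeros(500)
for k in range(1, 500):                 # AR(1) input as used in Section V
    u[k] = 0.9 * u[k - 1] + rng.normal()
y = simulate_hammerstein(u, [1.0, -0.5], [0.2, 0.4, 0.1], [1.0, -0.4, 0.05], d=1e3)
```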
Remark 1: The parametric model governed by (1) belongs to the DEF under the corresponding assignments of the DEF factors and normalizing term. Composing the sequence of the pdfs (2) by means of the likelihood function gives rise to the conjugate posterior pdf (7) for the parametric model, where v_k is a vector of a dimension compatible with q(θ_k, d_k), and the scalar ν_k > 2 is referred to as the number of degrees of freedom. The data compression process is then reduced to the recursive updating of the sufficient statistics S_k ≡ {v_k, ν_k}, which allows for learning about {θ_k, d_k} in tandem with data acquisition. The correspondence of the parametric model (2) to the DEF determines the conjugate pdf (7) as the normal-Wishart (NW) pdf, which in turn assigns the augmented information matrix V_k in (10), where Σ_k > 0 is referred to as the least squares remainder. As noted earlier, we assume vague knowledge regarding how the parameters actually evolve, and this prevents us from employing the marginalization integral (11) [18]. Owing to such deficiency, we seek a forgetting operation that emerges from rescaling the covariances of θ_k|S_k, d_k and d_k|S_k at each iteration to reinstate the parameter tracking capability. To guarantee a minimal amount of parameter-related information, an additional source to (2) is processed in a way that stabilizes the forgetting and thus compensates for the potential loss of persistency [12]. We have (12), where Ξ is some symmetric positive definite matrix of an appropriate dimension, λ_i ∈ (0, 1] is the forgetting factor, Σ_0 > 0, and ν_0 > 2. Substituting a forgetting operation for the ideal transition operation (11), the update of the latest posterior is organized recursively with respect to Bayes' rule as in (13), where Λ_{k−1} ∈ R^{θ̊×θ̊} is the matrix forgetting factor, and Ω_{k−1} ∈ R^{θ̊×θ̊} is constructed so that the pdf N(θ_k | θ_0, Ω_{k−1}^{−1} Ξ^{−1}/d_k) embodies the residual regularization effect of the additional source (12), remaining in the estimator memory after the forgetting has been performed. The application of λ_{k−1} must coincide with a reduction of the degrees of freedom caused by the matrix forgetting to obtain a realistic estimate. This coincidence is established in relation to both of the partial forgetting options and is discussed in Remarks 2 and 3. We expand the concept of a uniform rate of forgetting for all of the parameters by designing the self-tuning factor Λ_k, which is able to localize the forgetting on the parameters associated with the measured system input (see Section III).
By setting Λ_0 ≡ Ω_0 ≡ I_θ̊, λ_0 ≡ 1, and P_0 ≡ Ξ^{−1} in (13), the pdf (12) formally initiates the estimation procedure at the time k = 1. The conjugacy inherent between members of the DEF allows us to reduce the functional recursion (13) to a least squares-like algebraic recursion. Since θ_k (3) contains product terms coupling the components of θ_{b;k} and θ_{r;k}, there is no way to distinguish between {θ_{b;k}, θ_{r;k}} and {θ_{b;k}/β, βθ_{r;k}} scaled with some nonzero and finite constant β. To remove this scaling ambiguity, we fix r_{1;k} ≡ 1 by imposing a hard equality constraint on θ_{r;k} (see Section IV).
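For orientation, a hedged sketch of one pass of such a least squares-like recursion follows. The placement of the matrix factor reflects our reading of (13), not the paper's exact equations; the gain and covariance updates are the standard RLS steps under a unit noise variance.

```python
import numpy as np

def rls_matrix_forgetting(theta, P, psi, y, Lam_inv):
    """One RLS-style step in which the covariance is inflated blockwise through
    a matrix forgetting factor before the data update. Lam_inv is assumed
    given, e.g. np.diag([1, ..., 1/np.sqrt(alpha), ...]) to forget only the
    input-related block (an illustrative choice, not the paper's formula)."""
    P = Lam_inv @ P @ Lam_inv.T              # matrix forgetting (time update)
    e = y - psi @ theta                      # one-step prediction error
    K = P @ psi / (1.0 + psi @ P @ psi)      # gain, unit noise variance assumed
    theta = theta + K * e                    # data update of the estimate
    P = P - np.outer(K, psi @ P)             # data update of the covariance
    return theta, P
```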

III. DATA-INFORMED MATRIX FORGETTING
Let us now be concerned with the design of the factor Λ_k, enabling us to provide different exponential forgetting rates for two partitions of θ_k. In conformity with the announced objective, four distinct alternatives, {f_i(θ_{k+1}, d_{k+1})}_{i∈{0,2}} and {f_{1,κ}(θ_{k+1}, d_{k+1})}_{κ∈{σ,ρ}}, concerning the result of the time update are to be introduced. These alternatives delimit the boundaries on the increase in the parameter uncertainties to allow for a revision of the posterior pdf, reflecting the unknown uncertainty about the parameters caused by their variations. The zero alternative f_0(θ_{k+1}, d_{k+1}) corresponds to the latest posterior available, extrapolating all the information about {θ_{k+1}, d_{k+1}} accumulated so far, and is expressed through the Dirac delta function δ(·). The second reference alternative f_2(θ_{k+1}, d_{k+1}) expects that all the parameters {θ_{k+1}, d_{k+1}} have changed within the interval (k, k + 1), and increases the uncertainty of the posterior accordingly through α ∈ (0, 1). The first reference alternative localizes the increase in the uncertainty to a lower dimensional posterior factor. To factorize the normal posterior part into low-dimensional pdfs, let P_k be split in accordance with the splitting of θ_k = [θ_{a;k}^T, θ_{u;k}^T]^T as in (28), with P_{aa;k} ∈ R^{n_a×n_a}. Now, we can proceed to constructing the alternatives f_{1,κ}(θ_{k+1}, d_{k+1}) corresponding to the expectation that only the subset θ_{u;k+1} of θ_{k+1} associated with the input signal is subject to change. In this respect, two variants of partial forgetting are proposed, and their implications for the least squares routine are discussed. The first option (κ ≡ σ) relies on the factorization of the normal posterior part according to (29), while the second relies on (30); both factorizations can be directly obtained by application of Claim 1 from [19] to the normal posterior part.
For the steps that follow, we need to partition V_k and Ξ into blocks identically to the partitioning performed in (28), with V_{aa;k} ∈ R^{n_a×n_a} and Ξ_{aa} ∈ R^{n_a×n_a}. Remark 2: The partial forgetting based on modification of the information submatrix V_{uu;k} [17] employs (29), with the marginal part flattened through α, yielding the time update (32). This update corresponds to the matrix forgetting, and the Wishart part of the prediction alternative f_{1,σ}(d_{k+1}) is suggested to reduce the degrees of freedom consistently with (32), which is fulfilled by solving (33), where Π ≡ {θ_0, Ξ, Σ_0, ν_0}. To solve (33), we consolidate the update (32) within Bayes' rule. Since the left-hand side of (33) is actually the normalizing factor for (34), it is obtained by integrating out θ_{u;k+1} from (34), which results in the improper pdf (35), where ō is some positive constant. From (35), we can directly find the factor λ_k = α.
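A minimal sketch of the κ ≡ σ operation, under the simplest reading that flattening the marginal of θ_u through α amounts to discounting the input-related information submatrix while the remaining blocks stay intact; the exact block arithmetic of (32) should be consulted.

```python
import numpy as np

def partial_forgetting_sigma(V, n_a, alpha):
    """Discount only V_uu, the information block of the input-related
    parameters (an assumed simplification of eq. (32))."""
    Vf = V.copy()
    Vf[n_a:, n_a:] *= alpha      # flatten the theta_u-related information
    return Vf
```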

Remark 3:
The partial forgetting based on modification of the covariance submatrix P_{uu;k} follows from (30), with the conditional part flattened through α, resulting in the operation (37), which determines the entities Λ_k and Ω_k of the matrix forgetting. All that remains now is to choose the Wishart part f_{1,ρ}(d_{k+1}), which complements the description of the parameters with regard to the impact of Λ_k on λ_k. The inclusion of the forgetting operation (37) into the Bayes update yields (38), whose normalizing factor updates the Wishart part two steps ahead in (39). From (39), it can be deduced that no external flattening of the Wishart posterior occurs; therefore, λ_k = 1.
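The κ ≡ ρ variant can be sketched analogously, assuming that flattening the conditional part through α divides the conditional covariance of θ_u given θ_a (the Schur complement of P_aa) by α, while the marginal of θ_a and the cross blocks are left untouched; again, (37) holds the exact formula.

```python
import numpy as np

def partial_forgetting_rho(P, n_a, alpha):
    """Inflate the conditional covariance of theta_u given theta_a by 1/alpha
    (an assumed reading of the Remark 3 operation)."""
    Paa, Pau = P[:n_a, :n_a], P[:n_a, n_a:]
    Pua, Puu = P[n_a:, :n_a], P[n_a:, n_a:]
    S = Puu - Pua @ np.linalg.solve(Paa, Pau)       # conditional covariance
    Pf = P.copy()
    Pf[n_a:, n_a:] = Pua @ np.linalg.solve(Paa, Pau) + S / alpha
    return Pf
```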
The optimal design f̂(θ_{k+1}, d_{k+1}|S_k) seeks the best approximation of the time-updated posterior f(θ_{k+1}, d_{k+1}|S_k) by combining the prediction alternatives f^*_{H,κ} ≡ {f_0, f_{1,κ}, f_2} of the arguments {θ_{k+1}, d_{k+1}} into a single pdf. To execute this, a dissimilarity measure between the target pdf f(θ_{k+1}, d_{k+1}|S_k) and the particular alternatives is quantified by the KLD. The KLD between any two pdfs f_T and f, defined as D(f_T ‖ f) ≡ ∫ f_T(x) ln[f_T(x)/f(x)] dx, attains its absolute minimum value of zero at f_T ≡ f. To establish a meaningful combination strategy modulating the posterior with regard to the degree of the system nonstationarity, the decision step must respect information about the performances of the prediction alternatives. The required feedback is incorporated into the decision process by considering the nonnegative loss estimates ℓ^*_{H,κ} ≡ {ℓ_0, ℓ_{1,κ}, ℓ_2}. These are chosen to embody the losses incurred at the previous step when selecting each of the prediction alternatives as the best projection of the current posterior; see (43). Here, the pdfs indexed by ζ refer to the particular normal pdfs {f_0, f_{1,κ}, f_2} of the argument θ_k, in which the precision d_k is additionally multiplied by ζ. The user-defined factor ζ ∈ (0, 1] serves to increase the expected level of noise inherent in the normal parts and thereby to reduce false detections of parameter changes. In particular, the smaller the value of ζ, the more conservative the forgetting.
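Since the alternatives are normal-Wishart, the losses (43) reduce, for the normal parts, to closed-form Gaussian KLDs. A sketch of that evaluation is given below, with ζ entering as a common inflation of both covariances (multiplying the precision by ζ divides the covariance by ζ); the Wishart contribution is omitted for brevity.

```python
import numpy as np

def kld_gauss(m0, S0, m1, S1):
    """D( N(m0,S0) || N(m1,S1) ), the standard closed form."""
    k = len(m0)
    S1_inv = np.linalg.inv(S1)
    dm = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + dm @ S1_inv @ dm - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def loss_normal_part(m_post, S_post, m_alt, S_alt, zeta):
    """Normal-part loss with the noise level raised through zeta in (0, 1]."""
    return kld_gauss(m_post, S_post / zeta, m_alt, S_alt / zeta)
```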
The randomness of the alternative selection is modeled by assigning a probability ϕ_i to each of the related pairs from {f^*_{H,κ}, ℓ^*_{H,κ}}, defining the weights by which the alternatives are combined. The probabilities are arranged into the vector ϕ ≡ [ϕ_0, ϕ_1, ϕ_2]^T, satisfying Σ_{i=0}^{2} ϕ_i = 1. Let us now temporarily omit the time index in {θ_{k+1}, d_{k+1}, S_k} and the index κ in {f_{1,κ}, ℓ_{1,κ}} for the sake of brevity.

The specification of a loss functional coherent with the Bayes principle involves evaluating the expectation E[D(f(θ, d) ‖ f_{H,κ}(θ, d)) − ℓ_{H,κ}] over {f^*_{H,κ}, ℓ^*_{H,κ}}. In this connection, we obtain the loss functional (44),
where the constraint scaled by the Lagrange multiplier η guarantees that the minimizer f̂(θ, d|S) integrates to one. The form of the minimizer for an arbitrary ϕ is inspected by the lemma below. Lemma 1: The unique constrained minimizer of the functional (44) over f̂(θ, d|S) is given by the weighted geometric mean (45) of the prediction alternatives. Proof: The functional (44) can be rearranged into a sum of two parts (46), where the part independent of the inspected pdf f̂(θ, d|S) absorbs the achieved minimum value. Consequently, the evaluation of the necessary conditions for an extremum, δL_F/δf̂ = 0 and ∂L_F/∂η = 0, determines the form of the minimizer (45). The uniqueness of the minimizer follows from the strict convexity of the KLD, which is reflected by δ²L_F/δf̂² = 1/f̂ > 0. By substituting (46) into (44), one can prove that K(ϕ) delimits the lower bound on the functional approximation accuracy; to be more explicit, see (48). Hence, taking into account the nonnegativity of the KLD, the best representative of ϕ is found as the maximizer of K(ϕ).
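For the normal parts, the geometric mean of Lemma 1 stays normal: precisions and precision-weighted means combine linearly in the weights ϕ_i. The sketch below uses this standard Gaussian pooling identity, which is not the paper's full normal-Wishart expression.

```python
import numpy as np

def geometric_pool(means, covs, phi):
    """Weighted geometric mean of Gaussians N(m_i, S_i) with weights phi_i:
    pooled precision = sum_i phi_i * S_i^{-1},
    pooled precision-weighted mean = sum_i phi_i * S_i^{-1} m_i."""
    prec = sum(p * np.linalg.inv(S) for p, S in zip(phi, covs))
    h = sum(p * np.linalg.inv(S) @ m for p, S, m in zip(phi, covs, means))
    S_hat = np.linalg.inv(prec)
    return S_hat @ h, S_hat          # pooled mean, pooled covariance
```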
Recall that the search for the optimal value of ϕ according to Lemma 2 requires us to express the KLD between two normal-Wishart pdfs; the required analytical form is explicated in Theorem 1 of [12]. The formulation of the time update on the basis of the elaborated Bayes principle affects the least squares routine only through the factors {Λ_k, Ω_k, λ_k}, as a consequence of mixing the prediction alternatives. Upon using Lemma 1 and assuming that ϕ̂ is known in advance, the factors are formulated separately for κ ≡ σ and for κ ≡ ρ. The next step is to find the optimal value of ϕ provided by the KKT conditions (49). This task requires us to examine whether the solution lies inside the feasible region S = {ϕ ∈ ϕ^* ⊆ R³_{≥0} : ϕ_i ≥ 0, i = 0, 1, 2, Σ_{i=0}^{2} ϕ_i = 1} or on the constraint boundaries. In order to perform an exhaustive description of ϕ̂, captured in Table I, we introduce the auxiliary variables ℓ_{02}, ℓ_{12}, and ℓ_{01}, and denote r̃_k ≡ θ̂_k − θ̂_{k−1}. If f_{1,σ}(θ_{k+1}, d_{k+1}) (the case κ ≡ σ) is instated into K(ϕ) (47), the remaining variables in Table I adapt accordingly. This completes the derivation for the data-informed tuning of the matrix factors of the claimed forms.
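Where the closed-form branches of Table I are unavailable, a dense grid over the 2-simplex is a simple stand-in for maximizing K(ϕ). The sketch below assumes K is supplied as a callable; it is a brute-force check, not the KKT solution of (49).

```python
import numpy as np

def argmax_on_simplex(K, step=0.01):
    """Maximize a callable K(phi) over {phi >= 0, sum(phi) = 1} by gridding."""
    best, best_val = None, -np.inf
    grid = np.arange(0.0, 1.0 + step, step)
    for p0 in grid:
        for p1 in grid:
            p2 = 1.0 - p0 - p1
            if p2 < -1e-12:
                continue                     # outside the simplex
            phi = np.array([p0, p1, max(p2, 0.0)])
            val = K(phi)
            if val > best_val:
                best, best_val = phi, val
    return best
```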

IV. ESTIMATING THE HAMMERSTEIN MODEL BASED ON THE VB METHOD
The VB method is employed to restore the tractability of the inference problem by approximating the posterior pdf f(Θ_k|S_k) by a product of conditionally independent posteriors. In this article, the target pdf f̂(Θ_k|S_k) is restricted to the product of the marginal pdfs in (58). Under the constraint assumption r_{1;k} ≡ 1, the first factor on the right-hand side of (58) is recognized to be the exact marginal with χ_{θ_L} ≡ Σ_k P_{L;k}, designated by the Student's t (T) pdf. Let us note that the estimate θ̂_{L;k} represents the first θ̊_L rows of θ̂_k (21), and P_{L;k} is nested in the first θ̊_L rows and columns of P_k (22). Since the exact marginals describing θ_{L;k}|S_k and d_k|S_k are accessible, it only remains to infer θ_{r;k} given {S_k, d_k} by eliminating the redundancies in θ̂_{u;k}. To formalize the concept covered above, a loss functional quantifying the information loss incurred when moving from f(Θ_k|S_k) to f̂(Θ_k|S_k) is constructed and optimized within the calculus of variations approach to yield f̂(θ_{r;k}|S_k, d_k). The loss functional takes the form (60), in which the expressions scaled by the Lagrange multipliers η_ι and η_l activate the normalization and the mean value hard equality constraints, respectively. The normalization constraint forces the VB-marginal f̂(θ_{r;k}, d_k|S_k) to be a proper pdf. By introducing a θ̊_r-dimensional vector of the form l ≡ [1, 0, . . . , 0]^T, the mean value constraint rigorously sets the estimate of r_{1;k} to equal one. The conditions for the global optimality of f̂(θ_{r;k}, d_k|S_k) are the statements reported by the lemma below. Lemma 3: Let f̂(Θ_k|S_k) be established as an approximation of f(Θ_k|S_k), with the restriction that the factorization f̂(Θ_k|S_k) = f̂(θ_{L;k}|S_k) f̂(θ_{r;k}, d_k|S_k) is the product of independent marginals. Then, having a fixed functional form for the factor f̂(θ_{L;k}|S_k) = T(θ_{L;k}|θ̂_{L;k}, χ_{θ_L}, ν_k), the unique minimum of (60) is reached for (61),
where f̂_ξ(θ_{r;k}, d_k|S_k) denotes the VB-marginal (62) that is unconstrained in the mean value of the estimand θ_{r;k}. The multiplier η_l is obtained by solving the integral equation (63). Proof: It proves convenient to rewrite the KLD entering (60) as a sum of two parts, where the part independent of the optimized f̂(θ_{r;k}, d_k|S_k) absorbs the achieved minimum value. Following Lemma 3, we are in the position to specify the pdf f̂(θ_{r;k}, d_k|S_k). To this end, let us introduce the auxiliary variables and assume that P_{L;k} is partitioned in compliance with the partitioning of θ_{L;k} ≡ [θ_{a;k}^T, θ_{b;k}^T]^T ∈ R^{θ̊_L}, with P_{aa;k} ∈ R^{θ̊_a×θ̊_a}. After some algebra (see Appendix A), the optimal approximation is found, where ξ̂_k denotes the mean value of θ_{r;k} on which the equality constraint is not imposed; the normalized (by d_k) covariance matrix P_{r;k} and the mean ξ̂_k are calculated accordingly. To round out the implementation issues, the computation procedures for the proposed estimator are summarized in Algorithm 1.
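For the normal part, the mean value constraint admits a compact sketch: tilting a Gaussian by exp(η_l l^T θ_r) keeps it Gaussian, shifts the mean by η_l P_r l, and leaves the covariance untouched, so η_l follows in closed form from l^T E[θ_r] = 1. This reflects our reading of Lemma 3 and (63), not the paper's exact algebra.

```python
import numpy as np

def constrain_mean(xi, P_r):
    """Shift the unconstrained VB mean xi so that its first component (r_1)
    equals one, via the Lagrange multiplier eta_l of the tilted Gaussian."""
    l = np.zeros(len(xi)); l[0] = 1.0            # selects r_1
    eta_l = (1.0 - l @ xi) / (l @ P_r @ l)       # closed-form multiplier
    return xi + eta_l * (P_r @ l)                # constrained mean; cov unchanged
```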

V. SIMULATION STUDIES
This section presents several numerical examples to illustrate the behavior of the developed methods. To show its advantages, the suggested VB strategy for identifying the Hammerstein system is compared with the simple averaging (AV) approach [7]. To demonstrate the main feature of the matrix forgetting, its change localization capabilities are tested.
The Hammerstein system with a polynomial input nonlinearity is estimated. The parameters of the linear filter {a_i, b_i} are chosen to correspond to the discretized transfer function G(s) = 1.5(−0.5s + 1)(s + 1)/(T²s² + 2ξ_g T s + 1) sampled with the period of 1 s, where ξ_g = 0.6 and T = 5 s. The polynomial coefficients, if not stated otherwise, are specified as r_1 = 1, r_2 = −0.4, and r_3 = 1/20. The input sequence {u_k} is produced by the autoregressive model u_k = 0.9u_{k−1} + w_k driven by the discrete white noise w_k ∼ N(0, 1). All the experiments are monitored within the time span of 0–500 s. The fit between the model and its estimate is evaluated through the measure Δ_k, built on Δ̃_k ≡ θ_{a;k} − θ̂_{a;k} and β ≡ 1. In view of the user-defined input arguments to Algorithm 1, the estimation starts from Ξ = 10⁻¹ I_11, Σ_0 = 1, ν_0 = 10, and θ̂_0 is chosen to be an 11-dimensional vector, all of whose components are zero.
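A sketch of this setup, taking the printed transfer function at face value (with the (s + 1) factor in the numerator) and using SciPy's zero-order-hold discretization; the resulting discrete coefficients populate θ_a and θ_b.

```python
import numpy as np
from scipy.signal import cont2discrete

T, xi_g = 5.0, 0.6
num = 1.5 * np.polymul([-0.5, 1.0], [1.0, 1.0])   # 1.5(-0.5s + 1)(s + 1)
den = [T**2, 2.0 * xi_g * T, 1.0]                 # T^2 s^2 + 2 xi_g T s + 1
num_d, den_d, _ = cont2discrete((num, den), dt=1.0)   # ZOH, 1 s sampling

theta_a = -np.asarray(den_d)[1:]       # a_i from the (monic) denominator
theta_b = np.asarray(num_d).ravel()    # b_i from the numerator
theta_r = np.array([1.0, -0.4, 1.0 / 20.0])
```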

A. Estimation Quality Achieved With the Recursive Strategies
The following example compares the proposed solution (VB) with the averaging strategy (AV) in decoupling the coefficient products between the linear and the nonlinear subsystems paired via the least squares estimate θ̂_k. Similarly to our solution, the AV strategy builds on the assumption that the first polynomial coefficient is set to one (r_1 = 1), but it eliminates the redundancies by computing the average value (73), where j = 2, . . . , θ̊_r and θ̂[i] denotes the ith entry of the vector. Note that the forgetting operation is not contemplated in this example (Λ_k ≡ Ω_k ≡ I_11, λ_k ≡ 1). The results attained for the decoupling strategies are shown in Fig. 1, where the influence of the noise precision d is tested. From Fig. 1(a), we can see that both of the strategies exhibit similar steady-state responses at the low noise level (d = 10⁴). If we increase the noise intensity to d = 10 [see Fig. 1(b)], the proposed VB strategy provides more robust, uncertainty-aware averaging than that given in (73).
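A sketch of the AV decoupling under an assumed layout of θ̂_k, in which the first n_a entries hold θ_a and the remaining entries store the products b_i r_j with j varying fastest; the actual ordering behind (73) may differ.

```python
import numpy as np

def av_decouple(theta_hat, n_a, n_b, n_r):
    """Recover b and r from the overparameterized estimate by averaging the
    redundant ratios, exploiting r_1 = 1 (assumed regressor layout)."""
    B = theta_hat[n_a:].reshape(n_b + 1, n_r)   # rows: b_i, cols: r_j
    b_hat = B[:, 0]                              # r_1 = 1, so column 1 gives b
    r_hat = np.empty(n_r); r_hat[0] = 1.0
    for j in range(1, n_r):
        r_hat[j] = np.mean(B[:, j] / b_hat)      # average over the n_b + 1 ratios
    return b_hat, r_hat
```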

B. Tracking Quality Achieved With the Forgetting Strategies
The simulation runs are performed with either the least squares with stabilized matrix forgetting proposed herein or the standard exponential least squares endowed with the adaptation rule by Ydstie and Sargent [10], with the tuning knob Q representing the effective number of degrees of freedom.
The experiment that follows is undertaken to show the advantages of matrix forgetting when only the polynomial curve coefficients are subject to change. The initial coefficient values satisfy θ_{r;0} = [1, −0.2, 0.1]^T, and there is a sudden change at the time t = 250 s, which results in θ_{r;250} = [0.4, −0.4, 1/8]^T. The matrix forgetting strategies employing f_{1,σ} and f_{1,ρ} will be labeled as MF_σ and MF_ρ, respectively. The scalar forgetting driven by the rule (74) is referred to as SF. For this scenario, the settings {α = 0.5, ζ = 0.5} are assigned to both the matrix forgetting strategies, and SF is in operation with Q = 20. The information matrix is initiated from Ξ = 10⁻² I_11, and the noise precision d is assigned as 10³.
The result of the experiment is captured in Fig. 2. The MF_ρ technique attains a smaller Δ_k than the other forgetting strategies [see Fig. 2(a)], as it effectively exploits the probability ϕ̂_1 [see Fig. 2(b)] indicating the source of inconsistency. Note that ϕ̂_2 also receives some support [see Fig. 2(b)] since the changing parameters are a subset of all the parameters. Importantly, the MF_ρ algorithm has exhibited its ability to selectively track the changing parameters [see Fig. 2(d)], leaving the estimate of the filter parameters θ_{a;k} unaffected [see Fig. 2(c)].

VI. CONCLUSION
The main aim of this article has been to resolve the problem of selective forgetting along with recovering the Hammerstein model parameters (lost in the overparameterization step) within a rigorous probabilistic framework.
The novelty of the research rests in theoretical results leading to modifications of the recursive least squares method. Remarks 2 and 3 present two partial forgetting strategies: while the former introduces a matrix factor that discounts the information submatrix [17], the latter proposes a concept modifying the covariance submatrix. This modification is essentially justified by the Kalman filtering view of estimation, in which only a subset of the parameters is time-varying, driven by a random walk with a known covariance matrix of parameter increments. Lemmas 1 and 2 (elaborating on the results stated in [12]) allow us to derive a novel and formal approach to automated selective forgetting based on the geometric mean of the pdfs. Remark 1 classifies the Hammerstein model as a member of the DEF, enabling a closed-form expression for the propagation of the sufficient statistics of the overparameterized model. Lemma 3 converts the problem of eliminating the redundancies in the least squares estimate of the overparameterized model into an optimization problem, tailoring the VB method to the identification of Hammerstein systems. Consequently, the exact posterior is approximated at each step ex post, after the data update has been completed, avoiding any transmission of the VB-moments through iterative cycles.

APPENDIX A
The derivation relies on properties of the normal-Wishart family, which are proven in [22]. With the augmented information matrix V_k given in (10), completing the quadratic form (A.6) yields the least squares remainder Σ_{ξ;k} associated with (A.6). For the solution to respect the equality constraint imposed on θ_{r;k}, it is necessary to acquire the analytical definitions of η_l and exp[1 + η_ι]; η_l is found by solving (63).