Consistency and Asymptotic Normality of the Maximum Likelihood Estimator in GaGLM

The Gamma distribution based generalized linear model (<inline-formula> <tex-math notation="LaTeX">$Ga$ </tex-math></inline-formula>GLM) is a kind of statistical model feasible for the positive value of a non-stationary stochastic system, in which the location and the scale are regressed by the corresponding explanatory variables. This paper theoretically investigates the asymptotic properties of maximum likelihood estimates (MLE) of <inline-formula> <tex-math notation="LaTeX">$Ga$ </tex-math></inline-formula>GLM, which can benefit the further interval estimates, hypothesis tests and stochastic control design. First, the score function and the Fisher information matrix for <inline-formula> <tex-math notation="LaTeX">$Ga$ </tex-math></inline-formula>GLM are derived. Then, the Lyapunov condition is derived to ensure the asymptotic normality of the score function normalized by the Fisher information matrix. Based on this condition, the asymptotic normality of the MLE of <inline-formula> <tex-math notation="LaTeX">$Ga$ </tex-math></inline-formula>GLM is proven. Finally, a numerical example is given to testify the asymptotic properties obtained in the research. The numerical results indicate that the MLE of <inline-formula> <tex-math notation="LaTeX">$Ga$ </tex-math></inline-formula>GLM converged to a normal distribution as the number of sample measurements increased.


I. INTRODUCTION
The generalized linear model (GLM) expands the general linear model so that a dependent variable is linearly related to the factors and covariates via a specified link function [1]. Moreover, the model allows the dependent variable to keep the attribute of actually applied data, such as integer literal, positive and asymmetric, not belong to a normal distribution. It covers widely used statistical models, such as logistic regression models for binary distributed responses, Poisson regression models for count data and Gamma regression models for positive real data.
As a family of moderate skewness and continuous phenomena distributions, the Gamma distribution is a useful model in many areas of statistics when the normal distribution is not appropriate. In the Gamma distribution-based approach, the system output Z can be assumed to be a subject Z ∼ Ga(α, β), where Ga(α, β) is a Gamma distribution with the shape parameter α and the rate parameter β governing its probability density function shape. This distribution was first introduced [2] and subsequently studied in detail [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Sabah Mohammed .
Because of the flexibility of the relationship to many other distributions, the Gamma distribution can be a suitable alternative for modelling such kinds of the positive-valued dependent variable. The Gamma distribution-based models have been applied in many areas, such as medical science [4], [5], biology [6], economics [7], [8], forest science [9] and education [10]. Considering the ubiquitous heteroscedasticity of actually applied data, as a member of the well-known GLM, the Gamma distribution based generalized linear model (GaGLM) is more widely used when α and β are both dependent variables. However, it should be noted that the GaGLM does not belong to the exponential family of distributions based GLM [11]. Therefore, a baseable asymptotic theory for GaGLM is established.
This research investigates the theoretical aspects of maximum likelihood estimator (MLE) for GaGLM. Because GaGLM is a model with two equations (4) being respectively parameterized for α and β, the estimation procedure could be relatively complex. In statistics, several expectation-maximization (EM) type algorithms have been developed for the Gamma distribution inference, where β was assumed to be a latent variable [12]. However, those algorithms were developed by fixing β as constant. If β is parameterized as regression models, the EM algorithm would be extremely computationally involved. Thus, instead of using EM algorithms for latent variable, MLE for GaGLM by using the Fisher scoring algorithm is performed [13]. To this end, the score function and the Fisher information matrix are derived for GaGLM. Furthermore, we obtain the condition to assure the positive definiteness of the Fisher information matrix.
The consistency and the asymptotic normality explaining the efficiency of the estimators have been widely investigated in system identification and statistics [14], [15]. The consistency of MLE for GaGLM can be proved by using the same approach for GLM [15]. To verify the asymptotic normality of MLE, the asymptotic normality of the normalized score function is necessary. GLM was developed for the exponential family, whose moment generating functions are exponential functions of the sufficient statistics. Based on the uniform moment generating function, the asymptotic normality of the normalized score function was proved for GLM.
To investigate asymptotic properties of MLE for parameters occurring in GaGLM, we need to prove the consistency and asymptotic normality of MLE by central limit theorems. Compared with commonly used Lindeberg condition, Lyapunov condition is stronger in proving asymptotic properties. First, we derived the score functions normalized by the Fisher information matrix for the Lyapunov condition, which ensure the asymptotic normality of the normalized score functions [16]. Based on this result, the asymptotic normality of MLE for GaGLM is finally proved. These results can dramatically facilitate the hypothesis testing, the construction of interval estimates, and stochastic control design for the non-stationary stochastic system [17], [18].
The rest of this paper is organized as follows. The concept of GaGLM and maximum likelihood estimation are introduced in Section II. Section III gives the assumptions of asymptotic properties of the MLE in GaGLM, including the proof of related lemmas and theorems. Results of a simulation study are reported in Section IV. Concluding comments are presented in Section V.

II. PROBLEM STATEMENT
In this section, we briefly review GaGLM, including its structure and numerical method of MLE.

A. MODEL AND ESTIMATION
Suppose that we observe realizations of a positive real random variable Z , and we believe that Z has a specified positive continuous distribution.
Let D n = {(Z i , x i , y i ), i = 1, . . . , n} be independent random vectors defined on the probability space ( , F, P).
For each i = 1, . . . , n, the response variable Z i is generated from the following process: where Ga(α i , β i ) denotes the Gamma distribution with positive shape parameter α i and rate parameter β i . The probability density function is where (·) is the Gamma function. The mean and variance of the random variable Z i are given by and Then, we can develop GaGLM by regressing explainatory variable ω i = (x T i , y T i ) T with x i ∈ R p to α i and y i ∈ R q to β i as follows: where a = (a 1 , . . . , a p ) T and b = (b 1 , . . . , b q ) T denote the regression parameter vectors for α i and β i respectively, and • T denotes the transpose of •. Further, θ = (a T , b T ) T is any parameter in an admissible set K θ ⊂ R p+q . For the observations z 1 , z 2 , . . . , z n , the log-likelihood l(θ ) derived from the GaGLM can be written as where log denotes the natural logarithm. Then, θ can be estimated by θ = arg max l n (θ ).
According to (5), the first three order derivative of l n (θ ) with respect to θ is continuous and finite for all θ ∈ K θ . This condition ensures the existence of the Taylor expansion, the finite variance of the derivatives of l n (θ ). Thus, MLE can VOLUME 10, 2022 be obtained by the scoring method [19], in which the score function can be obtained by and the Fisher information matrix can be obtained by With the score function and the Fisher information matrix, (6) can be iteratively solved by using the generalized Newton-Raphson (NR) method, so-called Fisher's scoring (FS) algorithm [19] as the following In what follows, the score function and the Fisher information matrix are derived for GaGLM. Furthermore, the condition that ensures the positive definiteness of F n (θ ) obtained in Corollary 1.
In statistics, the asymptotic properties, mainly including the consistency and asymptotic normality, are often used to evaluate the efficiency of estimators [20]. Another important role of s n (θ ) and F n (θ) is to prove the asymptotic properties. If the first three order derivates of l n (θ ) with respect of θ exist, the consistency, i.e. θ converging in probability to the true coefficients θ 0 , can be proved under a generalized framework [21]. However, the asymptotic convergence of the covariance matrix for GaGLM cannot be proved by using the generalized approach in [21]. To tackle this problem, we first prove the asymptotic normality of the normalized score function F −T/2 n (θ )s n (θ) motivated by [15]. Note that [15] dealt with the exponential family-based models, whose moment generating function is the exponential function of the sufficient statistics. [15] used such moment generating function to prove the asymptotic normality of the normalized score function. However, there is not an asymptotic theory of MLE to Gamma distribution, where the approach in [15] cannot be extended to the GaGLM. Furthermore, the elements constructing F −T/2 n (θ )s n (θ ) cannot be expected to be identically distributed. Therefore, by investigating the Lyapunov condition and Taylor expansion, we can prove the asymptotic normality of GaGLM's MLE. In what follows, we first derive the score function and the Fisher information matrix of MLE of GaGLM.

III. SCORE FUNCTION AND FISHER INFORMATION MATRIX FOR GaGLM
For deriving the score function and the Fisher information matrix, the log-likelihood function of θ is formulated from (6). The score function (7) can be represented as follows.
Proof Lemma 1: The following derivatives can be directly derived from (4).
Then, derivatives of the log-likelihood function (4) are straightforwardly obtained. The Fisher information matrix will be derived via the Hessian matrix follows We define Hessian matrix H n (θ ) as follows to establish the Fisher information matrix F n (θ ) shown below.
Multiply at each side of equation (26) by (α), we can get Then, take the k-order derivative with respect to α of both sides, that 28) Divided by (α) at both sides of (28), we obtain For k = 1, then Let f k (•) denote finite j-order polynomial of log β by j ≤ k and linear combination of E(log i Z ) with VOLUME 10, 2022 i = 1, 2, . . . , k −1 function. Combining (25) and (27), we can get Ga(α, β), the kth moment of Z is limited as where C is positive constant depending on α, β and k, and k > 0 is any finite positive integer. Proof Lemma 3:

Theorem 1 (Fisher Information Matrix for GaGLM):
The components of the Fisher information matrix are obtained as follows: Proof Theorem 1: Under the assumptions of mild general regularity, we have F n (θ ) = EH n (θ ) by [24], and F n (θ ) is positive semi-definite matrix [25]. Thus, using Lemmas 1, 2, equations (8) and (13), the Fisher information matrix can be straightforward computed as follows F n (θ ) = EH n (θ)

Corollary 1 (Definiteness of Fisher Information Matrix for GaGLM): If
Proof Corollary 1: To prove the positive character of the Fisher information matrix, we need derive the range of αψ 1 (α). From the equation (22) as n = 1, we can get following inequality Then, we can get If n i=1 ω i ω T i is of full rank, the Fisher information matrix F n (θ) is positive-definite.

IV. ASYMPTOTIC THEORY FOR THE MAXIMUM LIKELIHOOD ESTIMATOR IN GaGLM
Under the mild assumptions, the asymptotic properties of the MLE was proved in GLM for canonical link functions [15]. These asymptotic conditions can be applied to prove similar results for GaGLM as well as noncanonical.
Let · denote the spectral norm of square matrices. The spectral norm of a real-valued matrix F is given by where · 2 denotes the L 2 − norm of vectors. The maximal (minimal) eigenvalue of a square matrix F will be further denoted by λ max (F) (λ min (F)). For ε > 0, a neighborhood of the unknown true parameter θ 0 can denote by In this paper, let's make the following assumptions. (A1) where C is a positive constant.
and K y ⊂ R q are compact sets. (A3) K θ ⊂ R p+q is an open set, and θ 0 is an interior point of the set K θ . Furthermore, Assumption (A1) means that λ min (F n ) and n are the same order infinity, which is used to prove Lemmas 4 and 5. Assumption (A2) implies what we deal with are compact regressors. If θ lies on the boundary of parameter space K θ , the statements of Theorem 2 do not valid anymore.
Based on the assumptions above, we need to prove two preliminary Lemmas 4 and 5 for asymptotic properties of MLE θ first.
where N P (0, I p+q ) is a (p + q)-dimensional normal distribution with mean vector 0 and covariance matrix I p+q . Proof Lemma 4: Derived from Cramer-Wald [20], we only need to prove that a linear combination u T F −1/2 n s n converges in distribution to N (0, u T u) for any vector u ∈ R p+q (u = 0). Without loss of generality, we set u = 1. Then, let for r = 1, . . . , p, and ψ 0 (·) is digamma function (seen in Equation (20)).
To prove (62), it is sufficient to establish that the (j, k)element of the random matrix (H n (θ ) − EH n (θ )) /n converges to zero in probability, i.e.
There are three different types of entries (14), (15) and (16) in the Hessian matrix. We will show the convergence of formula (64) in the cases of 1 ≤ j, k ≤ p. It is similar to treat the remaining cases. In order to avoid generality, let j = p and k = p, then We have the following bounds: From the law of large numbers and standard arguments, we can get G n → 0 in probability as n → ∞. It remains to show = max The continuity in θ of the function max x∈K x ,y∈K y e 2x T a ψ 1 (e x T a ) − e 2x T a 0 ψ 1 (e x T a 0 ) with value zero at θ = θ 0 yields that G 2n converges to 0 in probability as n → ∞.

Theorem 2:
Under the assumptions (A1)-(A3), the asymptotic normality of θ can be obtained as the following Proof Theorem 2: With the mean value theorem, we have for 0 < τ < 1 and s n ( θ ) = 0. By pre-multiplying F T/2 n and integrating with respect to τ on [0, 1], we have Meanwhile, Lemma 5 implies that By using Lemma 4 and the continuous mapping theorem [28], the asymptotic normality of θ can be proved.

V. SIMULATION STUDY
In this section, we will provide some simulation experiments to illustrate our asymptotic theory and stability results. y 2 were chosen equally spaced between −1 and 1. Further, for distinguishing the effects of different size parameters on the results, we examined set a 0 = −1, a 1 = 1, a 2 = 3, b 1 = 1 and b 2 = 3. Since we are also interested in the case when GaGLM does not satisfactorily fit the count regression data. For each combination of sample size n, setting, we simulated 100 samples of responses Z i 's, i.e., Z i ∼ Ga(exp(a 0 + a 1 x i1 + a 2 x i2 ), exp(b 1 y i1 + b 2 y i2 )) for i = 1, . . . , n.
Henze-Zirkler test [29]was used for normality test ofâ 0 , a 1 ,â 2 ,b 1 andb 2 respectively, in which p-value larger than 0.05 indicates the data fitting well to normal distributions. We computed the average estimate, the estimated mean squared error (MSE) and the mean absolute error (MAE) to indicate the convergence status with the increase of sample numbers in 100 replications for each considered case, shown in Table 1. Simulation results reveal that the average estimate of each parameter close to the true value roughly as the sample size n increases. With n increase, the truncation error in the iterative process affects the estimation accuracy. The MSE and MAE decrease strictly as the number of samples increasing, demonstrate similar patterns.
We also test the normality of each parameter with estimating result by 100 replications. Due to the same range of randomly generated samples, when the number of samples is limited, i.e. n = 50, 100 and 200, the estimation of the smaller parameters will be affected by the bigger parameters. As the number of samples increases, when the number of samples reaches n = 2000, the estimated value tends to be stable and presents a normal distribution.
A normal quantile-quantile (QQ) plots for the empirical distribution of multi-normal components are illustrated in figure 1. When the sample size is n = 50 and 100, there are more outliers in the multivariate normal QQ plots. When the sample size increases to n = 200 and 500, the QQ plots tend to be stable, when n = 1000 and 2000, the QQ plots are normally distributed. Figure 2 illustrates the convergence of parameters estimation by different size of samples. With the increase of samples, the mean value of each estimated parameter gradually approaches the real value, and the fluctuation range gradually decreases, that is, the variance decreases. Therefore, it indicates that the parameter estimation value converges to the actual value, and the maximum likelihood estimate is consistent. Among them, it can be seen from the figure that due to the difference in parameter size, the bigger parameter is easier to converge to the real value, which has the significance of estimation.

VI. CONCLUSION
GaGLM is a kind of generalized regression model for investigating positive real data. This research obtained several theoretical results of MLE for GaGLM. The score function and Fisher information matrix were derived. Then, the Lyapunov conditions were obtained to ensure the asymptotic normality of the score function normalized by the Fisher information matrix. Consequently, the asymptotic normality of MLE for GaGLM was proved.
In the derivation process, we also discussed the range of the logarithmic k-order expectation E(log k Z ) when Z obey the Gamma distribution. And we proved the inequality holds on the trigamma function ψ 1 that αψ 1 (α) > 1 where α > 0. Moreover, the simulation study illustrates that the normal approximation is satisfactory for moderate and large VOLUME 10, 2022 sample sizes. Finally, with the established asymptotic theory, we can further benefit interval estimates [30], hypothesis tests [31]and stochastic control design [32]in a theoretical basis.