Hidden Markov Linear Regression Model and its Parameter Estimation

This article first defines a hidden Markov linear regression model in order to study the transitions between different states in a linear regression model and the regression relationship between the dependent variable and the independent variables in each state. K-means cluster analysis is then used to identify the hidden states of the observed data, the elements of the hidden state transition probability matrix are estimated by maximum likelihood, and the unknown parameters of the linear regression model are estimated by the least squares method. Finally, a set of observation vectors is generated according to the defined model, and the empirical simulation demonstrates that the parameter estimation method presented in this work is reliable.


I. INTRODUCTION
In the 19th century, when the well-known British biologist and statistician Galton studied the genetic relationship between the height of parents and the height of their children, he established an empirical straight-line equation relating the height of an adult child to the average height of the parents, and named it the regression equation. After more than 100 years of development, regression analysis has become an important mathematical method and has been widely studied and applied in many disciplines, such as biological signal analysis [1], marine biological optical relationship reasoning [2], etc. In essence, regression analysis is a statistical method for analyzing the quantitative relationship between several related variables: according to a mathematical model, the values of the independent variables are used to estimate or predict the possible value of the dependent variable. Depending on the mathematical model, regression analysis can be divided into linear regression models and nonlinear regression models.
The hidden Markov model (HMM) is used to describe a Markov process whose states are hidden. Although the state of a hidden Markov model cannot be observed directly, it can be inferred through the sequence of observation vectors: each observation vector is generated by the state sequence through the probability density distribution associated with the corresponding state. The hidden Markov model is therefore a doubly stochastic process, consisting of a hidden Markov chain with a certain number of states and a set of observable random functions. Its research goal is to infer the unobservable state transition information and the distribution information in each state from the information in the observed variables [3]. In recent years it has been widely used in wearable device data identification [4], early data stream matching in SDN networks [5], speech recognition [6], malfunction diagnosis [7], gene recognition [8], etc., yielding a series of research results.
The associate editor coordinating the review of this manuscript and approving it for publication was Fanbiao Li.
Owing to the good performance of the hidden Markov model in the fields mentioned above, many researchers have turned their attention to it. Yu used the EM algorithm and the forward-backward recursive algorithm to infer the hidden Markov model [9]. Song [11]. Du investigated an adaptive sliding-mode control design problem for a class of Markov jump systems with actuator faults [12]. Li et al. used a homogeneous polynomial approach to investigate Markovian jump systems subject to time-varying delays and infinite distributed delays [13].
Based on the existing research results on the linear regression model and the hidden Markov model, this article further studies the hidden Markov linear regression model with a fixed number of hidden states. For example, it is well known that family income is linearly correlated with various consumption expenditures, so a multiple linear regression model of income and consumption expenditures can be established. However, the macroeconomic situation alternates between inflation and deflation, and the consumer market alternates between consumption upgrade and consumption degradation. When the economy or the consumer market is in different states, the regression relationship between the dependent and independent variables in the linear regression model differs, and the states change frequently. It is therefore necessary to study both the transition law between the different states and the regression relationship between the dependent variable and the independent variables in each state. This article introduces a model that expresses both, named the hidden Markov multiple linear regression model, and then addresses its inference and parameter estimation.
Finally, the key difficulty in hidden Markov model inference is how to determine the hidden state of the observation variables. At present, the most commonly used method for this purpose is the forward-backward recursive algorithm. However, this algorithm is computationally complex and hard to implement, which hinders the application of the hidden Markov model. In this article, the K-means clustering method is employed to address this problem. Because of its simplicity, the K-means method lowers the threshold for using the hidden Markov model. This is the main contribution of this article, since it benefits the application of the hidden Markov model.

II. HIDDEN MARKOV LINEAR REGRESSION MODEL
Let $Z_t$ be the hidden state at the $t$-th observation time point, taking values in $\{1, 2, \ldots, K\}$, and let the vector $D_t = (y_t, x_{t1}, x_{t2}, \ldots, x_{tr})^T$ be the observation vector of the model at the $t$-th moment, where $y_t$ is the value of the dependent variable at the $t$-th moment, $x_{t1}, x_{t2}, \ldots, x_{tr}$ are the values of the $r$ independent variables at the $t$-th moment, and $t = 1, 2, \ldots, T$ indexes the observation time points. Assume that the transition process of the hidden state satisfies the conditions of the Markov chain given below.
$$P(Z_t = s \mid Z_{t-1} = u, Z_{t-2}, \ldots, Z_1) = P(Z_t = s \mid Z_{t-1} = u) = a_{us} \tag{1}$$
where $u = 1, 2, \cdots, K$; $s = 1, 2, \cdots, K$; $t = 2, 3, \cdots, T$; and $a_{us}$ is the transition probability from the hidden state $u$ at the previous time point to the hidden state $s$ at the next time point. The matrix of all possible hidden state transition probabilities is called the hidden state transition probability matrix, which is written as shown in the following formula.
$$A = (a_{us})_{K \times K} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{pmatrix} \tag{2}$$
In summary, the triad $[D_t, Z_t, A]$ is called a hidden Markov model with $K$ hidden states.
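When the hidden state $Z_t = k$, a multiple linear regression model describing the relationship between the independent and dependent variables can be defined as follows.
$$y_t = \beta_{k0} + \beta_{k1} x_{t1} + \beta_{k2} x_{t2} + \cdots + \beta_{kr} x_{tr} + \varepsilon_t \tag{3}$$
where $\beta_k = (\beta_{k0}, \beta_{k1}, \ldots, \beta_{kr})^T$ is the regression coefficient vector in hidden state $k$ and $\varepsilon_t$ is the random error term. Therefore, formula (3) is the hidden Markov linear regression model studied in this article.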

III. THE PRINCIPLE OF PARAMETER ESTIMATION
A. THE DETERMINATION OF HIDDEN STATE
Determining the hidden states of the observation vectors is an important issue in the study of hidden Markov statistical models. In previous studies, Chen et al. used the forward-backward algorithm to determine hidden states [14], and Liu et al. selected appropriate prior distributions for the model parameters and used Bayesian methods to infer the number of hidden states [15], [16]. However, the forward-backward algorithm is complicated both in theory and in execution, and Bayesian methods are limited in application by the need to select prior distributions. Therefore, this article uses the cluster analysis method from multivariate statistical analysis to determine the hidden states of the observed variables.
Cluster analysis is a statistical analysis method that divides research objects into several categories according to certain rules [17]. Common cluster analysis methods include systematic (hierarchical) clustering and K-means clustering. The systematic clustering method first treats each case as its own class and then repeatedly merges classes according to the distance between them until all cases belong to a single class; this yields a pedigree (dendrogram), and the cases are classified with reference to it. Different choices of distance lead to different clustering results. Because each step of the systematic clustering algorithm must recompute the inter-class distances, the amount of calculation is very large and a lot of computer memory is occupied, so the method requires high computing power. To remedy this deficiency, MacQueen proposed a fast clustering method in 1965, the so-called K-means clustering method [18]. The K-means clustering method first roughly divides the n objects into K categories and then revises unreasonable classifications according to an optimality criterion until the criterion function converges, which gives the final classification.
For the hidden Markov model with a fixed number of hidden states, this article uses K-means clustering to divide observation points into K classes. The class corresponding to each observation point can be used as the hidden state of the observation point at that time.
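The principle and steps of the K-means clustering method for determining the hidden state are as follows:
Step 1. When the number of hidden states is $S$, randomly select $S$ observation values as the initial cluster centers, denoted as $X_1^{(0)}, X_2^{(0)}, \ldots, X_S^{(0)}$, and obtain the initial classification $G^{(0)} = \{G_1^{(0)}, G_2^{(0)}, \ldots, G_S^{(0)}\}$, where
$$G_i^{(0)} = \{X : d(X, X_i^{(0)}) \le d(X, X_j^{(0)}),\ j = 1, 2, \ldots, S,\ j \ne i\}, \quad i = 1, 2, \ldots, S,$$
and $d(\cdot, \cdot)$ is the distance between an observation and a cluster center.
Step 2. Take the center of gravity of each class $G_i^{(0)}$ as the new cluster center $X_i^{(1)}$, and reclassify the observations by the same nearest-center rule to obtain $G^{(1)} = \{G_1^{(1)}, G_2^{(1)}, \ldots, G_S^{(1)}\}$.
Step m. Proceed as in Step 2 to obtain the classification $G^{(m)} = \{G_1^{(m)}, G_2^{(m)}, \ldots, G_S^{(m)}\}$. As $m$ increases, the classification tends to be stable, and $X_i^{(m)}$ approximates the center of gravity of the class $G_i^{(m)}$. Therefore, when $X_i^{(m+1)} \approx X_i^{(m)}$ for every $i$, the clustering is complete. At this point, the class of each observation is the hidden state of the observation.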
Because the K-means clustering algorithm converges, the clustering of the observed variables converges, so the hidden state assignment converges and, in turn, so do the parameter estimates.
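The clustering steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `kmeans_hidden_states`, the use of Euclidean distance, the `max_iter` cap, and the toy data are all assumptions made for the example.

```python
import numpy as np


def kmeans_hidden_states(D, S, max_iter=100, seed=0):
    """Assign each observation vector (row of D) to one of S clusters.

    The returned labels play the role of the hidden states. Euclidean
    distance and the stopping rule are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)
    # Step 1: pick S distinct observations as the initial cluster centers.
    centers = D[rng.choice(len(D), size=S, replace=False)]
    for _ in range(max_iter):
        # Assign every observation to its nearest center.
        dist = np.linalg.norm(D[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step 2, ..., m: move each center to the centroid of its class
        # (an empty class keeps its previous center).
        new_centers = np.array([
            D[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(S)
        ])
        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy data: two well-separated groups of observation vectors.
rng = np.random.default_rng(1)
D = np.vstack([rng.normal(0.0, 0.5, size=(50, 3)),
               rng.normal(10.0, 0.5, size=(50, 3))])
labels, centers = kmeans_hidden_states(D, S=2)
```

The cluster labels are arbitrary (label 0 need not correspond to the first true state), so any comparison with true hidden states has to allow for a relabeling.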

B. THE ESTIMATION OF HIDDEN STATE TRANSITION PROBABILITY MATRIX
The hidden state transition probability matrix is an important part of the hidden Markov model, and its estimation has always been one of the core research problems of the model. Common estimation methods include the moment estimation method and the Baum-Welch algorithm, which is based on the EM algorithm [19]. Both require considerable programming skill to implement. This article instead uses the classical maximum likelihood estimation method [20].
Let $N_{ij}$ be the number of samples that move from the previous hidden state $z_i$ to the next hidden state $z_j$ during the hidden state transitions. In the hidden state transition probability matrix, the transition probabilities in different rows do not affect each other. For simplicity, the row index can therefore be omitted, and the maximum likelihood estimate of the transition probabilities can be derived using any single row as an example.
Since the number of hidden states $K$ is determined, there are $K$ possibilities for the transition from the previous hidden state to the next hidden state, and the sum of the probabilities is 1, that is, $\sum_{j=1}^{K} a_j = 1$, so that $a_K = 1 - a_1 - a_2 - \cdots - a_{K-1}$. Then the likelihood function of the transition probabilities of any row can be written as
$$L(a_j) = a_1^{N_1} a_2^{N_2} \cdots a_{K-1}^{N_{K-1}} (1 - a_1 - a_2 - \cdots - a_{K-1})^{N_K}.$$
Taking the natural logarithm on both sides of the above formula, we get the log-likelihood function as follows.
$$\ln L(a_j) = N_1 \ln a_1 + N_2 \ln a_2 + \cdots + N_{K-1} \ln a_{K-1} + N_K \ln(1 - a_1 - a_2 - \cdots - a_{K-1})$$
In what follows we find the maximum point of the log-likelihood function. Calculating the partial derivative of the log-likelihood function $\ln L(a_j)$ with respect to $a_1$, we get
$$\frac{\partial \ln L(a_j)}{\partial a_1} = \frac{N_1}{a_1} - \frac{N_K}{1 - a_1 - a_2 - \cdots - a_{K-1}}.$$
Finally, by setting the partial derivative to zero, we have
$$\frac{N_1}{a_1} = \frac{N_K}{a_K}.$$
Then,
$$\frac{a_1}{a_K} = \frac{N_1}{N_K}.$$
Similarly, for any $a_j$ $(j = 1, 2, \cdots, K - 1)$ whose partial derivative is zero, we can obtain
$$\frac{a_j}{a_K} = \frac{N_j}{N_K}.$$
That is, $a_1 : a_2 : \cdots : a_{K-1} : a_K = N_1 : N_2 : \cdots : N_{K-1} : N_K$. Since $\sum_{i=1}^{K} a_i = 1$, the maximum likelihood estimate of $a_i$ is
$$\hat{a}_i = \frac{N_i}{\sum_{j=1}^{K} N_j}.$$
Because the row index $i$ is arbitrary, the maximum likelihood estimate of any element $a_{ij}$ in the hidden state transition probability matrix can be obtained as shown in formula (4):
$$\hat{a}_{ij} = \frac{N_{ij}}{\sum_{l=1}^{K} N_{il}} \tag{4}$$
where $i = 1, 2, \cdots, K$; $j = 1, 2, \cdots, K$.
VOLUME 8, 2020
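Formula (4) amounts to counting transitions and normalizing each row. The following sketch illustrates this on a short label sequence; the function name `estimate_transition_matrix`, the zero-based state coding, and the handling of rows with no observed transitions are assumptions of the example, not part of the article.

```python
import numpy as np


def estimate_transition_matrix(states, K):
    """MLE of the transition matrix: a_ij = N_ij / sum_l N_il.

    `states` is a sequence of hidden-state labels coded 0, ..., K-1
    (zero-based coding is an assumption of this sketch).
    """
    states = np.asarray(states)
    N = np.zeros((K, K))
    # N[i, j] counts the transitions from state i to state j.
    np.add.at(N, (states[:-1], states[1:]), 1)
    row_sums = N.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros to avoid 0/0.
    A_hat = np.divide(N, row_sums, out=np.zeros_like(N), where=row_sums > 0)
    return A_hat, N


states = [0, 0, 1, 0, 1, 1, 0, 0]
A_hat, N = estimate_transition_matrix(states, K=2)
```

In this toy sequence there are four transitions out of state 0 (two to each state) and three out of state 1 (two to state 0, one to state 1), so the estimated rows are (1/2, 1/2) and (2/3, 1/3).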

C. THE ESTIMATION OF LINEAR REGRESSION COEFFICIENTS
This section studies the estimation of the linear regression coefficient $\beta_k$ under the different hidden states. The regression coefficients of traditional linear regression models are mostly estimated by the least squares method or by maximum likelihood estimation, and the least squares estimate has good properties such as optimality and unbiasedness, so this article uses the least squares method to estimate the regression coefficient $\beta_k$. As mentioned before, in the $k$-th hidden state, the model can be written as follows
$$Y_k = X_k \beta_k + \varepsilon_k,$$
where $Y_k$ is the vector of dependent-variable values observed in hidden state $k$, $X_k$ is the corresponding matrix of independent-variable values, and $\varepsilon_k$ is the error vector. The least squares estimate is the value of $\beta_k$ at which $(Y_k - X_k \beta_k)^T (Y_k - X_k \beta_k)$ takes its minimum value. Denote the sum of squared errors by $Q(\beta_k) = (Y_k - X_k \beta_k)^T (Y_k - X_k \beta_k)$; finding the least squares estimate of $\beta_k$ is then equivalent to finding the minimum point of $Q(\beta_k)$. Using the matrix derivative formula and setting the derivative to zero, we get
$$\frac{\partial Q(\beta_k)}{\partial \beta_k} = -2 X_k^T Y_k + 2 X_k^T X_k \beta_k = 0.$$
Let $X_k$ be a matrix of full column rank, so that $X_k^T X_k$ is invertible; the least squares estimate of the regression coefficient vector $\beta_k$ is then
$$\hat{\beta}_k = (X_k^T X_k)^{-1} X_k^T Y_k.$$
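As an illustration of the per-state least squares step, the sketch below fits one regression per cluster label. The function name `fit_state_regressions`, the prepended intercept column, and the noise-free toy data are assumptions of this example; `numpy.linalg.lstsq` is used instead of forming $(X_k^T X_k)^{-1}$ explicitly, which gives the same estimate when $X_k$ has full column rank and is numerically more stable.

```python
import numpy as np


def fit_state_regressions(y, X, labels, K):
    """Least squares estimate of beta_k for each hidden state k.

    An intercept column is prepended to X; whether the original model
    includes an intercept is an assumption of this sketch.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    betas = {}
    for k in range(K):
        mask = labels == k
        Xk = np.column_stack([np.ones(mask.sum()), X[mask]])
        # lstsq solves min ||Y_k - X_k b||^2, which equals
        # (X_k^T X_k)^{-1} X_k^T Y_k when X_k has full column rank.
        betas[k], *_ = np.linalg.lstsq(Xk, y[mask], rcond=None)
    return betas


# Noise-free toy data with two states and known (hypothetical) coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
labels = np.repeat([0, 1], 20)
true_beta = {0: np.array([1.0, 2.0, -1.0, 0.5]),
             1: np.array([-3.0, 0.0, 4.0, 1.5])}
y = np.array([true_beta[k][0] + X[t] @ true_beta[k][1:]
              for t, k in enumerate(labels)])
betas = fit_state_regressions(y, X, labels, K=2)
```

Because the toy data contain no noise, the fitted coefficients should match `true_beta` up to floating-point error; with clustered labels from real data, misclassified points would bias the estimates.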

IV. EMPIRICAL SIMULATION
In order to test the reliability of the inference method for the hidden Markov multiple linear regression model introduced in this article, this section specifies the number of hidden states $K$, the hidden state transition probability matrix $A$, and the coefficient $\beta_k$ of the linear regression model in each hidden state. First, a hidden state sequence is generated according to the transition probability matrix $A$, and then the observation vector at each observation time point is generated from the hidden state at that point and the multiple linear regression model corresponding to that hidden state. Next, following the method introduced in this article, K-means cluster analysis is used to identify the hidden states of the set of observation vectors, the transition probability matrix is estimated, and the least squares estimate of the coefficient $\beta_k$ is computed for the linear regression model in each of the fixed number of hidden states. Finally, the parameter estimates are compared with the true model to verify the reliability of the method.
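The data-generating part of this procedure can be sketched as follows. The parameter values, the standard normal regressors, the Gaussian noise, the uniform initial state, and the function name `simulate_hmm_regression` are hypothetical choices made for illustration; they are not the settings used in the article's simulations.

```python
import numpy as np


def simulate_hmm_regression(A, betas, sigma, T, seed=0):
    """Generate hidden states and observation vectors D_t = (y_t, x_t1, ..., x_tr).

    Standard normal regressors, Gaussian noise with standard deviation
    `sigma`, a uniform initial state, and an intercept stored in
    betas[k][0] are all assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    betas = np.asarray(betas, dtype=float)  # shape (K, r + 1)
    K, r = A.shape[0], betas.shape[1] - 1
    z = np.empty(T, dtype=int)
    z[0] = rng.integers(K)
    for t in range(1, T):
        # Draw the next state from row z[t-1] of the transition matrix.
        z[t] = rng.choice(K, p=A[z[t - 1]])
    X = rng.normal(size=(T, r))
    noise = sigma * rng.normal(size=T)
    # Regression model of the active state at each time point.
    y = betas[z, 0] + np.sum(X * betas[z, 1:], axis=1) + noise
    D = np.column_stack([y, X])
    return z, D


# Hypothetical parameter values chosen only for illustration.
A = [[0.9, 0.1], [0.2, 0.8]]
betas = [[0.0, 1.0, 2.0, 3.0], [20.0, -1.0, -2.0, -3.0]]
z, D = simulate_hmm_regression(A, betas, sigma=0.5, T=200)
```

The generated matrix `D` has one observation vector per row and can be passed to a clustering step, after which the recovered labels feed the transition matrix and regression estimates.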

A. SIMULATION I
First, take the number of hidden states $K = 2$, so that the hidden state transition probability matrix $A$ is a second-order square matrix. For simplicity, suppose that there is a ternary linear regression model in each hidden state. In the simulation [21], the two hidden states are generated first, and then 200 observation points are randomly generated according to the setting of the two hidden states. Denote the frequencies of the two hidden states obtained by computer simulation by $z_1$ and $z_2$, and the frequencies of the two hidden states after cluster analysis by $\hat{z}_1$ and $\hat{z}_2$.
In simulation I, the results of identifying the two hidden states using K-means clustering are shown in Table 1 and Figure 1, and the experimental results are shown in Tables 2 and 3. In simulation II, the results of identifying the two hidden states using K-means clustering are shown in Table 4 and Figure 2, and the experimental results are shown in Tables 5 and 6.

C. SIMULATION ANALYSIS
In Figures 1 and 2, the green line represents the real hidden state, while the red line represents the hidden state obtained by cluster analysis. The corresponding identification results are given in Tables 1 and 4. After the cluster analysis, this article uses Eq. (4) to compute the maximum likelihood estimate of the hidden state transition matrix. The experimental results are shown in Tables 2 and 5. Finally, in order to obtain the linear regression model, this article uses the explanatory variables and response variables within each cluster to perform a least squares estimation. The experimental results are recorded in Tables 3 and 6.
The results in Tables 1 to 6 and Figures 1 and 2 show that the K-means clustering method is effective for identifying the hidden states of the observation data, and that the resulting estimates of the hidden state transition probability matrix and of the linear regression model parameters are reliable.

V. CONCLUSION
This article combines the hidden Markov model and the linear regression model to define the hidden Markov linear regression model. The problems of hidden state determination, state transition probability matrix estimation, and regression parameter estimation involved in the model are addressed. The K-means clustering method is used to determine the hidden states, which is the innovation of this article. The maximum likelihood method is used to estimate the elements of the transition probability matrix, and the least squares method is used to estimate the parameters of the linear regression model. Simulation results demonstrate that the estimates are close to the true values. However, this article studies a hidden Markov linear regression model with a fixed number of hidden states, so the use of the model has certain limitations. The next research direction will therefore be the inference and application of more complex models, such as the hidden Markov logistic regression model, the hidden Markov quantile regression model, the hidden Markov log-linear model, and the nonlinear stochastic semi-Markov model [22], [23], or the hidden Markov model with an unknown number of hidden states [24].