Prediction of Blood Glucose Level Using Nonlinear System Identification Approach

Predicting the blood glucose level of type 1 diabetes mellitus of patients could prevent hypo/hyperglycemia incidents that are threats for the patients with this disease. A nonlinear system identification approach is proposed in this work to develop a mathematical model, which can be used to predict the blood glucose level over a given period with high accuracy. More specifically, the Hammerstein Box-Jenkins model is used to approximate the system, where two infinite impulse response filters represent the linear and noise processes, and a polynomial basis function represents the nonlinearity. The proposed identification method is based on the Steiglitz-McBride approach to predict the model parameters. Moreover, a simulation software licensed by the USA Food and Drug Administration that simulates the dynamics of the glucose-insulin system metabolism inside the body, called Type I Diabetes Metabolic Simulator (T1DMS), was used to generate the data. Thirty subjects of different age groups were considered, and the data was generated for a week with a sample per minute, i.e. 302430 data points. This data was then processed using a developed MATLAB code to predict the blood glucose level. Various scenarios were established to validate the proposed approach. The simulations showed very promising results with a very low average root mean square error of 1 mg/dl, which is seven times less compared to other prediction techniques. Other cost functions have also been used and they showed very good results. In the future, this approach can be embedded in closed-loop continuous blood glucose monitoring systems in order to give alerts to the patients and help in calculating the needed insulin dose.


I. INTRODUCTION
According to the World Health Organization, diabetes is defined as a lifelong disease, where the patients' pancreas either stops producing insulin, known as type 1 diabetes, or the generated insulin cannot be efficiently used by the patients' body, which is known as type 2 diabetes. By 2020, more than 422 million adults have diabetes worldwide, and this number is expected to reach 700 million by 2045. For instance, the United States healthcare system spent 25% of its budget on diabetes in 2019 [1]. Type 1 diabetes (T1D) affects a large number of children and teenagers [2]. One of the most common threats for T1D patients is hypo-glycemia, where the blood glucose (BG) level decreases below 70 mg/dl. This might lead to fatal consequences, such as unconsciousness or The associate editor coordinating the review of this manuscript and approving it for publication was Ehab Elsayed Elattar . even death [3]. Contrary to hypo-glycemia, the sudden sharp increase in the BG level is called hyper-glycemia. Hence, observing the glucose concentration in the blood continuously is crucial to keeping its level within the normal range. Moreover, it is advantageous to predict the level of glucose for the next 30 to 60 minutes or even more to have better management of it and to avoid any consequences. Therefore, the focus of this work is to predict glucose levels with high accuracy for type 1 diabetes over a period of one hour to a few days.
Continuous Glucose Monitoring (CGM) was released commercially at the beginning of the 21 st century and results in more accurate diagnoses and decisions that optimize the timing and amount of the insulin intermitting doses. This helped in developing automated insulin pumps and artificial pancreas [4], [5]. In order to select the proper doses, the prediction of the BG level has to be accurate. Hence, it is crucial to have a reliable and accurate mathematical model to represent such a system [6]. Rather than the difficult traditional approach of deriving the mathematical equations of such a complicated system, artificial intelligent methods use either a machine learning or system identification approach, where the measured inputs/outputs are trained to create an empirical mathematical model [7]- [9]. Several machine learning methods are presented in the literature to predict the BG levels including neural networks, support vector machines, decision trees, deep learning, K-nearest neighbors, multilayer perceptron, random forest, and extreme learning machine [10]- [20]. In these machine learning approaches the methods are fast in computation, but they act as a black-box that does not offer insight into the system itself [7]. Also, there is no systematic approach to the selection of the methods. They are applied mostly in a trial-and-error approach. Moreover, the error in these applications is not predictable and can reach high values of more than 8 mg/dl [20] or more [45], [46].
The second approach is the block structure system identification, such as the autoregressive integrated moving average [21], best linear approximation [22], normalized least mean square [23], polynomial non-linear state-space [24], Bayesian nonlinear stochastic [25], blind nonlinear identification [26], nonlinear least square [27] and nonlinear linear fractional representation [22], [24], [27], [28]. Moreover, there are review papers that compare the results of these prediction methods from different points of view such as the accuracy, sensitivity, and prediction horizon [3], [6], [29], [30]. Furthermore, there are other approaches of prediction models for diabetes type 2, which is not within the scope of this work, as in Reference [31] and others focused on the specific situation as in Reference [32]. Nevertheless, different methods suffer from various problems such as high levels of error for different prediction horizons [3].
This work aims to develop an accurate long-term prediction horizon method for blood glucose levels for T1D patients, that can help in avoiding incidents of hypo/hyperglycemia. More specifically, we utilize the nonlinear system identification approach that is based on the prediction error method (PEM), where the one-stepahead prediction is applied. Hammerstein Box-Jenkins model structure is used to approximate the system, where the basis function represents the static nonlinearity, and two transfer functions were utilized to approximate the process and noise subsystems. For this work specifically, the Stieglitz-MacBride method is used to estimate the parameters of the Hammerstein box-Jenkins structure. Five cost functions have been used to evaluate the results. Among that, the root-mean-square-error function was mainly used to assess the results. Different rigorous validation tests were performed to ensure the reliability of the proposed method. The data was collected using the UVA/Padova T1DMS simulator, where scenarios of 30 patients were created that included 10 adults, 10 teenagers, 10 kids, and the averages of each age category [33], [34]. Different elements were considered during the simulation, such as the period of the training data period, the horizon of the prediction, and the order of the system. The rest of the paper is organized as follows: the data collections and approximated model structure applied in this work are described in section II. Then, section III contains the method that was used to predict the BG level, including the formulation and a summary of the algorithm steps. It is followed by MATLAB simulation examples and results, which are illustrated in section IV that ends with comparisons with previous works and discussion. Finally, the paper is concluded in Sec. V.

II. SYSTEM IDENTIFICATION
A. DATA COLLECTION T1DMS simulator software, based on MATLAB, is used to collect the data for modeling. It is the first and presently the only in silico diabetes model that is accepted by the USA Food and Drug Administration (FDA) as a substitute for pre-clinical animal testing of new treatment strategies for T1DM [34]- [37]. An experiment was simulated using 33 datasets, as follows: • 10 adults + their average = 11 datasets. • 10 teenagers + their average = 11 datasets. • 10 kids + their average = 11 datasets. The data was generated for a complete week. The samples were taken every minute using closed-loop monitoring and insulin injections. A screenshot of the program is shown in Fig. 1.
The input was the insulin injection rate in bolus (U) and the output was the blood glucose level. The pump includes noise and error considerations. Every patient was assumed to VOLUME 10, 2022 eat three main meals and two snacks every day, as illustrated in Table 1. where CHO is the carbohydrate, and the CR is the carbohydrate-insulin ratio.
The first edition of the simulator was released in 2007 [38] and the latest edition that was used in this work was presented in Reference [39], where the dynamic model was derived by Molano-Jiménez and León-vargas [33].

B. MODEL DESCRIPTION
One way to represent the relationship between the input and output of the system is by generating single input single output time series model, such as autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive with exogenous input (ARX), autoregressive moving average with exogenous input (ARMAX) and output error (OE) models. A more general model where all the previous models can be augmented is called the Box-Jenkins (BJ) model shown in Fig. 2, which is used in this work and consists of two infinite impulse response filters. One of them is used to approximate the process and the other one is to model the noise. The nonlinearity in this work is introduced before the process linear dynamic subsystem, which creates the Hammerstein Box-Jenkins model structure. Selecting this structure was optimized by using the MATLAB system identification toolbox.
The input to the system, u(t), is the insulin injection rate in bolus (U), where the output from the system, y(t), is the blood glucose level in mg/dl and the noise signal is e(t). It is worth mentioning that the system is represented in discrete time. The relation between the input/output is formulated as: where y d (t) is the output of the process given by: and v(t) is the output of the colored noise part given by: Note that G q −1 and H q −1 are linear filters represent the process and noise subsystems that are formulated in Eqs. (4) and (5) respectively. and The polynomials B q −1 and A q −1 are the numerator and denominator for the process transfer function, where C q −1 and D q −1 are the annotation for the colored noise transfer function. These four polynomials are formulated in Eq. (6).
where q −1 indicates the backward shift operator and b 0 . . . b nb , a 1 . . . a na , c 1 . . . c nc , and d 1 . . . d nd are the parameters of the polynomials that should be estimated as will be shown in the next section. The output of the nonlinearity part is x(t), given by: where m (·) is a static nonlinearity, which could be approximated by any basis function. Equation (8) shows how to calculate that, where the polynomial used in this work with a maximum order equal to M and polynomial parameters γ i , that must be estimated, and it will be explained in the next section. Moreover, other types of basis functions can be applied like splines or orthogonal polynomials.

III. METHODOLOGY
In this section, the system identification approach is going to be explained and formulated. This approach is based on the prediction error method, where the one-step-ahead approach is applied. Next, the Steiglitz-McBride Method is described, and a summary of the proposed method is presented at the end of the section.

A. PREDICTION ERROR METHOD
In the prediction error method (PEM), the parameter vector, θ, is computed to minimize the error, ε, which is the difference between the measured and predicted outputs. Note that θ consists of the parameters of the linear process, noise filter, and the coefficients of the nonlinearity given by: 1) ONE STEP AHEAD PREDICTION (OSAP) One step ahead prediction error approach is the most widely applied choice of the prediction error method to assess the quality of the proposed model. In the OSAP approach, the current output, y(t), at time t is estimated with known input/output data till t − 1 [7]. The formulation of OSAP for the proposed Hammerstein Box-Jenkins structure can be summarized as follows. The prediction error, ε, is the difference between the measured and the predicted output given by: where the t − 1 represents the one step ahead prediction, the outputŷ at time t is predicted up to t − 1, the predicted output at time t,ŷ (t | t−1,θ), is given by: where thev (t | t − 1, θ) is the predictor of the disturbance part given by: Finally, if the values of the process output y d (t, θ), and the noise predictor, are substituted in the output predictor y (t | t − 1, θ), the results can be written as: The error will now be computed as the difference between the measured and predicted output as defined in Eq. (10) and then the cost function will be calculated, which will be explained in the following subsection. A more detailed formulation of different nonlinear Box-Jenkins approaches is presented in References [21], [40]- [42].

2) ASSESSMENT TOOLS (COST FUNCTION)
The first applied cost function in this work is the root mean square error (RMSE). It is a very common tool to evaluate the performance of the predicted models [7], [8]. The main goal is to find the predicted parameter vectorθ which minimizes the RMSE, formulated in (14): where V N : The second cost function is percent variance accounted for (%VAF), which is another statistical measure that is defined by: where y andŷ are measured and predicted validation outputs.
The var(y) is approximated variance that is given by: The third cost function is the mean absolute error (MAE), which is an assessment tool where the absolute difference between the measured and predicted outputs is summed and divided by the number of data.
The fourth cost function is the mean absolute percentage error (MAPE) given by: The fifth cost function is the coefficient of determination (R 2 ) is the fifth assessment tool that used in this work, defined as follows: whereȳ (t) = 1 N N t=1 y(t).

B. STEIGLITZ-McBride METHOD
The linear parameters of the process model were estimated using Steiglitz-McBride (SM) method, which is explained in more detail in [43] and [7]. In brief, the output error model structure is formed in Eq. (21) in the beginning to get the first set of b's and a's using the least square fitting.
Subsequently, the approximatedÂ q −1 is then filtered using the input and output data as follows: Lastly, the filtered results are applied to show a new ARX model, where a Least Squares (LS) is fitted to approximate new parameters for b s and a s as follows: which can be re-written as: Next, a regression matrix, ϕ, is created that consists of the inputs and the outputs as: and a parameter vector θ A contains the parameters of the polynomials A q −1 and B q −1 : where the ARX output of the predictor becomes: Then, the parameters vector θ A is approximated using simple LS approach as:θ Eq. (28) is iteratively performed until the approximated A q −1 andB q −1 converge. After that, the nonlinearity is estimated by fitting a simple LS approach, where the linear coefficients γ i is estimated. The estimated nonlinearity is solved using (8). As a result, the estimated process part y d (t) is calculated. Then, the estimated colored noise part is solved by subtracting the estimated output from the process part as in (11). Finally, the noise parameters are approximated by fitting the ARMA model using MATLAB SIMULINK Toolbox as in (29), given that the noise filter H q −1 is stable and inversely stable.

1) ALGORITHM SUMMARY
The steps of the proposed approach are briefly listed below to show the procedure that has been implemented in the computer with the reference to the equations explained in the paper so far: Algorithm: Summary of the Proposed Nonlinear System Identification Algorithm 1: Start. 2: Find initial guess b 0 b 1 · · · b nb a 1 · · · a na using Steiglitz-MacBride method using Eq. (28). 3: Approximate the coefficients of the nonlinearity using Eq. (8), by applying simple LS. 4: Use Eq. (2) to find the estimated output of the process part y d (t). 5: Subtract the measured output from the estimated process model, which results in the approximated noise part Eq. (12). 6: An ARMA model is fitted using MATLAB System Identification toolbox, ARMAX command, to estimate the parameters of the noise model c 0 c 1 · · · c nb d 1 · · · d na . As in Eq. (29). 7: End.

IV. RESULTS AND DISCUSSION
MATLAB simulation examples and results are provided in this section. The data was generated using the T1DMS simulator, explained explicitly in Sec. II. The created data includes 30 subjects, which were divided equally into three groups: adults, teenagers, and kids. In addition, an average of each of the three categories is also considered. CGM is applied with a sampling time of 1 minute for a week-long term, where the input/outputs are the insulin injection rate and the BG level in units U and mg/dl, respectively. The daily intake of all subjects of three meals and two snacks is given in Table 1. The number of input/output measurements data for each run was 10081 with a minimum input of 9.7182 U and minimum output of 51 mg/dl. The average output using all the 33 datasets was 137.8 mg/dl, where the standard deviation was 30.1 mg/dl. There is no feature selection in this work or other data pre-processing except removing the polynomial trend using detrend function in MATLAB.
A MacBook Pro 2017 laptop was used for simulation with a 2.8 GHz Quad-Core Intel Core i7 processor and 16 GB RAM, and the MATLAB software edition 2018a was used for modeling and prediction. The latest edition of the T1DMS toolbox was used to create the datasets. The order of the linear filters was selected by plotting the poles/zeros map using the MATLAB system identification toolbox, and the repeated poles and zeros were canceled. Henceforth, polynomial nonlinearity was fitted starting from order one and increased that gradually until convergence in the value of RMSE is achieved. The resulting optimized model order for the linear process and noise filters were third and second, respectively. Figure 3 illustrates the measured and estimated blood glucose level versus time for three selected datasets: (A) for an adult, (B) a teenager, and (C) a kid. The minimum and maximum values of these three datasets are 59-185 mg/dl, 87-229 mg/dl, and 66-264 mg/dl for the adult, teenager, and kid datasets, respectively. The datasets were split into two parts, three out of seven days (43%) is used for training and four days out of seven days (57%) for validation. The data of the measured days are shown as a solid black curve throughout the seven days of the data with a reading per minute. The predicted training data are shown as dotted red curve and as one would expect they fit the measured data very well for all datasets. Surprisingly, the predicted data during the validation period shown in the solid green curve show very good agreement with the measured data. These results show the potential of the proposed technique to predict the glucose level in the blood for patients of different ages, rapid changes, and situations of hypocalcemia and hypercalcemia.
Next, the error between the measured and predicted outputs for both training and validation periods of the results shown in Fig. 3 are plotted as thin lines in the left scale of Figs. 4-6. The right scale of the figure shows the corresponding input data. The error value for the three cases of adult, teenager, and kid cases across the time span was about ±1 mg/dl, which is quite small. At the time points where the insulin is injected due to the meal/ snack time sharp increase in blood glucose, i.e. the peak inputs, the corresponding error jumps ±6, ±19, and ±8.8, for the three cases of adult, teenager, and kid, respectively. This is expected to happen as the model is not able to predict the exact output for a given instantaneous very sharp increase in the input. Nevertheless, as the time for such  a jump in error is known, it is applicable to include that in the algorithm and the prediction system can give an alert to the user accordingly. Though the error was shown in Figs. 4-6, it is more useful to calculate the RMSE given in Eq. (15) to evaluate the overall error in these datasets. The corresponding RMSE for the validation part is 0.55, 0.93, and 0.67 mg/dl for the three cases of adult, teenager, and kid, respectively.
Another conventional way to evaluate such results is to draw the Clarke error grids. They are used to quantify the accuracy of predicted blood glucose with reference to the reference measured values. These charts are widely used to evaluate the accuracy of the predicted BG level versus the measured one. They are split into five zones (A-E), where zone A and B reflect clinically acceptable devices with an error that does not exceed 20% compared with the reference value or, in the case of zone B, that is larger than 20% but does not cause incorrect treatment [44]. Fig. 7 presents the Clarke  error grids for the same three datasets using the MATLAB function which was developed in [45].
Having the points on the main diagonal means that the error is very small as revealed for the adult and the kid cases. There is a slightly larger error for the teenager case as shown in part (b) of the graph, but the points are all in zone A, i.e., the points are within 20% of the reference values. These results are well correlated with the error shown in Figs. 4-6 as the error for the teenager dataset is larger than for the adult and the kid datasets. However, the Clarke charts are quite informative and reveal the excellent quality of the prediction technique utilized in this study.
Next, a detailed analysis of the RMSE and the standard deviation of different age categories and datasets have been carried out and presented in Table 2. The RMSE of ''10 adults datasets'' is the average of all RMSE for the 10-adult datasets individually. The next row shows the RMSE for the dataset of the average of all 10-adult datasets. The third row shows the average value of all RMSE values for each dataset of the adult group after taking the average model of all the models for the 10 datasets and using it for all of them. Likewise, the experiments were applied to the teenagers, and kids as shown in the next six rows. Moreover, row #10 shows the average RMSE of all the RMSE values of the 33 datasets. Finally, the last row shows the average RMSE value for all datasets using the average model of all of them. It is evident that the predicted output fits the measured counterpart very well with an RMSE less than 0.9 mg/dl for all 33 datasets. The standard   deviation of four of this analysis was calculated and found to be less than 0.5 mg/dl. From the results of this table, the average molding datasets show lower RMSE than the others for all the categories. Having said that, all the values are very good with an average of less than 1 mg/dl in most of the cases. By comparing the achieved results with the results in the literature [17]- [19] and [46]- [48], the achieved RMSE of the proposed method here is way lower.
Another important factor shown in table 3 is the optimum training period which was required to get a good result and a prediction horizon. Training periods of 1, 2, and 3 days have been considered. Evaluation of the RMSE for different validation period horizons (PH) of 1 H, 6 H, 12 H, 24 H, and for the rest of the week has been carried out. At least one day training period was required to be able to predict for any prediction horizon an acceptable RMSE and two days of training to predict the output for the rest of the week with a good RMSE. On the other hand, three days of training will lead to very accurate results with less than 0.8 mg/dl RMSE for a prediction horizon of four days, i.e. the rest of the week.
Next, Table 4 presents a comparison of various results from other papers that used the RMSE as a cost function to compare the achieved RMSE with the proposed method in this paper. It is worth mentioning that in order to hold an absolutely fair comparison, the dataset and the cost function should be fixed, and the used method is changed only. However, the fact that we generated the data ourselves by using the T1DMS and those papers used different criteria to generate or acquire the data makes it very difficult to hold such a true comparison. Moreover, implementing all the methods proposed in those papers on the same dataset is beyond the scope of this work. Hence, the approach was to focus on the RMSE as the main metric and show the training period as well as the prediction horizon side by side. The maximum prediction horizon was 96 hours (four days) which is presented in Reference [49] but with an average RMSE of 40 mg/dl, which is almost 40 times greater than the results found using the nonlinear system identification approach in this paper. More interestingly, even for a quite short prediction horizon of 15 minutes, a quite large value of RMSE has been achieved in [46], [47], [50]. Table 5 shows a comparison of five different cost functions to validate the results. It is clear from the table that all the cost functions confirm that the objective function was met by minimizing the error.

V. CONCLUSION
A direct Hammerstein Box-Jenkins System Identification technique was proposed to predict the BG level on T1D patients including different ages categories. This approach was based on the regression method called Steiglitz-McBride, where the datasets were created using the T1DMS simulator. Different simulations criteria were optimized and validated, such as the order of the linear subsystems and nonlinearity, the minimum size for the training data, maximum accurate prediction time horizon, and the acceptable RMSE compared to some previous studies. The outcomes were rigorously validated, and they showed a very promising result, encouraging us to test it on real datasets. Moreover, for a further extension, nonlinear iterative convex approach might be tested to optimize the results of the proposed approach.