ReplayBG: A Digital Twin-Based Methodology to Identify a Personalized Model From Type 1 Diabetes Data and Simulate Glucose Concentrations to Assess Alternative Therapies

Objective: Design and assessment of new therapies for type 1 diabetes (T1D) management can be greatly facilitated by in silico simulations. The ReplayBG simulation methodology here proposed allows “replaying” the scenario behind data already collected by simulating the glucose concentration obtained in response to alternative insulin/carbohydrate therapies and evaluate their efficacy leveraging the concept of digital twin. Methods: ReplayBG is based on two steps. First, a personalized model of glucose-insulin dynamics is identified using insulin, carbohydrate, and continuous glucose monitoring (CGM) data. Then, this model is used to simulate the glucose concentration that would have been obtained by “replaying” the same portion of data using a different therapy. The validity of the methodology was evaluated on 100 virtual subjects using the UVa/Padova T1D Simulator (T1DS). In particular, the glucose concentration traces simulated by ReplayBG are compared with those provided by T1DS in five different scenarios of insulin and carbohydrate treatment modifications. Furthermore, we compared ReplayBG with a state-of-the-art methodology for the scope. Finally, two case studies using real data are also presented. Results: ReplayBG simulates with high accuracy the effect of the considered insulin and carbohydrate treatment alterations, performing significantly better than state-of-art method in almost all considered situations. Conclusion: ReplayBG proved to be a reliable and robust tool to retrospectively explore the effect of new treatments for T1D on the glucose dynamics. It is freely available as open source software at https://github.com/gcappon/replay-bg. Significance: ReplayBG offers a new approach to preliminary evaluate new therapies for T1D management before clinical trials.


I. INTRODUCTION
T YPE 1 diabetes (T1D) is a chronic autoimmune disease characterized by the destruction of insulin-producing pancreatic beta-cells [1].As a result, to keep blood glucose (BG) within the safe target range (i.e., BG ∈ [70, 180] mg/dl), people affected by T1D are required to assume exogenous insulin, to avoid hyperglycemic episodes (i.e., BG >180 mg/dl) that can cause serious complications in the long-term.On the other hand, insulin overdosing inevitably leads to dangerous hypoglycemic events (i.e., BG <70 mg/dl) that must be promptly tackled by means of rescue carbohydrates intakes [2].
Assessing (possibly new) therapies for the treatment of T1D is a critical task for obvious safety reasons and can also be expensive and time consuming.Indeed, the clinical assessment requires the enrollment of a sufficient number of subjects to prove that the new therapy is actually effective with respect to the state-of-art.
In this context, in silico clinical trials (ISCT)s played a key role in the last decades [3].ISCTs leverage complex mathematical models of glucose-insulin regulation (see, e.g.[4], [5], [6], [7]) to set up computer simulation environments to preliminary design and evaluate, in vast virtual cohorts representing the T1D population, new methodologies in a safe and cost-effective way.Moreover, ISCTs allow exploring the impact of new treatments by running multiple tests on the same virtual subject maintaining the same identical conditions, which is something impossible to replicate in clinical trials.One great possibility provided by ISCT tools is using them to tune therapies in a specific real individual.This is fostered by the fact that gathering real-life data is becoming easier than in the past, so it would be ideal to exploit such large datasets to further validate or even personalize new therapies.
The aim being exploring this perspective, Patek et al. [8] proposed a novel data-driven, model-based simulation approach which consists of two steps.Firstly, the patient's glucose Step 2: the model parameters identified in Step 1 are given in input to model (6) to simulate the (interstitial) glucose concentration that should have been obtained, in the same individual and in the same time window, by adopting an alternative insulin and/or CHO therapy.
concentration measured via continuous glucose monitoring (CGM) devices, and insulin infusion data are used as inputs of a linear model of glucose-insulin regulation of T1D to retrospectively estimate, through deconvolution on a specified time window, a signal, called "net-effect", representing all the unmodeled phenomena affecting glucose dynamics.Secondly, the net effect is used as a forcing input to run the model at hand and test the new therapy under assessment in the same time window with insulin and/or meal inputs modified by the treatment under test.This second step allows to obtain the hypothetical patient glucose concentration that would have been resulted from its adoption.
However, even if the methodology proposed by Patek et al. is a very promising tool to evaluate the effect of alternative insulin therapies on already acquired datasets, its domain of applicability is limited as demonstrated by Vettoretti et al. [9].As such, recently Hughes et al. [10] proposed and tested an improved version of the methodology, the aim being surpassing such limitations showing better performance and accuracy than [8] in a set of benchmark simulated scenarios.
In the present work, we propose ReplayBG, a digital twin-based approach which aims to reliably simulate glucose concentration traces in response to modifications of the insulin/carbohydrates therapy in real individuals, without the need of estimating a net effect signal.As depicted in Fig. 1, the methodology consists of two sequential steps, described in details in Section II.In the first step, the aim being capturing the peculiar glucose-insulin dynamics observed within the data, a (nonlinear) model of glucose-insulin dynamics is identified on already collected patient data (consisting of insulin infusion, carbohydrate intake and CGM) using a robust and powerful Bayesian technique.Then, the model is used to simulate the (interstitial) glucose concentration trace obtained by using an alternative insulin/carbohydrate therapy in the same individual and in the same identical scenario.In Section III, the methodology is tested following the same approach of [10], where synthetic data are generated through multiple ad hoc ISCTs, used to identify the model of ReplayBG, and then used as "groundtruth" to evaluate if ReplayBG is able to reproduce specific glucose fluctuations resulting from different (here 5) possible scenarios obtained as modification of meal intake and/or insulin administration.In Section III we quantitatively compare Re-playBG with the state-of-the-art net-effect-based methodology of Hughes et al. [10].Section IV illustrates two case studies in which the use of ReplayBG on real data is demonstrated by evaluating two state-of-the-art algorithms for T1D management.Finally, Section V concludes the article considering the strengths and limitations of the study and discussing the margins for its further development.

A. Step 1: Identification of the Personalized Model of Patient Physiology
1) Model Structure: As previously done by Cappon et al. [11], the model was built starting from the physiological model available in the UVa/Padova T1D Simulator (T1DS) [12].Moreover, as in [11], in order to permit model identification at the individual level using carbohydrate intake CHO(t), exogenous insulin I(t), and CGM data, the original T1DS model has been simplified reducing the number of parameters to be identified, but paying attention in maintaining its ability to effectively describe glucose-insulin dynamics.Note that CGM, CHO, and insulin data should be collected in parallel.As documented in the following, the model is composed of three main subsystems: subcutaneous insulin absorption; oral glucose absorption; glucose-insulin kinetics.
a) Subcutaneous Insulin Absorption Subsystem The model of the subcutaneous insulin absorption system is a simplified version of the model incorporated in T1DS [13].The model is composed of three compartments and describes the absorption dynamics of exogenous insulin infusion to the plasma.Exogenous insulin I(t) is infused to the first compartment, which represents insulin in a non-monomeric state.Then, "non-monomeric" insulin diffuses to the second compartment, representing insulin in a monomeric state, and eventually reaches plasma.Model equations are: where I sc1 (mU/kg) and I sc2 (mU/kg) represent the insulin in a non-monomeric and monomeric state, respectively; I p (mU/l) is the plasma insulin concentration; k d (min −1 ) is the rate constant of diffusion from the first to the second compartment; k a2 (min −1 ) is the rate constant of subcutaneous insulin absorption from the second compartment to the plasma; k e (min −1 ) is the fractional clearance rate; V I (l/kg) is the volume of insulin distribution; β (min) is the delay in the appearance of insulin in the first compartment.b) Oral Glucose Absorption Subsystem The model of the oral glucose absorption system, taken from [14], describes the gastro-intestinal tract as a three-compartment system: the first two compartments quantify the glucose in the stomach, while the third compartment models the upper small intestine where CHO is absorbed.Model equations are: where Q sto1 (mg/kg) and Q sto2 (mg/kg) are the amounts of glucose in the stomach in solid and liquid state, respectively; Q gut (mg/kg) is the glucose concentration in the intestine; k empt (min where f (dimensionless) is the fraction of the intestinal content absorbed in the plasma.c) Glucose-Insulin Kinetics Subsystem The core model is based on a modified version of the glucose-insulin kinetics minimal model introduced in [15], [16].The model is composed of three compartments, the first describing the effect of insulin action and oral glucose rate of appearance on plasma glucose concentration, the second quantifies the impact of plasma insulin concentration on insulin action, and the last one represents the transport of glucose from plasma to the interstitium.Model equations are: where G (mg/dl) is the plasma glucose concentration, X (min −1 ) is the insulin action on glucose disposal and production; IG (mg/dl) is the interstitial glucose concentration; SG (min −1 ) is the glucose effectiveness that describes the glucose ability to promote glucose disposal and inhibit glucose production; G b (mg/dl) is the basal glucose concentration in the plasma; V G (dl/kg) is the volume of glucose distribution; p 2 (min −1 ) is the rate constant of insulin action dynamics; SI (ml/μ U•min) is the insulin sensitivity; I pb (mU/l) is the basal insulin concentration in the plasma; α (min) is the delay between plasmatic and IG compartments; and ρ(G), is a deterministic function, introduced by Dalla Man et al. [17], that allows to better represent glucose dynamics in the hypoglycemic range by increasing insulin action when glucose decreases below a certain threshold: where G th is the hypoglycemic threshold (set to 60 mg/dl) and r 1 (dimensionless) and r 2 (dimensionless) are two model parameters, without direct physiological interpretation, set to 1.44 and 0.81, respectively.Note that in the model of (4) we are assuming, without any loss of generality, the gain between G(t) and IG(t) to be equal to one [18] [19].With this assumption, in practice, IG(t) represents a noise-free version of the CGM data, allowing us to fit it versus CGM (t) in the identification step described in the following section.

2) Model Identification
For each model subsystem, we reduced as much as possible the number of parameters to be identified, i.e., retaining only the parameters that have the greatest impact on model output, while setting the other parameters to population values.This selection step facilitates the identifiability of the model maintaining capability of describing the glucose dynamics observed in the data.In particular, we fixed 6 parameters to population values, i.e., V I = 0.126 l/kg, k e = 0.127 min −1 , β = 8 min, f = 0.9, V G = 1.45 dl/kg, and α = 7 min [13] [14], [15].The remaining parameters in (1)-( 4) must be derived, at the individual level, from the data.Therefore, the vector θ of the unknown parameters is composed by a total of eight variables, two related to the subcutaneous insulin subsystem, i.e., k a2 and k d , two associated to the oral glucose absorption subsystem, i.e., k empt and k abs , and four related to the glucose-insulin kinetics subsystem, i.e, S G , S I , p 2 , and G b .Formally, the parameter estimation problem can be stated as the problem of determining θ from the equations: where x x x(t) is the state vector defined as u u u(t) := [I(t), CHO(t)] is the input vector; f (•) is the state update function combining (1) (2), and (4).f depends on the set of unknown parameters θ.
As documented in the literature, (1), (2), given the data at hand (i.e., CGM, I, and CHO) and ( 4) are a priori non identifiable [13], [14], [16].As such, the identification of the vector θ at the individual level from insulin and carbohydrate intake and CGM is not trivial, since the resulting model ( 6) is not a priori identifiable as well.
To mitigate a priori non-identifiability, we adopted the same Bayesian approach of Pillonetto et al.,giapiPls, i.e., Markov Chain Monte Carlo (MCMC).In fact, MCMC allows overcoming improper model parameters estimates that are usually achieved with maximum-likelihood-based approaches.In details MCMC, allows to obtain a point estimate of θ, θ, by performing Monte Carlo integration over a set of N samples θ i , i = 1, . . ., N generated from the posterior distribution Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
where p Y |θ,U (Y |θ, U) is the likelihood function, that is, the probability of observing a certain sequence of CGM measurements Y := {y(t k ), t k = k • T s , k = 1, . . ., D} given the parameter vector θ and the input U := {u(t k ), t k = k • T s , k = 1, . . ., D} with D the number of available data points and T s the sampling period, while p θ (θ) is the a priori information on the distributions of unknown parameters, which was obtained from previous studies [13], [14], [21].
The successful application of MCMC to address the a priori non-identifiability of the ReplayBG model was already extensively discussed in [11].As such, for the sake of the article conciseness, we decided not to report again such an analysis in the present article and we refer [11] for further details.Anyway, the reader can find additional analyses and results in the attached Supplementary Material (Sections 2, 3, and 7), where we provided some insights on the practical model identifiability (confidence intervals of parameter estimates, prior vs. posterior probability density distributions of model parameters in a representative subject, and accuracy in capturing real-world CGM dynamics).
In this work, we implemented an improved variant of the identification approach proposed in [11], in which the Monte Carlo integration step is replaced by a new step where the a posteriori distribution p θ|Y,U (θ|Y, U ) is fit and represented in sampled form.
From the practical point-of-view, θ i samples are generated from a Markov Chain whose stationary distribution is exactly the posterior in (7) (target distribution).Such a chain is built in MATLAB (Mathworks Inc., Natick, MA, USA) using an Adaptive Single Component Metropolis-Hastings scheme [22] described in details in the Supplementary Material (Section 4).
The obtained samples θ i are then used to fit a multivariate t copula distribution able to represent p θ|Y,U (θ|Y, U ) and capture the underneath dependence between unknown parameters [23].Finally, such a copula distribution is used to generate 1000 realizations of θ, i.e., θr , r = 1, . . ., 1000, representing in sampled form the target distribution p θ|Y,U (θ|Y, U ).These samples are stored for their later use in the second step of ReplayBG.
2) Remark: The proposed identification scheme of Re-playBG does not estimate the states of model ( 6) and, more importantly, the corresponding initial conditions.Identifying such initial conditions is crucial to correctly estimating the unknown model parameter vector θ and, by product, reliably replaying the glucose profile with new, altered, inputs, i.e., step 2 of ReplayBG described below.To circumvent this issue, ReplayBG assumes all model state initial conditions at steady state.This assumption is granted to be valid when the actions of exogenous insulin and carbohydrate intake are "exhausted", i.e., when the starting point of the portion of data used for model identification and replay is reasonably distant from the last meal and insulin boluses.For this reason, as later described in Sections III.A and IV, in this article we designed simulations and selected portions of real data that guarantee to satisfy these assumptions.The potential ReplayBG user should be aware of this aspect and be careful when selecting the portion of data to work with.Furthermore, as discussed in details in the Supplementary Material (Section 5), as a rule of thumb we suggest to use portions of data that span at least 6 hours.This ensures to obtain both reliable parameter estimates and simulation results.

B. Step 2: Use of Model for Simulation
The second step of ReplayBG uses the model identified in Step 1 to predict the hypothetical glucose concentration profile (which corresponds, in the model, to the variable IG, which we remind to be a noise-free version of CGM) that would have been obtained, in the same individual and in the same time window, from the adoption of an alternative therapy (Fig. 1, right).Specifically, for each r-th realization of the model parameters, θr , a corresponding IG(t) profile, ÎG r (t), is obtained by simulating the model ( 6) using as input the carbohydrate intakes and insulin administrations provided by the alternative therapy to be evaluated.As results, a total of 1000 predicted IG profile realizations are obtained.These realizations are ultimately used to infer the median IG profile, ÎG(t) 50 , and the boundaries of the corresponding 25th − 75th percentile confidence interval, ÎG(t) 25 , ÎG(t) 75 , of the therapy under assessment.

III. ASSESSMENT OF REPLAYBG FROM SYNTHETIC DATA
ReplayBG evaluation is performed in silico, using as reference the T1DS [12], a software to generate ISCT that incorporates a population of 100 virtual adult subjects, each characterized by different physiological parameters to capture the inter-/intra-subject variability of T1D population.Briefly, for each virtual subject of the T1DS, a baseline scenario has been set up, which allowed generating CGM, CHO, and insulin data to identify the ReplayBG model.This allowed us to assess the ability of ReplayBG to correctly estimate IG in a T1D individual given a set of data.Then, several different modifications of the baseline scenario have been created, simulating alternative therapies.These therapies have been given as input to both the T1DS and ReplayBG, generating two IG traces: the one obtained with the T1DS served as a reference to assess the ability of ReplayBG to simulate it.This procedure, also adopted by [8] and [10], allowed setting a "ground truth" term of comparison used to quantify the ability of ReplayBG to reconstruct the glucose trace resulting from the therapy modification under evaluation.

A. Synthetic Dataset
Synthetic CGM, CHO, and insulin data of 100 virtual adult subjects were generated using T1DS.
In detail, we setup a 12-hour long single-meal ISCT, from 7:00 AM to 7:00 PM where each subject had a meal of 50 g of carbohydrates at 7:00 AM.The corresponding insulin bolus was computed through the standard formula commonly used in clinical practice [24]: where carbohydrate-to-insulin ratio and the correction factor, two patient-specific therapy parameters which quantify the patient sensitivity to insulin.Throughout ISCT, patients were treated with the optimal basal insulin infusion rate provided by T1DS.Then, for each subject, we identified the ReplayBG model ( 6) as described in Section II-A2.Finally, as remarked in Section II-A2, it is important to use data that ensure that the model of ReplayBG is not affected by the choice of the starting point.This is granted since data are generated from the physiological model of T1DS whose states all start from the steady state.

B. The Five Simulated Scenarios
To assess the efficacy of ReplayBG, we simulated some alternative therapies in the 100 virtual patients of Section III-A (used for ReplayBG model identification), and then applied ReplayBG step 2 to evaluate the ability of ReplayBG to replicate them.In particular, first, synthetic data were generated through T1DS using the simulation setup used for the identification of Re-playBG but with different insulin and meal input profiles.Then, we fed ReplayBG with the same identical alternative therapies and compared the median IG profile predicted by ReplayBG with the "ground-truth" provided by T1DS.Specifically, we simulated a total of five scenarios, each of them representing a possible reliable therapy adjustment whose effect on glucose could be of clinical interest to be evaluated: r Scenario 1: Meal insulin bolus dose modulation.This scenario simulates the situation in which we would be interested in evaluating, for example, the effect of a different insulin dosing formula.To simulate such a scenario, the original dose of insulin bolus, calculated using (8), was modulated by a multiplying factor ranging from −50% to 50% with a step of 10%.The original optimal basal insulin rate and meal carbohydrate amount were not altered; r Scenario 2: Meal CHO amount modulation.Here, the aim is to simulate a scenario in which the amount of CHO at meal time is different from the nominal one used to calculated the insulin bolus.For example, this scenario can be useful if we want to test the impact of an error in CHO counting.The original meal carbohydrate amount was modulated by a multiplying factor ranging from −100%, which corresponds to the complete elimination of the meal intake, to +100%, which, on the other hand, doubled its amount, with a step of 20%.Insulin inputs were not altered; r Scenario 3: Basal insulin infusion rate modulation.This third simulation scenario is similar to scenario 1, except that here we want to test a new therapy in which only the basal insulin infusion rates are modified.To simulate this, the optimal basal infusion rate was modulated by a −50% to +50% factor, with an increasing step of 10%.The original meal insulin bolus dose and meal intake were not altered; r Scenario 4: Snack addition.This scenario was devised to test what would have happened if the subject had either applied an hypotreatment strategy, e.g., in presence of hypoglycemia, or had taken a snack (without the corresponding insulin bolus).To realize this, five hours after the start of the ISCT, i.e., at 5:00 PM, a snack of 5 g to 25 g of carbohydrates, with an increasing step of 5 g, was added to the carbohydrate input intake regime.The original insulin administrations were unaltered.
r Scenario 5: Bolus addition.Finally, the fifth scenario represents the case in which we are interested in testing new methodologies that advise the assumption of insulin boluses.Practically, such a scenario is simulated by adding in the original insulin administration regimen, an insulin bolus of 10/CF U to 50/CF U, with an increasing step of 10/CF , was administered at 2:00 PM.In particular, rather than using a fixed amount of insulin for each subject, we decided to modulate this amount by the correction factor CF in order to account for the specific subjects' insulin sensitivity.Original meal intake was not altered.

C. The Chosen Performance Metrics
For each scenario 1-5, we evaluated the performance of ReplayBG by considering the average Mean Absolute Relative Difference (MARD) (%) between the "true" IG profile provided by T1DS, IG(t k ) p , and the median IG profile simulated through ReplayBG, ÎG(t k ) 50 p , computed over the virtual cohort: where D is the number of data points available.To evaluate whether the ReplayBG methodology is capable of reliably simulating variations in the therapy of a given individual, we considered a MARD threshold of 10%.Indeed, MARD <10% can be considered a "good" glucose match since it approximates the accuracy found in currently available CGM devices.We also considered, as additional performance metric, the root mean square error (RMSE) (mg/dl) between IG(t k ) p and ÎG(t k ) 50 p : Another evaluation tool that is useful to assess the clinical acceptability of the simulations performed by ReplayBG is the Clarke Error Grid Analysis (CEGA) [25].However, for the sake of article conciseness, we decided to report CEGA results the Supplementary Material (Section 1).

D. Comparing ReplayBG With State-of-The-Art Techniques
To further support the novelty and the potential of ReplayBG, we compared it against the methodology of Hughes et al. [10].Of note, we did not compare ReplayBG also vs. the approach of Patek et al. [8], since [10] is an updated, improved version of [8], thus representing the most challenging approach to compete with.Hereafter, [10] will be referred to as "Net-Effect" for the sake of simplicity.
For the convenience of the reader, here we condensed the principles of the Net-Effect methodology.Briefly, it can be summarized in three sequential steps.First, least-square fitting is used to identify a subset of parameters of a linear time-invariant (LTI) model of glucose-insulin dynamics to accommodate, thus personalize, the model itself to the patient-specific physiological intravariability.Secondly, regularized deconvolution is used to extract a residual signal representative of all unmodeled phenomena, the so-called "net-effect" signal.Finally, the net-effect signal is fed back into the LTI model, together with the known insulin and meal records, to reproduce the observed original glucose trace.In this setup, similarly to ReplayBG, if the original meal and/or insulin inputs are altered, an hypothetical trace of the expected glucose time-course under the modified inputs can be produced by simulating the LTI using as inputs the net-effect signal and the new meal and insulin records.Further details on Net-Effect can be found in [10].
Net-Effect has been reimplemented as described in [10] and quantitatively evaluated adopting the same procedure used for ReplayBG, i.e., on the same data, the same five scenarios, and the same metrics described in Sections III-A, III-B, and III-C, respectively.Then, for each of the five scenarios, we compared the MARD and RMSE distributions obtained with ReplayBG and Net-Effect both numerically and statistically.Specifically, to assess whether the difference between methods is statistically significant, for each scenario and for each considered input modulation/alteration, we performed the two-tailed Wilcoxon signed rank test with a 5% significance level corrected by the number of comparisons made in each scenario using the Bonferroni approach to account for possible repetition biases.
The Wilcoxon signed rank test has been chosen since the MARD distributions were not Gaussian according to the Lilliefors test.

E. Results
For each of the five scenarios, we graphically discussed the application and the performance of ReplayBG on a representative virtual subject.This will be done resorting to Fig. 2, which is composed by five panels, each of them reporting, for each scenario, the original simulated data vs the IG traces obtained with ReplayBG for the specific meal and insulin input profiles.Then, the results, obtained in the entire virtual cohort, are discussed and reported in Tables I-V as median and 5th-95th percentile range of MARD, and RMSE.Finally, ReplayBG is compared against Net-Effect.This will be done through Tables I-V, which report also the median and 5th-95th percentile range of MARD, and RMSE obtained with Net-Effect, and Fig. 3, which shows and compares, through five panels, the MARD distributions of ReplayBG (in red) vs. Net-Effect (in blue) for each scenario and considered meal/insulin input alteration.RMSE distributions are not visualized in dedicated plots since they are qualitatively similar.
Fig. 2(a) compares, for a representative subject, the reference IG data obtained using T1DS (continuous lines) when the ten different modulations considered in scenario 1 are applied to the original meal insulin bolus dose with the respective IG profiles obtained by step 2 of ReplayBG (dashed lines).Results show Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.that ReplayBG is able to reconstruct with good accuracy the target reference IG traces capturing the peaks and nadirs as well as the overall glucose dynamics.
Table I shows the overall metrics calculated on the 100 virtual subjects for the ten different meal insulin bolus modulations considered in scenario 1.
The results show that ReplayBG is very accurate in reproducing the target IG fluctuations.Particularly, achieved median MARD is well below the 10% threshold ranging from a minimum of 1.14% to 6.13%.Of note, when the original meal insulin dose is increased by 50%, in a small number of subjects, the obtained MARD results above 10%, however, the overall Re-playBG performance can be considered more than satisfactory.Similar considerations can be made for RMSE 1 which ranges from 1.91 mg/dl to 8.30 mg/dl.Comparing ReplayBG with Net-Effect, results highlight that ReplayBG is able to estimate the target IG traces more accurately.This can be also appreciated from Fig. 3(a).In particular, significantly better results are achieved by ReplayBG for all considered meal insulin bolus modulations.Notably, median MARD is always below the 10% threshold using ReplayBG (maximum MARD = 6.13%), while Net-Effect exceeds this limit when the original meal insulin bolus is increased by 50%, (maximum MARD = 12.14%).
Fig. 2(b) shows, for a representative subject, the target IG profiles simulated with T1DS (continuous lines) and the respective replayed IG profiles obtained with ReplayBG (dashed lines) for the ten different modulations of the original meal carbohydrate amount considered in scenario 2. Results are satisfactory and the reference glucose dynamics are correctly captured by ReplayBG.Notably, the error increases when the meal amount is totally removed (−100%) or doubled (+100%), but the resulting profiles are still able to correctly reflect the overall dynamics.
Table II shows the obtained MARD 2 and RMSE 2 distributions in terms of median and 5th-95th percentile range.MARD 2 results show that, when the original meal carbohydrate amount is modulated by −60% up to +100%, ReplayBG obtains reliable results.On the other hand, when a modulation of −100% or 80% is applied, the median MARD is above 10% being 13.40% and 10.60%, respectively.Comparing ReplayBG and Net-Effect in scenario 2, in terms of MARD 2 and RMSE 2 distributions, the same qualitative conclusions drawn for scenario 1 can also be made for scenario 2. Indeed, as shown by Fig. 3(b), ReplayBG allows to achieve statistically better accuracy than Net-Effect for all considered meal content modulations, except for the −100% modulation, estimating better IG profiles.
Fig. 2(c) shows the results obtained for scenario 3 in a representative subject for the ten different considered modulations of the original basal insulin infusion rate.In particular, the target IG data (continuous lines) are plotted against the IG profiles obtained in step 2 of ReplayBG (dashed lines) when the same input alterations are applied.Results shows that, when basal insulin is modulated from −50% to +20%, the target IG profiles are really close to the respective traces obtained with ReplayBG.The error increases when the basal insulin infusion rate is increased by more than +20%, where ReplayBG overestimates the impact of insulin on glucose levels.
The median and the 5th-95th percentile range results of MARD 3 and RMSE 3 are reported in Table III.ReplayBG is able to well-reproduce IG fluctuations when the original basal infusion rate is modulated from −40% to +20% being the obtained median MARD 3 below or really close to 10%.For all other considered modulations, MARD 3 is higher than the fixed threshold.ReplayBG vs. Net-Effect results are consistent with the results above: ReplayBG better estimates the target IG traces than the Net-Effect for all considered modulations.Analyzing Fig. 3(c), however, the MARD results statistically lower using ReplayBG only when basal insulin is modulated by −10%, +40% and +50%.Fig. 2(d) compares, for a representative subject, the reference IG data obtained using T1DS (continuous lines) when the five different insulin boluses considered in scenario 4 are added to the original insulin input profile with the respective IG profiles obtained by step 2 of ReplayBG (dashed lines).ReplayBG obtains very accurate results and is able to almost perfectly approximate the target IG profile.
Table IV shows the overall metrics calculated on the 100 virtual subjects for the five different added insulin boluses considered in scenario 4. In this scenario, ReplayBG achieved good accuracy in terms of MARD 4 for considered situations ranging from 2.15% to 6.11%.Results in terms of RMSE are consistent.Again, ReplayBG achieves better performance compared to Net-Effect.It can be appreciated that ReplayBG is able to estimate the target IG time course more accurately than Net-Effect.In particular, the larger the considered carbohydrate intake amount, the larger the performance gap.This can be observed also in Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.Fig. 3(d), where the MARD distributions obtained using Re-playBG and Net-Effect in scenario 4 are shown and compared.In particular, MARD significantly improves when a carbohydrate intake of 15 g or more is added to the original meal input.Of note, median MARD 4 is always below the 10% threshold using both methodologies ranging from 2.15% to 6.11% for ReplayBG and from 2.35% to 9.39% for Net-Effect.Fig. 2(e) shows, for a representative subject, the simulated IG profiles obtained with T1DS (continuous lines) when the five different snacks considered in scenario 5 are added to the original meal input profile, and compares them with the respective IG traces resulting from step 2 of ReplayBG (dashed line).ReplayBG is able to replicate with high fidelity the target glucose dynamics and the fluctuations due to the added snack intakes for all the considered case.
Table V shows the overall metrics calculated on the 100 virtual subjects for the five different added snacks considered in scenario 5.In this case, ReplayBG is very accurate in reproducing the target IG traces obtaining a median MARD 5 that ranges from 1.44% to 5.93%, which is below the 10% threshold for all considered insulin bolus amounts, and a median RMSE 5 that ranges from 1.99 mg/dl to 6.00 mg/dl.
Comparing ReplayBG with Net-Effect, results are consistent with the other scenarios.ReplayBG performs better than Net-Effect by obtaining lower MARD and RMSE.Similarly to scenario 4, the larger the insulin bolus amount the larger the performance gap.As indicated in Fig. 2(e), this improvement is statistically significant for all considered insulin bolus amounts except for 10/CF.Of note, while ReplayBG MARD is below the 10% threshold for all the insulin bolus amounts, Net-Effect MARD exceed this limit when 50/CF U is considered.

IV. DEMONSTRATION OF REPLAYBG USE WITH REAL DATA
To demonstrate the use of the proposed methodology with real data, we considered two representative applications of ReplayBG using data of a representative patient with T1D.Specifically, the data of this patient were collected during a multicentre clinical trial conducted by the Juvenile Diabetes Research Foundation (JDRF) for the Artificial Pancreas Project Consortium [26], [27].The study aimed to evaluate the efficacy of the platform "Diabetes Assistant" (DiAs) [28] on glucose control at home in overnight-only and 24/7 closed-loop control (CLC) modes, compared to the baseline sensor augmented pump (SAP).The patients used the Accu-Check insulin pump (Roche Diagnostics, Mannheim, Germany), the Dexcom G4 Platinum CGM sensor (Dexcom, San Diego, CA, USA), and the DiAs smartphone-based medical platform.Participants underwent a 2-week period at home using the DiAs, the study pump, and the CGM in SAP mode, followed by 2 weeks of overnight-only CLC system use and 2 weeks of 24 h per day and 7 days per week (24/7) CLC system use.From the first two weeks of data (i.e., those collected in open loop mode), we extracted two sixhour-long portion of data containing a meal and the consequent postprandial period, but not containing any corrective actions, neither assumptions of rescue carbohydrates nor injection of corrective insulin boluses.
As additional selection criterion, we extracted only windows that do not present neither insulin bolus administrations, nor carbohydrate intakes, in the 4 hours preceding their starting point.As remarked in II-A2, this is crucial to avoid that an advice resulting from the application of our methodology would be biased by the starting point of the selected window, since, before that point, events that are close to it might affect the identification procedure.Specifically to this point, the aim being analyzing whether our data selection criteria would lead to a very minute subset of all data generated by a patient with T1D, we studied how many postprandial windows in the dataset at hand qualify to be used by ReplayBG.To do so, first we counted how many breakfasts, lunches, and dinners have been logged by the patients overall, then we computed, in terms of percentages, of many of these meals would have been potentially used to run assessments with ReplayBG.We found that patients logged a total of 135 breakfasts, 169 lunches, and 169 dinners (totaling 473 traces).Applying our selection criteria, we were able to extract 352 single meal traces which corresponds to roughly 74% worth of data.We acknowledge that we are losing some potential traces, however we also believe that this percentage is high enough to run ISCTs to preliminary validate new decision support methodologies using our tool on the dataset at hand.Furthermore, we also analyzed if the selection criteria leads to over representing specific time of day.We saw that the potential 352 traces were composed of 119 breakfasts, 112 lunches, and 121 dinners.As such, we can safely state that, at least for this dataset, we are not over/under representing specific time of the day.
The two selected CGM traces, reported in Figs. 4 and 5 with a red line, have been selected because the former presents an hyperglycemic event, suggesting that the insulin bolus has been probably underestimated, whereas the latter presents an hypoglycemic event, calling for an hypotreatment.These are two cases in which we would like to test if alternative treatments would have been able to improve glucose control.Here, we apply ReplayBG methodology to test the potential benefit that could come from the adoption of more advanced treatment solutions.As case study, to deal with the hyperglycemic event, we tested whether the insulin dose provided by the new formula for its calculation that we recently developed [29] could have been more effective than the one used by the patient.On the other hand, to deal with the hypoglycemic episode, we are Fig. 4. Portion of real data extracted from [26], [27] characterized by a postprandial hyperglycemic event.In red, the original CGM data.In dotted black, the ÎG(t) 50 profile obtained from ReplayBG using the original input data, i.e., the fitted median glucose profile; in shaded black the respective ÎG(t) 25 and ÎG(t) 75 profiles.In dotted blue, the ÎG(t) 50 profile obtained from ReplayBG by substituting the original meal insulin bolus with the one computed by [29]; in shaded blue the respective ÎG(t) 25 and ÎG(t) 75 profiles.Fig. 5. Portion of real data extracted from [26], [27] characterized by a postprandial hypoglycemic event.In red, the original CGM data.In dotted black, the ÎG(t) 50 profile obtained from ReplayBG using the original input data, i.e., the fitted median glucose profile; in shaded black the respective ÎG(t) 25 and ÎG(t) 75 profiles.In dotted blue, the ÎG(t) 50 profile obtained from ReplayBG by adding rescue carbohydrate intakes as suggested by [30]; in shaded blue the respective ÎG(t) 25 and ÎG(t) 75 profiles.
interested in understanding if the algorithm designed by Camerlingo et al. [30] could have avoided the event by generating a preventive hypotreatment.
Figs. 4 and 5 show the simulation results obtained from the application of [29] and [30], respectively, on the two considered portions of data.By comparing the black line, which represents the median IG(t) profile, i.e., ÎG(t) 50 obtained from the simulation step of ReplayBG, and CGM data (red), we can first observe that, in both cases, our methodology is able to describe the glucose-insulin system and and fit the CGM traces.
Focusing on the portion of data with hyperglycemia (Fig. 4), replaying the same scenario by substituting the original bolus with the one calculated by the algorithm of Noaro et al. [29], ReplayBG suggests that it would have been possible to prevent, on average, hyperglycemia without inducing any hypoglycemia.In particular, the algorithm increases the original insulin dose from 0.10 U to 1.88 U.The same qualitative results are observed in Fig. 5, when replaying the same scenario with the addition of the preventive hypotreatment suggested by the methodology of Camerlingo et al. [30], ReplayBG allows to retrospectively prove that, on average, it would have been possible to prevent the adverse event thanks to the assumption of one rescue carbohydrate intake of 20 g (assumed 45 minutes in advance with respect to the start of hypoglycemia, at 17:00).
It is interesting to analyse also the confidence intervals ÎG 25−75 (t) observed in the two scenarios (reported as blue shaded areas).It is clear that in this case the simulated glucose profile obtained in the first scenario is much more uncertain than the one achieved using the algorithm of Camerlingo et al.Such a difference is a direct implication of the uncertainty around the estimated model parameters obtained from step 1 of ReplayBG.In fact, as suggested by the intuition, the wider the posterior distributions of model parameters, the more the 1000 realizations of model parameters, θr , r = 1, . . ., 1000, are far apart, the more each of the resulting 1000 simulated glucose profiles is different, leading to a wider confidence interval.On the contrary, as in the case of the second case of study, the narrower the model parameter posterior, the closer the values of θr , r = 1, . . ., 1000, the less the resulting 1000 simulated ÎG(t) are different, leading to a narrower ÎG 25−75 (t).

V. DISCUSSION AND CONCLUSIONS
In this work, we proposed and evaluated ReplayBG, a simulation methodology for the in silico assessment of new or alternative treatments for people with T1D.By leveraging already collected data of insulin and carbohydrate intake and CGM, and a nonlinear physiological model of glucose-insulin dynamics identified at the individual level through a Bayesian parameter estimation procedure, ReplayBG provides an indication of what would have been possible to obtain, in terms of (interstitial) glucose time-course, using different carbohydrates intakes and/or insulin regimens.In the evaluation reported in the present article, ReplayBG reconstructed with high accuracy the IG concentration profile resulting from insulin bolus modulations (scenario 1), and snack/bolus additions (scenarios 4 and 5), for all conditions considered (with extreme cases included).ReplayBG also proved to be a reliable simulation methodology to test the modulation of the amount of meal intake (scenario 2) from −60% to +100%, and the modulation of the basal insulin infusion rate (scenario 3) from −40% to +20%.On the other hand, for greater alterations in scenarios 2 and 3, the performance of ReplayBG degraded resulting in a median MARD above 10%.However, the situations in which ReplayBG does not perform satisfactorily, i.e., large alterations of the basal insulin infusion rate and complete removal of meal carbohydrate amounts, are Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
extreme cases that are unlikely to be applied or tested in common real-life applications.
ReplayBG has also been compared with the state-of-the-art approach proposed by Hughes et al. [10], which, as ReplayBG, uses data already collected in people with T1D to predict the glucose time-course resulting from the adoption of alternative carbohydrate/insulin therapies.Results demonstrated that ReplayBG is able to better estimate the target IG trace than Net-Effect in all five considered scenarios, making it a viable and robust tool to run such analyses.Anyway, an important remark must be made.From the computational burden point-of-view, the Net-Effect is faster than ReplayBG, since it employs a linear model and a simpler identification procedure.For this reason, depending on the specific setup, in which the potential researcher needs to adopt a methodology like ReplayBG or Net-Effect, the user has to pay attention to this aspect, privileging Net-Effect when the computational resources are limited.Furthermore, we would like to acknowledge that it is possible that, in other more difficult and challenging scenarios (here not evaluated), as those encountered in real data, Net-Effect could achieve comparable or better performance than ReplayBG.Following this direction, and for the sake of fairness, our research group will focus on this peculiar aspect during future dedicated and diligent work.
A limitation of this work could relate to the validation procedure.Indeed, we acknowledge that it would have been preferable to compare the glucose predictions made by ReplayBG with real data rather than vs the simulated outcome of a more complex, but reliable simulator, i.e., T1DS.However, as discussed in Section I, the assessment solution here proposed is the only one that allows obtaining the glycemic ground truth resulting from an alteration of the original treatment regime, since it enables one to run multiple ISCTs maintaining the same surrounding conditions, which is impossible to achieve in the "real" setup.
Furthermore, the validation procedure may suffer from the so-called "model-matching" bias since the model we used to generate the synthetic data [12], thus the ground truth, is, in some aspects, similar to the model to be identified.However, two main points must be taken into account.First, the model used in ReplayBG is composed of three main subsystems, i.e., (1), (2), and (4), and each of which differs from the one used in the UVa/Padova simulator [12] in a certain way.Starting from the closest, the subcutaneous insulin absorption subsystem neglects the direct flux between the non-monomeric insulin state compartment (I sc1 ) and the plasma insulin compartment (I p ) modeled in [12].The oral glucose absorption subsystem describes glucose transit through the stomach and upper small intestine as a simple linear chain of three compartments (Q sto1 , Q sto2 , and Q gut ) unlike the model of [12], which has a nonlinear structure to enable the description of both the biphasic nature of gastric emptying and the nonlinear dependence on gastric emptying of liquids on the size of the meal, its energy density, and the amount of nutrient in the stomach.Finally, the core glucose-insulin kinetics subsystem is a minimal model that does not account for the many different factors that, on the other hand, are modeled in [12].In turn, these simplifications plus other unmodeled phenomena (e.g., glucagon subsystem) make the model of ReplayBG still distinct, by many aspects, from the model used to generate the target ground-truth, mitigating the model-matching bias.Second, as already stressed, when validating methodologies such as ReplayBG, where it is necessary to evaluate if the approach under assessment can correctly reproduce the impact of certain input alterations (carbohydrate and insulin) on a given output (in our case, glucose) of a baseline scenario maintaining the same surrounding conditions, this bias is not totally avoidable, since data must be generated via a simulation tool.In fact, this is a common problem faced by other literature techniques as Patek et al. [8] and Hughes et al. [10], where they used the same validation procedure deployed in this work and the same maximal physiological model to generate the synthetic data.It is also important to note that the potential user of ReplayBG must be careful when choosing the portion of data to replay.As other similar methodologies for the scope, e.g., [8] and [10], ReplayBG formulates glucose predictions whose reliability depends on the quality of the data, e.g., reported meal/insulin inputs, whose correctness is key to model identification.In fact, incorrect model inputs, such as incorrectly entered meal carbohydrate amounts, inevitably affect first the model parameter estimates and, secondly, the glucose output predictions when different altered inputs are used to "replay" the scenario at hand.To mitigate this issue, future work will explore the possibility of adding, as a preliminary step to model identification, the application of ad-hoc dedicated modules to detect and correct potential data errors and faults, e.g., the approaches proposed by Meneghetti et al. [31], [32].Future work will also deal with the possibility of releasing the assumption that all model states start from steady state.This will be possible by modifying the identification scheme of ReplayBG to also include the estimation of the corresponding initial conditions.
It is worth mentioning that the proposed ReplayBG methodology can be potentially employed for a multitude of other purposes, such as refining and tuning state-of-the-art decision support algorithms and leveraging it as the methodological core of new educational tools for T1D.To expand the domain of applicability of ReplayBG, future work will concern the refinement of the methodology to better describe glucose dynamics by expanding the physiological model of ReplayBG to include a subsystem describing physical exercise [33], [34].Furthermore, the methodology will be validated in more challenging multiple meal/multiple day scenarios.Indeed, the ReplayBG methodology, as it is presented and validated in this manuscript, limits its domain of applicability to single-meal scenarios, where it is possible to assume that the underneath system is stationary.In the direction of releasing this assumption, current undergoing work in our laboratory is focusing on expanding the Re-playBG methodology to multiple-meal scenarios by introducing dedicated model parameters able to capture, for example, the intra-/inter-day insulin sensitivity variability one can observe in real subjects [35], multiple mixed meals with different content, stress, and natural physiological fluctuations that will impact glucose dynamics in ways that are not included in the proposed model.Finally, as previously done in [29], ReplayBG will be deployed within the common pipeline used to validate newly developed algorithms for T1D treatment.In these terms, we believe that ReplayBG could be a useful tool to help researchers in the field to test their methodologies, for example, new insulin bolus calculators, hypotreatment strategies, and, in general, decision support systems, for people with T1D.For this reason, Re-playBG is freely available to the community as open source software on GitHub (https://github.com/gcappon/replay-bg).This will hopefully allow leveraging the potential of the developer community to maintain and improve ReplayBG over time.Of note, it is worth underlining that the proposed approach provides just an indication of the performance of the treatment under assessment and it is not intended to substitute at any level its validation via formal clinical trials.

Fig. 1 .
Fig. 1.Overview of the two steps of ReplayBG.Step 1: model parameters are identified from CHO, insulin and CGM data already collected by the individual.Step 2: the model parameters identified in Step 1 are given in input to model (6) to simulate the (interstitial) glucose concentration that should have been obtained, in the same individual and in the same time window, by adopting an alternative insulin and/or CHO therapy.

Fig. 2 .
Fig. 2. Target simulated IG profiles (continuous lines) compared against the respective IG traces obtained from step 2 of ReplayBG (dashed lines) when the original meal and insulin input data are altered as defined in scenario 1 (a), scenario 2 (b), scenario 3 (c), scenario 4 (d), and scenario 5 (e) in a representative subject.Legends indicate the input alteration applied in the specific scenario corresponding to the respective IG profile.

Fig. 3 .
Fig. 3. MARD boxplot distributions obtained using ReplayBG (in red) and the Net-Effect methodology (in blue) when the original meal carbohydrate intake is altered as defined in scenario 1 (a), scenario 2 (b), scenario 3 (c), scenario 4 (d), and scenario 5 (e) in a representative subject.Legends indicate the input alteration applied in the specific scenario corresponding to the respective IG profile.Dashed black line indicates the MARD = 10% threshold.
−1) is the rate constant of gastric emptying; k abs (min −1 ) is the rate constant of intestinal absorption; CHO (mg/kg/min) is the ingested carbohydrate rate.Model (2) allows to estimate the rate of glucose appearance in plasma Ra (mg/kg/min) as:

TABLE I SCENARIO
1: MEDIAN [5TH-95TH PERCENTILE RANGE] OF MARD 1 AND RMSE 1 OBTAINED WITH REPLAYBG AND NET-EFFECT