A Practical Electronic Health Record-Based Dry Weight Supervision Model for Hemodialysis Patients

Objective: Dry Weight (DW) is a typical hemodialysis (HD) prescription for End-Stage Renal Disease (ESRD) patients. However, an accurate DW assessment is difficult due to the complication of body components and individual variations. Our objective is to model a clinically practicable DW estimator. Method: We proposed a time series-based regression method to evaluate the weight fluctuation of HD patients according to Electronic Health Record (EHR). A total of 34 patients with 5100 HD sessions data were selected and partitioned into three groups; in HD-stabilized, HD-intolerant, and near-death. Each group’s most recent 150 HD sessions data were adopted to evaluate the proposed model. Results: Within a 0.5 kg absolute error margin, our model achieved 95.44%, 91.95%, and 83.12% post-dialysis weight prediction accuracies for the HD-stabilized, HD-intolerant, and near-death groups, respectively. Within a 1%relative error margin, the proposed method achieved 97.99%, 95.36%, and 66.38% accuracies. For HD-stabilized patients, the Mean Absolute Error (MAE) of the proposed method was 0.17 kg ± 0.04 kg. In the model comparison experiment, the performance test showed that the quality of the proposed model was superior to those of the state-of-the-art models. Conclusion: The outcome of this research indicates that the proposed model could potentially automate the clinical weight management for HD patients. Clinical Impact: This work can aid physicians to monitor and estimate DW. It can also be a health risk indicator for HD patients.


I. INTRODUCTION
Chronic Kidney Disease (CKD) characterized by the gradual loss of kidney function is commonly recognized as one of the most severe global public health problems. CKD affected more than 753 million people and caused 1.2 million deaths in the year of 2016 [1]. The disease is the 18-th leading cause of death in globally [2]. The end-stage of CKD is called End-Stage Renal Disease (ESRD), and patients with ESRD must take Renal Replacement Therapy (RRT) including dialysis and transplantation. As of 2010, the number of individuals receiving RRT worldwide was reported as 2.6 million, with over two-thirds of them relying on Hemodialysis (HD) treatment. By the year of 2030, the number is expected to double roughly [3]. Although developing countries provide the most market for HD consumptions, the medical conditions and quality of HD there are limited. The availability of experienced clinicians and well-conditioned HD centers [4] can hardly meet patients' demands. The urge VOLUME 7, 2019 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/ to expand high-quality HD services has been brought to attention.
The primary objective of HD is to remove the free water that is excreted by healthy kidneys in the form of urine from the blood via Ultrafiltration (UF). To determine the UF volume for HD session, clinicians must evaluate the patients' fluid statuses with the prescription of Dry Weight (DW), which is defined as patients' lowest tolerated post-dialysis weight at which the patients have the minimal signs or symptoms of hypovolemia or hypervolemia [5]. Overestimation or underestimation of DW causes harmful consequences including cardiovascular disease, malnutrition, and increased rate of hospitalization and mortality [6]. Though many methods are proposed to approximate DW [7]- [10], none of them can be adopted as a golden standard or has shown better survival benefits [11]. The difficulty in estimating DW stems from variations and uncertainties of the human system, such as nutrition status, underlying illness, commodities, etc. Thus, the trial-and-error method is still widely applied in clinical situations. Experienced clinicians still estimate DW with the closest accuracy to the truth, but it requires an incredible amount of time and effort. We aim to develop personalized DW estimation models and implement automatic DW evaluation applications to supervise HD patients' weight adjustment behaviors. The proposed models could enhance the quality of HD and help to fulfill the enormous demands of highquality dialysis services. The following segment contains a brief overview of the relevant works.

A. RELATED WORK
Time series modeling aims to establish the connections between observed data and future events. It fits excellently with demands on medical applications where patient records often correlated with time and observatory markers. Many studies have focused on time series model developments [12], [13]. We summarized the conventional methods into two categories: (1) statistics-based techniques -works under this class usually elaborate models with problem-dependent assumptions, such as linearity, periodicity, data distributions, the order of the model, and more. Autoregressive Integrated Moving Average (ARIMA) is a representative model in a stochastic time series analysis [13]. According to different applications, it derives various subclass models, such as Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and Seasonal ARIMA (SARIMA) [14]- [16]. Though there are limitations for statistics-based methods, they have still proven capable of achieving remarkable performances with specific problem.
(2) Machine learning based methods -works in this category have high feasibility to solve real-world problems with the characteristics of non-linearity, minimal knowledge of priori distributions, high dimensional variables. Artificial Neural Network (ANN) [17] is a a typical scheme that can approach the complex system with rigorous precision. There are also many variations of ANN-based methodologies, such as Long Short Term Memory (LSTM) network [18], Time Lagged Neural Network [19], and other implementations [20]. We also noticed that Random Forest (RF) [21], Support Vector Machines (SVM) [22], and Bayesian Networks (BN) [23] are wildly applied in literature because of their steady performance. Worth mentioning is that machine learning schemes face the difficulties in parameter optimizations, over-fitting functions, and model interpretability. Though many techniques have been proposed to provide solutions to these problems, some questions remained to be solved.
Studies exist on the use of machine learning assisted HD applications. For example, in an HD anemia treatment, the ANN method [24] was utilized to determine hemoglobin levels in HD patients, and [25] a reinforcement learning method to optimize the dose of erythropoiesis-stimulating agents has also been proposed. The RF model [26] has been applied to predict HD patients' cardiovascular risk, and a decision tree method [27] to detect early Arteriovenous Fistula (AVF) Failure has been adopted. In regards to HD quality control, a Temporal Abstractions (TA) method [28] to monitor the quality of HD process has been proposed, and an applied Bayesian network [29] to recognize patient temporalstate transition patterns and detect the exception events is in place. A study [30] proposed a Bioimpedance analysis (BIA) based on a multiple variable regression model to predict DW, and the accuracy is controlled within 0.5 kg with a standard deviation of 2 kg. Research [31] has applied the Multi-Layer Perceptron (MLP) neural network to predict DW using the inputs of patients' BIA and blood volume monitoring data, with findings revealing a 0.5 kg outcome with a standard deviation of 1.3 kg. Although studies [30], [31] have reported significant DW modeling progress, the model remains dependent on crowd data, which may lead to data bias. Moreover, we propose that DW is a dynamic value, and a personalized training model is necessary for achieving the better precision. The studies listed here illustrate the remarkable potential of the application of machine learning methodology in HD.
The rest of the paper is organized as follows: Section II explains the proposed methodology, Section III shows the experimental results and comparisons with other methods, and Section IV is a detailed discussion based on the observed results. The final part is the conclusion.

A. METHOD OVERVIEW
Theoretically, precise DW can hardly be obtained. We treated the trial-and-error post-dialysis body weight as the domain knowledge to approximate DW value. It is worth mentioning that all the patients enrolled were received an extended period of care from medical experts, and all DW-assessed Electronic Health Record (EHR) data are from the HD center of the Huashan Hospital. The physiological variables are stored by automatic transmission from the dialysis machine or hospital information system, which makes the research data highly authentic and complete. We retrieved the HD sessions data of each HD patient and trained a customized post-dialysis weight estimation model. To explore models' dependencies on life conditions, we partitioned patients into three groups, namely, HD-stabilized, HD-intolerant and near-death. The clinical data were collected from the devision of Nephrology, the Huashan Hospital. This research was approved by the Ethics Committee at the Huashan hospital and written informed consent was collected from all study participants.

B. PARTICIPANTS
A total of 34 patients with 150 HD entries of EHR data for each were selected for the research. Participants were divided into three groups: (1) HD-stabilized group -participants in good health conditions without any sign of hypovolemia and hypervolemia or underlying illness; (2) HD-intolerant group -participants showing typical symptoms of hypotension during an HD session indicating HD-intolerant status; (3) Near-death group -participants were dead one week after the last recorded data were collected. The statistical characteristics for enrolled patients are shown in Table 1.
This study aimed to establish a DW supervision system to monitor the weight adjustment behavior of HD-stabilized patients. It is a necessary measure to setup control groups to observe evidence of inaccurate predictions that may refer to exceptional medical events. Therefore, HD-intolerant and near-death groups were put in place. Details on the results from different groups are discussed in later sections.

C. EHR DATA FORMAT
All data were auto-collected from the dialysis machine and hospital information system. Each HD session contains the following data entries: (1) Patient ID, which is a unique code for patient identity; (2) Date of HD; (3) Pre-dialysis and post-dialysis Blood Pressure (BP); (4) Pre-dialysis and postdialysis body weight. (5) UF volume, which is the amount of fluid removed from HD patients. In this study, we excluded the BP data since they are open variables results from factors, such as patient mind status, medications, food, and others.

D. MISSING DATA TREATMENT
We found the original EHR data missing a few weight values. The data frame was discarded if both pre-dialysis and post-dialysis weights were missing. Otherwise, the missing weights were estimated from the equation (1).
where w pre is pre-dialysis weight, w post is post-dialysis weight, u is the volume of UF (mL), and δ is the average weight draft which is defined by equation (2).
Though combining the mean UF with its bias and pre-/post-dialysis weight can approximate the missing weight values. Individually, the post-dialysis weight is not simply a case of pre-dialysis weight minus UF volume (L). Figure 2 scatters the dialysis weight difference and UF volume (L). In most cases, the UF volume (converted to weight) is higher than the weight difference because of the supplementary normal saline at the end-stage of dialysis section. The implantation dose of normal saline depends on the clinical status of a patient. In a few cases the UF volume is less than the weight difference because of the excretion during the dialysis. Very few patients had residual kidney functions allowing them to urinate, causing further weight loss. Considering the above scenarios where the UF volume may indicate the dry weight status, we included the UF volume as a variable.

E. MODEL FEATURES 1) HISTORIC WEIGHT
The feature domain F is defined by equation (3).
where W k = {w 1 , w 2 , ..., w k }, U k = {u 1 , u 2 , ..., u k }, k is the most recently k sessions measurements of weight and UF, W pre 0 is the current pre-dialysis weight.

2) OUTLIER REMOVAL
We filtered the outlier data via the differences in pre-and post-dialysis weight values to obtain a robust mathematical model. The training set excluded the largest and the smallest weight differences in data entries with a rate of 10% for each side.

F. OBJECTIVE 1) EXPONENTIAL AVERAGE DIFFERENCE
From clinical observations, the post-dialysis weight is timesensitive. We used the Exponential Moving Average (EMA), which is defined by equation (4), to model the time decay effect of input features. where f i is the i-th input of the variable f , α i is a timeattenuated weight factor, as shown in equation (5).
The model objective, namely the EMD of post-dialysis weight is defined by equation (6).
EMD exhibits good stationary properties whose statistical information is relatively constant over time. Thus, instead of directly connecting data features to w post , building a model dependent on EMD simplifies the future weight forecasting by utilizing system regularity. Figure 4(a) illustrates the difference between w post , EMA(W post ), and EMD(W post ) data.

G. MODELING METHOD
In this study, the EMD value is approximated by equation (7).
where EMD base describes the average data tendency, and EMD is the local correction factor. The detail implementations are discussed as the follows.

1) DATA NORMALIZATION
All input features are scaled to zero mean and unit variance through equation (8).
where f i is the i-th element of input feature f , Mean() is the mean value function, and STD() is the standard variance function.

2) LINEAR REGRESSION
We used the naive linear regression method to model the EMD baseline (EMD base ). The formula is defined by equation (9).
where λ represents the linear regression coefficients that are solved using the least square method. An illustration of an EMD base modeling is shown in figure 4(b).

3) GAUSSIAN KERNEL-BASED CALIBRATION
The nonlinearity of the model draft is modeled with the local weighted regression method which is defined by equation (10).
where f i is weighted training data based on the current input sample X and gaussian kernel method which is defined by equation (11) [32].
where τ is a constant parameter. The weighted regression method can approximate the local fluctuation efficiently by biasing the Eulerian nearest training points from the history data repository. Gaussian kernel-based local regressions are illustrated in figure 4(c)-(d).

4) MODELING FLOW
The modeling flow is summarized in figure 1. We established the personalized post-dialysis weight model via four steps: (i) Extracting patients' EHR data from the dialysis information system. The data of interest is defined by equation (3); (ii) Performing the data pre-processing procedure to remove outliers and fill missing values; (iii) Mixing the EMD regression tendency (EMD base ) and local regression (EMD ) results to approach the EMD value; (iv) Post-dialysis weight VOLUME 7, 2019 was calculated using equation (6), where EMA(W post ) is a known variable.

H. METHOD IMPLEMENTATIONS
The ARIMA method is implemented with statsmodels [33] package, the RF method is implemented with Scikitlearn [34] package, and the LSTM is implemented with Ten-sorFlow [35] and Keras [36] packages. Figure 3 illustrates k as being insensitive to the model when smaller than 5.
On the other hand, the model accuracy degrades as k increases because larger k values bring up more features that make the model harder to fit. We set the parameter k in equation (3) to 5 (16 input features) for the following reasons: (1) DW is an interval value. Larger k values may give more reliable results.
(2) Setting k at 5 in clinical practice can achieve a reasonable observation time window that is equivalent to 2 weeks follow-up of the patients who visit the HD center 2-3 times a week. The ARIMA model is fitted with the parameters with a lag order of 5 for autoregression, difference order is 1 for stationary data enhancement, and moving average order is set to 0. RF is set up with a maximum depth of 2 for each estimator, and the total number of estimators is 200. LSTM is constructed with one hidden layer and 16 units, and the time step is 1. The same training and testing data are fed to all methods.

I. PERFORMANCE MATRICES
The model quality is evaluated via multiple measures: the Mean Absolute Error (MAE) in equation (12) that emphasizes the overall error in magnitude, the Mean Absolute Percentage Error (MAPE) in equation (13) that addresses the relative mismatch of predictions, the Mean Squared Error (MSE) in equation (14) that weights more on large deviations and the coefficient of determination r 2 that measures the fitness of regression in equation 15.
where EMD is the mean value of EMD testing samples defined in equation (16), and n is the number of testing samples. A perfect predictive model occurs if MAE = 0, MAPE = 0, MSE = 0, and r 2 = 1, otherwise, the larger the deviation, the worse the model is.

III. EXPERIMENTAL RESULTS
The proposed method was running on a 1.1 GHz Intel core M CPU with 8G 1600 MHz DDR3 memory. The average runtime of the proposed model (training and predicting for one patient) is 0.25s.

A. ERROR DISTRIBUTIONS
We evaluated the accuracy of the proposed method with the absolute and relative error matrices, where the absolute error is a non-negative kilogram weight difference between patients' actual post-dialysis weights. The relative error is the absolute error that divides the patients' post-dialysis weights. As a rule of thumb, the absolute weight error would be kept within 0.5 kg, and the relative error would be kept within 1% by the clinical staff to gain the best treatment for HD patients. We fitted the gamma error distributions in figure 5. The proposed model showed smaller variations and average errors when comparing HD-stabilized patients with patients from the other two groups. This result indicated that HD-intolerant and near-death patients' recent historical weight data might consist of inner chaos that is challenging to model the regularity. In this sense, larger prediction errors were generated. Overall, the value of the estimated Cumulative Distribution Function (CDF) of the gamma distribution with a ≤ 0.

B. MODEL COMPARISONS
The mean, standard deviation, and 95% confidence interval for the models discussed in previous sections are summarized in Table 2. The experimental results showed that the proposed model outperformed other methods in terms of mean results regarding all patient groups. The mean of MAE for the HD-stabilized group of the proposed method was 0.17 kg, which is equivalent to 37%, 32%, and 26% improvements compared with ARIMA, RF, and LSTM models, respectively. The MAPE and MSE matrices also confirmed that our proposed method had the best error margin. The r 2 measurement indicates that the model proposed is the most suitable, as it was 66%, 42%, and 19% better than the ARIMA, RF, and LSTM models, respectively, in HD-stabilized group.
The p-value corresponding to the F-statistic of one-way ANOVA [38] in Table 3 suggested that the proposed method was significantly better than one or more methods regarding HD-stabilized group. We then used T-statistic of Bonferroni-Holm [39] in Table 4 to isolate the significance with other methods. The p-value showed that the proposed method was significantly better than ARIMA and RF methods (HD-stabilized group, all metrics). The proposed method showed significantly improved in MAPE (p<0.05) and on the edge of the significance in MAE (p=0.053), MSE (p=0.064), r 2 (p=0.077) when comparing with LSTM for HD-stabilized group. Figure 6 is an intuitive illustration of the comparisons of different algorithms from one HD-stabilized patient.

IV. DISCUSSION
The prediction accuracy degraded with worsening patient conditions. The near-death group manifested a substantial accuracy deviation compared with the HD-stabilized group, indicating that the machine learning and statistical-based methods can potentially follow the regularity of the human system. When exceptional medical events happen, the chaos weight adjustment behavior may damage the stability of the model training, which could lead to more significant deviations. From this point of view, the proposed model can behave like a marker to track HD patients' regularity and exceptional medical events. Naturally, patients with stable life signs may exhibit a better regularity than unstable patients. Since most of the HDs are undertaken at HD centers with nursing staff, it potentially lacks expert knowledge to timely tune or to rise awareness of the risk factors, such as subtle changes in patients' conditions. The proposed method in this research can supervise and record the weight adjustment behaviors efficiently, such that any operation that drafts off the prediction to a certain threshold will produce a medical warning to alert patients and doctors, and help to buildup personalized HD plans. As aforementioned, the weight prediction model is from the personalized data record, and it avoids the crowd bias of population data and matches the concept of personalized medical care.

V. CONCLUSION
This paper proposes a scheme that could process EHR data and establish a DW estimation and supervision model for HD patients. As far as our knowledge goes, this is the first work to have developed a personalized model based on mature DW assessment dataset to monitor DW adjustment behavior. We kept the absolute error under 0.5 kg, and relative error under 1% for most HD-stabilized cases included. This accuracy suggests that this model has great potential for utilization in dialysis centers monitoring the quality of HD automatically. We also observed a more substantial predictive deviations in the HD-intolerant and near-death groups, which indicates that the proposed model should be revised by exceptional medical events to fit situations of clinical instability. The proposed method can be consolidated by further clinical verifications.