Electrocardiographic Machine Learning to Predict Left Ventricular Diastolic Dysfunction in Asian Young Male Adults

Left ventricular diastolic dysfunction (LVDD) occurs at the initial stage of heart failure. Electrocardiographic (ECG) criteria and machine learning for ECG features have been applied to predict LVDD in middle- and old-aged individuals. The purpose of this study is to clarify the performance of machine learning in young adults. Three machine learning classifiers, namely random forest (RF), support vector machine (SVM) and gradient boosting decision tree (GBDT), taking 26 ECG features as input with or without 6 other biological features (age, anthropometrics and blood pressures), are compared with the corrected QT (QTc) interval, a traditional ECG criterion for LVDD. LVDD is defined by meeting any one of the following echocardiographic criteria: (1) an E/A ratio of mitral inflow < 0.8; (2) a lateral mitral annulus velocity e’ < 10 cm/s; or (3) an E/e’ ratio > 14. The best areas under the receiver operating characteristic curve were observed for the RF with ECG features only (84.1%) and for the SVM with all ECG and biological features (82.1%), both of which were superior to the QTc interval (64.6%). When the specificity is set to approximately 75.0%, the sensitivity of the RF with ECG features only reaches 81.0% and that of the SVM with all ECG and biological features reaches 85.7%, both of which are higher than the 47.6% of the QTc interval. This study suggests that using machine learning for ECG features only or with other biological features to predict LVDD in young Asian adults is reliable. The proposed methods enable early detection of LVDD in young adults and may help in taking preventive action against heart failure.


I. INTRODUCTION
Left ventricular diastolic dysfunction (LVDD) occurs at the early stage of heart failure [1]. In the initial phase of cardiac remodeling, the left ventricle becomes less compliant and stiffer, impairing diastolic or relaxation function and thus elevating left ventricular end-diastolic and pulmonary wedge pressure during stress or rigorous exercise [2]. The presence of LVDD has been associated with an increased risk of morbidity and mortality in those with preserved left ventricular ejection fraction [3], [4]. Therefore, it is important to screen for LVDD early in the general population. Echocardiography is a classic examination used for diagnosing LVDD in symptomatic individuals [3]. LVDD usually presents at midlife, and the prevalence is estimated to be 11%-35% [4]-[6]. Several cardiovascular risk factors that commonly develop at midlife, such as obesity, arterial hypertension, diabetes, dyslipidemia and coronary heart disease, have been associated with LVDD [7]-[9]. However, the prevalence of LVDD among young adults is low, estimated at 1.1% with severe diastolic dysfunction and 9.3% with abnormal relaxation in the Coronary Artery Risk Development in Young Adults (CARDIA) study [10]. The CARDIA study revealed that left ventricular diastolic filling in young adults was related to age, sex, body weight, left ventricular systolic function, heart rate, blood pressure, lung function and physical fitness but not to electrocardiographic (ECG) left ventricular mass index [10], [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Kin Fong Lei.
In middle- and old-aged individuals, prolongation of the corrected QT (QTc) interval (≥ 435 ms by Bazett's formula [12]), which represents the diastolic phase of electrical repolarization and left ventricular relaxation, has been identified as the most powerful ECG parameter, providing good performance in predicting LVDD (area under the receiver operating characteristic (ROC) curve (AUC): 82%, sensitivity: 73% and specificity: 74%) [13]. In one study of individuals whose ages were mainly between 40 and 55 years [14], a cutoff point of QTc = 395 ms was selected to screen for LVDD, and the sensitivity and specificity were 81% and 79%, respectively. However, that study was limited by the inclusion of only 140 subjects referred for a treadmill exercise test, and a prolonged QTc may have reflected the presence of microvascular angina rather than LVDD alone [15]. Whether the conventional ECG criteria can predict LVDD in young adults remains unknown, as no studies have focused exclusively on healthy young individuals.
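For reference, Bazett's formula divides the measured QT interval by the square root of the RR interval expressed in seconds. The minimal Python sketch below illustrates the correction together with the ≥ 435 ms cutoff quoted above. The function names and the example values are illustrative assumptions and are not taken from the cited studies.

```python
import math

def bazett_qtc(qt_ms: float, heart_rate_bpm: float) -> float:
    """Corrected QT interval (ms) by Bazett's formula: QTc = QT / sqrt(RR)."""
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    rr_seconds = 60.0 / heart_rate_bpm  # RR interval derived from the heart rate
    return qt_ms / math.sqrt(rr_seconds)

def qtc_prolonged(qtc_ms: float, cutoff_ms: float = 435.0) -> bool:
    """True when QTc meets the cutoff (435 ms is the value quoted from [13])."""
    return qtc_ms >= cutoff_ms

# Hypothetical example: QT = 400 ms at 75 beats/min
example_qtc = bazett_qtc(400.0, 75.0)
print(round(example_qtc, 1), qtc_prolonged(example_qtc))
```

At a heart rate of 60 beats/min the RR interval is 1 s, so the corrected and measured QT intervals coincide; faster rates increase the corrected value.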
Machine learning, a category of computational statistics in artificial intelligence, has been used to process large medical datasets to predict the presence of disease [16]-[30]. For instance, [16] used a support vector machine (SVM) classifier for both ECG and anthropometric characteristics to predict left ventricular hypertrophy, and the performance was superior to the conventional ECG criteria. Machine learning techniques have become increasingly popular in practice and could provide reliable and efficient information for clinical physicians to diagnose and manage diseases. Some studies have investigated the performance of machine learning techniques for the features of traditional ECG or signal processed surface ECG with or without other biological characteristics to predict the presence of LVDD, and all were applied to middle- and old-aged individuals [27]-[30]. In general, the AUCs of the ROC were acceptably good, ranging from 83% to 91%. However, [27] found that the overall performance in the subgroups of subjects aged less than 60 years, of male sex and of Asian race/ethnicity was relatively inferior to that in their counterparts, highlighting the need for further study of these specific populations.
This paper compares the performance of various machine learning classifiers for traditional 12-lead surface ECG features with or without other vital biological features in a large sample of young Asian males in Taiwan to predict LVDD, as shown in Fig. 1. The rest of this paper is organized as follows. Section II outlines the materials. Section III presents the applied machine learning classifiers for feature training in detail. Section IV reports the experimental results. Section V offers the conclusion of this study.

II. MATERIALS
This paper utilizes a population of 2,206 military males, aged 17-43 (average 27.99) years, from the ancillary cardiorespiratory fitness and hospitalization events in armed forces heart (CHIEF Heart) study in the Hualien Armed Forces General Hospital, the main referral hospital for military personnel in Eastern Taiwan. All participants underwent a regular annual health examination, including anthropometric measurements, blood tests and a plain chest roentgenogram, in 2016-2020. A conventional 12-lead ECG and transthoracic echocardiography were also performed on the same day to confirm their cardiac health for military rank promotions and awards. The main and ancillary study designs have been described in detail previously [31]-[41].
The 12-lead ECG features are interpreted by Schiller AG CARDIOVIT MS-2015 (Baar, Switzerland) or Philips TC 70 CARDIOGRAPH (Amsterdam, Netherlands) software. Each ECG recording is required to have a smooth baseline and to be visually interpretable; otherwise, the ECG is repeated to ensure good quality. The built-in filters of the ECG machine used to remove signal noise include an artifact filter, an AC filter and a baseline wander filter. The artifact filter is a low-pass filter used to eliminate skeletal muscle artifact. It removes up to 50 µV of signals in the 5 Hz to 150 Hz frequency range, which may affect P waves and the entire QRS-T complex. The AC filter removes the 60 Hz power line interference created by the magnetic fields associated with electrical power interacting with the lead wires. The baseline wander filter suppresses all frequencies below 0.5 Hz to eliminate the slow (typically 0.1-0.2 Hz) upward or downward drift of the ECG baseline during recording. Real-time two-dimensional imaging and Doppler transthoracic echocardiography were acquired and displayed by a Philips IE33 ultrasound system (Amsterdam, Netherlands). Both the ECG and echocardiography reports were reviewed and approved by a certified cardiologist, the lead author Gen-Min Lin. The 26 ECG features selected as the input for automatic training in the supervised machine learning classifiers include heart rate and the following features in lead II: the P wave duration, the PR, QRS and QT intervals, and the axes of the P, QRS and T waves. They also include the voltages of the R wave in all limb leads (I, II, III, aVR, aVL and aVF) and the voltages of both the R and S waves in all precordial leads (V1-V6), where every 10 mm equals a voltage of 1 mV. Six important biological features serve as additional inputs to the machine learning: age, body height, body weight, waist circumference, and systolic and diastolic blood pressures. The output of the machine learning is LVDD.
According to the 2016 updated recommendations from the American Society of Echocardiography and the European Association of Cardiovascular Imaging [42], LVDD was identified if any one of three echocardiographic criteria was met: (1) a mitral inflow conventional Doppler E/A ratio < 0.8; (2) a lateral mitral annulus velocity tissue Doppler e' < 10 cm/s; or (3) an E/e' ratio > 14. The prevalence of LVDD in young male adults was 4.26% (94/2,206), and the biological and ECG profiles of those with and without LVDD are shown in Table 1. The study protocol and design were approved by the Institutional Review Board of Mennonite Christian Hospital (No. 16-05-008) in Hualien City, Taiwan.
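The labeling rule above can be written as a short predicate. The following Python sketch is only an illustration of the three stated criteria; the function and argument names are assumptions, and the last lines simply recompute the reported prevalence from the counts given in the text.

```python
def has_lvdd(e_a_ratio: float, lateral_e_prime_cm_s: float, e_over_e_prime: float) -> bool:
    """Label LVDD when any one of the three criteria stated in the text is met."""
    return (
        e_a_ratio < 0.8                  # (1) mitral inflow E/A ratio < 0.8
        or lateral_e_prime_cm_s < 10.0   # (2) lateral mitral annulus e' < 10 cm/s
        or e_over_e_prime > 14.0         # (3) E/e' ratio > 14
    )

# Prevalence reported in the text: 94 LVDD cases among 2,206 participants
prevalence_percent = 100.0 * 94 / 2206
print(f"{prevalence_percent:.2f}%")
```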

III. PROPOSED METHODS
This paper utilizes three machine learning classifiers including the random forest (RF) [43], support vector machine (SVM) [44] with Gaussian kernel and gradient boosting decision tree (GBDT) [45], for the prediction of LVDD in young Asian males in Taiwan.

A. DATA PREPROCESSING
To address the different dynamic ranges of the input variables, we apply min-max scaling [46], a linear transformation of the original data. The values of each feature are rescaled to the range of 0 to 1.
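As a minimal sketch of this step, the Python code below rescales each feature column with (x - min) / (max - min). The helper names and the sample rows are hypothetical. Computing the ranges on one dataset and reusing them on another (for example, on the testing set) is an assumption of this sketch, since the text does not state how the ranges were obtained.

```python
def fit_min_max(rows):
    """Return per-feature (min, max) pairs computed column-wise."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def apply_min_max(rows, ranges):
    """Map each value to (x - min) / (max - min); constant columns map to 0."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(row, ranges)
        ])
    return scaled

# Hypothetical rows: [heart rate (beats/min), QRS duration (ms)]
train_rows = [[60.0, 90.0], [75.0, 100.0], [90.0, 110.0]]
ranges = fit_min_max(train_rows)
scaled_rows = apply_min_max(train_rows, ranges)
print(scaled_rows)
```

Values that lie outside the fitted range would map outside the 0 to 1 interval, which is worth keeping in mind when the ranges come from a different dataset.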

VOLUME 9, 2021
The data of the 2,206 military males are divided into two sets, one for training and validation and the other for testing, with a ratio of 3:1 as shown in Table 2. The training and validation set is partitioned into three groups of roughly equal size. In each round, one group is treated as the validation set for validating the model, and the remaining two groups are treated as the training set. The cases of LVDD are distributed evenly among these three groups. The sample sizes of the three folds are described in detail in Table 3. As the prevalence of LVDD in young male adults is close to 5%, the imbalance in sample sizes between the non-LVDD and LVDD cases is obvious. For example, in Table 3, for the 1st cross-validation, 1,102 cases are used for training (non-LVDD: 1,049, LVDD: 53) and 552 for validation (non-LVDD: 532, LVDD: 20). To address this imbalance, the number of LVDD cases is increased in preprocessing by applying the synthetic minority oversampling technique (SMOTE) [47]. The principal concept of SMOTE is to create new minority class samples by randomly choosing a nearby minority class neighbor and interpolating between the two samples. The cross-validation process is repeated three times, once with each group as the validation set. The three AUCs of the precision-recall (PR) curves produced from the three folds are averaged into a single performance measure. A better assessment of how well the training generalizes can thus be obtained by using 3-fold cross-validation.
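The interpolation idea behind SMOTE can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in the study: the neighbor count k, the random seed, the function names and the sample points are all assumptions, since the text does not report the SMOTE settings.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base sample (excluding itself)
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbors = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical min-max scaled minority (LVDD) samples with two features
lvdd_samples = [[0.10, 0.80], [0.20, 0.70], [0.15, 0.90], [0.30, 0.75]]
new_samples = smote(lvdd_samples, n_new=6, k=2, seed=42)
print(len(new_samples))
```

Because every synthetic point lies on the segment between two existing minority samples, the new cases stay inside the region already occupied by the LVDD class. In a cross-validation setting, oversampling is usually applied to the training folds only, so that validation data contain no synthetic cases; whether the study did so is not stated in the text.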

B. MACHINE LEARNING MODELS
Random forest (RF) [43] builds multiple decision trees and combines their results for classification to predict the presence of LVDD. The RF constructs decision trees (DTs) with different input features and initial values selected by the bootstrap aggregating (bagging) strategy. The 26 ECG parameters, alone or together with the 6 biological parameters (32 parameters in total), are the inputs for the proposed method. The CART (classification and regression tree) model [48], which minimizes Gini impurity, is adopted for the individual DTs. The final non-LVDD/LVDD prediction result is determined by majority voting over the predictions obtained from the DTs.
The number of decision trees is the optimized hyperparameter in the proposed algorithm.
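The two ingredients named above, the Gini impurity minimized by CART and the majority vote over trees, can be illustrated with a short Python sketch. The function names and the toy labels are assumptions for illustration only (0 denotes non-LVDD and 1 denotes LVDD).

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate binary split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n

def majority_vote(tree_predictions):
    """Class predicted by most trees (0 = non-LVDD, 1 = LVDD)."""
    return Counter(tree_predictions).most_common(1)[0][0]

print(gini_impurity([0, 0, 1, 1]))        # evenly mixed node
print(split_impurity([0, 0, 0], [1, 1]))  # perfectly separating split
print(majority_vote([1, 0, 1, 1, 0]))     # three of five trees vote LVDD
```

A CART tree picks, at each node, the split with the lowest weighted impurity. Some libraries combine trees differently from a hard vote; scikit-learn's random forest, for instance, averages the class probabilities of the trees, so the hard vote here should be read as a conceptual sketch.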
The support vector machine (SVM) [44] with a Gaussian radial basis function (RBF) kernel [49] is utilized for our proposed method. Each data point is treated as a 26- or 32-dimensional vector. The SVM separates the data points with the maximum-margin hyperplane, which maximizes the distance from the hyperplane to the nearest training data point of any class (non-LVDD or LVDD). Since the training data are not linearly separable, the adopted nonlinear SVM model with the RBF kernel maps all of the data points into a higher-dimensional space, with the aim of transforming nonseparable data points into a separable form. The hyperparameter γ controls how far the influence of a single training data point reaches and is optimized in our algorithm. The soft-margin SVM, which allows a wide decision margin and some outliers on the wrong side or inside of the margin, is used in the proposed method. A regularization term constrains the model so that it fits the training data while avoiding overfitting. The 2-norm is adopted in our SVM method. The regularization parameter trades off correct classification of the training data against maximization of the decision margin and is also optimized in the proposed method.
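To make the role of γ concrete, the Python sketch below evaluates the Gaussian RBF kernel exp(-γ‖x - y‖²) for one pair of points at several γ values. The vectors are hypothetical and shortened to three dimensions, and the γ values are arbitrary illustrations, not the optimized values of Table 4.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_dist)

# Two hypothetical min-max scaled feature vectors (shortened to 3 dimensions)
a = [0.2, 0.5, 0.9]
b = [0.4, 0.1, 0.7]

for gamma in (0.1, 1.0, 10.0):
    print(gamma, round(rbf_kernel(a, b, gamma), 4))
```

The kernel value is 1 for identical points and decays toward 0 with distance. A larger γ makes the decay faster, so each training point influences only its close neighborhood and the decision boundary can become more irregular.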
The gradient boosting decision tree (GBDT) [45] constructs multiple additive DTs and is also an ensemble machine learning technique. The GBDT adaptively combines weak learners (binary-split DTs) into a single strong learner by exploiting gradient-based optimization and boosting to yield better performance. Each successive DT is fit to the pseudoresiduals of the preceding cumulative DTs, and the training minimizes the mean squared error in an iterative manner. The maximum number of iterations for training is set to 100. The hyperparameter of our applied GBDT to be optimized is the maximum tree depth of the DTs, which is limited to avoid overfitting.
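The residual-fitting loop can be illustrated with a deliberately small Python sketch that boosts depth-1 trees (stumps) on a single feature under squared error. It is not the GBDT configuration of this study: the single feature, the learning rate of 0.1, the function names and the toy data are all assumptions; only the limit of 100 iterations is taken from the text.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    """Output of one stump for a single feature value."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosting(xs, ys, n_rounds=100, learning_rate=0.1):
    """Additive model: initial mean plus shrunken stumps fit to residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the pseudoresidual is y minus the current prediction
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, predictions

def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

# Hypothetical one-feature data: label 1 (LVDD) is more common at larger x
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
base, stumps, fitted = fit_boosting(xs, ys, n_rounds=100, learning_rate=0.1)
print(round(mse(ys, [base] * len(ys)), 4), round(mse(ys, fitted), 4))
```

Each round lowers the training error a little, which is why the tree depth (and, in general, the number of rounds) needs to be limited to avoid fitting noise in the training data.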
Since the prevalence of LVDD in young adults in our experimental dataset was 4.26%, the data were heavily imbalanced. The AUC of the PR curve may therefore be a better choice than the AUC of the ROC curve for performance evaluation, and SMOTE is a suitable data augmentation technique for imbalanced datasets. For each machine learning classifier, the hyperparameter is chosen by grid search as the value with the largest AUC of the PR curve averaged over the 3-fold cross-validation. The final models are then trained on the whole training and validation set, in which the LVDD group is oversampled by SMOTE to 1,581 cases, which matches the number of non-LVDD cases in that set (1,049 + 532).
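The selection rule described above can be sketched as follows. The Python code estimates the area under the PR curve as the average precision (a step-wise sum of precision times recall increment), which is one common estimator and is an assumption here because the text does not say how the area was computed. The per-fold scores in the second part are purely hypothetical placeholders, not results of this study.

```python
def average_precision(labels, scores):
    """Step-wise area under the PR curve: sum of precision * recall increment."""
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive label is required")
    # Rank cases from the highest to the lowest score (assumes no tied scores)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision = true_positives / rank
            area += precision / n_positive  # recall rises by 1 / n_positive
    return area

def mean_cv_score(fold_scores):
    """Average a metric over the cross-validation folds."""
    return sum(fold_scores) / len(fold_scores)

# Toy ranking: the two LVDD cases (label 1) are ranked 1st and 3rd
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
print(round(toy_ap, 4))

# Hypothetical per-fold PR AUCs for two candidate hyperparameter values
candidates = {50: [0.30, 0.34, 0.29], 100: [0.33, 0.36, 0.30]}
best_value = max(candidates, key=lambda v: mean_cv_score(candidates[v]))
print(best_value)
```

Unlike the ROC curve, the PR curve does not use the true negatives, so a classifier cannot look good merely because the majority non-LVDD class is easy to recognize.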

IV. EXPERIMENTAL RESULTS
The software scikit-learn v0.20.2 with the Python programming language [50] is used for the implementation of the proposed method. The optimized hyperparameters for the three machine learning classifiers and two kinds of input features are tabulated in Table 4. For example, the grid search resolution for the random forest classifier is one tree. The three machine learning classifiers determine the most appropriate test cutoff values based on the criterion of specificity in the range of 70%-80%. The performance in the testing set for the prediction of LVDD was assessed by sensitivity, specificity, accuracy, precision, F1 score, the AUC of the receiver operating characteristic (ROC) curve [51] and the AUC of the PR curve. Eqs. (1)-(4) represent the sensitivity, specificity, accuracy and precision defined by true positive (TP), true negative (TN), false positive (FP) and false negative (FN) results. Eq. (5) denotes the F1 score defined by the precision and recall (sensitivity).
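Eqs. (1)-(5) are not reproduced in this text. Assuming the usual confusion-matrix definitions, which match the description above, they would read as follows; the numbering follows the order in which the metrics are listed.

```latex
\begin{align}
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{1}\\
\text{Specificity} &= \frac{TN}{TN + FP} \tag{2}\\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{3}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{4}\\
F_1 &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}
\end{align}
```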
The AUCs of the ROC and PR curves are shown in Fig. 2. The largest AUCs of the ROC curves were observed for the RF with ECG features only (84.1%) and for the SVM with all ECG and biological features (82.1%), both of which were superior to the QTc interval (64.6%). The largest AUCs of the PR curves are seen in the RF for both kinds of input features. In addition, Table 5 provides a comparison of the performances of the three machine learning classifiers with the QTc interval [13]. When the specificity was set to approximately 75.0%, the sensitivity of the RF for ECG only reached 81.0% and that of the SVM for all ECG and biological features increased to 85.7%, both of which were higher than the 47.6% of the QTc interval. In this case, the accuracies for the three machine learning classifiers were all greater than 70%. The results demonstrate that the RF and SVM classifiers provide the best predictions, both higher than 76.8%. The precision values and F1 scores of the RF and SVM also exceed those of the GBDT classifier. Compared with previous studies primarily of middle- and old-aged White and Black individuals, this study of young Asian adults shows similar performance of the machine learning classifiers for ECG and biological features in predicting LVDD. Sengupta et al. also used an RF-based classifier for signal processed surface ECG features to predict LVDD, and the AUCs of the ROC were 88% for people younger than 60 years, 85% for male adults and 71% for Asian populations [27]. However, the sample size of that study was small, with merely 188 subjects. In their subsequent study [29], Kagiyama et al. used the RF-based classifier for traditional ECG, signal processed surface ECG and 10 basic clinical features to predict LVDD in a total of 1,202 subjects. The AUCs of the ROCs were estimated to be 83% in the internal test set and 84% in the external test set. In addition, Tison et al. utilized a convolutional neural network-based segmentation to derive 725-component ECG vectors and applied the gradient boosting machine for preprocessed ECG features to predict LVDD from the ECG database with 3,629 subjects [28]. The AUC of the ROC was estimated at 84%.
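Reading off the sensitivity at a chosen specificity, as done above for the 75.0% operating point, can be sketched in Python as follows. The scores and labels are hypothetical toy data, and the function names and the "closest specificity" rule are assumptions of this sketch rather than the procedure documented in the study.

```python
def confusion_counts(labels, scores, cutoff):
    """Count TP, TN, FP, FN when scores >= cutoff are called LVDD."""
    tp = tn = fp = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def cutoff_for_specificity(labels, scores, target=0.75):
    """Cutoff among the observed scores whose specificity is closest to target."""
    best_cutoff, best_gap = None, None
    for cutoff in sorted(set(scores)):
        tp, tn, fp, fn = confusion_counts(labels, scores, cutoff)
        specificity = tn / (tn + fp)
        gap = abs(specificity - target)
        if best_gap is None or gap < best_gap:
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff

# Hypothetical predicted probabilities: 8 non-LVDD (0) and 4 LVDD (1) cases
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.55, 0.60, 0.28, 0.65, 0.70, 0.80]
cutoff = cutoff_for_specificity(labels, scores, target=0.75)
tp, tn, fp, fn = confusion_counts(labels, scores, cutoff)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(cutoff, sensitivity, specificity)
```

As a side check on the reported values, sensitivities of 81.0%, 85.7% and 47.6% correspond to 17, 18 and 10 detected cases out of 21, which appears consistent with 21 LVDD cases remaining for the testing set (94 in total minus the 53 + 20 listed for training and validation).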
The distribution of feature importance for the 26 ECG features in the RF is shown in Fig. 3. The feature (input variable) importance describes how relevant each feature is to the prediction. The sum of feature importance over all input variables is 100%. The R wave of the aVL lead ranks first in the importance analysis of the RF, followed by the R wave of the aVF lead, the P-axis of lead II and heart rate. The importance of the ECG features ranges from 2.0% to 10.5%. Tison et al. revealed that the PR interval, QT interval and P wave duration were the three most vital ECG features in their gradient boosting model [28]. Notably, among the 10 most important ECG features in the RF model in this study, more than half are obtained from the limb leads, and the QT interval is not as important in the RF model as in the previous study finding [13]. At baseline, the military males with LVDD were found to have greater R wave amplitudes in leads I and aVL, representing a larger leftward cardiac electrical vector, and smaller R wave amplitudes in leads II, III and aVF, implying a smaller inferior cardiac electrical vector [52]. Whether these ECG features in the RF model have clinical meaning that indicates the presence of LVDD needs further investigation. In addition, some novel machine learning techniques have been developed for signal analysis in the past 2 years [53]-[55], and whether the prediction of LVDD in young adults can be improved by applying the latest machine learning methods to ECG signals also requires further study.

V. CONCLUSION
This study suggests that using machine learning for ECG features only or with other biological features to predict LVDD in young Asian adults is reliable. The results are in line with previous study findings for middle- and old-aged individuals of different races/ethnicities. Machine learning techniques for simple ECG parameters only or with other biological features could feasibly be built into ECG equipment in the future. The use of machine learning in periodic health checks for young people may enable early preventive action against heart failure.