A Precision Health Service for Chronic Diseases: Development and Cohort Study Using Wearable Device, Machine Learning, and Deep Learning

This paper presents an integrated and scalable precision health service for health promotion and chronic disease prevention. Continuous real-time monitoring of lifestyle and environmental factors is implemented by integrating wearable devices, open environmental data, indoor air quality sensing devices, a location-based smartphone app, and an AI-assisted telecare platform. The AI-assisted telecare platform provided comprehensive insight into patients’ clinical, lifestyle, and environmental data, and generated reliable predictions of future acute exacerbation events. All data from 1,667 patients were collected prospectively during a 24-month follow-up period, resulting in the detection of 386 abnormal episodes. Machine learning algorithms and deep learning algorithms were used to train modular chronic disease models. The modular chronic disease prediction models that have passed external validation include obesity, panic disorder, and chronic obstructive pulmonary disease, with an average accuracy of 88.46%, a sensitivity of 75.6%, a specificity of 93.0%, and an F1 score of 79.8%. Compared with previous studies, we establish an effective way to collect lifestyle, life trajectory, and symptom records, as well as environmental factors, and improve the performance of the prediction model by adding objective comprehensive data and feature selection. Our results also demonstrate that lifestyle and environmental factors are highly correlated with patient health and have the potential to predict future abnormal events better than using only questionnaire data. Furthermore, we have constructed a cost-effective model that needs only a few features to support the prediction task, which is helpful for deploying real-world modular prediction models.


I. INTRODUCTION
With the rapid progress of precision medicine, patients have the additional option of advanced and personalized medical treatment in hospitals. From a treatment point of view, this facilitates the selection of drugs that minimize side effects and produce the best results. However, from the perspective of prevention, many studies claim that numerous challenges remain in the field of precision medicine [1], [2]. Since [...] on daily monitoring, health promotion, and disease prevention [...] control situations to make the data meaningful [7]. [...] [10]. The mortality risk of COVID-19 has also been revealed to be related to underlying health conditions, including obesity, [...] [15].

Based on the above points, there is no efficient way to (1) integrate lifestyle, environmental factors, and medical records to provide personalized health recommendations and (2) support multiple chronic disease groups simultaneously.

Several chronic disease prediction models have been developed in recent years. Goto et al. proposed an AECOPD (acute exacerbations of chronic obstructive pulmonary disease) model using demographic features, vital signs, and electronic medical records in the emergency department [16]. They found that machine learning predicts critical care and hospitalization among emergency patients with COPD exacerbation better than the traditional statistical approach based on emergency severity index information. Likewise, Peng et al. developed a machine learning approach to predict the prognosis of hospitalized AECOPD patients from clinical indicators. They used vital signs, medical history, inflammatory indicators, and decision trees to help respiratory physicians assess the severity of the patient's condition early and improve patient prognosis [17]. [...]lected 59 panic disorder patients and compared brain activation areas before and after specific treatment.
Comorbidity status has been predicted using a random undersampling tree and MRI images [18]. Butler et al. proposed an early childhood obesity prediction model for predicting obesity in 4- to 5-year-old children, using parental and infant data from the Growing Up in New Zealand (GUiNZ) cohort [19]. Despite the good performance of these prediction models using machine learning algorithms and medical records, they are difficult to implement in real-world situations because patients with chronic disease are not always in the hospital and do not always have real-time medical records. Lifestyle and living environment also affect disease control after a patient is discharged from hospital. Nevertheless, there are no predictive models incorporating lifestyle, living environment, and medical questionnaires. Comprehensive data collection may have the potential to achieve better predictive power and provide personalized health promotions to help patients improve their health outcomes.

[...] and a 3D printed robot hand [21]. The authors demonstrated that real-time assistance from the system helps users to strengthen their motion patterns after stroke. The availability of continuous and real-time data will be a key factor in the development of smart healthcare systems, because stakeholders can use these data to make well-informed decisions [22].

Figure 1 shows the architecture of the precision health service. The service consists of the NTU Medical Genie iOS/Android smartphone app, wearable devices, an air quality sensing device, the open environmental data API, the NTU Medical Genie platform, and modular prediction models. After patients are discharged from hospital, all key lifestyle and environmental information is collected from a wearable device, an air quality sensing device, and a smartphone app.
Then, real-time data are displayed on the platform for medical staff to assist in decision-making. Modular prediction models are triggered immediately when critical vital signs become abnormal and once daily at 2 a.m., balancing emergency safety against cost-effectiveness. In addition, to achieve high scalability and flexibility, all dataflow nodes, such as those for disease groups, vital sign monitoring devices, or prediction models, are designed to run in parallel. Nodes can therefore easily be added on the platform side or the app side when the load increases, and the corresponding computing resources can be added for stable operation. The following is a detailed description of each component.

Through these four information and communication technology methods, comprehensive patient data were collected. To establish an effective connection between patients and physicians, the data platform was designed to provide key information and trend charts to physicians and case managers, facilitating a rapid understanding of the patient's current condition on one interface and providing patients with personalized health promotion suggestions. In addition to data visualization, this platform provides a real-time warning function to assist physicians and case managers in decision making. Physicians and case managers set thresholds for abnormal vital sign warnings according to the patient's status. When the vital signs exceeded the thresholds, the platform actively triggered the health risk computation process and notified medical staff to intervene if necessary. Regarding the precision health management and prevention of chronic diseases, the platform calculated personal health risks based on modular chronic disease prediction models and the various collected data.
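The threshold-based warning and model-triggering logic described above can be illustrated with a minimal sketch. The names `Threshold`, `abnormal_signs`, and `should_run_models`, as well as the numeric ranges, are hypothetical and chosen only for illustration; the platform's actual implementation is not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one vital sign (illustrative values only)."""
    low: float
    high: float


def abnormal_signs(reading: Dict[str, float],
                   thresholds: Dict[str, Threshold]) -> List[str]:
    """Return the sorted names of vital signs that fall outside their range."""
    flagged = []
    for name, value in reading.items():
        limit = thresholds.get(name)
        if limit is not None and not (limit.low <= value <= limit.high):
            flagged.append(name)
    return sorted(flagged)


def should_run_models(reading: Dict[str, float],
                      thresholds: Dict[str, Threshold],
                      hour: int,
                      daily_hour: int = 2) -> bool:
    """Run the models on any abnormal sign, or at the scheduled daily hour."""
    return bool(abnormal_signs(reading, thresholds)) or hour == daily_hour


# Hypothetical per-patient thresholds set by a physician or case manager.
patient_thresholds = {
    "heart_rate": Threshold(low=50, high=120),
    "spo2": Threshold(low=92, high=100),
}
```

In this sketch the daily run and the event-driven run share one entry point, so a deployment would only need to call `should_run_models` whenever a new reading arrives and once per hour from a scheduler.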
Chronic disease prediction models were deployed in online case groups, providing medical staff with optional triggers.

As mentioned, the health risk value was computed by a robust prediction model and provided as decision support for physicians. The outcomes of chronic diseases such as COPD, panic disorder, and obesity are closely related to the improvement of daily life behavior. Therefore, we implemented these three chronic disease prediction models to demonstrate the scalability of our services. The comprehensive dataset (sections A-D) was pre-processed to extract the key features, followed by the training process. The data pre-processing consisted of last observation carried forward (LOCF) interpolation for inconsistent sampling frequencies or null points, and re-sampling to deal with the imbalanced ratio of abnormal events. The normalized data were used to train several kinds of models, which then had to pass external validation to ensure that they were reliable and applicable to different case groups in the real world. The detailed implementation process of these three models is as follows.

VOLUME 10, 2022

According to World Health Organization estimates, chronic obstructive pulmonary disease (COPD) will be the third-leading cause of mortality worldwide in 2030 [25]. Acute exacerbations of chronic obstructive pulmonary disease (AECOPD) are associated with substantial morbidity and mortality. Early AECOPD detection will help to reduce mortality. Increasing evidence shows that lifestyle modifications improve efficiency in the self-management and prevention of COPD. Therefore, the aim with our AECOPD [...] Table 1. Decision trees, random forests, linear discriminant analysis, and adaptive boosting were used to implement the AECOPD prediction model. We also propose a deep neural network for comparison with machine learning methods. This was constructed using fully [...]
This was constructed using fully  is unexpected and consists of repeated, intense fear attacks, 331 appearing suddenly and reaching a peak within a few minutes. 332 Patients who suffer from panic disorder tend to worry about 333 the occurrence of the next attack and actively try to prevent 334 future attacks by avoiding locations, situations, or behaviors 335 related to the panic attack. Predicting panic attacks accurately 336 may help clinicians to provide timely, appropriate treatment 337 and optimize personalized medicine. Hence, the purpose of 338 this model is to predict whether panic disorder patients will 339 have a panic attack within the next seven days. Random 340 forest, decision tree, linear discriminant analysis, adaptive 341 boosting (AdaBoost), and regularized greedy forest models 342 were implemented to predict panic attacks. The models and 343 hyperparameters are shown in Table 2.

A deep-learning-based model with four fully connected hidden layers was also proposed in this study. The activation function in the hidden layers was the rectified linear unit (ReLU), which mitigates the vanishing gradient problem. Batch normalization was applied to each layer after the activation function to accelerate model training and prevent model overfitting. After batch normalization, we also applied dropout to reduce overfitting. We used a sigmoid activation for the output layer because we require only a true or false result. Binary cross entropy (BCE) and Adam were used as the loss function and the optimizer, respectively. We selected BCE because the output of the study was binary. BCELoss is defined as [...]

[...] cardiovascular diseases, diabetes, musculoskeletal disorders, and certain cancers. Lifestyle modification and low health literacy are associated with obesity [28], [29]. Hence, the purpose of the obesity model is to predict whether the patient's BMI will rise within the upcoming 7 days using lifestyle data, environmental data, and health literacy assessment. Machine learning and a deep neural network algorithm were applied to implement the prediction model. The model hyperparameters are presented in Table 3. The deep neural network was constructed using two fully connected layers. Batch normalization and parametric rectified linear units were applied in the process. Figure 4 shows the structure of the DNN model.
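For reference, the binary cross-entropy loss named above for the panic disorder network has the standard form L = -(1/N) * sum_i [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ], where y_i is the true label and p_i is the sigmoid output. The sketch below computes this standard definition in plain Python; the clipping constant `eps` is an assumption added for numerical safety and is not taken from the paper.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int],
                         y_prob: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross entropy between labels (0 or 1) and probabilities."""
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A prediction of 0.5 for every sample gives a loss of log(2), which is a convenient reference point when judging whether a trained model is better than an uninformative one.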

We used 3-fold cross-validation to evaluate the stability of the prediction models. Accuracy, precision, sensitivity, and specificity were used as assessment metrics to evaluate the overall performance, including the closeness and the deviation of the prediction, and the performance on negative and positive cases of the identification models separately based on the validation and test sets. To tune the models for the best performance on the test set, the F1 score was chosen to adjust and evaluate the performance of our multi-feature prediction tasks by varying the outcome thresholds using the [...]

[...] interviews to offer total care to patients. Comprehensive patient information including lifestyle, living environment, life trajectory, disease control, and data on vital signs was collected by a location-based personal health app, an open environmental data API, an air quality sensing device, and wearable devices that were provided to all participants. All derived data were displayed on the AI-assisted platform and used to train modular prediction models to predict whether a patient with chronic disease would experience acute exacerbation of their condition within the next 7 days.

An AI-assisted platform for medical staff was developed using the ReactJS frontend framework and the Node.js backend framework. This platform displays the patient's lifestyle and environmental data trends on a single user interface to help doctors quickly grasp the key information. Fig. 6 shows the overview of our data collection, including both personal lifestyle and environment data.
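The assessment metrics and the F1-based threshold tuning described earlier in this section can be sketched as follows. This is a minimal illustration, not the study's evaluation code: the function names and the candidate threshold list are hypothetical, and the metrics follow their standard confusion-matrix definitions.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_prob: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics; empty denominators yield 0.0."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def best_f1_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                      candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates,
               key=lambda t: metrics(*confusion(y_true, y_prob, t))["f1"])
```

Lowering the threshold trades specificity for sensitivity, which is why a single summary such as the F1 score is a common choice when the positive class (an abnormal event) is rare.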

Detailed real-time information such as heart rate and SpO2 changes within a few minutes and daily sleep status can be viewed by switching to different pages, as shown in Figures 7 and 8. Figure 9 shows that daily sleep status can be divided into four stages: awake, rapid eye movement, light sleep, and deep sleep. This information was collected mainly via wearable devices. To simultaneously support multiple chronic disease healthcare tasks, the platform provides group management functions; modular prediction models [...]

During the study period, we recruited 177 adult patients who were diagnosed with COPD according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria, were not implanted with a pacemaker, and were not pregnant. To prevent AECOPD earlier and fit diverse scenarios, we implemented multiple models using various combinations of data features to predict acute exacerbations in the next seven days. Table 4 shows the performance of the implemented models on the validation dataset. Compared with the other algorithms, the random forest and deep neural network algorithms yielded the best performance on most indicators.

For 7-day AECOPD prediction, the original AECOPD predictive model achieved an accuracy of 91.4%, a precision of 95.5%, and an F1 score of 91.4% on the validation dataset. To ensure the model applies to the real world, we trained the model with different feature sets and extracted the best-performing model for deployment on the platform. Table 5 [...] task requires 27 features to complete the calculation, which is difficult for real-world apps. To reduce the computational costs and number of variables, feature selection and the SHAP module were applied to further analyze the impact of each feature on the prediction model. First, we identified important features affecting the prediction of AECOPD through the feature importance map and SHAP module, as shown in Figures 13 and 14. Then, we implemented backward elimination to compare the performance differences between models without specific features. Figure 15 shows that serious declines in performance occur only when the model does not contain deep sleep time, carbon monoxide concentration (CO), suspended particulate matter (PM10), or total score of COPD assessment test (CAT_total). Hence, we performed the same testing process on the combination of these features to realize the most cost-effective prediction model. Table 7 illustrates that the proposed model with the most cost-effective feature set achieved superior performance due to the removal of unimportant features. The area under the receiver operating characteristic curve of this model reached 94.7%. In addition, the summary plot also indicated that higher values for features such as the total score of COPD assessment test (CAT_total), air quality index (AQI), and ozone (O3) increase the risk of AECOPD events. Regular exercise (average_step) reduces the risk of AECOPD events.
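The backward elimination step described above, in which each feature is removed in turn and the resulting performance drop is compared, can be sketched as follows. This is a minimal illustration under stated assumptions: in the study, scoring a feature subset would mean retraining the model and measuring validation performance, whereas here `toy_score` is a hypothetical stand-in with made-up integer weights, and the extra feature `humidity` is included only as an example of an unimportant variable.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def elimination_drops(features: Sequence[str],
                      score: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Map each feature to the score lost when that feature alone is removed."""
    full = frozenset(features)
    baseline = score(full)
    return {name: baseline - score(full - {name}) for name in features}


def select_features(features: Sequence[str],
                    score: Callable[[FrozenSet[str]], float],
                    min_drop: float) -> List[str]:
    """Keep features whose removal costs at least `min_drop` in score."""
    drops = elimination_drops(features, score)
    return sorted(name for name, drop in drops.items() if drop >= min_drop)


# Hypothetical weights standing in for "retrain and evaluate"; not study results.
TOY_WEIGHTS = {"deep_sleep_time": 8, "CO": 6, "PM10": 5, "CAT_total": 9, "humidity": 1}


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy scoring function: the sum of the weights of the included features."""
    return sum(TOY_WEIGHTS[name] for name in subset)
```

With a real model, `score` would be the expensive part, so the number of candidate features and the number of cross-validation folds determine the cost of this procedure.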

We enrolled 70 participants with panic disorder at the En Chu Kong Hospital and MacKay Memorial Hospital. To accurately predict panic attacks, we experimented with deep neural networks and machine-learning classifiers including random forests, decision trees, linear discriminant analysis, adaptive boosting, and regularized greedy forests. Tables 8 and 9 show the model performance on the validation and testing datasets. The first 50 patients were included in the training and validation dataset; the others formed the testing dataset. The experimental results show that the random forest achieved the best performance. However, the sensitivity is worse on the testing dataset. This may reflect data imbalance because the number of panic attacks decreased among patients who were recruited later. To reflect the diversity [...] the data distribution of feature value in our dataset. Raising the values of physical activity features, such as stairs climbed, heart rate, and total sleep time, helps patients reduce the possibility of panic attacks. Moreover, Fig. 18 shows a severe drop in performance when the model does not include Beck Depression Inventory (BDI_total), Beck Anxiety Inventory (BAI_total), Mini International Neuropsychiatric Interview (MINI_value), or total sleep time. The combination of these features was imported into the same experimental configuration for model training. In Table 11, the cost-effective model achieved an accuracy of 83.1%, a sensitivity of 78.1%, a specificity of 86.1%, and an F1 score of 77.5%. The model requires only four features to yield reliable predictions, which facilitates the real-world deployment of the service.

We enrolled 120 obese participants. The main prediction target was whether BMI would worsen within the next seven days.
Following the above two methods, we experimented with machine learning methods to train multiple models. Table 12 shows the performance of the proposed models on the validation dataset. The random forest and decision tree achieved better performance. When training the all-feature [...] feature selection to identify the most cost-effective model. Figures 19 and 20 show the distribution of lifestyle factors, living environment, and health literacy data. The results demonstrate that lower values for features such as health literacy, calorie consumption, average heart rate, and rapid eye movement time increase the risk of becoming overweight and obese. Figure 21 shows that serious declines in performance occur only when the model does not contain calorie consumption, health literacy total score, average heart rate, or minimum heart rate. Therefore, the combination of these four features may be the most influential and cost-effective feature set. We executed the same model training and testing process on this feature set. Table 15 illustrates that the proposed model achieved good performance even with a large reduction in features. Moreover, the sensitivity is significantly improved due to the removal of unimportant features.

[...] included the name, birthday, phone number, attending physician, and so on. All relevant patient registration information was stored in Firebase. A background location tracking feature was activated after the user authorized access to location data. The collected real-time latitude and longitude data were converted into parameters for calling the open environmental data API to calculate the environmental exposure risk for the user's location. The content of the app varied depending on the type of chronic disease.
For example, patients with panic disorder were presented with four main functions when entering the homepage: real-time physiological data measurement, a self-evaluation questionnaire, symptom recording, and video chat. On the physiological data measurement page, the real-time physiological data collected include heart rate, SpO2, heart rate variability, and acceleration. When the user successfully entered the physiological data monitoring page, the data trend graph started, and the data sampling rate was changed to once per second for upload to the InfluxDB time [...]

We designed and implemented a scalable precision health service for patients with chronic diseases. The results demonstrate that this service provides continuous monitoring of lifestyle and environment, instant warnings in the event of abnormal vital signs, and decision support based on modular predictive models. Compared with existing studies, we have created a new service and improved the performance of chronic disease prediction models by applying objective lifestyle and environmental factors. At the same time, we have used feature engineering to reduce the computational costs and enhance the practicality of real-world AI prediction models. The proposed prediction models require a small number of features to achieve excellent performance in predicting whether a patient with chronic disease will experience an abnormal event within the next 7 days. Furthermore, we address the inability to quantify and extract lifestyle and environmental information in past studies by integrating wearable devices, open data, indoor air quality sensors, smartphone applications, and a healthcare platform.
To the best of our knowledge, this is the first study to use continuous lifestyle factors, environmental factors, clinical factors, feature selection, and artificial intelligence to predict abnormal events in chronic diseases and to deploy the resulting models in the real world with external validation.

As of May 25, 2022, the precision health service had served 1,667 patients and 32 medical personnel in Taiwan and Japan, derived and monitored 186,986,625 physical data points, and conducted 6,869 interviews to offer total care to patients. The parallel operation of system dataflows improves scalability and flexibility and is not limited by a single process or device control, so the service can support a growing range of care needs in the future. It has the potential to become the next-generation e-health system to assist physicians in remote care and establish an effective communication channel between medical personnel and patients. Traditionally, patients with chronic diseases must return to the hospital periodically for numerous clinical tests to observe their health condition. They may run the risk of acute exacerbation between routine visits. However, with the proposed service, all chronic disease related data are uploaded automatically, including questionnaire assessments and lifestyle and environmental information. The patient's health risk value is computed through modular predictive models, reminding patients and medical personnel in advance so that health outcomes can be improved. The resultant comprehensive view of patient data could help physicians and patients to formulate personalized health promotion plans and achieve precision health management. Our results also confirm that lifestyle and environmental data are highly correlated with patient health conditions and have a strong influence on the early warning of acute exacerbations. By applying the SHAP module and feature engineering, we clearly identify the impact of physical activity, sleep quality, and heart rate on chronic disease control, and provide precise recommendations for health improvements for physicians and patients.
In the future, we will strengthen the precision health service to support more data collection for lifestyle factors, and implement digital twin models [30] to further automatically provide concrete health promotion advice for patients with chronic diseases.