Early Detection of Parkinson’s Disease Using Deep Learning and Machine Learning

Accurately detecting Parkinson’s disease (PD) at an early stage is indispensable for slowing its progression and giving patients access to disease-modifying therapy. To this end, the premotor stage of PD should be carefully monitored. An innovative deep-learning technique is introduced to determine early whether an individual is affected by PD based on premotor features. Specifically, several indicators are considered in this study, including Rapid Eye Movement (REM) sleep behavior disorder and olfactory loss, cerebrospinal fluid data, and dopaminergic imaging markers. A comparison between the proposed deep learning model and twelve machine learning and ensemble learning methods, based on a relatively small dataset of 183 healthy individuals and 401 early PD patients, shows the superior detection performance of the designed model, which achieves the highest accuracy, 96.45% on average. Besides detecting PD, we also report the importance of each feature in the PD detection process based on the Boosting method.


I. INTRODUCTION
Parkinson's disease (PD) is a major degenerative disease of the central nervous system that affects the quality of life of millions of seniors worldwide [1]. Symptoms of PD progress differently from one person to another because of the heterogeneity of the disease. Patients with PD may show symptoms including tremors, mainly at rest (for example, in the hands), limb rigidity, and gait and balance problems. Generally, two types of PD symptoms can be distinguished: movement-related (i.e., motor) and unrelated to movement (non-motor). In fact, patients showing non-motor symptoms are more affected than those whose main symptoms are motor. Non-motor symptoms may include depression, sleep behavior disorders, loss of sense of smell, and cognitive impairment. The Centers for Disease Control and Prevention (CDC) has reported that PD complications rank as the 14th leading cause of death in the United States. To date, the cause of PD remains largely unknown. The economic burden due to the direct and indirect costs of PD, covering treatment, social security payments, and lost income, is estimated to be approximately $52 billion per year in the United States alone. The number of people affected by PD has exceeded 10 million worldwide. Timely detection of PD facilitates rapid treatment and alleviates symptoms significantly, as reported in [2]. Therefore, detecting PD at an earlier stage is a key element in slowing its progression and could give patients access to disease-modifying therapy, when available.
The associate editor coordinating the review of this manuscript and approving it for publication was Fatih Emre Boran.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To date, there is no single definitive test for diagnosing Parkinson's disease (PD) [2]; instead, various symptoms and diagnostic tests are used in combination. Scientists have investigated several biomarkers to identify PD early and slow down the disease process. Currently, all therapies used for PD improve symptoms without slowing or halting disease progression. Various methods have been proposed to help detect PD based on different kinds of measurements, including speech data [3]-[6], gait patterns [7], force tracking data [8], smell identification data [9], and spontaneous cardiovascular oscillations [10]. In [11], an approach using the sawtooth inspired pitch estimator (SWIPE) scheme is used to assess speech disorders caused by Parkinson's disease in recordings made via smartphone. The SWIPE scheme achieved acceptable results in discriminating PD patients from healthy individuals. However, at lower signal-to-noise ratios, an improved algorithm is needed to obtain sufficient robustness to noise. In [12], an early PD detection algorithm based on reduced vocal features is designed. It has been illustrated that Wrappers subset selection is suitable because of the low dimensionality of the selected features and the improved PD detection capability. In [13], a PD detection system is introduced using a 1D convolutional neural network based on gait signals. However, the performance of PD detection based on speech and gait analyses is generally limited: speech recordings are sensitive to background noise, which causes a high number of false alarms and missed detections, and gait tracking and inspection need specialized devices and sufficient space for walking [14]. The authors of [15] suggest a wavelet-based method to analyze data collected from smartwatches worn by nineteen patients affected by PD. 
This method showed good ability in detecting symptoms of tremor, bradykinesia, and dyskinesia. In [16], an approach to detect motor impairment in PD based on mobile touchscreen typing is introduced. Essentially, the proposed algorithm uncovers signs of PD motor impairment by analyzing touchscreen typing features that include descriptive statistics (covariance, skewness, and kurtosis) and time information. In [17], multiple sources of data, including imaging, genetic, clinical, and demographic data, are incorporated in developing models for PD prediction. Other approaches employ handwriting measurements for Parkinson's diagnosis [14], [18], [19]. In [14], a PD diagnosis approach is proposed based on handwriting measurements gathered from patients with PD. It has been shown that PD diagnosis improves when age and sex information is taken into consideration in the decision process [13], [14]. Accurate and early detection of PD is vital because it provides crucial information for slowing down the progression of PD. Over the years, various data-driven methods have been developed to improve the detection of PD. In contrast to model-based detection techniques, which require an analytical model to be available beforehand, data-driven techniques need only historical data. Recently, machine learning (ML) has emerged as a promising field of research in PD diagnosis, both in academia and industry [20]. Owing to its data-driven approaches, ML has brought a paradigm shift in the way relevant information in PD biomarkers is extracted and analyzed. Furthermore, machine learning techniques provide pertinent information that guides PD classification and diagnosis and speeds up decision making. Various machine learning techniques have been applied in the literature to address the PD detection problem. For instance, in [21], dysphonia measurements have been used to distinguish patients with PD from healthy people. 
The support vector machine (SVM) is applied to only four dysphonic features for PD classification because it can capture nonlinearity through nonlinear kernels. In [6], three common machine learning algorithms, namely Random Forest (RF), Support Vector Machine (SVM), and neural network, have been applied to detect Parkinson's disease based on acoustic analysis of speech. RF and SVM showed promising results in early PD detection. In [22], the performance of four classifiers, Decision Trees, Regression, DMneural, and Neural Networks (NN), has been compared in detecting PD, and the best accuracy of 92.9% is obtained using the NN algorithm. Recently, deep learning-based techniques have gained special attention in PD diagnosis due to their capacity to handle big data and achieve high accuracy without assumptions on the data distribution [3], [23]. The authors of [23] applied a long short-term memory (LSTM) algorithm to detect Freezing of Gait (FOG), which is a good indicator of PD and may cause patients to fall. It has been shown that LSTM outperforms SVM in detecting FOG.
To guarantee early detection of PD, the premotor or prodromal stage of PD should be carefully monitored [24], [25]. This premotor stage is generally characterized by symptoms different from the usual motor symptoms, including Rapid Eye Movement (REM) sleep Behaviour Disorder (RBD) and olfactory loss [24]. The purpose of the present paper is threefold. Firstly, to uncover PD at an early stage, several indicators are considered in this study, including RBD and olfactory loss, Cerebrospinal fluid (CSF) data, and SPECT imaging markers. Secondly, still within the data-driven techniques, this paper presents a comparative study of the most advanced data-driven prediction methods for detecting PD. Three kinds of data-driven methods are compared: shallow machine learning-based, ensemble learning-based, and deep learning-based. In this work, a deep learning model is designed to discriminate between normal individuals and patients affected by PD. Essentially, the aim is to shed light on the performance of these advanced prediction methods when applied to small PD data sets. Indeed, the PD data used here, from the Parkinson's Progression Markers Initiative (PPMI), is relatively small and includes features from 183 healthy individuals and 401 early PD patients, which makes the application of machine learning methods to this small-dataset problem attractive to investigate. Results showed that the designed deep learning model offers superior detection performance compared to the twelve considered machine learning models in discriminating normal people from patients who have Parkinson's disease. Lastly, feature importance and selection frequency computed with the Boosting method highlight that the imaging markers of striatal binding ratios (SBRs) for the left and right putamen have the largest impact on the PD detection process.
The next section presents the involved PD dataset and provides a brief description of the proposed deep learning model and the considered machine learning models. Section III discusses the PD detection results and comparisons. Lastly, conclusions are drawn in section IV.

II. DATA AND METHODS
This study proposes a deep learning framework for the early detection of PD. The general framework of the proposed detection approach is illustrated in Figure 1. PD detection is done in two main stages: training and testing. In the first stage, the raw data is preprocessed and standardized and then used to construct the deep learning model. The values of the parameters of the deep learning model are selected such that the loss function is minimized during training. After that, in the testing stage, the previously constructed model with the selected parameters is used for PD detection.

A. PPMI DATA
• RBDSQ score. The REM sleep Behavior Disorder Screening Questionnaire (RBDSQ) is a specific questionnaire for rapid eye movement sleep behavior disorder (RBD) [28]. A higher score indicates a higher likelihood of RBD.
• UPSIT score. The University of Pennsylvania Smell Identification Test (UPSIT) evaluates olfactory function [29]. It consists of 40 questions. The maximum test score is 40, obtained when all the odors are identified correctly.
• CAUDATE_L. The striatal binding ratio (SBR) of the left caudate from Single Photon Emission Computed Tomography (SPECT) imaging.
• CAUDATE_R. The SBR of the right caudate from SPECT imaging.
• PUTAMEN_L. The SBR of the left putamen from SPECT imaging.
Figure 2 shows the histograms of the features. Most of the covariates, except UPSIT scores, are severely right-skewed. Some of the covariates, e.g., α-syn and T-tau, have large outliers. The feature RBDSQ score takes discrete values.

B. EXPLORATORY ANALYSIS
To reduce the degree of skewness and the influence of outlying observations on the models to be built, we log-transform all the features except RBDSQ scores in the following analysis. Also note that the scale of the features varies from thousands, e.g., Aβ1-42, to less than 1, e.g., T-tau/Aβ1-42. We therefore center and scale the features after the log-transformation to unify their scales. From Figure 2, we can readily identify five features whose distributions differ markedly between healthy people and patients with Parkinson's disease. These five features are CAUDATE_L, CAUDATE_R, PUTAMEN_L, PUTAMEN_R, and UPSIT score. Patients with Parkinson's disease have lower values in all five features. These five features are critical in discriminating healthy people from patients with Parkinson's disease. Note that the marginal relationship between features and the disease indicator shown in Figure 2 does not exclude the possibility that other features might also be important in identifying Parkinson patients, because the features may have a nonlinear relationship with the disease indicator, and the features may interact with each other and be distinguishable in high dimensions.
The Pearson correlation coefficients among all the features are shown in Figure 3. The features CAUDATE_L, CAUDATE_R, PUTAMEN_L, and PUTAMEN_R are highly correlated with each other.
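The preprocessing and correlation analysis described in this subsection can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the function names and the toy values are hypothetical, and the use of the sample standard deviation (denominator n - 1) is an assumption, since the text does not say which variant was used.

```python
import math

def log_standardize(values):
    """Log-transform strictly positive values, then center and scale them."""
    logged = [math.log(v) for v in values]  # log reduces right skew and outlier influence
    n = len(logged)
    mean = sum(logged) / n
    # Sample variance with an n - 1 denominator (an assumption, see lead-in).
    var = sum((x - mean) ** 2 for x in logged) / (n - 1)
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in logged]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy values on very different scales (thousands vs. below 1).
feature_large = [850.0, 1200.0, 640.0, 2100.0, 980.0]
feature_small = [0.21, 0.15, 0.30, 0.09, 0.18]

z_large = log_standardize(feature_large)
z_small = log_standardize(feature_small)
r = pearson(z_large, z_small)
```

After the transformation both features have mean zero and unit sample variance, so their original scales no longer matter to distance-based or gradient-based methods.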

C. DESCRIPTION OF THE METHODS
In this article, we use machine learning methods to study the link between the features and Parkinson's disease and to build a model for early diagnosis of Parkinson's disease. Denote the training data by $\{(X_1, y_1), \ldots, (X_n, y_n)\}$. The following classifiers are used in our study.
• Deep Learning (DEEP). In recent years, deep learning algorithms have achieved striking performance in fields such as computer vision [30], natural language processing [31], and speech recognition [32]. Deep learning is also changing other fields such as biology [33] and engineering [34]. A deep learning system is composed of multiple layers of connected artificial neurons. The neurons are information processing modules, which essentially apply simple nonlinear transformations to their inputs. For the supervised feed-forward neural network (FNN) considered in this article, when raw data are fed into the network, the deep learning algorithm can automatically extract hierarchical representations of the data that are best suited for the underlying learning task, e.g., classification in our paper. In this article, we construct an FNN with two hidden layers. The first hidden layer has twenty neurons, and the second hidden layer has ten neurons. The structure of the network is shown in Figure 4. Deep learning algorithms also have some weaknesses. Usually, training them requires a large amount of data to achieve the desired accuracy. Deep learning is routinely used as a black-box algorithm, and the trained neural networks are hard to interpret. Also, it is still hard to understand theoretically why and how deep learning achieves good performance.
• Classification trees (TREE). A classification tree [35] recursively splits the feature space into subsets such that the Gini impurity is minimized at each step. The Gini impurity is defined as
$$G = 1 - p_0^2 - p_1^2, \tag{1}$$
where $p_0$ and $p_1$ are the proportions of normal people and patients, respectively. The classification tree algorithm first grows a tree to the maximum depth such that each leaf node is pure, then prunes upwards to balance the classification error and the number of terminal nodes of the tree, that is, it minimizes
$$R(T) + \alpha |T|,$$
where $R(T)$ is the classification error of a candidate tree $T$, $|T|$ is the number of terminal nodes of the tree $T$, and the parameter $\alpha$, which controls the trade-off between estimation error and tree size, is selected by cross-validation.
• Boosting. Boosting is an ensemble algorithm designed to combine weak base learners, such as shallow regression trees, into a strong learner [36]. References [37] and [38] discovered the connection between the Boosting algorithm and estimation in function space, which opened the way for applications other than classification. In this article, we implement three Boosting algorithms with different base learners. All the Boosting algorithms use the cross-entropy as the loss function. The first Boosting algorithm adopts classification trees [35] as its base learner, and we abbreviate it as BOOST_TREE. We also employ linear models of one variable, i.e., $\beta_j X_j$, where $X_j$ is the $j$th element of the features $X$ and $\beta_j$ is a coefficient, and B-splines of one variable, i.e., $b(X_j)$, where $b(\cdot)$ is a cubic spline function, as the base learners, and we abbreviate the corresponding Boosting algorithms as BOOST_GLM and BOOST_GAM, respectively. Besides classification, BOOST_GLM and BOOST_GAM have the additional feature of variable selection, i.e., they can automatically select variables during the training process [39].
• Random forest (RF). Random forest [40] is closely related to the boosting algorithm because both algorithms aggregate the results of a collection of base learners. RF trains a collection of classification trees using bootstrap samples of the training data. A key feature of random forest is that it de-correlates the trees by randomizing the candidate splits when building the trees. That is, the actual splitting feature is selected from a random subset of the features. Let $X_s = \{X_{i_1}, \ldots, X_{i_k}\}$ be a random subset of the features $X = \{X_1, \ldots, X_p\}$, where $k \le p$ denotes the number of candidates. RF restricts the input features to $X_s$ when minimizing the Gini impurity (1). The number $k$ is a major parameter of RF, and it is selected by cross-validation in our implementation.
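• Logistic regression (LOGIS). Logistic regression is a member of the generalized linear model family, which is backed by rich statistical theory. In logistic regression, we model the conditional probability of having the disease, $\pi_i = p(y_i = 1 \mid X_i)$, as
$$\log \frac{\pi_i}{1 - \pi_i} = X_i^\top \beta,$$
and the coefficient $\beta$ is estimated by maximizing the log-likelihood function
$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right].$$
Besides ordinary logistic regression, we also consider penalized logistic regression (LOGIS_PEN), where $\beta$ is estimated by
$$\hat{\beta} = \arg\max_{\beta} \left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \right\},$$
where $p$ is the number of features and $\lambda \ge 0$ is the penalty parameter. The penalized logistic regression can select features from the feature set and fit a parsimonious model [41].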
• Discriminant analysis (DIS). In linear discriminant analysis, the conditional distributions $p(X_i \mid y_i = 0)$ and $p(X_i \mid y_i = 1)$ are assumed to be normal with a common covariance matrix and means $\mu_0$ and $\mu_1$, respectively. For a new input $X$, if
$$p(X \mid y = 1)\, p(y = 1) > p(X \mid y = 0)\, p(y = 0),$$
then the case is classified as a patient; otherwise, the case is classified as a normal person.
• K-nearest neighbor (KNN). In KNN, a new case is classified as a patient if more than half of its $k$ nearest neighbors, measured by the distance between the corresponding features, are patients. The number of neighbors $k$ used in the algorithm is selected by cross-validation.
• Support vector machines (SVM). SVM maps the features into a high-dimensional space using the kernel trick and builds a hyperplane in the mapped space that optimally separates the patients and normal people. In SVM, a patient is coded as $y = 1$ whereas a normal person is coded as $y = -1$. Let $k(X_i, X_j) = \exp(-\gamma \|X_i - X_j\|^2)$ be the Gaussian radial basis kernel, where $\gamma > 0$ is a parameter. SVM classifies a new case with feature $X$ as a patient if $\sum_{i=1}^{n} c_i y_i k(X_i, X) > 0$, where the coefficients $c_i$, $i = 1, \ldots, n$, are determined by solving the SVM dual optimization problem.
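To make the splitting criterion used by the tree-based methods above concrete, the following sketch computes the Gini impurity of a binary label set and searches for the single-feature threshold with the lowest impurity. It is a simplified illustration rather than the implementation used in the study: scoring a split by the size-weighted impurity of the two child nodes is the usual CART convention and is assumed here, and the function names and toy SBR values are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - p0^2 - p1^2 for binary labels (0 = healthy, 1 = patient)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    p0 = 1.0 - p1
    return 1.0 - p0 ** 2 - p1 ** 2

def best_split(x, y):
    """Return (threshold, weighted child impurity) of the best split 'x <= threshold'."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Size-weighted impurity of the two children (assumed CART convention).
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical toy data: lower SBR values belong to patients (label 1).
sbr = [0.6, 0.8, 0.9, 1.1, 1.9, 2.1, 2.3, 2.4]
label = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, impurity = best_split(sbr, label)
```

On this perfectly separable toy feature the search places the threshold midway between the two groups and both children are pure. A random forest would run the same search but only over a random subset of $k$ features at each node.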

D. IMPLEMENTATION DETAILS AND COMPUTATION COSTS
To evaluate the accuracy of the above algorithms in discriminating Parkinson's disease, we randomly split the data and use 70% as training data and the rest as testing data. The ratio of patients to healthy people in the training and testing data is kept the same as in the original data using stratified sampling. We train the machine learning methods on the training data and use the trained models to predict whether the cases in the test data are Parkinson patients. The splitting is repeated 100 times. We report the performance on the testing data using accuracy, specificity, sensitivity, area under the ROC curve (AUC), precision, and F1 in Section III. We train the deep learning models in mini-batches of 16 samples using the stochastic gradient descent (SGD) algorithm with the cross-entropy function as the loss. The networks are trained with dropout [42] to prevent overfitting. We also use batch normalization [43] to accelerate training. The history of the loss function and the accuracy of the network on the testing data for one of the random splits over 100 epochs is shown in Figure 5. For each epoch, the computational cost of a naive implementation of SGD is about $O(nm)$, where $n$ is the number of samples and $m$ is the number of weights and biases in the network. That is, the computational cost of training a network increases with the number of samples, the depth of the network, and the number of neurons in each layer. With modern parallel computing devices, the training of deep learning methods can be greatly accelerated.
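The repeated stratified 70/30 splitting described above can be sketched as follows. This is a minimal illustration in plain Python and not the code used in the study; the function name, the seeding scheme, and the rounding of per-class sizes are assumptions.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx) that keep the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)                    # random order within the class
        cut = round(train_frac * len(idx))  # per-class training size (assumed rounding)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Class sizes from the paper: 183 healthy individuals (0) and 401 early PD patients (1).
labels = [0] * 183 + [1] * 401

# One split per repetition; each repetition uses a different hypothetical seed.
splits = [stratified_split(labels, 0.7, seed=s) for s in range(100)]
train_idx, test_idx = splits[0]
```

With these class sizes every split contains 409 training cases (128 healthy, 281 patients) and 175 test cases (55 healthy, 120 patients), so the patient share stays close to 69% in both parts.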
We train three deep learning models with different structures to show the robustness of the results with respect to the hyperparameters. The three trained models are all feed-forward neural networks with two hidden layers. The first network, abbreviated as DEEP1, has 40 and 20 neurons in the first and second hidden layers, respectively. The second and third networks are abbreviated as DEEP2 and DEEP3, respectively.
To make the comparisons as fair as possible, we tune all other methods to achieve their best performance. For example, the number of candidate splits in RF is selected by out-of-bag error rates, as recommended by Breiman [40]. The number of trees in the boosting methods, the number of neighbors in KNN, and the penalty parameter in penalized logistic regression are all selected by 5-fold cross-validation. For the TREE method, a large tree is first grown to its maximum depth and then pruned by 5-fold cross-validation [35]. For the SVM method, we use a Gaussian kernel, and the parameters are selected by 5-fold cross-validation.
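As a rough illustration of the quantity $m$ in the $O(nm)$ cost estimate given earlier in this subsection, the sketch below counts the weights and biases of a fully connected network with the DEEP1 hidden-layer sizes (40 and 20). The input dimension of 13 features and the single output unit are assumptions made only for this example, and the extra parameters introduced by batch normalization are ignored.

```python
def count_parameters(layer_sizes):
    """Number of weights plus biases in a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

n_features = 13                      # hypothetical input dimension
deep1 = [n_features, 40, 20, 1]      # hidden sizes from the text; one output unit assumed
m = count_parameters(deep1)          # 13*40+40 + 40*20+20 + 20*1+1 = 1401

n_train = 409                        # about 70% of the 584 individuals
cost_per_epoch = n_train * m         # order-of-magnitude operation count per epoch
```

Under these assumptions the network has 1401 trainable parameters, so one pass over the training data touches on the order of half a million parameter-sample pairs, which is small by deep learning standards.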

A. METRICS
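To evaluate the performance of the machine learning methods in discriminating Parkinson patients, we employ the following criteria:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\text{Sensitivity} = \frac{TP}{TP + FN},$$
$$\text{Specificity} = \frac{TN}{TN + FP},$$
$$\text{Precision} = \frac{TP}{TP + FP},$$
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}},$$
where TP is the number of true positives, FP is the number of false positives, TN is the number of true negatives, and FN is the number of false negatives. Besides the five metrics defined above, we also use the area under the receiver operating characteristic curve (AUC). Accuracy evaluates the proportion of correct predictions. A higher accuracy value means a better overall prediction performance.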
Sensitivity refers to the ability to correctly detect Parkinson patients. Note that the recall is identical to sensitivity in binary classification. Specificity shows the proportion of actual negatives that are correctly predicted. Specificity refers to the ability to correctly detect normal people. Precision refers to the relevance of the predicted positives. F1 score is the harmonic mean of the precision and sensitivity.
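The metrics described above follow directly from the four confusion-matrix counts. The sketch below shows the computation; the function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: share of patients detected
    specificity = tn / (tn + fp)   # share of healthy people correctly identified
    precision = tp / (tp + fp)     # share of predicted patients who are patients
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for a test set with 120 patients and 55 healthy people.
metrics = classification_metrics(tp=116, fp=3, tn=52, fn=4)
```

With these illustrative counts the accuracy is 168/175 = 0.96, while sensitivity (116/120) and specificity (52/55) differ, which is why the paper reports both rather than accuracy alone.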

B. RESULTS OF THE DEEP LEARNING METHODS
We first show the performance of deep learning methods for discriminating Parkinson patients. We trained all the deep learning models for 25 and 50 epochs on the training data and summarize the prediction results on the testing data. The average evaluation metrics are summarized in Table 1. The distributions of the evaluation metrics over the 100 splits are shown in Figure 6.
We observe from Table 1 and Figure 6 that the deep learning models are robust to the structure of the networks, e.g., the number of neurons in the hidden layers, and to the number of training epochs. For example, the accuracies of DEEP1, DEEP2 and DEEP3 are 96.55%, 96.15% and 96.33%, respectively, when trained for 25 epochs, which differ only slightly from the accuracies of 96.43%, 96.44% and 96.53%, respectively, when the number of epochs increases to 50. Larger networks (DEEP1) do not show higher accuracy in discriminating Parkinson patients, and they do not show signs of overfitting either. The second observation is that the ensemble network (DEEP_EN), which combines the results of DEEP1, DEEP2 and DEEP3, effectively boosts the performance of the individual networks. The ensemble network, whether trained for 25 epochs or 50 epochs, achieves better performance in every measure compared to any single network.
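The text does not state how DEEP_EN combines the three networks. One common way to build such an ensemble, shown here purely as an assumption, is to average the predicted patient probabilities of the individual networks and threshold the average at 0.5. The function name and the probabilities below are hypothetical.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-case probabilities across models and threshold the mean.

    Averaging is an assumed combination rule; the paper does not specify one.
    """
    n_models = len(prob_lists)
    avg = [sum(ps) / n_models for ps in zip(*prob_lists)]
    labels = [1 if p > threshold else 0 for p in avg]
    return labels, avg

# Hypothetical predicted probabilities of PD for four test cases.
deep1 = [0.92, 0.40, 0.10, 0.55]
deep2 = [0.88, 0.62, 0.20, 0.45]
deep3 = [0.95, 0.55, 0.05, 0.41]

labels_en, probs_en = ensemble_predict([deep1, deep2, deep3])
```

In this toy example the second and fourth cases are ones where the individual networks disagree; averaging resolves them to a single label (patient and healthy, respectively), which is the mechanism by which an ensemble can smooth out the errors of single networks.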

C. COMPARISONS WITH OTHER MACHINE LEARNING METHODS
In this section, we compare the deep learning methods with other machine learning methods in discriminating Parkinson patients. Since deep learning models are not sensitive to the number of epochs, as explained in the last section, we only compare the results for deep learning models trained for 50 epochs. The performance measures of the competing methods are summarized in Table 2. The distributions of accuracy, sensitivity, specificity, AUC, precision and F1 are depicted in Figure 7. Overall, all the deep learning models have a higher accuracy than the other machine learning methods in discriminating healthy people from patients who have Parkinson's disease. In particular, the ensemble network (DEEP_EN) achieves the highest accuracy over all methods, 96.68% on average over 100 splits of the data. It also achieves a better balance between sensitivity and specificity, and has the highest F1 scores. The boosting methods, including BOOST_GAM, BOOST_GLM, and BOOST_TREE, follow deep learning closely, and all have accuracy higher than 96.2%. The linear discriminant analysis method performs the best in terms of sensitivity; that is, it has the best chance of identifying a real patient. However, it has a low specificity, 91.09%, and is prone to misclassifying normal people. Tree-based methods, such as BOOST_TREE and random forest, also have sensitivity above 97.3%. Deep learning has 97.17% sensitivity. BOOST_GLM has a specificity of 95.31% on average, which is the highest among all the competing methods. The specificity measures how accurately a method identifies truly healthy people. Deep learning achieves the second largest specificity, 94.84%, which is slightly lower than that of BOOST_GLM. BOOST_GLM and deep learning achieve a good balance between sensitivity and specificity, in the sense that these methods achieve the smallest gap between the two. Lastly, most of the methods, except TREE, have AUC greater than 98%.
The feature importance and selection frequency calculated using the Boosting method with smooth base learners (BOOST_GAM) are reported in Figure 8. The imaging markers of SBRs for the left and right putamen (PUTAMEN_L and PUTAMEN_R) have the highest importance among all the features, followed by UPSIT score. The importance of the other features is negligible compared to that of the top three features. The variable importance indicates that dopaminergic imaging has a high value in discriminating Parkinson's disease. The selection frequency in Figure 8 shows how often each feature is selected as the base learner in the training process. All important features, e.g., PUTAMEN_L and PUTAMEN_R, are selected frequently in the training process. Figure 9 depicts the effect of the features estimated by BOOST_GAM. The important features, e.g., PUTAMEN_L, PUTAMEN_R, and UPSIT scores, have a monotonic effect in discriminating Parkinson's disease. That is, people with small values of PUTAMEN_L, PUTAMEN_R, and UPSIT score are more likely to have Parkinson's disease. This observation is consistent with Figure 2. Several features have highly nonlinear effects, e.g., A.beta, P.tau.t.tau, and CAUDATE_L. The effects of these features illustrate the complexity of discriminating patients with Parkinson's disease. Three features are not selected by BOOST_GAM, namely p.tau, P.tau.A.beta and CAUDATE_R. These features do not contribute to the BOOST_GAM model.

D. PERFORMANCE WITH VARIABLE SELECTION
The features A.beta and P.tau.t.tau are unimportant in the variable importance analysis, and they are marginally uncorrelated with the response. In this section, we show the performance of the machine learning methods when these two variables are removed from the analysis. The performance measures of the competing methods are summarized in Table 3. The distributions of accuracy, sensitivity, specificity, AUC, precision and F1 are depicted in Figure 10. The results are quite similar to the results with all features involved in the analysis (Table 2). In particular, the ensemble network (DEEP_EN) achieves the highest accuracy over all methods, 96.60% on average over 100 splits of the data, although this accuracy is slightly lower than that obtained when all features are involved in the analysis.

E. COMPUTING TIME
All the methods are run on a workstation with an Intel Xeon CPU E5-2680 V4 and 128 gigabytes of memory. For the deep learning methods, we also use an NVIDIA Quadro K2200 graphics card with 4 gigabytes of memory for parallel computing. The discriminant analysis (DIS), k-nearest neighbor (KNN), support vector machine (SVM) and tree methods are implemented in Matlab 2020; all other methods are implemented in R. In particular, the deep learning methods are implemented using the keras package with tensorflow as the computational backend. Table 4 shows the summary of computing time for all the competing methods, where the interquartile range (IQR) is defined as the difference between the 75th and 25th percentiles. All the deep learning methods are trained for 50 epochs. First, we can see that the logistic regression (LOGIS), penalized logistic regression (LOGIS_PEN), random forest (RF), discriminant analysis (DIS), KNN, SVM and TREE methods are very fast. On average, they use less than 1 second to process the data. The deep learning methods and boosting methods take significantly more time to train. The deep learning methods take 11 seconds to train on average. Interestingly, the training times for different networks are approximately the same, irrespective of whether the network is relatively large (DEEP1) or small (DEEP3). This is mainly because of the parallel computing implemented in the tensorflow software.
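The IQR used to summarize the computing times can be computed as sketched below. Percentile conventions differ between software packages; the linear-interpolation rule used here is an assumption, and the timing values are hypothetical rather than taken from Table 4.

```python
def percentile(sorted_vals, q):
    """Percentile (q in [0, 100]) of a sorted list using linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    s = sorted(values)
    return percentile(s, 75) - percentile(s, 25)

# Hypothetical training times in seconds over five repetitions.
times = [10.0, 11.0, 12.0, 13.0, 14.0]
spread = iqr(times)
```

For these five values the 25th and 75th percentiles are 11 and 13 seconds, so the IQR is 2 seconds. Unlike the range, the IQR is not affected by a single unusually slow run.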

IV. CONCLUSION
The early detection of PD is essential for better understanding the causes of the disease, initiating therapeutic interventions, and developing appropriate treatments. This study proposed a deep learning model to automatically discriminate between normal individuals and patients affected by PD based on premotor features (i.e., Rapid Eye Movement (REM) sleep Behaviour Disorder (RBD) and olfactory loss). The proposed deep learning model showed good detection capacity, reaching an accuracy of 96.45%. This is mainly due to the ability of the deep learning model to learn linear and nonlinear features from PD data without the need for hand-crafted feature extraction. Results showed that the designed deep learning model offers superior detection performance compared to the twelve considered machine learning models in discriminating normal people from patients who have Parkinson's disease. The boosting methods also provide comparable performance. Even though deep learning offers superior performance compared to the machine learning models, it is hard to say that deep learning dominates the others. This is because we designed the deep learning model using a small PD dataset collected from 584 individuals (183 healthy and 401 early PD). However, deep learning is expected to demonstrate its capacity as the data become larger and more complex over time. Accordingly, the outcome of this work can be viewed as a promising first step towards the application of cutting-edge research to early disease detection.

YING SUN received the Ph.D. degree in statistics from Texas A&M University, in 2011. She held a two-year postdoctoral research position at the Statistical and Applied Mathematical Sciences Institute, University of Chicago. She was an Assistant Professor with The Ohio State University for a year before joining the King Abdullah University of Science and Technology (KAUST), in 2014. 
At KAUST, she established and leads the Environmental Statistics Research Group, which works on developing statistical models and methods for complex data to address important environmental problems. She has made original contributions to environmental statistics, in particular in the areas of spatiotemporal statistics, functional data analysis, visualization, and computational statistics, with an exceptionally broad array of applications. She received two prestigious awards: the Early Investigator Award in Environmental Statistics presented by the American Statistical Association and the Abdel El-Shaarawi Young Research Award from the International Environmetrics Society.