Machine Learning Algorithms and Quantitative Electroencephalography Predictors for Outcome Prediction in Traumatic Brain Injury: A Systematic Review

Recent developments in the field of machine learning (ML) have led to a renewed interest in the use of electroencephalography (EEG) to predict the outcome after traumatic brain injury (TBI). This systematic review aims to determine how previous studies have taken into consideration the important modeling issues for quantitative EEG (qEEG) predictors in developing prognostic models. A systematic search in the PubMed and Google Scholar databases was performed to identify all predictive models for the extended Glasgow outcome scale (GOSE) and Glasgow outcome scale (GOS) based on EEG data. Fourteen studies were identified that evaluated ML algorithms using qEEG predictors to predict outcome in patients with moderate to severe TBI. In each model, a maximum of five qEEG predictors were selected to determine the association between these parameters and favorable or unfavorable predicted outcomes. The most common ML technique used was logistic regression, but the algorithms varied depending on the types and numbers of qEEG predictors selected in each model. The qEEG variability of the relative and absolute band powers was the most common qEEG predictor included in the models (46%), followed by total EEG power of all frequency bands (31%), EEG reactivity (31%) and coherence (15%). Model performance was often quantified by the area under the receiver operating characteristic curve (AUROC) rather than by accuracy rate. Various ML models have demonstrated great potential, especially using qEEG predictors, to predict outcome in patients with moderate to severe TBI.


I. INTRODUCTION
Traumatic brain injury (TBI) is, by definition, a medical condition caused by external mechanical forces that directly impact the brain. TBI occurs through various mechanisms, including road-traffic accidents, military blast injuries, blunt-object trauma, falls and sport-related concussion [1]. TBI can be classified by its severity. The Glasgow Coma Scale (GCS) is a neurological scale used to grade the severity of TBI into three categories: mild (GCS 13-15), moderate (GCS 9-12) and severe (GCS < 9) [2]-[4].
The associate editor coordinating the review of this manuscript and approving it for publication was Junhua Li.
TBI results in lasting, often permanent effects not only on brain structure but also on morbidity, memory, emotion, attention, executive function and personality, as well as social consequences that significantly alter quality of life and productivity [5], [6].
In clinical practice, numerous neuroimaging techniques are extensively used in the diagnosis, prognosis and management of patients with TBI-related brain dysfunction. These techniques assist in the detection of injury for neurological evaluation, in treatment planning, and in outcome prediction [7], [8]. Generally, structural imaging techniques (e.g., computed tomography (CT) and magnetic resonance imaging (MRI)) play a role in critical diagnosis and management, while functional imaging techniques (e.g., functional MRI (fMRI), positron emission tomography (PET), diffusion tensor imaging (DTI)), electroencephalography (EEG) and magnetoencephalography (MEG) act as promising tools for clarifying pathophysiology, symptom genesis and mechanisms of recovery [9], [10].
In investigating the timeline of TBI-related abnormalities and subsequent recovery, CT/MRI alone have been found inadequate, largely because no single such technique provides sufficient sensitivity and specificity. EEG and MEG are therefore better suited for evaluating electrophysiological changes (i.e., slowing, missing or interrupted brain circuits, alterations of surviving structures) [9], [10]. MEG and EEG are non-invasive electrophysiological methods, and both are able to evaluate cerebral cortex activity with superior temporal resolution (i.e., in milliseconds). However, EEG holds particular promise for prognostic evaluation because it supports diagnosis, monitoring and prognosis at different stages of brain injury, including in critically ill patients [11]. EEG records the electrical activity of the brain using noninvasive electrodes positioned on the scalp, and it is relatively cheap, portable and easy to handle compared with MEG, which has limited use in routine clinical evaluation of TBI patients [12].
Recent developments in the field of machine learning (ML) have led to a renewed interest in the use of EEG for accurate outcome prediction, given the sensitivity of this technique [13]-[18]. ML, a field of computer science, is a subdivision of artificial intelligence (AI) that enables algorithms to learn patterns in large, complex datasets without being explicitly programmed [19], [20]. ML models are generally grouped into three categories: (i) supervised; (ii) unsupervised; and (iii) semi-supervised learning. ML techniques have the potential to enhance the use of electrophysiological data (i.e., EEG or MEG) by allowing diagnosis and prognosis (i.e., prediction) at the single-subject and multi-subject levels [21]. Numerous studies have attempted to reveal the efficacy of multivariate prediction models in systematically identifying the most prominent quantitative EEG (qEEG) predictors that robustly predict TBI outcomes [18], [22]-[28]. qEEG is a method of analyzing brain electrophysiological data by applying modern mathematical and statistical analyses (e.g., the Fourier transform, wavelet analysis) to traditional EEG recordings to extract quantitative patterns that may contribute diagnostic information and/or reveal cognitive deficiencies.
In addition, research to date has sought to determine whether current ML models using EEG data can provide reliable outcome prediction with acceptable performance measures in relation to TBI outcome. For accurate outcome prediction, multiple predictors (i.e., age, clinical variables, GCS, imaging and functional data) need to be included in a prognostic model, because single factors do not provide sufficient predictive value to discriminate patient outcomes. Since the work of Jennett et al. [29] and other studies published on outcome prediction in TBI, the Glasgow Outcome Scale (GOS) has been accepted and used for over forty years to evaluate TBI patient outcomes [30].
Predictive models are statistical models that employ ML techniques, combining two or more pieces of patient clinical data to learn features from a training set and ultimately predict the outcome in TBI patients. For a predictive model to be clinically useful, it must fulfill two requirements: it must be clinically valid and methodologically valid [31]. This systematic review aims to determine how previous studies have taken into consideration four important modeling issues, to better understand the potential of qEEG features for developing a predictive model. The modeling issues are (1) the study population, (2) the choice of qEEG predictors and outcome, (3) model development, and (4) model validation.

II. METHODOLOGY
This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [32], [33]. A comprehensive search strategy was developed using the PubMed and Google Scholar databases without applying any specific date range to identify all relevant studies up to April 30, 2020.

A. SEARCH STRATEGY
The search was performed from January to April 2020, by creating search syntax with the following keywords in the PubMed (http://www.ncbi.nlm.nih.gov/pubmed) database: ''machine learning'', ''artificial intelligence'', ''predict'', ''traumatic brain injury'' and ''electroencephalography''. Related articles in English language published up to April 30, 2020 were evaluated. In the PubMed, we inserted the search terms by combining the Medical Subject Headings (MeSH) to find other relevant keywords for machine learning.
Then, we applied truncation to the word ''predict*'' to search for variations or synonyms of the word stem. Finally, we entered the following search terms into PubMed using the Boolean operators OR and AND: (((((''machine learning'') OR ''artificial intelligence'') OR predict*) AND ''traumatic brain injury'') AND ''electroencephalography''). This search returned 69 articles. Another search syntax combining ''electroencephalography'', ''machine learning'' OR ''artificial intelligence'' OR ''predict'' ''traumatic brain injury'' was entered into the Google Scholar database, which generated 499 results after excluding review papers (see Figure 1).

B. STUDY SELECTION
Articles were included in the systematic review if the following conditions were met: (a) the article was written in English; (b) the article was an original article in a peer-reviewed journal; (c) the article reported on an ML model and EEG data for application in TBI outcome prediction; (d) the article evaluated the performance of the ML technique used; (e) the article described the qEEG predictor variables, including the technique applied for feature extraction; (f) the study population was limited to a TBI group (i.e., mild, moderate, severe).
Articles were excluded if any of the following criteria were met: (a) the article did not report an original contribution on ML and EEG for outcome prediction in TBI (e.g., the paper reported on the application of ML and EEG for seizure outcome prediction); (b) the full-text article was not available; (c) the article was a conference abstract, review paper, poster presentation or letter to the editor; (d) the article was reported in a language other than English; (e) the article involved a non-human study; (f) the article reported ML for TBI outcome prediction using other neuroimaging modalities (i.e., MRI, fMRI, DTI, PET, MEG, ERP). These articles were excluded because the application of ML for TBI outcome prediction using these modalities has been reviewed elsewhere [15], [19], [34]-[38]. After screening the titles and abstracts and removing duplicate articles (n = 384), 46 articles were selected for full-text review. No specific date range was applied in the search. The selection of full-text articles for inclusion was conducted independently by two authors (i.e., N. S. E. Mohd Noor and H. Ibrahim) for further analysis.

C. DATA EXTRACTION AND ANALYSIS APPROACH
For this systematic review, data extraction focused on four main aspects: (i) sample size; (ii) qEEG predictor variables; (iii) predictive model development; and (iv) predictive model validation. We performed qualitative and quantitative analysis of the included studies. Data extracted from each article were: (a) year of publication; (b) TBI groups (e.g., mild, moderate and severe); (c) number of TBI patients; (d) type of ML model used (e.g., supervised, unsupervised techniques); (e) qEEG input features; (f) size of training data; (g) size of testing data; (h) validation method (e.g., use of a cross-validation method, if applicable); (i) performance of the ML model in predicting recovery of TBI outcomes (e.g., classification accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC)); (j) EEG feature extraction method; (k) EEG data types (e.g., resting-state or continuous EEG, task-related EEG); (l) outcome measures (e.g., Glasgow Outcome Scale (GOS), survival, disability, mortality, etc.). There was no restriction on the age of the target group (i.e., children, adults or elderly) and no time restriction for the evaluation of the outcomes.

D. DATA EXTRACTION AND ANALYSIS APPROACHES FOR QUANTITATIVE EEG PREDICTORS AND OUTCOME
Data extraction on qEEG predictors and outcomes concentrated on the number and type of qEEG predictors in each prognostic model construction. Data extracted from each article were: (i) number and type of selected qEEG predictors; (ii) type of EEG recording; (iii) method for extracting each qEEG predictor; and (iv) time of assessment of outcome. The outcome measure used in all studies was the GOS score, as tabulated in Table 1.

III. RESULTS

A. SEARCH RESULTS
After the elimination of duplicates, a total of 568 publications in the PubMed and Google Scholar databases were identified. After title and abstract screening, 46 full-text articles were assessed for eligibility. Twelve studies described a modeling technique for prediction of outcome in TBI but did not apply ML techniques. Ten studies applied ML algorithms to distinguish TBI patients from healthy subjects but did not report or evaluate the recovery outcome. Ten studies evaluated ML approaches for TBI outcome prediction but did not include EEG data as a predictor of outcome. A total of 14 studies fulfilled all the inclusion criteria for data extraction and qualitative synthesis (please refer to Figure 1).

B. STUDY POPULATION
From the 14 studies, 85% of the predictive models were derived from populations combining adults and children (age range 5.5 to 85 years), and 14% were derived from an adult population (i.e., 44-67 years old). Regarding the severity of TBI studied, ten of the fourteen (i.e., 71%) predictive models included severe TBI patients, while four of the fourteen (i.e., 29%) included moderate to severe TBI patients. This systematic review included studies presenting predictive models for patients with moderate to severe TBI because, so far, no studies have reported on mild TBI patients as the target group for prognostic modeling. The terms predictive model and prognostic model are used interchangeably. Eleven of the publications developed models on relatively small sample sizes (i.e., fewer than 100 patients); the largest study included 680 severe TBI patients [40].

C. QUANTITATIVE EEG PREDICTORS AND OUTCOME
There were multiple qEEG features selected and tested for inclusion in the predictive models (see Table 2).

1) qEEG PREDICTORS
The description of each of the qEEG features derived from the fourteen studies is summarized as follows:
1) Absolute power per band: The absolute power of each band was calculated by integrating all the power values within each frequency band: delta δ (0.5-4 Hz); theta θ (4-8 Hz); alpha α (8-13 Hz); and beta β (13-30 Hz).
2) EEG total power: The EEG total power is the summation of all power bands from 0.5 Hz to 30 Hz.
3) Relative power per band: The relative power for each band was derived by expressing the absolute power in each frequency band as a percentage of the absolute power summed across the four frequency bands. For example, the relative power between two frequencies f1 and f2 within the 0.5 to 30 Hz range is given by Equation (1):

P_rel(f1, f2) = P(f1, f2) / P(0.5 Hz, 30 Hz) × 100%   (1)

where P refers to power, and f1 and f2 represent the low and high frequencies, respectively [53].
4) Variability per frequency band: The variability in the power of each frequency band was estimated from the ratio of the median absolute deviation (MAD) to the median power [41], [45] in each frequency band and electrode position for each subject.
5) SEF90%: The spectral edge frequency is the frequency below which 90% of the total EEG power is located.
6) Frequency band-specific amplitude: The amplitude at each frequency band was expressed as a percentage of the total EEG spectrum across EEG electrodes.
7) Connectivity analysis: For assessing connectivity in the brain network, the magnitude-squared coherence and the imaginary part of the coherence were computed for each patient from each EEG electrode pair in the four frequency bands (i.e., δ, θ, α, β). The brain connectivity between two outcome groups (e.g., improved vs. unimproved) was determined through coherence, the imaginary part of coherence, weighted symbolic mutual information and symbolic transfer entropy [47], [54]-[56].
8) Coherence: The coherence was determined by calculating the mean of all magnitude-squared coherence values of the signals between all combinations of EEG channels [45], [47]. Coherence measures indicate the functional relationship across brain regions. The magnitude-squared coherence of signals w(t) and r(t) at frequency f is defined as:

C_wr(f) = |M_wr(f)|^2 / (M_ww(f) M_rr(f))

where M_ww(f) and M_rr(f) are the power spectral densities of w(t) and r(t), respectively, and M_wr(f) is their cross-spectral density [45], [47]. The imaginary part of coherence was considered to overcome the volume conduction effects (i.e., transmission of electrical signals from a primary current through brain tissue) that affect the real part of coherence. To correctly calculate connectivity across the EEG scalp, the influence of volume conduction and the differences between groups have to be considered.
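The spectral predictors above (band powers, relative power, SEF90% and channel-pair coherence) can be sketched with SciPy. This is a minimal illustration, not the pipeline of any reviewed study: the sampling rate, band edges and function names are assumptions.

```python
import numpy as np
from scipy.signal import welch, csd

FS = 250  # assumed sampling rate (Hz); the reviewed studies vary
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_powers(x, fs=FS):
    """Absolute power per band (Welch PSD integral), relative power and total."""
    f, pxx = welch(x, fs=fs, nperseg=2 * fs)
    df = f[1] - f[0]
    abs_p = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f < hi)
        abs_p[name] = np.sum(pxx[mask]) * df   # integrate PSD over the band
    total = sum(abs_p.values())
    rel_p = {name: p / total for name, p in abs_p.items()}  # Eq. (1), as a fraction
    return abs_p, rel_p, total

def sef(x, fs=FS, fraction=0.90):
    """Spectral edge frequency: frequency below which 90% of the power lies."""
    f, pxx = welch(x, fs=fs, nperseg=2 * fs)
    mask = (f >= 0.5) & (f <= 30.0)
    f, pxx = f[mask], pxx[mask]
    cum = np.cumsum(pxx) / np.sum(pxx)
    return f[np.searchsorted(cum, fraction)]

def coherence_pair(x, y, fs=FS):
    """Magnitude-squared coherence and its imaginary part for one channel pair."""
    f, m_xy = csd(x, y, fs=fs, nperseg=2 * fs)   # cross-spectral density M_wr
    _, m_xx = welch(x, fs=fs, nperseg=2 * fs)    # power spectral density M_ww
    _, m_yy = welch(y, fs=fs, nperseg=2 * fs)    # power spectral density M_rr
    msc = np.abs(m_xy) ** 2 / (m_xx * m_yy)
    icoh = np.imag(m_xy / np.sqrt(m_xx * m_yy))  # robust to volume conduction
    return f, msc, icoh
```

A full study would compute these features per electrode (or electrode pair) and per epoch, then aggregate across the recording.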

9) Entropy: Entropy is an elementary measure of information theory [58] that is used to discriminate among complex systems. Entropy was applied to EEG signals to measure the randomness of the outputs of the complex system of neuronal networks underlying coma and consciousness.
10) Approximate entropy (ApEn): ApEn was introduced by Pincus as a measure of system complexity. Theoretically, ApEn measures the logarithmic likelihood that patterns of data which are close to each other remain close at the next incremental comparison. Pincus [59], [60] described ApEn with the following algorithm:
• STEP 1: Given a time series u(1), u(2), ..., u(N) of N data points, form the sequence of vectors x(i) = [u(i), u(i + 1), ..., u(i + m − 1)].
• STEP 2: Fix m as an integer and r as a positive real number, where m represents the vector length and r indicates a filtering level.
• STEP 3: Define the distance d[x(i), x(j)] between the vectors x(i) and x(j) as the maximum difference in their respective scalar components.
• STEP 4: For each i, count the number of j such that d[x(i), x(j)] ≤ r, and divide this count by (N − m + 1) to obtain C_i^m(r).
• STEP 5: Compute Φ^m(r) = (N − m + 1)^(−1) Σ_i ln C_i^m(r).
• STEP 6: Define approximate entropy (ApEn) as [45]:

ApEn(m, r, N) = Φ^m(r) − Φ^(m+1)(r)

where m and r are fixed as in STEP 2.
11) Permutation entropy (PerEn): In contrast to ApEn, PerEn uses a symbolic transform in which the signal is represented by a sequence of discrete symbols. Each embedded vector of the signal is mapped to one of the m! possible ordinal patterns (i.e., symbols 1 to m!) according to the rank order of its amplitudes, and the probability density of these symbols is evaluated to obtain the entropy. Permutation entropy can be expressed as:

PerEn = − Σ_i p_i ln p_i

where p_i is the probability of the i-th symbol [47].
12) Microstate analysis: Microstate analysis is especially useful for classifying transitive brain states (e.g., Unresponsive Wakefulness Syndrome (UWS) / Minimally Conscious State (MCS)), for probing the brain in different sleep stages, and for understanding the primary differences in brain function of patients with different severities of coma. Microstate analysis was performed by analyzing the topographical maps of electrical potentials across the EEG electrodes as well as the temporal evolution of the selected topographical maps.
13) Complex network analysis: Complex network analysis is a part of connectivity analysis; it employs connectivity measures to represent complex systems as networks and extracts key information from the topographical maps of these networks. It represents the structural-functional relationship by defining functional connections from a spatial map of the brain. In EEG analysis, networks have been created by representing the electrode positions as nodes and forming links between the nodes as functional connections. Graph-theoretical statistics (e.g., clustering coefficient, characteristic path length, modularity, participation coefficient and network-level modular span) have been used to assess and compare the topology of these networks.
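As an illustration of the symbolic transform behind PerEn, a minimal NumPy implementation might look as follows; the embedding dimension `m`, delay and normalization choice are illustrative defaults, not parameters taken from the reviewed studies.

```python
import math
import numpy as np

def permutation_entropy(x, m=3, delay=1, normalize=True):
    """Permutation entropy of a 1-D signal.

    Each length-m window (with the given delay) is mapped to its ordinal
    pattern, i.e., the rank order of its amplitudes; the entropy of the
    pattern distribution is -sum(p_i * ln p_i) over the observed symbols.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * delay  # number of embedded vectors
    counts = {}
    for i in range(n):
        window = x[i:i + (m - 1) * delay + 1:delay]
        pattern = tuple(np.argsort(window))  # one of the m! ordinal symbols
        counts[pattern] = counts.get(pattern, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / n
    h = -np.sum(probs * np.log(probs))
    if normalize:
        h /= math.log(math.factorial(m))  # scale to [0, 1] by the maximum ln(m!)
    return h
```

A monotonic signal produces a single ordinal pattern (entropy 0), while white noise spreads the probability over all m! patterns (entropy near 1 after normalization).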
14) EEG reactivity (EEG-R): EEG-R is defined as diffuse and temporary changes in EEG activity in response to external stimuli (e.g., sensory, visual or auditory stimuli), and it is useful for prognostication in patients with impaired consciousness [11]. EEG-R was assessed according to two categories: (1) present (amplitude and/or frequency changes); and (2) absent [61]. Loss of specific continuous EEG (cEEG) features over time, especially in bedridden TBI patients, is associated with outcome. From our observation, qEEG variability of the relative and absolute band powers was the most common qEEG predictor included in the models (i.e., 46%), followed by the total EEG power of all frequency bands between 0.5-20 Hz (i.e., 31%), EEG reactivity (EEG-R) (i.e., 31%) and coherence for all possible combinations of EEG channels (i.e., 15%). In 8% of the models, other qEEG predictors were included (e.g., microstate, ApEn, transfer entropy (TE), connectivity, SEF90%, PerEn or complex network analysis).
One of the studies included clinical variables (i.e., level of consciousness (LOC), and LOC plus post-traumatic amnesia (PTA)) to compare the predictive performance of models with and without added qEEG predictors, which increased the predictive power of the model [40]. Normal EEG sleep characteristics (i.e., the presence of PDR, N2 transients and delta-band (δ) activity) were examined in one study [52], and interactions between predictors (i.e., LOC variables, IMPACT variables, age, mean arterial blood pressure (MAP)) and qEEG parameters were investigated in two manuscripts [40], [52].
2) OUTCOME PREDICTION
Some variables needed transformation for further analysis. At the endpoint of the prediction model, the functional outcome was a categorical variable (i.e., dichotomous GOS: favorable/unfavorable, good/bad, positive/negative or awakening/death). Five studies used the GOS as the primary output, but the time of outcome assessment varied substantially: GOS at 12 months after injury (1 study), GOS between 6 and 12 months (1 study), GOS at 6 months (1 study), GOS at 17 months (1 study) and GOS at 30 days after injury (1 study). In three studies, the GOSE was used as the model output (dichotomous: favorable/unfavorable, good/bad or positive/negative outcomes), at different times of assessment (i.e., at 3 and 12 months after injury). Other categorizations of the outcome variable were also made, especially for predicting the recovery of TBI patients in a vegetative state (dichotomous: UWS/MCS or improved/unimproved) (3 studies). For EEG-R, the outcome variables were dichotomous (present/absent, yes/no or reactive/unreactive) at 5 months on the level of cognitive functioning scale (LCFS) (1 study). Please refer to Table 3.
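The dichotomization step above can be made concrete with a small sketch. The cut-points below follow one common convention (GOSE 1-4 vs. 5-8, GOS 1-3 vs. 4-5); the exact splits vary across the reviewed studies, so treat these thresholds as illustrative assumptions.

```python
def dichotomize_gose(gose: int) -> str:
    """Map the extended Glasgow Outcome Scale (1-8) to a binary label.
    Assumed cut-point: GOSE 1-4 unfavorable, GOSE 5-8 favorable."""
    if not 1 <= gose <= 8:
        raise ValueError("GOSE must be in 1..8")
    return "favorable" if gose >= 5 else "unfavorable"

def dichotomize_gos(gos: int) -> str:
    """Map the GOS (1-5) to a binary label.
    Assumed cut-point: GOS 1-3 unfavorable, GOS 4-5 favorable."""
    if not 1 <= gos <= 5:
        raise ValueError("GOS must be in 1..5")
    return "favorable" if gos >= 4 else "unfavorable"
```

Making the mapping an explicit, tested function keeps the outcome definition consistent between model training and evaluation.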
In six studies (i.e., 43%), the FFT was applied to compute the amplitude at each frequency band. Van den Brink et al. [46] used the FFT to create a metric of global frequency band-specific power across electrodes and across frequencies within four selected frequency bands. Five studies (i.e., 36%) used PSD analysis to compute the absolute power, total power, relative power and SEF90% within each frequency band: δ (0.5-4 Hz); θ (4-8 Hz); α (8-13 Hz); and β (13-20 Hz). Using this method, Mikola et al. [44] extracted 186 EEG features (e.g., normalized variability of the relative power in the α and fast θ bands, normalized variability of the mean frequency, SEF90%, and total power in the β, fast θ and α bands) for model training and testing (i.e., classification and prediction) with an LDA classifier predicting positive or negative outcomes.
In two studies, coherence analysis [47], [48] was used to compute the magnitude-squared coherence and the imaginary part of coherence of each pair of electrodes in the θ, δ, α and β bands, and two studies used connectivity analysis, applying a different approach in each study [17], [46]. One study employed Hilbert transform analysis [46] to compute connectivity as a complex analytic signal X over time t across scalp EEG electrodes, while Chennu et al. [17] calculated the debiased weighted phase lag index (dwPLI) at the peak frequency of the signal oscillation across all EEG channels to represent the connectivity between channel pairs. The authors constructed symmetric 173 × 173 dwPLI connectivity matrices for each selected frequency band (e.g., δ, θ and α).
Stefan et al. [47] calculated the global field power (GFP) and implemented a modified k-means clustering algorithm to extract four global microstate classes from the EEG topographic maps in the different outcome groups (i.e., improved vs. unimproved). In the same study, the authors also determined the entropy of the EEG signals to quantify the unpredictability of the neural networks underlying consciousness. They reported that ApEn and PerEn provided valuable prognostic information on disorders of consciousness (DOC) due to severe head injury. Haveman et al. [45] trained an RF classifier that combines multiple individual classification trees built on bootstrapped samples. With bootstrapped samples, each individual tree randomly selects a subset of EEG features and forms a split at each node. In their study, modeling was performed by training an RF classifier with combinations of all qEEG features, age and mean arterial pressure (MAP) to predict the outcome at 24, 48, 72 and 96 hours after TBI. Prediction results were determined from the percentage vote of the different classification trees [62].
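The RF scheme described above (bootstrapped trees, random feature subsets at each split, majority voting) can be sketched with scikit-learn on synthetic data; the feature table and the rule generating the outcome labels are invented for illustration and do not reproduce the cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a qEEG feature table: rows = patients, columns =
# hypothetical features (band powers, variability, age, MAP, ...).
n_patients, n_features = 120, 10
X = rng.standard_normal((n_patients, n_features))
# Assumed ground truth: outcome driven by two features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(n_patients) > 0).astype(int)

# An RF aggregates classification trees grown on bootstrapped samples;
# each split considers a random feature subset, and the forest predicts
# by majority vote across the trees.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:80], y[:80])
print("held-out accuracy:", round(clf.score(X[80:], y[80:]), 2))
```

`feature_importances_` on the fitted forest would give the kind of predictor-ranking ("hierarchy of contribution") that the cited study reports.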
Hack et al. [40] developed a prognostic model using a LASSO classifier, i.e., a regularized logistic regression model. In this approach, LASSO selected a weighted combination of linear and non-linear EEG features together with selected clinical features. Chennu et al. [17] constructed SVM classifiers trained on two EEG predictors (i.e., δ modularity and clustering coefficients) to predict future dichotomized GOSE outcomes in individual patients (accuracy = 82%).
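A LASSO-style logistic regression can be sketched with scikit-learn's L1-penalized `LogisticRegression`; the data, the informative feature indices and the regularization strength `C` are illustrative assumptions, not values from the cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((150, 20))   # 20 candidate qEEG/clinical features
# Assumed ground truth: only features 0 and 3 carry signal.
y = (X[:, 0] - X[:, 3] + 0.5 * rng.standard_normal(150) > 0).astype(int)

# The L1 (LASSO) penalty shrinks uninformative coefficients to exactly
# zero, so feature selection is embedded in the logistic model itself.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.2)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print("features kept by the L1 penalty:", selected)
```

Tuning `C` (typically by cross-validation) trades off sparsity against fit: smaller `C` means a stronger penalty and fewer surviving predictors.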

V. PREDICTIVE MODEL VALIDATION
For model validation, fourteen studies split the sample into training and testing sets. Six of these performed a split-sample analysis using either 70% of the data for training and 30% for testing, or 67% for training and 33% for testing. In the other eight studies, the split-sample analysis was not clearly reported. Concerning the evaluation of model performance, discrimination was reported in almost all of the models (13/14 studies, 93%) through the area under the curve (AUC). Accuracy rates were presented in four studies; sensitivity and specificity were reported in seven studies. Sensitivity, also called the true positive rate (TPR), describes the probability of measuring a positive value for a positive outcome. The TPR is given in Equation (7):

TPR = TP / (TP + FN)   (7)

where TP and FN denote true positives and false negatives, respectively (i.e., TP: actual positive values correctly predicted as positive; FN: actual positive values mistakenly predicted as negative). The false positive rate (FPR) can be calculated as (1 − specificity) or (1 − TNR). The specificity, or true negative rate (TNR), is given by Equation (8):

TNR = TN / (TN + FP)   (8)

where TN and FP denote true negatives and false positives, respectively (i.e., TN: actual negative values correctly predicted as negative; FP: actual negative values incorrectly predicted as positive).
In binary classification, accuracy is also used to evaluate model performance. The accuracy rate is the proportion of patients with a certain outcome that is correctly predicted. The accuracy is determined by the following definition:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The receiver operating characteristic (ROC) curve is a graphical representation of the performance of a classifier or predictor as the classification threshold is varied. The information contained in the ROC curve is summarized by its most popular index, the AUC. The AUC provides a cumulative measure of performance across all possible classification thresholds [63], [64]. Since the AUC is a portion of the area of the unit square, its value always lies between 0 and 1.0. A model with perfect discrimination has an AUC of 1.0, equivalent to a sensitivity and specificity of 100%.
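The threshold-dependent metrics (sensitivity, specificity, accuracy) and the threshold-independent AUROC can be computed with scikit-learn; the labels and scores below are made-up illustrative values, not data from any reviewed study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])                 # 1 = unfavorable outcome
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1])  # model probabilities
y_pred = (y_score >= 0.5).astype(int)                        # one classification threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                  # TPR, Eq. (7)
specificity = tn / (tn + fp)                  # TNR, Eq. (8)
accuracy = (tp + tn) / (tp + tn + fp + fn)    # threshold-dependent
auroc = roc_auc_score(y_true, y_score)        # integrates over all thresholds
print(sensitivity, specificity, accuracy, auroc)
```

Note that sensitivity, specificity and accuracy all change if the 0.5 threshold is moved, whereas the AUROC is computed from the raw scores and does not.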

3) PREDICTIVE MODEL VALIDATION PROCEDURES
Three studies performed classification using a leave-one-out cross-validation (LOOCV) procedure, in which each iteration uses a single sample for testing and the remaining samples for training. Two studies split the sample into k folds (i.e., 1-fold and 4-fold CV) [17], [43], using the k-th fold as the testing set and the remaining (k − 1) folds as the training set [21]. One study used 10-fold stratified CV to avoid over-fitting and circular analysis. In the other eight studies, the cross-validation procedure was not clearly described. Table 4 reports the summary of key findings regarding the predictive value of qEEG in predicting TBI outcomes.
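Both validation schemes can be sketched with scikit-learn; the toy feature matrix, labels and classifier are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((60, 5))                          # toy feature matrix
y = (X[:, 0] + 0.4 * rng.standard_normal(60) > 0).astype(int)
clf = LogisticRegression()

# LOOCV: each iteration holds out a single patient for testing and
# trains on all the others (N iterations in total).
loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

# Stratified k-fold CV: each fold preserves the cohort's outcome
# proportions, guarding against fold-specific class imbalance.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
kfold_acc = cross_val_score(clf, X, y, cv=skf).mean()
print(round(loo_acc, 2), round(kfold_acc, 2))
```

LOOCV makes maximal use of the small cohorts typical of the reviewed studies but is costly and high-variance; stratified k-fold is the usual compromise.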

4) DESCRIPTION OF PREDICTIVE MODEL VALIDATION
Lee et al. [52] developed a predictive model to predict unfavorable functional outcome (GOSE 1-4: death, unresponsive wakefulness or severe disability) versus favorable outcome (GOSE 5-8: moderate disability or good recovery) at 3 months [52]. The authors developed the model from 152 moderate to severe TBI patients and validated it in 46 patients from the same sample. The absence of a PDR, the absence of N2 sleep transients and a predominantly δ-activity background during 24-hour cEEG were the qEEG predictors of the model. They reported a sensitivity and specificity for the unfavorable outcome of 94% and 49%, with an AUC of 0.77. With the presence of these qEEG features within 72 hours, the model predicted good outcomes with a specificity of 96% and a sensitivity of 28%, with an AUC of 0.76.
Haveman et al. [45] developed three prognostic models. The first model used eight qEEG features; the second added age, a clinical variable (i.e., mean arterial pressure (MAP)) and nine IMPACT predictor variables; and the third used only the IMPACT predictors, for model comparison. From these findings, they demonstrated the contribution of qEEG predictors to enhancing the prediction of poor outcomes and showed the potential of RF models to predict poor outcomes. They reported the hierarchical contribution of the qEEG features (i.e., eight qEEG predictors), age, MAP and other clinical factors (i.e., CT scan images, pupil reactivity, hypotension, hypoxia) to the prediction of good or poor outcome, selecting the best predictive model at 72 and 96 hours after TBI in a simple and user-friendly way.
Stefan et al. [47] developed a predictive model by fitting a generalized linear model (GLM) on training data and testing the model on test data. The GLM discriminated the patients into two classes (i.e., improved vs. unimproved) to determine the predictive power of the selected qEEG features (i.e., microstate, ApEn, power in the α and δ frequencies, connectivity, complex network analysis). The 10-fold stratified cross-validation was implemented on the training set to prevent the model from fitting noise and to avoid circular analysis. The authors revealed that coherence in the θ band yielded the highest classification accuracy (AUC_θ = 78 ± 2%, q < 0.0001) compared with α coherence (AUC_α = 62 ± 4%, q < 0.0001) and β coherence (AUC_β = 67 ± 1%, q < 0.0001). For complex network analysis, they found that clustering coefficients from β coherence (AUC_β = 82 ± 1%, q < 0.0001) and α coherence (AUC_α = 82 ± 2%, q < 0.0001) performed well at discriminating patients into the two outcome classes (UWS to MCS, or death or a permanent DOC) at varied threshold values.
Chennu et al. [17] built a predictive model using a supervised ML algorithm (i.e., SVM) trained with 4-fold cross-validation, in which each training and validation set contained homogeneous proportions of each subject group to avoid fold-specific effects. Seven qEEG features were calculated for three selected EEG frequency bands (i.e., δ, θ, α): (1) mean relative power over all channels; (2) clustering coefficients; (3) median connectivity across all channel pairs; (4) modularity; (5) participation coefficients; (6) characteristic path length; and (7) modular span, resulting in a total of 21 metrics. They reported the ability of each qEEG metric to discriminate positive and negative recovery outcomes by calculating the AUC from ROC analysis.
Patients with positive outcomes had higher microscale clustering coefficients in their δ network connectivity and stronger local topographical connectivity in the δ band, with an AUC of 0.78. The authors confirmed that δ modularity and the clustering coefficient of the δ network significantly predicted future GOSE, separating positive and negative outcomes with 82% accuracy, 92% sensitivity, and 64% specificity.
Vespa et al. [43] reported three different models. The outcomes were: predicting alive vs. dead (i.e., GOS score 5 vs. GOS score 1) and predicting good vs. bad outcome (GOS score 4 or 5 vs. GOS score 1 or 2), respectively. To validate the two outcomes, they used 1-fold cross-validation for each model in 89 TBI patients. The predictors were divided into two categories: (1) a qEEG predictor (mean PAV over the first 3 days post-injury); and (2) six clinical predictors (GCS score, pupillary response to light, patient age, computed tomography findings, and early hypotension). They reported the discrimination of each model separately for model comparison.

VI. DISCUSSIONS
This systematic review evaluates fourteen predictive models developed in different cohorts of TBI patients (i.e., moderate to severe) that considered qEEG data. The reviewed studies developed multifactorial predictive models as proof of concept for using ML techniques, including qEEG features, to predict outcomes in patients with moderate to severe TBI. Substantial limitations in the development of these predictive models, which restrict their predictive performance, are scrutinized below.

A. STUDY POPULATION AND EEG PREDICTORS
Based on the patient groups reported in the fourteen publications on prognostic models for TBI patients, models for moderate and severe TBI were investigated more frequently than models for mild and moderate TBI. As described by Rapp et al. [65], mild TBI prompts subtle alterations of neuronal oscillatory processes that cannot simply be detected by standard magnetic resonance imaging (MRI) at any stage of injury. For example: (i) shortening of the dendritic arbors of neurons may decrease the power of fast frequencies; (ii) fluctuations in the firing properties of thalamic and cortical neurons may boost θ frequency activity. Therefore, EEG is a promising tool for detecting the physiological changes underlying these processes [12] as well as for predicting the long-term outcomes of patients with TBI [66], [67].
Most of the studies (n = 14) focused on developing prognostic models for the chronic stage of TBI; however, specific qEEG variables obtainable from mild and moderate TBI patients could provide significant prognostic information for predicting outcomes after mild or moderate TBI. In addition, the association between qEEG features and the predictive ability (i.e., predicting poor or good outcome) of prognostic models for these groups (i.e., mild and moderate TBI) has not been clearly established.
Large sample sizes are necessary for the reliable selection of predictors [35], [68], [69]. A small sample size often impairs the performance of a predictive model. Thus, handling such data is a challenge for ML applications seeking the specific associations of qEEG predictors that yield the most accurate prognosis of TBI outcomes [46]. In the present systematic review, the predictive models were developed on relatively small numbers of patients (e.g., n = 16), but since very little is known about the usefulness of any qEEG predictor for the prognosis of TBI patients, the available numbers of patients are still useful for providing a valuable estimate of the predictive performance of each ML model.
Two studies included a large number of EEG predictors: Tolonen et al. [41] considered 59 qEEG predictors and Mikola et al. [44] included 186 qEEG predictors, while the remaining studies used 3 to 5 qEEG predictors for training and model validation. In these twelve studies, the final models included at most five EEG predictor variables each. Haveman et al. [45] demonstrated the positive association between including multiple qEEG parameters and predicting TBI outcomes with an RF model, since no ML models based on multiple qEEG parameters had previously been reported for predicting patient outcome after moderate to severe TBI. In accordance with Tolonen et al. [41], the multiple qEEG parameters were grouped into five major classes (i.e., (i) absolute power; (ii) relative power; (iii) asymmetry; (iv) variability; and (v) others: spectral entropy, burst suppression ratio), and the authors found that 17 of the 59 qEEG parameters were strongly associated with a good outcome. For example, EEG absolute power showed a positive correlation with outcome, as the patients who recovered had stronger EEG amplitudes than those who did not.
In another study, Mikola et al. [44] confirmed that 12 of the 186 qEEG predictors contributed high predictive value, based on the AUROC of each feature. Given the above findings, these numbers of qEEG predictors appear reasonable for developing predictive models without degrading model performance. Consistent with this, predictive models that included five or seven clinical predictors outperformed models that included only three or four predictors [70]. In contrast, Mushkudiani et al. [71], in a review, suggested using a limited set of predictors, because including too many predictors increases the risk of overfitting and thus limits the prognostic value of the model, leading to poor discrimination or calibration.

B. PREDICTIVE MODEL DEVELOPMENT
In predictive model development, ML techniques ranging from linear regression to deep neural networks represent a powerful technology capable of effectively predicting outcomes after TBI to support clinical decision-making. In this systematic review, discriminant analysis, LDA, SVM, linear and logistic regression analyses, network analysis using the LASSO technique, classification trees using the RF technique, and the generalized linear model (GLM) are considered for model development. Regression analysis is the most frequently used. Logistic regression, a supervised learning algorithm, is frequently used to calculate the percentage of correct or incorrect predictions according to the dichotomized output (i.e., good or bad outcome) of the model.
Logistic regression relates a number (n) of qEEG predictors (M = {M_1, . . . , M_n}) to an outcome (K) by multiplying the predictor values with regression coefficients (β = {β_1, . . . , β_n}). These regression coefficients represent the strength of the relation between each qEEG predictor and the dichotomized GOSE or GOS outcome. The output of the regression model is dichotomized into: (i) good outcome, taking the value 1 with probability P; and (ii) bad outcome, taking the value 0 with probability 1 − P. Thus, the relationship between the qEEG predictors and the outcome variable is expressed by the logit transformation of P:

logit(P) = ln(P / (1 − P)) = β_0 + β_1 M_1 + . . . + β_n M_n    (10)

The logistic regression model is fitted with maximum likelihood methods by assuming that the outcome K follows a binomial distribution [72], [73]. An advantage of logistic regression over discriminant analysis is that it requires fewer assumptions; for example, the independent variables do not need to be normally distributed, linearly related, or have the same within-group variance. In addition, regression models are statistically robust, analyze relationships between variables and trends, and handle predictions on continuous as well as categorical variables. In contrast, most authors agreed that other ML approaches (e.g., decision trees (DT), k-nearest neighbors (k-NN), neural networks (NN)) outperformed logistic regression in predicting various disease conditions [19], [20], [34], [35], [69], [74], but there was substantial heterogeneity in the modeling process, input and output variables, and methods for validating predictive model performance.
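The logit transformation in Equation (10) can be illustrated numerically. In this minimal sketch the coefficients and predictor values are invented for illustration; they are not taken from any of the reviewed models.

```python
# Numeric illustration of Equation (10): the inverse logit maps the
# linear combination of qEEG predictors to a probability P of a good
# outcome, and the logit of P recovers that linear predictor exactly.
# Coefficients and feature values below are hypothetical.
import math

def predict_good_outcome(features, beta0, betas):
    """Return P(good outcome) = inverse logit of beta0 + sum(beta_i * M_i)."""
    linear = beta0 + sum(b * m for b, m in zip(betas, features))
    return 1.0 / (1.0 + math.exp(-linear))

M = [0.8, -1.2, 0.3]  # three hypothetical qEEG predictor values M_1..M_3
p = predict_good_outcome(M, beta0=0.5, betas=[1.0, 0.4, -0.7])

# logit(P) = ln(P / (1 - P)) equals the linear predictor beta0 + sum(beta_i * M_i).
logit_p = math.log(p / (1.0 - p))
```

Because the logit is the exact inverse of the logistic function, `logit_p` reproduces the linear predictor (here 0.5 + 0.8 − 0.48 − 0.21 = 0.61) up to floating-point error.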
Supervised ML methods like SVM, LDA, GLM, RF, and logistic regression are classification algorithms, which involve finding a model or function (f) that separates the data into multiple categorical classes (i.e., discrete values). In classification, the algorithm learns from given input data (i.e., labeled data) and then uses this learning to classify new observations. In contrast to logistic regression, SVM is flexible in representing sophisticated relationships but tends to overfit [68], [75]. SVM classifies data points by choosing the ''best hyperplane'' for separating a set of points into two groups. In SVM, a data point is given as a k-dimensional vector (a list of k numbers) and the SVM separates these points with a (k − 1)-dimensional hyperplane. Many hyperplanes might classify the data, but SVM selects the ''best hyperplane'': the one with the largest separation, or margin, between the two classes [76].
In their article, Stefan et al. [47] used a GLM classifier on training and testing data of qEEG predictors (e.g., power in α and δ) to determine the predictive power for two outcome groups (i.e., improved vs. unimproved). Predictive performance was determined by comparing the binary output of the GLM to the actual labels, with unimproved and improved represented by 0 and 1. The GLM is an extension of the linear regression model, which assumes that the outcome given the input features follows a Gaussian distribution. The GLM builds on the basic formula of linear regression (see Equation (11)) for continuous outcomes and on logistic regression for binary outputs.
K = β_0 + β_1 M_1 + . . . + β_n M_n + ε    (11)

Here K is the outcome of an instance, expressed as a weighted sum of its n features with an individual error ε that follows a Gaussian distribution. We note that several studies exploited ML techniques (i.e., LASSO, LDA and RF) to identify or develop more appropriate prognostic models for predicting TBI outcomes. One study developed an RF model that combined multiple individual classification trees, randomly selecting qEEG features at each node [45]. The optimal number of trees (n = 100) was determined by observing the out-of-bag (OOB) error, with the maximum number of nodes set to 20. An RF classifier generates a model with less bias and variance by fitting and averaging several decorrelated decision trees [66].
Decorrelation means that, at each decision node, each tree selects its next branch from a random subset of all available variables, ensuring that the individual trees are heterogeneous and capture variability in the data. The RF classifier is an ensemble learning method and is preferred because of its ability to avoid the overfitting of a single tree [19], [77]. However, a problem occurred when using multifactorial data: the RF models excluded observations with missing data, meaning that other potentially relevant predictors (i.e., clinical or electrophysiological data) that were incomplete across patients were not considered in the modeling process [45].
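An RF setup of the kind reported in [45] can be sketched as follows. The tree count (100) and node cap (20) mirror the numbers in the text; mapping the node cap to `max_leaf_nodes` is an assumption, and the data are synthetic.

```python
# Sketch of a random forest with 100 trees, capped tree size, random
# feature subsets at each split (decorrelation), and the out-of-bag
# (OOB) error as an internal performance check. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees, as reported in the study
    max_leaf_nodes=20,    # cap on nodes per tree (assumed equivalent setting)
    max_features="sqrt",  # random feature subset per split -> decorrelated trees
    oob_score=True,       # score each tree on the samples it never saw
    random_state=1,
).fit(X, y)

oob_error = 1.0 - rf.oob_score_  # OOB error used to tune the forest
```

Because each tree is trained on a bootstrap sample, the OOB error is an essentially free internal validation estimate that requires no separate test set.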
Network analysis (i.e., LASSO) is a supervised ML algorithm based on regression analysis. The LASSO method develops a classifier using a regularized logistic regression model with a LASSO penalty (the L1 absolute value). LASSO includes a term that penalizes model complexity and shrinks less important variables towards zero, producing a sparser and more interpretable model [66], [78]. Hack et al. [40] used the LASSO classifier to demonstrate the performance of an algorithm by developing three predictive models derived from different input features (i.e., LOC only, LOC + amnesia information (i.e., PTA), and LOC + qEEG features). The qEEG-based classification algorithm demonstrated significantly improved predictive power (AUC = 83%) compared with the other two models.
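The shrinkage effect of the L1 penalty can be shown on synthetic data. This is a generic sketch of LASSO-penalized logistic regression, not a reconstruction of the classifier in [40]; all data and parameter values are illustrative.

```python
# Sketch of LASSO-penalized (L1) logistic regression: the penalty drives
# the coefficients of uninformative predictors to exactly zero, leaving a
# sparser, more interpretable model. Synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 candidate predictors, only 4 of which carry signal.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=2)

# Smaller C means a stronger L1 penalty and hence more zeroed coefficients.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

n_selected = int(np.sum(lasso.coef_ != 0))  # predictors surviving the penalty
```

Sweeping `C` traces out the trade-off described in the text: a looser penalty keeps more predictors, a tighter one yields a sparser model.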
Based on the data presented in our systematic review, most of the studies (i.e., 85%) included a comprehensive description of the chosen ML techniques, which is important for understanding and reproducing the developed predictive models. In our view, any predictive ML model should have its code accessible for anyone to examine and improve, as well as to reduce the black-box character of ML models, resulting in more practical implementation of predictive ML models in clinical practice.

C. PREDICTIVE MODEL VALIDATION
Predictive model performance and generalizability are important criteria when evaluating each prognostic model. Model validation involves two steps: (i) internal validation; and (ii) external validation. In this systematic review, most of the studies conducted internal validation within the development data, performed by split-sample, cross-validation, or bootstrap techniques, rather than external validation [71]. Predictive model performance can be estimated by evaluating calibration and discrimination. No studies in this review used calibration tests. Most of the studies (i.e., n = 14) reported discrimination in terms of accuracy rate (Acc), sensitivity (Se), specificity (Sp), and AUC.
The accuracy of a prognostic model is determined from the ''testing set'', which consists of observations previously unseen by the model. Of the four prognostic models reporting accuracy, Chennu et al. [17] reported that connectivity in the δ modularity coefficient achieved excellent predictive accuracy above 85%, other qEEG predictors achieved at least good accuracy above 70% [46], and two studies reported fair accuracy below 60% for other qEEG predictors (e.g., PAV) [42], [43]. In addition, most of the prognostic models (n = 10) considered GOS or GOSE, dichotomized into favorable or unfavorable, as the outcome, with AUC > 0.69 considered to indicate excellent discrimination. AUC > 0.69 also demonstrated strong associations of individual EEG predictors with the favorable or unfavorable outcomes.
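The four discriminant measures reported across the studies can be computed as in the following sketch. The labels and predicted probabilities are invented for illustration; they do not come from any of the reviewed models.

```python
# Sketch of the discriminant measures reported in the studies: accuracy,
# sensitivity and specificity from a confusion matrix, and AUC from ROC
# analysis. Labels and scores below are illustrative only.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = [1, 1, 1, 0, 0, 0, 1, 0]                    # 1 = favorable outcome
y_pred  = [1, 1, 0, 0, 0, 1, 1, 0]                    # dichotomized predictions
y_score = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]    # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)  # Acc: fraction correct
sensitivity = tp / (tp + fn)                   # Se: true positive rate
specificity = tn / (tn + fp)                   # Sp: true negative rate
auc = roc_auc_score(y_true, y_score)           # threshold-free discrimination
```

Note that Acc, Se and Sp depend on one fixed decision threshold, whereas the AUC summarizes discrimination over all thresholds, which is one reason many of the reviewed studies preferred it.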
The purpose of a prognostic model is to provide valid outcome predictions for new patients. Prognostic models have been proposed as a way to improve the power of long-term TBI outcome prediction [79]-[81] and to assist in clinical audit [82], [83]. For this purpose, it is crucial to evaluate both the internal and external validity of proposed prognostic models. Internal validation ensures that the model can be applied to patients similar to the development series in a similar setting, while external validation assesses the model's performance in new patients, for example, patients from different settings.
Internal validation using split-sample analysis (i.e., splitting the data into one part of the study population for model training and another part for validation or testing) was reported in eight studies. With split-sample validation, the patient population is randomly separated into two groups (i.e., training and testing): the model is created from 2/3 of the data, and its performance is evaluated on the remaining 1/3. This is a classical procedure but has various disadvantages, such as inefficiency and uncertainty [74].
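The 2/3 train, 1/3 test split can be sketched as follows; the data are synthetic and the classifier is a generic logistic regression standing in for whichever model a study used.

```python
# Sketch of split-sample internal validation: a single random 2/3 train,
# 1/3 test partition, stratified so both parts keep the outcome ratio.
# Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=150, n_features=5, random_state=3)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=3)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)  # estimated on unseen patients
```

The drawback noted in the text is visible here: the estimate rests on a single random partition, so repeating the split with a different `random_state` can change `test_accuracy` noticeably on small samples.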
However, cross-validation (CV) and bootstrapping are the most preferred and efficient methods for model validation. Cross-validation extends the split-sample approach by dividing the data into k parts, using the k-th part as testing data and the remaining (k − 1) parts as the training set. This process is repeated k times until every part has been used as a testing set once. Leave-one-out cross-validation (LOOCV) is the most extreme and computationally demanding variant, because each iteration uses a single observation for testing and the rest for training. Stefan et al. [47] chose to split the dataset into ten parts, using one part for testing and nine parts for training, which is equivalent to ten-fold CV. This required ten iterations, but it is recommended to repeat the whole procedure many times to obtain more reliable performance estimates [21].
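The contrast between k-fold CV and LOOCV can be sketched directly; again the data are synthetic and the classifier is a generic stand-in.

```python
# Sketch contrasting k-fold CV with leave-one-out CV (LOOCV): k-fold
# fits the model k times, LOOCV once per observation, so LOOCV is the
# most computationally demanding variant. Synthetic data only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=60, n_features=4, random_state=4)
model = LogisticRegression(max_iter=1000)

# 10-fold CV: ten fits, each tested on one tenth of the data.
kfold_scores = cross_val_score(
    model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=4))

# LOOCV: sixty fits, each tested on a single held-out observation.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
```

Repeating the k-fold procedure with different shuffles, as recommended in [21], averages out the dependence on any single partition into folds.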
To the best of our knowledge, the results of the current systematic review indicate substantial benefits of using qEEG parameters for predicting outcome in patients with TBI. With the application of ML models, several qEEG features can potentially provide useful prognostic information for predicting outcome in patients with moderate to severe TBI. In the future, we suggest that the predictive power of qEEG features should be compared with clinical predictive models, such as IMPACT, in order to establish the added value of including qEEG variables in the models. For this purpose, a combination of ML models and qEEG features would reveal the value of qEEG features as reliable prognostic markers in TBI outcome prediction.

VII. CONCLUSION
To conclude, from the present relatively small sample populations and few qEEG variables, we found that ML techniques have great potential for improving TBI outcome prediction. The ML models (i.e., logistic regression, RF, LASSO, GLM) described in this review would benefit from further development. We believe that prognostic models based on combinations of multiple qEEG features could be built more efficiently using advanced ML algorithms (i.e., neural networks) that take into account the heterogeneity of brain injury (i.e., mild, moderate and severe), which in turn could increase the predictive power of a model. Future studies should address the hurdles in model creation and validation, and consider applying these methods to routine clinical TBI decision making.