Machine Learning and Deep Learning Approaches for Brain Disease Diagnosis: Principles and Recent Advances

The brain is the control center of the body. Over time, new brain diseases continue to be discovered. Because of the variability of brain diseases, their diagnosis and detection remain challenging and are still an open research problem. Detecting brain diseases at an early stage can make a huge difference in attempting to cure them. In recent years, the use of artificial intelligence (AI) has been surging through all spheres of science, and it is revolutionizing the field of neurology. The application of AI in medical science has made brain disease prediction and detection more accurate and precise. In this study, we present a review of recent machine learning and deep learning approaches for detecting four brain diseases: Alzheimer's disease (AD), brain tumor, epilepsy, and Parkinson's disease. A total of 147 recent articles on these four brain diseases are reviewed considering diverse machine learning and deep learning approaches, modalities, datasets, etc. Twenty-two datasets that are used most frequently in the reviewed articles as a primary source of brain disease data are discussed. Moreover, a brief overview of different feature extraction techniques used in diagnosing brain diseases is provided. Finally, key findings from the reviewed articles are summarized and a number of major issues related to machine learning/deep learning-based brain disease diagnostic approaches are discussed. Through this study, we aim to find the most accurate techniques for detecting different brain diseases, which can be employed for future improvement.


I. INTRODUCTION
Over the last couple of decades, brain-computer interface (BCI) has become one of the most popular fields of research due to its wide range of possible applications, such as brain fingerprinting, detection and prevention of neurological diseases, adaptive e-learning, and monitoring of fatigue, stress, and depression [1]. BCI establishes an effective communication link between a brain and a device by capturing the most relevant features required for that link. Among the applications of BCI given above, detection of neurological diseases has become an active research field due to its growing importance. Due to the complex structure of the brain, which varies with age and pathological history, it has always been very hard to detect neurodegenerative diseases. It is very important to diagnose these diseases in their early stages. Computer-aided mechanisms can perform better than conventional manual practices in the detection of different brain diseases [2]. The main focus of this study is to provide a brief review of recent ML and DL approaches for detecting four of the most common types of brain diseases: Alzheimer's [3], brain tumor [4], epilepsy [5], and Parkinson's [6]. In the following section, a brief discussion on ML and DL is provided.

The associate editor coordinating the review of this manuscript and approving it for publication was K. C. Santosh.

A. BACKGROUND KNOWLEDGE ON ML AND DL
ML is a process of training a computer to apply its past experience to solve a problem given to it. The application of ML in different fields to solve problems faster than humans has gained significant interest due to the current availability of cheaper computing power and inexpensive memory. This makes it possible to process and analyze very large amounts of data to discover insights and correlations in the data that are not obvious to the human eye. Its intelligent behavior is based on different algorithms that enable the machine to make abstractions based on experience in order to produce salient judgments. DL, on the other hand, is a sub-field of ML and a more advanced approach that enables computers to automatically extract, analyze, and understand the useful information in raw data by imitating how humans think and learn [7]. More precisely, deep learning is a group of techniques that are driven by neural networks and based on automatic feature engineering. The automatic learning of features from inputs is what gives it its accuracy and excellent performance [7]. A quick overview of the difference between artificial intelligence (AI), ML, and DL is provided in Fig. 1. Success in making the right decision in ML and DL relies on the classification algorithm. Different classification algorithms are available in ML that are specially designed for classification purposes, and their performance is quite decent. Even though the performance of ML is adequate, it is currently being replaced by DL in most classification applications. The principal difference between ML and DL lies in the technique of extracting the features on which the classifier works. The features that DL extracts through several non-linear hidden layers make its classification performance far better than that of ML, which relies on handcrafted features. To understand the difference between ML and DL, let us refer to Fig. 2.

B. CLASSIFIERS
Data to be examined under ML/DL must go through a series of preprocessing steps in order to transform the raw data into machine-readable data and to prepare it for feature extraction. The collected data are analyzed based on certain characteristics called features. The features being considered must be discriminative and non-redundant. In this way, the training time and overfitting issues are reduced. There are different methods of extracting features. A brief overview of the feature extraction methods that are most commonly used in brain disease detection is provided in Section IV. After the extraction of features, the data can be labeled. The method by which the machine decides how to label data is called a classifier. In other words, a machine uses different classifier algorithms to classify data. Some of the most frequently used classifiers are SVM, RF, LR, DT, NB, KNN, and so on. In contrast, instead of following the step-by-step process of ML, DL forms an entire network inspired by biological neural networks in order to perform the whole process. It uses several layers of nonlinear processing units. The output of a unit is fed as input to the next unit. Throughout the hierarchical structure of data movement, each level transforms the data it receives into more abstract data to be fed to the next level. DL employs different kinds of classifiers including RNN, CNN, Boltzmann machine, autoencoders, and DBN. Considering the literature surveyed in this work, the ML and DL classifiers used to detect brain diseases can be categorized as shown in Fig. 3.
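As a rough illustration of the feature-then-classifier workflow described above, the following sketch implements a k-nearest-neighbor (KNN) classifier in plain Python. The two-dimensional feature vectors, the labels, and the function name `knn_predict` are hypothetical and are not taken from any of the reviewed articles.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training feature vector
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Keep the k closest samples and count how often each label occurs
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-extracted features (two per subject)
train_features = [(0.9, 1.1), (1.0, 0.8), (1.2, 1.0),
                  (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["NC", "NC", "NC", "AD", "AD", "AD"]

print(knn_predict(train_features, train_labels, (1.1, 0.9)))
print(knn_predict(train_features, train_labels, (2.9, 3.1)))
```

In practice the feature vectors would come from one of the extraction methods of Section IV, and k would be tuned on held-out data.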

C. SEARCH STRATEGY
We searched for articles related to ML and DL approaches on the above-mentioned four brain diseases published until October 2020, mainly in IEEE Xplore (https://ieeexplore.ieee.org/), ScienceDirect (https://www.sciencedirect.com/), and Google Scholar (https://scholar.google.com/). In total, 147 papers were selected for review considering different criteria such as diverse ML/DL approaches, different modalities of data for classifying diseases, source of datasets, etc. Articles related to AD were selected from the period 2018 to 2020, whereas articles related to brain tumors, epilepsy, and PD were chosen only from 2020.

D. PERFORMANCE METRICS
Evaluation of ML/DL detection systems, in terms of the likelihood of correctly classifying AD, MCI, and NC, is based on performance parameters including accuracy ($A_{cy}$), sensitivity ($S_{ny}$)/recall, specificity ($S_{py}$), precision ($P_{rn}$), AUC, and F1 score. Different performance metrics imply different conclusions for a detection model. While a model may give outstanding results in terms of accuracy, it may give very poor results in terms of specificity. For this reason, we summarize the papers in tabular form including the performance metrics stated. The elementary evaluation metric of any classification system is accuracy. It is simply the ratio of the number of correct predictions to the total number of predictions made. Mathematically, it can be defined as

$$A_{cy} = \frac{\tau_P + \tau_N}{\tau_P + \tau_N + F_P + F_N},$$

where $\tau_P$ and $\tau_N$ are true positive and true negative, respectively, which refer to correctly labeling positive as positive and negative as negative. Labeling negative as positive and vice versa results in false positive ($F_P$) and false negative ($F_N$), respectively. While accuracy deals with both positive and negative results, the performance of a specific model in detecting either positives or negatives is evaluated using sensitivity/recall and specificity, respectively. Mathematically, sensitivity and specificity are defined respectively as

$$S_{ny} = \frac{\tau_P}{\tau_P + F_N}, \qquad S_{py} = \frac{\tau_N}{\tau_N + F_P}.$$

These are also known as the true positive rate and the true negative rate, respectively. The formula of sensitivity implies that it is a measure of the successful diagnosis of diseased patients. On the other hand, precision measures the correctness of positive diagnoses, i.e., the proportion of the patients diagnosed as positive by a system who were actually affected by the disease. Mathematically, it can be defined as

$$P_{rn} = \frac{\tau_P}{\tau_P + F_P}.$$

The harmonic mean of sensitivity and precision is called the F1 score of the model, which is defined as

$$F1 = \frac{2 \times S_{ny} \times P_{rn}}{S_{ny} + P_{rn}}.$$
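The definitions of accuracy, sensitivity, specificity, precision, and F1 score can be checked numerically with a short sketch such as the one below. The label vectors and the helper names `confusion_counts` and `basic_metrics` are hypothetical, and the code assumes that no denominator is zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def basic_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision and F1 (non-zero denominators assumed)."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * sensitivity * precision / (sensitivity + precision),
    }


# Hypothetical ground truth and predictions (1 = diseased, 0 = control)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
for name, value in basic_metrics(*counts).items():
    print(f"{name}: {value:.3f}")
```

With these illustrative labels the accuracy is 0.8 while the sensitivity is only 0.75, which shows why a single metric rarely characterizes a diagnostic model.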
Moreover, the plot of true positive rate vs. false positive rate is widely used to assess the diagnostic ability of a binary classification system and is referred to as the receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) quantifies the ability of the model to distinguish between the two classes across different discrimination thresholds. Furthermore, the Matthews correlation coefficient (MCC) combines all four confusion-matrix counts into a single correlation measure between the predicted and the actual labels, and therefore reflects both sensitivity and specificity. Mathematically, it can be represented as

$$MCC = \frac{\tau_P \tau_N - F_P F_N}{\sqrt{(\tau_P + F_P)(\tau_P + F_N)(\tau_N + F_P)(\tau_N + F_N)}}.$$

Another evaluation metric is the Jaccard similarity index (JSI), which can be calculated as

$$JSI = \frac{\tau_P}{\tau_P + F_P + F_N}.$$
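For illustration, the sketch below computes AUC, MCC, and JSI in plain Python. AUC is obtained through its rank interpretation (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one), MCC follows the standard Matthews correlation coefficient formula, and JSI is the ratio of true positives to the sum of true positives, false positives, and false negatives. The scores, counts, and function names are hypothetical.

```python
import math


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def jaccard_index(tp, fp, fn):
    """Jaccard similarity index of the positive class."""
    return tp / (tp + fp + fn)


# Hypothetical classifier scores (higher = more likely diseased)
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(f"AUC: {roc_auc(y_true, scores):.3f}")
print(f"MCC: {matthews_corrcoef(3, 5, 1, 1):.3f}")
print(f"JSI: {jaccard_index(3, 1, 1):.3f}")
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a sketch but not for large datasets.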

E. IMAGE MODALITY AND OTHER DATA
One of the most important factors in detecting different types of brain diseases is the source of data. These data can be in the form of MRI images, PET, SPECT, speech, blood, protein, saliva, sensor data related to gait patterns, and so on. In Section V, the image modalities and other data that are most commonly used to detect the four types of brain diseases (AD, brain tumors, epilepsy, and PD) are summarized.

F. CONTRIBUTIONS
The main contributions of this survey are summarized as follows:
• We have brought together recent research on four brain diseases (AD, brain tumor, epilepsy, and PD) exploiting ML and DL, with the goal of identifying the most accurate detection techniques.
• A brief overview of each of the twenty-two brain disease databases that are used most frequently in the reviewed articles is provided.
• A brief overview of the most commonly used feature extraction methods in the diagnosis of brain diseases is provided.
• Finally, the key findings from the reviewed articles are summarized. Moreover, various open issues and future research directions are provided.

G. PAPER ORGANIZATION
The rest of the paper is organized as follows. In Section II, the different brain disease databases are described. A literature review on the four brain diseases is provided in Section III. Section IV presents commonly used feature extraction methods. Key findings from the reviewed articles are provided in Section V; the section also discusses a number of open research issues and possible future directions. Finally, the paper is concluded in Section VI. For a quick overview, the organization of this article is illustrated in Fig. 4.

II. DATABASES RELATED TO BRAIN DISEASE
A. ADNI [8]
AD is assumed to be a slow process that is discernible in older people. The symptoms are not visible for years and are hard to detect. However, detection of AD in the early stages is essential before starting any clinical procedures. MCI is an initial stage of AD and might convert to AD, so the identification of MCI is of great significance. Neuroimaging and biomarkers are the preeminent sources of information in these detection processes. ADNI is an association of medical centers and universities located in the USA and Canada. Its main aim is to provide open-source datasets to discover biomarkers and to identify and track AD accurately. It has developed into an ideal source of longitudinal, multisite MRI and PET images of patients with AD, MCI, NC, and elderly controls. The datasets are intended to make detection systems more powerful by providing baseline information regarding changes in brain structure and metabolism, as well as clinical, cognitive, and biochemical data. The ADNI study has been running for about 17 years since 2004 in four phases: ADNI-1 (5 years), ADNI-GO (2 years), ADNI-2 (5 years), and ADNI-3 (5 years).

E. MILAN [12]
This dataset is a revised form of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association criteria of 1984. The revised version is intended to be flexible enough to be used by both general healthcare providers and researchers. Criteria are presented for all-cause dementia and AD dementia. The general framework of probable AD dementia from 1984 remains the same. Drawing on the experience of the past twenty-seven years, several changes were made to the clinical criteria for diagnosis. The term possible AD dementia is preserved, but it is redefined in a more focused manner than before. Biomarker evidence was also integrated into the diagnostic formulations for probable and possible AD dementia for use in research settings. The core clinical criteria for AD dementia remain the cornerstone of diagnosis in clinical practice. However, biomarker evidence increases the pathophysiological specificity of the diagnosis of AD dementia.

F. MIRIAD [13]
The dataset includes longitudinal volumetric T1 magnetic resonance imaging scans of forty-six mild-moderate Alzheimer's subjects and twenty-three controls.

G. FHS [14]
FHS aims to identify the relevant factors that contribute to CVD.

1) SimBraTS
SimBraTS is a dataset of MR intensities used in the detection of brain tumors. The data are manually segmented and classified into background, edema, or tumor core. The modalities of image data used here are T1, T1C, T2, and FLAIR. Annotations are made based on protocols given by qualified doctors.

L. FIGSHARE [19]
Figshare consists of a total of 3064 2D T1-weighted MRI images of brain tumors. It contains three kinds of brain tumor images (1426 gliomas, 708 meningiomas, and 930 pituitary tumors). The images were taken from 233 patients. All these images were taken by expert radiologists and were publicly shared.
M. KAGGLE REPOSITORY [20]
Kaggle is a repository containing over 50,000 publicly available datasets.

UCI comprises databases, domain theories, and data generators for the analysis of different ML algorithms. The repository consists of 559 datasets. It is considered a prime source for ML and has an immense impact in this field. It has been cited over 1000 times, which makes it one of the best 100 repositories of computer science.
O. ISLES CHALLENGE [22]
ISLES incorporates imaging data of acute stroke patients. The patients presented within 8 hours of stroke onset and underwent stroke MRI DWI within 3 hours after computed tomography perfusion. For some patients, the brain lesion evaluation covers the data of two slabs. Also, for easy analysis of data, the training and mapping names are presented. The training dataset comprises data of 63 subjects and encompasses diffusion and perfusion map information.
P. CHB-MIT [23]
CHB-MIT contains EEG signal recordings of 22 pediatric refractory seizure patients. These records were taken by observing the patients for several days after the withdrawal of anti-seizure medication in order to characterize their seizures and to determine whether they require surgical intervention.

T. PC-GITA [27]
The PC-GITA database contains speech recordings of Spanish speakers with PD and their respective controls matched by gender and age. In total, it contains the speech of 50 PD patients and 50 healthy controls. It is the first dataset that provides recordings in Spanish. All the recordings were taken in a noise-controlled environment using professional instruments. Also, the protocols were designed under the supervision of experts. Variation of the pitch and the stability of the phonation can be extracted and used for the analysis of phonation, articulation, prosody, and intelligibility of the patients.
U. HABS [28]
HABS provides baseline neuropsychological, clinical, and imaging data. Spreadsheets on Pittsburgh compound B (PiB)-PET and ROIs of sMRI are provided. A number of neuropsychological tests, clinical assessments, and demographic information are also available. The dataset is accessible only to researchers, who can obtain it by filling out a simple online form. Previously, only baseline data were available; there are now plans to provide longitudinal data as well.
V. HMSD [29]
The Harvard medical school dataset is an online database that is available to all. It contains data on different cerebrovascular, neoplastic, degenerative, and inflammatory diseases of the brain. Among those, AD, gliomas, and stroke are noteworthy. It also has images of a normal brain. The images can be viewed in the browser along with some medical terms. Modalities of images include CT, MRI, and SPECT/PET.

responsible for language and memory resulting in memory loss of the patient and also the ability to perform regular tasks. As the disease progresses, the affected person loses control over bodily functions, which eventually leads to death [3]. To diagnose the progression of different stages of AD, detection was previously performed manually by radiologists. However, these manual systems may lead to errors, which can be serious for the patients. Recent approaches based on ML and DL can perform automatic detection of the early stages of AD [30]. Such attempts were made in the following works. It should be mentioned that the performance results in this survey are shown mainly for the AD vs. NC/HC classes for simplicity of presentation. For further details, readers are referred to the corresponding articles.

1) ML-BASED APPROACHES IN AD DIAGNOSIS
Here, we present recent works related to ML approaches for identifying patients with AD. A summary of the presented works is provided in Table 2 for a quick overview.
To predict AD early, a computational method exploiting an SVM-based ML approach was investigated in [31], where gene-protein sequences were used as a source of information. On the basis of the obtained classification performance, it was suggested that an ML-based strategy can be a promising approach to predict AD by exploiting the sequence information of gene-coding proteins. In [32], an ML model was proposed to diagnose AD early, where various linguistic features were extracted through speech processing. The linguistic features (syntactic, semantic, and pragmatic) extracted from 242 subjects with AD and 242 subjects without AD were further processed with different feature selection techniques. The selected features were then fed to the ML classifier. The proposed ML model achieved the highest precision of 79% using KNN feature selection with an SVM classifier in distinguishing AD patients from NC. The authors in [33] investigated an ML prediction model for early AD detection based on neuropathological changes of patients. Here, post-mortem neuropathological lesions were considered to be more explicit and certain than clinical symptoms. However, considering the obtained accuracy of 77%, the authors suggested that the proposed model might not be fit for clinical application, but that it can be a step towards precision medicine in AD. In [34], ML was applied to differentiate among age-matched groups of 48 AD, 75 EMCI, and 39 LMCI patients and 51 NC. Six types of multi-regional WM metrics, preprocessed from DTI scans, were jointly used as discriminative features. SVM and logistic regression (LR) classifiers were applied to categorize the four classes (AD, EMCI, LMCI, and NC), where SVM outperformed LR with an average accuracy of 92% using combined metrics. A permutation test, ROC curves, and AUC were further used to validate the robustness and stability of the classification methods. The proposed WM-based ML binary classification method can also be used as an alternative way to identify persons with Alzheimer's. A novel switching-delayed PSO-based optimized SVM (SDPSO-SVM) approach was investigated in [35] to classify patients with AD from NC and MCI. For the experiment, a total of 361 subjects were selected from four groups, namely AD, stable MCI (sMCI), progressive MCI (pMCI), and NC, having 92, 82, 95, and 92 subjects, respectively. It was shown that the proposed scheme obtained excellent classification accuracy compared to several conventional ML approaches.
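To make the idea of validating a classifier with a permutation test more concrete, a minimal sketch is given below. It is a simplification under stated assumptions: the labels and predictions are hypothetical, the predictions are held fixed while only the true labels are shuffled (a full permutation test would retrain the classifier on each shuffled label set), and it does not reproduce the procedure of [34].

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=2000, seed=0):
    """Estimate how often shuffled labels reach the observed accuracy."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break the label/prediction association
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical labels (1 = AD, 0 = NC) and predictions with 18 of 20 correct
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 9 + [1]

observed, p_value = permutation_p_value(y_true, y_pred)
print(f"observed accuracy: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

A small p-value indicates that the observed accuracy is unlikely to arise when labels and predictions are unrelated.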
In [36], an ML framework was proposed with a precise feature selection algorithm and a hierarchical grouping method for multiway categorization of AD and MCI subtypes. T1-weighted MRI data of four classes, namely AD, cMCI (MCI subjects who converted to AD), MCI (subjects who did not convert), and NC, with 100 subjects from each class, were obtained for training and testing. The hierarchical grouping process converts the 4-way classification into five binary classification problems. The proposed feature selection algorithm selects features based on relative importance, which results in a simpler feature space for each classifier compared to conventional methods. Further use of revised classifiers resulted in even better classification performance. In [37], a methodology based on EEG was proposed to diagnose AD and MCI by applying frequency and time domain analyses to EEG rhythms. Brain anomalies associated with AD and MCI were characterized through the extraction of spectral and non-linear features from the EEG recordings acquired from 37 AD patients, 37 MCI patients, and 37 NC subjects. A fast-correlation-filter-based automatic feature selection technique was adopted, which avoids redundancy in features. Three different ML classification methods were trained using these features in order to classify AD, MCI, and NC. In terms of the features considered, MLP outperformed the other classifiers in diagnosing both healthy and AD subjects. Considering the effectiveness of treatment at an early stage of AD, a study based on standard neuropsychological tests and a simple cognitive task was proposed in [38]. Numerous cognitive features were collected from 28 patients with mild AD or mild cognitive impairment and 50 cognitively normal (CN) older adults via the neuropsychological tests and the cognitive task. Three self-generated datasets were formed using data from the neuropsychological tests and the cognitive task separately and jointly. These datasets, in their original forms, after principal component analysis (PCA) feature extraction, and after feature selection, were classified as AD or NC using four supervised ML algorithms. RF performed better for the dataset from neuropsychological tests, while for the combined dataset, SVM outperformed the other classifiers. Instead of brain images, self-generated speech signals of patients were used in [39] to distinguish patients with mild AD from MCI and NC. It was observed that combining both acoustic and linguistic features can provide better classification accuracy than either feature type alone. Moreover, it was expected that full automation of speech signal processing can be the basis for the automatic identification of patients with AD in the future. In [40], the proposed model distinguishes among AD, MCI, and NC through the extraction of 2D textures from T1-weighted MRI images of 189 AD subjects, 165 MCI converters, 231 MCI non-converters, and 227 NC subjects. The rough ROI (RROI) technique was applied to extract features from the specified ROIs, which were further reduced with high-dimensional feature selection techniques and then used to classify patients into different classes via ML approaches. Among the different feature selection techniques, it was identified that Fisher performs better.
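As an illustration of Fisher-type feature selection, the sketch below ranks features by a common form of the Fisher score, i.e., the ratio of between-class scatter to within-class scatter computed per feature. The exact variant used in [40] is not specified here, so this formulation, the toy data, and the function name `fisher_scores` are assumptions.

```python
from statistics import fmean, pvariance


def fisher_scores(samples, labels):
    """Per-feature ratio of between-class to within-class scatter."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, y in zip(samples, labels) if y == c]
            # Class size weights both the mean shift and the class variance
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical data: feature 0 separates the classes, feature 1 does not
samples = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0),
           (3.0, 4.5), (3.2, 3.5), (2.8, 4.0)]
labels = ["NC", "NC", "NC", "AD", "AD", "AD"]

scores = fisher_scores(samples, labels)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("scores:", [round(s, 2) for s in scores])
print("features ranked by score:", ranking)
```

Features with the highest scores would be retained, which reduces the dimensionality passed to the classifier.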
A DT-based ML model was proposed in [41], where CSF biomarkers were mainly utilized to distinguish among AD, MCI, and NC using data collected from 1004 probable AD and 442 NC patients. The decision tree algorithm was based on classification and regression tree analysis. A joint human connectome project multi-modal parcellation (HCPMMP) model linked with network-based analysis was proposed in [42], which performs binary classification among AD, EMCI, and LMCI patients and NC subjects. Numerous network features in the connectivity network were considered as candidate features, and filter and wrapper feature selection methods were applied sequentially before ML classification. The joint HCPMMP (J-HCPMMP) essentially outlined the cortical architecture, function, and connectivity features related to AD and different stages of MCI, and achieved the highest accuracy with the SVM classifier for each group of subjects. In [43], the authors applied a three-layer (input, hidden, and output) ANN to illustrate its effectiveness in AD diagnosis. The diagnosis was based on SPECT brain images of the cerebral blood flow of 132 subjects, comprising 72 AD patients and 60 NC. A total of 36 numerical values from 12 areas of the parietal, ventricular, and thalamus brain profiles were taken into consideration. The performance of the ANN was also compared with discriminant analysis (a standard statistical method), where the ANN was found to be more sensitive and specific than discriminant analysis in distinguishing AD patients from NC. In [44], a computer-aided diagnosis (CAD) system was proposed based on the fractal dimensions of cortical surfaces and Alzheimer's disease assessment scale cognitive scores (ADAS-Cog scores) collected from 70 subjects of the ADNI database. The sMRI data of 35 mild AD patients and 35 healthy controls were utilized to acquire cortical models with fractal dimensions of the cortical ribbon, pial surface, and gray/white surface, and cortical metrics with fractal dimensions of cortical thickness and gyrification index. The cortical measures and ADAS-Cog scores were considered separately and jointly using several ML classifiers to discriminate AD subjects from healthy controls. The performance with cortical metrics was found to be better than with cortical models for all classifiers, and of all the cases, cortical metrics combined with ADAS-Cog scores showed the best performance with the SVM algorithm. Instead of MRI data, gene protein information was used in [45] to classify patients with AD from non-AD by exploiting an RF classifier with the k-skip-n-gram feature extraction method.
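To give a feel for k-skip-n-gram features on protein sequences, the sketch below counts k-skip bigrams (the n = 2 case), i.e., ordered residue pairs separated by at most k intervening residues. This definition, the toy sequence, and the function name `k_skip_bigram_counts` are assumptions for illustration; the exact formulation used in [45] may differ.

```python
from collections import Counter


def k_skip_bigram_counts(sequence, k):
    """Count residue pairs separated by 0..k skipped positions."""
    counts = Counter()
    for skip in range(k + 1):
        gap = skip + 1                      # distance between the paired residues
        for i in range(len(sequence) - gap):
            counts[(sequence[i], sequence[i + gap])] += 1
    return counts


# Hypothetical amino-acid fragment
sequence = "MKVLA"
counts = k_skip_bigram_counts(sequence, k=1)

total = sum(counts.values())
# Normalized frequencies can serve as a fixed-length feature vector
features = {pair: n / total for pair, n in counts.items()}
for pair, freq in sorted(features.items()):
    print(pair, round(freq, 3))
```

Because the set of possible residue pairs is fixed, sequences of different lengths map to feature vectors of the same dimension, which is what a classifier such as RF requires.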
By exploiting DKPCA as a feature extraction and dimensionality reduction technique, an ML approach was proposed in [46] for the diagnosis of different stages in the progression of AD. The superiority of the proposed method over conventional techniques was demonstrated via different performance metrics. A computer-aided diagnostic system to detect signs of AD was proposed in [47], where seven different types of feature extraction techniques were used to study the performance of the system. Among the different feature extraction techniques, it was identified that the ST technique outperformed the others when the same number of features was considered. Moreover, Student's t-test was used for feature selection. A prediction model for AD detection by analyzing miRNA was proposed in [48], where gradient boosted trees from the LightGBM framework (version 2.1.0) were used as the ML classification model. The method suggested in [49] can effectively diagnose AD using multi-stage classifiers comprising three state-of-the-art classifiers, namely GNB, SVM, and KNN. Moreover, FreeSurfer was used to extract the features and PSO was used as a feature selection technique. A novel AD detection scheme using ML was proposed in [50], where saliva samples collected from 39 volunteers in the local community were analyzed by Raman hyperspectroscopy. By exploiting different ROIs from brain MRI images, an SVM-based classification approach for the diagnosis of AD was proposed in [51], and its performance was studied in terms of accuracy. In [52], three different experiments were performed to predict AD early by exploiting four state-of-the-art ML classifiers, namely SVM, ANN, 1NN, and NB. Among the classifiers, ANN and NB scored highest in manual and automatic biomarker selection, respectively, in terms of ROC. Moreover, ensemble or hybrid modeling, where all four classifiers are combined, markedly improved the classification results. The method proposed in [53] for early detection of AD was based on blood plasma protein, which is comparatively inexpensive and easier to access. The blood proteomic data were collected from the ADNI database. A correlation-based feature subset selection method was used to select 16 proteins as relevant biomarkers for the classification. An SVM with a degree-2 polynomial kernel was used to classify AD. The approach proposed in [54] was developed to predict AD and MCI early and distinguish them from cognitively normal elderly subjects. To compute the CT of several anatomical regions from segmented gray matter tissue, the FreeSurfer method was used and the required features were extracted. It was identified that a non-linear SVM with an RBF kernel showed better performance than some other classifiers. Brain region-gene pairs were proposed as multimodal fusion features to detect AD in [55]. SNP and fMRI data were used to detect correlations between genes and brain regions and to build the fusion features. The PCA technique was used to extract those features, which were then fed to the CERF framework. By selecting distinguishing biomarkers between AD and NC, this framework could detect abnormal brain regions and genes. To investigate the performance of an ML algorithm on the classification of patients into different classes, such as AD vs. non-MCI, AD vs. MCI, and MCI vs. non-MCI, a dCDT method was investigated in [56], where data were collected via a memory assessment program. In [57], on the basis of communicability at the whole-brain level, an ML framework was developed for the classification of AD using DWI data. The performance of detecting AD from NC was investigated by applying three state-of-the-art ML classifiers, namely SVM, RF, and ANN. The outcome of this study suggests that the alterations in the brain's structural communicability caused by AD can be a valuable biomarker for characterizing pathological conditions.
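The two SVM kernels mentioned above, the degree-2 polynomial kernel and the RBF kernel, can be written out explicitly. The sketch below uses the common forms $K(x, y) = (x \cdot y + c)^d$ and $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$; the parameter values `coef0=1.0` and `gamma=0.5` and the input vectors are assumptions and are not taken from [53] or [54].

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-dimensional feature vectors
x = (1.0, 2.0)
y = (3.0, 1.0)

print("polynomial:", polynomial_kernel(x, y))
print("rbf:", round(rbf_kernel(x, y), 4))
print("rbf of a vector with itself:", rbf_kernel(x, x))
```

Both kernels let an SVM draw non-linear decision boundaries in the original feature space without computing the higher-dimensional mapping explicitly.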
Considering volumetric information of the right and left hippocampus, age, and gender, an AD prediction approach was proposed in [58]. The performance of the proposed approach was investigated with six different ML classifiers. In [59], the authors claimed to be the first to propose an SVM-based ML approach to identify AD patients from NC using graph theory parameters derived from EEG signals. However, it is not evident whether the graph-theory-based model is better than other types of EEG analysis at identifying patients with AD from NC. A five-stage ML pipeline was proposed in [60] for the diagnosis of AD, where MMSE, Atlas Scaling Factor, and clinical dementia rating scores were used for the analysis. Among the different classifiers, RF showed the best performance across the performance metrics considered. To identify patients with AD, an ML approach named LogisticRegressionCV was proposed in [61], where spectrogram features extracted from speech data were utilized. The speech data were collected from wearable IoT devices and compiled by the authors into a database named VBSD. Moreover, the existing Dem@Care dataset was also used to verify the proposed strategy. The experimental results showed that the proposed LogisticRegressionCV model performs better on the Dem@Care dataset than on VBSD. Based on functional features extracted from 5 core brain regions, a classification method was proposed in [62] to accurately identify the different stages leading to AD. In [63], to identify patients with AD, DNA methylation expression profiles were collected from the GEO database and an integrated genome-wide analysis was then performed. Three different ML classifiers (SVM, DT, and RF) were used to predict AD, and the RF classifier was found to predict AD most effectively. To identify the presence of AD, three different diagnostic biosignatures were produced in [64], and the performance metrics were validated with the AutoML tool JADBIO. The produced biosignatures were based on blood miRNA, mRNA, and protein, respectively. In [65], a combination of laser-induced breakdown spectroscopy and a supervised ML algorithm (QDA) was used to analyze micro drops of plasma samples in order to diagnose a patient as AD or non-AD. For the analysis, 67 plasma samples were taken from 31 AD patients and 36 NC. Manually selected features from the difference spectra were used to study the performance of the proposed system.
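Several of the studies above rely on kernel SVMs, notably the degree-2 polynomial kernel in [53] and the RBF kernel in [54]. As a minimal sketch of what these two kernels compute, the snippet below evaluates them on plain feature vectors. The values of `coef0` and `gamma` are illustrative assumptions; the reviewed studies do not report them here.

```python
import math


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree; coef0 is an assumed value."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The polynomial kernel grows with the inner product of the two vectors, whereas the RBF kernel depends only on their distance and equals 1 for identical inputs. This is one intuition for why the RBF kernel often copes better with non-linearly separable features.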

2) DL-BASED APPROACHES IN AD DIAGNOSIS
In this section, we present recent works related to DL approaches to identify patients with different stages of AD. A summary of the presented works is provided in Table 3 for a quick overview.
A neuroimaging study with a deep CNN was performed in [66] to detect different stages of AD (non-demented, very mild AD, mild AD, and moderate AD) by exploiting the axial, coronal, and sagittal planes of MRI images. Although the precision for the non-demented and very mild stages was satisfactory, the precision for moderate and mild dementia was poor. [67] proposed a novel 8-layer 3D ConvNet specialized in the automatic detection of the significant features required to classify AD versus NC. The impact of different factors on the results, such as pre-processing, data partitioning strategy, hyperparameter tuning, and dataset, was discussed. The use of a DNN for detecting different stages of AD using MMSE was validated in [68]. This work ranked third in overall accuracy in "The International Challenge for Automated Prediction of MCI from MRI data". The performance of this study shows the suitability of DNNs for future AD detection systems. A methodology using an RNN with LSTM to diagnose preclinical or early AD was proposed in [69]. The proposed approach was shown to be more accurate than the conventional ML approach. CNN-AlexNet was used in [70] to classify processed fMRI data into 5 categories, namely NC, significant memory concern, EMCI, LMCI, and AD. Extensive preprocessing of the raw data, including removal of unwanted tissues, slice timing correction, spatial smoothing, high-pass filtering, and spatial normalization, resulted in very high detection accuracy with AlexNet.
In [71], a cascaded deep CNN using the Softmax function to detect AD, MCI, and NC was investigated, where fused features from 3D patches of MRI and PET images were used. The results showed not only that multimodality is superior to unimodality but also that a deep CNN can perform better than an autoencoder in detecting AD from NC. A classification strategy based on multiple-cluster DenseNets with the Softmax function was proposed in [72], where each MRI (T1) image was divided into local regions instead of considering ROIs in order to save time and computational cost. [73] combined hippocampal morphology features extracted by a CNN from 2.5D patches with other brain morphology features in ROIs extracted by FreeSurfer to detect sMCI, cMCI, and AD using ELM. Using both feature extraction methods resulted in higher accuracy than using only one of them. In [74], out of 256 2D slices of preprocessed sMRI, the most informative slices were selected based on image entropy. A classification strategy using CNN for the automatic detection of patients suffering from AD, sMCI, and cMCI was proposed in [75], where high levels of accuracy were obtained for all classes. A new "attention-based" 3D ResNet for diagnosing AD by identifying the chief brain regions associated with AD symptoms was proposed in [76]. The attention mechanism yielded 92% accuracy, compared with 90% without it. In [77], a CNN model trained from scratch, with a minimal number of layers and optimal performance, was proposed to identify patients with AD. For the experiment, a total of 56 subjects was selected, comprising 28 patients with AD and 28 NC. The proposed model outperformed AlexNet, GoogLeNet, and ResNet50 in terms of classification accuracy. The methodology proposed in [78] identified MCI from NC based on a dataset including R-fMRI time series data from ADNI and the CCD data obtained by preprocessing them. Detection using the proposed autoencoder improved accuracy by around 20% compared to traditional classifiers. The detection accuracy was validated by the area under the ROC curve (AUC). An innovative deep convolutional generative Boltzmann machine with a multitask learning model was proposed in [79] to define a connection between feature extraction and classification. The proposed method obtained an accuracy of 95.04%, an increase of 2.5% over the existing model mentioned in the study.
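The entropy-based slice selection described for [74] can be illustrated with a short sketch. The code below computes the Shannon entropy of each slice's intensity histogram and keeps the `k` highest-entropy slices. The function names, the use of raw integer intensities as histogram bins, and the choice of `k` are assumptions made for illustration, not details taken from [74].

```python
import math
from collections import Counter


def shannon_entropy(slice_2d):
    """Shannon entropy (in bits) of the intensity histogram of one 2D slice."""
    pixels = [p for row in slice_2d for p in row]
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def select_informative_slices(volume, k):
    """Return the indices of the k highest-entropy slices, in anatomical order."""
    ranked = sorted(
        range(len(volume)),
        key=lambda i: shannon_entropy(volume[i]),
        reverse=True,
    )
    return sorted(ranked[:k])
```

A nearly uniform slice (for example, mostly background) has low entropy and is dropped, while slices with richer intensity variation are kept.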
In [80], a novel approach was proposed to identify MCI patients who are at a higher risk of progressing from MCI to AD. The proposed method classified MCI-to-AD conversion and AD vs NC. Besides the imaging modalities used in the study, it can be applied to other modalities such as PET. Moreover, the convolutional framework makes the system more flexible, as any kind of 3D image dataset is applicable to it. Deep learning is applied on the basis of dual learning and an ad hoc layer. The neural network uses fewer parameters and thus helps prevent overfitting. The authors used a multi-modal feature extractor and 10-fold cross-validation for testing purposes. In [81], a novel DBN framework was proposed that uses limited 18F-FDG-PET data from ADNI to identify AD from MCI patients. In this method, the images were first pre-processed and ROIs were then identified. Features were then extracted from the ROIs using DBN. DBN makes the prediction simpler. In the last step, SVM was used with three kernels (linear, polynomial, and RBF) for classification, as it is advantageous for classifying small datasets. Among the different kernels, RBF showed the highest performance. A multi-modal system was considered in [82], where a gated recurrent unit approach, a variant of RNN, was used for each modality to classify whether or not MCI patients converted to AD. The system does not need any preprocessing steps and is capable of working with longitudinal data of irregular length. In [83], MRI and PET modalities were used to differentiate AD from NC, pMCI from NC, and sMCI from NC. A novel method was proposed, where a 3D-CNN was first applied to extract the primary features and then, instead of the usual FC layer, the FSBi-LSTM was used to obtain more accurate spatial information. Afterward, the features were classified using a Softmax classifier. The number of filters in the convolution layer was also reduced to avoid overfitting. In [84], a DL algorithm was proposed that detected whether a patient had AD, MCI, or neither. 18F-FDG PET data from the ADNI dataset were used, with 90% for training and 10% for testing. Auxiliary diagnosis of AD using deep learning was investigated in [85]. It was multimodal in the sense that two independent CNNs were used to extract features from two different image modalities (PET and MRI) of the same patient for classification. Next, the results were judged using correlation analysis. Moreover, the obtained results were integrated with the neuropsychological diagnosis for classification, which made the whole process much more efficient. It is also worth noting that converting the image format from DICOM to PNG makes the processing less complicated.
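The evaluation protocols mentioned above, such as the 10-fold cross-validation in [80], can be sketched as a simple index generator. This is a minimal illustration under stated assumptions: the helper name is hypothetical, and the folds are contiguous and unshuffled, whereas in practice the subjects would normally be shuffled (and often stratified by class) before splitting.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Folds are contiguous and unshuffled; the first (n_samples % k) folds
    receive one extra sample so that every sample is tested exactly once.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

A single hold-out split such as the 90%/10% partition in [84] corresponds to using only one of the ten folds as the test set, which is cheaper but gives a noisier performance estimate on small cohorts.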
In [86], an end-to-end learning approach was applied in deep learning, which increased the performance of the whole system. Four classifications were made, i.e., AD vs NC, pMCI vs NC, sMCI vs NC, and pMCI vs sMCI, using a volumetric CNN. Both supervised and unsupervised learning methods were applied. In [87], a DL approach using CAE was investigated to classify patients with AD from NC, where MRI was decomposed and the extracted features were compared with neuropsychological tests and other clinical data. Through this, a link between these data was found, with a correlation of more than 0.6. In [88], a deep learning algorithm was proposed that used R-fMRI to detect AD. Training and classification were done using all fMRI and clinical data. The accuracy was found to improve by approximately 25% compared with the other existing methods mentioned. In [89], AD was diagnosed using recent DL object detection techniques. Three different techniques, i.e., Faster R-CNN, SSD, and YOLOv3, were used, where no preprocessing of images was needed. A DL approach using H-FCN was investigated in [90], where discriminative local patches and brain regions were identified automatically from sMRI. A multi-modal ensemble DL method was proposed in [91] to detect AD progression, where local and longitudinal features were extracted from each modality. Moreover, background knowledge was used to extract local features. All the extracted features were then fused for regression and classification tasks. In [92], a multi-modal process was proposed, where automatic segmentation of the hippocampus was performed for the classification of AD. A minimal RNN model to predict longitudinal AD dementia progression was proposed in [93] using 1677 participants. The proposed model achieved better classification performance than the baseline algorithms.
The application of SSA was investigated in [94] for classifying AD from a single 2D slice of sMRI. Neurodegeneration patterns were visualized and fused with disease information. Moreover, the regions affected by the disease were identified using a local patch-based method.
An automatic prediction approach using unsupervised DL was proposed in [95], where an unsupervised CNN was used for feature extraction and an unsupervised classifier was used to take the final decision in classifying patients with AD from MCI. A brain network classification problem for identifying AD with two DL methods was studied in [96], where deep regional-connectivity and adjacent positional features were learned by convolutional and recurrent learning, respectively. Finally, to improve the learning ability, an ELM-boosted structure was implemented. To assist patients suffering from AD, an internet-of-things based healthcare framework was suggested in [97]. By analyzing data obtained from different sensors embedded in the internet of health ecosystem, an RNN method was used to identify patients experiencing AD. Moreover, to track abnormal activities of AD patients, CNN-based emotion identification and language processing using timestamp window methods were also investigated. Utilizing sagittal MRI, a DL approach for the automatic identification of AD was studied in [98], and satisfactory performance was obtained compared to the state-of-the-art method. A DL strategy to improve the diagnosis of AD from multi-modal inputs was proposed in [99]. The model was trained using AD and NC subjects from ADNI and validated on three different databases: AIBL, FHS, and NACC. The superiority of the 3D-CNN-SVM model over the other reported classification models illustrated that the DL model has great potential for medical diagnostics [100]. A multi-modal DL approach exploiting a hybrid of CNN and DBN was investigated in [101]. The experimental results showed that the hybrid method outperformed conventional methods such as CNN, DNN, and SVM. A DL approach using CNN was studied in [102], where the OASIS dataset was used only for training and the MIRIAD dataset was used only for evaluating the model. The outcome of this paper suggested that it was more difficult to identify patients with MCI than patients with AD. Brain sub-regions were exploited in [103] to identify patients with AD. Among the various optimization algorithms reported for feature selection, Grey Wolf Optimization showed promising results. Motivated by Oxford Net, a Siamese CNN model was studied in [104] for multi-class classification of AD. The proposed model achieved a classification accuracy of 99.05%, outperforming the state-of-the-art models it was compared with. A multi-modal DL approach using diffusion maps and GM volumes was studied in [105] to classify patients with AD and MCI from NC. The authors claimed that this was the first study in which the impact of more than one scan per subject was evaluated. The results were also competitive with the existing literature.
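Many of the DL pipelines reviewed in this subsection end in a Softmax layer that turns the network's raw class scores into class probabilities (for example, over AD, MCI, and NC). The following is a minimal, numerically stable sketch of that final step; the example scores in the usage note are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1.

    Subtracting the maximum score first avoids overflow in exp()
    without changing the result.
    """
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For instance, calling `softmax([2.0, 1.0, 0.1])` on hypothetical scores for three classes assigns the highest probability to the first class, and the predicted label is simply the index of the largest probability.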

B. BRAIN TUMOR
Brain cancer is a life-threatening disease, and detecting the tumor at an early stage is very important for saving lives. A brain tumor is basically an abnormal growth of cells. There are two types of brain tumor: benign and malignant. Brain tumors vary widely in appearance, and it is hard to differentiate between tumor and normal brain tissue. As a result, extracting tumor regions is very difficult. Detection was previously performed manually by radiologists. However, manual detection may lead to errors, which can be serious for patients. Recent approaches based on ML and DL can detect brain tumors automatically. Such attempts were made in the following works. A summary of the presented works is provided in Table 4 for a quick overview.

1) ML-BASED APPROACHES IN BRAIN TUMOR DIAGNOSIS
In [106], an AutoML model was proposed to perform three-way and binary classification of the main types of pediatric posterior fossa tumors based on routine MRI prior to an operation. Contrast-enhanced T1-weighted images, T2-weighted images, and ADC maps from 111 MB, 70 EP, and 107 PA histologically confirmed posterior fossa tumor patients were utilized to extract radiomics features. The proposed TPOT performed better than manual expert pipeline optimization and qualitative expert MRI review. In [107], an automatic classification method to effectively delineate brain tumors at an earlier stage using MRI images from different databases was presented. The methodology consisted of pre-processing via a median filter, 3 × 3 block conversion of images, extraction of texture features using the gray-level co-occurrence matrix, classification, and segmentation. An adaptive k-nearest neighbor (AKNN) classifier was adopted to identify usual and unusual images based on the extracted features. The unusual images were then segmented by applying the optimal probabilistic fuzzy C-means algorithm to detect the affected parts of the brain. The authors of [108] studied the significance of key differentially expressed genes to understand the different stages (grade I to IV) of glioma, the most fatal nervous system cancer, using a combination of an ML algorithm and protein-protein interaction networks. A brain tumor localization pipeline based on skull-stripped fluid attenuated inversion recovery MRI scans using ML algorithms is illustrated in [109]. After noise removal, a Gabor filter bank is used to create texton-map images and texture maps. Low-level features are extracted by segmenting the texton-map images into superpixels, and these are integrated with features at the region level. Finally, classification results are shown for four different sets of data: real high grade (HG), real low grade (LG), synthetic HG, and synthetic LG. The methodology proposed in [110] differentiates among brain tumors (tumor/non-tumor/benign/malignant) by feeding a fusion of features to the ML classifiers. A brain surface extraction method is adopted to remove non-brain portions such as the skull and eyes from the images, which are then segmented with better accuracy via [...]. The best features are selected from the extracted features by employing a genetic algorithm. The evaluation results of the proposed method on different datasets showed better performance than existing techniques.
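The gray-level co-occurrence matrix (GLCM) used for texture features in [107] can be sketched in a few lines. The code below counts how often pairs of gray levels co-occur at a fixed pixel offset and derives the contrast feature from the normalized counts. The offset, the number of gray levels, and the choice of contrast as the example feature are assumptions for illustration; [107] may use other settings and features.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) for pixel pairs at offset (dy, dx).

    `image` is a 2D list of integers in the range [0, levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: sum over (i, j) of (i - j)^2 * p(i, j), with p the normalized GLCM."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0  # no valid pixel pairs for this offset
    return sum(
        ((i - j) ** 2) * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
```

Smooth regions concentrate counts on the diagonal of the matrix and give low contrast, whereas heterogeneous tissue spreads counts away from the diagonal and raises it.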

2) DL-BASED APPROACHES IN BRAIN TUMOR DIAGNOSIS
In [111], a computer-aided detection model is proposed in which brain tumor features are recognized from MRI with improved efficiency by a CNN. Brain tumor MRIs are segmented, and the recognition rate of the convolution operation improves with the fusion of PCA-extracted and synthetically selected features. The performance analysis showed that the model has practical value for improving diagnostic results. An automatic brain tumor detection approach using deep LSTM was proposed in [112], where the model was tested on the SISS-ISLES 2015 database and six BRATS challenge datasets. The outcome of this paper suggests that a radiologist can classify brain tumors more precisely with the proposed method. A brain tumor detection approach using 3D-CNN was proposed in [113], where the model was tested on the BRATS 2015, 2017, and 2018 challenge datasets. The experimental results showed that the proposed model achieved its highest classification accuracy on BRATS 2015 and an accuracy comparable to that of existing methods. A DL method was proposed in [114], where multi-level features are extracted from different layers of two pre-trained DL models, namely Inception-v3 and DenseNet201, and then concatenated prior to categorization of the brain tumor by a Softmax classifier. The proposed model is evaluated using a publicly available dataset comprising 708 meningioma, 1426 glioma, and 930 pituitary tumors. The concatenation-based DL model performed better than current DL and ML models for brain tumor categorization. A DL-based hybrid architecture was proposed in [115] to categorize brain tumors using T1-weighted contrast-enhanced MR images of 708 meningioma, 1426 glioma, and 930 pituitary brain tumors from 233 subjects, where the properties of CNN and neural autoregressive distribution estimation were combined. The three core steps of the learning method were density estimation, feature exploitation, and classification. A performance comparison with other well-known models showed that this hybrid model maintained a similar level of accuracy with a reduction in computation cost. The authors of [116] proposed an automatic multi-modal brain tumor categorization model based on DL with a robust feature selection technique. The proposed model has five main stages: linear contrast enhancement, extraction of DL features with transfer learning from the pre-trained Visual Geometry Group CNN models VGG16 and VGG19, feature selection using correntropy-based joint learning with ELM, fusion of the selected features in one matrix via a partial least squares-based technique, and classification using an ELM classifier. The performance studied on BRATS datasets showed stable accuracy. The brain tumor recognition and classification (benign or malignant) system proposed in [117] applied the fuzzy C-means algorithm with super-resolution for segmentation and a CNN with the ELM algorithm for classification. The system uses MRIs in the digital imaging and communications in medicine format and employs a pre-trained CNN architecture called SqueezeNet to perform feature extraction. With super-resolution, the proposed system was 10% more accurate than without it.
In [118], a CNN-based DL network is applied to identify brain tumors from 115 tumor and 98 non-tumor MRI images. A CNN architecture called ResNet50 was employed as the base, and 10 new layers were added in place of the last 5 layers of this ResNet50 network. The modified architecture gave a better accuracy rate than AlexNet, Inception-V3, DenseNet201, GoogLeNet, and ResNet50 in identifying brain tumors. A CNN model based on attention modules and the hypercolumn technique, called BrainMRNet, is presented in [119] to detect brain tumors from heterogeneous MRIs of 155 tumor and 98 non-tumor samples. The attention modules and hypercolumn technique help maintain the best and most relevant features from the significant areas of images until the last layer of the network architecture. BrainMRNet outperforms pre-trained CNN models such as GoogLeNet, AlexNet, and VGG-16 on the same data. A DL-based brain tumor detection approach was proposed in [120], where the seed growing method was used for segmentation. The model was tested on 6 different BRATS datasets. The experimental results showed that the proposed model achieved its highest classification accuracy on the BRATS 2012 and BRATS 2013 Leaderboard datasets. In [121], structural and textural features from multi-modal MR images (T1, T2, T1CE, FLAIR) were fused using the discrete wavelet transform with a Daubechies wavelet kernel, and a 23-layer CNN was applied to classify normal and brain tumor regions. Noise reduction and segmentation were done by employing a partial differential diffusion filter and a global thresholding method, respectively. The model was evaluated on different BRATS datasets and performed better because of the feature fusion. In [122], identification and classification of 708 meningioma, 930 pituitary, and 1426 glioma brain tumors were performed by applying three deep CNN architectures, namely AlexNet, GoogLeNet, and VGGNet. These architectures were further validated with fine-tune and freeze transfer learning techniques. The fine-tuned VGG architecture turned out to be the best in terms of accuracy. A deep CNN-based brain tumor classification method was investigated in [123], where a Whale Harris Hawks optimization technique, jointly derived from the Whale Optimization Algorithm and the Harris Hawks Optimization algorithm, was proposed. The segmentation of MRIs was carried out by cellular automata and rough set theory. The proposed optimized classification method attained better accuracy, sensitivity, and specificity than other models. In [124], 204 brain MRIs of T1, T2, and FLAIR modality are categorized as normal or abnormal (tumor) using a DBN optimized with the improved seagull optimization algorithm (ISOA). The core steps of this methodology are segmentation of preprocessed images via the Kapur thresholding method, extraction and then selection of optimal features by adopting the ISOA, and finally classification with DBN. A comparative performance analysis against existing models showed that the proposed model performed better. In [125], deep features acquired from the CNN model VGG-19, after segmentation using the grab cut method, are concatenated with handcrafted features such as local binary patterns and histograms of oriented gradients and then optimized via entropy. The optimized features are fused in one feature vector before being fed to different classifiers for glioma and healthy image detection. This methodology was evaluated individually on BRATS challenge databases.
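Several works in this subsection are compared in terms of accuracy, sensitivity, and specificity (for example, [123]). As a brief reminder of how these three metrics relate to the confusion matrix of a binary tumor/non-tumor classifier, a minimal sketch is given below. The label convention (1 for tumor, 0 for non-tumor) is an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = tumor, 0 = non-tumor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # tumors correctly detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed tumors
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }
```

On imbalanced datasets, accuracy alone can look high even when many tumors are missed, which is why sensitivity and specificity are usually reported alongside it.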

C. EPILEPSY
Epilepsy is a disorder of brain function that causes convulsions in the whole body and sometimes loss of awareness. Usually, it has no serious symptoms, and people of all ages are seen to suffer from it. It is the second most common neurological disease in humans after stroke, and over 50 million people suffer from it. Its automatic detection and prediction are therefore of great importance in the field of biomedical signal processing. A summary of the presented works is provided in Table 5 for a quick overview.

1) ML-BASED APPROACHES IN EPILEPSY DIAGNOSIS
The main goal of [126] was to discover the cognitive signatures of mTLE patients, together with lateralization information. For this, SVM and XGBoost were used to classify the extracted features as either left or right. Two types of dataset, "reduced and working" and "original", were used, and promising interactions between language and memory scores were observed. Some cut-off points that predict the disease more accurately were also identified. In [127], the authors proposed an automatic system to detect the epileptogenic region for epilepsy detection. ANFIS classifies the extracted features as either focal or non-focal with high accuracy. An extra step was added to assess severity by further classifying the focal EEG signals into either 'early' or 'advance' stages. [128] aimed at finding present and previous comorbid psychiatric conditions in epilepsy patients, who are mostly teenagers and young adults. Here, machine learning approaches determine whether or not the patient is suicidal. The study classified participants into three main groups: no psychiatric disorders, non-suicidal psychiatric disorders, and any degree of suicidality. In [129], unsupervised learning was applied to group epilepsy patients into clusters on the basis of unique psychosocial characteristics. This approach clusters patients into three unique clusters, "high psychosocial health", "intermediate", and "poor psychosocial health", using K-means++. It was observed that the intermediate cluster is mainly driven by seizure-related issues, while the poor cluster depends on social factors. Thus, social support can help in optimizing the health of patients. [130] introduced a new scope in discriminating mTLE from NC with increased accuracy.
In [131], a comparative study of epilepsy detection was performed using different ML techniques. The observations suggested that the fine Gaussian SVM was the most efficient. Classification of EEG signals into focal and non-focal signals using soft computing methods was performed in [132]. The whole process comprises three modules: transformation, feature computation, and feature classification. In the last module, the adaptive neuro-fuzzy inference system classifies the extracted features. In [133], laterality in cases of TLE was analyzed using graph theoretical analysis and ML algorithms. In [134], a comparative study was done on epilepsy detection using various classifiers. Among the classifiers, RF showed the best results. In [135], it was observed that TLE can persist even after the removal of medial temporal structures, and that extra-medial regions are capable of causing seizures.
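The K-means++ algorithm used for patient clustering in [129] differs from plain K-means mainly in how the initial cluster centers are chosen. A minimal sketch of that seeding step is shown below: each new center is sampled with probability proportional to its squared distance from the nearest center already chosen. The function name, the fixed random seed, and the requirement that the data contain at least `k` distinct points are assumptions of this sketch.

```python
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ seeding.

    Assumes `points` contains at least k distinct points (tuples of floats).
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        sq_dists = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Far-away points are proportionally more likely to become the next center.
        centers.append(rng.choices(points, weights=sq_dists, k=1)[0])
    return centers
```

After seeding, the usual K-means iterations (assign each patient to the nearest center, then recompute the centers) proceed unchanged. Spreading the initial centers apart tends to give more stable clusters than purely random initialization.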

2) DL-BASED APPROACHES IN EPILEPSY DIAGNOSIS
In [136], a deep C-LSTM model is proposed, where multi-class (epileptic seizure, brain tumor, eye statuses) classification is attained through automatic extraction of features from EEG datasets of three diseases and two activities. The proposed deep C-LSTM outperforms DCNN and LSTM in terms of accuracy and noise robustness. Additionally, the deep C-LSTM is able to detect seizures from a short EEG signal portion (1 second). In [137], a novel imaging tool was presented, where a DCNN tract classification method was used to analyze the pre-surgical condition of children with focal epilepsy. Reference [138] mainly distinguished non-epileptic paroxysmal events from epilepsy. In [139], a novel hybrid method using the adaptive Haar wavelet-based binary grasshopper optimization algorithm and DNN was proposed to detect epilepsy with high accuracy. A new automatic feature fusion CNN model for epilepsy detection based on a dilated convolution kernel was proposed in [140]. A DL method to detect the interictal and preictal states of a patient was investigated in [141] to help in preventing epilepsy. A novel classification method using an unsupervised FCM multiview clustering algorithm was proposed in [142] to make the system more efficient and robust than existing methods. A multi-channel DNN model was proposed in [143] to evaluate the performance of individual multimodal MRI datasets and their combinations in predicting TLE accurately. A novel method to classify the states of epilepsy was proposed in [144], where frequency domain features and time scale features for multichannel EEG were combined. Processing of MEG data identifies epileptic zones using epileptic MEG spikes, but visual inspection of these spikes is time-consuming. Hence, [145] presented an automatic spike detection method employing the deep learning approach EMS-Net. EMS-Net was capable of identifying spikes from raw MEG data with high accuracy.
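Detecting seizures from short EEG portions, such as the 1-second segments mentioned for [136], presupposes that a continuous recording is first cut into fixed-length windows. A minimal sketch of that segmentation step follows. The function name, the non-overlapping windows, and the sampling rate in the usage note are assumptions for illustration and are not taken from [136].

```python
def segment_signal(signal, sampling_rate_hz, window_seconds=1.0):
    """Split a 1D signal into non-overlapping fixed-length windows.

    Trailing samples that do not fill a complete window are dropped.
    """
    window = int(sampling_rate_hz * window_seconds)
    if window <= 0:
        raise ValueError("window must contain at least one sample")
    return [
        signal[i:i + window]
        for i in range(0, len(signal) - window + 1, window)
    ]
```

For example, a 10-second recording at a hypothetical sampling rate of 256 Hz yields ten windows of 256 samples each, and each window can then be passed to the classifier independently.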

D. PARKINSON'S DISEASE (PD)
Parkinson's disease (PD) is the second most common neurodegenerative disease after Alzheimer's. PD can be diagnosed early by monitoring several symptoms, including bradykinesia (slowness of movement), rigidity (stiffness of muscles rendering a person unable to stretch muscles properly), tremor at rest (shaking of body parts, especially hands, when at rest), and voice impairment (losing control over speech) [96], [146]. According to the category of symptoms, different ML approaches for detecting PD have been developed. A summary of the presented works is provided in Table 6 for a quick overview.

1) ML-BASED APPROACHES IN PD DIAGNOSIS
A comparative study was performed in [146] considering four major symptoms of PD. Various ML algorithms were implemented on UCI repository datasets. The static spiral test with the RF approach showed the highest accuracy (99.79%) among the various spiral test approaches (mainly used to detect tremor) as well as the other approaches. In [147], hand movement activity was used to detect PD at the 2nd and 3rd stages. A 3D Leap Motion sensor was used to capture the hand movement signals, which were characterized by speed, amplitude, and frequency. Different ML classifiers were trained using the feature vectors separately and in various combinations. Among the classifiers, SVM showed the highest accuracy (98.4%) for the combined features of all motor tasks. In [148], an R-fMRI based ML framework was used to detect PD. Three frequency bins (slow-5, slow-4, and conventional) were analyzed. A two-sample t-test was used for feature selection, and a linear SVM was used to classify PD patients and NC. The experimental results showed that the combined frequency scheme performs better than the individual frequency schemes. To predict different stages of PD, both ML and DL based approaches were investigated in [149]. Different types of ML methods, namely LDA, SVM, DT, MLP, RF, AdaBoost + DT, and AdaBoost + SVM, as well as deep CNN, were used as classifiers. Among the different feature extraction methods, intensity summary statistics outperformed the others. Moreover, among the different classifiers, deep CNN with VGG16 gave the best result (test accuracy of 65.30%, training accuracy of 92.20%, and F1 score of 60.60%). An ML approach for the early detection of PD was proposed in [150], where voice was used as a modality. Three different classification methods, namely classification and regression tree (CART), SVM, and ANN, and two feature selection methods, namely feature importance and RFE, were investigated. The experimental results showed that SVM with RFE obtained the highest accuracy.
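Feature selection with a two-sample t-test, as used in [148], can be sketched as ranking each feature by how strongly it separates the PD and NC groups. The code below uses Welch's form of the statistic, which does not assume equal variances. Whether [148] used this form or the pooled-variance (Student's) form is not stated here, so this choice, like the function names, is an assumption.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for two lists of values.

    Each list needs at least two values, and the two sample variances
    must not both be zero.
    """
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_features(group_a, group_b):
    """Rank feature indices by |t|, most discriminative first.

    Each group is a list of per-subject feature vectors of equal length.
    """
    n_features = len(group_a[0])
    scores = []
    for j in range(n_features):
        t = welch_t([s[j] for s in group_a], [s[j] for s in group_b])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)]
```

The top-ranked features (or those whose p-value falls below a chosen threshold) are then passed to the classifier, a linear SVM in the case of [148].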
In [151], a wearable sensor array was used to distinguish PD from progressive supranuclear palsy. The least absolute shrinkage and selection operator was used as the feature selection method. Various combinations of sensor data were fed to the classifiers to distinguish between PD and progressive supranuclear palsy, where RF showed the best classification accuracy on combined tasks. A supervised ML-based classification approach to differentiate different stages of PD (e.g., high, medium, and mild) from NC based on gait pattern data collected from sensors was investigated in [152]. Among the different state-of-the-art classifiers, DT achieved the best performance. Moreover, the proposed approach outperformed several other existing PD detection techniques. In [153], an imbalance of gut microbiota was used to classify patients with PD from NC. A total of 846 metagenomic samples were analyzed, of which 374 were taken from NC and 472 from PD patients. Finally, the performance of the proposed scheme was shown by applying 3 different ML techniques. Among the ML classifiers, RF showed the best performance. The strategy proposed in [154] highlights that voice can be a key indicator for the early detection of PD. It was also suggested that traditional ML approaches can differentiate patients with few or no symptoms from NC by exploiting voice features. In [155], a novel approach was proposed to describe structural changes related to the severity of hypokinetic dysarthria (HD) in PD patients. The FreeSurfer tool was used to extract features from the collected sMRI data, and SVM was used to predict the severity of HD.

2) DL-BASED APPROACHES IN PD DIAGNOSIS
Based on acoustic features of speech spectrograms, the authors of [156] designed and tested three DL methodologies for the detection of PD. The first method applies transfer learning, a widely used DL technique, to Fourier-transformed speech spectrograms. The second method extracts deep features from the spectrogram and feeds them to ML classifiers. The third method uses handcrafted features and serves mainly as a baseline for comparing handcrafted features against the deep-feature based ML method and the transfer learning method. The second method outperformed the other two. In [157], the authors used a CNN to classify the motor state of PD patients from an IMU sensor worn on the patient's wrist. Practical challenges of motor state monitoring in a free-living environment were taken into account, and the proposed CNN model was compared with different ML classifiers. The proposed CNN model enabled motor state detection at a high temporal resolution. In [158], a DL technique was proposed for the early detection of PD based on premotor features. Three DL models (DEEP1, DEEP2, and DEEP3) were trained, each a feed-forward ANN with two hidden layers. Finally, a deep ensemble model was constructed from the three individual models. The experimental results showed that the proposed approach outperformed conventional ML approaches. An attention-enhanced DL framework was proposed in [159], where both left and right gait patterns were exploited to detect PD. Each gait pattern was processed separately, and the two were finally combined through a fully connected layer followed by softmax classification. Compared with studies that treated left and right gait patterns as a whole, separate gait patterns proved more informative for detecting PD.
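As an illustration of how an ensemble of three models, such as the one reported for [158], might combine its members, the sketch below averages class probabilities (soft voting). The combination rule is an assumption, since the review does not state how the ensemble was built, and the probability values are made up for the example.

```python
import numpy as np

def soft_vote(probabilities):
    """Average a list of (n_subjects, n_classes) probability arrays."""
    return np.stack(probabilities, axis=0).mean(axis=0)

# Hypothetical outputs of three models for three subjects; columns = [NC, PD].
p_model_1 = np.array([[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]])
p_model_2 = np.array([[0.80, 0.20], [0.30, 0.70], [0.40, 0.60]])
p_model_3 = np.array([[0.70, 0.30], [0.60, 0.40], [0.35, 0.65]])

averaged = soft_vote([p_model_1, p_model_2, p_model_3])
labels = averaged.argmax(axis=1)  # 0 = NC, 1 = PD
print(labels.tolist())
```

The third subject shows the point of averaging: one model alone favors NC, but the averaged probabilities favor PD.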
The authors of [160] designed a feature extraction pipeline incorporating an adaptive grey wolf optimization algorithm and a sparse autoencoder neural network. The feature extractor was used to extract candidate features from a vocal dataset. Based on these features, classification was performed with six ML algorithms to detect PD, among which LDA obtained the best performance. In [161], a new DL model for tracking PD progression was developed by hybridizing clustering and DBN. The progression of the disease was evaluated based on UPDRS. The results showed that the prediction accuracy of UPDRS increases when DL is aided by cluster analysis. A DL approach combining CNN and RNN was investigated in [162], and a high prediction accuracy was obtained in classifying PD versus non-PD. A home-environment friendly IMU sensor based system for detecting freezing of gait was investigated in [163]. A combination of CNN and RNN provided a significant increase in freezing of gait detection accuracy with low latency. In [164], a DL based approach was investigated on a real-world dataset, where PD progression detection was based on UPDRS. In [165], a PD prediction model was devised by optimizing the hyperparameter tuning of a deep learning prediction model. Based on grid search, the authors proposed a multi-stage optimization. The authors of [166] attempted to detect anomalies in subcortical brain regions of newly diagnosed PD patients through diffusion MRI. A semi-supervised autoencoder was designed to reconstruct the MR diffusion data of a healthy person from a healthy dataset. The reconstruction errors for healthy data were compared with the reconstruction errors for pathological data. The study concluded that anomalies were not very specific even in WM in T1-weighted images, although the overall water displacement and diffusion orientation in T1-weighted images were quite informative.
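The reconstruction-error idea described for [166] can be sketched as follows. This is only an illustration under strong assumptions: a linear autoencoder (equivalent to a PCA projection) stands in for the semi-supervised autoencoder of [166], the data are synthetic, and all names are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(healthy, n_components):
    """Fit a PCA subspace on healthy data; acts as a linear autoencoder."""
    mean = healthy.mean(axis=0)
    _, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(samples, mean, components):
    """Mean squared error between each sample and its reconstruction."""
    centered = samples - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

# Synthetic data (assumption): healthy samples lie close to a 3-D subspace,
# "pathological" samples carry a large deviation away from that subspace.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 30))
healthy = rng.normal(size=(200, 3)) @ mixing + 0.05 * rng.normal(size=(200, 30))
pathological = rng.normal(size=(50, 3)) @ mixing + 1.0 * rng.normal(size=(50, 30))

mean, components = fit_linear_autoencoder(healthy, n_components=3)
err_healthy = reconstruction_error(healthy, mean, components)
err_pathological = reconstruction_error(pathological, mean, components)

# Flag samples whose error exceeds the 95th percentile of the healthy errors.
threshold = np.percentile(err_healthy, 95)
flagged = err_pathological > threshold
print(err_healthy.mean(), err_pathological.mean(), flagged.mean())
```

The model only ever sees healthy data, so a sample it cannot reconstruct well is treated as anomalous; how specific that signal is for real diffusion MRI is exactly what [166] questions.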
A CNN model for PD detection through screening of handwriting was investigated in [167]. This model worked not only with spiral patterns, as in state-of-the-art architectures, but also with wave patterns. A fine-tuned VGG-19 was used to classify PD and control groups automatically. In [168], a DL based feature selection mechanism was proposed to pick out genetic features that correspond to imaging features. The proposed mechanism was tested on simulated and real data, and the results were compared with sparse canonical correlation analysis. The study found that genetics and neuroimaging data were potentially related to PD. In [169], an autoencoder neural network (AEN) was used to detect PD based on vocal features. The AEN achieved high scores on all performance metrics, leading to the conclusion that AEN has potential for detecting PD. In [170], the authors claimed to be the first to investigate a pooling-based deep RNN on EEG signals to detect PD. The results indicated the model's suitability for PD detection. To accurately estimate dyskinesia severity in patients with PD, a DL based approach was investigated in [171]. It was claimed that, during normal activities of daily living, the proposed method showed the highest performance in estimating dyskinesia scores. To identify patients with PD by analyzing their spiral and wave sketches, a DL based system was designed in [172]. To diagnose PD, a CNN based technique was proposed in [173], where T2-MRI and clinical data were integrated to improve classification accuracy. To differentiate mild impairment in PD from NC, an autoencoder based DL model was proposed in [174], and its performance was studied in terms of different performance metrics. In [175], DAT-SPECT 3D projection data were used to train a CNN for identifying subjects suffering from PD, and a high classification performance was obtained. To detect PD, a DL based approach was studied in [176], where 3D MRI was analyzed to capture intricate patterns of the brain's subcortical structures. A 1D CNN architecture was designed in [177] to detect PD and predict its severity by processing signals from foot sensors.

IV. FEATURE EXTRACTION TECHNIQUES USED IN BRAIN DISEASE DIAGNOSIS
There are different methods of extracting features for brain disease classification. Here, a brief overview of the most common feature extraction methods is provided.

A. MULTISCALE GEOMETRIC ANALYSIS [178]
1) CONTOURLET TRANSFORM
Wavelets show poor performance in the directional analysis of 2-D images. The contourlet transform is a 2-D extension of the wavelet transform, so it retains the main features of wavelets while also offering a degree of directionality and anisotropy. The contourlet transform allows a different and flexible number of directions at each scale and achieves nearly critical sampling.

2) CURVELET TRANSFORM
The curvelet transform is another multiscale geometric analysis technique that overcomes the drawbacks of wavelets. Its main advantage is that it loses no information when retrieving frequency information from images. The curvelet transform has two generations: the first-generation (continuous) curvelet transform and the second-generation fast discrete curvelet transform. The fast discrete curvelet transform via wrapping is the newer one, and it is more intuitive and faster.

B. WAVELET BASED FEATURE EXTRACTION [179]
1) DISCRETE WAVELET TRANSFORM
Wavelets decompose the data into various frequency components, and each component is analyzed separately at a resolution matched to its scale. The discrete wavelet transform uses a discrete set of wavelet scales and translations that follow defined rules; the sampling is done on the dyadic sampling grid. For neuroimages, the discrete wavelet transform is applied to each dimension separately.
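The separable, dimension-by-dimension application described above can be sketched with a one-level Haar transform. The Haar wavelet is chosen here only because it is the simplest case; the reviewed works may use other wavelet families and dedicated libraries. The function names are hypothetical.

```python
import numpy as np

def haar_dwt_1d(x, axis=-1):
    """One-level orthonormal Haar DWT along `axis` (length must be even)."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass, downsampled by 2
    detail = (even - odd) / np.sqrt(2.0)   # high-pass, downsampled by 2
    return np.moveaxis(approx, -1, axis), np.moveaxis(detail, -1, axis)

def haar_dwt_2d(image):
    """Apply the 1-D transform to each image dimension in turn."""
    low, high = haar_dwt_1d(image, axis=1)
    ll, lh = haar_dwt_1d(low, axis=0)    # ll is the approximation sub-band
    hl, hh = haar_dwt_1d(high, axis=0)   # naming of mixed sub-bands varies
    return ll, lh, hl, hh

image = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt_2d(image)

# The transform is orthonormal, so the total energy is preserved.
energy = sum(np.sum(band ** 2) for band in (ll, lh, hl, hh))
print(ll.shape, np.isclose(energy, np.sum(image ** 2)))
```

Repeating the same step on the approximation sub-band `ll` gives the next coarser scale, which is what sampling on the dyadic grid amounts to.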

2) COMPLEX WAVELET TRANSFORM
The complex wavelet transform was introduced to overcome two main problems of typical wavelets, i.e., lack of shift invariance and poor directional selectivity. This has been shown by achieving lower errors and using pixel intensity to construct feature vectors. For 2D images, the complex wavelet transform produces six bandpass sub-images oriented at ±15°, ±45°, and ±75°.

3) DUAL-TREE COMPLEX WAVELET TRANSFORM
The dual-tree complex wavelet transform combines perfect reconstruction with shift invariance, good directional selectivity, limited redundancy, and efficient order-N computation. To obtain these properties, the dual-tree complex wavelet transform retains positive frequencies and rejects negative frequencies, or vice versa. The two trees give the real and imaginary parts of the complex coefficients.

4) EMPIRICAL WAVELET TRANSFORM
The empirical wavelet transform is essentially a filter bank constructed around the Fourier supports detected from the signal spectrum. Like classic wavelets, the empirical wavelets also correspond to dilated versions of a single mother wavelet in the temporal domain. The new feature is that the corresponding dilation factors are not bound to a fixed scheme; they are detected empirically.

C. COMPONENT ANALYSIS
1) INDEPENDENT COMPONENT ANALYSIS [180]
It is a powerful analytical technique for neuroimaging data. It is a multivariate data-driven technique that can extract necessary biomarkers by exploring the links between voxels in local substructures of the brain. The independent component analysis algorithm separates the data into several components that are related to a task.

2) PRINCIPAL COMPONENT ANALYSIS [181]
It summarizes structure in the covariate space. It transforms the neuroimage data into a low-dimensional coordinate system defined by several elements. These elements represent the whole neuroimage data and are known as principal components.
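The projection onto principal components described above can be sketched with a singular value decomposition. The data are random placeholders standing in for a subjects-by-voxels feature matrix, and the function name is hypothetical.

```python
import numpy as np

def pca(data, n_components):
    """Project rows of `data` onto the leading principal components."""
    mean = data.mean(axis=0)
    centered = data - mean
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # orthonormal directions
    scores = centered @ components.T          # low-dimensional coordinates
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, components, explained[:n_components]

# Placeholder data (assumption): 50 subjects, 200 voxel-level features.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 200))
scores, components, explained_ratio = pca(features, n_components=5)
print(scores.shape, explained_ratio)
```

Each row of `scores` is the reduced feature vector of one subject, and `explained_ratio` indicates how much of the total variance each component captures.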

D. SPARSE INVERSE COVARIANCE ESTIMATION [182]
Sparse inverse covariance estimation is a method for functional connectivity modeling. It is an effective tool for analyzing the structure of the inverse covariance matrix of the data, and it can be used to identify the presence or absence of functional connections between brain regions.

The method of [183] is used to select ROIs of the brain images. A single Gaussian represents an ROI with a certain center, shape, and weight. The processed ROIs of the neuroimage are the extracted features.

F. NON-NEGATIVE MATRIX FACTORIZATION [184]
This method was developed to analyze non-negative data and extract their physically meaningful temporal components. Image pixels, gene expressions, power spectra, etc. are examples of non-negative data. Although this method is linked to independent component analysis, its components are not independent.
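For non-negative matrix factorization, a minimal sketch is given below. It uses the classical multiplicative update rules for the squared-error objective, which is one common choice and not necessarily the variant used in [184]; the data are synthetic and the function name is hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x m) as W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative data of exact rank 3 (assumption).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```

Because both factors stay non-negative, each row of `H` can be read as an additive part of the data and each row of `W` as the weights of those parts, which is what makes the components physically interpretable.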
G. SHEARLET TRANSFORM [185]
Wavelet transforms do not provide directional information and are not effective in extracting different types of texture features. The shearlet transform (ST) provides an effective approach for combining multiscale and geometric analysis of neuroimage data. It shows high accuracy in detecting directional features such as distributions of curves, edges, and points in images.

H. PEARSON CORRELATION ANALYSIS [186]
Pearson correlation analysis is an interactive feature extraction algorithm. It is applied to the measured volume of each structure to analyze the relationship between ROIs. In this method, the brain is divided into three-dimensional regions and volumetric measurements are made accurately.
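As a small worked example of the correlation step, the sketch below computes the Pearson coefficient between the volumes of two ROIs across subjects. The ROI names and volume values are hypothetical and serve only to show the computation.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical volumes (cm^3) of two ROIs measured in six subjects.
roi_a_volume = [3.1, 2.9, 3.4, 2.5, 2.7, 3.0]
roi_b_volume = [1.6, 1.5, 1.8, 1.2, 1.4, 1.5]

r = pearson_r(roi_a_volume, roi_b_volume)
print(round(r, 3))
```

Computing this coefficient for every pair of ROIs yields a symmetric correlation matrix whose entries can serve as features.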

I. K-SKIP-N-GRAM [45], [187]
The k-skip-n-gram method is used to extract the correlation details of both adjacent and non-adjacent residues. It can extract the sequence information of protein peptides, and each sequence is transformed into a feature vector.
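The k-skip-n-gram idea can be sketched for the bigram case (n = 2), where pairs of residues separated by at most k intermediate positions are counted. The restriction to n = 2, the normalization by the total pair count, and the function name are assumptions made for this illustration; [45] and [187] may define the features differently.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def k_skip_bigram_features(sequence, k):
    """Return a 400-dimensional vector of normalized k-skip bigram counts."""
    counts = Counter()
    for i in range(len(sequence)):
        for gap in range(k + 1):          # gap = 0 means adjacent residues
            j = i + gap + 1
            if j < len(sequence):
                counts[sequence[i] + sequence[j]] += 1
    total = sum(counts.values())
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    # Pairs containing non-standard residues are counted in `total`
    # but have no slot in the output vector.
    return [counts[p] / total if total else 0.0 for p in pairs]

features = k_skip_bigram_features("ACDA", k=1)
print(len(features), sum(features))
```

For the toy sequence "ACDA" with k = 1, the counted pairs are AC, AD, CD, CA, and DA, so five entries of the vector are non-zero.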

V. DISCUSSIONS AND FUTURE RESEARCH DIRECTIONS
A. RESEARCH FINDINGS
In Fig. 6, an overview of the number of AD, brain tumor, epilepsy, and PD articles is presented in two groups depending on whether the article is based on ML or DL. More AD articles were reviewed than articles on the other brain diseases in both the ML (35) and DL (40) groups. In addition, a comparatively higher number of DL-related studies were reviewed. Fig. 7 shows that AD has received the most attention from researchers among the four brain diseases, as the largest number of articles are on AD (75). After AD, PD (32) has the highest number of articles, while brain tumor (20) and epilepsy (20) have the same number of studies.
The bar charts of Figs. 8 and 9 illustrate the number of studies based on image modality and source of data for AD detection, respectively. MRI (33) is the most preferred type of image, and the ADNI (40) database is used as a source of data more than any other in AD detection articles. A range of image modalities and databases is observed in a few articles, though. In Fig. 10, we observe that almost all of the brain tumor detection articles utilize MR images (18). A number of databases appear in Fig. 11, where BraTS (6) is adopted in comparatively more brain tumor detection studies. In the bar chart of Fig. 12, we notice that EEG data (10) are used in the highest number of articles for epilepsy detection. Moreover, according to Fig. 13, a range of different databases grouped as other databases (11) are the most utilized source of data for epilepsy detection studies. According to the bar chart in Fig. 14, the three highest numbers of articles for PD detection are based on sensor data (7), speech (6), and MRI (5), respectively. Similar to epilepsy detection, a range of different databases (11) are used in the highest number of PD detection studies. Apart from that, the UCI ML repository (7) and PPMI (7) databases are also adopted in a considerable number of articles.

B. OPEN ISSUES AND FUTURE DIRECTIONS
From the contemporary studies presented in this paper, it is clear that ML and DL methods are receiving increasing attention from researchers because of their potential to contribute significantly to brain disease detection. Nonetheless, to move this computational intelligence toward full-scale deployment in clinical practice, ML/DL-based brain disease diagnostic approaches must deal with a number of major issues, as described below.

1) EXPLAINABLE DIAGNOSIS AND CLINICAL PRACTICE
Although they have potential in terms of brain disease diagnosis and prediction, ML and DL methods suffer from opacity: it is difficult to get straightforward insights into their internal mechanisms [188]. This opacity brings a set of problems, because entrusting key decisions to a brain disease detection system that cannot explain itself carries apparent dangers. Recently, Explainable AI (XAI) has emerged as a way to make AI-based systems more transparent. The primary goal of the XAI paradigm is to introduce a set of methods that deliver more explainable models while retaining high performance levels. Finding appropriate XAI approaches [189]-[191] in the context of brain disease diagnosis will eventually help to achieve verified predictions, improved models, and new insights that lead towards more trustworthy brain disease detection systems. Explainable diagnosis will be the ultimate basis for reliable and trustworthy communication between medical experts and AI experts, which is highly important for transforming the potential of ML/DL-based brain disorder detection into clinical practice.

2) QUALITY OF TRAINING AND DATA AVAILABILITY
The diagnostic performance of ML and DL algorithms relies largely on the availability of high-quality training data. Moreover, the scarcity of annotated data is the most critical issue in AI-based medical diagnosis. Annotation of medical data is time consuming, tiresome, and costly, as it requires significant engagement of experts. Various techniques such as data augmentation and image synthesis can be used to produce additional annotated data [192], [193]. However, the understanding and application of these methods are yet to be formulated for AI-based medical diagnostics. Moreover, the methods need to be further tailored to fit brain disease diagnosis.
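A minimal sketch of label-preserving data augmentation for a 2-D image slice is given below. The specific transformations (left-right flip, small shift, additive noise), their parameters, and the function name are illustrative assumptions, not methods taken from [192] or [193]; flipping in particular may be inappropriate when laterality carries diagnostic meaning.

```python
import numpy as np

def augment(image, rng, noise_std=0.01, max_shift=2):
    """Return a randomly flipped, shifted, and noise-perturbed copy."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                        # left-right flip
    shift = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=tuple(int(s) for s in shift), axis=(0, 1))
    return out + rng.normal(0.0, noise_std, size=out.shape)

rng = np.random.default_rng(0)
slice_2d = rng.random((64, 64))   # placeholder for a normalized MRI slice
augmented = [augment(slice_2d, rng) for _ in range(4)]
print(len(augmented), augmented[0].shape)
```

Each augmented copy inherits the annotation of the original slice, which is how augmentation enlarges an annotated dataset without additional expert labeling.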

3) INTEROPERABILITY AND COLLABORATION
In the context of brain disease detection, there are possibly many ways in which vendors can build their AI-based hardware and software solutions. Rules, regulations, and interfaces adopted by one manufacturer might not be compatible with those of another manufacturer of a product with the same functionality. This introduces interoperability issues. Multidimensional collaboration among health providers, manufacturers, and AI scientists is essential to set up solutions that enhance the quality of brain disease treatments. This collaboration would also help resolve the medical data scarcity faced by AI researchers [194]. In this regard, the world's leading health organizations such as the World Brain Alliance, the World Health Professions Alliance, and the World Health Organization can work together with the AI group run by the international organization.

4) SECURITY AND PRIVACY
ML and DL techniques are typically application-specific: a model trained for detecting one kind of brain disorder might not work well for another. To avoid wrong diagnoses, the underlying DL/ML algorithms need to be retrained separately with the respective brain data for each disease class. Also, an unsuitable selection of hyperparameters, even by a small change, can trigger a large change in the model's performance [195], resulting in a bad diagnosis that may eventually jeopardize patients' lives. A deeper understanding is therefore extremely important for optimizing AI systems for the detection of a particular brain disorder. Apart from security, data privacy needs to be addressed jointly from both sociological and technical perspectives [196]. In particular, patients in general, and brain disorder patients specifically, should have legal rights over the protection of their personal information. The exponential rise in medical data brings the big challenge of how to anonymize patient information [197]. Efforts are required to design appropriate algorithms for anonymizing sensitive information associated with brain data.

5) RESOURCE EFFICIENT METHODS
ML and DL applications often come with hardware limitations. The issue becomes more severe when the computation works on medical data because of the constraint of lossless data preservation. Increased processing power, in turn, requires more memory and computation resources. Image pre-processing is a major concern in ML and DL. It is important to preprocess images properly to obtain accurate results, but preprocessing is time consuming and requires a large amount of storage. Interestingly, it is possible to predict AD with high accuracy without pre-processing by using object detection techniques [89]. So, future work could focus on this kind of method to reduce the associated overhead and cost. The volume of data used for brain disease detection is usually very high, the data sources are heterogeneous in nature, and the data often originate from real-time sensors [198]. Due to these diverse data characteristics, the associated data processing platforms face critical challenges in effectively processing and maintaining the generated data. It is also extremely important for medical applications to determine data dependency. For example, some data segments may depend on critical factors such as time and location. Once such dependencies are correctly identified at the data processing layer, the associated medical staff or software agents can rapidly respond to the situation. Although efforts are visible to offer various data processing methods and platforms suitable for big data management and for extracting meaningful information [199], further research is required to investigate whether these existing techniques are resource efficient in the context of ML/DL-based brain disorder identification.

6) EMERGING CONCEPTS
In the field of AI and XAI, the word "confidence" typically indicates that the model of interest provides its results with small variances, whereas the word "trust" implies that the associated model offers interpretable and explainable results. The quantification of trust for DL approaches has recently been discussed [200]. Taking this quantification process into account when designing brain disorder diagnosis systems would be an insightful investigation. Various network science approaches have been used to analyze the brain activities of AD patients and to extract interconnectivity patterns of brain regions based on neuroimaging techniques [201], [202]. These network science approaches can be integrated with advanced XAI and ML/DL techniques to obtain improved solutions for brain disease treatments. In this context, the role of data fusion of time series data with different modalities might be examined using different ML and DL algorithms. Generative adversarial network-based image processing techniques also have the potential to offer enhanced brain disease detection capability by reducing the data scarcity problem [203], [204].

VI. CONCLUDING REMARKS
In this paper, we have presented a survey on the detection of the four most dangerous brain diseases using machine and deep learning. The survey reveals some important insights into contemporary ML/DL techniques used in today's brain disorder research. With the passage of time, identification, feature extraction, and classification methods are becoming more challenging in the field of ML and DL. Researchers across the globe are working hard to improve these processes by exploring different possible ways. One of the most important goals is to improve classification accuracy. For this, the amount of training data needs to be increased, because more data generally leads to more accurate results. The use of hybrid algorithms, and combinations of supervised with unsupervised and ML with DL methods, are promising ways to obtain better results. Even fine tuning can sometimes offer promising improvements. For example, in [83], a 3D-CNN is used first to extract primary features, and then, instead of the usual FC layer, the FSBi-LSTM is used. This slight change in one part of the system resulted in superior performance. Based on the discussion of different brain disease data sources and feature extraction methods, it is apparent that accuracy differs with the classifiers used and the feature extraction processes applied. To uncover the limitations of existing ML/DL-based approaches to detecting various brain diseases, the paper provides a discussion focusing on a set of open research issues. To design effective AI systems for medical applications, the inclusion of XAI approaches is the ultimate necessity. This will help medical professionals build confidence, so that AI-based solutions can be transformed into clinical practice in the treatment of patients with brain disorders. We found that the quality of training data and interoperability are also major concerns in developing ML and DL-based solutions. It is yet to be determined whether we will be able to have sufficient training data without compromising the performance of DL/ML algorithms. To make ML/DL-based solutions more practical, various other issues such as resource efficiency, large-scale medical data management, and security and privacy should be addressed well. This survey is expected to be useful for researchers working in the area of AI and medical applications in general and ML/DL-based brain disease detection in particular.

FIGURE 2. Difference between ML and DL.

FIGURE 3. Classifications of ML and DL techniques to detect brain diseases.

FIGURE 5. Classifications of feature extraction techniques used in brain diseases detection process.

FIGURE 6. Article distributions with respect to ML/DL.

FIGURE 7. Article distributions with respect to different diseases.

FIGURE 8. Different types of image modality and other data used in the different articles to detect AD.

FIGURE 9. Different types of database used in the different articles to detect AD.

FIGURE 10. Different types of modality and other data used in the articles to detect brain tumors.

FIGURE 11. Different types of database used in the different articles to detect brain tumors.

FIGURE 12. Different types of image modality and other data used in the different articles to detect epilepsy.

FIGURE 13. Different types of database used in the different articles to detect epilepsy.

FIGURE 14. Different types of image modality and other data used in the different articles to detect PD.

TABLE 1. List of frequently used abbreviations.

The dataset is an open-source set of MRI images that can be used by anyone. Initially, it consisted of data from 416 subjects, all right-handed and aged 18 to 96 years, including both men and women. One hundred of them, aged above 60, were diagnosed with very mild to moderate AD. For each subject, three to four T1-weighted scans with a high contrast-to-noise ratio are included. Here, the total brain volume and the estimated intracranial volume are used for analyzing normal aging and Alzheimer's disease. The dataset also provides data on 20 dementia patients.

3) ADNI-2
ADNI-2 was established in 2011 and lasted for five years. It aimed to find biomarkers to predict and analyze cognitive impairment. Along with the existing ADNI-1 and ADNI-GO participants, it included 150 elderly controls, 100 early MCI, 150 late MCI, and 150 AD subjects. SMC was added as a new cohort in ADNI-2 to identify more precisely the difference between healthy controls and MCI; 107 SMC participants were added. Moreover, a vital contribution of ADNI-2 was the incorporation of amyloid PET with Florbetapir at all ADNI-2 sites and for all ADNI-2 and ADNI-GO participants.

D. AIBL [11]
AIBL is a cohort study of neurodegenerative conditions such as AD, MCI, SMC, or SCD. The dataset comprises information on over 2000 individuals. Different questionnaires and clinical processes were used to collect the data. The information has been collected over a period of ten years and includes 142 AD, 220 MCI, and 582 normal subjects. Moreover, the baseline cohort included 211 individuals with AD, 133 with MCI, and 786 healthy individuals. Other data include age, gender, and recruitment period.
Moreover, there is ongoing research on the risk factors of dementia. Also, the relationships between physical characteristics and genetic patterns are studied.
It followed CVD development over an extended period of time in three generations of participants. It began in 1948 with an Original Cohort of 5,209 men and women between the ages of 30 and 62 who had not yet developed overt symptoms of the disease or suffered a heart attack or stroke. Later, it added an Offspring Cohort (1971), the Omni Cohort (1994), a Third Generation Cohort (2002), a New Offspring Spouse Cohort (2003), and a Second Generation Omni Cohort (2003). It successfully identified vital CVD risk factors and their effects, such as blood pressure, blood triglyceride and cholesterol levels, age, gender, and psychosocial problems.
PPMI presents datasets containing advanced imaging, biological, and clinical data to estimate the progression of PD. These data help to discover progression biomarkers of the disease. Its aim is to form a repository of clinical data and biospecimens to help the scientific community in biomarker identification research. The biospecimens include urine, plasma, serum, CSF, DNA, and RNA samples from patients. It plays a major role in PD research and is currently taking place at different clinical sites in the United States, Europe, Israel, and Australia.

TABLE 2. (Continued.) A comparative study on recent works to detect AD using ML approaches.

TABLE 3. A comparative study on recent works to detect AD using DL approaches.

TABLE 3. (Continued.) A comparative study on recent works to detect AD using DL approaches.

TABLE 6. A comparative study on recent works to detect PD using ML/DL approaches.

TABLE 6. (Continued.) A comparative study on recent works to detect PD using ML/DL approaches.