Recent Advances in Computer-Aided Medical Diagnosis Using Machine Learning Algorithms With Optimization Techniques

Artificial intelligence is a branch of computer engineering that has attracted considerable attention in medical data classification because of its state-of-the-art algorithmic strength and learning capabilities. Machine learning, a major sub-domain of artificial intelligence, has become one of the most promising fields in computer science. In recent years, the volume of healthcare and biomedical data has grown rapidly. Because so much of this data is available, labeled or unlabeled, compact and robust machine learning solutions for classification are important. Several optimizers have been deployed to improve the overall performance of machine learning models, whose classification performance depends on several factors. This comprehensive review aims to give insight into the current state of optimized machine learning for medical data classification. A growing amount of unstructured medical data is being used in machine learning algorithms to make predictions, but it is difficult to extract useful insight from such data. Machine learning researchers have therefore used state-of-the-art optimizers and novel feature selection techniques to improve performance accuracy. We highlight recent literature that demonstrates the strong impact of optimizers and feature selection on machine learning techniques for medical data classification. In addition, a concise introduction to machine learning and a theoretical outlook on widely used optimization techniques, namely the genetic algorithm, grey wolf optimization, and particle swarm optimization, are provided for an initial understanding of these techniques.


I. INTRODUCTION
Machine learning (ML) focuses on algorithms that receive data and learn from it on their own. It is also very effective at managing large volumes of data, whether labeled or unlabeled. It has wide applications in medical data classification and in large-scale optimization [1]. It is responsible for extracting information and insight from the training dataset [2]. These applications have been used with increasing success to make patient-level predictions in many areas of medicine, for example hospital readmission prediction, breast cancer prediction, diabetic complications, and cardiovascular mortality.
The associate editor coordinating the review of this manuscript and approving it for publication was Juan Liu.
In machine learning algorithms, optimization is needed to reduce computation and to extract important features from the dataset, which can affect the overall classification performance. Optimization techniques can therefore play a vital role in constructing machine learning algorithms for large amounts of data. Because of their broad applicability and appealing theoretical properties, optimization methods have risen in popularity in machine learning. As machine learning models grow more complex, current assumptions must also be re-examined. For these reasons, optimization is extremely useful in machine learning.
Medical diagnosis is a challenging task in the machine learning domain. A large amount of medical data is available for machine learning models. Medical data classification is based on the perspective of human medical experts, so errors and unwanted bias can occur while classifying the data. A robust intelligent solution can address this problem in a more compact way [3]. In medical research, scoring-based prediction systems are used to predict patients' disease risk [4]-[10].
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this review article, we consider four diseases and review work on their medical data classification that applies optimizers before machine learning models. The four diseases are breast cancer, heart disease, diabetes, and Parkinson's disease. We chose them because machine learning techniques are more widely used in diagnosing these four diseases than others, such as kidney and lung diseases. A brief introduction to each disease follows.
Breast cancer is the second leading cause of cancer death among women. Multiple factors are involved in the development of breast cancer [11]. It is mainly seen in women around 40 years of age [12]. A timely and robust screening process is needed to reduce the mortality rate of breast cancer [13]. Precautionary treatment can increase the chances of recovery [14]. Breast cancer can be diagnosed by mammogram and structural symptoms. Various intelligent medical diagnosis systems have been developed to avoid human error in the screening process [15]. In recent years, machine learning and deep learning have stimulated an enormous effort in medical classification systems.
Heart disease is considered one of the deadliest diseases in all countries. It is increasing drastically worldwide, at an undetermined rate [23]. Heart disease, also called cardiovascular disease, caused 17 million deaths globally, according to the WHO. Multiple health attributes, such as blood pressure, cholesterol, and auscultation, are considered to determine cardiovascular condition. Regular health monitoring is required to prevent adverse events [24]. The most common initial symptoms, such as fatigue and chest pain, are considered for an early suspicion of cardiovascular disorder [25]. Lifestyle factors can dominate a patient's risk. When investigating a suspected cardiovascular disorder, physicians require information and medical tests such as blood pressure, history of heart disorder in the patient or the family, chest X-ray, and ECG report [26]. Data mining techniques have achieved considerable success in heart disease prediction. In recent years, owing to accessible heart disease data, several clinical decision systems have been developed to analyze the medical condition of individual patients. These systems help both patients and physicians in the healthcare domain [27].
Diabetes is a common disease worldwide. It raises the blood sugar level above the expected norm [35]. Sugar is considered the main source of energy in the human body; when sugar levels in the body are uncontrolled, diabetes results [36]. The rising number of diabetes patients in lower-middle-income countries is very concerning [37]. Uncontrolled diabetes can lead to several severe medical conditions, such as blood pressure problems, organ failure, heart disease, and kidney failure, which carry a high risk of death. Type 2 diabetes carries high risk and affects insulin production in the human body [38]. Diabetes can be regulated through certain lifestyle changes and appropriate drugs. In the next few years, diabetes is expected to cause a 25% higher death rate globally. A scoring method can separate low-risk from high-risk patients [39]. Frequent diagnosis is nevertheless essential to determine a patient's diabetic condition. Data mining techniques are widely used to develop clinical support systems for patients.
Parkinson's disease (PD) is an incurable disease that affects many people, resulting in memory loss and other symptoms. It is a progressive neurodegenerative disorder of the motor system. Early detection is therefore much needed so that precautions can be taken at an early stage. According to recent studies, around 90% of Parkinson's patients are found to have dysphonia, so vocal evaluation is needed to detect Parkinson's disease [46]. The disease causes deterioration of movement, such as unsteadiness; of speech, for example trouble articulating sounds and decreased volume and pitch range; and of cognition [47]. Considerable success can be achieved if the patient gets proper medication and treatment [48]. In past decades, machine learning researchers have tried to build a user-friendly and intelligent PD detection support system. Several state-of-the-art techniques have been applied to this problem [51].
Data mining techniques have attained great success in medical data classification. However, the available datasets are often not well suited for data mining techniques to build a robust solution. When a dataset is inconsistent and small, it is not easy to obtain quality features from it. Regardless, we need adaptable machine learning techniques with suitable, high-quality features to build decision support systems. Different optimizers are applied to obtain key features. Machine learning models with optimizers perform commendably compared with results obtained without optimizers [52].
In this review paper, we survey recent literature on medical data classification using optimizers. The review gives insight into the current impact of optimizers on machine learning performance in medical data classification. The key contributions of this review paper are:
• A holistic overview of recent works on optimization techniques used in machine learning for medical data classification is presented.
• Recent developments of machine learning in medical data classification are provided.
• The most extensive available datasets for breast cancer, heart disease, diabetes, and Parkinson's disease are discussed while reviewing the literature.
• Limitations and future scope of optimization techniques are discussed.
The rest of the paper is organized as follows. Section II describes the background on machine learning, Section III gives the background on optimization techniques, Section IV presents the taxonomy of this article, and Section V summarizes the recent literature on disease prediction using machine learning and optimization techniques. The remaining sections present the future direction, discussion, and conclusion.

II. BACKGROUND ON MACHINE LEARNING
Machine learning has the capability to learn from data. We feed a large set of data to machine learning models, which find patterns in it [45]. In recent years, machine learning techniques have found a wide range of applications. This section gives a brief introduction to the categories of machine learning. Figure 2 shows an overview of this classification.

A. SUPERVISED LEARNING
Supervised learning is a dominant learning process in which the output is built upon labeled input [53]. In ideal cases, supervised algorithms find a specific pattern in the labeled input data. The learning process iteratively adjusts the model to minimize the error and produce a useful result. It generally yields the best outcome when the target to be predicted is known. Prediction on labeled breast cancer, heart disease, and diabetes data is an ideal example of supervised learning, where the prediction class is known [54].

B. UNSUPERVISED LEARNING
Unsupervised learning relies on unlabeled data as input and seeks to predict the output from it. It is also considered adaptive learning because of its learning process [55]. Clustering is the most common unsupervised method, and it relies on probabilistic clustering techniques. Unsupervised learning is mostly used for segmenting unlabeled data [56].

C. REINFORCEMENT LEARNING
Reinforcement learning is a learning process in which an agent makes multidimensional decisions to pick the right action [57]. It differs slightly from supervised learning. Reward and penalty are the main signals that guide an agent's actions in an environment. A reinforcement environment has four components: the action rule, the performance component, the reinforcement component, and the discovery component. Q-learning is the most widely applied reinforcement learning variant [58].
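To make the reward-driven update concrete, the following is a minimal sketch of one tabular Q-learning step, using the standard rule $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. The state names, action names, learning rate `alpha`, and discount `gamma` are illustrative assumptions and are not taken from the reviewed works.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best value reachable from the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; all names and values are assumptions.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 2.0
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
```

A positive reward raises the value of the chosen action, while a penalty (negative reward) would lower it.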

D. RECOMMENDER SYSTEMS
Recommender systems can be characterized as learning procedures by which online services adapt their sites to meet a client's preferences [59]. Such a system is an intelligent computer-based system that learns from the client's searches and preferences. YouTube video recommendations and Facebook friend suggestions are typical examples. Recommender systems currently have large-scale applications in online marketing, social sites, and similar areas [60].

III. BACKGROUND ON OPTIMIZATION TECHNIQUES
A. GENETIC ALGORITHM
The Genetic Algorithm (GA) is one of the state-of-the-art algorithms that handle both combinatorial and numerical optimization; it was developed by John Holland in the 1970s. GAs have numerous applications in research fields such as management, engineering, and the identification of several medical diseases. The three main operators of GA are selection, crossover, and mutation [61].
The crossover and mutation operators explore the search space to find solutions. Each solution is represented as a standalone chromosome whose alleles carry the genetic material that determines the fitness value of the objective function. Crossover generates two new chromosomes, known as offspring, from two parents. To build diversity in the population pool, a known probability is applied to the offspring [62]. The population in each generation is evaluated with a function known as the fitness function. In the next step, selection is carried out: low-fitness chromosomes are replaced by high-fitness ones. Well-known selection processes include roulette-wheel selection, Boltzmann selection, tournament selection, rank selection, and steady-state selection. Selection alone, however, cannot generate new diversity in the population pool, so crossover and mutation are introduced to address this. Crossover has numerous types, such as one-point crossover, two-point crossover, uniform crossover, multi-point crossover, and average crossover. Mutation can be defined as a random change in the value assigned to a certain gene. Furthermore, mutation can be used to avoid premature convergence to local optima. Mutation comes in several forms, including creep mutation, random gene mutation, and heuristic mutation. The whole process is repeated until the desired termination criterion is met. In this way, each new generation is created from the previous one [63].
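The selection, crossover, and mutation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than any specific method from the cited works: the objective is a toy bit-counting function, selection is tournament selection, crossover is one-point, and the population size, mutation rate, and all function names are illustrative choices.

```python
import random


def fitness(chrom):
    # Toy objective (assumption): number of 1-bits in the chromosome.
    return sum(chrom)


def tournament(pop, rng, k=3):
    # Tournament selection: the fittest of k randomly drawn chromosomes.
    return max(rng.sample(pop, k), key=fitness)


def one_point_crossover(p1, p2, rng):
    # Two parents exchange tails at a random cut point, giving two offspring.
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chrom, rng, rate=0.02):
    # Random gene mutation: flip each bit with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in chrom]


def run_ga(n_genes=20, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # termination criterion: fixed generation count
        nxt = []
        while len(nxt) < pop_size:
            c1, c2 = one_point_crossover(tournament(pop, rng), tournament(pop, rng), rng)
            nxt += [mutate(c1, rng), mutate(c2, rng)]
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)  # remember the best chromosome seen
    return best


best = run_ga()
```

In a feature selection setting, each bit would typically mark whether a feature is kept, and `fitness` would be replaced by a classifier's validation score.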
In [64], the authors suggest that GA works with string-coded variables rather than the raw ones, because string coding gives a distinctive and advantageous encoding of the search space. In [65], the authors propose a method in which controlling parameters are introduced into the transformation so that individuals that prove fitter do not prematurely dominate the population. They use a numerical example to illustrate the performance of the several techniques. In this example, the side constraints on the design variables are $0.1 \le X_i \le 5$, $i = 1, \dots, 10$, so the objective is a ten-variable function.

B. GREY WOLF OPTIMIZATION
Grey Wolf Optimization (GWO) is a meta-heuristic algorithm used in the classification and prediction of data. Many recent medical applications use the GWO optimizer to predict results with maximized accuracy [66]. By observing how grey wolves hunt and how their chain of leadership is followed, the authors of [67] proposed a mathematical model that imitates the behavior of grey wolves. Based on hunting behavior and leadership hierarchy, there are four types of wolves: alpha (α), beta (β), delta (δ), and omega (ω). The mathematical model of the optimizer follows this hierarchy.
Alpha wolves are at the first level of the hierarchy; they guide the hunting process, and the rest of the pack follows their guidance. Their main task is decision making [37].
Beta wolves act as advisors to the alpha wolves and help them in decision making. They transmit the alpha's commands to the whole pack and forward the responses back to the alpha [68].
Delta wolves, also known as scouts, monitor and protect the pack. Wolves in this group do not belong to any other category. They look after the territory and alert the pack to any danger [69]. Omega wolves are in the lowest category and follow the superior wolves.
The mathematical model in GWO describes the behavior of the grey wolves. In the standard formulation, the hunting behavior is represented by the following equations:
$$D = |C \cdot X_p(t) - X(t)|$$
$$X(t+1) = X_p(t) - A \cdot D$$
$$A = 2a \cdot r_1 - a, \qquad C = 2 r_2$$
Here, $t$ refers to the current iteration. $A$ and $C$ are the coefficient vectors, $X_p$ is the position vector of the prey, and $X$ is the position of the grey wolf. $r_1$ and $r_2$ are random vectors, and the distance-controlling parameter $a$ decreases linearly over the course of the iterations.
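The position update described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in any reviewed paper. It assumes the standard GWO convention in which the three best wolves (alpha, beta, delta) stand in for the prey position and their estimates are averaged, and in which `a` decreases linearly from 2 towards 0. The sphere objective, bounds, and parameter values are illustrative assumptions.

```python
import random


def sphere(x):
    # Toy objective to minimize (an assumption for illustration).
    return sum(v * v for v in x)


def gwo(objective, dim=5, n_wolves=12, iters=100, lo=-10.0, hi=10.0, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=objective)[:]
    for t in range(iters):
        # Alpha, beta, and delta: the three best wolves of this iteration.
        leaders = [w[:] for w in sorted(wolves, key=objective)[:3]]
        a = 2.0 * (1 - t / iters)  # distance-controlling parameter, decreasing linearly
        for w in wolves:
            for j in range(dim):
                estimates = []
                for lead in leaders:
                    A = 2 * a * rng.random() - a   # A = 2a*r1 - a
                    C = 2 * rng.random()           # C = 2*r2
                    D = abs(C * lead[j] - w[j])    # D = |C*X_p - X|
                    estimates.append(lead[j] - A * D)
                # Average the three leader-guided estimates, kept within bounds.
                w[j] = min(hi, max(lo, sum(estimates) / 3))
        candidate = min(wolves, key=objective)
        if objective(candidate) < objective(best):
            best = candidate[:]
    return best


best = gwo(sphere)
```

Large values of `|A|` early in the run push wolves away from the leaders (exploration), while small values later pull them in (exploitation).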

C. PARTICLE SWARM OPTIMIZATION
Particle swarm optimization (PSO) addresses complex problems by establishing a connection between the environment and simple agents [70]. PSO was introduced in 1995 by two researchers: Russell Eberhart, an electrical engineer, and James Kennedy, a social psychologist. They established a metaheuristic optimization method driven by a swarm of particles. The whole process is based on the movement of each particle over successive iterations. The particle nearest to the optimum informs the others of its position so that their trajectories can be updated [71].
Much research has been done on particle swarm optimization. Among the resulting frameworks, the one for combinatorial optimization can be called the most effective [72]. To apply PSO, an objective and the particles within the search space need to be defined. The main purpose of the technique is to find the optimum by moving those particles, each of which has its own position coordinates. In each iteration, a velocity is computed that moves the particle. The particle thus changes its position, which is updated toward the best position found so far. This evolving process helps the swarm find the optimum [73].
Each particle in the swarm knows the best position it has visited so far, together with the criterion value and coordinates of that position, and it is also aware of the best criterion value found by the swarm. The objective function assigns a value to each particle so that particles can be compared on the basis of their criterion values.
The objective function for PSO given in [74] has also been used in different kinds of research methods.
Here, $n$ represents the number of samples, and the objective function also involves the sample means of $x$ and $y$. Each member of the population in the PSO algorithm is called a particle. In standard PSO, every particle modifies its position and velocity in every iteration depending on its own best experience, denoted pbest, and on the best experience of all particles, denoted gbest. In the standard formulation, the updates are written as:
$$v_i[t+1] = w\, v_i[t] + c_1 r_1 \left(p_i^{best}[t] - p_i[t]\right) + c_2 r_2 \left(p_g^{best}[t] - p_i[t]\right)$$
$$p_i[t+1] = p_i[t] + v_i[t+1]$$
The cost function defined previously evaluates the performance of all particles.
Here, $i = 1, 2, \dots, N$, where $N$ is the population size. $v_i[t]$ is the velocity vector at the $t$-th iteration, and $p_i[t]$ is the current position of the $i$-th particle. $p_i^{best}[t]$ is the previous best position of the $i$-th particle, and $p_g^{best}[t]$ is the best position of the whole swarm. The inertia weight $w$ controls the balance between global and local search pressure. $c_1$ and $c_2$ are the cognitive and social parameters, and $r_1$ and $r_2$ are random numbers between 0 and 1.
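The velocity and position updates described above can be sketched as follows. This is a minimal illustration of standard PSO, not the setup of any reviewed work: the sphere cost function, the search bounds, and the values of `w`, `c1`, and `c2` are assumptions chosen only to make the example run.

```python
import random


def sphere(p):
    # Toy cost function to minimize (an assumption for illustration).
    return sum(v * v for v in p)


def pso(cost, dim=3, n_particles=15, iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # best position seen by each particle
    gbest = min(pbest, key=cost)[:]        # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + cognitive pull (pbest) + social pull (gbest).
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                pos[i][j] += vel[i][j]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return gbest


gbest = pso(sphere)
```

A larger `w` favors global exploration, while a smaller `w` concentrates the search around the current best positions.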

IV. TAXONOMY
In the last few years, researchers have tried to address healthcare problems using machine learning models. Several machine learning models are extensively used to detect common medical conditions. However, because of the large amount of medical data, machine learning models do not perform well in some cases, and researchers are trying to improve their overall performance by using optimizers. In this review paper, we address some recent advances in optimizers for machine learning. The taxonomy of this paper is as follows.
• Formal introduction of the diseases.
• Recent works on machine learning with optimizers.
• Performance analysis and future direction of the review.

V. LITERATURE ON MEDICAL DATA CLASSIFICATION
In this comprehensive review article, we consider four major diseases to evaluate and gain insight into machine learning algorithms combined with optimization techniques. This section presents some recent literature.

A. BREAST CANCER
Tabrizchi et al. [11] addressed early-stage detection of breast cancer by categorizing tumors into two groups, benign and malignant, and proposed a novel ensemble learning technique that combines the Multi-Verse Optimizer (MVO) and Gradient Boosting Decision Tree (GBDT). To enhance feature selection, they suggested a new, effective method based on MVO and GBDT parameters. The two datasets used in their research are Wisconsin Diagnostic Breast Cancer and Wisconsin Breast Cancer. GBDT is a strong ensemble method first introduced by Friedman; it is a state-of-the-art technique because it turns weak classifiers into a robust one by combining them. The authors note that GBDT differs from other known techniques in that it performs optimization in function space. Moreover, the technique is more flexible, scalable, and robust on non-linear problems than linear models such as linear regression. MVO is a metaheuristic optimizer that is now used in many studies to solve problems in various applications. Compared with other algorithms, MVO shows a robust optimization capability with fewer control variables. Finally, they concluded that, with MVO and GBDT, they developed an efficient and effective classifier for breast cancer datasets with high precision.
Punitha et al. [12] proposed an enhanced optimization framework, named IABC-EMBOT, for deep learning algorithms to characterize breast cancer efficiently. It combines bee colony and monarch butterfly optimization algorithms and is designed specifically for breast cancer recognition. They implemented IABC-EMBOT in MATLAB with the Neural Network Toolbox and the required back-propagation techniques. In their approach, a multilayer perceptron (MLP) network is used to detect breast cancer cells, and the MLP network is trained with the IABC-EMBOT approach. The MLP network is divided into three layers, each contributing to higher precision and sensitivity. During the execution of IABC-EMBOT, they used a winner-take-all approach to refine the cancer cell data. The datasets were obtained from the Wisconsin Breast Cancer Database (WBCD), which consists of two classes. To achieve high precision, they used the ABC and IMBO algorithms. Finally, they concluded that with an integrated ABC-BFA (Bacterial Foraging Algorithm), intelligent breast cancer detection can be achieved with higher accuracy, sensitivity, and speed.
Wuniri et al. [13] argued that feature selection is an important part of building a breast cancer classifier for preventive diagnosis. They suggested a wrapper method, an integrated framework in which feature selection on breast cancer datasets is embedded with Bayesian classifiers. To handle discrete and continuous features, they proposed two approaches: (i) a naïve approach for discrete features and (ii) kernel probability density estimation for continuous features. These lead to feature-type-aware hybrid Bayesian classifiers. They used a genetic algorithm (GA) to obtain a near-optimal feature subset, for which the Area Under the Curve (AUC) metric and its corresponding classifiers produced good results. The experiments were conducted on the continuous Wisconsin Diagnostic Breast Cancer (WDBC) dataset and a real breast cancer dataset for Chinese women. The algorithm's convergence is further improved by the one-class F-score, a metric that ranks features by merit and guides population initialization, crossover, and mutation in GA. They adopted the one-class F-score over the original F-score because it performs well on breast cancer datasets, which are highly imbalanced. In the experiment on the WDBC dataset, the Big-F algorithm gave better results than the Bayesian classifier-based genetic algorithm (BG). Moreover, the accuracy and efficiency of Big-F were high for breast cancer diagnosis, leading the researchers to choose it over the BG algorithm. Finally, they concluded that the Big-F algorithm outperforms the older BG algorithm in feature selection and converges more rapidly, and that with the guidance of the one-class F-score the precision is high; they plan to extend it to other cancer datasets.
Memon et al. [14] suggested a machine learning-based diagnostic system that effectively distinguishes malignant from benign cases in an IoT environment, for early-stage breast cancer diagnosis. They used a recursive feature selection algorithm to further improve precision and build a strong classification system. To obtain a good predictive model, they used the train/test split method. Furthermore, the classifier's performance was evaluated with the Matthews correlation coefficient, F1-score, and execution time, which were computed automatically, and they used WBCD as the dataset. To design the predictive model, they combined SVM with the recursive feature elimination (RFE) feature selection (FS) algorithm for accurate classification of cancer cases. They concluded that their approach detects early-stage breast cancer accurately, with high precision, sensitivity, and specificity.
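As a worked illustration of two of the evaluation metrics named above, the sketch below computes the F1-score and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The label vectors are made-up toy data, not results from [14], and the function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred):
    # Count true/false positives and negatives for binary labels (1 = malignant).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (assumption): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)   # 0.75 for this toy data
m = mcc(tp, tn, fp, fn)     # 0.5 for this toy data
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it informative on imbalanced medical datasets.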
Ronoud and Asadi [15] suggested a diagnosis system that uses a Deep Belief Network (DBN) to extract features from breast cancer cell data. They explained that a DBN combined with an extreme learning machine (ELM) classifier can overcome the challenges, using two methods, E(T)-DBN-BP-ELM and E(T)-DBN-ELM-BP. To tackle the further challenge of choosing the number of neurons and hidden layers, the architecture is optimized by the genetic algorithm (GA). They also proposed a third method, E(TW)-DBN, which uses GA. A DBN is built from layers of restricted Boltzmann machines (RBMs); an RBM is an artificial neural network with two layers: (i) a single visible layer and (ii) a single hidden layer. ELM is used because it does not need to tune the hidden layer, which makes it easy to connect the input layer and the hidden layer, and its computation is based on an efficient least-squares method. In E(T)-DBN-ELM-BP, ELM selects the weights in the first fine-tuning step, which provides a more suitable starting point for the BP algorithm. In E(T)-DBN-BP-ELM, BP and ELM perform the first and second fine-tuning steps, respectively. In E(TW)-DBN, the DBN is trained by GA, and in all three methods the network architecture is optimized by GA. The algorithms were evaluated on two well-known, publicly accessible datasets, Breast Cancer Wisconsin-Original (WBCO) and Breast Cancer Wisconsin-Diagnostic (WDBC). Finally, they concluded that all three models performed well in diagnosis, but E(T)-DBN-BP-ELM and E(T)-DBN-ELM-BP achieved better results than E(TW)-DBN, detecting cancer cells with high accuracy.
Kadam et al. [16] suggested feature ensemble learning based on sparse autoencoders and softmax regression (the FE-SSAE-SM model) for early identification of breast cancer, using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The experimental analysis covered both the FE-SSAE-SM model and the Stacked Sparse Autoencoders and Softmax Regression (SSAE-SM) model. FE-SSAE-SM performed well compared with SSAE-SM and other well-known algorithms. In the SSAE-SM model, the final level of the stacked autoencoders is fed to the softmax level for classification. However, various representations emerge when a stacked-sparse-autoencoder-based DNN is trained. The efficiency of FE-SSAE-SM is high because fine-tuning was done on the input data layer, the feature representation layer, and the final softmax layer. The experiments were carried out in MATLAB, and FE-SSAE-SM performed well, with high sensitivity, precision, and specificity.
Supriya and Deepa [17] suggested a breast cancer (BC) diagnosis system based on an optimized artificial neural network (ANN). To handle missing attributes and duplicate data, they used RMA and Hadoop MapReduce techniques, respectively. Feature selection is used to measure the performance difference between the proposed and other machine learning techniques, and features are selected with the Modified Dragonfly Algorithm (MDA). They gathered the data from the Wisconsin Breast Cancer Database (WBCD), which consists of two individual datasets with 34 and 32 attributes, respectively. They trained their proposed ANN model using the GWO technique. GWO is a bio-inspired algorithm that mimics how grey wolves survive as a group (pack). The GWO algorithm is divided into three parts: (i) encircling the prey, (ii) hunting, and (iii) attacking the prey. After feature selection, they obtained higher accuracy. Finally, their experimental findings were validated using the Improved Weighted Decision Tree (IWDT) algorithm.
Wang et al. [18] built a technique known as Improved Random Forest (RF)-based Rule Extraction (IRFRE), which overcomes the drawbacks of certain methods in explaining the reasons for a diagnosis. From the decision trees built for breast cancer diagnosis, IRFRE derives precise and interpretable classification rules. The method has three steps: (i) a random forest classifier builds different decision tree models, producing abundant decision rules; (ii) a rule extraction step separates the decision rules from the trained trees; and (iii) an enhanced multi-objective evolutionary algorithm (MOEA) searches for the best trade-off between interpretability and precision to obtain an optimal rule set. The method was evaluated on three breast cancer datasets: Wisconsin Diagnostic Breast Cancer (WDBC), Wisconsin Original Breast Cancer (WOBC), and Surveillance, Epidemiology and End Results (SEER). In IRFRE, past patient histories are combined with new findings so that the analysis is faster, and data are collected from patients and doctors at various facilities. With the help of MOEA, a precise and understandable rule predictor can be developed from the rules extracted from the random forest. To create unique sets of solutions in the optimization process, they suggested an improved tool that eliminates identical individuals and regenerates various parents. Overall, the experimental results suggested that the method can clarify black-box methods and that it performs well compared with various well-known methods. Finally, they concluded that the method maintains high precision relative to Random Forest on the three datasets.
Liu et al. [19] proposed an intelligent breast cancer diagnosis method that accounts for the much higher cost of a false negative compared with a false positive. In their algorithm, feature selection is performed by an information gain directed simulated annealing genetic algorithm wrapper (IGSAGAW): the features are first ranked by the information gain (IG) method and the top m features are extracted, and a cost-sensitive support vector machine (CSSVM) is used as the learning method. This feature selection approach achieves high precision and the lowest misclassification cost while reducing the complexity of the SAGAW technique. The datasets used were Wisconsin Original Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC). The main goals of the study were to examine the performance of IGSAGAW for feature selection and to determine the effect of the CSSVM method. The experimental analysis showed that IGSAGAW attains high precision in feature selection and the best misclassification cost. Finally, they concluded that although their model (IGSAGAW+CSSVM) was less precise than IGSAGAW+BP, its computational time is low, which makes it well suited to low-cost breast cancer diagnosis.
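The idea of penalizing false negatives more heavily than false positives can be made concrete with a small sketch. The function name `misclassification_cost` and the cost values (5 for a missed cancer, 1 for a false alarm) are hypothetical choices for illustration and are not the costs used in [19].

```python
def misclassification_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Average asymmetric cost; label 1 = malignant, label 0 = benign.

    cost_fn and cost_fp are illustrative assumptions, not values from the paper.
    """
    # A false negative is a malignant case predicted as benign.
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # A false positive is a benign case predicted as malignant.
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return (cost_fn * false_neg + cost_fp * false_pos) / len(y_true)
```

Two classifiers with the same accuracy can receive very different scores under such a measure, which is why a cost-sensitive learner may prefer a model that trades a few extra false positives for fewer false negatives.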
Sangaiah and Vincent Antony Kumar [20] proposed a hybrid detection system for breast cancer diagnosis. For early detection of breast cancer, the algorithm combines ReliefF attribute reduction with an entropy-based genetic algorithm. In their experiment, the data were taken from the Wisconsin Breast Cancer Dataset (WBCD), which has several attributes of different characteristics. They noted that the original Relief method works only on nominal and numerical attributes, cannot deal with incomplete data, and is restricted to two-class problems. To solve this, they used ReliefF, a generalization of the original Relief method that removes these shortcomings. ReliefF is not restricted by data type, is fast, and works well in noisy environments. In the classification system, a genetic algorithm is used to detect the most relevant features. Entropy measures the uncertainty of the class labels in a dataset and provides additional insight into information gain. In the proposed technique, entropy is used to identify the relevant features. This helps to build a strong correlation between input and output and discards redundant components of the input features. Their method for breast cancer diagnosis, based on ReliefF and the entropy-based GA, was found to be highly accurate with a small number of features, and the time required to build the model was also low.
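Entropy and information gain, as used for feature relevance in the two studies above, can be sketched as follows. This is the standard Shannon entropy for discrete labels and features; the function names are illustrative and the code is not taken from [19] or [20].

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A feature that separates the classes perfectly has an information gain equal to the label entropy, while a feature that is independent of the labels has a gain of zero, so ranking features by this value gives a simple filter-style ordering.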
Haung [21] proposed a high-precision system for breast cancer detection based on an improved machine learning framework. In this framework, the fruit fly optimization algorithm (FOA) is combined with the Levy flight (LF) strategy (LFOA) to optimize two key parameters of the support vector machine (SVM), yielding an LFOA-based SVM (LFOA-SVM) for breast cancer diagnosis. Fruit fly optimization is a meta-heuristic inspired by the foraging behavior of fruit flies. In FOA, candidate solutions are generated in the solution space, and each fruit fly then updates its position according to the fruit fly model. In the inner loop of the LFOA-SVM model, LFOA dynamically adjusts the SVM parameters. The optimized parameters are then fed to the SVM prediction model in the outer loop, which carries out the breast cancer classification task. All experiments were run in MATLAB, and the dataset came from Wenzhou People's Hospital (2004-2015). They found that the core novelty of LFOA-SVM, which improves the quality of FOA, lies in combining FOA with the LF strategy. Finally, they concluded that LFOA-SVM performs best among similar techniques and can support early, high-precision diagnosis that helps doctors.
Abdar [22] proposed a nested ensemble model that combines Stacking and Vote (Voting) classifiers for the identification of breast cancer tumors. Each ensemble classifier has two components: "Classifiers" and "Meta Classifiers". They built two-layer nested classifiers in which the Meta Classifiers contain at least two different algorithms. They ran their experiment on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, where the proposed model performed well, with high precision and efficiency compared with related models. Of their proposed methods, SV-BayesNet-3-MetaClassifier and SV-Naïve Bayes-3-MetaClassifier, the latter takes less time to build, which improves efficiency, although both have the same precision. The short build time and high precision of the proposed model help to identify cancer tumors at a very early stage.

B. HEART DISEASE
Tama et al. [23] introduced a new method for Coronary Heart Disease (CHD) prediction with machine learning techniques. They built a two-tier ensemble in which some ensemble learners serve as base classifiers for others, using Gradient Boosting Machine, Random Forest, and Extreme Gradient Boosting. The datasets used in this research were Cleveland, Statlog, and Hungarian. An optimized feature set was selected for these datasets, and a statistical test was also carried out. The experiments showed that the method outperformed the other models in terms of F1, accuracy, and AUC.
Nilashi [24] developed a predictive model to detect heart disease based on machine learning, using both supervised and unsupervised techniques. Fuzzy SVM (FSVM) and Principal Component Analysis (PCA) were used together with two imputation techniques for missing values. They also applied incremental FSVM and PCA for incremental learning of the data, which reduces the computation time of disease prediction. The experiment was carried out on two real datasets. The results showed that incremental FSVM improves the accuracy of heart disease detection and reduces computation time.
Sharma et al. [25] dealt with the modified artificial plant optimization (MAPO) technique, which was used as an optimal feature selector in combination with other machine learning techniques. With this process, the heart rate can be estimated from a dataset of fingertip videos, which allows coronary heart disease to be predicted at that moment. The dataset had been compiled beforehand and all noise was filtered out. The Standard Error Estimate and Pearson Correlation were used for evaluation, giving values of 2.418 and 0.9541, respectively. The prediction was then applied to two other datasets, and MAPO was applied again to detect coronary heart disease. The experimental results showed that MAPO reduced the dimensionality significantly, with a reduction of 81.25% compared with the other approaches. It thus outperformed the other optimizers with far better accuracy.
Reddy et al. [26] developed a hybrid of a genetic algorithm and a fuzzy logic classifier for heart disease detection, so that heart disease can be detected at an early stage. The hybrid technique comprises rough-set-based feature selection and a fuzzy rule classification model, where the fuzzy classifier is optimized by a genetic algorithm. The rough set step selects the important attributes associated with heart disease. An adaptive genetic algorithm with fuzzy logic (AGAFL) hybrid classifier then predicts the disease. The evaluation was carried out on the UCI heart disease dataset.
Khourdifi and Bahaj [27] worked with Fast Correlation-Based Feature selection (FCBF) to filter out redundant features and thereby improve the quality of heart disease classification. They performed classification with Random Forest, SVM, Multilayer Perceptron, Naïve Bayes, and K-nearest neighbor. They also used artificial neural networks optimized by Particle Swarm Optimization (PSO) combined with Ant Colony Optimization (ACO). This hybrid approach was applied to a heart disease dataset, and the results of the different machine learning algorithms were compared. They obtained an accuracy of 99.65% for the hybrid model consisting of FCBF, ACO, and PSO, which was better than the results of the plain machine learning methods.
Parthiban and Subramanian [28] introduced a technique based on the coactive neuro-fuzzy inference system (CANFIS) for the detection of heart disease. The model combines the adaptive capabilities of neural networks with the qualitative approach of fuzzy logic, and it is further integrated with a genetic algorithm to detect the presence of the disease. The model was evaluated in terms of training performance and classification accuracy. The results showed promising accuracy and efficiency.
Gokulnath and Shantharajah [29] introduced an optimization approach in which the Support Vector Machine (SVM) classifier serves as the objective function of a genetic algorithm (GA). In this way, the most significant features can be chosen for predicting heart disease. The experimental outcomes were good when GA-SVM was compared with feature selection techniques such as CFS, Relief, filtered subset, consistency subset, and gain ratio. Receiver operating characteristic (ROC) analysis was performed to assess how well the SVM classifier works, and the results were good. The Cleveland dataset was processed in MATLAB within the proposed framework.
Nalluri et al. [30] proposed a hybrid system that optimizes two classifiers, SVM and multilayer perceptron (MLP). Three recent algorithms were deployed to optimize the classifier parameters, which leads to six alternative hybrid disease diagnosis systems, also called hybrid intelligent systems (HISs). The objectives, namely sensitivity, accuracy, and specificity, were compared with those of other techniques. The proposed model was evaluated on 11 benchmark datasets. The experimental results showed that it performs better than the other techniques in terms of these criteria. Furthermore, a statistical test was conducted to substantiate the significance of the results.
Liu [31] focused on supporting the diagnosis of heart problems with a hybrid classification system. Rough set and ReliefF methods were the key components of the study. The proposed system contains two subsystems: the RFRS feature selection system and a classification system with ensemble classifiers. The RFRS system has three stages: data discretization, feature extraction using ReliefF, and feature reduction using a heuristic rough set reduction algorithm. The second subsystem is based on the C4.5 classifier. With jackknife cross-validation, an accuracy of 92.59% was obtained, which indicates better performance than other classification techniques.
Bashir et al. [32] proposed a novel classifier ensemble framework for the analysis and prediction of heart disease, based on an enhanced bagging approach with a multi-objective weighted voting scheme. The model overcomes several limitations and resolves some performance issues of existing approaches. It was compared with five other classifiers: quadratic discriminant analysis, Naïve Bayes, SVM, linear regression, and an instance-based learner. The authors used five different public datasets for experimentation, validation, and evaluation. The proposed model was validated against the other classifiers using ANOVA statistics and ten-fold cross-validation. It obtained an accuracy of 84.16%, a sensitivity of 93.29%, and a specificity of 96.70%. The F-ratio exceeded F-critical and the p-value was smaller than 0.05, so the results were significant at the 95% level for all the datasets.
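The weighted voting step of such an ensemble can be illustrated with a minimal sketch. It shows a plain weighted majority vote; the multi-objective weighting scheme of [32] is not reproduced here, and the function name `weighted_vote` and the example weights are illustrative assumptions.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the class label with the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier (for example, a
    validation score); how the weights are derived is left open here.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

For example, with predictions `[1, 0, 0]` and weights `[0.9, 0.3, 0.4]`, the single strong classifier outvotes the two weaker ones, whereas an unweighted majority vote would return class 0.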
Long et al. [33] presented a heart disease diagnosis system based on rough-set attribute reduction and an interval type-2 fuzzy logic system (IT2FLS). The rough-set-based attribute reduction was integrated with the IT2FLS, with the goal of handling the uncertainty and challenges of large datasets. The IT2FLS uses a hybrid learning process that includes the fuzzy c-means clustering algorithm. For high-dimensional datasets, this learning process is computationally expensive. The firefly algorithm was therefore used to find the optimal reduction, which improved the performance of the IT2FLS. The experimental results were significantly better than those of other machine learning techniques such as SVM, Naïve Bayes, and artificial neural networks. The system was therefore reported to be very useful for heart disease diagnosis.

C. DIABETES
Mishra et al. [34] proposed a new hybrid attribute optimization algorithm, the Enhanced and Adaptive Genetic Algorithm (EAGA), for diabetes diagnosis. The authors applied the optimization algorithm to the Pima Indian diabetes dataset. The EAGA model was combined with a Multilayer Perceptron (MLP), and this hybrid classification approach was used for diabetes diagnosis. The EAGA-MLP model was compared with several existing studies, and its performance was evaluated on several important metrics. The experimental results showed that the model had the highest accuracy and the lowest computation time among the compared classification algorithms.
Massaro et al. [35] focused on applying a Long Short-Term Memory (LSTM) network to predict the health status of diabetes patients. The proposed LSTM network is also suitable for DSS platforms. The authors used the Pima Indians Diabetes Dataset, which contains several medical predictor variables and a single target variable. A traditional LSTM neural network and a new approach, LSTM with artificial records (LSTM-AR), were integrated into an information system that collects patient information and data. The artificial records were used to enlarge the training dataset in order to obtain better accuracy. A prediction model was then created based on co-occurrence analysis of several attributes. The results showed that the LSTM-AR approach achieves better accuracy than the traditional LSTM approach and a Multi-Layer Perceptron (MLP) applied to the same datasets.
Choubey et al. [36] evaluated the performance of several classification methods combined with Principal Component Analysis (PCA) and Particle Swarm Optimization (PSO) for diabetes. PCA is a feature reduction method that can identify which features are relevant in a set of linearly dependent features. PSO, a popular stochastic optimization technique, was also used for feature reduction. The authors used the Pima Indian Diabetes Dataset and a Localized Diabetes Dataset. In the first approach, they applied several classification methods to the datasets, namely Logistic Regression, K-Nearest Neighbor (KNN), ID3 DT, C4.5 DT, Naive Bayes, and KStar. In the second approach, they applied PCA and PSO for feature reduction before classification and then classified the data with the same methods. The comparison showed that the latter approach, which used PCA and PSO, resulted in higher accuracy and lower computation time. The authors also suggested that the approach can be applied to the early diagnosis of other diseases.
Shankar Babu et al. [37] presented a diabetes prediction model based on fuzzy rules, optimized with grey wolf optimization (GWO), which gives more accurate predictions. The optimizer first initializes the grey wolf population together with the alpha and beta wolves and calculates the fitness values. The positions are then updated and the fitness is recalculated until the maximum number of iterations is reached. The classification result is finally obtained using the updated fuzzy rules. The authors also used Ant Colony Optimization (ACO) for the prediction model. After analyzing the performance of both optimizers, they found that GWO gives much better results than ACO in terms of accuracy, precision, and recall.
Bani-Hani et al. [38] implemented a Recursive General Regression Neural Network (R-GRNN) oracle for the prediction and diagnosis of diabetes. R-GRNN, an enhanced version of the GRNN oracle, takes into account the predictions of individually trained classifiers as well as those of the first GRNN oracle. It then outputs one superior prediction based on the error rate of each classifier over a set of observations. The dataset used in this research was the Pima Indians Diabetes Dataset, which contains data on 768 patients, of whom 268 have diabetes. For feature selection and hyperparameter optimization, the authors used a Genetic Algorithm (GA). Alongside the R-GRNN oracle, they used several classifiers, namely Support Vector Machine (SVM), Random Forest (RF), Probabilistic Neural Network (PNN), Multilayer Perceptron (MLP), K-Nearest Neighbor (KNN), Gaussian Naive Bayes (GNB), and the GRNN oracle. The performance evaluation showed that the proposed model (R-GRNN) achieved the highest accuracy, AUC, and sensitivity among the models. In terms of specificity, however, the optimized MLP achieved a higher value than R-GRNN. For future work, the authors suggested performing hyperparameter optimization together with feature selection, instead of selecting features using hyperparameters that were optimized on all the features.
Alirezaei et al. [39] proposed a hybrid optimization algorithm for reducing noise and data dimension in diagnosis. The authors analyzed the PIMA Indian Type-2 dataset and applied several methods to determine which gives the best classification accuracy. After the KNN algorithm was used to impute missing data, the K-means algorithm was used to detect and delete outliers. For data pre-processing, they used four meta-heuristic algorithms, namely the non-dominated sorting genetic algorithm (NSGA-II), multi-objective particle swarm optimization (MOPSO), multi-objective firefly (MOFA), and the multi-objective imperialist competitive algorithm (MOICA), which helped to obtain effective and significant features. Finally, an SVM was used to classify the data. The authors compared the algorithms on several performance metrics, and MOFA proved to be the best of the four optimizers.
Cui et al. [40] used an improved support vector machine algorithm to predict the readmission of diabetic patients. The authors investigated the readmission rate of diabetic patients in 130 United States hospitals. Because the data were imbalanced, they used an efficient SMOTE-based class imbalance processing method. A support vector machine (SVM) classifier was used to classify the data, and GA was used to optimize it. GA tunes three kernel functions (linear, polynomial, and sigmoid) to provide better accuracy, and it also helps to search for optimal parameters of RBF-based support vector machines. The authors compared their method with other methods such as the LACE score, Naive Bayes, Decision Tree, Logistic Regression, and Back Propagation Neural Network (BPNN). The experimental results showed that the proposed diabetic readmission prediction method was superior to the compared methods.
Paul and Choubey [41] used a GA-RBF NN classification system on the Pima Indian Diabetes Dataset (PIDD) for the diagnosis of diabetes. In the first stage, the authors used GA for feature selection, which removes insignificant features and reduces computation time while increasing accuracy. In the next stage, a Radial Basis Function Neural Network (RBF-NN) was used to classify the data. The method was compared with some earlier methods, and it was observed that the proposed method has a lower cost and improved training and classification. The authors also suggested using this classification system for other diseases.
Maniruzzaman [42] applied four classification techniques to diagnose and classify a diabetes dataset. The authors proposed the Gaussian process classification (GPC) technique and compared it with some existing classification techniques. GPC was introduced because it can handle several problems in regression and classification, such as complex data types, the linearity of classical methods, and the curse of dimensionality. Moreover, uncertainty in unknown functions can be handled easily with a Gaussian process. The classifiers were applied to the Pima Indian diabetes dataset. Apart from GPC, the classification techniques used were LDA, QDA, and NB. Performance was evaluated on the following measures: classification accuracy (ACC), sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV). The analysis showed that GPC had the best results among the four classifiers.
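The five evaluation measures named above follow directly from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the counts in the usage note are illustrative and are not results from [42].

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic measures from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # overall classification accuracy
        "SE": tp / (tp + fn),                    # sensitivity (recall, true positive rate)
        "SP": tn / (tn + fp),                    # specificity (true negative rate)
        "PPV": tp / (tp + fp),                   # positive predictive value (precision)
        "NPV": tn / (tn + fn),                   # negative predictive value
    }
```

With the made-up counts `tp=40, fp=10, tn=45, fn=5`, the function gives ACC = 0.85, PPV = 0.8, and NPV = 0.9. Reporting SE and SP alongside ACC matters for imbalanced medical data, where accuracy alone can hide a high rate of missed cases.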
Karegowda et al. [43] integrated a Genetic Algorithm and a Back Propagation Network (BPN) into a hybrid model. In the model, the Genetic Algorithm initializes and optimizes the connection weights of the Back Propagation Network. GA can improve the performance of BPN in many ways. It is an efficient method for searching large spaces and can be used with BPN to determine the number of hidden nodes and hidden layers. It can also select relevant feature subsets, determine the learning rate and momentum, and optimize the connection weights of the network. To find the relevant features, the authors used gain ratio and correlation-based feature selection. The dataset used for this model was the Pima Indians Diabetes Database (PIDD), which was classified using the hybrid GA-BPN model. The model showed a significant improvement in classification accuracy.
Ganji and Abadeh [44] presented FCS-ANTMINER, a new classification algorithm obtained by combining Ant Colony Optimization (ACO) and fuzzy logic, and used it for diabetes diagnosis. In their research, they applied FCS-ANTMINER to the Pima Indian Diabetes dataset (PID). FCS-ANTMINER consists of a training stage and a testing stage. During the training stage, an ACO algorithm generates fuzzy rules from the training patterns. In the testing stage, the classifier uses a fuzzy inference engine to classify the test patterns. The results showed that the proposed classifier (FCS-ANTMINER) had the highest classification accuracy compared with several recognized classification algorithms.

D. PARKINSON DISEASE
Tomar et al. [45] used the Least Squares Twin Support Vector Machine (LSTSVM) for Parkinson disease diagnosis. The classifier was applied to a Parkinson disease dataset taken from UCI. LSTSVM is efficient because it takes less computation time and provides better generalization ability. Particle Swarm Optimization (PSO) was used both for feature selection and for parameter optimization. The authors evaluated the classification model in terms of accuracy, sensitivity, and specificity. The LSTSVM+PSO model gave the best performance when used with the Gaussian kernel function.
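To make the role of PSO concrete, the following is a minimal sketch of a continuous particle swarm optimizer, as might be used to tune numeric classifier parameters. It is a generic textbook variant and not the configuration of [45]; the function name and the inertia and acceleration coefficients (`w`, `c1`, `c2`) are illustrative assumptions. Feature selection would additionally require a binary encoding of the positions, which is omitted here.

```python
import random


def particle_swarm_optimize(objective, dim, bounds, n_particles=20, n_iter=100,
                            w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO (sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = list(pbest[g]), pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(pos[i]), val
                if val < gbest_val:
                    gbest, gbest_val = list(pos[i]), val
    return gbest, gbest_val
```

In a parameter-tuning setting, the objective would return a cross-validated error for the candidate parameter vector, so each evaluation trains and validates one classifier.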
Sharma et al. [46] proposed a modified version of grey wolf optimization for diagnosing Parkinson's disease (PD). Modified Grey Wolf Optimization (MGWO), a variant of Grey Wolf Optimization (GWO), is used for feature selection: it takes a group of features as input and outputs a reduced set of relevant features, which helps to recognize Parkinson's disease more accurately. The authors applied the proposed model to several datasets containing voice, speech, and handwriting data. The proposed system detected the illness with an accuracy of 94.83%, a detection rate of 98.28%, and an alarm rate of 16.03%. Compared with the Optimized Cuttlefish Algorithm (OCFA), MGWO gave better results, maximizing accuracy while using the minimum number of features for PD diagnosis.
In Shahsavari [47], the classification of a Parkinson's Disease (PD) patient dataset was analyzed using the Extreme Learning Machine (ELM) model. This model is a feed-forward neural network with a single hidden layer, used here to classify the patients. ELM provides fast computation and good generalization, and it has the universal approximation property. The authors used Hybrid Particle Swarm Optimization (PSO) for feature selection. This optimizer helps to extract relevant features, which also improves the ELM-based model. The dataset used for this research was a medical information dataset of PD patients, on which the proposed model was implemented. The experimental results showed that the proposed model leads the compared models in accuracy, recall, precision, and F-score.
Cai et al. [48] proposed an optimal SVM based on bacterial foraging optimization (BFO) for the identification of Parkinson's disease. The effectiveness of BFO+SVM was demonstrated on PD datasets based on vocal measurements. The proposed model (SVM+BFO) was compared with other optimization techniques, namely an SVM based on the grid search method and an SVM based on PSO. BFO consists of three main procedures: (i) chemotaxis, (ii) reproduction, and (iii) elimination-dispersal. The main kernel functions are the linear, polynomial, sigmoid, and radial basis kernels. The RF-BFO-SVM framework was constructed to distinguish PD patients from healthy controls by combining feature selection with the proposed model (BFO+SVM). All experiments were run in MATLAB, and the evaluation metrics of the proposed model were satisfactory.
Sankara Babu et al. [37] stated that the main goal of their work was to identify various diseases at an early stage. They proposed using GWO and an auto-encoder-based Recurrent Neural Network (GWO+RNN) for identification. GWO is used for feature selection and the RNN is used to identify the diseases. GWO was chosen over other state-of-the-art techniques because it eliminates unnecessary and inessential features to a significant degree. The selected features are then passed to the RNN classifier. The datasets used in their paper were Hungarian, Cleveland, PID, mammographic masses, and Switzerland. In their experiment, they observed that GWO+RNN achieved better disease identification results than similar techniques such as the Group Search Optimizer and Fuzzy Min-Max Neural Network (GFMMNN) approach. The simulation was carried out in PyCharm, and the evaluation metrics indicated that the proposed GWO+RNN method is accurate.
Shen [49] proposed a Fruit Fly Optimization based SVM (FOA-SVM) for medical diagnosis classification. The authors used four datasets, namely the Parkinson dataset, the Pima Indians Diabetes Dataset (PIDD), the Wisconsin breast cancer dataset, and the Thyroid dataset. The fruit fly optimizer works in a few steps. After initializing the parameters and the population, it evaluates the population. It then substitutes the smell concentration value into the fitness function and finds the maximal smell concentration. It keeps the maximal concentration and, after several optimization iterations, returns the best value found. In this study, FOA was used to optimize the SVM parameters, and the SVM model performs the classification task with the optimal parameters found by FOA. The authors also applied GA-SVM, BFO-SVM, and Grid-SVM to the same datasets for comparison. The analysis showed that the proposed method (FOA-SVM) gave the highest classification accuracy with the least computation time. The authors also suggested using this model for further medical diagnosis tasks.
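The fruit fly steps described above can be sketched for a single parameter. This is a simplified version of the basic FOA in which the smell concentration judgment value is the reciprocal of a fly's distance to the origin; it is not the FOA-SVM implementation of [49], and the function name, the step size, and the toy smell function in the usage note are illustrative assumptions.

```python
import math
import random


def fruit_fly_optimize(smell, n_flies=20, n_iter=200, step=1.0, seed=0):
    """Maximize `smell(s)` over a positive scalar `s` with a basic FOA (sketch)."""
    rng = random.Random(seed)
    # Random initial location of the swarm.
    x_axis, y_axis = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
    best_smell, best_s = -math.inf, None
    for _ in range(n_iter):
        candidates = []
        for _ in range(n_flies):
            # Each fly searches in a random direction around the swarm location.
            x = x_axis + rng.uniform(-step, step)
            y = y_axis + rng.uniform(-step, step)
            dist = max(math.hypot(x, y), 1e-12)   # guard against division by zero
            s = 1.0 / dist                        # smell concentration judgment value
            candidates.append((smell(s), s, x, y))
        top = max(candidates, key=lambda c: c[0])
        # Keep the maximal concentration; the swarm flies to that location.
        if top[0] > best_smell:
            best_smell, best_s = top[0], top[1]
            x_axis, y_axis = top[2], top[3]
    return best_s, best_smell
```

For instance, `fruit_fly_optimize(lambda s: -(s - 0.5) ** 2)` searches for the value of `s` closest to 0.5. To tune two SVM parameters, one such search variable per parameter would be used, and the smell function would be the validation accuracy of the resulting SVM.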
Qiang Li et al. [50] proposed an integrated framework that combines an improved grey wolf optimization (IGWO) with a kernel extreme learning machine (KELM), termed IGWO-KELM, for medical diagnosis. Feature selection is used to find the optimal feature subset of the medical records. They first adopted a genetic algorithm (GA) to produce diversified initial positions, and grey wolf optimization is then used to update the current positions of the population in the discrete search space. This yields the best feature subset for improved classification with KELM. After testing the model on two common diagnosis problems and assessing several evaluation metrics, they observed that IGWO-KELM outperformed the two comparable models, GWO and GA. Finally, they concluded that, since the method has high precision and efficiency on the two datasets, it can be used for early detection and, in the future, for similar practical medical diagnosis problems.
Wu [51] focused on Deep Brain Stimulation (DBS). They used a Radial Basis Function Neural Network (RBFNN) together with a Particle Swarm Optimizer (PSO). Principal component analysis (PCA) and Local Field Potential (LFP) signals were also used in this work. The whole system was trained on data from real Parkinson patients. They found that the detection accuracy can rise up to 89%. They also compared the PSO-based RBFNN with the plain RBFNN and reported that the PSO-based RBFNN was far behind in terms of performance, with a fair reduction in computation.

VI. DISCUSSIONS, CHALLENGES AND FUTURE DIRECTIONS
In this review paper, we have surveyed recent literature on medical data classification using optimized machine learning algorithms. Several optimization techniques are now used in machine learning algorithms to achieve better results. At present, a huge amount of data is available globally, but most datasets are unstructured and not organized well enough for machine learning algorithms to use. Some datasets are even of too low quality for key features to be extracted and a high accuracy to be achieved. For this reason, various optimization techniques are being incorporated to extract key features from the datasets. This review examines some recent works and demonstrates the impact of optimizers on machine learning systems. It covers the overall impact of optimization, data preprocessing methods and performance evaluation techniques across different datasets. Several optimization techniques are used extensively in recent works, most notably genetic algorithms, particle swarm optimization and grey wolf optimization. These are, however, base techniques; most of the reviewed works have used ensemble optimization techniques built on these base techniques.
For convenience, Table 1 provides an overview of recent advances in machine learning with optimizers for breast cancer prediction. The table shows that [11]-[22] have used several optimization methods. Most of these works used the WBC, WBCD and WBCO datasets. Dataset splitting is mostly based on k-fold validation: only [11] used 5-fold cross-validation, [14] and [15] used only manual data splitting, and [18]-[22] used 10-fold validation. The work in [13] achieved the highest accuracy of 99.20%, which is higher than all other recent works. It used the WBCD dataset with 5-fold data splitting and applied the grey wolf optimization technique to achieve this remarkable performance. Most works report accuracies ranging from 82% to 99%. On the other hand, the genetic algorithm based E(T)-DBN-BPELM performed well in [15], with an accuracy of 99.45% on the WBCO dataset. Moreover, in [79]-[86], the authors discussed and deployed traditional machine learning and deep learning models to detect breast cancer without optimizers. Although they also achieved decent results, solutions with optimization are more robust for detecting breast cancer.
Table 2 presents heart disease classification using machine learning with optimization. It gives a comprehensive outlook on recent machine learning based classification for heart disease prediction. The table shows that [23]-[33] have used several optimization methods. The most commonly used datasets are the UCI Heart Disease dataset, the Cleveland dataset and the Statlog heart disease dataset. The works in [24], [29], [30] and [33] used 10-fold validation for dataset splitting and validation. Only [23] used 5-fold cross-validation in its experiment.
The work in [24] achieved the highest accuracy of 99.65% by using particle swarm, ant colony and FCBF optimization techniques. It validated two machine learning algorithms, K-NN and Random Forest, of which K-NN achieved the highest accuracy. Most of the other accuracies range from 81% to 99%. In [87]-[93], the authors discussed and deployed traditional machine learning and deep learning models to detect heart disease without optimizers. Although they also achieved decent results, solutions with optimization are more robust for detecting heart disease.
Table 3 provides an outlook on recent advances in machine learning algorithms with optimizers for diabetes prediction. The table shows that [34]-[44] have used several optimization techniques to enhance the overall prediction accuracy. Almost all recent works have used the Pima Indians Diabetes Database (PIDD), which is used extensively in the research community. The work in [40] used multi-objective particle swarm optimization (MOPSO) and achieved the highest accuracy of nearly 100% on PIDD, which is higher than any other recent work. Principal component analysis combined with particle swarm optimization was adopted by [37], which achieved 95.58% accuracy using a C4.5 decision tree model. According to the literature, the dataset splitting was done manually. On the other hand, in [44], Enhanced and Adaptive Genetic Algorithm (EAGA) based optimization also achieved a high accuracy of 94.7%. Most of the accuracies range from 76% to 99%. In [94]-[100], the authors discussed and deployed traditional machine learning and deep learning models to detect diabetes without optimizers. Although they also achieved decent results, solutions with optimization are more robust for detecting diabetes.
Table 4 presents the overall picture of machine learning based Parkinson's disease prediction with optimizers. The table shows that [45]-[52] have used several optimization techniques with machine learning models to enhance the overall performance of the model. Numerous datasets are used, including internally developed ones. The Parkinson dataset is the most common, used by [46], [49] and [50]. In most of the literature, particle swarm optimization based techniques are used. On the Parkinson dataset [46], butterfly optimization based SVM achieved an accuracy of 96.89%. Particle swarm optimization performed even better, with an accuracy of 97.95%, which is higher than all other recent works. In [101]-[109], the authors discussed and deployed traditional machine learning and deep learning models to detect Parkinson's disease without optimizers. Although they also achieved decent results, solutions with optimization are more robust for detecting Parkinson's disease.
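Since many of the reviewed works rely on 5-fold or 10-fold validation for dataset splitting, a minimal sketch of how such folds can be generated is given below. The function name k_fold_indices and the sample count are assumptions for illustration; the reviewed papers may have used library implementations or different shuffling strategies.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        start += size
        yield train, test


# Example: 23 samples split into 5 folds.
folds = list(k_fold_indices(n_samples=23, k=5))
```

Each sample appears in exactly one test fold, so averaging the accuracy over the k test folds uses every sample for evaluation once, whereas a single manual split evaluates on one fixed subset only.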
Machine learning has attracted great interest in the healthcare domain. Due to the large volume of healthcare data, it is difficult for medical doctors to identify a specific disease efficiently. To tackle this issue, AI researchers are applying various statistical techniques to healthcare data in order to classify it. In the future, more robust datasets should be prepared to achieve more reliable results in computer-aided medical diagnosis. The authors believe that there is a very high possibility of achieving the desired results by producing optimized AI models for medical data classification. Moreover, such models will help physicians make diagnostic decisions more accurately and improve overall healthcare policies. Thus, optimization techniques in machine learning can play a very significant role in achieving better outcomes by supporting physicians rather than replacing them.
Optimization techniques are mathematically well defined, and a good mathematical background is needed to come up with a new one. However, not every existing optimization technique delivers the desired output in machine learning models. It is therefore important to figure out which optimization technique is the best choice for a specific model and dataset. These are the key challenges for optimization techniques in the machine learning domain, and they can be addressed in future work.
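One simple way to approach the question of which optimizer suits a given model and dataset is to run each candidate on the same objective with the same evaluation budget over several random seeds and compare the aggregated results. The sketch below illustrates this under stated assumptions: random_search and hill_climb are deliberately simple placeholder optimizers, and the sphere function is a hypothetical stand-in for the validation error of a model on a dataset.

```python
import random
import statistics


def sphere(x):
    """Toy objective standing in for a model's validation error (assumption)."""
    return sum(v * v for v in x)


def random_search(objective, dim, budget, rng):
    best = float("inf")
    for _ in range(budget):
        best = min(best, objective([rng.uniform(-5, 5) for _ in range(dim)]))
    return best


def hill_climb(objective, dim, budget, rng):
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    best = objective(x)
    for _ in range(budget - 1):
        candidate = [v + rng.gauss(0, 0.5) for v in x]
        value = objective(candidate)
        if value < best:  # accept only improving moves
            x, best = candidate, value
    return best


def compare(optimizers, objective, dim=3, budget=200, seeds=range(10)):
    """Return {name: (mean, standard deviation)} of the best value found."""
    results = {}
    for name, optimize in optimizers.items():
        scores = [optimize(objective, dim, budget, random.Random(s))
                  for s in seeds]
        results[name] = (statistics.mean(scores), statistics.stdev(scores))
    return results


results = compare({"random_search": random_search,
                   "hill_climb": hill_climb}, sphere)
```

Reporting the spread across seeds alongside the mean helps to distinguish a consistently better optimizer from one that was merely lucky in a single run.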

VII. LIMITATIONS OF THIS REVIEW
This paper is a comprehensive review of machine learning optimization techniques in medical data classification. Recent advances in the optimization techniques used in the literature are highlighted. In addition, an up-to-date introduction to machine learning and the most extensively used optimization techniques is provided to give readers better intuition.
However, this review still has some limitations. Because of the small number of recent works on optimization techniques for machine learning, this paper still lacks a robust basis for deciding which optimization technique is better for medical data classification and feature selection. Hopefully, more research on optimization techniques will be carried out in the future, so that a more robust review covering more recent works can be produced. Nevertheless, this review paper still offers a far-reaching outlook on current advances, and researchers can use it as an initial reference.

VIII. CONCLUSION
This review paper covers the current state of machine learning models with optimization techniques for classifying medical data. We have tried to give a holistic outlook on recent works on predicting breast cancer, heart disease, diabetes and Parkinson's disease. The main intent of this review is to feature recent studies of optimization techniques used in machine learning for better prediction systems. It will help researchers and industrial engineers identify the better optimization technique for medical data classification. The review is structured into several parts. First, we gave a formal introduction to this review and to the diseases. The next section then provided an introductory background, which describes the classification of machine learning, deep learning and optimization techniques. After that, literature on each disease was presented, which gives a view of the current state of optimization techniques. In the near future, researchers can use optimization techniques to build compact solutions for intelligent decision-making systems. In our observation, a few common datasets have been used for these experiments. More elaborate and large-scale datasets are needed to validate the overall performance of the optimization techniques for machine learning models.
TAKI HASAN RAFI received the bachelor's degree in electrical engineering from Ahsanullah University of Science and Technology, Dhaka, Bangladesh. He will be starting the Ph.D. degree in computer science this year. He has published around ten research articles in various peer-reviewed journals and IEEE flagship conferences. Prior to his research involvement, he spent a couple of years in a start-up and an internet service provider company in Bangladesh. His research interests include deep learning in medical applications, generative adversarial networks, spiking neural networks, and human-computer interaction. He also works as a reviewer in various SCI journals and IEEE conferences.
He is currently a Full Professor of electrical engineering affiliated with New York University (NYU), Abu Dhabi. His current and past academic and research appointments also include Massachusetts Institute of Technology (MIT), Harvard University, and the University of Waterloo. He was a Full Professor of electrical engineering with Khalifa University (formerly, Etisalat University College), United Arab Emirates, from 1993 to 2017, during which he received the Excellence in Teaching Award and Distinguished Service Award several times. He has over 380 publications in the form of articles in peer-reviewed journals, papers in refereed conference proceedings, book chapters, and U.S. patents. His publications span several research areas, including 6G and terahertz communications, modern antennas and applied electromagnetics, signal and array processing, machine learning, the IoT and sensor localization, medical sensing, and nano-biomedicine. He is a fellow of MIT Electromagnetics Academy and a Founding Member of MIT Scholars of the Emirates.
He is a standing member of the editorial boards of several international journals and serves regularly on the steering, organizing, and technical committees of IEEE flagship conferences in Antennas, Communications, and Signal Processing, including several editions of IEEE AP-S/URSI, EuCAP, IEEE GlobalSIP, IEEE WCNC, and IEEE ICASSP. He is also a Board Member of European School of Antennas and the Regional Director of the IEEE Signal Processing Society in IEEE Region 8 Middle East. He is a Founding Member of five IEEE society chapters in United Arab Emirates, which are the IEEE Communication Society Chapter, the IEEE Signal Processing Society Chapter, the IEEE Antennas and Propagation Society Chapter, the IEEE Microwave Theory and Techniques Society Chapter, and the IEEE Engineering in Medicine and Biology Society Chapter. He was a recipient of several international awards, including the Distinguished Service Award from ACES Society, USA, and MIT Electromagnetics Academy, USA. He organized and chaired numerous technical special sessions and tutorials in IEEE flagship conferences. He delivered more than 60 invited speaker seminars and technical talks in world-class universities and flagship conferences. He has served as the TPC Chair for IEEE MMS2016 and IEEE GlobalSIP 2018 Symposium on 5G satellite networks. He served as the Founding Chair for the IEEE Antennas and Propagation Society Educational Initiatives Program. He is the Founder and the Chair of IEEE at New York University at Abu Dhabi. He is an Officer of IEEE ComSoc Emerging Technical Initiative (ETI) on Machine Learning for Communications. He is the Founding Director of the IEEE UAE Distinguished Seminar Series Program for which he was selected to receive, along with Mohamed AlHajri of MIT, the 2020 IEEE UAE Award of the Year. He served as an Invited Speaker for U.S. National Academies of Sciences, Engineering, and Medicine Frontiers Symposium.
He holds several leading roles in the international professional engineering community. He is also an Editor of IEEE JOURNAL OF ELECTROMAGNETICS, RF AND MICROWAVES IN MEDICINE AND BIOLOGY, and IEEE OPEN JOURNAL OF ANTENNAS AND PROPAGATION.
FAISAL FARHAN received the bachelor's degree in electrical and electronic engineering from Ahsanullah University of Science and Technology, Dhaka, Bangladesh, in January 2021. He is currently pursuing the Ph.D. degree in machine learning and artificial intelligence. He is also a full-time Lecturer with the Department of Electrical and Electronic Engineering, Ahsanullah University of Science and Technology. He has worked on several academic and independent research projects. His volunteering affiliations include IEEE AUST Student Branch, IEEE AUST WIE Section, and AUST EEE SOCIETY. His research interests include signal processing, speech processing, brain-computer interface, machine learning, and applied artificial intelligence. He was a recipient of the 2021 Dean's List of Honour Award at Ahsanullah University of Science and Technology for his academic excellence.
MD. ZIAUL HOQUE received the bachelor's degree in electrical and electronic engineering from Ahsanullah University of Science and Technology, Dhaka, Bangladesh, where he is currently pursuing the M.Sc. degree in electrical engineering with fully-funded scholarships. His research interests include machine learning, signal processing, and pattern recognition, which have always been the core of his research.
FARHAN MOHD QUAYYUM received the bachelor's degree in electrical and electronic engineering from Ahsanullah University of Science and Technology, Dhaka, Bangladesh, where he is currently pursuing the M.Sc. degree in electrical engineering. His research interests include machine learning, deep learning, and speech recognition.