A Framework for Schizophrenia EEG Signal Classification With Nature Inspired Optimization Algorithms

Schizophrenia is a severe and prolonged disorder of the human brain that completely disturbs the behavioral characteristics of an individual, for example by interrupting the thinking process and speech. It manifests through many symptoms such as hallucinations, functional deterioration, disorganized speech, and hearing sounds and speech that are non-existent. In this paper, a computerized approach based on optimization and classification is presented to classify schizophrenia from Electroencephalography (EEG) signals. As EEG is used to study many brain disorders in depth, it is well suited to the analysis of schizophrenia. Three feature extraction techniques are employed: the Partial Least Squares (PLS) nonlinear regression technique, the Expectation Maximization based Principal Component Analysis (EM-PCA) technique, and the Isometric Mapping (Isomap) technique. The extracted features are further optimized with four optimization algorithms: the Flower Pollination algorithm, the Eagle strategy using differential evolution algorithm, the Backtracking search optimization algorithm, and the Group search optimization algorithm. The optimized values are then classified with varied versions of both the Adaboost classifier and the Naïve Bayesian classifier. For normal cases, a classification accuracy of 98.77% is obtained when Isomap features are optimized with the Backtracking search optimization algorithm and classified with the Modest Adaboost classifier. For schizophrenia cases, a classification accuracy of 98.77% is obtained when Isomap features are optimized with the Flower Pollination optimization algorithm and classified with the Real Adaboost classifier.


I. INTRODUCTION
Schizophrenia is one of the chronic psychiatric disorders that trouble human beings to a great extent [1]. A patient affected by schizophrenia exhibits a high level of disturbance in thoughts and perceptions and has extreme difficulty in dealing with relationships [2]. The day-to-day activities of the person are severely affected, which in turn affects employment, marriage and lifestyle [3]. Schizophrenia has both positive and negative symptoms. The negative symptoms include a lack of the normal capability to perform routine tasks and a lack of confidence and motivation to do even the simplest task, accompanied by a lack of speech. Auditory hallucinations and delusions, along with thought disorders, are considered to be positive symptoms. Thus, due to the disturbance in some brain functions, schizophrenia easily occurs, and people in various sectors or professions have been troubled by this disorder. The disorder causes damage to the individual at the micro level and huge damage to the country at the macro level, affecting the economic system of the country too. When a person shows psychological signs of incapacity to perform routine duties, the person is tested for this psychotic disorder [4]. When a person is at the initial stages of schizophrenia, colleagues and teammates at the workplace treat the person with suspicion, not knowing the natural and original condition of the person.
The associate editor coordinating the review of this manuscript and approving it for publication was György Eigner.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
A person can return to normal social life after a limb mutilation, a serious muscle injury, infections due to medicine and allergy, and even cancer, but it is very difficult for a person affected by this psychological disorder to return to social life. EEG has emerged as a powerful tool for the diagnosis of mental disorders because it can interpret the brain state well [5]. EEG is the measurement of electrical activity with the help of electrodes placed at various locations on the scalp. Owing to its efficiency as a quickly recordable tool for analyzing general cognitive activity, this low-resolution diagnostic tool is widely used. The dynamics of the brain can thus be analyzed with EEG, as it provides high spatio-temporal data [6]. EEG also shows very small intra-personal differentiation along with large inter-personal differentiation. EEG is thus quite feasible for diagnosing many neurological disorders. Some of the important works on schizophrenia analysis, such as the classification and interpretation of this neurological disorder, are given below.
EEG has been applied to the analysis of schizophrenia patients in the following areas: assessing functional connectivity and working memory [7], analyzing Event Related Potentials during selective attention for a single subject [8], combining oddball and mismatch evoked potential paradigms for single-subject classification [9], alpha resting analysis based on nonlinear dynamic studies [10], connectivity-map-based analysis [11], graph-based analysis of brain connectivity [12], heat-diffusion-based dissimilarity analysis [13], resting EEG in the first-episode and chronic schizophrenia stages [14], computation of the abnormal dynamics of EEG oscillations on multiple time scales [15], and the visual evocation of emotion through EEG entropy analysis in schizophrenia [16].
The nonlinear complexity analysis of brain functional Magnetic Resonance Imaging (fMRI) schizophrenia signals was reported in [17], and multimodal classification with both Magnetoencephalography (MEG) and fMRI data using both static and dynamic connectivity measures was reported in [18]. For the nonlinear analysis [19] of schizophrenia EEG data, the principal Lyapunov exponent was calculated in [20], long-range correlations in choice sequences were studied in [21], the dimensional complexity was reported in [22], the multifractal behavior of schizophrenia EEGs was reported in [23], signal complexity analysis for schizophrenia patients was reported in [24], multiscale entropy analysis in [25], and entropy modulation deficit analysis in [26]. Artificial Neural Networks (ANN) were used by Li and Fan [27] to classify schizophrenia from EEG, with a reported average classification accuracy of only 60%. A multi-domain Convolutional Neural Network (CNN) was proposed by Phang et al. for the classification of EEG-based brain connectivity networks in schizophrenia, with a reported average classification accuracy of 91.69% [28]. A holistic approach to the classification of schizophrenia based on EEG signals was taken by Boostani et al., who reported an average classification accuracy of 87.51% [29]. Neural classifiers for schizophrenia diagnosis support using diffusion imaging data were developed by Mayo et al., who reported a specificity of 0.967 and a sensitivity of 0.872 [30]. For the automated diagnosis of schizophrenia using EEG signals, a deep CNN model was proposed by Oh et al., who obtained accuracies of 98.07% for non-subject-based testing and 81.26% for subject-based testing [31]. For both schizophrenia patients and healthy adults, machine learning techniques were used by Chen et al. to identify the EEG features that predict working memory, with a highest reported classification accuracy of 87% [32].
In this work, optimization techniques are applied after feature extraction, and the optimized features are then classified with several classifiers for schizophrenia classification from EEG. The block diagram of the work is shown in Fig. 1. The paper is organized as follows. The materials and methods are discussed in Section 2, followed by the feature extraction techniques in Section 3 and the optimization techniques applied to the extracted features in Section 4. Section 5 explains the classifiers used for classification, Section 6 gives the results and discussion, and Section 7 gives the conclusion.

II. MATERIALS AND METHODS
The EEG signals were collected from 14 patients suffering from schizophrenia at the Institute of Psychiatry and Neurology in Warsaw, Poland [12]. Seven males and seven females, with average ages of 27.9 ± 3.3 years for males and 28.3 ± 4.1 years for females, were selected for the experiment. 14 healthy subjects within the same group were also used for the study, and their data were also obtained from the Institute of Psychiatry and Neurology in Warsaw, Poland. The EEG was collected for a duration of 15 minutes at a sampling rate of 250 Hz while the participants remained in a relaxed state with their eyes closed. The standard International 10-20 system was used to collect the data. The electrodes were kept at suitable positions, and the acquired EEG signals were then divided into segments in which the EEG signals are considered to be stationary.
Nineteen-channel EEG signals are obtained per subject over the duration of fifteen minutes. Each channel consists of 225,000 samples in total, which are divided into segments of five thousand samples. Therefore, each channel is represented by a data matrix of size [5000 × 45]. Independent Component Analysis (ICA) was used for pre-processing. Feature extraction is then performed using the PLS based nonlinear regression, EM-PCA, and Isomap methods. The feature matrix attained per method is of the form [5000 × 10]. Then four optimization procedures, namely flower pollination optimization, eagle strategy using differential evolution optimization, backtracking search optimization, and group search optimization, are used to further extract a better-represented feature column matrix of size [5000 × 1]. This procedure is repeated for all the channels of all the subjects.
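The segmentation arithmetic above can be checked with a short sketch. It assumes NumPy and uses a synthetic ramp in place of a real EEG channel; the function name `segment_channel` and the column-wise layout (consecutive samples down each column) are illustrative assumptions, not details given in the paper.

```python
import numpy as np

FS_HZ = 250          # sampling rate stated in the text
DURATION_MIN = 15    # recording length stated in the text
SEGMENT_LEN = 5000   # samples per segment stated in the text


def segment_channel(channel):
    """Split a 1-D channel into columns of SEGMENT_LEN consecutive samples."""
    channel = np.asarray(channel, dtype=float)
    n_segments = channel.size // SEGMENT_LEN
    usable = channel[: n_segments * SEGMENT_LEN]
    # order="F" fills the matrix column by column, so column c holds
    # samples c * SEGMENT_LEN ... (c + 1) * SEGMENT_LEN - 1.
    return usable.reshape(SEGMENT_LEN, n_segments, order="F")


n_samples = FS_HZ * DURATION_MIN * 60     # 250 Hz x 900 s
demo_channel = np.arange(n_samples)       # synthetic stand-in for one channel
segments = segment_channel(demo_channel)
print(n_samples, segments.shape)
```

With 250 Hz sampling over 15 minutes, the sample count per channel and the 45 segments follow directly from the stated numbers.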

III. FEATURE EXTRACTION
As the EEG signal is quite chaotic, nonlinear features are used to explain the most important features of its morphology. Instead of analyzing regular features such as Detrended Fluctuation Analysis (DFA), Hurst exponent, Recurrence Quantification Analysis (RQA), entropy, fractal dimension, Kolmogorov complexity, Hjorth component, Lempel-Ziv complexity, Auto Regressive (AR) coefficients, wavelet transform, eigenvectors, etc. [33], in this paper PLS based nonlinear regression features, EM-PCA features and Isomap features are extracted before optimization proceeds. Feature extraction based on Isomap has been implemented for applications such as video manifolds [34], semi-supervised local multi-manifold computation [35] and electromechanical equipment fault prediction [36]. Feature extraction based on EM-PCA has been implemented for applications such as face recognition [37] and classification in BCI [38]. Feature extraction based on PLS-NLR is useful in applications such as the development of the Relevance Feature vector machine [39] and denoising [40]. Owing to the versatility of these feature extraction techniques, they have been preferred in our work over the conventional techniques.

A. PLS BASED NON LINEAR REGRESSION FEATURES
The Partial Least Squares (PLS) method is advantageous compared with ordinary multiple linear regression because it takes collinearities in the predictor variables into account; its latent variables are linear combinations of the original input data [41]. PLS relies mainly on decomposing the input variable matrix based on a covariance criterion. The latent variables, or factors, found by PLS are correlated with the output variables and are descriptors of the input variables. The concentration of the component is expressed by the linear equation
$$Z = WA + G$$
where $W$ denotes an input matrix of wavelength signals, $A$ represents a matrix containing the regression coefficients, and $G$ is the bias vector. The matrix $A$ has the form
$$A = W^{T} V \left(T^{T} W W^{T} V\right)^{-1} T^{T} Z$$
where the latent variables $T$ and $V$ express linear combinations of the input and output variables, respectively. The regression ability of a linear model can be enhanced by mapping the input data into a high-dimensional space. A prediction in the high-dimensional feature space is realized by the kernel technique without an explicit mapping of the original space. The kernel is a function of two elements in the original space that corresponds to their dot product in the feature space. A kernel extension of a linear algorithm is therefore obtained by replacing its dot products with kernel evaluations, and kernel PLS is the nonlinear extension of PLS obtained in this way. A nonlinear mapping $\Phi : w \in \mathbb{R}^{N} \rightarrow \Phi(w) \in G$ is used to transform the original data into a feature space. Constructing a linear PLS regression in this feature space yields a nonlinear PLS for the original input data.
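To make the kernel idea concrete, the following sketch builds a kernel Gram matrix without ever forming the feature-space mapping explicitly. It assumes NumPy and a Gaussian kernel of the form $\exp(-\|w_i - w_j\|^2 / (2\sigma^2))$; the width convention, the parameter name `sigma` and the function name `gaussian_gram` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def gaussian_gram(rows, cols, sigma=1.0):
    """Gram matrix with entries K[i, j] = exp(-||rows[i] - cols[j]||^2 / (2 sigma^2)).

    rows has shape (n, d) and cols has shape (m, d); the result has shape (n, m).
    The 2 * sigma**2 scaling is an assumed convention.
    """
    diff = rows[:, None, :] - cols[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Tiny calibration set of three 2-D inputs (synthetic values).
calibration = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
validation = np.array([[0.5, 0.5]])

K = gaussian_gram(calibration, calibration)     # calibration Gram matrix
K_val = gaussian_gram(validation, calibration)  # validation-versus-calibration kernel matrix
print(K.shape, K_val.shape)
```

Every entry is a dot product in the implicit feature space, so a linear method such as PLS can be run on `K` in place of the raw inputs.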
The kernel Gram matrix $K$ is calculated as
$$K = \Phi \Phi^{T}$$
The regression model for the component concentration is expressed as
$$Z_{y} = \Phi_{y} A = K_{y} V \left(T^{T} K V\right)^{-1} T^{T} Z$$
where $Z_{y}$ and $Z$ represent the output variables for the validation set and the calibration set, respectively, $\Phi_{y}$ is the matrix of the validation variables mapped into the feature space, and $K_{y}$ is the matrix with entries $K_{ij} = K(w_i, w_j)$, where $w_i$ and $w_j$ are the input variables of the validation set and the calibration set, respectively. Once the kernel function is selected, the nonlinear regression is determined. The kernel determines the high-dimensional feature space mapping and thereby affects the regression performance. To develop a strong nonlinear regression model, different kernels are generally used, such as linear, Gaussian, polynomial, inverse multiquadric, semi-local, exponential, rational, and Knode kernels. In our work, the Gaussian kernel is used.

B. EM-PCA FEATURES
In this type of feature extraction, there are four substages: the derivation of the EM step, the orthogonalization step, the data projection step, and the PCA eigenspace generation step. For preprocessing, the mean vector $\mu$ is estimated and subtracted from the input data. From a large amount of high-dimensional data, EM-PCA can easily extract a few eigenvectors and eigenvalues [42]. For $q$-dimensional variables, the covariance structure can be captured in fewer than $q(q+1)/2$ dimensions.
(1) Derivation of EM step: Assume the eigenvector matrix $V$. This process estimates the input parameters $V$ for orthogonalization and consists of two steps, the E-step and the M-step. The eigenvectors are initialized randomly before entering the step, and the EM step is repeated until the difference between the variances is less than or equal to the threshold value.
(2) Orthogonalization: The Gram-Schmidt orthonormalization process is carried out. The first vector is normalized initially, and the remaining vectors are transformed into weighted normalized vectors in subsequent iterations.
(3) Data projection: After mean vector estimation, the input data $Y$ is centered around the mean and projected on $V$:
$$Z = V^{T} Y \quad (10)$$
(4) PCA eigenspace generation: PCA is performed on $Z$ to obtain $V'$ and $\lambda'$ (11). $V'$ is then rotated for $V$:
$$V = V V' \quad (12)$$
and the eigenvalues $\lambda = \lambda'$ are determined.
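The four substages can be sketched as follows. This is a minimal illustration assuming NumPy and the commonly used EM updates for PCA (E-step $X = (V^{T}V)^{-1}V^{T}Y$, M-step $V = YX^{T}(XX^{T})^{-1}$), which are not written out in the text. The stopping rule based on the change in reconstruction error, the use of a QR factorization in place of explicit Gram-Schmidt, and all names are assumptions.

```python
import numpy as np


def em_pca(data, n_components, n_iter=200, tol=1e-10, seed=0):
    """EM-PCA sketch. data has shape (q, n): q variables, n observations."""
    rng = np.random.default_rng(seed)
    mu = data.mean(axis=1, keepdims=True)
    Y = data - mu                                  # mean subtraction
    V = rng.standard_normal((Y.shape[0], n_components))  # random initial basis
    prev_err = np.inf
    for _ in range(n_iter):
        X = np.linalg.solve(V.T @ V, V.T @ Y)      # E-step: latent coordinates
        V = Y @ X.T @ np.linalg.inv(X @ X.T)       # M-step: updated basis
        err = np.linalg.norm(Y - V @ X) ** 2       # reconstruction error
        if abs(prev_err - err) <= tol * max(err, 1.0):
            break                                  # assumed stopping rule
        prev_err = err
    V, _ = np.linalg.qr(V)                         # orthonormalization
    Z = V.T @ Y                                    # data projection, Z = V^T Y
    cov_z = Z @ Z.T / (Y.shape[1] - 1)
    lam, V_rot = np.linalg.eigh(cov_z)             # PCA in the small subspace
    order = np.argsort(lam)[::-1]
    lam, V_rot = lam[order], V_rot[:, order]
    return V @ V_rot, lam, mu                      # rotation, V = V V'


# Synthetic data with two dominant directions out of five.
rng = np.random.default_rng(1)
scales = np.array([5.0, 3.0, 0.1, 0.1, 0.1])
data = scales[:, None] * rng.standard_normal((5, 500)) + 2.0
V_hat, lam_hat, _ = em_pca(data, n_components=2)
print(lam_hat)
```

Because only a small subspace is iterated, the full covariance matrix never has to be decomposed; the final small eigenproblem only orders and rotates the basis found by EM.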

C. ISOMAP FEATURES
When the data points lie close to a low-dimensional nonlinear manifold embedded in a high-dimensional space, and a linear approximation cannot adequately represent the nonlinear structure, standard Multidimensional Scaling (MDS) cannot be used, because recovering the low-dimensional structure of a nonlinear manifold is quite difficult with MDS. Therefore, Isomap, which belongs to the class of nonlinear embedding schemes, has been designed to discover the structure of high-dimensional data and find its embedding in a low-dimensional Euclidean space. In Isomap, the geodesic distances between the points are used instead of the Euclidean distances [43]. The geodesic distances are computed by constructing a sparse graph in which every node is connected to its closest neighbours. The geodesic distance between each pair of nodes is taken to be the length of the shortest path between them in the graph. These approximate geodesic distances are then used as input to classical MDS. The algorithm for the extraction of features through Isomap is given in Algorithm 2. From Fig. 2 it is observed that the histogram plots of the EM-PCA extracted features for the normal case show Gaussian (normal) behavior for the underlying EEG signals.

D. PARAMETRIC ANALYSIS ON EXTRACTED FEATURES
It is observed from Fig. 3 that the histogram occupies the entire region with ups and downs as well as outlier members. Fig. 3 demonstrates the non-Gaussian and sparse nature of the histogram for the schizophrenia case. Fig. 4 shows the Cumulative Distribution Function (CDF) plot of the Isomap features for the normal case. The curve in Fig. 4 can be approximated as a monotonically increasing sigmoid function as well as a Gaussian function; this is the nature of the Isomap features in the normal case. Fig. 5 shows the CDF plot of the Isomap features for the schizophrenia case. As shown in Fig. 5, the curve is approximated as a nonlinear sigmoid function with discontinuities, and it is also non-Gaussian; this is the nature of the Isomap features in the schizophrenia case. The statistical parameters such as mean, variance, skewness, kurtosis, geometric mean, harmonic mean, sample entropy and approximate entropy with different features for schizophrenia cases and normal cases are given in Table 1. In addition, the Canonical Correlation Analysis (CCA) with different features for schizophrenia and normal cases is given in Table 2.

Algorithm 2 Isomap Feature Extraction
Step 1: Neighborhood graph construction: The points that are neighbours of each other on the manifold $M$ are determined. Two simple techniques are used: connecting every point to all the points within a fixed radius $\varepsilon$ ($\varepsilon$-Isomap), or connecting every point to all its $K$ nearest neighbours (K-Isomap). These neighborhood relationships are represented as a weighted graph $W$ over the data points, with edges of weight $e_y(i, j)$ between neighboring points.
Step 2: Shortest path computation: Isomap estimates the geodesic distances $e_M(i, j)$ between all pairs of points on the manifold $M$ by computing their shortest path distances $e_W(i, j)$ in the graph $W$.
Step 3: Construction of d-dimensional embedding: Classical MDS is applied to the matrix of graph distances $E_W = \{e_W(i, j)\}$ to construct an embedding of the data in a low-dimensional Euclidean space $Z$ that preserves the estimated intrinsic geometry of the manifold. K-Isomap is used here, and so the KNN algorithm is used in Step 1 of Algorithm 2.
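The three steps of Algorithm 2 can be sketched as follows. This is an illustrative K-Isomap, assuming NumPy, Euclidean edge weights, a Floyd-Warshall shortest-path computation and a symmetrized neighbour graph; these choices and all names are assumptions, since the paper does not specify its implementation. The demonstration uses synthetic points on a half circle instead of EEG features.

```python
import numpy as np


def isomap(points, n_neighbors=5, n_components=2):
    """K-Isomap sketch. points has shape (n, dim); returns (n, n_components)."""
    n = points.shape[0]
    # Step 1: K-nearest-neighbour graph with Euclidean edge weights.
    diff = points[:, None, :] - points[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    graph = np.full((n, n), np.inf)
    nearest = np.argsort(euclid, axis=1)[:, 1 : n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    graph[rows, nearest.ravel()] = euclid[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)          # make the graph undirected
    np.fill_diagonal(graph, 0.0)
    # Step 2: geodesic distances as shortest paths (Floyd-Warshall).
    for k in range(n):
        graph = np.minimum(graph, graph[:, k : k + 1] + graph[k : k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Step 3: classical MDS on the geodesic distance matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Points on a half circle: a 1-D manifold embedded in 2-D.
t = np.linspace(0.0, np.pi, 40)
arc = np.column_stack([np.cos(t), np.sin(t)])
embedding = isomap(arc, n_neighbors=2, n_components=1)[:, 0]
print(abs(embedding[-1] - embedding[0]))
```

The one-dimensional embedding unrolls the arc, so the distance between its end points approximates the arc length rather than the straight-line chord.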
The significance of features such as PLS nonlinear regression, EM-PCA, and Isomap among the schizophrenia and normal groups can be analyzed for nonlinearity and overlap through the extraction of statistical features such as mean, variance, skewness, kurtosis, geometric mean and harmonic mean, and nonlinear features such as sample entropy, approximate entropy, Renyi entropy, fuzzy entropy, Shannon entropy and Singular Value Decomposition (SVD). Table 1 gives the average of the parameters for the different features in the schizophrenia and normal cases. It is observed from Table 1 that all the statistical parameters and entropies of the schizophrenia and normal cases overlap numerically and also exhibit nonlinearity, as indicated by the higher values of variance and kurtosis. The sample entropy and approximate entropy indicate the peak and trough regions of the features among the schizophrenia and normal groups. Hence it is worth investigating the correlation property of the features among the classes. One such analysis is Canonical Correlation Analysis (CCA), which acts as a benchmark parameter for the features among the classes. Table 2 shows the CCA of the different features, namely PLS nonlinear regression, EM-PCA and Isomap, for the schizophrenia and normal cases. As observed in Table 2, the average value of CCA is very low, and therefore CCA indicates that no correlation exists among the features in the two classes. Hence it is wise to apply heuristic optimization techniques to the extracted numerical, nonlinearly overlapped, uncorrelated features to attain the segregation of the features among the classes. Since EEG signals are complex and nonlinear in nature, they exhibit erratic and random responses, which can be conceptualized using chaos theory. Chaos is a field of study that is well known as nonlinear dynamics.
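As a reference for how a CCA value between two feature sets can be obtained, the following sketch computes canonical correlations with a QR and SVD based approach. It assumes NumPy, paired samples in the rows of both matrices, and full column rank; the paper does not state which CCA implementation was used, so this is only one standard way to compute the quantity, demonstrated on synthetic data.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n, p) and Y (n, q); rows are paired samples."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis of the centered X columns
    Qy, _ = np.linalg.qr(Yc)   # orthonormal basis of the centered Y columns
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 1))
z = rng.standard_normal((200, 1))

linear_cc = canonical_correlations(x, 2.0 * x + 1.0)  # perfectly linearly related
independent_cc = canonical_correlations(x, z)          # unrelated synthetic features
print(linear_cc, independent_cc)
```

A value near 1 indicates strongly correlated feature sets, while a value near 0 corresponds to the uncorrelated behavior discussed for Table 2.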
A nonlinear system is represented by nonlinear time-domain equations that capture the dynamic properties of its variables. Correlation Dimension (CD) is a well-known measure in chaos theory, and it is calculated for the EEG signals in this study. Table 3 shows the average correlation dimension values of the extracted features for the normal and schizophrenia cases. It is evident from Table 3 that the correlation dimension values of the two classes are separated. A higher value of CD is due to the presence of randomness in the features, whereas for the schizophrenia cases the low value of CD reflects the existence of dampness in the features.
As reported in Tables 1 and 3, the extracted features are uncorrelated, as shown by the Correlation Dimension values, but they are nonlinear and overlapped. The dynamics of the features are explored by calculating the entropy values and the Singular Value Decomposition (SVD) values. These calculated components indicate that the features need optimization techniques for further processing. Therefore, a simple threshold may not be useful for classification.

IV. NATURE INSPIRED OPTIMIZATION ALGORITHMS
The extracted features are then optimized through nature inspired optimization algorithms. The main purpose of any optimization algorithm is to search for and locate the best solution to a particular problem. The search is generally carried out by various agents, which evolve iteratively according to the rules of the algorithm involved. In our work, four optimization algorithms are used. The degree of exploration with which the members of the group can move across the search space can be ascertained well with these four algorithms. In engineering, academia and industry, multi-objective optimization is quite challenging to solve, and therefore sophisticated techniques have been implemented to tackle it efficiently. These four optimization algorithms have been used in various fields such as engineering, medicine, astronomy, banking, and financial risk management [44], but this is a new attempt to use them for schizophrenia EEG signal classification.

A. FLOWER POLLINATION ALGORITHM
This algorithm is inspired by the pollination process of flowering plants, and it is usually extended to multi-objective optimization [45]. For simplicity, the following four rules are used in this algorithm.
1) Rule 1: Biotic and cross-pollination are regarded as the global pollination process, and pollen-carrying pollinators move in a way that obeys Levy flights.
2) Rule 2: Self-pollination and abiotic pollination are used for local pollination.
3) Rule 3: Flower constancy is developed by pollinators such as insects and is equivalent to a reproduction probability that is proportional to the similarity of the two flowers involved.
4) Rule 4: The interaction between local pollination and global pollination is controlled by a switch probability $p \in [0, 1]$.
In the global pollination step, pollinators such as insects carry the flower pollen gametes, and the pollen can travel over very long distances because insects can fly and cover a much longer range. Therefore, rule 1 and rule 3 are represented mathematically as
$$z_j^{t+1} = z_j^{t} + \gamma L(\lambda)\left(h^{*} - z_j^{t}\right)$$
where $z_j^{t}$ is the pollen $j$, or the solution vector $z_j$, at iteration $t$, and $h^{*}$ is the current best solution found among all the solutions at the current iteration. The scaling factor $\gamma$ controls the step size. The step size parameter $L(\lambda)$ is, more specifically, the Levy-flight-based step size that indicates the pollination strength. Insects cover a long distance with different distance steps, and a Levy flight is used to mimic this characteristic efficiently. Therefore, $L > 0$ is drawn from a Levy distribution, represented as
$$L \sim \frac{\lambda \Gamma(\lambda) \sin(\pi \lambda / 2)}{\pi} \frac{1}{s^{1+\lambda}}, \quad (s \gg s_0 > 0)$$
where $\Gamma(\lambda)$ denotes the standard gamma function; this distribution is valid for large steps $s > 0$. Theoretically, it is required that $|s_0| \gg 0$, but in practice $s_0$ can be as small as 0.1. Generating pseudo-random step sizes that obey the Levy distribution is not trivial. Generally, the Mantegna algorithm is used for drawing such random numbers from two Gaussian distributions $X$ and $Y$ through the transformation
$$s = \frac{X}{|Y|^{1/\lambda}}, \quad X \sim N(0, \sigma^{2}), \quad Y \sim N(0, 1)$$
Here $X \sim N(0, \sigma^{2})$ implies that the samples are drawn from a Gaussian normal distribution with zero mean and variance $\sigma^{2}$. Rules 2 and 3 for local pollination are represented as
$$z_j^{t+1} = z_j^{t} + \epsilon \left(z_k^{t} - z_l^{t}\right)$$
where $z_k^{t}$ and $z_l^{t}$ are pollen from different flowers of the same plant species. This mimics flower constancy in a limited neighborhood. If $z_k^{t}$ and $z_l^{t}$ come from the same species, this equation becomes a local random walk when $\epsilon$ is drawn from a uniform distribution in [0, 1]. Flower pollination activities can occur at both local and global scales. Adjacent flowers in the not-so-far-away neighborhood are more likely to be pollinated by local flower pollen than by pollen from far away. The switch probability (rule 4) mimics this feature and is used to switch between global pollination and local pollination. Here $p = 0.8$ is used as the initial value. The procedure of the flower pollination algorithm is given in Pseudocode 1.
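A minimal sketch of the flower pollination procedure described above is given here for a generic minimization problem. It assumes NumPy, Mantegna's commonly used expression for the scale $\sigma$, a greedy replacement of each flower, clipping to box bounds, and illustrative values for $\gamma$, $\lambda$, the population size and the number of generations; none of these settings, nor the sphere test function, come from the paper.

```python
import math
import numpy as np


def levy_step(lam, size, rng):
    """Levy-distributed steps by Mantegna's algorithm (sigma formula assumed)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    x = rng.normal(0.0, sigma, size)
    y = rng.normal(0.0, 1.0, size)
    return x / np.abs(y) ** (1 / lam)


def flower_pollination(f, dim, bounds, n_flowers=20, p=0.8, gamma=0.1,
                       lam=1.5, max_gen=100, seed=0):
    """Minimize f over a box; returns best solution, best value, best-value history."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    z = rng.uniform(low, high, (n_flowers, dim))      # random initial flowers
    fit = np.array([f(row) for row in z])
    best = z[fit.argmin()].copy()
    history = [float(fit.min())]
    for _ in range(max_gen):
        for j in range(n_flowers):
            if rng.random() < p:
                # Global pollination: Levy flight toward the current best.
                cand = z[j] + gamma * levy_step(lam, dim, rng) * (best - z[j])
            else:
                # Local pollination: random walk between two other flowers.
                k, m = rng.choice(n_flowers, 2, replace=False)
                cand = z[j] + rng.random() * (z[k] - z[m])
            cand = np.clip(cand, low, high)
            f_cand = f(cand)
            if f_cand < fit[j]:                       # greedy replacement (assumed)
                z[j], fit[j] = cand, f_cand
        best = z[fit.argmin()].copy()
        history.append(float(fit.min()))
    return best, float(fit.min()), history


def sphere(v):
    return float(np.sum(v ** 2))


best, best_val, history = flower_pollination(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

The switch probability `p` decides, flower by flower, whether the global (Levy flight) or the local (random walk) update is applied, mirroring rule 4.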

B. EAGLE STRATEGY USING DIFFERENTIAL EVOLUTION ALGORITHM
The Eagle strategy is one of the well-known metaheuristic strategies for optimization [46]. It combines a crude global search with an intensive local search, and it employs various algorithms to suit different applications. The global search space is first explored using a Levy flight random walk. Once a promising solution is found, an intensive local search is carried out with a local optimizer such as hill climbing or differential evolution (DE). In this work, DE is used. The two-stage method is then started again in a new region, beginning with the exploration of a new global space followed by a local search. A balanced and good tradeoff between global search and fast local search can be obtained for the different search stages, and therefore any algorithm of our choice can be used here. Therefore, to produce good results, the combination of various algorithms is efficiently utilized. The switch between the local and the global search is controlled by the only parameter, $s_e$. It serves the dual purpose of exploration and exploitation. The Eagle strategy is a simple strategy or method and not an algorithm. To explore the search space in a much more effective and diverse manner, the algorithm used for the global exploration should have more randomness. As the system converges, the speed also rises. To carry out the local exploitation intensively, an efficient local optimizer is used. The main intention is to reach local optimality as fast as possible with a small number of function evaluations. The procedure of the eagle strategy using differential evolution is given in Pseudocode 2.

Pseudocode 1 Flower Pollination Algorithm
Objective min or max $f(z)$, $z = (z_1, z_2, \ldots, z_d)$
Initialize a population of $q$ flowers with random solutions
Find the best solution $h^{*}$ in the initial population
Define a switch probability $p \in [0, 1]$
While ($t$ < MaxGeneration)
For $j = 1 : q$ (all $q$ flowers in the population)
If rand < $p$
Draw a ($d$-dimensional) step vector $L$ from a Levy distribution
Global pollination via $z^{t+1}$
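The two-stage idea of the eagle strategy, a crude Levy-flight exploration followed by an intensive DE local search, can be sketched as follows. The sketch assumes NumPy and a particular reading of the switch parameter $s_e$ (the probability of taking a Levy jump before the local stage); the DE settings `F` and `CR`, the search radius, the step scale, the round count and the sphere test function are all illustrative assumptions rather than the settings used in the paper.

```python
import math
import numpy as np


def levy_step(lam, size, rng):
    """Levy-distributed steps by Mantegna's algorithm (sigma formula assumed)."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    return rng.normal(0.0, sigma, size) / np.abs(rng.normal(0.0, 1.0, size)) ** (1 / lam)


def de_local_search(f, center, radius, bounds, rng, pop_size=10, gens=20, F=0.5, CR=0.9):
    """Differential evolution (rand/1/bin) started in a small box around center."""
    low, high = bounds
    dim = center.size
    pop = np.clip(center + rng.uniform(-radius, radius, (pop_size, dim)), low, high)
    pop[0] = center                                   # keep the starting point
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True            # mutate at least one coordinate
            trial = np.clip(np.where(mask, a + F * (b - c), pop[i]), low, high)
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    i_best = fit.argmin()
    return pop[i_best].copy(), float(fit[i_best])


def eagle_strategy(f, dim, bounds, s_e=0.2, rounds=20, lam=1.5, seed=0):
    rng = np.random.default_rng(seed)
    low, high = bounds
    radius = 0.1 * (high - low)
    best = rng.uniform(low, high, dim)
    best_val = f(best)
    history = [best_val]
    for _ in range(rounds):
        if rng.random() < s_e:
            # Global stage: Levy-flight jump to explore a new region.
            start = np.clip(best + radius * levy_step(lam, dim, rng), low, high)
        else:
            start = best
        # Local stage: intensive DE search around the chosen start.
        cand, cand_val = de_local_search(f, start, radius, bounds, rng)
        if cand_val < best_val:
            best, best_val = cand, cand_val
        history.append(best_val)
    return best, best_val, history


def sphere(v):
    return float(np.sum(v ** 2))


best, best_val, history = eagle_strategy(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

Keeping the best solution across rounds means a Levy jump can never worsen the result; it only offers the local optimizer a new region to exploit.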

C. BACKTRACKING SEARCH OPTIMIZATION ALGORITHM
Backtracking search is a stochastic search technique [47] and is shown in Pseudocode 3. The backtracking search algorithm (BSA) is used for solving different optimization problems, as it has a single control parameter and a very simple structure. It is a population-based technique with a memory in which the population from previous generations is stored, which helps in the analysis and generation of the search direction matrix. This bio-inspired technique has three basic genetic operators: mutation, crossover and selection. BSA employs a random mutation strategy that uses one direction individual for every target individual, expressed as
$$\text{Mutant} = A + C\,(oldA - A)$$
where $A$ represents the current population and $oldA$ represents the historical population. $C$ is a coefficient that controls the amplitude of the search direction matrix $(oldA - A)$. BSA uses a complex, non-uniform crossover strategy. The crossover process has two steps. First, a binary integer-valued matrix of size $M \times Q$ is generated to specify which elements of the mutant individuals are to be manipulated using the relevant individuals. Here $M$ represents the population size and $Q$ represents the problem dimension. Secondly, the relevant individuals are used to update the relevant dimensions of the mutant individuals. BSA has two kinds of selection operator. The first selects the historical population that is used for calculating the search direction; the historical population is replaced by the current population when one random number is smaller than another. The second determines the best individuals that enter the next generation.

Pseudocode 3 Backtracking Search Optimization Algorithm

Pseudocode 4 Group Search Optimizer Algorithm
Best resource searching is done based on (16)
If the best resource is not found, then stay in the current position only
If the producer cannot find a better area, then change the angle by (17)
Scrounging
Choose some of the group members randomly as scroungers
Dispersion
Head angle generation using (16)
Obtain a random distance based on (18) and then move to a new point using (19)
Search for a better solution
Final optimum solution is found out
End
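The backtracking search operators described in this subsection (mutation toward the historical population, crossover through a binary $M \times Q$ map, and the two selection steps) can be sketched as follows. The sketch assumes NumPy, a coefficient $C$ drawn as three times a standard normal number, random re-initialization of out-of-bounds entries, and a mix-rate control parameter; these follow common BSA practice and are assumptions here, as is the sphere test function.

```python
import numpy as np


def backtracking_search(f, dim, bounds, pop_size=20, mix_rate=1.0, max_gen=100, seed=0):
    """Minimize f over a box; returns best solution, best value, best-value history."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    A = rng.uniform(low, high, (pop_size, dim))       # current population
    old_A = rng.uniform(low, high, (pop_size, dim))   # historical population (memory)
    fit = np.array([f(x) for x in A])
    history = [float(fit.min())]
    for _ in range(max_gen):
        # Selection-I: occasionally refresh the memory, then shuffle it.
        if rng.random() < rng.random():
            old_A = A.copy()
        old_A = old_A[rng.permutation(pop_size)]
        # Mutation: Mutant = A + C * (oldA - A).
        C = 3.0 * rng.standard_normal()
        mutant = A + C * (old_A - A)
        # Crossover: a binary M x Q map marks the dimensions taken from the mutant.
        use_mutant = np.zeros((pop_size, dim), dtype=bool)
        for i in range(pop_size):
            if rng.random() < rng.random():
                n_mut = max(1, int(np.ceil(mix_rate * rng.random() * dim)))
                use_mutant[i, rng.permutation(dim)[:n_mut]] = True
            else:
                use_mutant[i, rng.integers(dim)] = True
        trial = np.where(use_mutant, mutant, A)
        # Boundary control: re-draw entries that left the search box.
        out = (trial < low) | (trial > high)
        trial[out] = rng.uniform(low, high, trial.shape)[out]
        # Selection-II: keep a trial individual only if it improves the fitness.
        trial_fit = np.array([f(x) for x in trial])
        better = trial_fit < fit
        A[better], fit[better] = trial[better], trial_fit[better]
        history.append(float(fit.min()))
    i_best = fit.argmin()
    return A[i_best].copy(), float(fit[i_best]), history


def sphere(v):
    return float(np.sum(v ** 2))


best, best_val, history = backtracking_search(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

The shuffled historical population supplies the search direction, which is what distinguishes this mutation from one that only uses the current generation.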

D. GROUP SEARCH OPTIMIZER ALGORITHM
In the group search optimizer, there are three main evolution operations: producing, scrounging and dispersion [48]. The procedure is given in Pseudocode 4. In this algorithm, $A_j^m$ is the $j$th member at the $m$th iteration, $\phi_j^m$ is a head angle and $D_j^m$ is a unit vector. The producing operator scans three points, where the maximum search angle is denoted by $\theta_{max}$ and the maximum pursuit distance by $q_{max}$. The two parameters $r_1$ and $r_2$ are generated under the normal distribution and are real numbers in the interval [0, 1]. If the current point changes, the angle is updated according to the maximum turning angle $\alpha_{max}$. Otherwise the angle is kept fixed, with $b$ being a constant in the corresponding expression. In the scrounging operator, around 75% of the remaining population is selected as candidates, and in the dispersion operator, random walks are used: a random distance is drawn and the member then moves to a new point.

Algorithm 3 Real Adaboost Algorithm
(1) The initial weights are set as $w_{l,1} = 1/N$, where $l = 1, 2, \ldots, N$.
(2) The following tasks are performed for $t = 1, 2, \ldots, T$:
  (a) Based on the weighted instances, a weak classifier is trained to divide the training set $P$ into $Q$ partitions. Each leaf of the CART represents a partition. For each partition $P_t^k$, where $k \in \{1, 2, \ldots, Q\}$, $W_{t+}^k$ and $W_{t-}^k$ are computed.
  (b) The weak hypothesis $f_t^k(x)$ is computed for each partition $P_t^k$. For every training instance $x_l$, the weak hypothesis $f_t(x_l)$ is $f_t^k(x)$, where $k$ is the index of the partition that $x_l$ falls into. The weights are then updated.
(3) Let $F_T(x_l) = \sum_{t=1}^{T} f_t(x_l)$; the strong classifier output is expressed in terms of $F_T(x_l)$.
In Adaboost, the weak hypothesis takes the values +1 or −1. In Real Adaboost, the values are real numbers, and the absolute value of $f_t^k(x)$ indicates the prediction confidence.

Algorithm 4 Modest Adaboost Algorithm
(1) The initial weights are set as $w_{l,1} = 1/N$, where $l = 1, 2, \ldots, N$.
(2) The following tasks are performed for $t = 1, 2, \ldots, T$:
  (a) Based on the weighted instances, the weak classifier is trained and then $W_{t+}^k$ and $W_{t-}^k$ are calculated for every partition $P_t^k$.
  (b) The inverted weight $\bar{w}_{l,t}$ is calculated, and the corresponding inverted quantities are computed for each partition $P_t^k$. The weak hypothesis is then obtained for every partition $P_t^k$; for the instance $x_l$, the weak hypothesis $f_t(x_l)$ equals $f_t^k(x)$.
  (c) The weights are then updated.
(3) Let $F_T(x_l) = \sum_{t=1}^{T} f_t(x_l)$; the strong classifier is output in terms of $F_T(x_l)$.
The Modest Adaboost classifier uses inverted distributions to mitigate the contribution of the weak hypothesis.
features after the optimization is prevalent. However, the remaining non-linearity still allows the optimized features to be separated well by a chosen group of classifiers. Table 6 shows the CCA for the various optimization techniques with different features between the schizophrenia and normal cases. A CCA value greater than 0.5 indicates correlation among the classes. As Table 6 indicates, all the methods preserve the non-correlative behavior of the extracted features, except for PLS non linear regression with flower pollination optimization and with eagle strategy using different evolution optimization.

V. CLASSIFIERS
The optimized values are fed to the classifiers for schizophrenia classification. The classifiers used here are Adaboost classifier and its variants followed by Naïve Bayesian classifier and its variants.

A. ADABOOST CLASSIFIER
Ensemble-based classification such as Adaboost [49] is one technique for attaining accurate classification. The effectiveness of boosting is largely attributed to the instability of the learner. In boosting, a base classifier, referred to as the weak learner, is applied sequentially to weighted versions of the training sample set. Boosting a set of weak learners makes it possible to solve complicated classification problems. Here, a decision tree is used as the weak learner. In Adaboost, equal weights are initially assigned to all the training samples. In each epoch, the weights are then updated according to the error rate of the weak learner in the previous iterations.
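The weighting scheme described above can be illustrated with a small discrete Adaboost. This is a sketch under stated assumptions, not the classifier used in this work: one-level decision trees (stumps) stand in for the decision tree weak learner, the function names are hypothetical, and the six training points are toy values rather than EEG features.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[f] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] <= thr else -pol


def adaboost_fit(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n                      # equal initial weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)      # vote of this weak learner
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy one-dimensional problem that no single stump can solve.
X_toy = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_toy = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(X_toy, y_toy, rounds=3)
predictions = [adaboost_predict(model, row) for row in X_toy]
```

On this toy set every single stump misclassifies at least two points, while the weighted vote of three stumps separates all six.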

B. REAL ADABOOST
A more general version of Adaboost is Real Adaboost [50], which is described in Algorithm 3.
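One round of the partition-based update in Algorithm 3 can be sketched as follows. The sketch assumes the form commonly given for Real Adaboost in the literature, namely a weak hypothesis of half the log-ratio of the positive and negative weight in each partition, followed by an exponential weight update. The smoothing constant `eps`, the function name and the toy partition are assumptions for illustration.

```python
import math


def real_adaboost_round(partition_ids, y, w, n_parts, eps=1e-6):
    """Return (per-partition hypothesis values, updated normalized weights)."""
    w_pos = [0.0] * n_parts   # total weight of positive instances per partition
    w_neg = [0.0] * n_parts   # total weight of negative instances per partition
    for k, yi, wi in zip(partition_ids, y, w):
        if yi == 1:
            w_pos[k] += wi
        else:
            w_neg[k] += wi

    # Real-valued weak hypothesis: the sign is the predicted class,
    # the magnitude is the confidence (eps is an assumed smoothing term).
    f = [0.5 * math.log((w_pos[k] + eps) / (w_neg[k] + eps))
         for k in range(n_parts)]

    # Weight update: instances that disagree with f gain weight.
    new_w = [wi * math.exp(-yi * f[k])
             for k, yi, wi in zip(partition_ids, y, w)]
    total = sum(new_w)
    return f, [wi / total for wi in new_w]


# Toy example: six instances falling into two partitions (two tree leaves).
parts = [0, 0, 0, 1, 1, 1]
labels = [1, 1, -1, -1, -1, 1]
weights = [1.0 / 6] * 6
f_values, new_weights = real_adaboost_round(parts, labels, weights, n_parts=2)
```

Partition 0 holds mostly positive instances and receives a positive value, partition 1 a negative one, and the two minority instances (indices 2 and 5) end up with larger weights.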

C. MODEST ADABOOST
Modest Adaboost [51] was proposed to reduce the generalization error of Gentle Adaboost and is described in Algorithm 4.
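The role of the inverted distribution can be sketched as follows. The expression used here for the weak hypothesis follows the form commonly stated for Modest Adaboost in the literature and is an assumption with respect to Algorithm 4. The function name and the toy data are hypothetical.

```python
def modest_adaboost_hypothesis(partition_ids, y, w, n_parts):
    """Return the per-partition weak hypothesis values for one round."""
    # Inverted distribution: instances with small weights get large inverted weights.
    inv = [1.0 - wi for wi in w]
    inv_total = sum(inv)
    inv = [v / inv_total for v in inv]

    w_pos = [0.0] * n_parts
    w_neg = [0.0] * n_parts
    inv_pos = [0.0] * n_parts
    inv_neg = [0.0] * n_parts
    for k, yi, wi, vi in zip(partition_ids, y, w, inv):
        if yi == 1:
            w_pos[k] += wi
            inv_pos[k] += vi
        else:
            w_neg[k] += wi
            inv_neg[k] += vi

    # Each term is damped when the partition also scores well on the inverted
    # distribution, i.e. on instances that are already classified correctly.
    return [w_pos[k] * (1.0 - inv_pos[k]) - w_neg[k] * (1.0 - inv_neg[k])
            for k in range(n_parts)]


parts = [0, 0, 0, 1, 1, 1]
labels = [1, 1, -1, -1, -1, 1]
weights = [1.0 / 6] * 6
f_modest = modest_adaboost_hypothesis(parts, labels, weights, n_parts=2)
```

Unlike the logarithmic form of Real Adaboost, these values are bounded, which is one way to read the statement that the contribution of each weak hypothesis is mitigated.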

D. NAÏVE BAYESIAN CLASSIFIER
NBC [52] is one of the simplest machine learning methods and one of the easiest to implement. It is based on the Bayes theorem with the assumption of strong independence among the features, and is expressed as
$$P(h \mid y) = \frac{P(y \mid h)\,P(h)}{P(y)}$$
where $P(h \mid y)$ is the posterior probability of the target class $h$ given the feature $y$, $P(h)$ is the prior probability of the class, $P(y \mid h)$ is the likelihood of the feature given the class, and $P(y)$ is the prior probability of the feature.

E. GAUSSIAN NAÏVE BAYESIAN CLASSIFIER
The essential principle of the Bayes technique is to assume known a priori probabilities and then to minimize the probability of classification error [53]. The class-conditional density function can be estimated from the available training data set. During Bayesian estimation, the density function conditioned on the training set is updated, which allows the a priori information to be converted into an a posteriori density. For the two class patterns $h_1$ and $h_2$, the Bayesian rule is implemented as
$$P(h_j \mid y) = \frac{p(y \mid h_j)\,P(h_j)}{\sum_{k=1}^{2} p(y \mid h_k)\,P(h_k)}$$
The two Bayes classification rules are as follows: if $P(h_1 \mid y) > P(h_2 \mid y)$, $y$ is assigned to $h_1$; if $P(h_1 \mid y) < P(h_2 \mid y)$, $y$ is assigned to $h_2$. With $\mu_j$ as the mean value and $z_j$ as the covariance matrix, the Gaussian probability distribution function [52] makes the analysis more feasible and is expressed as
$$p(y \mid h_j) = \frac{1}{(2\pi)^{d/2}\,|z_j|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(y-\mu_j)^{T} z_j^{-1} (y-\mu_j)\right)$$
where $d$ is the dimension of the feature vector $y$. A monotonic logarithmic discriminant function is then chosen for the analysis. For each class, the mean vector and covariance matrix of the discriminant function are calculated from the training data, and a hyperplane is used to separate the data.
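The Gaussian classifier described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work. It assumes a diagonal covariance (one variance per feature), which is the usual naive simplification, adds a small constant to each variance for numerical safety, and uses hypothetical toy values rather than EEG features.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature means and variances for each class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        dim = len(rows[0])
        means = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        variances = [sum((r[d] - means[d]) ** 2 for r in rows) / len(rows) + 1e-9
                     for d in range(dim)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def log_discriminant(x, prior, means, variances):
    """Logarithm of prior times Gaussian likelihood (diagonal covariance)."""
    g = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        g += -0.5 * math.log(2.0 * math.pi * s2) - (v - m) ** 2 / (2.0 * s2)
    return g


def predict(model, x):
    """Assign x to the class with the largest log discriminant."""
    return max(model, key=lambda label: log_discriminant(x, *model[label]))


# Hypothetical two-feature toy data for the two classes.
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
         [3.0, 0.5], [3.2, 0.4], [2.8, 0.6]]
y_toy = ["normal", "normal", "normal",
         "schizophrenia", "schizophrenia", "schizophrenia"]
gnb = fit_gaussian_nb(X_toy, y_toy)
```

Calling `predict(gnb, [1.1, 2.0])` assigns the point to the class whose mean it lies near, because the logarithm is monotonic and comparing log discriminants is equivalent to comparing posteriors.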

F. OPTIMIZED NAÏVE BAYESIAN CLASSIFIER
The bag-of-token model was used here to optimize the standard Naïve Bayesian algorithm [54]. The value of each feature $q$ is the non-negative number of occurrences of token $q$ in the observation. The estimated probability is expressed in terms of $\beta_1$, the weighted number of occurrences of token $q$ in class $m$; $\beta_2$, the total weighted number of occurrences of all tokens in class $m$; and $N$, the total number of instances in the training set.
The classifier predicts the class label of each observation from the estimated posterior probability. Each observation is therefore assigned to the class with the maximum posterior probability.
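A bag-of-token classifier of this kind can be sketched as follows. The sketch uses additive (add-one) smoothing normalized over the number of tokens, which is a common choice and an assumption here; it is not claimed to be the estimate used in this work, whose expression involves $N$. The function names and the count vectors are hypothetical.

```python
import math


def fit_token_nb(counts, labels, alpha=1.0):
    """counts[i][q] is the number of occurrences of token q in observation i."""
    n_tokens = len(counts[0])
    model = {}
    for m in sorted(set(labels)):
        rows = [c for c, t in zip(counts, labels) if t == m]
        beta_1 = [sum(r[q] for r in rows) for q in range(n_tokens)]  # per-token count in class m
        beta_2 = sum(beta_1)                                         # all-token count in class m
        log_probs = [math.log((alpha + beta_1[q]) / (alpha * n_tokens + beta_2))
                     for q in range(n_tokens)]
        model[m] = (math.log(len(rows) / len(labels)), log_probs)
    return model


def predict_token_nb(model, x):
    """Return the class with the maximum (log) posterior for count vector x."""
    def log_posterior(m):
        log_prior, log_probs = model[m]
        return log_prior + sum(c * lp for c, lp in zip(x, log_probs))
    return max(model, key=log_posterior)


# Hypothetical count vectors over three tokens, two observations per class.
counts_toy = [[3, 0, 1], [2, 1, 0], [0, 4, 1], [1, 3, 2]]
labels_toy = [0, 0, 1, 1]
token_nb = fit_token_nb(counts_toy, labels_toy)
```

With this normalization the estimated token probabilities of each class sum to one, and an observation dominated by the first token is assigned to class 0.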

VI. RESULTS AND DISCUSSION
The performance metrics in this work are computed from the Perfect Classification (PC), Missed Classification (MC) and False Alarm (FA) rates. The overall performance of the classification system is expressed in terms of the classification accuracy. The overall incorrect classification of the model is known as the classification error. Sensitivity, or the true positive rate, describes the case where the diagnostic test is positive and the subject has the disease. Specificity describes the case where the diagnostic test is negative and the person is healthy. The Performance Index (PI) is also computed, together with the Good Detection Rate (GDR), which represents the impact of a successful detection. Mathematically, the Mean Square Error (MSE) is expressed as
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (O_i - T_j)^2$$
where $O_i$ indicates the observed value at a specific time, $T_j$ denotes the target value at model $j$, with $j = 1$ to 19, and $N$ is the total number of observations per patient, which is 5000 in our case. The extracted feature values are then used for training and testing the classifiers. The MSE of the classifiers was reduced to a minimum as training progressed, and the classifiers were trained to a zero training MSE. The cross-validation methodology used in this work was K-fold. Initially, the dataset is divided into $k$ groups of equal size. In every step, $k - 1$ groups are used to train the classifiers and the remaining group is used to assess their performance. The validation is repeated $k$ times, and the classifier performance is assessed based on the $k$ results. In our work, the value of $k$ is chosen to be 10.
Therefore, 90% of the data was used for training and 10% of the data was used for testing. The process was repeated 10 times, once for every fold. Before new training and testing sets are selected for the next cycle, all the instances in the training and testing groups of the current cycle are randomly redistributed over the whole dataset. At the end of the 10-fold process, the average values of the performance metrics are computed.
Table 8 shows the consolidated result analysis of Isomap with group search optimization for schizophrenia cases. Table 9 gives the average Accuracy (%) among the classifiers at various optimization techniques with different features for normal cases. Table 10 gives the average Accuracy (%) among the classifiers at various optimization techniques with different features for schizophrenia cases. Table 11 gives the average performance measures among the classifiers at various optimization techniques with different features for normal cases. given in Table 16. The average performance of parameters among the classifiers at various optimization techniques with different features for normal cases is given in Table 17. The average performance of parameters among the classifiers at various optimization techniques with different features for schizophrenia cases is given in Table 18.
Table 7 shows the consolidated result analysis of PLS non linear regression with flower pollination optimization with six types of classifiers for normal cases. Since the optimization is done through nature inspired algorithms, the classifier outputs are affected by false alarms ranging from 3.64% to 47.55%. The Real Adaboost classifier attains the highest parametric values, with an accuracy of 98.17%, a GDR of 96.35% and a PI of 96.205%.
The NBC classifier falls to the lowest parametric values, with an accuracy of 76.22%, a GDR of 52.44% and a PI of 9.277%; this lower performance reflects its false alarm rate of 47.55%. Table 7 shows that none of the classifiers has a missed classification under this optimization technique, so the classifiers operate with low sensitivity, a low threshold and high specificity.
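The figures quoted from Table 7 can be cross-checked with a short calculation. The definitions used below are assumptions: they are the forms commonly used together with PC, MC and FA, chosen here because they approximately reproduce the quoted accuracy, GDR and PI. They should not be read as the exact expressions of this work, and the function name is hypothetical.

```python
def performance_measures(pc, mc, fa):
    """All inputs and outputs are percentages; pc + mc + fa is taken as 100."""
    sensitivity = pc / (pc + fa) * 100.0
    specificity = pc / (pc + mc) * 100.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (sensitivity + specificity) / 2.0,
        "gdr": (pc - mc) / (pc + fa) * 100.0,
        "pi": (pc - mc - fa) / pc * 100.0,
        "error_rate": mc + fa,
    }


# Real Adaboost in Table 7: no missed classification, false alarm of 3.64%.
real = performance_measures(pc=96.36, mc=0.0, fa=3.64)

# NBC in Table 7: no missed classification, false alarm of 47.55%.
nbc = performance_measures(pc=52.45, mc=0.0, fa=47.55)
```

Under these assumed definitions, a classifier with no missed classifications has a specificity of 100%, and its sensitivity drops as the false alarm rate grows, which matches the behavior described for Table 7.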
The consolidated result analysis of Isomap with group search optimization with six types of classifiers for schizophrenia cases is tabulated in Table 8. Since the optimization is done through nature inspired algorithms, the classifier outputs are affected by false alarms ranging from 1.04% to 48.25%. The Gaussian NBC classifier attains the highest parametric values, with an accuracy of 98.77%, a GDR of 97.52% and a PI of 97.45%. The Optimized NBC classifier falls to the lowest parametric values, with an accuracy of 75.87%, a GDR of 51.75% and a PI of 6.747%; this lower performance reflects its false alarm rate of 48.25%. Table 8 shows that none of the classifiers has a missed classification under this optimization technique, so the classifiers operate with low sensitivity, a low threshold and high specificity. Table 9 depicts the consolidated results of Accuracy (%) among the classifiers at various optimization techniques with different features for normal cases.
It is observed from Table 9 that, for PLS non linear regression feature extraction with the flower pollination optimization method, the Real Adaboost classifier attains the highest accuracy of 98.176% and the NBC classifier the lowest accuracy of 76.22359%. Likewise, for EM-PCA feature extraction, the Optimized NBC classifier with back tracking search optimization attains the highest accuracy of 95.83%, while the Gaussian NBC classifier with the flower pollination optimization method reaches the lowest accuracy of 76.87%. For Isomap feature extraction, the Modest Adaboost classifier with back tracking search optimization attains the highest accuracy of 98.77%, and the Adaboost classifier with the flower pollination optimization method reaches the lowest accuracy of 76.67%. Table 10 depicts the consolidated results of Accuracy (%) among the classifiers at various optimization techniques with different features for schizophrenia cases. It is observed from Table 10 that, for PLS non linear regression feature extraction with the flower pollination optimization method, the Optimized NBC classifier attains the highest accuracy of 98.176% and the Gaussian NBC classifier the lowest accuracy of 76%. Likewise, for EM-PCA feature extraction, the Gaussian NBC classifier with back tracking search optimization attains the highest accuracy of 97.65%, while the Modest Adaboost classifier with the eagle strategy using different evolution optimization method reaches the lowest accuracy of 76%. For Isomap feature extraction, the Real Adaboost classifier with the flower pollination optimization method attains the highest accuracy of 98.77%, and the Optimized NBC classifier with the group search optimization method reaches the lowest accuracy of 75.875%.
To study the effect of the feature extraction and optimization techniques irrespective of individual classifier performance, parameters such as accuracy and GDR are averaged over the six classifiers. The results are shown in Table 11 and Table 12. Table 11 shows that, for the normal cases, the highest classification accuracy of 90.46% and a GDR of 80.92% are obtained when PLS non linear regression features are optimized with the flower pollination algorithm. The lowest accuracy of 82.11% and a GDR of 64.23% are obtained for EM-PCA feature extraction with the group search optimization method. Table 12 shows that, for schizophrenia cases, the highest classification accuracy of 90.105% and a GDR of 79.35% are obtained when PLS non linear regression features are optimized with eagle strategy using different evolution optimization. As shown in Table 12, EM-PCA feature extraction with the group search optimization method reaches the lowest accuracy of 76.82% and a GDR of 39.58% for schizophrenia cases. This is due to the accumulation of more false positives in the schizophrenia case.
The Performance Index (PI) indicates the influence of FN and FP on the classifiers. Table 13 and Table 14 [...] features and back tracking search optimization is classified with Modest Adaboost classifier. It is also observed that the least error rate of 8.33% is obtained when EM-PCA features with back tracking search optimization are classified with the Optimized NBC classifier. Also, the least error rate of 1.04% is obtained when Isomap features with back tracking search optimization are classified with the Modest Adaboost classifier.
Table 16 shows that, for schizophrenia cases, the least error rate of 3.64% is obtained when PLS non linear features with flower pollination optimization are classified with the Optimized NBC classifier. The least error rate of 4.68% is obtained when EM-PCA features with back tracking search optimization are classified with the Gaussian NBC classifier. Also, the least error rate of 1.04% is obtained when Isomap features with flower pollination optimization are classified with the Real Adaboost classifier.
To study the effect of feature extraction and classifier performance irrespective of the optimization methods, the results are averaged over the optimization methods for both the normal and the schizophrenia cases. The results are tabulated in Table 17 and Table 18. Table 17 shows that, for normal cases, PLS non linear features give a high PI of 77.26%, a high classification accuracy of 91.47% and a high GDR of 82.94%, along with an average error rate of 17.05%, when classified with the Adaboost classifier. EM-PCA features give a high PI of 73.05%, along with a high classification accuracy of 89.27%, a GDR of 72.66% and an error rate of 21.46%, when classified with the Optimized NBC classifier. For the same EM-PCA features, a high GDR of 75.36% with an error rate of 24.63% is attained by the Real Adaboost classifier. This peculiar situation is due to the averaging effect of the optimization methods for the EM-PCA feature extraction technique. Isomap features give a high PI of 82.08% and a high classification accuracy of 92.905%, with a GDR of 84.81% and an error rate of 14.18%, when classified with the NBC classifier.
Table 18 shows that, for schizophrenia cases, PLS non linear features give a high PI of 68.34%, a high classification accuracy of 89.55% and a high GDR of 78.9%, along with an average error rate of 20.89%, when classified with the Modest Adaboost classifier. EM-PCA features give a high PI of 66.13%, along with a high classification accuracy of 88.99%, a low GDR of 66.67% and an error rate of 22.01%, when classified with the Real Adaboost classifier. A high GDR of 73.38% with a modest error rate of 26.56% is attained by the Gaussian NBC classifier for EM-PCA feature extraction. This situation is again attributed to the averaging effect of the optimization methods for EM-PCA feature extraction. Isomap features give a high PI of 75.55% and a high classification accuracy of 91.106%, with a GDR of 82.21% and an error rate of 26.87%, when classified with the Real Adaboost classifier. Selecting a better classifier requires a compromise among the classifier parameters PI, accuracy, GDR and error rate. Generally, a lower error rate means better accuracy among classifiers. As observed from the results of Table 17 and Table 18, the classifier performance parameters are compressed by the averaging effect of the optimization methods for both the normal and the schizophrenia cases. In other words, this implies that the optimization methods enhance the classifier performance to a higher level.
The combinations of Isomap, Backtracking search optimization and Modest Adaboost classification for the normal cases, and of Isomap, Flower pollination optimization and Real Adaboost classification for the schizophrenia cases, give the best results. The main reason is the intrinsic property of the Isomap algorithm, which both the Backtracking and the Flower Pollination algorithms exploit well to optimize the features with very little redundancy, so that classification with the versions of the Adaboost algorithm gives the best result.

VII. CONCLUSION AND FUTURE WORK
Characterized by abnormal behavior, a decreased ability to understand reality and strange speech, schizophrenia is a very serious problem for the human community. People diagnosed with schizophrenia also have additional problems such as depression, anxiety, and a lack of emotional expression and motivation. In this work, a comprehensive analysis of schizophrenia classification from EEG signals is carried out with the help of feature extraction, optimization techniques and suitable classifiers. The methodology adopted in this paper is promising and easy to implement. The average performance measures among the classifiers at various optimization techniques with different features were presented for normal cases and schizophrenia cases. The average PI and average error rate among the classifiers at various optimization techniques with different features were also computed and presented for normal cases and schizophrenia cases. The individual results show that, for normal cases, a classification accuracy of 98.77% is obtained when Isomap features are optimized with the Backtracking search optimization algorithm and classified with the Modest Adaboost classifier. For schizophrenia cases, a classification accuracy of 98.77% is obtained when Isomap features are optimized with the Flower Pollination optimization algorithm and classified with the Real Adaboost classifier. Future work aims to explore different feature extraction techniques, optimization techniques and a range of other machine learning techniques to classify schizophrenia from EEG signals.