An Integrated Approach for Ovarian Cancer Classification With the Application of Stochastic Optimization

Ovarian cancer is a type of cancer that begins in the ovaries and poses a serious threat to women. It produces abnormal cells that are able to spread to other regions of the body. Microarray data provide highly useful diagnostic and prognostic information for ovarian cancer research. Typically, the microarray data of ovarian cancer contain tens of thousands of genes, so the data are very high-dimensional. A systematic methodology is required to analyze such data, and it is important to select the most relevant genes or features in order to limit the computational complexity. In this work, an integrated approach to feature selection is carried out in two consecutive steps. Initially, the features are selected by standard gene selection techniques, namely Correlation Coefficient, T-Statistics and the Kruskal-Wallis test. The selected genes or features are then further optimized by four stochastic optimization algorithms: Central Force Optimization (CFO), Lightning Attachment Procedure Optimization (LAPO), Genetic Bee Colony Optimization (GBCO) and Artificial Algae Optimization (AAO). Finally, the optimized features are classified with five different classifiers. The best results are obtained when the Kruskal-Wallis test with GBCO is classified with the Support Vector Machine with Radial Basis Function kernel (SVM-RBF), giving a classification accuracy of 99.48%. The same accuracy of 99.48% is obtained when the Correlation Coefficient test with AAO is classified with Logistic Regression.


I. INTRODUCTION
Surgery and chemotherapy are the general treatment options for ovarian cancer [1]. Early-stage ovarian cancer usually does not cause many symptoms [2]. Advanced-stage ovarian cancer causes few and nonspecific symptoms that are mostly mistaken for common benign conditions [3]. The major symptoms of ovarian cancer include abdominal bloating or swelling, weight loss, discomfort in the pelvic area, changes in bowel habits, quickly feeling full after eating and an urgency to urinate [4]. The types of ovarian cancer include epithelial tumors, stromal tumors and germ cell tumors [5]. The factors that tend to increase the risk of ovarian cancer are older age, family history of ovarian cancer, estrogen hormone replacement therapy, inherited gene mutations and the ages at which menstruation begins and ends.
The associate editor coordinating the review of this manuscript and approving it for publication was Seifedine Kadry.
Microarray technology, the most successful gene chip technique, emerged to analyze tens of thousands of genes globally on a genomic scale [6]. A microarray is laid out as a simple ordered array of microscopic elements on a planar substrate. The genotype contains genes that are relevant or irrelevant to cancer development. Thus, a microarray can be considered a device for comparing the gene expression levels and genotypes of patients [7]. This technology makes it possible to compare the expression levels of genes under different conditions. As microarray data contain a huge number of genes, handling them is very difficult because of the curse of dimensionality [8]. Reducing the dimensionality is therefore essential. Many approaches have been proposed in the literature for selecting informative genes from microarray data [9]. Many machine learning techniques with suitable genomic expressions have been successfully applied in the past to ovarian cancer classification from microarray data. As the genomic expression data are high-dimensional, the time complexity involved is also high. Therefore, to gain a better understanding of global gene expression analysis and to achieve a higher classification accuracy, a systematic and integrated approach is presented in this paper.
The most important literature on ovarian cancer classification is as follows. Lee et al. presented a microarray analysis of differentially expressed genes in human ovarian cancer [10]. Chon and Lancaster discussed microarray-based gene expression studies in ovarian cancer [11]. Zhang et al. gave an overview of biomarkers for ovarian cancer diagnosis [12]. Lee applied an integrated algorithm for gene selection and classification to microarray data of ovarian cancer [13]. Jeng et al. used intelligent systems to analyze microarray data for the classification of ovarian cancer [14]. Tan et al. used complementary learning fuzzy neural networks for ovarian cancer diagnosis [15]. Chuang et al. carried out dimension reduction for ovarian cancer microarray data [16]. Huang et al. performed a microarray analysis of ovarian cancer with machine learning [17]. Zhu and Yu studied gene expression patterns in the histopathological classification of epithelial ovarian cancer [18]. Yu and Chen applied a Bayesian neural network method to ovarian cancer identification from high-resolution mass spectrometry data [19]. Vlahou et al. diagnosed ovarian cancer using decision tree classification of mass spectral data [20]. Wu et al. automatically classified ovarian cancer types from cytological images using deep convolutional neural networks [21]. Mas et al. evaluated machine learning techniques with the Fourier transform for the classification of ovarian tumors [22]. Nuhic et al. carried out a comparative study of various classification techniques for the detection of ovarian cancer [23]. Zhang et al. applied an Artificial Neural Network (ANN) to the early detection of ovarian cancer [24]. Antal et al. examined the potentials and limitations of Bayesian networks for ovarian cancer diagnosis [25].
Osmanovic et al. detected ovarian cancer using Decision Tree (DT) classifiers based on the historical data of ovarian cancer patients [26]. Thakur et al. used a feed-forward ANN for the early detection of ovarian cancer [27]. Jelovac and Armstrong discussed the recent progress in the diagnosis and treatment of ovarian cancer [28]. Kusy applied SVM to ovarian cancer classification [29]. Nabawy et al. classified epithelial ovarian cancer stage subtypes using a gene expression approach [30]. Park et al. developed an intraoperative diagnosis support tool for ovarian tumors based on microarray data using multicategory machine learning [31]. Arfiani and Rustam classified ovarian cancer using bagging and random forest [32]. Acharya et al. proposed a novel online paradigm for ovarian tumor characterization and classification using ultrasound [33]. Cohen et al. used three-dimensional power Doppler ultrasound to improve the diagnostic accuracy for the prediction of ovarian cancer [34]. Renz et al. addressed ovarian cancer classification with missing data [35]. Assareh and Moradi extracted efficient fuzzy if-then rules from mass spectra of blood samples for the early diagnosis of ovarian cancer [36]. Meng et al. performed feature extraction and analysis of ovarian cancer proteomic mass spectra [37]. Kim et al. explored multiple biomarker combination by Logistic Regression for the early screening of ovarian cancer [38].
The rest of the paper is organized as follows. Section II describes the dataset and the first-level feature selection techniques. Section III presents the stochastic optimization techniques used for second-level feature selection. Section IV gives the classification details, Section V presents the results and discussion, and Section VI concludes the paper.

II. MATERIALS AND METHODS
A publicly available dataset was used for ovarian cancer classification [39]. It contains 15154 genes and 253 samples in total, where Class 1 is the normal class with 91 samples and Class 2 is the cancer class with 162 samples. The details of the dataset are tabulated in Table 1. The overall workflow is illustrated in Fig. 1.

A. GENE SELECTION TECHNIQUES
Only the top-ranked features are selected here. The first-level selection extracts the best 5000 genes from the 15154 genes. These 5000 genes then undergo a second-level optimization with stochastic optimization techniques.

1) CORRELATION COEFFICIENT
The correlation of each gene is computed across the various samples [40]. In this computation, the number of records is denoted by $n_r$ and $G$ indicates the value of a particular gene. The most useful and informative gene in the dataset is denoted by $Ig$, and $IG$ denotes the most significant gene for class-level prediction. For the class partitioning problem, the most informative genes are those with a higher correlation coefficient value.
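As a rough illustration of correlation-based gene ranking, the sketch below scores each gene by the absolute Pearson correlation between its expression values and the class labels, then keeps the top-ranked genes. The Pearson form, the function names `pearson` and `rank_genes_by_correlation`, and the toy data are assumptions made for illustration; they are not taken from [40].

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant gene or label vector: no usable correlation
    return cov / (sx * sy)

def rank_genes_by_correlation(samples, labels, top_k):
    """Return the indices of the top_k genes by |correlation| with the labels."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in samples]
        scores.append(abs(pearson(column, labels)))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]

# Toy data (hypothetical): 4 samples x 3 genes, labels 0 = normal, 1 = cancer.
samples = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.1], [3.0, 5.5, 2.0], [3.1, 4.5, 2.2]]
labels = [0, 0, 1, 1]
top = rank_genes_by_correlation(samples, labels, top_k=1)
```

In the setting of this paper, `top_k` would be 5000 and `samples` would hold the 253 x 15154 expression matrix.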

2) T-STATISTICS
With the t-statistics approach [41], the differentially expressed genes in the gene expression dataset are ranked. The t-statistic for a particular gene is calculated as
$$t_g = \frac{\mathrm{mean}_1 - \mathrm{mean}_2}{\sqrt{\dfrac{\mathrm{var}_1}{n_1} + \dfrac{\mathrm{var}_2}{n_2}}}$$
The value $t_g$ represents the relative difference of the gene value with respect to the sample type. The differentially expressed genes are then extracted according to the top-ranked t values. Here $\mathrm{mean}_1$ and $\mathrm{var}_1$ are the mean and variance of one category of patient samples of size $n_1$, and $\mathrm{mean}_2$ and $\mathrm{var}_2$ are the mean and variance of the second category of patient samples of size $n_2$.
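A minimal sketch of the per-gene t-statistic, assuming the usual unpooled two-sample form, (mean 1 minus mean 2) divided by the square root of (var 1 / n 1 + var 2 / n 2), with sample variances. The function name `t_statistic` and the use of the n - 1 variance estimator are assumptions for illustration.

```python
import math

def t_statistic(group1, group2):
    """Two-sample t-statistic with unpooled variances for one gene.

    Assumes each group has at least two values and that the two
    variances are not both zero.
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    var1 = sum((v - mean1) ** 2 for v in group1) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in group2) / (n2 - 1)
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

# Hypothetical expression values of one gene in the two classes.
t_g = t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Genes would then be ranked by the absolute value of `t_g`, largest first.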

3) KRUSKAL-WALLIS TEST
The Kruskal-Wallis test [42] is a well-known non-parametric, rank-based technique for testing whether samples originate from the same population. It is the non-parametric equivalent of a one-way Analysis of Variance (ANOVA). The class membership is ignored, and all the data from all the classes are ranked together from 1 to $N$. The score for the Kruskal-Wallis test, $K_w$, is computed as
$$K_w = \frac{12}{N(N+1)} \sum_{j} n_j \bar{r}_j^{\,2} - 3(N+1)$$
where $n_j$ is the number of experiments in class $j$ and $N$ is the total number of experiments in all classes. The mean rank of class $j$ is
$$\bar{r}_j = \frac{\sum_{k=1}^{n_j} r_{jk}}{n_j} \tag{4}$$
where $r_{jk}$ is the rank of experiment $k$ from class $j$ among all the observations.
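The score can be sketched as follows, assuming the standard Kruskal-Wallis form built from the class sizes and mean ranks, with tied values given their average rank and no tie correction applied. The function name `kruskal_wallis_score` is hypothetical.

```python
def kruskal_wallis_score(values, classes):
    """Kruskal-Wallis statistic for one gene (ties share their average rank)."""
    n_total = len(values)
    order = sorted(range(n_total), key=lambda i: values[i])
    ranks = [0.0] * n_total
    i = 0
    while i < n_total:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n_total and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    total = 0.0
    for c in set(classes):
        class_ranks = [r for r, lab in zip(ranks, classes) if lab == c]
        n_j = len(class_ranks)
        mean_rank = sum(class_ranks) / n_j
        total += n_j * mean_rank ** 2
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical gene with perfectly separated classes.
k_w = kruskal_wallis_score([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

A larger score indicates that the gene's ranks differ more strongly between the classes.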

III. OPTIMIZATION TECHNIQUES
The 5000 features selected by the standard gene selection techniques are further reduced by a second-level feature selection, which uses stochastic optimization techniques to select the top 50-150 genes. Stochastic optimization techniques are optimization methods in which random variables are generated and used. For stochastic problems, the random variables appear in the formulation of the optimization problem itself. Some stochastic optimization techniques also involve random iterations. Stochastic optimization methods generalize deterministic methods for deterministic problems.

A. CENTRAL FORCE OPTIMIZATION
In CFO, every individual in the population is termed a probe. The probes are treated as objects that attract one another by gravitation according to their defined masses, and a fitness function is used to assess their performance [43]. Each mass represents a solution, and the positions are adjusted according to Newton's universal law of gravitation. The algorithm consists of three main steps: (a) initialization, (b) calculation of the acceleration of the probes and (c) motion. First, a population of probes is created in the search space, and the initial position and acceleration vectors are set to zero. Second, the compound acceleration vector of each probe is calculated from its components in every direction according to Newton's universal law. The mass is a user-defined function obtained from the objective function that has to be minimized. In an ND-dimensional search space with NQ probes $y^q = (y^q_1, y^q_2, \ldots, y^q_{ND})$, $q = 1, 2, \ldots, NQ$, the acceleration of the $q$-th probe is computed by Eq. (5), in which $y^q_t$ and $Z^q_t$ represent the position and acceleration vectors of the $q$-th probe at the $t$-th generation.
In (5), $V$, $\alpha$ and $\beta$ do not correspond to any physical gravitational quantities. To prevent a probe from flying far away from the search space, the probe positions must be checked. A probe that is out of range is pulled back into the search space. Third, the position vectors of the probes are updated from the previously calculated accelerations according to Newtonian mechanics. Once the acceleration $Z^q_t$ has been exerted, probe $q$ moves from $y^q_t$ to $y^q_{t+1}$ in accordance with the motion equation, in which the time step is denoted by $\Delta t$. Because the probe positions are updated from the latest ''mass'' information, CFO acts as a specific deterministic gradient-like algorithm. The convergence analysis of CFO shows that it converges to the best point found so far. Its performance is good when a predefined initial probe distribution is considered.
The algorithm proceeds as follows:
Step 1: Initialization of parameters: set the dimension ND and the boundaries $y_{\min}$, $y_{\max}$ of the objective function, the number of probes NQ, the acceleration parameters $\alpha$ and $\beta$, and the gravitational constant $V$.
Step 2: Initialization of population: set the initial probe distribution $Y$, acceleration vector $Z$, fitness matrix $F$ and position vectors $S$.
Step 3: (Loop on time step)
3.1 Compute the probe position vectors $S$.
3.2 Retrieve any probe that flies out of the boundary.
3.3 Update the fitness matrix for the current probes.
3.4 Compute the acceleration vectors $Z$ for the next time step.
Step 4: Increase the time step and repeat Step 3 until the stopping criterion is met.
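The four steps can be sketched in a few lines. This is a simplified illustration rather than the exact formulation of [43]: it assumes a unit time step, takes the mass as the negated objective value, lets only heavier probes attract, and retrieves errant probes by clamping them to the boundary. The function name `cfo_minimize` and all default parameter values are hypothetical.

```python
import math
import random

def cfo_minimize(f, bounds, n_probes=8, n_steps=50, V=2.0, alpha=2.0, beta=2.0, seed=0):
    """Minimize f over box bounds with a simplified CFO loop (illustrative only)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 2: initial probe positions Y and zero accelerations Z.
    Y = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_probes)]
    Z = [[0.0] * dim for _ in range(n_probes)]
    best_val, best_pos = min((f(y), list(y)) for y in Y)
    for _ in range(n_steps):
        # Step 3.1: move each probe (unit time step assumed).
        # Step 3.2: clamp probes that leave the search space.
        for q in range(n_probes):
            for d, (lo, hi) in enumerate(bounds):
                Y[q][d] = min(max(Y[q][d] + 0.5 * Z[q][d], lo), hi)
        # Step 3.3: fitness; mass = -f so that a lower cost means a larger mass.
        mass = [-f(y) for y in Y]
        for q in range(n_probes):
            if -mass[q] < best_val:
                best_val, best_pos = -mass[q], list(Y[q])
        # Step 3.4: accelerations for the next step; only heavier probes attract.
        for q in range(n_probes):
            acc = [0.0] * dim
            for k in range(n_probes):
                dm = mass[k] - mass[q]
                dist = math.dist(Y[k], Y[q])
                if dm <= 0 or dist < 1e-12:
                    continue
                for d in range(dim):
                    acc[d] += V * dm ** alpha * (Y[k][d] - Y[q][d]) / dist ** beta
            Z[q] = acc
    return best_pos, best_val

# Usage on a hypothetical 2-D sphere function.
sphere = lambda y: sum(v * v for v in y)
pos, val = cfo_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

For gene selection, `f` would instead score a candidate gene subset, which requires an encoding of subsets that is not specified here.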

B. LIGHTNING ATTACHMENT PROCEDURE OPTIMIZATION
LAPO is a recent nature-inspired global optimization algorithm that mimics the lightning attachment process, including both the downward leader movement and the upward leader propagation [44]. Lightning is a sudden electrostatic discharge that occurs between electrically charged regions of a cloud. The lightning progresses towards or away from the ground in a stepwise movement. After every step, the downward leader stops and then moves to a randomly selected potential point, which always has a very high electric field value. The upward leader starts from sharp points and progresses towards the downward leader. Branch fading takes place when the charge of a branch is lower than a specific value. The final strike occurs when the two leaders join, and the cloud charges are neutralized.

1) TEST POINTS PARAMETER INITIALIZATION
The key parameters are the maximum number of iterations $Iter_{max}$, the number of decision variables $n$, the number of test points $N_{pop}$, and the upper bounds $Z_{max}$ and lower bounds $Z_{min}$ of the decision variables. These parameters are assigned at the start of the algorithm. As in other nature-inspired optimization algorithms, an initial population is required. Every member of the population is a test point in the feasible search space and serves as the emitting point of either a downward or an upward leader. The test points are initialized randomly between the lower and upper bounds.

2) MOVEMENT OF DOWNWARD LEADER TOWARDS THE GROUND
In this phase, the test points are treated as downward leaders that move down towards the ground. The average of all the test points and the corresponding fitness value are computed. Because the lightning process behaves randomly, for each test point $j$ a random point $s$ is selected from the population ($j \neq s$). The new test points are then updated according to the following rules.
(i) If the electric field of point $s$ is greater than the average electric field, the test point is updated with one rule.
(ii) If the electric field of point $s$ is less than the average electric field, the test point is updated with a second rule.
(iii) The branch is sustained if the electric field of the new test point is better than that of the old one; otherwise it gradually fades.

3) MOVEMENT OF UPWARD LEADER
In the upward movement phase, all the test points are treated as upward leaders that move towards the cloud. The new test points are generated from $Z_{best}$ and $Z_{worst}$, the best and worst solutions of the population, and from the exponent factor $E$, which is a function of both the iteration number $Iter$ and the maximum number of iterations $Iter_{max}$. The iteration-dependent exponent factor is computationally significant because it balances the exploration and exploitation capabilities of the algorithm. As in the downward movement, branch fading also takes place in this phase.

4) PERFORMANCE ENHANCEMENT
To enhance the performance of LAPO, in every iteration the average test point replaces the worst test point if the fitness of the worst test point is worse than that of the average test point.
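This replacement step is simple enough to sketch directly. The version below assumes a minimization problem, so "worse" means a larger objective value; the function name `replace_worst_with_average` and the toy points are hypothetical.

```python
def replace_worst_with_average(points, fitness):
    """Return a copy of points in which the worst point is replaced by the
    average point whenever the average has a better (lower) fitness."""
    n = len(points)
    dim = len(points[0])
    average = [sum(p[d] for p in points) / n for d in range(dim)]
    values = [fitness(p) for p in points]
    worst = max(range(n), key=lambda j: values[j])
    updated = [list(p) for p in points]
    if fitness(average) < values[worst]:
        updated[worst] = average
    return updated

# Hypothetical test points scored with a sphere objective.
sphere = lambda z: sum(v * v for v in z)
new_points = replace_worst_with_average([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], sphere)
```

Here the worst point, (10, 10), is replaced by the average point, (4, 4), because the average has the lower objective value.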

5) STOP CRITERION
The algorithm terminates when the maximum number of iterations is reached. Otherwise, the downward and upward leader movements are repeated to improve the solutions further.

6) THE PROCEDURE OF LAPO
The procedure of LAPO is given as follows.
(1) Set $Iter_{max}$, $n$, $N_{pop}$, $Z_{max}$, $Z_{min}$
(2) Random initialization of the test points
...
$Z_{worst} = Z_{ave}$
(12) end
(13) Movement of the downward leader towards the ground
(14) for $j = 1 : N_{pop}$
(15) Random selection of $Z_{s,k}$, $Z_{s,k} \neq Z_{j,k}$
(16) if ...
end
(21) Compute the fitness value of the new test points
...
end
(25) end
(26) Movement of the upward leader
(27) for $j = 1 : N_{pop}$
...

C. GENETIC BEE COLONY OPTIMIZATION
By incorporating the advantages of the Genetic Algorithm (GA) and the Artificial Bee Colony (ABC) algorithm, a new optimization algorithm called GBC was developed for optimizing numerical problems [45]. The colony of artificial bees in the ABC algorithm is divided into three kinds: employed bees, onlooker bees and scout bees. The steps of ABC are as follows:

1) ABC PARAMETER SETTINGS
The main parameters of the algorithm are initialized first. These are the population (solution) size, the limit parameter (L) and the number of bees, which is taken to be double the population size.

2) INITIALIZATION OF THE ENTIRE POPULATION OF SOLUTION
A number of solutions equal to the population size is generated randomly as
$$w_{j,k} = w^{\min}_{j,k} + \mathrm{rand}[0,1]\,\bigl(w^{\max}_{j,k} - w^{\min}_{j,k}\bigr) \tag{17}$$
where $j$ is the solution index, $k$ is the decision variable index, and $\mathrm{rand}[0,1]$ is a uniformly generated random number between 0 and 1. The lower and upper limits of the $k$-th decision variable are represented by $w^{\min}_{j,k}$ and $w^{\max}_{j,k}$.

3) POPULATION SOLUTION EVALUATION
To assess the obtained generated solutions, the objective functions are utilized.

4) THE EMPLOYEE BEE
In this phase, every employed bee discovers a new food source in the neighbourhood of its current location. The employed bees move to their candidate neighbour solutions, so that every employed bee has a food source in its surrounding environment. The nectar amount of each detected food source is evaluated. If it is greater than the nectar amount of the present food source, the detected food source is memorized immediately. A neighbour solution 'n' is obtained by modifying the $j$-th solution according to equation (18), where $s$ is a solution selected randomly from the population and $\theta$ is selected randomly in the range $[-1, 1]$.

5) ONLOOKER BEE
Onlooker bees are used to detect new food sources in the neighbourhood. The information obtained from the employed bees in the previous phase is exploited here. Both the onlooker bees and the employed bees try to improve the current solutions by exploring their neighbourhood using equation (18). The onlooker bees select solutions probabilistically according to their fitness values.

6) SCOUT BEE
Once the exploitation of a food source is finished, the employed bee becomes a scout bee and searches for a new food source in the solution space. A parameter named limit, which indicates the number of trials, is used to control the number of scout bees. When a food source cannot be improved any further, a new food source is determined randomly. Thus, exploitation and exploration are carried out together in the search space.
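The three basic ABC operations described above can be sketched as follows, using the standard ABC forms as an assumption: uniform initialization between the bounds, a one-coordinate neighbour move scaled by a random factor in [-1, 1], and onlooker selection probabilities proportional to fitness. All function names are hypothetical.

```python
import random

def init_solution(w_min, w_max, rng):
    """Random solution with every variable between its lower and upper limit."""
    return [lo + rng.random() * (hi - lo) for lo, hi in zip(w_min, w_max)]

def neighbour(solutions, j, rng):
    """Perturb one randomly chosen variable of solution j relative to a
    randomly chosen different solution s."""
    s = rng.choice([i for i in range(len(solutions)) if i != j])
    k = rng.randrange(len(solutions[j]))
    theta = rng.uniform(-1.0, 1.0)
    candidate = list(solutions[j])
    candidate[k] = solutions[j][k] + theta * (solutions[j][k] - solutions[s][k])
    return candidate

def selection_probabilities(fitness_values):
    """Probability of an onlooker choosing each solution (fitness-proportional)."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]

rng = random.Random(1)
sol = init_solution([0.0, -1.0], [1.0, 1.0], rng)
pop = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
cand = neighbour(pop, 0, rng)
probs = selection_probabilities([1.0, 3.0])
```

A scout bee would simply call `init_solution` again once a food source has failed to improve for `limit` trials.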

7) INVOLVEMENT OF GENETIC OPERATORS
A binary version of the ABC algorithm, called GBC, is obtained by using genetic operators such as crossover and swap. Equations (17) and (18) of the ABC algorithm are modified, and the initial solutions are generated by equation (20) instead of (17), where $V(0, 1)$ is a uniformly generated value. ABC is integrated with GA for the search mechanism in the following four steps.
i) Two food sources are selected randomly from the population in the neighbourhood of the current food source, so that a proposed solution can be found.
ii) Two-point crossover operators are applied among the current food source and its two neighbours, together with the best and zero food sources, to generate the children food sources.
iii) The second operator, the swap operator, is applied to the children food sources to generate the grandchildren food sources.
iv) The best food source among the children and grandchildren food sources is selected as the neighbourhood food source.
Thus, the performance of ABC on binary optimization problems is improved by the inclusion of GA operators.
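The two genetic operators can be illustrated on binary vectors, where a 1 marks a selected gene. This is a sketch under assumptions: the crossover exchanges the segment between two random cut points, and the swap operator exchanges one randomly chosen 1 with one randomly chosen 0. The function names are hypothetical and the exact operator definitions in [45] may differ.

```python
import random

def two_point_crossover(parent_a, parent_b, rng):
    """Exchange the segment between two random cut points."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

def swap(solution, rng):
    """Swap one selected bit (1) with one unselected bit (0), if both exist."""
    ones = [i for i, b in enumerate(solution) if b == 1]
    zeros = [i for i, b in enumerate(solution) if b == 0]
    result = list(solution)
    if ones and zeros:
        i, j = rng.choice(ones), rng.choice(zeros)
        result[i], result[j] = 0, 1
    return result

rng = random.Random(3)
children = two_point_crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], rng)
grandchild = swap([1, 0, 0, 1, 0], rng)
```

The swap keeps the number of selected genes constant, which is convenient when a fixed subset size such as 50-150 genes is targeted.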

D. ARTIFICIAL ALGAE OPTIMIZATION
The Artificial Algae Algorithm (AAA), referred to as AAO elsewhere in this paper, was developed by mimicking the living styles and behaviours of microalgae [46]. The algorithm simulates microalgae characteristics such as algal tendency, adaptation to the quality of the surroundings and reproduction. It consists of three vital processes: the evolutionary process, the helical movement and the adaptation phase. The population of the algorithm is composed of algal colonies. When the algal cells in a colony receive enough light, the colony grows to a bigger size. Under insufficient light, the algal colony may not grow sufficiently. In the helical movement, every algal colony moves only towards the best algal colony. To explain the main process of AAA, assume $x_j = (x_{j1}, x_{j2}, \ldots, x_{jm})$, where $j = 1, 2, \ldots, n$, and each $x_j$ expresses a solution in the search space. The algae population is represented by the matrix
$$\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}$$
Let the size of the $j$-th algal colony be $C_j$, where $j = 1, 2, \ldots, n$, and let $f(x_j)$ be the objective function. Then $C_j$ is updated using $\mu_j$, the update coefficient of $C_j$, at the current generation $t$.

1) HELICAL MOVEMENT PHASE
The algal colony usually moves in three dimensions. In equations (25)-(27), (25) gives the movement in one dimension, say $x$, while (26) and (27) give the movement in the other two dimensions, say $y$ and $z$. Here $l$, $g$ and $h$ are random integers generated uniformly between 1 and $d$; $X_{jg}$, $X_{jl}$ and $X_{jh}$ simulate the three coordinates of the $j$-th algal colony; $k$ is the index of a neighbour algal colony; $p$ is an independent random number in the range $(-1, 1)$; $\alpha$ and $\beta$ are random angles between 0 and $2\pi$; $sf$ is the shear force; and $\sigma_j$ is the friction surface area of the $j$-th algal colony.

2) EVOLUTIONARY PROCESS PHASE
The algal colony $X_j$ becomes larger as it progresses towards a feasible solution. This process is simulated by the following equations:
$$Biggest = \arg\max\, size(X_j), \quad j = 1, 2, \ldots, n$$
$$Smallest = \arg\min\, size(X_j), \quad j = 1, 2, \ldots, n$$
where the biggest algal colony is expressed as $Biggest$ and the smallest algal colony is expressed as $Smallest$. The randomly selected algal cell is indicated by a random value $k$.

3) ADAPTATION PHASE
When an algal colony does not grow sufficiently, it adapts itself to the surrounding environment. After the adaptation movement, the objective function value is judged as inferior or superior. Once the algal colony movement is completed, the colony with the highest starvation value is identified as shown in (31). The adaptation to the biggest algal colony, applied with an adaptation probability $A_p$, is expressed by (32), where $c$ is the index of the algal colony with the highest starvation value. The starvation level of algal colony $X_j$ is assessed by $starvation(X_j)$, and the algal cell index is represented by $k$. The adaptation probability $A_p$ takes a value between 0.3 and 0.7. Rand1 and Rand2 are random values between 0 and 1.
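The evolutionary and adaptation phases can be sketched together. The sketch assumes the commonly described AAA behaviour: one randomly chosen cell of the biggest colony is copied into the smallest colony, and, with probability A p, the most starved colony moves a random fraction of the way towards the biggest colony. The function names and the use of a single random fraction per colony are assumptions, not details confirmed by [46].

```python
import random

def evolutionary_step(colonies, sizes, rng):
    """Copy one randomly chosen cell from the biggest to the smallest colony."""
    biggest = max(range(len(sizes)), key=lambda j: sizes[j])
    smallest = min(range(len(sizes)), key=lambda j: sizes[j])
    k = rng.randrange(len(colonies[0]))
    colonies[smallest][k] = colonies[biggest][k]
    return smallest, k

def adaptation_step(colonies, sizes, starvation, a_p, rng):
    """With probability a_p, move the most starved colony towards the biggest."""
    if rng.random() >= a_p:
        return None
    c = max(range(len(starvation)), key=lambda j: starvation[j])
    biggest = max(range(len(sizes)), key=lambda j: sizes[j])
    step = rng.random()
    colonies[c] = [x + (b - x) * step for x, b in zip(colonies[c], colonies[biggest])]
    return c

rng = random.Random(0)
cols = [[0.0, 0.0], [4.0, 8.0], [1.0, 1.0]]
sizes = [0.5, 3.0, 1.0]
small, k = evolutionary_step(cols, sizes, rng)
starved = adaptation_step(cols, sizes, [0.1, 0.0, 0.9], 1.0, rng)
```

Both functions modify `cols` in place, so the biggest colony gradually pulls weaker colonies towards its region of the search space.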

IV. CLASSIFICATION PROCEDURES
The following classification models are used here. All of them belong to the well-established group of machine learning algorithms.

A. LINEAR DISCRIMINANT ANALYSIS (LDA)
LDA is a very simple technique of great utility [47]. The two classes $c \in \{0, 1\}$ are assumed to follow Gaussian distributions with class-specific means $\mu_c$ and a shared covariance matrix. The linear discriminant function $\delta_c(y)$, $c \in \{0, 1\}$, is built from these quantities and from $\pi_c$, the frequency of occurrence of the class labels. The predicted class label is the class with the larger discriminant value. LDA is thus conceptually simple, robust and fast, and it is also popular in high-dimensional problems.
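For reference, the textbook form of the linear discriminant under these assumptions is given below. Here $\Sigma$ denotes the shared covariance matrix, a symbol introduced for this illustration; the expression is the standard one and is assumed, not confirmed, to match the form used in [47].

```latex
\delta_c(y) = y^{T} \Sigma^{-1} \mu_c - \tfrac{1}{2}\, \mu_c^{T} \Sigma^{-1} \mu_c + \log \pi_c,
\qquad
\hat{c}(y) = \arg\max_{c \in \{0, 1\}} \delta_c(y)
```

Because the quadratic term in $y$ is common to both classes and cancels, the decision boundary $\delta_0(y) = \delta_1(y)$ is linear in $y$.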

B. NEAREST NEIGHBOUR CLASSIFIER
For a query point $q$, a weighted average is taken over the labels $y_i$ of the training observations that are near $q$. The $k$-element neighbourhood of $q$ is denoted by $N_k(q)$, and $d_i$ denotes the corresponding distance in the given metric. The choice of metric and the number of neighbours are the parameters of the model. Because it is based on the concept of similarity, K-Nearest Neighbor (KNN) provides a very intuitive approach to classification problems [48].
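A minimal sketch of a distance-weighted nearest neighbour vote for binary labels follows. It assumes the Euclidean metric and inverse-distance weights, which are one common choice and not necessarily the weighting used in [48]; the function name `knn_predict` is hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k=3, eps=1e-12):
    """Inverse-distance weighted average of the labels of the k nearest points,
    thresholded at 0.5 to give a binary class label."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    neighbours = dists[:k]
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    score = sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)
    return 1 if score >= 0.5 else 0

# Hypothetical 2-D training set with two well-separated classes.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = [0, 0, 1, 1]
near_zero = knn_predict(train_x, train_y, [0.2, 0.1])
near_one = knn_predict(train_x, train_y, [5.0, 5.5])
```

The small constant `eps` only guards against division by zero when the query coincides with a training point.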

C. LOGISTIC REGRESSION MODEL
The Logistic Regression model is used extensively in medical applications for binomially distributed dependent variables [49]. It is similar to LDA except for the way the linear coefficients are estimated. The binary logistic regression model computes the probability of the dichotomous variable $v$ from the $n$ independent variables $z$. The model coefficients are estimated with a second-order gradient descent method. As these calculations are highly memory- and time-consuming, applying the model to high-dimensional problems can be difficult.
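The probability computation can be illustrated with the standard logistic (sigmoid) link applied to a linear combination of the independent variables. The function name `logistic_probability` and the coefficient values are hypothetical, and the coefficient estimation step is not shown.

```python
import math

def logistic_probability(z, coefficients, intercept=0.0):
    """Probability that the binary outcome equals 1 for feature vector z."""
    linear = intercept + sum(b * v for b, v in zip(coefficients, z))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients for two features.
p_neutral = logistic_probability([0.0, 0.0], [1.0, 2.0])
p_positive = logistic_probability([1.0, 0.5], [1.0, 2.0])
```

A sample would be assigned to the cancer class when this probability exceeds a chosen threshold, commonly 0.5.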

D. SVM
SVMs are among the most powerful tools in machine learning and are used in many applications [50]. The SVM constructs a hyperplane in a feature space so that the data are separated into two classes with the maximum margin. The original features $y, y'$ are mapped into a high-dimensional space by means of a positive semi-definite function, the kernel function $K(\cdot, \cdot)$. The kernel trick relies on Mercer's condition, which states that any positive semi-definite kernel $k(y, y')$ represents a dot product in a high-dimensional space. The standard kernels generally used include the polynomial kernel
$$k(y, y') = (y \cdot y' + 1)^{d} \quad \text{(Poly)}$$
and the RBF kernel. The model parameters are the kernel type, the polynomial degree $d$ and the width of the RBF, $\sigma^2$. In this work only the SVM with the RBF kernel is used.
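The two kernels can be evaluated directly, as sketched below. The polynomial kernel follows the form given in the text; the RBF kernel is written in the common Gaussian form with the squared distance divided by twice the width, which is an assumption since scaling conventions differ between references. The function names are hypothetical.

```python
import math

def polynomial_kernel(y1, y2, degree):
    """Polynomial kernel (y1 . y2 + 1) ** degree."""
    return (sum(a * b for a, b in zip(y1, y2)) + 1.0) ** degree

def rbf_kernel(y1, y2, sigma2):
    """Gaussian RBF kernel exp(-||y1 - y2||^2 / (2 * sigma2)) (assumed scaling)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(y1, y2))
    return math.exp(-sq_dist / (2.0 * sigma2))

k_poly = polynomial_kernel([1.0, 2.0], [3.0, 4.0], 2)
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], 1.0)
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0], 1.0)
```

The RBF value is 1 for identical points and decays towards 0 as the points move apart, with the decay rate controlled by the width.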

E. MULTI-LAYER PERCEPTRON (MLP)
A multilayer feed-forward neural network with a sigmoid activation function is trained [51]. The weights are initialized with Gaussian-distributed random numbers that have zero mean and scaled variance. The weights are trained by gradient descent with backpropagation. Standard weight decay is applied in the MLP through a penalty term on the N-dimensional weight vector $v$ of the MLP, scaled by a small regularization parameter $\lambda$. During cross-validation training, the number of neurons, the regularization parameter and the number of hidden layers are adjusted to minimize the error loss.

V. RESULTS AND DISCUSSION
The classification is performed with a 10-fold cross-validation method and the performance is shown in the tables below. The formulae for the Performance Index (PI), Sensitivity, Specificity and Accuracy are taken from the literature and are used to compute the reported values [52]. In these expressions, PC denotes Perfect Classification, MC denotes Missed Classification and FA denotes False Alarm.
Across the classifiers, the LAPO optimization method reaches a high average accuracy of 91.162% along with a Performance Index of 74.89%. The LAPO method outperforms the other three optimization methods in terms of accuracy and Performance Index. Table 11 displays the consolidated average performance analysis of the classifiers in terms of classification accuracy and Performance Index with the three gene selection techniques, averaged over the four optimization techniques across the classifiers for ovarian cancer. As indicated in Table 11, the T-Statistic gene selection technique with the CFO optimization method attains a high accuracy of 92.0077% along with a Performance Index of 78.29%. The T-Statistic gene selection method also scores a high average accuracy of 90.37% and a Performance Index of 72.84% across the four optimization methods.
Fig. 2 displays the consolidated performance analysis of the accuracy and Performance Index of the various classifiers under the different optimization techniques, averaged over the gene selection methods for ovarian cancer. As demonstrated in Fig. 2, the LAPO method with the LDA classifier and 100 selected genes achieves a high accuracy of 97.84% and a Performance Index of 95.49%. The LR classifier with 100 genes selected by GBCO optimization reaches a low accuracy of 78.77% and a Performance Index of 25.503%.
Fig. 3 depicts the consolidated average performance analysis of the classifiers in terms of classification accuracy and Performance Index with the three gene selection techniques, averaged over the four optimization techniques across the classifiers for ovarian cancer. As shown in Fig. 3, the T-Statistic gene selection technique with the CFO optimization method attains a high accuracy of 92.077% along with a Performance Index of 78.29%. The T-Statistic gene selection method also scores a high average accuracy of 90.37% and a Performance Index of 72.84% across the four optimization methods.

VI. CONCLUSION AND FUTURE WORK
Ovarian cancer is the most common gynecological malignancy. Computer-aided diagnosis is essential for determining the diagnosis correctly. Microarray technology enables the simultaneous monitoring of the expression levels of thousands of genes under specific conditions. It makes the analysis of gene expression possible and generates a tremendous amount of data. Because of the curse of dimensionality together with a small sample size, processing these data further is very difficult. Therefore, in this paper, a two-level feature selection process is proposed, first with standard gene selection techniques and then with optimization techniques, before proceeding to classification. The second-best results are produced when the T-statistic test results are further optimized with CFO and LAPO and classified with MLP and LDA, giving classification accuracies of 98.96% and 98.69% respectively. The best results are obtained when the Kruskal-Wallis test with GBCO is classified with the SVM-RBF kernel technique, giving a high classification accuracy of 99.48%. The same accuracy of 99.48% is obtained when the Correlation Coefficient test with AAO is classified with Logistic Regression. Future work will apply a variety of other stochastic optimization techniques to ovarian cancer classification.