An Effective Filter Method for Improving the Performance of the FF-SVM Algorithm

Finding effective and informative biomarker genes from microarray data is very challenging. Numerous filter feature selection algorithms have previously been reported as building blocks for hybrid gene selection algorithms. This paper aims to identify the filter method that best improves the performance of our previously proposed FF-SVM algorithm, i.e., that finds the minimum number of genes achieving high classification accuracy. To this end, an experiment was conducted using four different filter methods: Maximum Relevance Minimum Redundancy (mRMR), Joint Mutual Information (JMI), F-score, and Double Input Symmetrical Relevance (DISR). The experiment was undertaken in two phases: the first, filter to SVM, identified the minimum number of features (genes) that maximized the accuracy of the SVM classifier; the second, filter to FF-SVM, determined the filter method best suited to our previously proposed FF-SVM algorithm. In conclusion, we found that the f-score method outperformed the other filter methods when combined with FF-SVM.


I. INTRODUCTION
Microarray data analysis provides valuable results for solving gene expression profiling problems. One of its most important applications is cancer classification. Cancer is a genetic disease [2], [3], and the analysis of cancer pathobiology is closely tied to the analysis of the genes that cause cancer, reflected in changes in the expression levels of various genes. However, classification is challenging due to the high dimensionality and small sample size of gene expression data. Microarray data also contain a large number of redundant and irrelevant genes, and only a small number of genes (features) are significant to the classification task. The most practical way to overcome these challenges is feature selection [3]. The main idea behind feature selection is to select the most informative and significant genes for the prediction (classification) problem [4]. Feature selection techniques fall into three categories: filter-based, wrapper-based, and embedded methods. Many previously reported feature selection algorithms share the common objective of improving classification accuracy while selecting the minimum number of features [5]. Recently, hybrid and ensemble methods have been added to the general feature selection framework. Filter methods have been widely used in microarray data analysis as a preprocessing step. They do not use any specific learning model and are therefore independent of the classifier [5]. In filter methods, features (genes) are ranked according to a specific criterion, and only the features with the highest scores are selected. These features can then be used as input to a wrapper method. This is what is called a hybrid feature selection algorithm: a filter method is used as preprocessing for the wrapper algorithm, in order to take advantage of both approaches [4].
Several filter feature selection algorithms have been reported in the literature. Therefore, in order to find the filter method that best improves the performance of our previously proposed FF-SVM algorithm, we conducted an experiment using four different filter methods: Maximum Relevance Minimum Redundancy (mRMR), Joint Mutual Information (JMI), f-score, and Double Input Symmetrical Relevance (DISR). The experiment was carried out in two phases. The first, filter to SVM, aimed to find the minimum number of features (genes) that maximized the accuracy of the SVM classifier. The second, filter to FF-SVM, aimed to find the filter method best suited to our proposed FF-SVM algorithm. Our results showed that the f-score method outperforms the other filter methods when combined with FF-SVM.

II. FF-SVM ALGORITHM
This study aims to identify the most informative genes contributing to cancer diagnosis. In our previous research [1], we developed a wrapper feature selection method for microarray gene expression profiles, named FF-SVM, that employs the firefly algorithm and a Support Vector Machine (SVM). The method comprises two phases: a gene selection phase and a classification phase. During the gene selection phase, the firefly wrapper method was adopted to discover the optimal gene subset. During the classification phase, this optimal gene subset was evaluated with the SVM classifier, with classification accuracy obtained using Leave-One-Out Cross-Validation (LOOCV). Five microarray benchmark datasets of various cancer types were used to evaluate the proposed model. To validate its effectiveness, we compared the algorithm with other wrapper-based and hybrid-based state-of-the-art algorithms. Overall, the experimental results showed improvements in classification accuracy and in the number of selected genes compared to wrapper-based algorithms; however, the hybrid-based algorithms outperformed FF-SVM. Since numerous filter feature selection algorithms have been reported that can serve as the filter stage of a hybrid gene selection algorithm, this paper aims to identify the filter method that best improves the performance of the FF-SVM algorithm.

III. FILTER METHODS
This section describes the four filter methods used in our experiments: mRMR, JMI, DISR, and f-score.

A. Maximum Relevance Minimum Redundancy (mRMR) [6]
The mRMR is a multivariate filter feature selection algorithm. Its aim is to select features that have high correlation with the target class and low correlation with each other. In other words, it selects the features that maximize feature-to-class relevance while reducing redundancy among the selected features; features that are mutually exclusive, not mimicking each other, are thus preferred. It should be noted that the mRMR filter method is highly sensitive to outliers in the data, since the standard relevance and redundancy measures are affected by them. For continuous features, the F-statistic can be used to compute feature-to-class correlation (relevance), while the correlation between features (redundancy) can be measured with the Pearson correlation coefficient. Features are then selected one by one in a greedy search that maximizes an objective function combining high relevance and low redundancy.
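The greedy search described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `mrmr_select` and the difference form of the objective (relevance minus mean redundancy) are assumptions; variants such as the quotient form also exist.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: F-statistic relevance, Pearson-correlation redundancy."""
    relevance, _ = f_classif(X, y)          # feature-to-class relevance
    n_features = X.shape[1]
    selected = [int(np.argmax(relevance))]  # start with the most relevant gene
    while len(selected) < k:
        best_score, best_j = -np.inf, None
        for j in range(n_features):
            if j in selected:
                continue
            # redundancy: mean |Pearson correlation| with already-selected genes
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy  # maximize relevance - redundancy
            if score > best_score:
                best_score, best_j = score, j
        selected.append(best_j)
    return selected
```

Each iteration adds the candidate gene with the best trade-off between class relevance and redundancy against the genes already chosen.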

B. JMI [7] [8]
Mutual information (MI) measures the level of dependence between two random variables; in other words, it measures the amount of information that one variable X carries about another variable Y. Let p(X) and p(Y) be the marginal density functions of X and Y, and p(X,Y) their joint probability density function. MI then quantifies the divergence between the joint distribution p(X,Y) and the product of the marginal distributions p(X)p(Y): if X and Y are completely independent, p(X,Y) equals p(X)p(Y) and the MI is zero. For the feature selection problem, the aim is to maximize the MI between sets of candidate variables Xi, taken jointly, and the target variable Y; this is the JMI criterion. The main limitations of the JMI filter method are the lack of information about the interaction between the features (genes) and the classifier, and the possible selection of redundant and irrelevant features.
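For discrete (or discretized) gene expression values, the joint mutual information I(Xj, Xk; Y) can be computed by treating the feature pair as a single variable. The sketch below is an illustrative assumption, not the paper's implementation; it uses the identity I(A;B) = H(A) + H(B) - H(A,B).

```python
import numpy as np

def entropy(v):
    """Shannon entropy (in nats) of a discrete sequence."""
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def joint_mutual_information(xj, xk, y):
    """I(Xj, Xk; Y): mutual information between the pair (Xj, Xk),
    treated as one variable, and the class labels Y."""
    pair = np.array([f"{a}|{b}" for a, b in zip(xj, xk)])   # merge the pair
    joint = np.array([f"{p}|{c}" for p, c in zip(pair, y)])  # pair plus class
    return entropy(pair) + entropy(y) - entropy(joint)
```

If Xj alone already determines Y, the pair carries exactly H(Y) nats about the class, which is the maximum attainable value.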

C. DISR [9]
DISR is an information-theoretic filter feature selection algorithm that depends on a measure of variable complementarity. It relies on two properties: first, variable complementarity, meaning that a combination of variables can carry more information about the target class than the sum of the information carried by the variables considered individually; second, in the absence of additional information on how a subset of d variables should be combined, a combination of the best-performing subsets of d-1 variables is assumed to be the best combination. Formally, this is done by computing a lower bound on the information of a set of variables, defined as the average of the information in all its subsets. In the feature selection problem, DISR maximizes this lower bound on the mutual information of a subset.

D. F-score [10]
F-score identifies a group of variables that are jointly significant. The F-statistic, or f-score, is simply the ratio of two variances (mean squares), where a mean square is an estimate of the population variance that accounts for the degrees of freedom (DF). Variance measures dispersion, that is, how far data points are from the mean, with larger values indicating greater dispersion. The f-score is used to evaluate whether the means of different classes are statistically different, by computing the ratio between the difference of their means and their variability. In the feature selection problem, the aim of the f-score is to identify features for which the distances between data points of different classes are large; one could also say that the f-score seeks to maximize the separation between classes while minimizing the variation within classes. One of the main limitations of the f-score is that it does not reveal mutual information among features.
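Ranking genes by this between-class/within-class variance ratio can be sketched with scikit-learn's ANOVA F-statistic. This is an assumed stand-in: the exact f-score definition of [10] may differ slightly, and the function name `f_score_rank` is illustrative.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def f_score_rank(X, y, k):
    """Rank genes by the ANOVA F-statistic (between-class variability over
    within-class variability) and return the indices of the top k."""
    scores, _ = f_classif(X, y)
    return np.argsort(scores)[::-1][:k]
```

A gene whose class-conditional means are far apart relative to its within-class spread receives a large score and is ranked first.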

IV. EXPERIMENT SETUP AND RESULT
Our aim was to find a filter method that improves the performance of our proposed FF-SVM algorithm by reducing the data dimensionality and lowering the complexity of the search space. The next sections describe how we conducted the experiments, followed by the respective results. Figure 1 presents the general procedural flow chart, and Figure 2 presents the experiment's process flow chart.

A. Datasets and Experimental Setup
In this section we describe the datasets, the experimental setup, and the phases implemented in our experiment.

B. Experimental Setup
The process of finding the most suitable filter algorithm consisted of two phases: filter to SVM and filter to FF-SVM.
Our aim was to find the minimum number of features that maximized the classification accuracy, prior to using the best filtered dataset to select the best filter algorithm.

Phase 1: filter to SVM.
The purpose of this phase was to find the minimum number of features (genes) that maximized the accuracy of the SVM classifier. Four filter feature selection algorithms were used: mRMR [6], JMI [7], f-score [10], and DISR [9]. We applied each filter method to select 100, 200, 300, 400, and 500 features from each dataset; hence, for each dataset and filter method we obtained five different filtered datasets. The filtered datasets were then classified using the SVM classifier, with accuracy evaluated using LOOCV. To ensure the results were statistically valid and more accurate, the experiment was repeated 25 times for each dataset, and the average classification accuracy is reported. The filtered datasets presenting the highest accuracy with the minimum number of selected features were then carried forward to the next phase. All the filter algorithms were implemented in the Python programming environment.
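The filter-to-SVM step can be sketched as below, shown with the ANOVA F-statistic as the example filter (in the experiments each of the four filters was applied in turn). The function names are illustrative assumptions, not the paper's code.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

def filter_to_svm_accuracy(X, y, k):
    """Keep the top-k filter-ranked genes, then estimate SVM accuracy by LOOCV."""
    X_k = SelectKBest(f_classif, k=k).fit_transform(X, y)
    return cross_val_score(SVC(), X_k, y, cv=LeaveOneOut()).mean()

def sweep_subset_sizes(X, y, sizes=(100, 200, 300, 400, 500)):
    """Evaluate each candidate subset size; return {k: mean LOOCV accuracy}."""
    return {k: filter_to_svm_accuracy(X, y, k) for k in sizes}
```

The subset size achieving the highest mean LOOCV accuracy (with the fewest genes in case of ties) would then be the one carried forward to phase 2.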

Phase 2: filter to FF-SVM
The purpose of this phase was to find the filter method that most improves the performance of our proposed FF-SVM algorithm. In this phase, we applied FF-SVM to the filtered datasets resulting from the first phase. To compare the results fairly across all datasets, we set the number of selected genes to five and ran the FF-SVM algorithm on each filtered dataset. For example, the colon dataset was used four times, once per filter: first with the mRMR-filtered dataset, second with the JMI-filtered dataset, third with the DISR-filtered dataset, and fourth with the f-score-filtered dataset. This was done for all five datasets. The results were then compared, and the best filter algorithm, the one achieving the highest classification accuracy across all datasets, was identified.
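The phase-2 comparison loop can be sketched as follows. Note the heavy assumption: `ff_svm_stub` is only a runnable placeholder for the FF-SVM algorithm of [1] (here it scores a random 5-gene subset with SVM/LOOCV), since the firefly search itself is described in the earlier paper; the real algorithm would be substituted in its place.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

def ff_svm_stub(X, y, n_genes=5, seed=0):
    """Placeholder for FF-SVM [1]: the real algorithm searches gene subsets
    with the firefly heuristic; here a random 5-gene subset is scored instead."""
    rng = np.random.default_rng(seed)
    genes = rng.choice(X.shape[1], size=n_genes, replace=False)
    acc = cross_val_score(SVC(), X[:, genes], y, cv=LeaveOneOut()).mean()
    return genes, acc

def compare_filters(filtered_datasets, y):
    """Run the (stubbed) FF-SVM once per filtered dataset and collect accuracies.
    `filtered_datasets` maps a filter name, e.g. 'f-score', to its phase-1 output."""
    return {name: ff_svm_stub(X_f, y)[1]
            for name, X_f in filtered_datasets.items()}
```

The filter whose dataset yields the highest accuracy in this loop would be declared the best match for FF-SVM.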

C. Experimental Results
In this section we present the results of the first and second phases, leading to the filter method most suited to the FF-SVM algorithm.

D. Filter to Classifier Results
In this section, the results for the first phase are presented.

Table 2 shows the results of applying the JMI filter technique with the SVM classifier. The highest accuracy was achieved when the number of genes was 400 or 500: the Lung dataset with 500 genes, the SRBCT dataset with 400 genes, the Leukemia 1 dataset with 500 genes, the Leukemia 2 dataset with 500 genes, and the Colon dataset with 400 genes. These gene counts were carried forward to the second phase.

Table 3 shows the results of applying the mRMR filter technique with the SVM classifier. The highest accuracy was achieved when the number of genes was 400 or 500, except for the SRBCT dataset: the Lung dataset with 400 genes, the SRBCT dataset with 300 genes, the Leukemia 1 dataset with 500 genes, the Leukemia 2 dataset with 500 genes, and the Colon dataset with 400 genes. These gene counts were carried forward to the second phase.

Table 4 shows the results of applying the DISR filter technique with the SVM classifier. Three datasets obtained their highest accuracy when the number of genes was 400 or 500, and two datasets when the number of genes was 100 or 200: the Lung dataset with 100 genes, the SRBCT dataset with 500 genes, the Leukemia 1 dataset with 200 genes, the Leukemia 2 dataset with 500 genes, and the Colon dataset with 400 genes. These gene counts were carried forward to the second phase.

Table 5 shows the results of applying the f-score filter technique with the SVM classifier. Only one dataset obtained its highest accuracy when the number of genes was 500; the others obtained their highest accuracy with 100 or 200 genes: the Lung dataset with 500 genes, the SRBCT dataset with 100 genes, the Leukemia 1 dataset with 100 genes, the Leukemia 2 dataset with 200 genes, and the Colon dataset with 200 genes. These gene counts were carried forward to the second phase.

E. The Best Filter
In this section the results of the second phase are presented. Table 6 shows the results of applying the proposed FF-SVM algorithm to the filtered datasets, with the number of selected genes fixed at five for all datasets. The f-score filter algorithm outperformed mRMR, JMI, and DISR. In conclusion, the filter method most suited to the FF-SVM algorithm is the f-score, because it selects the most statistically relevant genes, which are then used as input to the second phase to achieve high classification accuracy.

F. Result Comparison
To evaluate the effectiveness of our proposed hybrid-based algorithm (f-score with FF-SVM) over wrapper-based algorithms for microarray cancer classification, we compared it with our previously proposed wrapper-based FF-SVM [1] gene selection algorithm; Table 7 shows the comparison in terms of classification accuracy. As the table clearly shows, our proposed hybrid-based algorithm (f-score with FF-SVM) outperforms other state-of-the-art bio-inspired meta-heuristic gene selection algorithms, such as GA-SVM [17], in four out of the five datasets. Furthermore, Table 8 shows the average run time of all the filter methods under comparison with the FF-SVM algorithm. With this experiment, we aimed to find the filter method that would most increase the performance of our proposed FF-SVM algorithm. We conducted the experiment using four filter methods, JMI, DISR, mRMR, and f-score, and the results showed that the f-score outperformed all the other filter algorithms. In our next study, we expect to propose a hybrid gene selection algorithm that utilizes f-score and FF-SVM in order to find the minimum number of predictive and informative genes with high classification accuracy. One limitation of our research is that we used only four feature selection methods. For this reason, as future work, we will apply other feature selection approaches, such as those presented in [18], [19], and [20], which may solve our problem efficiently. Moreover, we will apply other feature selection methods such as NMFBFS [21] and bio-inspired evolutionary algorithms such as PSO [22] combined with f-score, in order to compare their performance with our proposed f-score firefly algorithm using the SVM classifier.