Machine Learning Assisted Methodology for Multiclass Classification of Malignant Brain Tumors

Practitioners worldwide analyze malignant and non-malignant brain tumors with computer-aided diagnosis systems. Radiologists rely on computer-assisted techniques to draw conclusions from imaging modalities and inferences. Conventionally, various machine learning approaches have been used, which usually focus on classifying the imaging modality into two categories: either normal versus abnormal images or benign versus malignant tumors. The remaining requirement is to classify malignant tumors into their specific classes with better precision. The proposed work focuses on distinguishing between the types of high-grade malignant brain tumors. This study is performed on real-life malignant brain tumor datasets having five classes. The proposed methodology uses a large feature set from six domains to capture most of the hidden information in the extracted region of interest. Relevant features are then selected from the feature pool using a newly proposed feature selection algorithm named the Cumulative Variance Method (CVM). Next, the selected features are used for model training and testing with K-Nearest Neighbour (KNN), multi-class Support Vector Machine (mSVM), and Neural Network (NN) classifiers to measure multi-class classification accuracy. The experiments are performed using the proposed feature selection algorithm with the three classifiers. The mean classification accuracy achieved by the proposed approach is 88.43% (KNN), 92.5% (mSVM), and 95.86% (NN). A comparative analysis against existing algorithms such as ICA and GA suggests that the proposed approach gains an accuracy increase of around 2% (KNN), 3% (mSVM), and 4% (NN). The experimental results indicate that the proposed approach performs best with the NN classifier, reaching an accuracy of 95.86% using diversified features.


I. INTRODUCTION
A brain tumor is an abnormal growth of cells inside the brain. Brain tumors comprise a diverse group of neoplasms that vary in their behavior depending on various factors, such as the cell of origin, site of occurrence, morphology, and pattern of spread. The human brain is the most sensitive part of the body; it controls muscle movements and interprets sensory information such as sight, sound, touch, taste, and pain. Any tumor can affect such sensory information and muscle movements or even result in more hazardous situations, including loss of life. Since the position of the tumor is not fixed, it can occur in any part of the brain. Depending upon the place of origination, tumors can be categorized into primary and secondary tumors [1]. If the tumor has originated inside the skull, it is known as a primary brain tumor; if the tumor originated somewhere else in the body and later moved to the brain, it is called a secondary tumor.
The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
The locality of the tumor is not specific to any pre-defined place, so it can affect any of the brain lobes where it originates. Clinical methodologies for the treatment of brain neoplasms rely on a variety of imaging modalities such as CT scan, MRI, fMRI, and PET, with each imaging modality serving a distinct role in disease diagnosis. Among the various imaging modalities, Magnetic Resonance Imaging (MRI) is the most widely used, and radiology relies on imaging in clinical diagnosis at multiple stages of treatment. MRI visualizes the soft tissues more effectively than any other imaging technique. The literature suggests that, for classification purposes, the MRI modality gives better statistical analysis results than other imaging techniques.
The MR imaging system generates voxel data in several modalities, namely T1-Weighted, T2-Weighted, eT1-Weighted, eT2-Weighted, and FLAIR. Among these modalities, the T1-weighted modality is widely used. These images are made post-contrast-enhanced using a contrast material named gadolinium, which is injected into the patient's body at a dose of 0.15-0.20 mmol/kg. The radiologists then visualize the post-contrast-enhanced T1-weighted MR image using a computer-aided system to find tumors or any other abnormality in brain MR images.
These imaging techniques do not lead to perfect identification of the type of neoplasm; they give only an approximate characterization of the tumor, because neoplasm tissues are often heterogeneous both spatially and across imaging systems. Furthermore, malignant brain neoplasms exhibit a wide range of characterizations and variations in imaging systems. The reference standard for brain neoplasm characterization is currently pathologic analysis based on biopsy. Some machine-learning-based analyses aid in gaining an understanding of brain neoplasm abnormalities [2]. Various algorithmic concepts have been proposed that analyze a given MR image based on statistical parameters, from which the machine learns and predicts the class of abnormality.
The literature study shows that most existing work addresses the binary classification problem, where the categorization is between normal and abnormal classes or between benign and malignant tumors. These studies use machine learning approaches for classification with the extraction of only textural features. Some work uses spectral features, but it falls short in classification accuracy. In addition, for multi-class problems the reported accuracy is not promising, possibly because only textural features are used for representation and model learning. Motivated by this, the objective of the current study is to explore a diversified feature set by incorporating a large number of features for image representation. Next, these features are filtered by selecting the most appropriate features, which are relevant and non-redundant in nature. Finally, the selected features are used for model training, and the classification performance is explored using three state-of-the-art classifiers.
The rest of this paper is organized as follows: The related work towards the use of a machine learning-based diagnosis system for brain tumor imaging is presented in Section II. Section III discusses the proposed methodology for tumor classification. Section IV provides the experimental results. Section V gives the conclusion of the paper followed by discussion in Section VI.

II. RELATED WORK
Recent literature after the year 2016 mainly focuses on the use of Deep Learning and Transfer Learning to classify tumor images. Because the scope of the presented work is limited to classical machine learning, the authors have included only those papers that address machine learning and have deliberately excluded deep learning.
For the classification of brain tumors in [3], the author proposed the use of a genetic algorithm for feature extraction, which used fuzzy c-means for segmentation purposes. Intensity-based shape features were identified using Fourier [4] analysis for classification of the breast tumors. A more robust feature extraction model is defined in [5], which is based on the concept of the GARCH series where wavelet-based parameters were used to calculate the feature vector followed by PCA and LDA for the selection of relevant features. Another approach for generating intensity-based features using Discrete Wavelet Transformation (DWT) was proposed for the classification of brain tumors using Bayesian Neural Network (BNN) [6].
For tumor classification, three-domain features are extracted: spatial domain, frequency domain, and intensity-based features [7]. These feature vectors represent the overall imaging statistics that are used by the classifier for its categorization. For classification, the literature suggests that there are two types of classifiers used, i.e., linear and non-linear. Wavelet-based feature extraction was used primarily with non-linear classifiers and neural network variants [5], [8].
In [9], the author presents a comparative study of brain tumor classification work carried out from 2009 to 2013. The comparative research details the work done on tumor classification using different feature extraction and selection techniques and compares the accuracy of these approaches. In [51], the authors proposed a model to detect glioblastoma tumors using a deep belief network.
In [11], the author gives the idea of the use of intensity, shape, and texture as spatial domain features for the segmentation of posterior-fossa tumors. The imaging features' effectiveness is shown with a feature selection mechanism for the segmentation of tumor regions from MR images. A decision forest-based tumor segmentation using symmetric texture and symmetric intensity-based features is shown and described [15]. Another approach to frequency-based feature classification using Discrete Wavelet Transformation (DWT) is described in [12].
VOLUME 10, 2022
Classification based on artificial neural networks has been widely used by various authors for tumor classification. In a similar approach, a supervised KNN and feed-forward Artificial Neural Network (ANN) hybrid classifier was used for MRI classification. Some approaches rely on the probabilistic neural network (PNN) model with textural features, which is also used for brain tumor characterization [13]. A modification of the ANN was used in [14], where an iteration-free ANN improved the convergence rate for abnormal versus normal brain MR image classification while still yielding accurate results. The performance of these networks was analyzed in the context of abnormal brain image classification.
Classification based on ANN and its variants was described in [16]-[19], where GLCM-based texture features were used along with some intensity-based features [16]. The dimensionality of the feature vector was further reduced using PCA. Using ANN and its variants for classification, various authors used datasets ranging from 60 to 428 MR images of 256 × 256 dimensions. It was observed that for a small dataset of 60 or 80 images, the classifier gave an accuracy of nearly 91% [16]. But as the size of the dataset increased up to 273, the classification accuracy decreased to 64% and varied in a range of 64%-94% using Bayesian ANN [19]. Using a larger dataset of 428 MR images, the author of [16] claims an accuracy of 85%.
Instead of bi-class brain tumor classification on MR images, some multi-class classifications for brain tumors were proposed by the authors of [25]-[27]. The experiments in [25] were performed on a dataset of 98 images, which resulted in individual accuracies of approximately MTS-92%, GBM-41%, and GLI-91%. Likewise, the authors of [26], [27] used an MR dataset of 75 brain tumor images, including tumors of type Metastases (MTS), Meningioma (MEN), and Gliomas (GLI), which were classified using the Least Square Feature Transformed Probabilistic Neural Network (LSFTPNN). This classifier works in two phases: in the first phase, the features are transformed using the Least Square function method, and in the second, the transformed features are given as input to the probabilistic neural network classifier. The experimental results showed MTS-87.5%, GLI-97%, and MEN-95% accuracy.
Instead of MR imaging, many studies have suggested the use of MR spectroscopy for brain tumor classification [20]-[22]. To differentiate between benign and malignant brain neoplasm specifically, spectroscopic and conventional MR imaging were used in [20] by the application of a decision tree algorithm. In addition, spectroscopic and perfusion MRI were used in [22] to assess the inherent heterogeneity of brain neoplasms based on tumor and peri-tumor regions of interest (ROIs). Some recent works exhibit the use of the ensemble approach either on classifiers [33] or on feature extraction [34] for the classification of brain tumor images.
These studies have used various feature extraction methods to extract relevant features. Other works extract relevant statistical features and classify tumors with a family of neural networks such as BPNN [35] and the probabilistic neural network [36].
The literature study shows that few studies focused on high-grade malignant brain tumors, and the experiments performed resulted in low accuracy, even on smaller datasets of MR images. The comparative study of these works is presented in Table 1. Some malignant tumors like CNC and IVMM have not been taken into consideration yet. In this paper, an attempt has been made to classify the five types of high-grade malignant tumors with normal regions, which may help radiologists better analyze the subject matter in MR images.

III. PROPOSED APPROACH
The proposed approach adopted in this work is an extension of our previously published work [37]. The previous work was based on frequency-based features obtained using Gabor and Wavelet Transformation. The presented extended work uses a large feature pool incorporating both spatial and spectral domain features for the classification of malignant brain tumors. The approach provides detailed information on the usage of machine learning for the classification of high-grade malignant tumors in brain MR images. To identify the degree of malignancy and the grade of the tumor based on the World Health Organization (WHO) grading system, a diverse feature set is used, comprising feature vectors of Intensity-based features, Frequency-based features, Texture features, Shape features, Contour-based features, and Moment-based features. This diverse feature set makes it possible to analyze the malignant image in both the spatial and frequency domains, and thereby includes both forms of image analysis parameters.
Machine learning (ML) is a branch of artificial intelligence that focuses on the construction and study of systems that can learn from data; here, such a system learns from a diverse feature set and predicts the degree of malignancy. ML is a five-step process that helps to learn and build a model for prediction. These steps are standard dataset collection, pre-processing of the dataset, feature extraction, feature selection, and classification [50]. All the model-building steps are equally important and have their specific roles, which are discussed in detail below.

A. STANDARD DATA COLLECTION/DATA SOURCE
The proposed system examined over 110 patients with high-grade malignant brain tumors. From these patients, a dataset of 660 malignant tumor images is generated. From this dataset, a total of 1160 Regions of Interest (ROIs) are extracted and taken into consideration. These ROIs comprise 760 malignant regions and 400 normal regions. The examined dataset was collected from October 2013 to October 2014 [37]. All the patients are identified initially at the time of MRI, followed
The experimental images are generated using a 3.0 Tesla GE MRI Scanner at SMS Medical College Jaipur, Rajasthan, India. All the patients whose images are used for dataset generation are imaged using the same imaging system and environment variables. The obtained images have the following specifications: 3D axial T1-Weighted, T2-Weighted, eT1-Weighted, eT2-Weighted images, and Fluid Attenuated Inversion Recovery (FLAIR), each having a size of 256 * 256 * 3. In the proposed work, the post-contrast-enhanced T1-weighted axial imaging modality is used for examination purposes, having a dimensionality of 256 * 256 * 3 [37].

B. PRE-PROCESSING
The input MR image is pre-processed to remove any kind of noise generated during image generation. Noise removal helps us to discard unwanted signals which can lead to errors during processing. For noise removal, a median filter is used as it preserves edges while removing noise [37].
The median filter scans the image pixel by pixel and replaces each entry with the median value of the neighborhood pixels. In the proposed work, a 3 × 3 box window is used as the median filter. Next, the filtered image is converted into an 8-bit grey level to reduce the bands of the image: the original 3-band color image is converted to a corresponding intensity image ranging between 0 and 255. After this step, the input image becomes a noise-free intensity image of size 256 * 256, which serves as the input to the next step.
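The filtering and grey-level conversion described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names `median_filter_3x3` and `to_grey` are hypothetical, the border handling (using only the neighbours that exist) is an assumption, and averaging the three bands is one assumed way of producing the 8-bit intensity image.

```python
from statistics import median_low

def median_filter_3x3(band):
    """Replace every pixel with the median of its 3 x 3 neighbourhood.

    Border pixels use only the neighbours that exist; the border handling
    is an assumption made for this sketch."""
    h, w = len(band), len(band[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [band[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = median_low(window)  # stays an integer grey value
    return out

def to_grey(rgb):
    """Collapse a 3-band image to one 8-bit band by averaging the bands
    (assumed conversion; other band weightings are possible)."""
    return [[round(sum(px) / 3) for px in row] for row in rgb]

# A flat patch with one impulse-noise pixel: the outlier is removed.
noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
filtered = median_filter_3x3(noisy)
grey = to_grey([[(0, 0, 0), (255, 255, 255)], [(30, 60, 90), (10, 10, 10)]])
```

In this sketch the filter is written for a single band; for a 3-band input it would be applied to each band before the grey-level conversion.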
After converting the image into the grey level, the image is preprocessed for sharpening. Since the soft tissues in the MR image are a concern, the input grey image is sharpened to visualize the soft tissues. Sharpening of an image can be done using high-pass filters, which help to pass high-frequency components and discard low-frequency components. Therefore, the filtered image generated after a high pass filter results in a sharpened image where soft tissues and edges are visualized. In this paper, a high pass Gaussian filter is used for the sharpening of MR images.
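One simple way to realise a Gaussian high-pass filter in the spatial domain is to subtract a Gaussian-blurred copy from the image and add the resulting high-frequency detail back. The sketch below follows that idea; it is an assumption about how the sharpening could be done, not the paper's filter design. The 3 × 3 kernel, the `amount` parameter, the replicated borders, and the function names are all hypothetical choices.

```python
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]  # 3 x 3 Gaussian approximation; weights sum to 16

def gaussian_blur(img):
    """Low-pass the image with KERNEL, replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    acc += KERNEL[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc / 16
    return out

def sharpen(img, amount=1.0):
    """Add the high-pass component (image minus blur) back to the image
    and clip the result to the 8-bit range 0..255."""
    low = gaussian_blur(img)
    return [[min(255, max(0, round(p + amount * (p - l))))
             for p, l in zip(row, low_row)]
            for row, low_row in zip(img, low)]

# A vertical step edge: the bright side of the edge becomes brighter.
edge = [[0, 0, 100, 100] for _ in range(3)]
sharp = sharpen(edge)
```

Flat regions pass through unchanged because their high-pass component is zero, while intensity transitions such as tissue boundaries are accentuated.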
The next part of pre-processing is skull stripping. The proposed work classifies high-grade malignant tumors in brain images, and the skull always has a high intensity value compared with other parts of the brain, so it needs to be removed. Malignant tumors are mainly associated with the soft tissues of the brain, and the skull in the processed image contributes unnecessary information and increases complexity.
After skull stripping, the next step is the extraction of the region of interest from the MR images.
In any MR image, some regions may be identified as malicious and differentiated from others. These regions are under suspicion for radiologists during the analysis of the MR image using the computer-aided machine. From any MR image, several regions are marked by the radiologists, which are termed ''malicious'' and ''normal'' regions. In this paper, Region of Interest (ROI) extraction is done using an automated approach that selects the identified malicious regions of interest and the normal regions using free-flow marking of the boundary region. The selected malicious and normal regions of interest are extracted from the sharpened, skull-removed images.
Among 110 malignant brain tumor suspects, a dataset of 660 MR images is prepared. From this dataset, regions marked as malignant and normal by the radiologist are extracted. This resulted in the form of 1160 ROI's having 760 malignant regions and 400 normal regions. Thus, the obtained dataset has 1160 regions which are going to be classified using the machine learning approach in 6 classes.

C. FEATURE EXTRACTION
This subsection provides detailed information about the variety of features used by the machine learning models for predicting the degree of malignancy. This diverse feature vector includes Intensity-based features, Frequency-based features, Texture-based features, Shape-based features, Contour features, and Moment-based features. The generated feature set gains the benefit of processing a single image in a variety of its representation forms. Since an image can be represented and processed on digital machines either in the frequency domain or in the spatial domain, each representation form has its own significance. In the proposed work, both image representation forms are taken into consideration, and a feature set is obtained over those representation forms that has the properties of the two domains.

1) FREQUENCY-BASED FEATURES
Imaging machines use signal values for storing and reconstructing brain images. Signal processing is thus an important factor for feature analysis. The input malignant image is therefore mapped to the signal domain using the Discrete Wavelet Transform (DWT), which provides enough information for analysis and synthesis of the original signal while reducing computation time significantly. To use DWT for brain MR images, we implement the 2D variant of the analysis and synthesis filter bank, which decomposes an image into four bands, i.e., LL, LH, HL, and HH, as shown in Figure 1, using row-wise and column-wise DWT transformations on the input image. From the resultant signal obtained using 2D DWT, several features are obtained, namely Contrast, Homogeneity, Entropy, Dissimilarity, and Energy. The phase offsets (θ) taken into consideration for 2D DWT are 0°, 45°, 90°, and 135°. In the definitions of these features, i and j are the coordinates of the normalized co-occurrence matrix space, g(i, j) is the element at coordinate (i, j), and c is the dimension of the normalized co-occurrence matrix. For every offset, all five features are calculated, which results in a feature vector of size 20.
The next feature extraction mechanism is also based on wavelets, formed using the Gabor filter. The frequency and orientation representation of the Gabor filter is found to be appropriate for texture representation and discrimination. In image analysis, the image is processed by the Gabor filter, which results in the generation of Gabor descriptors for that image.
The Gabor descriptor of a linear low pass filter is defined in Equation (6) in terms of A = x cos(θ) + y sin(θ) and B = −x sin(θ) + y cos(θ). In Equation (6), λ represents the wavelength of the sinusoidal factor, θ represents the orientation of the Gabor function, ϕ is the phase offset, σ is the standard deviation of the Gaussian factor, and γ is the spatial aspect ratio that specifies the ellipticity of the Gabor function.
To calculate the texture feature from an image, the outputs of the symmetric (ϕ = 0) and anti-symmetric (ϕ = π/2) Gabor kernels are combined using the distance metric and 2D linear convolution. Textural features of an image are calculated by using a set of Gabor filters with different frequencies and orientations. We use five wavelengths, i.e., λ = {2√2, …}, and four orientations in [0, π), i.e., θ = {0°, 45°, 90°, 135°}, which are taken at an equal interval of 45°. The spatial aspect ratio γ = 1 is selected for computation. For a single wavelength and a particular orientation, we obtain five textural features for an image, so the total feature set obtained is of size 100. This feature set depends on the single standard deviation value taken into consideration. We used two values of the standard deviation, i.e., σ = (1.5, 2.5), so the final feature vector for texture analysis using Gabor contains 200 features in total.
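To make the roles of λ, θ, ϕ, σ, and γ concrete, the following sketch builds a Gabor kernel using the commonly used real-valued form exp(−(A² + γ²B²)/(2σ²)) · cos(2πA/λ + ϕ). Treating this as the form of Equation (6) is an assumption based on the symbols defined in the text. The function names, the kernel size, and the use of the Euclidean norm to combine the symmetric and anti-symmetric responses are likewise illustrative assumptions.

```python
import math

def gabor_kernel(size, lam, theta, phi, sigma, gamma=1.0):
    """Sample an assumed standard Gabor function on a size x size grid
    centred at the origin (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            a = x * math.cos(theta) + y * math.sin(theta)    # A in the text
            b = -x * math.sin(theta) + y * math.cos(theta)   # B in the text
            envelope = math.exp(-(a * a + gamma * gamma * b * b)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * a / lam + phi))
        kernel.append(row)
    return kernel

def gabor_magnitude(even_response, odd_response):
    """Combine symmetric and anti-symmetric filter responses with the
    Euclidean norm (one possible reading of 'distance metric')."""
    return math.hypot(even_response, odd_response)

even = gabor_kernel(5, 2 * math.sqrt(2), 0.0, 0.0, 1.5)          # phi = 0
odd = gabor_kernel(5, 2 * math.sqrt(2), 0.0, math.pi / 2, 1.5)   # phi = pi/2

# Feature count stated in the text:
# 5 wavelengths x 4 orientations x 5 features x 2 sigma values.
n_gabor_features = 5 * 4 * 5 * 2
```

The symmetric kernel peaks at its centre, while the anti-symmetric kernel is zero there and changes sign across it, which is why the two responses carry complementary information.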
The above two feature extraction techniques, named Gabor wavelet and Discrete Wavelet Transformation, helped to form a diverse feature set by combining both feature domain vectors, resulting in the formation of a large feature vector of 220 features.
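The row-wise and column-wise decomposition into four sub-bands can be illustrated with a single-level Haar transform. The wavelet family is not stated in this section, so Haar is an assumption, as are the averaging normalisation (division by 2 rather than √2), the function names, and the convention used here for labelling the LH and HL bands.

```python
def haar_step(signal):
    """One Haar analysis step on an even-length sequence: pairwise
    averages (low-pass) and pairwise half-differences (high-pass)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def transform_columns(matrix):
    """Apply haar_step down every column; return (low, high) matrices."""
    lows, highs = [], []
    for column in zip(*matrix):
        lo, hi = haar_step(list(column))
        lows.append(lo)
        highs.append(hi)
    # lows/highs hold one list per column, so transpose back to rows.
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]

def haar_dwt2(image):
    """Single-level 2D DWT: filter the rows first, then the columns."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = haar_step(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = transform_columns(row_low)    # low rows -> low / high columns
    hl, hh = transform_columns(row_high)   # high rows -> low / high columns
    return ll, lh, hl, hh

image = [[8, 8, 2, 2],
         [8, 8, 2, 2],
         [8, 8, 2, 2],
         [8, 8, 2, 2]]
ll, lh, hl, hh = haar_dwt2(image)
```

Each sub-band has half the width and height of the input; LL is a coarse approximation of the image, and the other three bands hold the detail along different directions.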

2) TEXTURE BASED FEATURES
The gray level MR dataset of extracted ROIs is processed for feature extraction using the Gray Level Co-occurrence Matrix (GLCM). Several features are added to the feature set based on the computation over the GLCM. The computation in this paper is based on four selected offset values, i.e., θ = {0°, 45°, 90°, 135°}. For each offset value, various features are extracted. Let V_θ(i, j) be the co-occurrence value of the gray level matrix at intensity pair (i, j) for a specific offset θ. The extracted features include the Inverse Difference Moment. Let N_g be the number of distinct gray levels in an image, obtained using the histogram of the image; the feature definitions also use the term V_{x+y}(k). The above feature extraction mechanism generates 12 different features for a particular offset, which results in a total of 48 features over the four offsets; these are added to the feature pool.
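A small sketch can show how a normalised co-occurrence matrix is built for one offset and how descriptors are read from it. Only five descriptors are shown (contrast, dissimilarity, homogeneity, energy, and entropy), written in their common textbook forms; these forms, the non-symmetric pair counting, the mapping of angles to pixel displacements, and the function names are assumptions rather than the paper's exact definitions.

```python
import math

# Assumed (row, column) displacement for each angular offset, distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(img, dr, dc, levels):
    """Normalised co-occurrence matrix for pixel pairs separated by (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    total = 0
    for r in range(h):
        for c in range(w):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                counts[img[r][c]][img[rr][cc]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def glcm_features(g):
    """Five common descriptors of a normalised co-occurrence matrix g."""
    n = len(g)
    cells = [(i, j, g[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "energy": sum(p * p for i, j, p in cells),
        "entropy": -sum(p * math.log2(p) for i, j, p in cells if p > 0),
    }

# A 4 x 4 ROI quantised to 4 grey levels (illustrative values only).
roi = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]

features_0 = glcm_features(glcm(roi, *OFFSETS[0], levels=4))

# Five descriptors for each of the four offsets.
vector = []
for angle in (0, 45, 90, 135):
    vector.extend(glcm_features(glcm(roi, *OFFSETS[angle], levels=4)).values())
```

With five descriptors per offset, the four offsets yield 20 values in this sketch; the 12 descriptors per offset used in the paper would give the 48 texture features in the same way.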

3) INTENSITY-BASED FEATURES
In the intensity-based feature extraction system, the gray level intensity image is modelled using the histogram of the image. The histogram maps each distinct intensity value to the total number of times it appears in an image. If h(k) is the histogram representation of the k-th intensity value of an image, then h(k) = N_k, where N_k is the number of times the k-th intensity value appears in the image. For each ROI of the MR image dataset, several features are extracted using the histogram of the image, defined in terms of N, the total number of pixels in the image, and I(i, j), the intensity value of the image at index (i, j). The above intensity-based feature extraction mechanism generates 5 features that are added to the feature set pool.
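The histogram and a set of first-order statistics derived from it can be sketched as follows. The five intensity features are not named in this section, so the choice of mean, variance, skewness, kurtosis, and entropy below is an assumption made for illustration, as are the function names.

```python
import math

def histogram(img, levels=256):
    """h[k] = number of pixels whose intensity equals k."""
    h = [0] * levels
    for row in img:
        for v in row:
            h[v] += 1
    return h

def first_order_features(img, levels=256):
    """Assumed set of five histogram-based descriptors."""
    h = histogram(img, levels)
    n = sum(h)                      # total number of pixels
    p = [count / n for count in h]  # relative frequency of each intensity
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(p))
    std = math.sqrt(var)
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(p))
    fourth = sum((k - mean) ** 4 * pk for k, pk in enumerate(p))
    return {
        "mean": mean,
        "variance": var,
        "skewness": third / std ** 3 if std else 0.0,
        "kurtosis": fourth / var ** 2 if var else 0.0,
        "entropy": -sum(pk * math.log2(pk) for pk in p if pk > 0),
    }

roi = [[0, 0],
       [2, 2]]
stats = first_order_features(roi, levels=4)
```

For the two-valued ROI above, the histogram is symmetric about the mean, so the skewness is zero and the entropy is exactly one bit.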

4) SHAPE-BASED FEATURES
The shape-based feature extraction mechanism has a great impact on image retrieval systems. The main idea of these features is to search out the image based on specific shape descriptions. For the classification of the tumor, some basic shape descriptor features are calculated. In the definitions of these basic shape features:
- P is the perimeter and a is the area of an object.
- The curvature of the contour at the i-th position is approximated from x_i, y_i, which represent the i-th coordinate location describing the object's contour.
- µ_r and σ_r are the mean and standard deviation of the radial distance from the centroid (g_x, g_y) of the shape to the boundary points (x_i, y_i).
- A_s is the area of a shape and A_r is the area of the minimum bounding rectangle.
- Convexity is defined as the ratio of the perimeter of the convex hull to that of the original contour C.
The above feature extraction mechanism yields shape-based features of the ROIs extracted from brain MR images and thus adds five shape-based features to the feature set pool.
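The quantities named above (perimeter, area, radial distance statistics, bounding rectangle, and convex hull) can be computed from a polygonal contour as in the sketch below. The exact feature formulas are assumptions: circularity is taken as 4πa/P², the radial descriptor as σ_r/µ_r, the centroid as the mean of the contour points, and the bounding rectangle as axis-aligned rather than the true minimum rectangle. The curvature-based feature is omitted, and all function names are hypothetical.

```python
import math

def closed_pairs(pts):
    """Consecutive point pairs of a closed contour."""
    return zip(pts, pts[1:] + pts[:1])

def perimeter(pts):
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in closed_pairs(pts))

def area(pts):
    """Shoelace formula for the area of a simple polygon."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in closed_pairs(pts))) / 2

def convex_hull(pts):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shape_features(contour):
    p, a = perimeter(contour), area(contour)
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    gx, gy = sum(xs) / len(xs), sum(ys) / len(ys)   # assumed centroid
    radial = [math.hypot(x - gx, y - gy) for x, y in contour]
    mu_r = sum(radial) / len(radial)
    sigma_r = math.sqrt(sum((r - mu_r) ** 2 for r in radial) / len(radial))
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))  # axis-aligned box
    return {
        "circularity": 4 * math.pi * a / p ** 2,
        "radial_ratio": sigma_r / mu_r,
        "rectangularity": a / box_area,
        "convexity": perimeter(convex_hull(contour)) / p,
    }

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
square_features = shape_features(square)
l_features = shape_features(l_shape)
```

The square fills its bounding box and equals its own convex hull, so rectangularity and convexity are both 1; the concave L-shape scores lower on both, which is the kind of irregularity these descriptors are meant to capture.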

5) CONTOUR BASED FEATURES
Let x_i and y_i represent the i-th coordinate location describing an object's contour, and let z_i be the Euclidean distance of this point to the centroid of the object. The p-th contour sequence moment (CSM) and the p-th central CSM are defined in terms of z_i. In this work, we select low-order moments to form a set of shape features F_1, F_2, and F_3. The difference between F_3 and F_1 can give information about shape roughness that may not be represented by F_1 or F_3 individually. The above feature extraction mechanism generates 4 contour point features, which are added to the feature set pool.
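The contour sequence moments can be sketched from the radial distances z_i. The definitions used below are assumptions, chosen to be consistent with the text: the p-th moment is the mean of z_i^p, the p-th central moment is the mean of (z_i − m_1)^p, and the features take the normalised forms F_1 = M_2^(1/2)/m_1, F_2 = M_3^(1/3)/m_1, F_3 = M_4^(1/4)/m_1, with F_3 − F_1 as the fourth feature. The centroid is assumed to be the mean of the contour points, and the function name is hypothetical.

```python
import math

def contour_sequence_features(contour):
    """Return (F1, F2, F3, F3 - F1) from a list of (x, y) contour points."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    z = [math.hypot(x - cx, y - cy) for x, y in contour]  # radial distances

    m1 = sum(z) / n                                        # first moment

    def central(p):
        return sum((zi - m1) ** p for zi in z) / n         # p-th central moment

    m2c, m3c, m4c = central(2), central(3), central(4)
    # The third central moment can be negative, so take a signed cube root.
    root3 = math.copysign(abs(m3c) ** (1 / 3), m3c)

    f1 = math.sqrt(m2c) / m1
    f2 = root3 / m1
    f3 = m4c ** 0.25 / m1
    return f1, f2, f3, f3 - f1

# Four points on a circle: every radial distance is equal, so all features vanish.
circle_like = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# Square sampled at corners and edge midpoints: radial distances alternate.
square = [(1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0)]

circle_features = contour_sequence_features(circle_like)
square_csm = contour_sequence_features(square)
```

Because every feature is divided by m_1, scaling the contour leaves the values unchanged, and a perfectly circular boundary gives zeros.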

6) MOMENT BASED FEATURES
A moment is a specific quantitative measure of the shape of a set of points. If the points represent a probability density, then the zeroth moment is the total probability (i.e., one), the first moment is the mean, the second moment is the variance, and the third moment is the skewness [29]. In image analysis, image moments are useful in describing objects after segmentation. Thus, after extracting the ROIs from the malignant brain MR image, moment-based features help to describe the tumor type in image classification. The basic image moment features are defined in terms of I(x, y), the pixel intensity at index (x, y).
Rotation invariant moments: seven rotation invariant moment features are also computed. The above moment-based image feature extraction technique extracts a feature set that is rotation and translation invariant. These features help the radiologist to diagnose the malignant ROI extracted from brain MR images even after rotation and translation of the images on a variety of dimensional scales. This mechanism generates 16 features that are added to the feature set pool for machine classification. The overall summary of all the feature extraction techniques with the number of extracted features is shown in Table 2. The feature set pool now consists of a large number of features from a variety of feature extraction domains. This large, diverse feature set pool is used for machine learning-based classification.
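The sketch below illustrates raw, central, and normalised image moments and the first two of Hu's seven rotation invariants, which are widely used for this purpose. Taking Hu's set as the seven invariants meant here is an assumption, and the function names are hypothetical. The example checks numerically that the two values do not change when the same blob is shifted or rotated by 90°.

```python
def raw_moment(img, p, q):
    """M_pq = sum over pixels of x^p * y^q * I(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def centroid(img):
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00

def central_moment(img, p, q):
    """Moment taken about the centroid, hence translation invariant."""
    xc, yc = centroid(img)
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def normalised_moment(img, p, q):
    """Central moment divided by M_00^(1 + (p + q) / 2)."""
    return central_moment(img, p, q) / raw_moment(img, 0, 0) ** (1 + (p + q) / 2)

def hu_first_two(img):
    """First two Hu invariants built from second-order normalised moments."""
    n20 = normalised_moment(img, 2, 0)
    n02 = normalised_moment(img, 0, 2)
    n11 = normalised_moment(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2

blob = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]
shifted = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 1, 1, 1],
           [0, 0, 1, 1, 1],
           [0, 0, 0, 0, 0]]
rotated = [list(r) for r in zip(*blob[::-1])]  # 90-degree rotation

h_blob = hu_first_two(blob)
h_shifted = hu_first_two(shifted)
h_rotated = hu_first_two(rotated)
```

The remaining invariants are built from third-order normalised moments in the same manner.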

D. FEATURE SELECTION
This subsection describes the feature selection mechanism, which is an important phase used in the machine learning system. Feature selection is also known as variable selection, subset selection, or even by the name of attribute selection. Feature selection is defined as a process of selecting a relevant set of features or attributes from the large feature set vector, which is used for building a machine learning model. The generated feature set can have some features which are irrelevant and redundant. These features have a lower impact on generating machine learning models, but identifying the irrelevant features is a major concern for model building.
In machine learning, a feature is considered redundant if it provides no more information than the previously selected features, whereas irrelevant features are those that provide no useful information in any context. For filtering out redundant and irrelevant features, feature selection algorithms are divided into two major sub-domains, i.e., feature ranking methods and feature subset selection methods [24].
Feature ranking methods are generally used for high-dimensional datasets. Such a method processes the initially generated features and computes a ranking score for each feature. The features are then sorted in decreasing order of their score, and the top-ranked features are selected for classification. The major problem associated with this approach is that redundant features may be included in the selected feature set. If two features are redundant, they have nearly the same feature values, so both receive a high rank and both are selected. This problem occurs because the correlation between the selected features is not examined.
On the other hand, feature subset selection mechanisms focus on selecting a subset (s') from the overall feature set (S) such that s' ⊂ S. The subset (s') must be of greater significance than the other feature subsets. For selecting the relevant subset, various methods are available, such as wrapper methods, filter-based methods, and embedded methods [24].
In the proposed work, three experiments are done based on different feature selection algorithms, i.e., Cumulative Variance based feature selection (CVM), Genetic Algorithm (GA), and Independent Component Analysis (ICA). CVM is our previously proposed feature selection approach, whose description is found in [28]. The other algorithms, GA and ICA, are state-of-the-art algorithms. The basic description of the algorithms is given below:

1) CUMULATIVE VARIANCE METHOD FOR FEATURE SELECTION (CVM)
CVM utilizes the advantages of both feature selection approaches, i.e., feature ranking and subset selection. Firstly, the redundant and irrelevant features are eliminated from the feature vector using a feature ranking approach that is based on variance analysis among the features. The ranking of the features depends on the cumulative variance percentage. This step provides the benefit of feature ranking. In the second step, the most desirable subset of features is selected among the ranked feature vectors using a statistical t-test approach, in which x_1, x_2 are the sample means and v_1, v_2 are the sample variances of a particular feature subset, and N_1, N_2 are the numbers of features in each subset. The proposed cumulative variance and t-test based algorithm is given below:
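A two-sample t statistic built from the quantities named above (sample means, sample variances, and subset sizes) can be sketched as follows. The unequal-variance form t = (x_1 − x_2) / sqrt(v_1/N_1 + v_2/N_2) is an assumption about the test used, and `t_statistic` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def t_statistic(subset_1, subset_2):
    """Two-sample t statistic in the assumed unequal-variance form."""
    x1, x2 = mean(subset_1), mean(subset_2)
    v1, v2 = variance(subset_1), variance(subset_2)  # n - 1 denominator
    n1, n2 = len(subset_1), len(subset_2)
    return (x1 - x2) / math.sqrt(v1 / n1 + v2 / n2)

# Illustrative values only, not taken from the dataset.
t = t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The magnitude of t is then compared against a critical value at the chosen significance level to decide whether the two subsets differ.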

2) DETAILED DESCRIPTION OF CVM
In feature selection, the input is the vector 'V', i.e., V = {v1, v2, ..., vn}, returned by the feature extraction step.

Algorithm 1 CVM Based Feature Selection
Input: Feature vector (V)
Output: Relevant feature subset (RFS)
Procedure: For each vector vi ∈ V, the vector is of size 1 × n, where each vi is of dimension p, n is the number of data elements, and p is the number of features. Initially, the mean of each column vector, denoted µi, is computed as given in Equation 45. The data are then normalized by subtracting the mean from each column vector. For the normalized column vectors, a covariance matrix is generated using Equation 46.
where D^T is the transpose of the normalized vector D. The generated covariance matrix is of size p × p. The eigenvectors are calculated from the covariance matrix using Equation 47, where Ev is the eigenvector matrix, Cov is the covariance matrix, and D is the diagonal matrix of eigenvalues. After the eigenvectors are obtained, the variance is computed for each eigenvector as given in Equation 48.
such that xi ∈ Ev (eigenvectors). All the extracted variances are then sorted in decreasing order of magnitude, and a test is applied to them: the statistical t-test of Equation 44 is used to identify the subsets. The subsets that pass the t-test are considered relevant features, and all the rest are rated as irrelevant.
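The numbered equations referenced in this subsection are not reproduced in the text above. As a reading aid only, the block below writes out standard forms that agree with the surrounding definitions. These are assumptions rather than the authors' exact expressions: the unequal-variance form of the t statistic, the 1/(n-1) normalization of the covariance, and the symbol \Lambda for the diagonal eigenvalue matrix (the text reuses D for it) are choices made here.

```latex
% Assumed two-sample t statistic (cf. Equation 44)
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{v_1}{N_1} + \dfrac{v_2}{N_2}}} \]

% Assumed column mean and mean-centring (cf. Equation 45)
\[ \mu_i = \frac{1}{n} \sum_{j=1}^{n} v_{j,i}, \qquad D_{j,i} = v_{j,i} - \mu_i \]

% Assumed p x p covariance matrix (cf. Equation 46)
\[ \mathrm{Cov} = \frac{1}{n-1} D^{T} D \]

% Assumed eigen-decomposition (cf. Equation 47)
\[ \mathrm{Cov}\, E_v = E_v \Lambda, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p) \]

% Cumulative variance percentage of the first k eigen-directions
\[ c_k = 100 \times \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \]
```

Under these conventions the variance captured along the i-th eigenvector equals its eigenvalue \lambda_i, so sorting the variances in decreasing order is the same as sorting the eigenvalues.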
The above selection mechanism automatically determines the number of relevant features and returns a vector with fewer features than the original vector. Features that pass the significance level are selected, and the remaining features are dropped. This reduced feature set is then used in the classification step.
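The variance-ranking part of the procedure can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes a cumulative-variance threshold of 99.5% (the figure quoted in the conclusion), maps each retained eigen-direction back to the original feature with the largest absolute loading (a Jolliffe-B4-style rule), and omits the t-test step. The function name `cvm_select` and the synthetic data are hypothetical.

```python
import numpy as np

def cvm_select(V, threshold=0.995):
    """Sketch of cumulative-variance feature selection.

    V is an (n_samples, p_features) array with p >= 2. Returns the sorted
    indices of the original features that are retained.
    """
    D = V - V.mean(axis=0)                   # subtract the column means
    cov = np.cov(D, rowvar=False)            # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort in decreasing order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]

    cum = np.cumsum(eigvals) / eigvals.sum() # cumulative variance fraction
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(eigvals))

    selected = []
    for c in range(k):
        # Assumed rule: take the not-yet-selected feature with the
        # largest absolute loading on this eigen-direction.
        for j in np.argsort(np.abs(eigvecs[:, c]))[::-1]:
            if int(j) not in selected:
                selected.append(int(j))
                break
    return sorted(selected)

# Synthetic data: two informative features and one nearly constant feature.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 10.0, 300),
    rng.normal(0.0, 5.0, 300),
    rng.normal(0.0, 0.001, 300),
])
selected = cvm_select(X)
```

On this toy input the nearly constant third feature contributes almost nothing to the cumulative variance and is dropped.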

3) GENETIC ALGORITHM
A Genetic Algorithm (GA) is constructed to mimic the evolutionary process. GA works by allowing the current population to reproduce and generate the next generation of children. The basic steps for implementing the GA are described below:
-Initially, set the no. of children in each generation (G), mutation probability (MP), and stopping criteria (SC).
-Initialize the random set of binary chromosomes
-loop
-do for each chromosome
-Train the model and calculate its fitness.
-for reproduction, i = 1 to G/2
-do select 2 chromosomes based on fitness
-do the crossover over them
-do mutation using probability (MP) and generate new child chromosomes
-end till stopping criteria (SC) achieved
The basics of the GA used in this work for selecting features are described in [31].
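The steps listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure of [31]: selection is fitness-proportional, crossover is single-point, the stopping criterion is a fixed number of generations, and `toy_fitness` is a hypothetical stand-in for "train the model and calculate its fitness".

```python
import numpy as np

def ga_select(fitness, n_features, G=20, MP=0.05, generations=100, seed=0):
    """Evolve binary feature masks; returns the best mask seen and its fitness.
    Assumes n_features >= 2 and an even G."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(G, n_features))   # random binary chromosomes
    best, best_fit = None, -np.inf
    for _ in range(generations):                     # assumed stopping criterion SC
        fits = np.array([fitness(ch) for ch in pop], dtype=float)
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best, best_fit = pop[i].copy(), float(fits[i])
        # Fitness-proportional selection (shifted so all weights are positive).
        w = fits - fits.min() + 1e-9
        p = w / w.sum()
        children = []
        for _ in range(G // 2):
            a, b = pop[rng.choice(len(pop), size=2, p=p)]
            cut = int(rng.integers(1, n_features))   # single-point crossover
            children.append(np.concatenate([a[:cut], b[cut:]]))
            children.append(np.concatenate([b[:cut], a[cut:]]))
        pop = np.array(children)
        flip = rng.random(pop.shape) < MP            # bit-flip mutation
        pop = np.where(flip, 1 - pop, pop)
    return best, best_fit

# Hypothetical fitness: number of positions agreeing with a known "good" mask.
target = np.array([1, 0, 0, 1, 0, 1, 0, 0])

def toy_fitness(mask):
    return int((mask == target).sum())

best, best_fit = ga_select(toy_fitness, n_features=8)
```

In a real run the fitness function would train a classifier on the features switched on by the mask and return its validation accuracy.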

4) INDEPENDENT COMPONENT ANALYSIS (ICA)
ICA is defined as finding a linear transformation, given by a matrix M as in the equation below, that maximizes the non-Gaussianity of statistically independent components [30], so that the random variables yi, i = 1, 2, ..., n, are as independent as possible,
where t is the time or sample index and M is some unknown matrix. ICA then consists of estimating both M and xi(t) when only yi(t) is observed. The matrix M is used to project the feature vector onto independent components. The method used for feature selection with ICA in this work is defined in [30].

E. CLASSIFICATION
This subsection deals with the final processing step of model building using machine learning, namely the classification of the dataset. Classification is the procedure of organizing the input patterns into equivalent classes and assigning labels to them. The selection of a suitable classifier requires consideration of many factors, such as classification accuracy, algorithmic performance, and computational resources. Classification is performed on the feature set chosen by the feature selection mechanism. These selected features are used by the model for learning and prediction, i.e., for training and testing the model, which are the two main phases of any classification model. The classification accuracy mainly depends upon how well the classifier is trained: a better-trained classifier predicts class labels without over-fitting or under-fitting. In this work, several machine learning-based classification algorithms, namely K-Nearest Neighbour (KNN), multi-class Support Vector Machine (m-SVM), and Neural Network (NN), are used for classification. Various experiments are conducted on the feature sets selected using the CVM, GA, and ICA methods, and each classifier is evaluated with the feature set selected by each of these three methods. All three classifiers are multi-class classifiers, meaning that they can classify a given data pattern into two or more classes. The classifiers also come from different families: KNN is a lazy-learning, instance-based classifier, m-SVM is a non-linear classifier, and NN is a probabilistic classifier that depends on the initial selection of the weights, activation function, learning rate, and the number of hidden layers.

1) K-NEAREST NEIGHBOUR (KNN) CLASSIFIER
A supervised nearest neighbor-based classifier, K-NN, is used for the result analysis of the proposed approach. Dissimilarity is used as the criterion for neighborhood selection, with the squared Euclidean distance as the dissimilarity metric. KNN is based on the closest training samples in feature space; thus, KNN classification is performed by starting with the k = 2 nearest neighbors and progressively increasing the value of k until the classification accuracy no longer improves. There is no established rule for the initial selection of the best value of 'k'. The value of 'k' must lie in a specific range, i.e.,
0 < k ≤ n, (50)
where n is the number of items in the dataset. If k = 1, the algorithm searches for the single nearest neighbor and assigns its class label to the unknown sample. If k = m, where m ∈ [2, n], the algorithm searches for the m closest neighbors of the sample and returns the class label that holds the majority among the m nearest samples. One of the critical parameters to consider before using KNN is therefore the value of 'k', i.e., the number of nearest neighbors, for the particular application domain. In general, the value of k should be greater than or equal to the number of unique class labels present in the training dataset. Also, the value of k should be odd so that the majority vote is not biased by ties.
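A minimal sketch of the voting rule described above, using the squared Euclidean distance. The function name and the six toy training points are illustrative assumptions; ties in the vote are resolved in favour of the class of the closest tied neighbor, which is a choice made here and not stated in the paper.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Return the majority class label among the k nearest training samples."""
    d2 = ((X_train - x) ** 2).sum(axis=1)          # squared Euclidean distance
    nearest = np.argsort(d2, kind="stable")[:k]    # indices of the k closest samples
    votes = Counter(y_train[i] for i in nearest)   # counts in order of proximity
    return votes.most_common(1)[0][0]

# Toy training set with two well-separated classes.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3)
pred_b = knn_predict(X_train, y_train, np.array([5.5, 5.2]), k=3)
```

Sweeping k over a range of values and keeping the one with the best validation accuracy corresponds to the search procedure described in the text.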

2) MULTI-CLASS SUPPORT VECTOR MACHINE (M-SVM) CLASSIFIER
The Support Vector Machine (SVM) is a popular binary non-linear classifier that is used in the majority of pattern classification problems. SVM is used in problems where a binary distinction is required. In problems where more than two classes are present, the multi-class SVM (m-SVM) classifier is used. For multi-class classification using SVM, several methods are used:
-Ranking-based multi-class classification
-One-vs-other classification
In the ranking-based mechanism, one SVM-based decision function is used to classify all the instances according to their particular class label. The one-vs-other mechanisms include one-against-all classification, in which the instances of one class are separated from the instances of all the other classes, and pairwise classification, in which the instances of two classes are separated from each other, one pair of classes at a time.
In this paper, a one-against-all classification mechanism is used to classify the instances of the various brain tumor dataset images using m-SVM. The experiment is performed initially to identify the number of correctly classified instances using one-against-all m-SVM. The incorrectly classified images are then processed again to find the number of misclassified images for each tumor type.
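The one-against-all decomposition can be sketched independently of the binary learner. In the sketch below a regularized least-squares linear scorer stands in for the binary SVM, purely to keep the example self-contained; this substitution, the function names, and the synthetic clusters are assumptions and do not reflect the m-SVM configuration used in the paper.

```python
import numpy as np

def fit_one_vs_all(X, y, classes, reg=1e-3):
    """Train one binary scorer per class: that class (+1) against all others (-1).
    A ridge least-squares scorer is used here as a stand-in for a binary SVM."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    weights = {}
    for c in classes:
        t = np.where(y == c, 1.0, -1.0)              # relabel: class c vs the rest
        weights[c] = np.linalg.solve(A, Xb.T @ t)
    return weights

def predict_one_vs_all(weights, X):
    """Assign each sample to the class whose scorer gives the highest score."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    classes = list(weights)
    scores = np.column_stack([Xb @ weights[c] for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Three synthetic clusters standing in for three tumor classes.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, (20, 2)) for c in centres])
y = np.repeat([0, 1, 2], 20)

weights = fit_one_vs_all(X, y, classes=[0, 1, 2])
pred = predict_one_vs_all(weights, centres)
```

With six classes, as in this work, the same scheme would train six binary scorers and pick the highest-scoring one for each ROI.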

3) NEURAL NETWORK (NN) CLASSIFIER
A neural network classifier is a well-known state-of-the-art classifier modeled on the neural configuration of the brain. From the classification perspective, it processes one input sample at a time and trains the network by comparing the predicted class label of the sample with its known ground-truth label. The error, i.e., the difference between the actual result and the obtained result, is fed backward to update the weights. The whole process loops until the error rate falls within the initially defined limits.
A neural network consists of several modules:
-A set of inputs (Xi) and initial weights (Wi)
-An activation function (∅) applied to the weighted sum of the inputs
-An output (Yi)
The input layer of the network holds the data values, in our case the features chosen by the feature selection algorithm, which are passed to the next layer, known as the hidden layer. Each node of the input layer is connected to each node of the hidden layer through an initial weight. The last layer is the output layer, in which each node represents a class label. In this paper, a three-layered neural network architecture is used, with a variable number of nodes depending upon the feature selection algorithm used. The number of selected features determines the number of input nodes in the neural network. The number of hidden nodes is in the range of 15-50, again depending upon the feature selection algorithm.
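A compact sketch of a three-layer network with error back-propagation, matching the structure described above. The tanh activation, softmax output, cross-entropy loss, full-batch updates, learning rate, and the toy data are all assumptions made for illustration; the paper does not specify them, and it describes sample-by-sample processing rather than batches.

```python
import numpy as np

def init_network(n_in, n_hidden, n_out, seed=0):
    """Input -> hidden -> output network with small random initial weights."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(net, X):
    """Return hidden activations and class probabilities."""
    H = np.tanh(X @ net["W1"] + net["b1"])           # assumed activation
    Z = H @ net["W2"] + net["b2"]
    Z = Z - Z.max(axis=1, keepdims=True)             # numerical stability
    P = np.exp(Z)
    P /= P.sum(axis=1, keepdims=True)                # softmax over class nodes
    return H, P

def train_step(net, X, Y, lr=0.1):
    """One gradient step; returns the cross-entropy loss before the update."""
    H, P = forward(net, X)
    loss = float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
    dZ = (P - Y) / len(X)                            # error at the output layer
    dH = (dZ @ net["W2"].T) * (1.0 - H ** 2)         # error fed back to hidden layer
    net["W2"] -= lr * (H.T @ dZ)
    net["b2"] -= lr * dZ.sum(axis=0)
    net["W1"] -= lr * (X.T @ dH)
    net["b1"] -= lr * dH.sum(axis=0)
    return loss

# Toy problem: 4 input features, 15 hidden nodes, 3 classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 4))
labels = np.digitize(X[:, 0], [-0.5, 0.5])           # class 0, 1, or 2
Y = np.eye(3)[labels]                                # one-hot targets

net = init_network(n_in=4, n_hidden=15, n_out=3)
losses = [train_step(net, X, Y) for _ in range(200)]
```

For the setting in the text, `n_in` would be the number of selected features, `n_hidden` a value between 15 and 50, and `n_out` the number of class labels.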

F. 2-FOLD CROSS-VALIDATION
During the training of the above-mentioned classifiers, a 5 × 2-fold cross-validation is performed on the malignant brain tumor MR image ROI dataset. Cross-validation helps in removing bias between the dataset images. The purpose of cross-validation is to partition the dataset for training so as to limit problems such as over-fitting and to give an insight into how the model will generalize to an independent dataset.
In this work, the whole malignant brain neoplasm dataset 'D' is divided into two equal subsets, D0 and D1. First, the classification model is trained using D0 and tested using D1. Then the training and testing datasets are swapped, i.e., the model is trained using D1 and tested using D0. Figure 2 shows the 2-fold cross-validation approach. The process iterates 5 times, with two classification accuracy results on each run. The subsets D0 and D1 are selected randomly on every run, which helps the model learn from almost all the dataset images and also removes bias among the dataset images. The final accuracy is reported as the mean of all the accuracies obtained across the runs.
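The 5 × 2 splitting scheme can be sketched as an index generator. The function name and seed are illustrative assumptions; the sample count of 1160 is the number of ROIs reported in this work.

```python
import numpy as np

def five_by_two_splits(n_samples, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs: 5 random halvings, each used both ways."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(repeats):
        perm = rng.permutation(n_samples)    # fresh random split on every run
        d0, d1 = perm[:half], perm[half:]
        yield d0, d1                         # train on D0, test on D1
        yield d1, d0                         # swap: train on D1, test on D0

splits = list(five_by_two_splits(1160))
```

Evaluating a classifier on each of the ten pairs and averaging the ten accuracies gives the final figure described above.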

IV. RESULT AND ANALYSIS
This section describes the experimental results and the analysis based on them. In the proposed work, a diverse feature set comprising frequency-based, texture, intensity, shape, contour, and moment features is used for extracting significant features from the MR imaging dataset, which consists of 5 types of malignant brain tumors: Central NeuroCytoma (CNC), Glioblastoma Multiforme (GBM), Gliomas (GLI), Intra Ventricular Malignant Mass (IVMM), and Metastasis (MTS). A total of 110 malignant brain tumor suspects are taken into consideration. From these patients, a dataset of 660 malignant tumor images is generated. From this dataset, a total of 1160 Regions of Interest (ROIs) are extracted. These ROIs comprise 760 malignant regions and 400 normal regions (NR). The numbers of samples per class used in the experiments are CNC-133, GBM-160, GLI-155, IVMM-152, MTS-160, and NR-400. Here, 5 × 2-fold cross-validation is applied to the dataset, with the dataset divided equally for training and testing. In each fold, the dataset is divided randomly and the model is trained using the training subset. In the other fold, the split is reversed, so the model is trained on the subset previously used for testing. The whole methodology is repeated for three different feature selection mechanisms (CVM, GA, and ICA) and three different classifiers (KNN, mSVM, and NN).
The results of the various feature selection algorithms are shown in Table 3. The first column shows the type of MR image taken into consideration. The second column gives the results achieved by the various feature selection algorithms on the input feature set. The first algorithm, CVM, is based on the cumulative variance and a t-test with a 95% confidence level, while the ICA algorithm finds independent components based on the information present in the features. GA is based on the fitness value and on the number of iterations, which is set manually during the execution of the program; here, GA runs for 100 iterations. Table 2 shows the detailed results of each algorithm, with the number of selected features out of the total number of features (298). These selected features are used for machine learning and classification.
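As a quick consistency check on the sample counts quoted in this section, the five tumor classes sum to the stated number of malignant regions, and adding the normal regions gives the stated total number of ROIs:

```latex
\[ 133 + 160 + 155 + 152 + 160 = 760, \qquad 760 + 400 = 1160 \]
```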

A. EVALUATION METRICS
To evaluate the performance of the classifiers for brain tumor classification, several statistical measures, namely precision, recall, accuracy, and F-measure, are used. Precision for a class is defined as the ratio of the number of images correctly classified into the class to the total number of images classified into the class (the sum of True Positives (TP) and False Positives (FP)). Recall is the ratio of the number of correctly classified images to the total number of images that belong to the class (the sum of True Positives and False Negatives (FN)). F-measure combines precision and recall, and is used to report the performance of the classifiers for tumor classification. The statistical evaluation is shown in Table 4.
For the performance analysis of an individual class under each classifier, the accuracy of each class is computed. The classification accuracy for a particular class is defined as the percentage of correctly classified samples over the whole class. Mathematically, it is given as:
$$\mathrm{Accuracy}_{c} = \frac{\text{number of correctly classified samples of class } c}{\text{total number of samples of class } c} \times 100$$
In the proposed work, three experiments are performed based on the features selected by the three feature selection algorithms. The first experiment is done using a KNN classifier, the second with a multi-class SVM classifier, and the third with the neural network. The corresponding experimental results are shown in Table 5, Table 6, and Table 7. Each of the tables presents the confusion matrix obtained during the experimental run.
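The measures defined above can be computed directly from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and takes the F-measure to be the balanced harmonic mean of precision and recall; the 2 × 2 matrix is made-up illustrative data, not a result from Tables 5 to 7.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F-measure per class from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp     # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp     # belonging to the class but predicted elsewhere
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Illustrative confusion matrix for two classes (not data from the paper).
cm = [[8, 2],
      [1, 9]]
precision, recall, f_measure = per_class_metrics(cm)
class_accuracy = 100.0 * recall  # per-class accuracy as defined in the text
```

Note that the per-class accuracy defined in the text, correctly classified samples over all samples of the class, coincides with recall expressed as a percentage.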

B. EXPERIMENT NO. 1
The first experiment is performed with a KNN classifier in conjunction with three feature selection algorithms, i.e., CVM, GA, and ICA, over the whole dataset of 1160 malignant brain MR ROI images. The dataset has six classes: Central NeuroCytoma (CNC), Glioblastoma Multiforme (GBM), Gliomas (GLI), Intra Ventricular Malignant Mass (IVMM), Metastasis (MTS), and Normal regions (NR). The numbers of samples per class used in the experiment are CNC-133, GBM-160, GLI-155, IVMM-152, MTS-160, and NR-400. This whole dataset is divided into a training set, which has 80% of the images, and a testing set, which includes the remaining 20% of the images. Over these samples and the three feature selection algorithms, the experiment is performed for several values of 'k'. The best accuracy is achieved at k = 7, as shown in Table 5. The accuracy for each class when using CVM as the feature selection algorithm is CNC-92%, GBM-86%, GLI-91%, IVMM-92%, MTS-85.6%, and NR-84%. The average accuracy obtained is 88.43%, which is greater than that of the other feature selection algorithms, ICA (82.87%) and GA (85.07%), as shown in Table 5. The experimental result shows that the performance order of the three feature selection algorithms is CVM > GA > ICA.
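Assuming the reported average is the unweighted mean of the six per-class accuracies, the KNN figure for CVM can be reproduced as follows:

```latex
\[ \frac{92 + 86 + 91 + 92 + 85.6 + 84}{6} = \frac{530.6}{6} \approx 88.43\% \]
```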

C. EXPERIMENT NO. 2
The second experiment is performed with a non-linear multi-class SVM classifier (mSVM). The dataset is kept the same throughout this work, as discussed in experiment 1. When multi-class SVM is used as the classifier with the three feature selection algorithms, an increase in the accuracy of particular classes is observed (as shown in Table 6). The classification accuracy achieved for the various classes using CVM as the feature selection approach is CNC-95.4%, GBM-89.3%, GLI-92.9%, IVMM-94.7%, MTS-92.5%, and NR-90.25%, which is greater than for MTS using GA, as presented in Table 6. Still, the overall average classification accuracy of the mSVM classifier is higher for CVM (92.5%) than for the other feature selection algorithms, ICA (87.88%) and GA (90.52%). There is no change in the performance order of the three feature selection algorithms, i.e., CVM > GA > ICA.

D. EXPERIMENT NO. 3
The third experiment is performed with a neural network (NN) classifier. The experimental result is presented in Table 7. The accuracy achieved using the NN classifier with the CVM feature selection algorithm is CNC-92.7%, GBM-96.2%, GLI-95.4%, IVMM-96.7%, MTS-95%, and NR-94.2%. The average accuracy obtained is 95.86%, which is more than that of any other feature selection algorithm presented in Table 7. The performance order of the feature selection algorithms is the same as in the earlier experiments.
Based on the experimental results, it is found that the proposed approach provides satisfactory results compared with other methods. One of the main reasons for this is the use of a diversified feature set. In this work, six different feature domains are used to extract a large set of features into the feature pool. It includes a total of 298 features carrying texture, spectral, shape, and moment information. These features are further filtered using the proposed CVM method, which uses the cumulative variance of the extracted features. Thus, using the standard Jolliffe's B4 method threshold, the filtered features are relevant and informative and capture different information.

E. EXPERIMENT NO. 4
The proposed methodology is also evaluated on a multiclass brain tumor dataset having 3 classes, i.e., Meningioma, Glioma, and Pituitary tumor. This dataset, collected from an online platform, consists of 3064 brain MR scans [49]. It contains T1-weighted image slices of 233 individuals, distributed as Gliomas-1426, Meningiomas-708, and Pituitary-930 images. Several research contributions in the literature use this dataset [45]-[48]. These contributions use CNN-based deep models for classification; they are listed in Table 1 with their reported accuracies.
In this experimental setup, the proposed methodology is evaluated on this dataset of 3 brain tumor classes in a machine learning environment. The proposed method gives significant accuracies with the three machine learning classifiers. The gains in classification accuracy are given in Table 9. The results were found to be significant compared with other feature selection algorithms.

V. DISCUSSION
This paper uses three feature selection algorithms, i.e., the proposed CVM, ICA, and GA, with three well-known state-of-the-art classifiers, KNN, mSVM, and NN. The experimental result of each classifier with every feature selection algorithm is presented in Table 5, Table 6, and Table 7. The experimental results show that the CVM feature selection algorithm gives better accuracy for each class of malignant brain tumor than the other feature selection algorithms, ICA and GA. The overall order of the feature selection algorithms in terms of accuracy is CVM > GA > ICA. The performance of the classifiers also varies: the order is NN > mSVM > KNN, with the highest average accuracy of 95.86% achieved by NN using CVM for malignant tumor class label prediction. Table 8 summarizes the results of the three feature selection algorithms with the three classifiers. Each row in Table 8 represents a feature selection algorithm and each column represents a classifier. The values in bold in the first row of Table 8 indicate that the CVM method achieves the best accuracy among the algorithms, while the bold values in the last column indicate that NN gives the best classification results among all classifiers.
These experimental results could help radiologists perform better analyses of malignant brain tumor suspects using a computer-aided system, with the help of the various feature extraction techniques described in this paper. The extracted features are invariant to rotation and scaling and include properties of both the spatial and frequency domains. On top of these, three feature selection algorithms and three classifiers are presented, which help a radiologist better analyze malignant tumors. In the future, frameworks based on hybrid decision support techniques may also be explored for better brain tumor prediction.
The main advantage of the proposed methodology is that it finds the most relevant and non-redundant features by using the computationally efficient proposed feature selection algorithm, CVM. It helps to obtain the best in-class accuracies when evaluated with three state-of-the-art classifiers, KNN, mSVM, and Neural Network. This gives a new insight into the machine learning framework for problem solving. On the other hand, the limitation of the proposed work is its scalability: the proposed methodology has not been tested with some of the openly accessible imaging datasets having abnormality characteristics. Moreover, the extension of the proposed methodology to deep learning models is another major limitation that needs to be addressed in the near future.

VI. CONCLUSION
In this paper, a study was performed on 110 patients having high-grade malignant brain tumors: Central NeuroCytoma (CNC), Glioblastoma Multiforme (GBM), Gliomas (GLI), Intra Ventricular Malignant Mass (IVMM), and Metastasis (MTS). From these 110 patients, a T1-weighted post-contrast image dataset was generated, containing 660 malignant tumor images. From this dataset, 1160 Regions of Interest (ROIs) were extracted and taken into consideration. These ROIs comprise 760 malignant regions and 400 normal regions. The whole dataset is categorized into six broad categories, including five classes for high-grade malignant tumors and one normal class. Over the entire dataset, a 5 × 2 cross-validation mechanism is used to partition the dataset into training (50%) and testing (50%) subsets of images.
The main contributions of this work are twofold: the design of a new feature selection algorithm called CVM, and the experimentation on the multiclass brain imaging dataset using benchmark models in a machine learning environment. The proposed CVM algorithm is based on variance analysis and thus retains the features having high cumulative variance up to the threshold limit of 99.5%. In the experimentation part, the selected features are used to train the machine learning classification models, and a comparative result analysis is presented using three benchmark classifiers and three feature selection algorithms.
The experiments were performed with a diversified feature set containing features from six different domains, which resulted in 298 features being extracted from the dataset for classification in the machine learning environment. These features were filtered using the proposed CVM feature selection algorithm based on cumulative variance. The filtered features were then used to train three state-of-the-art classifiers, i.e., KNN, mSVM, and NN. The average accuracy achieved using the CVM method is 88.43%, 92.5%, and 95.86%, respectively. Based on the comparative results, it was also found that the proposed approach outperforms the other feature selection algorithms.
In the future, the study can be extended towards the use of deep learning models for the classification of brain tumor images. The literature contains significant work on deep learning models for the binary classification of brain tumor images; however, only limited work has been published on multiclass classification using deep learning. This work can be further extended towards the use of a new deep learning model for the multiclass classification of brain tumor types. Moreover, the work can be extended to include a larger number of images in the dataset. The algorithm and the models can also be evaluated on gold-standard benchmark datasets in the future.