Ensemble Learning Driven Computer-Aided Diagnosis Model for Brain Tumor Classification on Magnetic Resonance Imaging

Brain tumour (BT) detection is the process of identifying the presence of a brain tumour in medical images. It typically relies on imaging techniques such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), or Positron Emission Tomography (PET). Early detection of BT is important, and MRI is one of the primary imaging techniques used to diagnose and assess BT. Deep learning (DL) techniques, particularly convolutional neural networks (CNNs), have shown promising results in assisting with BT detection on MRI scans. This study designs an Ensemble Learning Driven Computer-Aided Diagnosis Model for Brain Tumor Classification (ELCAD-BTC) technique on MRIs. The presented system aims to detect and classify various stages of BTs. It applies a Gabor filtering (GF) approach to remove noise and increase the quality of MRI images. Moreover, an ensemble of three DL models, namely EfficientNet, DenseNet, and MobileNet, is utilized for feature extraction. Furthermore, the denoising autoencoder (DAE) approach is exploited to detect the presence of BTs. Finally, a social spider optimization algorithm (SSOA) is used for the hyperparameter tuning of the DL models. To demonstrate the improved BT classification outcome, a brief set of simulations is carried out on the BRATS 2015 database.


I. INTRODUCTION
Brain tumours (BTs) are considered a serious and potentially life-threatening form of cancer [1]. Frequent headaches, memory loss, difficulty in concentration, seizures, and coordination and speech problems are some of the major symptoms of BT. Based on the rate of growth, origin, and progression state, BTs are classified into different grades [2]. Detecting BTs at an early stage and assigning them to a grade is of high importance for selecting the best treatment. Among the available imaging technologies, Magnetic Resonance Imaging (MRI) can identify BTs without brain surgery [3]. MRI is a non-invasive, pain-free medical imaging technique that produces high-resolution images of tumours. In addition, MRI is considered the best medical imaging approach for BT detection owing to its high resolution [4]. At present, researchers have established many automated techniques for the detection of BTs. Most of these systems are based on machine learning (ML) algorithms that involve supervised and unsupervised learning approaches [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
In recent times, deep learning (DL), a subfield of ML, has demonstrated high effectiveness compared with conventional methods, particularly in classification and segmentation problems [6]. The convolutional neural network (CNN) model has been adopted quickly owing to its performance. A CNN is a kind of DL model used for analysing visual imagery, and it generally needs minimal preprocessing [7]. Compared with traditional ML, the CNN model offers improved accuracy and feature learning for categorizing different grades and types of BTs [8]. Medical image classification refers to assigning images to categories based on the lesion type observed in them, using a supervised learning technique. Once the classifier has been trained on a set of images, it is used in subsequent machine-based healthcare diagnoses. Lately, BT classification has been performed using ML and imaging techniques [9]. CNNs use a convolution operator rather than general matrix multiplication in multiple layers of the network, which has contributed to the preference for convolutional networks in solving problems with a high computational cost [10].
This study designs an Ensemble Learning Driven Computer-Aided Diagnosis Model for Brain Tumor Classification (ELCAD-BTC) technique on MRIs. The presented system applies a Gabor filtering (GF) approach to remove noise and increase the quality of MRI images. Moreover, an ensemble of three DL models, namely EfficientNet, DenseNet, and MobileNet, is utilized for feature extraction. Furthermore, the denoising autoencoder (DAE) approach is exploited to detect the presence of BTs. Finally, a social spider optimization algorithm (SSOA) is executed for the hyperparameter tuning of the DL models. To demonstrate the improved BT classification outcome, a brief set of simulations is carried out on the BRATS 2015 database.

II. RELATED WORKS
In [11], a deep convolutional neural network (DCNN) built on the EfficientNet-B0 base architecture is fine-tuned with the proposed layers to detect and classify BT images effectively. Image enhancement is applied using different filters to improve image quality, and data augmentation is exploited to increase the number of data samples for training. Toğaçar et al. [12] introduced a new CNN model termed BrainMRNet. This model is based on the hypercolumn technique and attention modules, and it also includes a residual network (ResNet). Initially, the images are pre-processed in BrainMRNet. In the second phase, all the images are passed to the attention module after image augmentation. The attention module selects the significant areas of the image, and the images are then passed to the convolution layers. In [13], a DL-based technique that uses various MRI modalities is proposed for BT classification. The presented hybrid CNN architecture uses a patch-based technique and considers both contextual and local information while predicting output labels. The method handles overfitting by using batch normalization and dropout regularization, while data imbalance is addressed by using a two-stage training procedure.
In [14], the authors proposed a three-phase pre-processing technique for enhancing the quality of MRI images, alongside a new DCNN framework for robust diagnosis of glioma, meningioma, and pituitary tumours. The framework utilizes Batch Normalization (BN) for faster training with a higher learning rate and to ease the initialization of layer weights. The presented model is computationally lightweight, with a small number of convolutional layers, max-pooling layers, and training iterations. In [15], the authors combined an ANN with the Fuzzy K-means algorithm for classifying the tumour region. The approach encompasses four stages: (1) feature selection and extraction, (2) noise removal, (3) segmentation, and (4) classification. At first, the acquired images are denoised using a Wiener filter, and the relevant grey level co-occurrence matrix (GLCM) features are then extracted from the image. Next, DL-based classification is applied to separate abnormal images from normal ones. At last, the Fuzzy K-Means approach is employed to classify the tumour area.
Gull et al. [16] presented a novel architecture for the diagnosis of BT using MRI scans. The architecture depends on transfer learning and Fully CNN (FCNN) approaches. It contains five stages: transfer learning, skull stripping, preprocessing, CNN-based tumour classification, and post-processing-based BT binary classification. The framework is applied for the classification of BT images, and, for post-processing, a global threshold approach is exploited to eliminate tiny non-tumour areas, which improves segmentation performance. Gupta and Gupta [17] suggested a method for the fully automatic classification of BT. The work presents a unique ensemble of CNNs (ConvNets) for glioma segmentation in MRI. The ensemble is built from two fully connected ConvNets (2D and 3D ConvNets).
Existing models do not focus on the hyperparameter selection process, which strongly influences the performance of the classification model. In particular, the selection of hyperparameters such as epoch count, batch size, and learning rate is essential to attain effective outcomes. Since the trial-and-error method for hyperparameter tuning is tedious and error-prone, metaheuristic algorithms can be applied. Therefore, in this work, we employ the SSOA algorithm for the parameter selection of the DAE model.

III. THE PROPOSED MODEL
In this manuscript, we have introduced a novel ELCAD-BTC system for accurate BT classification using MRIs. The proposed ELCAD-BTC technique exploits the ensemble learning concept to detect and classify various phases of BTs. The presented system encompasses GF-based noise elimination, ensemble feature extraction, DAE-based classification, and SSOA-based hyperparameter tuning. Fig. 1 represents the working process of the ELCAD-BTC algorithm.
VOLUME 11, 2023

A. GF-BASED NOISE REMOVAL
At the preliminary level, the GF approach is used to remove noise from the MRI. The Fourier transform (FT) is an effective signal-processing mechanism that transforms images from the spatial to the frequency domain [18] and exposes features that are not easy to extract in the spatial domain. After the FT, however, the frequency features of an image at different positions are mixed together, whereas the GF can extract spatially local frequency features, which makes it a robust texture detection technique. The GF is calculated by multiplying a Gaussian with a cosine function as follows:

$$G(x, y; \lambda, \theta, \varphi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \times \cos\left(2\pi \frac{x'}{\lambda} + \varphi\right) \tag{1}$$

where $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$. Here, x and y describe the coordinate location of the pixel, λ signifies the wavelength of the filter, θ characterizes the tilt angle of the Gabor kernel, φ is the phase offset, σ denotes the standard deviation of the Gaussian function, and γ symbolizes the aspect ratio.

FIGURE 1. The overall process of the ELCAD-BTC approach.
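As an illustration of the Gabor filter described in this subsection, the following minimal sketch builds a small real-valued Gabor kernel in plain Python. It assumes the common form of the filter (a Gaussian envelope multiplied by a cosine carrier); the function name `gabor_kernel`, the kernel size, and all parameter values are hypothetical choices for illustration and are not taken from the study.

```python
import math

def gabor_kernel(size, lam, theta, phi, sigma, gamma):
    """Return a size x size real Gabor kernel as a list of rows (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the pixel coordinates by the kernel orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope with aspect ratio gamma and spread sigma.
            gauss = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            # Cosine carrier with wavelength lam and phase offset phi.
            wave = math.cos(2 * math.pi * xr / lam + phi)
            row.append(gauss * wave)
        kernel.append(row)
    return kernel

# Hypothetical parameter values, chosen only to show the call.
kernel = gabor_kernel(size=5, lam=4.0, theta=0.0, phi=0.0, sigma=2.0, gamma=0.5)
```

In practice, such a kernel would be convolved with the MRI slice; a bank of kernels with several orientations θ is a common design choice, although the study does not state which settings were used.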

B. ENSEMBLE LEARNING-BASED FEATURE EXTRACTION
In this work, an ensemble of three DL models, namely EfficientNet, DenseNet, and MobileNet, is utilized for feature extraction. Assuming n classes and D base classification models for voting, the predicted class $c_k$ of weighted voting for each instance k is expressed as:

$$c_k = \arg\max_{j} \sum_{i=1}^{D} \left(\Delta_{ji} \times w_i\right) \tag{2}$$

Here, $\Delta_{ji}$ is a binary parameter: if the i-th base classification model assigns instance k to the j-th class, then $\Delta_{ji} = 1$; otherwise, $\Delta_{ji} = 0$. $w_i$ signifies the weight of the i-th base classification model in the ensemble.
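The weighted voting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `weighted_vote`, the example predictions, and the weights are hypothetical, and ties are resolved in favour of the lowest class index, a choice the study does not specify.

```python
def weighted_vote(predictions, weights, n_classes):
    """Return the class with the largest sum of weights over the base models.

    predictions[i] is the class index chosen by base model i,
    weights[i] is the weight of base model i in the ensemble.
    """
    scores = [0.0] * n_classes
    for predicted_class, weight in zip(predictions, weights):
        # Each model adds its weight to the class it voted for.
        scores[predicted_class] += weight
    # max() returns the first maximum, so ties go to the lowest class index.
    return max(range(n_classes), key=lambda j: scores[j])

# Three hypothetical base models voting over two classes.
winner = weighted_vote(predictions=[0, 1, 1], weights=[0.5, 0.2, 0.2], n_classes=2)
```

Here the single model voting for class 0 outweighs the two models voting for class 1, which shows how the weights can override a simple majority.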

1) EFFICIENTNET MODEL
EfficientNet is a family of DL models that are scaled to balance the width, depth, and input resolution of the network in order to achieve a better trade-off between efficiency and training time [19]. Before EfficientNet came along, the most common way to scale up a CNN was along one of three dimensions:
Depth (count of hidden layers): while a deeper network provides better image classification accuracy, it is also harder to train because of the well-known vanishing gradient problem. Accuracy gains diminish rapidly beyond a particular depth.
Width (count of channels or filters): although wider networks are simpler to train and capable of capturing fine-grained features, they have difficulty capturing higher-level image content.
Image resolution (image size): as a rule, a higher input image resolution offers the CNN more information.
EfficientNet instead performs compound scaling, which scales all three dimensions (image resolution, depth, and width) simultaneously while preserving a balance between the dimensions of the network.
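Compound scaling can be illustrated with a short sketch. It assumes the formulation of the original EfficientNet paper, in which depth, width, and resolution grow as α^φ, β^φ, and γ^φ for a single coefficient φ, with α = 1.2, β = 1.1, and γ = 1.15 chosen so that α·β²·γ² is roughly 2. These constants, the base resolution of 224, and the function name `compound_scale` are assumptions for illustration; the present study does not state which EfficientNet variant or coefficients were used.

```python
# Scaling constants as reported in the original EfficientNet paper (assumption here).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width, and input resolution together with one coefficient phi."""
    depth = base_depth * ALPHA ** phi        # multiplier on the number of layers
    width = base_width * BETA ** phi         # multiplier on the number of channels
    resolution = round(base_resolution * GAMMA ** phi)  # input image side length
    return depth, width, resolution

baseline = compound_scale(0)
scaled = compound_scale(1)
```

Increasing φ by one roughly doubles the computational cost under this constraint, which is why all three dimensions are tied to the same coefficient rather than tuned independently.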

2) DENSENET MODEL
The DenseNet201 model makes use of a densely connected network to improve efficiency and to construct simple-to-train, highly parametrized, and robust models [20]. DenseNet201 has performed well on datasets such as CIFAR-100 and ImageNet. The model provides direct connections from each layer to the subsequent layers, which enhances connectivity. The feature maps of layers 0 through l − 1 are concatenated into a single tensor to make implementation easier. A transition layer is a structural module of the network; it consists of a BN layer and a 1 × 1 convolutional layer followed by a 2 × 2 pooling layer. The hyperparameter ''H'' sets the growth rate of the DenseNet201 model and illustrates how the dense structure improves efficiency. Notwithstanding its moderate growth rate, DenseNet201 performs well because its structure reuses feature maps. Fig. 2 demonstrates the architecture of the DenseNet model. Accordingly, each layer has access to every feature map of the preceding layers. The number of input feature maps at the l-th layer, represented as ''fm'', can be evaluated as follows:

$$(fm)^{l} = H_0 + H \times (l - 1)$$

Here, the ''H'' feature maps contributed by each layer are added to the global state, and $H_0$ is the number of channels in the input layer. Every 3 × 3 convolutional layer is preceded by an additional 1 × 1 convolutional layer to accelerate processing. This reduces the number of input feature maps, which is usually greater than the number of output feature maps. The 1 × 1 convolution introduced before the 3 × 3 convolution is named a ''bottleneck'' layer. The extracted image features are used to calculate the probability that a segment appears in an image. The activation function and the dropout layer are commonly used to introduce non-linearity and to reduce overfitting, respectively. Two dense layers of 64 and 128 neurons, respectively, are used for categorizing the data. The DenseNet201 model is combined with a sigmoid activation for binary classification, which can enhance performance.
In the dense (FC) layer, every neuron is fully connected to all neurons of the preceding layer. Before the FC layer, the 2D feature maps are flattened into a 1D feature vector. For dropout, a Bernoulli function generates a random vector whose elements $v_i$ follow a 0-1 distribution and take the value 1 with a specified probability. Dropout thereby prevents randomly chosen neurons in the first two FC layers from firing. The sigmoid function is expressed as:

$$S = \frac{1}{1 + e^{-(x \cdot z)}}$$

where S signifies the output of the neuron, and $x_i$ and $z_i$ denote the weights and inputs, respectively.
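The three ingredients discussed in this subsection (feature-map growth in a dense block, a Bernoulli dropout mask, and a sigmoid output neuron) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names are hypothetical, the values 64 and 32 are the initial channel count and growth rate commonly associated with DenseNet201 and are assumptions here, and the dropout shown omits the rescaling of surviving activations that most frameworks apply.

```python
import math
import random

def dense_input_maps(h0, growth_rate, layer_index):
    """Number of input feature maps seen by layer `layer_index` (1-based) of a dense block."""
    return h0 + growth_rate * (layer_index - 1)

def dropout(values, keep_prob, rng):
    """Apply a Bernoulli mask: each value is kept with probability keep_prob, else zeroed."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in values]
    return [v * m for v, m in zip(values, mask)]

def sigmoid_neuron(weights, inputs):
    """Sigmoid of the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
maps_at_layer_6 = dense_input_maps(h0=64, growth_rate=32, layer_index=6)
kept = dropout([0.2, 0.7, 0.5, 0.9], keep_prob=0.5, rng=rng)
probability = sigmoid_neuron(weights=[0.0, 0.0], inputs=[1.0, 1.0])
```

The linear growth of the input width with the layer index is the reason the 1 × 1 bottleneck convolution is useful: it limits the number of channels passed to each 3 × 3 convolution.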

3) MOBILENET MODEL
The MobileNet structure is utilized for feature extraction in this study. A CNN is mainly composed of input, convolution, pooling, fully connected (FC), and output layers [21]. In comparison to the traditional neural network, it features weight sharing, local connections, and downsampling. It can effectively reduce the number of network parameters, avoid overfitting, and improve the effectiveness of extracting local features. The convolution layer is a major component of CNNs; local features are extracted by connecting the input of each neuron to the local receptive field of the previous layer. The convolution operation is divided into convolution and activation steps, and it is computed by Eq. (6). In Eq. (6), C and T signify the input and the output of the convolutional layer, respectively; r and s denote the index of the convolution kernel and the number of channels, respectively; w and b denote the weight and bias of the convolution kernels; $f_k$ is the activation function of the k-th layer; and x, y, and z denote the dimensions of the input data.
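The convolution-plus-activation step described above can be illustrated with a minimal single-channel sketch. It is not the MobileNet implementation: the function name `conv2d_relu` is hypothetical, ReLU is assumed as the activation, no padding or stride is used, and, as in most DL frameworks, the kernel is applied without flipping (cross-correlation).

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (lists of rows), add `bias`, then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = bias
            # Weighted sum over the local receptive field of the output pixel.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            # Activation step: ReLU keeps positive responses only.
            row.append(max(0.0, acc))
        output.append(row)
    return output

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
feature_map = conv2d_relu(image, kernel=[[1, 0], [0, 1]])
```

Because the same small kernel is reused at every position, the number of parameters is independent of the image size, which is the weight-sharing property mentioned in the text.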

C. BT CLASSIFICATION
To classify the existence of BT, the DAE model is used. The DAE is an extension of the autoencoder (AE) that aims at recovering the original data from noise-corrupted data [22]. The DAE relies on the fact that data preserve their fundamental features even when they are partially destroyed. Hence, the DAE is capable of recovering the original data from the noise-added input. The DAE is well established for noise filtering, recovery of voice or images, and typo correction, amongst other applications. The autoencoder comprises two parts: the encoder and the decoder. The encoding part, also termed manifold learning, successively decreases the size of the input data. Consequently, the core of the original data, named the hidden values, which holds sufficient information about the original data, is attained. For a given input $x \in R^D$, the encoder process $f_\theta$ is formulated by:

$$h = f_\theta(x) = s(Wx + b) \tag{7}$$

In Eq. (7), W indicates the d×D weight matrix, b shows the d-dimensional bias vector, h represents the d-dimensional hidden value, and s indicates the activation function.
The decoder is the reverse procedure of the encoder and is termed generative learning. The decoding part makes use of successively enlarging layers and recovers the original data from the encoder output. The formula for the decoder, $g_{\theta'}$, is given below:

$$\hat{x} = g_{\theta'}(h) = s(W'h + b') \tag{8}$$

In Eq. (8), $\hat{x} \in R^D$ denotes the data recovered from the decoder input or, equally, the encoder output h, and the parameters $\theta = \{W, b\}$ and $\theta' = \{W', b'\}$ are estimated by the learning algorithm. The AE learns the θ and θ′ weight parameters by minimizing the loss function $L(\theta, \theta')$, which measures the discrepancy between x and $\hat{x}$. In the proposed model, the MSE function of Eq. (9) is adopted as the loss function to approximate the missing values:

$$L(\theta, \theta') = \frac{1}{N} \sum_{i=1}^{N} \left\lVert x_i - \hat{x}_i \right\rVert^2 \tag{9}$$

Given the training data $\{x_1, \ldots, x_N\}$, the loss function is minimized by updating θ and θ′ via the backpropagation (BP) model, which depends on the gradient descent (GD) technique.
Another part of the DAE is the addition of noise to the raw data. After this step, the noisy data are denoted as $\tilde{x}$. The noisy data are produced through the stochastic corruption process $\tilde{x} \sim q_D(\tilde{x} \mid x)$. With this process, nearly half of the input values are randomly replaced with 0. Then, the noisy data are fed into the AE model, which produces $\hat{x}$. Meanwhile, the loss function is determined by the error between the reconstructed data $\hat{x}$ and the original data x; as training proceeds, the model takes the noisy input $\tilde{x}$ and provides an output close to the original data x. In other words, the model acts as a noise filter.
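The forward pass of a denoising autoencoder, as described in this subsection, can be sketched in plain Python. This is a minimal illustration under stated assumptions: masking noise with a rate of 0.5 follows the statement that nearly half of the inputs are set to 0, a sigmoid is assumed for the activation s in both the encoder and the decoder, the weights are random and untrained, and all names and sizes are hypothetical. The gradient-descent training loop is omitted.

```python
import math
import random

def corrupt(x, rate, rng):
    """Masking noise: set each value to 0 with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine_sigmoid(weights, bias, vector):
    """Compute s(W v + b) for a weight matrix given as a list of rows."""
    return [sigmoid(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]

def reconstruction_loss(x, x_hat):
    """Mean squared error between the clean input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

rng = random.Random(0)
D, d = 4, 2  # input and hidden dimensions (hypothetical sizes)
W = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(d)]        # encoder, d x D
W_prime = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(D)]  # decoder, D x d
b, b_prime = [0.0] * d, [0.0] * D

x = [0.9, 0.1, 0.8, 0.2]                      # clean input
x_tilde = corrupt(x, rate=0.5, rng=rng)       # noisy input fed to the encoder
h = affine_sigmoid(W, b, x_tilde)             # hidden representation
x_hat = affine_sigmoid(W_prime, b_prime, h)   # reconstruction
loss = reconstruction_loss(x, x_hat)          # compared against the CLEAN input
```

The key design point is that the loss compares the reconstruction with the clean input x rather than with the corrupted input, which is what forces the model to behave as a noise filter.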

D. HYPERPARAMETER TUNING
Finally, the SSOA is utilized for the optimum hyperparameter adjustment of the DAE algorithm. SSOA is a bio-inspired metaheuristic that simulates the behaviour of spider colonies [23]. Each member of the colony is either male or female, and each spider defines a possible solution to the problem. The count of female spiders (FSP), $N_f$ (arbitrarily chosen from the range of 65% to 90% of all spiders), is computed in Eq. (10).
where N represents the number of spider positions (solutions). The population comprises females $f_i$ and males $m_i$. The count of male spiders (MSP), $N_m$, is computed as in Eq. (11).
The positions of the FSPs and MSPs are evaluated as in Eqs. (12) and (13), respectively.
where $p^{low}$ denotes the lower initial parameter bound and $p^{high}$ refers to the upper initial parameter bound for the FSPs and MSPs. The weight of each FSP and MSP can be measured by Eq. (14).
Here, $J(s_i)$ signifies the fitness value attained by position $s_i$, which is normalized by the maximal and minimal fitness values of the solutions in the population. All vibrations depend on the weight and distance of the spiders that generate them. The vibration (communicated information) perceived by spider i from spider j, $Vib_{i,j}$, is defined in Eq. (15).
where $d_{ij}$ denotes the Euclidean distance between spiders i and j. In each iteration, an FSP exhibits attraction or repulsion towards the MSPs based on their vibrations (weight and distance). Each FSP searches for the strongest vibration and, when one is found, the Euclidean distance to its source is computed. The movement of the FSP is then carried out depending on that distance. The attraction or repulsion movement is generated as in Eq. (16).
where α, β, δ and rand signify random numbers between 0 and 1, and $S_c$ and $S_b$ denote the nearest member to i that has a higher weight and the best spider of the colony, respectively. The location of the MSPs can be evaluated as in Eq. (17). Here, $S_f$ denotes the FSP nearest to MSP i, and W denotes the median weight of the MSP population. The MSPs are separated into two classes: dominant, whose weight is above the median, and non-dominant, whose weight is below the median. Mating is carried out by dominant males and female members. When a dominant male spider $m_g$ locates an FSP within the mating range r, they mate and a new brood is formed. The mating range is computed as in Eq. (18).

$$r = \frac{\sum_{j=1}^{n} \left(l_j^{high} - l_j^{low}\right)}{2 \cdot n} \tag{18}$$

where n refers to the dimension of the problem, $l_j^{high}$ signifies the upper bound, and $l_j^{low}$ denotes the lower bound. Once the new spider is created, it is compared with the worst spider. If the new spider is better, it replaces the worst one. This procedure is iterated until an optimum weight for every spider is reached and the population converges to a better solution.
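Several of the SSOA building blocks described above can be sketched in plain Python. Because Eqs. (10) to (17) are not reproduced in this text, the sketch follows the commonly published form of the social spider optimization algorithm and should be read as an assumption rather than as the exact formulation used in the study. The function names, the colony size, and the hyperparameter bounds (learning rate, batch size, epoch count) are hypothetical, and the female and male movement operators are omitted.

```python
import math
import random

def split_population(n_spiders, rng):
    """Female share drawn from roughly 65% to 90% of the colony; the rest are male."""
    n_female = math.floor((0.9 - rng.random() * 0.25) * n_spiders)
    return n_female, n_spiders - n_female

def init_positions(count, low, high, rng):
    """Place each spider uniformly at random inside the parameter bounds."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(low, high)]
            for _ in range(count)]

def weights(fitness):
    """Normalize fitness values to [0, 1] using the best and worst of the population."""
    best, worst = max(fitness), min(fitness)
    if best == worst:
        return [1.0] * len(fitness)
    return [(f - worst) / (best - worst) for f in fitness]

def vibration(weight_j, pos_i, pos_j):
    """Vibration perceived by spider i from spider j: decays with squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    return weight_j * math.exp(-d2)

def mating_radius(low, high):
    """Mating range: summed width of the search bounds divided by 2n."""
    return sum(hi - lo for lo, hi in zip(low, high)) / (2 * len(low))

rng = random.Random(0)
low, high = [1e-4, 16, 10], [1e-1, 128, 100]  # hypothetical hyperparameter bounds
n_female, n_male = split_population(20, rng)
females = init_positions(n_female, low, high, rng)
males = init_positions(n_male, low, high, rng)
w = weights([0.90, 0.95, 0.99])  # e.g. validation accuracies of three candidates
```

Each position would be decoded into a hyperparameter setting, the classifier trained with it, and the resulting score used as the fitness J; this evaluation step dominates the cost of the search.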
The choice of fitness function is a key aspect of the SSOA algorithm. An encoded solution is used to evaluate the fitness of the candidate solutions. Here, the accuracy value is the main criterion used to design the fitness function.
where TP stands for the true positive and FP defines the false positive value.

IV. RESULTS AND DISCUSSION
The proposed model is simulated using Python 3.6.5 with several Python packages, including tensorflow (GPU-CUDA enabled), keras, numpy, pickle, matplotlib, sklearn, pillow, and opencv-python. The experiments are run on a PC with an i5-8600k processor, a GeForce 1050Ti 4GB GPU, 16GB RAM, a 250GB SSD, and a 1TB HDD. In this section, the BT classification outcome of the ELCAD-BTC system is investigated utilizing the BraTS 2015 database. To enlarge the dataset, data augmentation is applied in different ways: rotation, cropping, flipping, translation, and colour space transformation. The dataset holds 1320 images with two classes, as represented in Table 1.
In Table 2, the overall BT classification result of the ELCAD-BTC approach with an 80:20 split of training set (TRS) and testing set (TSS) is given. On 80% of TRS, the ELCAD-BTC technique has proficiently recognized benign and malignant samples. For instance, on the benign class, the ELCAD-BTC technique has obtained accu_bal of 99.49%, prec_n of 97.52%, reca_l of 99.49%, F_score of 98.50%, AUC score of 98.13%, and MCC of 96.55%. Furthermore, on the malignant class, the ELCAD-BTC system has attained accu_bal of 96.76% and prec_n of 99.33%.
In Table 3, the overall BT classification outcome of the ELCAD-BTC method with a 70:30 split of TRS and TSS is given. On 70% of TRS, the ELCAD-BTC method has proficiently recognized benign and malignant samples. For example, on the benign class, the ELCAD-BTC method has attained accu_bal of 99.01%, prec_n of 99.40%, reca_l of 99.01%, F_score of 99.21%, AUC score of 99.15%, and MCC of 98.25%. Furthermore, on the malignant class, the ELCAD-BTC system has attained accu_bal of 99.28%, prec_n of 98.81%, reca_l of 99.28%, F_score of 99.05%, AUC score of 99.15%, and MCC of 98.25%.
The training accuracy (TAY) and validation accuracy (VAY) of the ELCAD-BTC technique on BT classification are inspected in Fig. 4. The result implies that the ELCAD-BTC method achieves high values of TAY and VAY. It is noticeable that the ELCAD-BTC system obtains maximal TAY outcomes.
The training loss (TLSS) and validation loss (VLSS) of the ELCAD-BTC technique on BT classification are also examined. A precision-recall (PR) examination of the ELCAD-BTC system on the test database is reported as well. A comprehensive ROC outcome of the ELCAD-BTC approach on the test database is represented in Fig. 7. The outcomes indicate that the ELCAD-BTC system has demonstrated its capability in categorizing the two classes.
A brief comparison of the ELCAD-BTC algorithm with other DL systems is made in Table 4 and Fig. 8 [24]. The experimental outcome shows that the novel 3D-CNN and VGG-19 models attain lower accu_y values of 89.5% and 90.70%, respectively.
In contrast, the Inception-v3 and fine-tuned VGG-19 models result in closer accu_y values of 95.60% and 94%, respectively. Although the novel 2D-CNN and 3D-CNN models manage to accomplish reasonable accu_y values of 98% and 98.32%, the ELCAD-BTC technique demonstrates a maximum accu_y of 99.24%. These outcomes confirm the effective performance of the ELCAD-BTC system on BT classification.
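The per-class measures reported in this section can be computed from a binary confusion matrix, as the following sketch shows. It uses the standard textbook definitions, which is an assumption: the study does not state its exact formulas, and its per-class accu_bal values may follow a different convention. The function name `binary_metrics` and the counts in the example are hypothetical and are not results from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    f_score = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "mcc": mcc,
    }

# Hypothetical counts, used only to demonstrate the call.
metrics = binary_metrics(tp=8, fp=2, tn=6, fn=4)
```

The sketch does not guard against empty classes (zero denominators), which would need handling on real data.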

V. CONCLUSION
In this manuscript, we have introduced a novel ELCAD-BTC system for accurate BT classification using MRIs. The proposed ELCAD-BTC technique exploits the ensemble learning concept to detect and classify various stages of BTs. The presented system applies a GF approach to remove noise and increase the quality of MRI images. Moreover, an ensemble of three DL models, namely EfficientNet, DenseNet, and MobileNet, is exploited for feature extraction. Furthermore, the SSOA with the DAE model is exploited to detect the presence of BTs. To demonstrate the improved BT classification result, a brief set of simulations is carried out on the BRATS 2015 database. The enhanced results of the ELCAD-BTC approach show its promising performance on BT classification. In the future, the presented ELCAD-BTC technique can be extended to three-dimensional MRI for accurate BT classification performance.