VGG-SCNet: A VGG Net-Based Deep Learning Framework for Brain Tumor Detection on MRI Images

A brain tumor is a life-threatening neurological condition caused by the unregulated development of cells inside the brain or skull. The death rate of people with this condition is steadily increasing. Early diagnosis of malignant tumors is critical for providing treatment to patients, and early discovery improves the patient's chances of survival. The survival rate is usually very low if patients are not adequately treated, and a brain tumor that is not identified at an early stage can lead to death. Therefore, early diagnosis of brain tumors necessitates the use of an automated tool. The segmentation, diagnosis, and isolation of affected tumor areas from magnetic resonance (MR) images is a prime concern. However, it is a tedious and time-consuming process that radiologists or clinical specialists must undertake, and their performance depends solely on their expertise. To address these limitations, the use of computer-assisted techniques becomes critical. In this paper, different traditional and hybrid ML models were built and analyzed in detail to classify brain tumor images without any human intervention. Along with these, 14 different transfer learning models were also analyzed to identify the best transfer learning model for classifying brain tumors based on neural networks. Finally, using different state-of-the-art technologies, a stacked classifier was proposed which outperforms all the other developed models. The precision, recall, and f1 scores of the proposed VGG-SCNet (VGG Stacked Classifier Network) were found to be 99.2%, 99.1%, and 99.2% respectively.


I. INTRODUCTION
A brain tumor is a mass formed by a clump of irregular cells in the brain [1]. The human brain is enclosed by a rigid skull, so any expansion in such a small area can trigger severe issues. Brain tumors can be cancerous (malignant) or non-cancerous (benign). As benign or malignant tumors develop, the pressure within the skull rises, which can result in permanent brain injury and even death. The Magnetic Resonance Imaging (MRI) image of a healthy brain is shown in Figure 1 (A), while the image of a brain containing a tumor is shown in Figure 1 (B). Approximately 700,000 people worldwide have a brain tumor, with approximately 86,000 new cases diagnosed in 2019. Since 2019, 16,830 people have died from brain tumors, with a 35 percent life expectancy [2]. As such, scientists and researchers have been working towards developing sophisticated techniques and methods for identifying brain tumors. Although MRI and Computed Tomography (CT) are the two modalities widely used for marking abnormalities in the shape, size, or location of brain tissues, which in turn helps in detecting tumors, MRI is preferred by doctors. As a consequence, scientists and researchers have focused more on MRI. Brain tumors are still mostly identified from MRI images through conventional inspection by physicians. However, automated approaches, mainly implemented through computer-aided medical image processing techniques, are increasingly aiding physicians in detecting brain tumors.
The associate editor coordinating the review of this manuscript and approving it for publication was Haruna Chiroma.
Machine Learning (ML) algorithms gain insight from training data samples and can predict the class label of unknown data objects. ML algorithms are widely used in health informatics [3]-[5], pandemic forecasting [6], evaluating user experience in games [7], and predicting shear strength [8]. Similarly, many ML-based studies have been conducted on medical images to classify brain tumors [9]-[11]. Medical image processing involves pre-processing (enhancement, filter application, segmentation, feature selection) and post-processing (identification and/or classification) [12]. These steps can be implemented by the conventional machine learning approach as well as the deep learning approach. In the conventional machine learning approach, hand-crafted features are used to obtain results from test images, and the process is fast. In the deep learning approach, models are tuned by appropriately selecting the number of layers, the activation function, and the pooling, and pre-trained models are sometimes added for transfer learning. In both approaches, metaheuristic algorithms may be used to enhance the classification accuracy. From a broader perspective, this research covers both conventional and deep learning approaches for the identification of brain tumors from MRI images. A significant body of research has focused on the detection of brain tumors from MRI images through machine learning approaches. Although the conventional machine learning approach is faster than deep learning, the deep learning approach is more accurate.
The objectives of this research are, firstly, to explore how various image processing techniques are applied to MRI images for the detection of brain tumors; secondly, to compare the performance of existing image processing techniques applied to MRI images for the detection of brain tumors; and finally, to propose an efficient technique for the detection of brain tumors from MRI images through machine learning approaches. The expected outcome of this research is therefore an efficient machine learning-based image processing technique for detecting brain tumors from MRI images, which will assist pathology experts in providing proper treatment.
The remainder of this paper is organized as follows. Related works are briefly introduced in Section II. Section III describes the development and analysis of the ML models. Section IV compares the performance of the ML models. Finally, the discussion and conclusion are presented in Sections V and VI, respectively.

II. LITERATURE REVIEW
This section briefly discusses the studies that have been conducted to detect brain tumors using different state-of-the-art technologies. Rehman et al. [2] proposed a new learning-based method for microscopic brain tumor detection and tumor type classification. In the first phase of their study, a 3D convolutional neural network (CNN) architecture was built to extract brain tumors, which were then passed to a pre-trained CNN model for feature extraction. The extracted features were fed into a correlation-based selection process, which output the best features. For final classification, these selected features were validated using a feed-forward neural network. Amin et al. [13] employed a Deep Neural Network (DNN) based architecture for brain tumor segmentation. For classification, the proposed model employs seven layers, including three convolutional layers, three ReLU layers, and a SoftMax layer. Thaha et al. [14] proposed a deep learning method using CNN for segmentation. This method employs small 3 × 3 kernels for the deep architecture of the CNN model. Intensity normalization and data augmentation were performed to preprocess the images. Kebir et al. [15] proposed a supervised method for detecting brain abnormalities from MRI images in three steps: first, a deep learning CNN model is developed; then, the brain MRI images are segmented by the k-means algorithm; finally, the brain components are classified as normal or abnormal according to the developed CNN model. Vinoth et al. [16] proposed an automated segmentation strategy based on CNN. Here, kernels are used for classification, and SVM classification is performed with the calculated parameters. The extraction and detection of tumors from brain MRI scans were done using MATLAB. Three incremental deep convolutional neural networks, 2CNet, 3CNet, and EnsembleNet, have been proposed in [17] for automatic brain tumor segmentation. This method adopted ensemble learning and, to avoid trial and error in training the CNN, bounded the hyper-parameters to accelerate training. Mohsen et al. [18] used a DNN to classify a dataset of 66 brain MRIs into 4 classes (normal, glioblastoma, sarcoma, and metastatic bronchogenic carcinoma tumors). The classifier was combined with the discrete wavelet transform (DWT) and principal component analysis (PCA). An automatic brain tumor segmentation algorithm using a Deep Convolutional Neural Network has been proposed in [19]. An effective brain tumor segmentation from MRI images has been proposed in [20] by extracting the relevant features from the combination of segmented pathological tissues, white matter, gray matter, and cerebrospinal fluid (CSF) and then classifying them using a neural network model. The results were compared against a k-nearest neighbor classifier and a Bayesian classifier.
In summary, it can be said from the literature review that: (a) None of the existing approaches are fully automated as calibration of processing parameters is essential.  this field needs to be carried out in order to fill the research gaps.

III. METHODOLOGY
The research was conducted in three phases. First, different traditional machine learning (ML) models were developed to find the best algorithm for detecting brain tumors. The traditional algorithms were selected based on the literature review of previous works and include Convolutional Neural Network (CNN), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), and K Nearest Neighbors (KNN). In the second phase, fourteen different transfer learning models were developed. To do that, the CNN model from the first phase was used as the base CNN model, and fourteen different pre-trained models were used on top of the CNN model, with the aim of finding the best pre-trained model. In this phase, for each transfer learning model, the initial weights were obtained from the pre-trained model and used to initialize the base CNN model so that it could be trained efficiently with a minimum number of epochs. In the last phase, four different hybrid models were proposed to detect brain tumors. For each hybrid model, features were extracted from the top-performing transfer learning model and used as input (independent) variables. The algorithms used in this phase include Stacked Classifier (SC), AdaBoost (AB), CatBoost (CB), and XGBoost (XB). One of the main objectives of analyzing different machine learning algorithms was to find the top-performing algorithm for brain tumor classification. To achieve this, the following steps were carried out: data collection, data synthesis, and development of prediction models. These steps are discussed in detail below. The workflow diagram of this research is shown in Figure 2.

A. DATA COLLECTION
In this research, an open-access dataset available on Kaggle was used for training the models. The dataset contains a total of 253 MRI images, of which 140 were labeled YES, while the rest were labeled NO. For testing the ML models, however, a separate dataset of 90 images was used, which was collected from a renowned pathology institute in Bangladesh. Additionally, a medical expert from the same pathology laboratory, who has 30 years of experience in histology and serves as an Associate Professor at a medical college, was nominated as the domain expert for this study. The test dataset was meticulously labeled by the nominated domain expert.

B. DATA SYNTHESIS
In the second step, the images collected in the first step were synthesized. First, duplicate images were removed from the dataset. Second, the completely black portions of each image were removed. To do that, contours were identified at the top, bottom, left, and right of each image based on the presence of black regions, and the image was cropped along these four contours. Any portion outside these contours was thus removed, so the cropped image set contains the region of interest of each image. Lastly, data augmentation was performed to handle the class imbalance in the combined dataset, where the numbers of 'yes' and 'no' images were not the same. As the number of images labeled 'no' was less than the number of images labeled 'yes', 10 new images were generated for each image labeled 'yes', whereas 7 new images were generated for each image labeled 'no'. Figure 3 shows the steps involved in cropping a specific image.
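The cropping step can be illustrated with a minimal sketch. It is not the contour-based procedure used in this study: it substitutes a simpler bounding box of non-black pixels, and the function name `crop_black_borders`, the `threshold` parameter, and the toy image are hypothetical.

```python
def crop_black_borders(image, threshold=0):
    """Crop rows and columns that contain only pixels <= threshold.

    `image` is a 2D list of grayscale intensities (one list per row).
    An all-black image is returned unchanged.
    """
    # Rows and columns that contain at least one non-black pixel.
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows or not cols:
        return image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # Keep only the bounding box of the non-black region.
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4 x 5 "scan" with a black frame around the region of interest.
scan = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 7, 0],
    [0, 5, 8, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_black_borders(scan)
```

Black pixels inside the bounding box are kept, so only the outer frame is discarded.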

C. CONVENTIONAL ML MODEL DEVELOPMENT
In this step, different algorithms were chosen based on recent works relating to brain tumor classification, namely CNN, SVM, RF, DT, NB, and KNN. The models were developed using scikit-learn, a Python module that integrates a wide range of machine learning algorithms. Each algorithm and its working procedure are described in the following sub-sections.

1) CONVOLUTIONAL NEURAL NETWORK (CNN)
Owing to their inherent ability to exploit the geometric structure of images, CNNs are mainly applied in the field of image processing [20]. CNN outperforms many strategies in graph analysis [21]. It combines three architectural ideas: local receptive fields, shared weights, and spatial or temporal subsampling. The architecture of the proposed CNN model is shown in Figure 4. The proposed CNN has a total of seven layers, as follows. First, the input layer, which receives the image features as input. Second, a convolutional layer with 32 filters and a 7 × 7 kernel; in a convolutional layer, a filter is applied to the images so that unnecessary details are removed while the relevant information is kept. The sigmoid function was used as the activation function in this layer. Third, a max pooling layer with a pool size of 4 × 4. Fourth, a dropout layer with a dropout rate of 50%, which randomly drops 50% of the neurons during training. Fifth, a flatten layer, which converts the pooled feature map into a single column that is passed to the fully connected layer. Sixth, a dense layer, which is a fully connected layer that takes the features from the previous layer as input; a sigmoid function was used as the activation function for this layer. Seventh, the output layer, which gives the predicted probability that a particular MRI image contains a tumor. If the predicted probability was more than 50%, the image was considered to contain a brain tumor; otherwise, it was considered not to contain one.
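As a worked illustration of how the feature map shrinks through the layers described above, the following sketch computes the spatial sizes after the 7 × 7 convolution and the 4 × 4 max pooling, and applies the 50% decision rule. The input resolution (240 × 240), the stride of 1, and the absence of padding are assumptions, since they are not stated in the text; the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, pool):
    """Spatial size after non-overlapping max pooling (stride = pool size)."""
    return size // pool


INPUT_SIZE = 240  # assumption: the input resolution is not given in the text
FILTERS = 32      # from the text: 32 filters in the convolutional layer

conv = conv_output_size(INPUT_SIZE, kernel=7)  # 7 x 7 kernel, no padding assumed
pooled = pool_output_size(conv, pool=4)        # 4 x 4 max pooling
flattened = pooled * pooled * FILTERS          # length of the flattened vector


def classify(probability, threshold=0.5):
    """Decision rule from the text: tumor if the probability exceeds 50%."""
    return "tumor" if probability > threshold else "no tumor"
```

Under these assumptions the feature map goes from 240 × 240 to 234 × 234 after the convolution and to 58 × 58 after pooling, so the flatten layer passes 58 × 58 × 32 = 107,648 values to the dense layer. Dropout does not change the shape.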

2) SUPPORT VECTOR MACHINE (SVM)
SVM is a supervised machine learning algorithm that can be used to solve classification and regression problems [22]. This algorithm creates a hyperplane (or a group of hyperplanes) in a high- or infinite-dimensional space to determine the best boundary between the potential outputs. In general, the aim is to find a hyperplane in n-dimensional space that maximizes the separation between the data points of different classes. SVMs can accommodate such large feature spaces because their protection against overfitting does not depend on the number of features.
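For reference, the margin-maximization idea can be written as the standard soft-margin optimization problem. The symbols below (weight vector w, bias b, slack variables ξ_i, penalty C, labels y_i ∈ {−1, +1}) are textbook notation and are not taken from this study, which does not state the kernel or the value of C that was used.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, N .
```

Minimizing the norm of w maximizes the margin 2/‖w‖, while C controls how strongly margin violations are penalized.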

3) RANDOM FOREST (RF)
The RF classifier is an ensemble approach that uses bootstrapping and aggregation to train multiple decision trees in parallel, a process known as bagging [23].
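The bagging idea can be sketched in a few lines. This is a simplified illustration, not the scikit-learn implementation used in this study: each "tree" is a one-feature threshold classifier (a depth-1 tree), no random feature subsets are drawn, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a depth-1 tree (a single threshold) on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y
                         for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right


def bagging_fit(samples, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        bootstrap = rng.choices(samples, k=len(samples))
        trees.append(train_stump(bootstrap))
    return trees


def bagging_predict(trees, x):
    """Aggregate the individual tree predictions by majority vote."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
forest = bagging_fit(data)
```

Here `bagging_predict(forest, 0.05)` votes for class 0 and `bagging_predict(forest, 0.95)` for class 1. A full random forest additionally samples a random subset of features at each split.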

4) DECISION TREE (DT)
DT is a well-known hierarchical machine learning method for prediction that employs a tree-like model of decisions and their potential outcomes [24]. Each internal node (not a leaf node) of a DT tests an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) specifies a class label.

5) NAÏVE BAYES (NB)
Naive Bayes is a classification algorithm that is both supervised and statistical in nature [25]. It predicts the class of an unknown data point using Bayes' probability theorem. It calculates membership probabilities for each class, such as the probability that a given record or data point belongs to a certain class. The most probable class is the one with the greatest probability.
VOLUME 9, 2021
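The decision rule described above can be written compactly as follows. The notation (class c, feature vector x with d components x_j) is standard and not taken from this study. The factorization of the likelihood relies on the "naive" assumption that the features are conditionally independent given the class.

```latex
\hat{c} = \arg\max_{c} \, P(c \mid \mathbf{x}),
\qquad
P(c \mid \mathbf{x}) = \frac{P(c) \, \prod_{j=1}^{d} P(x_j \mid c)}{P(\mathbf{x})} .
```

Since P(x) is the same for every class, it can be dropped when only the most probable class is needed.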

6) K NEAREST NEIGHBOR (KNN)
KNN is a supervised machine learning algorithm that predicts the values of new data points using "feature similarity" [26]. It stores the feature vectors and class labels of the training samples in the training phase. In the classification phase, an unlabeled sample is assigned the most frequent class label among the k training samples closest to the query point, where k is a user-defined constant.
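A minimal sketch of this procedure is given below, assuming Euclidean distance and a simple majority vote. The function name `knn_predict`, the value k = 3, and the toy two-dimensional feature vectors are illustrative assumptions, not settings reported in this study.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    `train` is a list of (feature_vector, label) pairs; distances are Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# "Training" only stores the labeled feature vectors.
train = [
    ((1.0, 1.0), "no"), ((1.2, 0.8), "no"), ((0.9, 1.1), "no"),
    ((4.0, 4.2), "yes"), ((4.1, 3.9), "yes"), ((3.8, 4.0), "yes"),
]
```

For example, `knn_predict(train, (1.1, 1.0))` returns "no" because all three nearest neighbors carry that label.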

D. DEEP ML MODEL DEVELOPMENT
In this section, fourteen different pre-trained models are used to build fourteen different transfer learning models. Transfer learning is a machine learning technique in which a model created for one task is used as the basis for a model on a different task. The pre-trained models used to initialize the weights of the CNN model in this research include: VGG-16, VGG-19, Xception, ResNet152, ResNet50V2, ResNet101V2, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, DenseNet169, DenseNet201, and NASNetMobile. Among the pre-trained models, VGG-16 was found to be the best performing, as analyzed in detail in Section IV. An overview of the VGG-16 pre-trained model is provided in the following subsection.

1) VGG16
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". The model achieves 92.7% top-5 test accuracy on ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. It was one of the well-known models submitted to ILSVRC-2014. It improves on AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3 × 3 kernel-sized filters one after another. The architecture of the VGG-16 model is presented in Figure 5.
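The effect of stacking small kernels can be checked with a short calculation. The sketch below assumes stride-1 convolutions and, for the weight count, the same number of input and output channels in every layer (64 here, chosen only for illustration); biases are ignored and the helper names are hypothetical.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    # Each additional layer widens the field by (kernel - 1) pixels.
    return 1 + layers * (kernel - 1)


def conv_weights(kernel, channels, layers=1):
    """Weight count (no biases) with `channels` inputs and outputs per layer."""
    return layers * kernel * kernel * channels * channels


CHANNELS = 64  # assumption, for illustration only

two_3x3_field = stacked_receptive_field(3, layers=2)   # same field as one 5 x 5
five_3x3_field = stacked_receptive_field(3, layers=5)  # same field as one 11 x 11
two_3x3_weights = conv_weights(3, CHANNELS, layers=2)
one_5x5_weights = conv_weights(5, CHANNELS)
```

Under these assumptions, two stacked 3 × 3 layers cover the same 5 × 5 region as a single 5 × 5 layer while using 73,728 weights instead of 102,400, and they apply two non-linearities instead of one.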

E. HYBRID MODEL ANALYSIS
In this section, different hybrid models are developed for detecting brain tumors. Developing these models on all the features of the images was not feasible because of the high time and space complexity. Thus, the top-performing transfer learning model (VGG16_CNN) was used as a feature selection method to obtain a reduced feature set that is representative of the whole feature set. The features were extracted from the second-to-last (sixth) layer of the model, as it provides the second most reduced set of features; in total, there were 32 features in that layer. These features were then used as input for developing four other models, namely AdaBoost, CatBoost, XGBoost, and the Stacked Classifier. These models are described in detail in the following subsections.

1) ADABOOST
AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm that can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are adjusted in favor of the instances misclassified by previous classifiers. In some problems, it can be less susceptible to overfitting than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing, the final model can be shown to converge to a strong learner.
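The reweighting of misclassified instances can be made concrete with one boosting round. The sketch follows the standard discrete AdaBoost update for labels in {−1, +1}; the function name `adaboost_round` and the toy predictions are hypothetical, and the configuration actually used in this study is not stated.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One discrete AdaBoost round for labels in {-1, +1}.

    Returns the weak learner's vote weight (alpha) and the renormalized
    sample weights. Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the current weak learner.
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples (y * p == -1) are up-weighted, the rest down-weighted.
    updated = [w * math.exp(-alpha * y * p)
               for w, p, y in zip(weights, predictions, labels)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


labels = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]  # the weak learner misses the last sample
weights = [0.2] * 5               # uniform initial weights
alpha, new_weights = adaboost_round(weights, predictions, labels)
```

With one of five equally weighted samples misclassified, the weighted error is 0.2, alpha equals ln 2, and the misclassified sample ends up carrying half of the total weight, so the next weak learner concentrates on it.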

2) CATBOOST
CatBoost is an open-source software library developed by Yandex. It provides a gradient boosting framework that attempts to handle categorical features using a permutation-driven alternative to the classical algorithm. CatBoost has gained popularity compared to other gradient boosting algorithms primarily because of the following features: (a) ordered boosting to overcome overfitting, (b) native handling of categorical features, and (c) the use of oblivious (symmetric) trees for faster execution.

3) XGBOOST
XGBoost is a decision tree-based ensemble machine learning algorithm built on a gradient boosting framework. Artificial neural networks tend to outperform other algorithms or systems in prediction problems involving unstructured data (images, text, etc.). However, decision tree-based algorithms are currently considered best-in-class for small-to-medium structured/tabular data.

4) STACKED CLASSIFIER
A 2-layer stacked classifier was built on the 32 features that were selected by the neural network model. The architecture of the proposed stacked classifier is shown in Figure 6.
Three basic ML algorithms, namely SVM, Multi-Layer Perceptron (MLP), and RF, were used in the first layer of the SC model, whereas a Logistic Regression (LR) model was used in the second layer. For each observation (sample) in the dataset, verdicts are obtained from the three first-layer algorithms. These verdicts are used as input for the second-layer LR model, from which the final verdict is obtained.
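A two-layer stack of this kind can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch under stated assumptions, not the configuration used in this study: the data are synthetic stand-ins for the 32 extracted features, all hyperparameters are arbitrary, and `StackingClassifier` feeds cross-validated decision scores or probabilities of the base learners to the second layer, which may differ from how the verdicts were passed on here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Assumption: synthetic data standing in for the 32 features extracted
# from the transfer learning model (not the data used in this study).
X, y = make_classification(n_samples=300, n_features=32,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

stack = StackingClassifier(
    # First layer: SVM, MLP, and RF, as named in the text.
    estimators=[
        ("svm", SVC(random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                              random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    # Second layer: logistic regression on the first-layer outputs.
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
predictions = stack.predict(X_test)
```

Training the second layer on cross-validated first-layer outputs (`cv=5`) is a design choice that keeps the LR model from seeing predictions the base learners made on their own training samples.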

IV. RESULTS
To evaluate the performance of each prediction model, a separate test set containing a total of 90 images was used, of which 81 images were labeled YES and the rest were labeled NO. The performance of each prediction model was measured in terms of its precision, recall, and F1 scores. The experiments were conducted on a local machine with the specifications listed in Table 1.
Each of the evaluation parameters was obtained by performing macro averaging on the actual and model-predicted class labels, which calculated parameters for each class label and found their unweighted means. Since the resampling method had been utilized to balance the classes, accuracy was not considered a performance metric for evaluating the performance of the classifiers as the literature had shown that accuracy was not an appropriate metric to use in such a case [27]. The performance of running each of the conventional machine learning algorithms is presented in Figure 7.
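The macro-averaging procedure can be illustrated with a short sketch that computes per-class precision, recall, and F1 and then takes their unweighted means. The function name `macro_scores` and the ten toy labels are hypothetical and are not results from this study.

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # Unweighted means: every class counts equally, whatever its size.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example: 8 "yes" and 2 "no" samples, one error in each direction
# plus one missed "yes".
y_true = ["yes"] * 8 + ["no"] * 2
y_pred = ["yes"] * 7 + ["no"] + ["no", "yes"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
```

In this toy case the "yes" class scores 0.875 and the "no" class 0.5 on all three metrics, so each macro average is 0.6875 even though 8 of 10 predictions are correct. This shows why macro averaging is more sensitive to minority-class errors than plain accuracy.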
The best training performance was obtained by CNN, RF, DT, and KNN, followed by SVM and NB (see Figure 7). However, the best test performance was obtained by CNN, followed by SVM, RF, DT, NB, and KNN. Among the selected algorithms, CNN achieved 100% training performance for all metrics (precision, recall, and f1-score), while it obtained 88.7% for each of precision, recall, and f1-score on the test dataset. For the SVM algorithm, the precision, recall, and f1-score on the training dataset were 97.9%, 96%, and 96% respectively, whereas on the test dataset they were 94.6%, 85.4%, and 89.7%. The RF algorithm also performed well on the training dataset, as the values of all the evaluation parameters were 100%. The test performance of RF was 86.6%, 80%, and 82.8% for the precision, recall, and f1 scores respectively. Similarly, for the DT model, the values of all the evaluation parameters on the training dataset were 100%, whereas the precision, recall, and f1 score on the test data were 88.5%, 73.3%, and 78.7% respectively. The training performance was lowest for the NB algorithm, with precision, recall, and f1-score of 63.5%, 61.1%, and 60.5%. Its precision, recall, and f1 scores on the test dataset were 85.5%, 61.1%, and 69.5%. Finally, the precision, recall, and f1-scores of the KNN algorithm were 100% on the training dataset, whereas they were 84.7%, 55.6%, and 65% on the test dataset. It can be observed that the lowest test f1 score was obtained by the KNN algorithm.
The evaluation results for the deep ML-based pre-trained models are presented in Figure 10, while all the scores obtained from those models are shown in Table 2. It can be observed from Figure 10 that the best performance on both the training and the test dataset was obtained by the VGG-16 pre-trained model. The precision, recall, and f1 score of the VGG-16 model on the training dataset were each 100%, while on the test dataset they were each 97.8%. The confusion matrices for the test and training data of the VGG-16 model are shown in Figure 8, while the accuracy vs. loss curve for each epoch of the model is shown in Figure 9.
For the final phase of this research, the performance of the hybrid models on the test data is presented in Table 3. The results show that the Stacked Classifier (SC) achieves the best score and outperforms the other hybrid models. The precision, recall, and f1 score for the AdaBoost model were 94.2%, 93.3%, and 93.7%. The corresponding measures for the CatBoost model were 93.9%, 94.4%, and 93.9%. For the XGBoost model, the precision, recall, and F1 scores were each 95.6%. The best performance was obtained by the SC model, with precision, recall, and f1 scores of 99.1%, 98.9%, and 99.2% respectively.

V. DISCUSSION
This research yielded three clearly defined outcomes. First, the best performing ML model for classifying brain tumors from MRI images was identified by applying different classical algorithms, including CNN, SVM, RF, DT, NB, and KNN, to the labeled dataset. It was found that CNN with seven fine-tuned layers gives better performance (88.7% F1 score) than the other ML algorithms. Second, among 14 different pre-trained DL models, the top-performing pre-trained model was identified by placing these pre-trained models on top of the proposed CNN architecture. In this study, VGG-16 (F1 score 97.8%) was found to be the best pre-trained model for classifying brain tumor images. Third, four different hybrid models were proposed by extracting features from the second-to-last layer of the VGG-16_CNN transfer learning model. The proposed Stacked Classifier (SC) hybrid model provides better performance than all the other models.
While carrying out the study, the following issues were identified. Firstly, there is no benchmark dataset that can be used to compare existing approaches for the detection of brain tumors from MRI images. Thus, there is scope for data collaboration. The dataset made available for this study may be used as a benchmark dataset (with prior permission from the originator). Secondly, a large volume of data, which is a prerequisite for developing deep learning models, was not available. Thirdly, extensive hyperparameter tuning can help to obtain better performance for ML and DL models, as it did for the models developed in this study. Fourthly, there is a trade-off between algorithmic performance and time complexity: better performance was obtained by the DL techniques, but their time complexity was very high, meaning that a very large amount of time was required to obtain results with them. On the other hand, the time complexity of the ML algorithms was comparatively low. Fifthly, classification performance largely depends on the dataset and the techniques adopted: different state-of-the-art methodologies can yield different performance on the same dataset, and the same methodology can yield different performance on different datasets. Finally, a comprehensive comparison of similar research outcomes with the proposed model has been compiled to summarize the state-of-the-art methodologies and their performance metrics for the classification of brain tumors (Table 4).
This research has the following limitations. First, the number of images considered in this study is limited; using more images would make the study more robust. Second, no traditional image processing-based classification techniques were analyzed in this study. Third, the current study only classifies brain tumors from 2D data. Fourth, this study only focuses on classifying MRI images as tumorous or non-tumorous; no grading of tumors (e.g., Grade 1, 2, 3, 4) is done, although grading helps clinicians to determine the size of a tumor, whether it has spread, and the best treatment options available. Therefore, future studies may include: (a) working on both ML and image processing techniques with more images for classifying brain tumors from MRI images, (b) classifying brain tumors on both 2D and 3D data, (c) grading tumors if a sufficiently large amount of data is available (collaboration with hospitals may ensure the availability of a large volume of data), and (d) analyzing the computational complexity of the proposed models. Furthermore, a benchmark dataset could be contributed by the appropriate medical authority for comparing the existing approaches (classical image processing and ML-based classification methods).

VI. CONCLUSION
MRI-based medical image analysis for brain tumor studies has been gaining attention in recent times due to an increased need for efficient and objective evaluation of large amounts of medical data. Because of the high death rate linked with brain tumors, it is critical to diagnose them early to treat them and reduce mortality. Manual diagnosis of the brain and tumor tissues is time-consuming and operator-dependent due to the intricacy of brain tissue. Therefore, in this research, an effective transfer learning-based SC model, which was named VGG-SCNet, was proposed to classify brain tumors from MRI Images. The F1 score obtained by the proposed classifier shows the efficacy of the approach followed by this study.