On the Performance of Deep Transfer Learning Networks for Brain Tumor Detection Using MR Images

A brain tumor needs to be identified at an early stage; otherwise, it may progress to a severe condition that cannot be cured. A precise diagnosis of a brain tumor plays an important role in starting the proper treatment, which eventually improves the survival rate of the patient. Recently, deep learning based classification methods have become popular for brain tumor detection from 2D Magnetic Resonance (MR) images. In this article, several transfer learning based deep learning methods are analyzed in combination with a number of traditional classifiers to detect brain tumors. The investigation results are based on a labeled dataset containing images of both normal and abnormal brains. For transfer learning, seven models are used: VGG-16, VGG-19, ResNet50, InceptionResNetV2, InceptionV3, Xception, and DenseNet201. Each of them is followed by five traditional classifiers: Support Vector Machine, Random Forest, Decision Tree, AdaBoost, and Gradient Boosting. All combinations of deep learning based feature extractor and classifier are investigated to evaluate their performance in terms of accuracy, precision, recall, F1-score, Cohen’s kappa, AUC, Jaccard, and specificity. In addition, learning curves are presented for the combinations that achieved the highest accuracies. The presented results show that the best model achieved an accuracy of 99.39% with 10-fold cross validation. The results presented in this article are expected to be useful for selecting a suitable method in deep transfer learning based brain tumor detection.


I. INTRODUCTION
A tumor is caused by an abnormal growth of cells that serves no purpose. Benign tumors do not invade surrounding tissues and thus grow in a contained area. However, if such tumors grow near a vital area, they can still cause trouble. On the other hand, malignant tumors grow and spread in a way that can cause life-threatening cancerous disease. In most cases, cells that are damaged or old are removed or replaced with new cells. Problems may arise if the damaged or old cells are not removed. The development of a mass of tissue, which is referred to as a growth or tumor, is often the result of the creation of additional cells. Because of the variation in size, shape, position, and form of tumors in the brain, the identification of a brain tumor is a challenging task. In particular, early-stage brain tumor diagnosis is quite difficult due to the lack of precise information about the tumor's size, which results from low-resolution images of the tumor areas. Patients can be treated effectively if the tumor is detected and treated early in the tumor formation process. As a result, tumor treatment is highly dependent on the timely diagnosis of the tumor and its proper classification. Several medical imaging technologies are used to diagnose brain tumors, for example, Magnetic Resonance Imaging (MRI), Computerized Tomography (CT) scan, Ultrasound, Single Photon Emission Computed Tomography (SPECT), Positron Emission Tomography (PET), and X-ray. Among these, MRI is the most commonly used medical imaging technique as it offers better contrast images of brain tumors compared to the other techniques. Recently, machine learning (ML) based approaches have gained much popularity for identifying brain tumors from MR images as they give quite accurate and precise detection results.
In particular, the transfer learning technique has been demonstrated in several investigations, where the knowledge learned from one task is reused for another similar task to achieve improved classification performance on the target dataset [1], [2]. Conventionally, the computational cost of training a deep convolutional neural network (DCNN) model on a massive dataset is quite high. Therefore, the learning procedure can be simplified by reusing the model weights from previously trained models. The trained model's layers are then employed in a new model that is trained with the new dataset of interest. As a result, the training time and generalization error are significantly reduced. However, a detailed study of different traditional deep transfer learning models followed by well-known classifiers is necessary to select the best performing model for the target application. In this article, combinations of several transfer learning based deep learning methods with different classifiers are investigated to detect brain tumors from MR images, and their relative performance is compared. An effective deep transfer learning system is identified to detect and classify brain tumors with greater accuracy even when only a small dataset is available. In particular, seven transfer learning models are used as feature extractors: VGG-16, VGG-19, ResNet50, InceptionResNetV2, InceptionV3, Xception, and DenseNet201. Moreover, each CNN model is followed by five traditional classifiers, namely SVM, Random Forest, Decision Tree, Adaptive Boosting (AdaBoost), and Gradient Boosting. Pre-trained deep CNNs are used to extract the necessary features from the MR brain images, which are then categorized using the classifiers with 10-fold cross-validation.
Finally, detailed comparative results are computed using different performance metrics.
The rest of the paper is organized as follows. Section II covers recent works on brain tumor detection from MR images using CNN models. Section III presents the investigation framework, including a detailed description of the brain image dataset and data augmentation, image pre-processing, and the CNN models and classifiers used for this research. The evaluation metrics and the corresponding results are discussed in Section IV. Finally, the conclusion of this work is presented in Section V.

The associate editor coordinating the review of this manuscript and approving it for publication was Yu Zhang.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. RELATED WORKS
In [3], a hybrid technique is introduced using wavelet transform, principal component analysis, and supervised learning algorithms, where the brain tumor detection accuracy reached 98.6%. However, the proposed system must be retrained each time the image database changes.
Besides this, a combination of discrete wavelet packet transform (DWPT), Shannon entropy (SE), Tsallis entropy (TE), and generalized eigenvalue proximate support vector machine (GEPSVM) is also utilized to classify brain images [4]. In [5], TensorFlow is used to implement a 5-layer convolutional neural network for MRI-based brain tumor detection. However, only a limited amount of training data is used. In [6], a spatial gray level dependency (SGLD) matrix is used to extract the necessary brain tumor features from MRI images, which are then applied to an ANN model for classification. The proposed method shows an accuracy of 99% and a sensitivity of 97.9%. However, the long processing time increases the computational cost of the system. In [7], three multi-resolution techniques, namely wavelet transform, curvelet transform, and shearlet transform, are used to detect brain abnormality. Using only fifteen shearlet features, the SVM classifier with the radial basis function (RBF) kernel achieved a maximum classification accuracy of 97.38%. In [8], a CNN model named BrainMRNet is proposed, which combines residual blocks, an attention module, and the hypercolumn technique, followed by a dense layer and softmax, to detect brain tumors. The study reported an accuracy of 96.05%. In [9], a Support Vector Machine along with a Fully Automatic Heterogeneous Segmentation (FAHS-SVM) process is utilized to locate the tumor areas, where the model accuracy reached 98.51%. A modified ResNet50 model is also constructed in [10], where 5 layers are removed from the existing structure and 10 new layers are added at the end. Even though the modified model shows a classification accuracy of 97.01%, the system complexity is increased due to the additional layers.
In [11], the Le-Net and U-Net models are combined to develop a new model, LU-Net, which uses fewer layers to reduce the system complexity. A detailed comparative analysis is performed considering Le-Net, VGG-16, and the proposed LU-Net, where the new model achieved the highest accuracy of 98.00% compared to the other models. However, the system performance on a large dataset remains uncertain. In [12], the authors used superpixels and Principal Component Analysis (PCA) for feature extraction, followed by a filter to enhance the images. Moreover, TK-means clustering is added to the model for image segmentation and brain tumor detection. However, the study is carried out using a small image dataset. By incorporating clinical presentations and traditional MRI analysis, a deep learning based paradigm is proposed in [13]. Here, backward propagation of the gradients is used to increase the depth of the network, which eventually improves the model accuracy. However, the model suffers from long computational time as well as increased development complexity due to the additional layers. In [14], a demonstration of ensemble features and ensemble classifiers is proposed. The DenseNet-169 model achieved an average accuracy of 92.37% using a small dataset, whereas the ResNeXt-101 model achieved an average accuracy of 96.13% using a large dataset. However, the model size is insufficient for a real-time medical diagnostic system based on knowledge distillation techniques. Moreover, a single classifier shows better results in some cases compared to an ensemble configuration with average results. A brief summary of related works on ML based brain tumor detection is presented in Table 1.
In summary, the aforementioned ML approaches mostly use standard CNN models for brain tumor detection. On the other hand, a pre-trained model used with the transfer learning technique results in less computational time and higher accuracy, and removes the constraint of maintaining a large dataset for training. Moreover, traditional classifiers can offer better classification performance than the softmax or fully connected layers used in previous investigations. Overall, the major contributions of this study can be summarized as follows:
- To provide an in-depth analysis of seven pre-trained models: VGG-16, VGG-19, ResNet50, InceptionResNetV2, InceptionV3, Xception, and DenseNet201. The transfer learning techniques are used to extract deep features from the target dataset of MR brain images.
- To provide an in-depth analysis of five classifiers: SVM, Random Forest, Decision Tree, Adaptive Boosting (AdaBoost), and Gradient Boosting. The classifiers are used to classify the brain MR images into benign and malignant.
- To conduct an extensive analysis of the seven pre-trained models followed by the five classifiers, considering all the combinations, and to compare the effectiveness of all the CNN models and ML classifiers on the target dataset.
- To propose the best-performing model, which achieved the highest accuracy and optimal computational time among all the models. Moreover, the corresponding parameter settings are also explored.
- To provide a comparison with state-of-the-art models that justifies the use of the best performing model for classifying brain tumor MR images with the highest accuracy.

III. INVESTIGATION FRAMEWORK
The investigation framework used for this study is presented in Fig. 1. The process starts with the MR brain image dataset, which is expanded using data augmentation. The dataset is split into three sets, namely the train set, the test set, and the validation set. The MR brain images are then processed to reduce the noise and prepare them for feature extraction. In the feature extraction part, several CNN models are tested: VGG-16, VGG-19, ResNet50, InceptionResNetV2, InceptionV3, Xception, and DenseNet201. The pre-processed images are fed into the transfer learning models with a batch size of 32. Finally, the classification stages are built using different classifiers: SVM, Random Forest, Decision Tree, AdaBoost, and Gradient Boosting. Based on these feature extractors and classifiers, the relative performance in detecting brain tumors is evaluated to select the best performing machine learning model using brain MR images.

A. BRAIN IMAGE DATASET AND DATA AUGMENTATION
In this investigation, a publicly accessible MRI dataset from Kaggle [https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection] is used to analyze and evaluate the developed framework. The images are in two folders labeled 'yes' and 'no', corresponding to the abnormal and normal brain images as shown in Fig. 2. Originally, the dataset contains 152 abnormal brain images and 98 normal brain images, thus a total of 250 images of varying dimensions. The images are grayscale in JPG format. An augmentation technique is then applied to increase the size of the dataset. Data augmentation is the process of adding slightly changed copies of current data, or synthetic data newly created from existing data, to expand the size of the present dataset. By generating new and varied samples, data augmentation can help to improve the performance of machine learning models. When a machine learning model's dataset is large and diverse, the model tends to produce more accurate results. Several methods can be used for augmentation; the present article uses width shifting, height shifting, shear intensity, brightness, horizontal flip, and vertical flip to increase the dataset size, as shown in Fig. 3. After applying the augmentation process, the dataset is expanded to 1240 abnormal and 1078 normal brain images. From this dataset, 5 images of each category are used as the test set and the remaining data is divided into 80% as the train set and 20% as the validation set. Based on this distribution, the train set has 987 abnormal and 858 normal brain images; the test set has 5 abnormal and 5 normal brain images; the validation set has 248 abnormal and 215 normal brain images. Fig. 4 shows the dataset distribution using a bar graph.
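The augmentation operations listed above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes images are plain 2D lists of 8-bit grayscale values, all function names and parameter ranges are hypothetical, and the shear transform is omitted for brevity.

```python
import random

def horizontal_flip(img):
    """Mirror every row (left-right flip)."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in img[::-1]]

def width_shift(img, dx, fill=0):
    """Shift columns right by dx pixels (left if dx < 0), padding with `fill`."""
    w = len(img[0])
    out = []
    for row in img:
        if dx >= 0:
            out.append([fill] * min(dx, w) + row[:max(w - dx, 0)])
        else:
            out.append(row[-dx:] + [fill] * min(-dx, w))
    return out

def height_shift(img, dy, fill=0):
    """Shift rows down by dy pixels (up if dy < 0), padding with `fill`."""
    h, w = len(img), len(img[0])
    if dy >= 0:
        pad = [[fill] * w for _ in range(min(dy, h))]
        return pad + [row[:] for row in img[:max(h - dy, 0)]]
    pad = [[fill] * w for _ in range(min(-dy, h))]
    return [row[:] for row in img[-dy:]] + pad

def adjust_brightness(img, factor):
    """Scale pixel values by `factor` and clip to the 8-bit range [0, 255]."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]

def augment(img, rng):
    """Apply one random combination of the transforms (ranges are assumptions)."""
    out = width_shift(img, rng.randint(-1, 1))
    out = height_shift(out, rng.randint(-1, 1))
    out = adjust_brightness(out, rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    if rng.random() < 0.5:
        out = vertical_flip(out)
    return out
```

Calling `augment(image, random.Random(seed))` repeatedly with different seeds yields varied copies of the same shape as the input, which is how a small dataset can be expanded before splitting.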

B. IMAGE PRE-PROCESSING
In machine learning, the dataset is typically not well organized as it comes from different sources. Therefore, the dataset needs to be standardized and processed before being fed to the ML model. Moreover, MR images may contain defects such as inhomogeneity distortions and motion heterogeneity due to the person's body motion during image acquisition or instability of the scanning hardware. These distortions add unwanted intensity variations to the acquired images, which can lead to false positives. Image pre-processing is commonly used to reduce this unwanted noise while retaining the useful information in the images, and hence improves the classification performance.
In this research, the image pre-processing stage comprises a number of steps as shown in Fig. 5. Firstly, the original grayscale MR images of varying sizes are loaded for pre-processing. In step 2, the active contour-based segmentation technique is used to select the region of interest by finding the biggest contour. A contour is a set of points that are interpolated together using different interpolation methods, such as linear, spline, or polynomial, to describe a curve in an image [15]. In step 3, the extreme points are selected by a thresholding technique. Thresholding is a basic non-contextual segmentation technique that converts a grayscale or color image into a binary image to create a binary area map with one threshold [16]. The binary map has two potentially disjoint domains, one containing pixels with input values less than the threshold and the other containing pixels with values equal to or greater than the threshold. In steps 4 and 5, the images are cropped to retain the useful portion and resized to 224 × 224 pixels in RGB format to fit the input layer dimension of the feature extractors. Moreover, small patches of unnecessary noise are removed by applying erosion and dilation operations.
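The thresholding, extreme-point, cropping, and resizing steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: images are 2D lists of grayscale values, the bounding box of all above-threshold pixels stands in for the largest contour, nearest-neighbor sampling stands in for the resize, and erosion and dilation are omitted. All function names are hypothetical.

```python
def threshold(img, t):
    """Binary map: 1 where the pixel value is >= t, otherwise 0."""
    return [[1 if p >= t else 0 for p in row] for row in img]

def extreme_points(mask):
    """Return (top, bottom, left, right) indices of the foreground, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # no pixel reached the threshold
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]

def crop_to_foreground(img, t):
    """Crop the image to the bounding box of pixels with value >= t.

    Assumption: the bounding box of all foreground pixels approximates
    the region enclosed by the largest contour.
    """
    box = extreme_points(threshold(img, t))
    if box is None:
        return [row[:] for row in img]
    top, bottom, left, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]

def resize_nearest(img, out_h, out_w):
    """Resize with nearest-neighbor sampling (e.g. to 224 x 224)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def to_rgb(img):
    """Replicate the gray value into three channels for an RGB input layer."""
    return [[(p, p, p) for p in row] for row in img]
```

A typical call chain under these assumptions would be `to_rgb(resize_nearest(crop_to_foreground(image, t), 224, 224))`, where the threshold `t` has to be chosen for the data at hand.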

C. FEATURE EXTRACTION USING DCNN
Deep learning has proved to be an essential tool in various applications due to its feature learning ability, and its potential has been highlighted in many research articles, including a review published in Nature [17]. In particular, the convolutional neural network, a popular member of the deep learning family, has attracted many researchers since the results obtained with the AlexNet model at the ILSVRC-2012 (ImageNet Large Scale Visual Recognition Challenge) image classification competition [18]. Even though such deep CNNs perform well with large labeled datasets like ImageNet, their application to medical imaging tasks such as MR image classification is limited by the small sample sizes available. Especially for small dataset applications, a well investigated and good alternative is to train the deep CNN starting from a pre-trained model with transfer learning. Pre-trained models have proven to be easier and faster to build, with improved accuracy for the target application [2]. In recent years, various CNN architectures using transfer learning have outperformed classical machine learning models. They have also shown considerable success in improving image classification performance. In image classification, extracting the key features of the images is an important part of the process, and deep models are trained to distinguish multiple levels of visual representation. Conventionally, there are two ways to use pre-trained models. Firstly, off-the-shelf pre-trained models are used to extract features from the image dataset, and a separate classifier is trained to classify those features. Secondly, the pre-trained models are fine-tuned in selected layers or in all layers to get the desired results [19]. Here, the first approach is adopted, combining a number of pre-trained models with traditional classifiers.
In this article, seven pre-trained CNN models are utilized for feature extraction from the MR brain image dataset. The pre-trained CNN models are trained on the large ImageNet dataset [20]. The pre-trained CNN models used in this study are VGG-16 [21], InceptionResNetV2 [22], ResNet50 [23], VGG-19 [21], Xception [24], InceptionV3 [25], and DenseNet201 [26]. A summary of these models is presented in Table 2, and more details are available in the mentioned references. The performance results of each model are presented in a later section to show the relative efficiency in detecting brain tumors from MR images.

D. CLASSIFIERS
Classifiers are used to divide a batch of data into categories. A classifier is an algorithm that maps the input data to a certain category. In this study, the features extracted by the deep CNN models are classified using five classifiers, namely Support Vector Machine [27], [28], Random Forest [29], [30], Decision Tree [31], AdaBoost [32], and Gradient Boosting [33]. Brief details of these classifiers are presented in Table 3. The performance results of each CNN model followed by the classifiers are discussed in a later section.

IV. RESULTS ANALYSIS AND DISCUSSION
This section mainly highlights the performance analysis of several transfer learning based CNN models for feature extraction from brain MR images. The extracted features are further classified using a number of classifiers. All the combinations of feature extractors and classifiers are evaluated in terms of computational time and accuracy with 10-fold cross validation as shown in Table 4. Moreover, the presented deep learning frameworks are also tested using different evaluation metrics, namely accuracy, precision, recall, F1-score, Cohen's kappa, AUC (Area Under the Receiver Operating Characteristic (ROC) Curve), Jaccard, and specificity scores, as shown in Table 5. Based on the evaluation results, the best performing model is identified for effective classification of brain tumors into benign and malignant using brain MR images. The main parameter settings of the best pre-trained model and the different classifiers are also highlighted in Table 6 and Table 7, respectively. Moreover, the best performing model is compared with state-of-the-art methods as shown in Table 8.

A. EVALUATION METRICS
The efficiency of the proposed deep transfer learning framework is measured using four key outcomes: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The following performance metrics are used to evaluate the proposed ML framework.

Accuracy is the capacity to successfully detect brain tumors in the target image dataset. It is estimated as the fraction of true positives and true negatives among all the cases under investigation [34]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision is the fraction of positive predictions that are true positives, which is calculated as [34]:

$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall (sensitivity) evaluates the system's capacity to classify brain tumors accurately, and it is determined by the fraction of actual positives that are detected as true positives [34]:

$$\text{Recall} = \frac{TP}{TP + FN}$$

The F1-score takes the harmonic mean of a classifier's precision and recall to create a single statistic. The F1-score is given by [34]:

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Cohen's kappa is a statistical measure that determines how often two raters agree on the same quantity and is measured as [34]:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance.

The ROC curve is an evaluation tool for binary classification tasks. It is a probability curve that plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold levels, effectively separating the signal from the noise. The AUC is a summary of the ROC curve that measures a classifier's ability to distinguish between classes and is given by [35]:

$$\text{AUC} = \int_{\infty}^{-\infty} TPR(T)\, FPR'(T)\, dT$$

Here, $FPR'(T)$ is the first derivative of FPR with respect to $T$, and $T$ is the threshold applied to the sample data.

The Jaccard similarity coefficient measures the similarity between two sample sets $A$ and $B$. The mathematical formula is [36]:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Specificity is defined as the fraction of actual negatives that are predicted as negatives, that is, as true negatives. In other words, specificity is the True Negative Rate (TNR). The mathematical formula is [37]:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

S. Ahmad, P. K. Choudhury: On Performance of Deep Transfer Learning Networks for Brain Tumor Detection

This section highlights the performance of seven pre-trained CNN models, i.e., VGG16, InceptionResNetV2, ResNet50, VGG19, Xception, InceptionV3, and DenseNet201, each followed by five classifiers: Support Vector Machine, Random Forest, Decision Tree, AdaBoost, and Gradient Boosting. The relative performance of each feature extractor and classifier pair is tested to identify the best performing model. In this investigation, the transfer learning models are used as standalone feature extractors, and the traditional classifiers are then used to classify the extracted features to detect the tumor from brain images. As a standalone feature extractor, the pre-trained network is used to process the images and extract features, while the fully connected layers (classification layers) are kept inactive.
Conventionally, the CNN networks are used up to the last pooling layer, and the 'include_top' argument is set to 'False' so that the fully connected layers (classification layers) are not loaded. Here, an additional flatten layer is added after the last pooling layer, and the networks are combined with different traditional classifiers. The flatten layer adds no trainable parameters; it converts the feature map pooled from the last pooling layer into a one-dimensional array and forwards the output to the classifiers in the next step. All the extractor-classifier pairs are analyzed in terms of accuracy and time with 10-fold cross validation. Cross-validation is used to estimate the skill of a model on unseen data. In the 10-fold cross-validation method, the dataset is shuffled randomly and split into 10 groups of equal size. First, the data from one group is used for validation and the data from the other nine groups is used for training. The model is fitted on the training set, evaluated on the validation group, and the result is stored. The process is repeated 10 times (a total of 10 observations), each time using a different group for validation and the other nine groups for training. The final result is the average of all 10 runs.
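The 10-fold procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the extracted feature vectors are replaced by scalar toy features, and the nearest-mean classifier (`fit_nearest_mean`, `predict_nearest_mean`) is a hypothetical stand-in for the SVM and tree-based classifiers used in the study.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [idx[i::k] for i in range(k)]

def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return the mean validation accuracy over k folds."""
    folds = k_fold_indices(len(features), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on the other k - 1 folds, validate on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([features[j] for j in train_idx],
                    [labels[j] for j in train_idx])
        preds = predict(model, [features[j] for j in val_idx])
        correct = sum(p == labels[j] for p, j in zip(preds, val_idx))
        scores.append(correct / len(val_idx))
    return sum(scores) / k

# Hypothetical toy classifier on scalar features, used only for illustration.
def fit_nearest_mean(xs, ys):
    """Store the mean feature value of each class."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict_nearest_mean(means, xs):
    """Assign each sample to the class with the closest stored mean."""
    return [min(means, key=lambda lab: abs(x - means[lab])) for x in xs]
```

With real data, `features` would hold the flattened CNN outputs and `fit`/`predict` would wrap the chosen classifier; the averaging step is what produces the single accuracy figure reported per extractor-classifier pair.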
According to Table 4, the number of features extracted by VGG16, InceptionResNetV2, ResNet50, [...] Table 4. Table 4 clearly illustrates that the SVM classifier shows the best accuracy compared to the other classifiers when working with the VGG-16, ResNet50, VGG-19, Xception, InceptionV3, and DenseNet201 models. For InceptionResNetV2, the Gradient Boosting classifier shows better performance in terms of accuracy in classifying the MR images. In particular, the accuracy results of [...] Fig. 6 graphically summarizes the accuracy results for all the combinations of ML models followed by classifiers.
Besides accuracy, the computational time of each feature extractor-classifier pair is also estimated as shown in Table 4, where the lowest values are marked in bold. The presented results clearly indicate that the Random Forest classifier performs the classification faster than the other classifiers, with the lowest value of 7.691 seconds when working with the VGG-19 model. Although the Random Forest classifier performs better in classification time, its accuracy of around 90% indicates degraded performance in classifying the brain MR images into benign and malignant. Overall, the performance of the different pairs of feature extractors and classifiers shows a tradeoff between computational time and accuracy.
The presented deep learning frameworks are also tested using the different evaluation metrics formulated in Section IV-A, and the corresponding results appear in Table 5. All the combinations of deep learning based feature extractors with different classifiers are analyzed using precision, recall, F1-score, Cohen's kappa, AUC, Jaccard, and specificity. The highest values of the performance metrics are marked in bold in Table 5. [...] Fig. 7. In summary, the VGG-19-SVM model is considered to be the best performing deep learning system with respect to all the measured values of the performance metrics in Table 4 and Table 5. Based on this evaluation, the best performing CNN model is shown in Fig. 8. Moreover, Table 6 shows the hyper-parameter settings of the best performing model, VGG-19. The main parameter settings of all the classifiers are shown in Table 7.
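As a minimal sketch of how the count-based metrics of Section IV-A follow from the four confusion-matrix outcomes, the function below computes them in one pass. The function name and the example counts are hypothetical and are not values from Table 5; Cohen's kappa uses the standard two-rater form with chance agreement derived from the row and column totals, the Jaccard score is taken for the positive class, and AUC is omitted because it requires classifier scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute count-based metrics from TP, FP, TN, FN.

    Assumes every denominator is non-zero (no guard against empty classes).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    jaccard = tp / (tp + fp + fn)    # |A intersect B| / |A union B| for the positive class
    # Chance agreement from predicted and actual class totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "jaccard": jaccard,
        "kappa": kappa,
    }

# Hypothetical counts for illustration only.
example = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

For these hypothetical counts the accuracy is 0.85 while kappa is 0.70, which illustrates why kappa is reported alongside accuracy: it discounts the agreement that would be expected by chance.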
Finally, Table 8 shows a comparison of the best performing model presented here with the state-of-the-art architectures proposed in [8]-[13]. Toğaçar et al. [8] used a combination of residual blocks, an attention module, and the hypercolumn technique, with a reported accuracy of 96.05%. Jia et al. [9] utilized the FAHS-SVM technique, with a reported accuracy of 98.51%. Besides this, Çinar et al. [10] achieved 97.01% accuracy with the improved ResNet50 model. Moreover, Rai et al. [11] combined Le-Net and U-Net to form the LU-Net model, which achieved an accuracy of 98.00%. Islam et al. [12] utilized superpixels and Principal Component Analysis (PCA) followed by TK-means clustering, which achieved an accuracy of 95.00%. The Deep-CNN study by Das et al. [13] achieved an accuracy of 98.00%. Compared with all the aforementioned results, the presented VGG-19-SVM model shows the highest classification accuracy of 99.39%, and thus it is expected to perform well in detecting brain tumors from MR images.

V. CONCLUSION
In this article, several transfer learning based deep learning methods are analyzed and the corresponding results are compared to select the best performing CNN model for the detection of brain tumors from MR images. Seven classical feature extractors are used to develop the deep learning framework, where the features extracted by each pre-trained model are classified using five traditional classifiers. The performance metrics, namely accuracy, computational time, precision, recall, F1-score, Cohen's kappa, AUC, Jaccard, and specificity, are computed for all the combinations of feature extractors and classifiers with 10-fold cross validation. The best performing model, i.e., VGG-19-SVM, shows the highest accuracy of 99.39% among all the models presented in this investigation. Moreover, the VGG-19-SVM model also performs better than recent works on brain tumor detection using ML models. However, the presented model was not tested on different brain MRI modalities or with other imaging techniques. The proposed technique can also be extended to the classification of tumor types such as Glioma, Meningioma, and Pituitary using an MR image dataset. Above all, the use of a larger dataset and better GPU based processing can further improve the accuracy as well as the computational speed of the presented models. We aim to address these issues as part of future work.