Brain Tumor and Glioma Grade Classification Using Gaussian Convolutional Neural Network

Categorizing a Brain Tumor (BT) is critical for assessing the tumor and giving the patient the appropriate treatment for its category. Several imaging techniques exist for BT detection; Magnetic Resonance Imaging (MRI) is generally used because of its better image quality and because it relies on non-ionizing radiation. This paper proposes an approach to detect distinct BT types using a Gaussian Convolutional Neural Network (GCNN) on two datasets. The first dataset is used to classify tumors into pituitary, glioma, and meningioma. The second is used to separate three grades of glioma, i.e., Grade-two, Grade-three, and Grade-four. The datasets contain 233 and 73 patients, with a total of 3064 and 516 T1-weighted contrast-enhanced images for the first and second datasets, respectively. The proposed approach achieves an accuracy of 99.8% and 97.14% for the two datasets. The experimental results highlight the efficiency of the proposed approach for multi-class BT categorization.


I. INTRODUCTION
The uncontrolled and abnormal growth of brain cells is known as a BT [1]. The human brain is enclosed in a rigid, volume-restricted space; therefore, any unexpected growth may affect human functions. In addition, it may spread to other body organs and result in life-threatening conditions [2]-[4].
According to the worldwide cancer report provided by the World Health Organization (WHO), BTs account for less than 2% of human cancers, yet they cause severe morbidity, complications, and comorbidities. Cancer research in the United Kingdom estimated that approximately 52,250 people succumb to intracranial, Central-Nervous-System (CNS), and brain tumors. Existing studies report that around 30% of BTs are benign. BTs can be classified by severity, as malignant or benign, and by the tumor's source. A primary tumor is one whose initial source is the brain, whereas a secondary tumor originates in some other part of the body and later spreads to the brain; the vast majority of secondary tumors are malignant [5]. Radiological imaging is one of the most widely used non-invasive diagnostic sources. Because it avoids ionizing radiation, MRI is the most popular modality today. Additionally, MRI can acquire images with contrast enhancement or with different imaging sequences, and it offers superior resolution for soft tissues [6]. Various imaging procedures can be used to detect and characterize BTs [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Wentao Fan.
The most common BTs are glioma tumors, which start in the brain's Glial Cells (GCL). Gliomas account for 30% of CNS tumors and BTs and 80% of malignant BTs. The WHO classifies glioma tumors into four grades, i.e., Grade-one to Grade-four. Grade-one BTs are benign and have a texture very similar to that of normal GCLs. Grade-two BTs are marginally different in texture. Grade-three BTs are malignant (with abnormal tissue appearance), whereas Grade-four BTs are the most severe stage of gliomas and tissue irregularity, which can be observed with the naked eye [4]. Meningioma Tumors (MTs) grow slowly relative to other BTs. They develop on the membranes that cover the brain and the spinal cord. The vast majority of MTs are benign. Pituitary Tumors (PTs), in turn, arise in the pituitary gland, which directs and controls hormones in the human body. A PT may spread towards the bones and be malignant, or it may be benign. Complications of PTs include vision loss or permanent hormone deficiency [4], [8].
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Given the above, early detection of BTs is an essential task that helps protect the patient's life by allowing the most suitable treatment to be chosen. Moreover, categorization can be a confusing and tedious task for radiologists and doctors in some sensitive cases. These cases need specialists to localize the tumor, compare the tumor tissue with the neighboring regions, filter the image if necessary to make it clearer for human vision, and finally decide whether it is a BT, as well as its grade and type. This manual process is time-consuming, so a Computer-Aided Diagnosis (CAD) approach that works without human intervention is required for the earliest identification of BTs. We propose a more efficient deep-learning-based approach that uses a Gaussian filter for pre-processing (noise filtering and smoothing of the input images).
The significant contributions of this research are listed below:
• We propose a customized Gaussian Convolutional Neural Network (GCNN) for classifying brain tumor type (i.e., pituitary, glioma, and meningioma) and glioma grade (i.e., Grade-two, Grade-three, and Grade-four).
• We apply and analyze various filters for pre-processing (noise filtering and smoothing of the input images) of BT images to improve the classification.
• We present a comparative analysis with state-of-the-art and standard machine learning algorithms.
• Results show that a Convolutional Neural Network Algorithm (CNNA) with a Gaussian filter outperforms other common image pre-processing filters and provides better BT classification.
The rest of the paper is organized as follows. Related work is presented in Section II. The proposed work is presented in Section III. Experimental analysis is presented in Section IV. Results and discussion are given in Section V, and the conclusion is provided in Section VI.

II. RELATED WORK
The authors in [9] presented a brain-inspired hybrid system for the symbiotic intelligence of humanity. They present theoretical foundations, intelligence, knowledge-based systems, and cognitive analysis towards developing next-generation cognitive systems. Cognitive computing can use Machine Learning (ML) to perform particular tasks from patterns, with statistical inference and algorithms, and without explicit outside instruction [10]. AI algorithms have been widely developed in the clinical imaging field as a part of machine learning [4], [11]-[15]. As a constituent of AI, ML schemes are now heavily used in bioinformatics. ML has two primary classes, unsupervised and supervised. In supervised learning, a mapping from inputs to outputs is learned from labeled training data and is then used to predict unseen samples. The objective is to learn the inherent correlations in the training data, using ML schemes such as the K-Nearest Neighbors Algorithm (KNNA), the Support Vector Machine Algorithm (SVMA), and the Artificial Neural Network Algorithm (ANNA) [16]-[18].
Conversely, only input parameters are used in unsupervised-learning algorithms, such as the Self-Organizing-Map Algorithm (SOMA) and the fuzzy c-means algorithm. Extracting features from the training images, e.g., texture, grayscale, and statistical parameters, is crucial for learning, and it may require tumor segmentation beforehand. These are handcrafted features, which require a specialist with the proficiency to select the relevant features. Moreover, with large datasets this process is error-prone and time-consuming. A Deep Learning Algorithm (DLA) builds AI-oriented models and frameworks that depend on data representations and hierarchical feature learning. For feature extraction, DLA uses several layers of nonlinear processing. Going deeper into the network, the output of each layer is the input of the following one, which also helps in data abstraction. CNNA (the convolutional neural network algorithm) is a class of DLA that is commonly used in visual image analysis and is designed to require little pre-processing [19], [20].
CNNA is inspired by biological processes in the brain [21] and is used to handle different forms of data. DLA was first applied decades ago: LeCun presented the DLA ''LeNet'' in 1998, and it was used in applications that required document recognition. Many years later, DLA became considerably more mainstream after a framework known as 'AlexNet' (AN) was used to perform image classification [22]. At that time, AN showed outstanding results compared with the other algorithms in use.
BT characterization has been performed using numerous AI procedures and imaging modalities [6]. Feature learning and robust accuracy are the primary advantages of CNNA over conventional vanilla neural networks and machine learning; its accuracy can be improved further by expanding the training data, which leads to a more robust and more precise model. Convolutional filters are used for feature extraction in CNNA. Increasingly complex features (structural and spatial) are extracted deeper in the network. Features are extracted by convolving small filters with the input patterns, followed by selection of the most distinctive features, which prepares the network for classification.
An accuracy of 85% was reported for multi-class categorization and 88% for binary categorization. The authors in [23] presented a technique for classifying 80 normal and abnormal BT CT images using the 'Discrete Wavelet Transform' Approach (DWTA) for feature extraction and the 'Principal Component Analysis' Approach (PCAA) for feature reduction; image classification was then performed using ANNA and KNNA with a precision of 97% and 98%, respectively. For feature extraction, three schemes are utilized, including Bag-of-Words and intensity histogram.
Anaraki et al. [24] presented another model for BT image classification that combines two approaches, CNNA and Genetic Algorithms (GA-CNNA). Subsequently, Afshar et al. [25] presented a capsule network (CapsNetK) that takes both the brain MRI image and the coarse tumor boundaries as input for BT classification. Through this study, a precision of 90.89% was acquired. A precision of 90.9% was achieved in the first case study, which classified three glioma grades. The second case study achieved a precision of 94.2% for pituitary, meningioma, and glioma tumor classification.

III. PROPOSED WORK
A customized CNNA is proposed to categorize various grades and types of BT. The system's design is refined using different configurations to obtain the most suitable framework. A diagram of the proposed work is depicted in Fig. 1.
Labels and images are loaded and extracted from the raw dataset files. After the data is split into training, validation, and test sets, it is pre-processed and augmented. The structure of the proposed work is defined by setting the optimization algorithm, the regularization approach, and the hyper-parameters. Finally, the network is trained and executed. Algorithm 1 describes the processing of the proposed work [26].
This paragraph explains the working of Algorithm 1. The system takes images as input and outputs the respective type of brain tumor. During pre-processing, the color space of the images is converted to grayscale and the input images are cropped; to smooth the images and remove noise, a Gaussian filter is convolved over the input images. Next, after separating the labeled and unlabeled data, the model is tuned during the training phase by trial and error, through which the hyper-parameters are selected. If the error rate exceeds the threshold value, backpropagation is performed and the weights are readjusted. Lastly, the true positives, true negatives, false positives, and false negatives are obtained from the results.
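The grayscale conversion and Gaussian smoothing steps described above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: the function names, the 3 × 3 kernel size, the value of sigma, the luminance weights, and the edge-replication border handling are all assumptions.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def gaussian_smooth(image, size=3, sigma=1.0):
    """Convolve a 2D grayscale image with a Gaussian kernel."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    # Clamp indices so border pixels reuse the nearest valid pixel.
                    ii = min(max(i + di, 0), height - 1)
                    jj = min(max(j + dj, 0), width - 1)
                    acc += kernel[di + half][dj + half] * image[ii][jj]
            out[i][j] = acc
    return out
```

Because the kernel is normalized to sum to one, a uniform region keeps its intensity, while isolated noisy pixels are spread over their neighbors.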

A. PRE-PROCESSING
Pre-processing is carried out before passing the MRI scans into the algorithm. To simplify computation and achieve better performance in less time, the first step is to reduce the size of the original images from 512 × 512 × 1 to 128 × 128 × 1 pixels. The data is then shuffled before splitting so that the samples are unsorted. Splitting yields three parts: training, validation, and test data (with every instance having a labeled target value). 65% of the data is used for training and 35% for validation and testing.
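The size reduction and the 65%/35% split can be illustrated with the short sketch below. It rests on assumptions that the text does not specify: block averaging is used as the downsampling method, the held-out 35% is divided evenly between validation and test, and the function names and fixed seed are hypothetical.

```python
import random

def downsample(image, factor=4):
    """Shrink a 2D image by averaging non-overlapping factor x factor blocks."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(0, height - height % factor, factor):
        row = []
        for j in range(0, width - width % factor, factor):
            block = [image[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def shuffle_and_split(samples, train_fraction=0.65, seed=0):
    """Shuffle (image, label) pairs, then split into train / validation / test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    held_out = shuffled[n_train:]
    # Assumption: the held-out 35% is divided evenly between validation and test.
    n_val = len(held_out) // 2
    return shuffled[:n_train], held_out[:n_val], held_out[n_val:]
```

With `factor=4`, a 512 × 512 image becomes 128 × 128, matching the sizes quoted above.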
Next, to increase the model's robustness and to avoid overfitting, data augmentation is applied so that the framework treats the augmented images as new data. The images are augmented with salt-noise/grayscale distortion and geometric augmentation. The original 3061 images are increased fivefold through augmentation. Finally, a dataset of 15,317 images is obtained for type classification, and a dataset of 513 images is used for grade classification. The Gaussian filter is used for pre-processing (chosen after a comparative analysis among multiple imaging filters). After that, sixteen layers are incorporated: the pre-processed, augmented images enter the input layer, and then downsampling (through convolution, Rectified-Linear-Unit (ReLU), normalization, and pooling), feature selection, and convolution operations are performed. Overfitting is avoided by using dropout layers. A fully connected layer and a softmax layer are then used for output prediction, and a classification layer is added to assign the predicted class. The whole layered structure of CNNA is given in Fig. 2.
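As an illustration of the fivefold augmentation, the sketch below generates five salt-noise variants of each image. Only the salt-noise part is shown; the noise fraction, the salt value of 255, the function names, and the choice to emit five noisy copies per original are assumptions, not details given in the text.

```python
import random

def add_salt_noise(image, amount=0.02, salt_value=255, rng=None):
    """Return a copy of a 2D image with roughly `amount` of its pixels set to salt_value."""
    rng = rng or random.Random()
    return [[salt_value if rng.random() < amount else pixel for pixel in row]
            for row in image]

def augment_dataset(samples, copies=5, amount=0.02, seed=0):
    """Produce `copies` noisy variants of every (image, label) pair."""
    rng = random.Random(seed)
    augmented = []
    for image, label in samples:
        for _ in range(copies):
            # The label is unchanged because salt noise does not alter the tumor class.
            augmented.append((add_salt_noise(image, amount, rng=rng), label))
    return augmented
```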

B. ARCHITECTURE OF GCNN
Four convolutional layers are utilized in the proposed work. Each layer is described as follows. The input layer is used for data normalization and for confirming the size of the input (i.e., BT images). A 2D convolution is applied by moving K filters, each of size M × N, over the input BT images and computing the dot product of the inputs and the weights. The filters/kernels slide over the input images in horizontal and vertical steps, known as the stride. The image is padded before the kernel slides over it. These filters act as feature identifiers. The filters of the initial layers detect low-level features (blobs, lines, and edges), whereas the deeper layers are used for complex feature detection.
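The sliding-window operation described above can be sketched in a few lines. This is an illustrative version with assumed function names and zero padding; as is usual in CNN frameworks, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide an M x N kernel over a zero-padded 2D image and return the feature map."""
    m, n = len(kernel), len(kernel[0])
    height, width = len(image), len(image[0])
    # Zero-pad the image on every side before sliding the kernel.
    padded = [[0.0] * (width + 2 * padding) for _ in range(height + 2 * padding)]
    for i in range(height):
        for j in range(width):
            padded[i + padding][j + padding] = image[i][j]
    out_h = (height + 2 * padding - m) // stride + 1
    out_w = (width + 2 * padding - n) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel weights and the current image patch.
            row.append(sum(kernel[a][b] * padded[i * stride + a][j * stride + b]
                           for a in range(m) for b in range(n)))
        output.append(row)
    return output

def conv_layer(image, kernels, stride=1, padding=0):
    """Apply K kernels to one image, giving K feature maps."""
    return [conv2d(image, k, stride, padding) for k in kernels]
```

The output size follows (H + 2P − M) / S + 1 per dimension, so a 3 × 3 kernel with padding 1 and stride 1 preserves the input size.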
The following equation gives the ReLU activation function as a function of y. If y is positive, the output equals the input; otherwise, it is 0 (see Eq. (1)).
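For reference, the standard ReLU definition that matches this description can be written as follows. The function name f is an assumption; the form is the textbook definition rather than a reproduction of the authors' typeset equation.

```latex
f(y) = \max(0, y) =
\begin{cases}
  y, & y > 0 \\
  0, & \text{otherwise}
\end{cases}
```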
Normalization is done by adjusting and scaling the associated activations. The input layer is then standardized by a cross-channel normalization layer: a channel-wise response normalization layer with a window size of five (chosen arbitrarily). The normalization layer is used during network training and backpropagation. To obtain spatial invariance, the image is divided into small rectangles of size 2 × 2, and this down-sampling is performed by the max-pooling layer: a 2 × 2 window is moved over the image, and only the maximum of the four values is kept. The pooling layer thus reduces the number of parameters and, consequently, the network's computations. Fig. 4 depicts an example of max-pooling. Overfitting is reduced by using dropout layers, in which a few nodes/activations are dropped at random, which also helps accelerate the training stage. The most appropriate dropout values for the first and second dropout layers are 10% and 20%, respectively. Lastly, Fully Connected (FC), Softmax (SFT), and Classification (CLF) layers are used, in that order.
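A minimal sketch of 2 × 2 max-pooling and dropout follows. The function names are hypothetical, and the rescaling of surviving activations ("inverted dropout") is a common convention assumed here; the text does not state which variant is used.

```python
import random

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window of a 2D feature map."""
    height, width = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, width - size + 1, stride)]
            for i in range(0, height - size + 1, stride)]

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

# Example: a 4 x 4 map is reduced to 2 x 2 by keeping each window's maximum.
pooled = max_pool2d([[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12],
                     [13, 14, 15, 16]])
```

Dropout is applied only during training; at inference time the activations pass through unchanged.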
The FC layer connects the neurons of one layer to every neuron of the preceding and following layers. The SFT layer follows the FC layer and applies a normalized exponential function. The SFT layer squashes all the predicted class scores into the range 0-1, and these values sum to one, that is, 100%. The output of this layer is determined as follows (see Eq. (2)): $z(a)_k = e^{a_k} / \sum_{j=1}^{l} e^{a_j}$. The function z(a) computes the likelihood of the kth class over l distinct output classes (whose total sums to one). Finally, a cross-entropy-based classification layer is added to predict the class of each input BT image and to estimate the prediction error rate. Eq. (3) shows the error-rate estimation, where r(y) is the vector of classified outputs from the SFT layer and q is the vector of target labels. The next section elaborates on the optimization algorithm and the regularization approach.
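The softmax and cross-entropy computations can be sketched as below. The cross-entropy form used here, minus the sum of q_k times log r_k, is the standard definition and is assumed to correspond to Eq. (3); the function names, the max-shift for numerical stability, and the small eps guard are illustrative choices.

```python
import math

def softmax(scores):
    """Map raw class scores a_1..a_l to probabilities that sum to one."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(a - shift) for a in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, target_one_hot, eps=1e-12):
    """Cross-entropy between predicted probabilities r(y) and one-hot target q."""
    return -sum(q * math.log(max(r, eps))
                for r, q in zip(probabilities, target_one_hot))
```

For three classes with equal scores, each probability is 1/3 and the loss for any target class is log 3.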
Regularization is meant to avoid overfitting while the model is trained. Several methods are used during the training and pre-processing stages to avoid overfitting. One of these is data augmentation, where the original images are augmented through color and geometric distortion. In addition, different network architectures are tried in order to limit the complexity of the network. Dropout layers are also used to stochastically remove hidden units [27]. Eq. (4) shows weight decay, i.e., the addition of a penalty to the cost function using L2 regularization.
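A common form of the L2-regularized cost that is consistent with this description is sketched below. The symbols E (unregularized loss) and E_R (regularized loss) are assumptions introduced for illustration, and the factor 1/2 is a convention that may differ from the authors' Eq. (4).

```latex
E_R = E + \frac{\lambda}{2} \sum_{j=1}^{l} x_j^{2}
```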
The regularization hyper-parameter is represented by λ, and the weights are represented by x_j for j = 1, . . . , l. To avoid overfitting and keep the model stable, the training and validation processes are monitored periodically, and training can be halted before all epochs are completed through an early-stopping approach. Optimization minimizes the loss and updates the network parameters by taking small steps in the direction of the negative gradient until convergence towards the global minimum [28]. For the proposed work, stochastic gradient descent with momentum was found to be the best optimizer. Fig. 5 shows the flowchart of the system's working.
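The momentum update and the early-stopping check can be sketched as follows. The learning rate, momentum coefficient, patience value, and function names are hypothetical, and the update rule shown is the classical momentum form, which may differ in detail from the one used in the experiments.

```python
def sgd_momentum_step(weights, gradients, velocity, learning_rate=0.01, momentum=0.9):
    """One momentum-SGD update; returns the new weights and the new velocity."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradients)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

def should_stop_early(validation_losses, patience=5):
    """True when the best validation loss is at least `patience` epochs old."""
    if len(validation_losses) <= patience:
        return False
    best_epoch = validation_losses.index(min(validation_losses))
    return len(validation_losses) - 1 - best_epoch >= patience

# Toy usage: minimize f(w) = w^2, whose gradient is 2w.
w, v = [5.0], [0.0]
for _ in range(200):
    w, v = sgd_momentum_step(w, [2.0 * w[0]], v, learning_rate=0.1)
```

On this toy quadratic, the weight moves towards the minimum at zero; in practice, the gradients would come from backpropagation through the network.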

IV. EXPERIMENTS AND OUTCOMES
The first dataset used in this work was obtained from Nanfang Hospital and General Hospital, Tianjin Medical University, China, from 2005 to 2010 (footnotes 1 and 2). This dataset contains T1-weighted contrast-enhanced images of three kinds of BTs (i.e., pituitary, glioma, and meningioma) acquired from 232 patients [29]. BTs vary in size, location, and shape according to their grade and type, as shown in Fig. 6. The dataset includes three views: sagittal, coronal, and axial, as shown in Fig. 6. The second dataset is retrieved from a public repository, ''The Cancer Imaging Archive (TCIA)'' (footnote 3). The repository contains multi-sequence MRI scans of 129 patients of different ages, races, diseases, and grades from the Repository of Molecular Brain Neoplasia Data (REMBRANDT) [30]. We select T1-weighted contrast-enhanced BT images covering various glioma grades (Grade-two, Grade-three, and Grade-four), as shown in Fig. 7. The two datasets are described in Table 1 and Table 2.
1 https://www.med.upenn.edu/sbia/brats2018/data.html
2 https://figshare.com/articles/dataset/brain_tumor_dataset/1512427
3 https://www.cancerimagingarchive.net/
We apply image filtering as a pre-processing stage before applying CNNA. This is critical for identifying the edges in the BT images. Therefore, the input BT images must be smoothed, sharpened, and denoised. Several basic imaging filters are compared to analyze their performance (see Fig. 8). In this experiment, the Gaussian Filter (GF) gives the best results, so GF is applied for pre-processing the BT images (which is why the model is called the Gaussian Convolutional Neural Network, GCNN). In the comparison, the horizontal axis shows the filter size and the vertical axis shows the BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator) score, which measures the image quality after the particular filter is applied. Fig. 9 shows, for case-I, both the accuracy progress and the loss during the validation stage of the proposed work. Almost 100% accuracy is reached, as depicted in Fig. 9(a), which shows the results after a total of 5000 iterations. After the 8515th iteration, the accuracy achieved is almost 100%. Lastly, the best overall accuracy obtained during the test stage is 96.13%. The loss graph for mini-batches is shown in Fig. 9(b). The curve begins to drop sharply, yet minor fluctuations appear because of the small mini-batch size of 32 images. These fluctuations tend to vanish after a total of 6400 iterations, and the loss curve is nearly 0.
For case-II, both the validation accuracy and the loss are shown in Fig. 10. A training accuracy of almost 100% is reached at around 1000 iterations. The best accuracy of 98.7% is obtained during the test stage, as shown in Fig. 10(a). Fig. 10(b) presents the loss graph (for mini-batches). The loss nearly reaches zero, and after 100 iterations its decline largely levels off.
The performance metrics are presented in Table 3. We use Accuracy, Specificity, Sensitivity, and Precision. Here, the number of cases predicted positive that are actually positive is the True-Positive-Case (TPC) count. The number of cases predicted negative that are actually negative is the True-Negative-Case (TNC) count. The number of cases predicted positive that are actually negative is the False-Positive-Case (FPC) count, also called a type-one error. The number of cases predicted negative that are actually positive is the False-Negative-Case (FNC) count, also called a type-two error.
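The four metrics can be computed from these counts as in the sketch below. Table 3 is not reproduced in this text, so the formulas are the standard definitions and are assumed to match it; the function names and the confusion-matrix values in the usage example are made up for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard accuracy, sensitivity, specificity, and precision from the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "precision": tp / (tp + fp),
    }

def one_vs_rest_counts(confusion, k):
    """Counts for class k, where confusion[i][j] = true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

# Illustrative three-class confusion matrix (made-up numbers, not the paper's results).
example = [[50, 2, 1],
           [3, 45, 2],
           [0, 1, 60]]
metrics_class0 = classification_metrics(*one_vs_rest_counts(example, 0))
```

For multi-class problems such as tumor type, each class is scored one-versus-rest, which is one plausible way per-class accuracies could be derived.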
The proposed approach achieved an accuracy of 97.54% for meningioma, 95.81% for glioma, and 96.89% for pituitary categorization. An accuracy of 100% is achieved in the categorization of glioma Grade-two, 95% for glioma Grade-three, and 100% for glioma Grade-four.

A. HYPER-PARAMETERS AND EMPIRICAL ARCHITECTURES
We tune the parameters of the different architectures considered during the selection procedure. Table 4 lists the parameters tested to arrive at the presented final structure, which gives the best performance.

B. PLATFORM AND TIME COMPLEXITY
The proposed GCNN is trained using Python and MATLAB 2019b on a machine with 32 GB RAM and an Intel-i5-7700HQ CPU (2.5 GHz). Training on 10417 images takes 299 minutes in experiment-I and experiment-II. For 350 images, it is recorded as 2.5 minutes. The average test execution time is 8.4 and 9.7 milliseconds per image for the first and second datasets, respectively.

V. DISCUSSION
This manuscript presents a methodology for BT characterization and glioma grade classification by applying the GCNN framework to MRI images. Various parameters of the GCNN model are customized before the final model is obtained. Training the GCNN without underfitting or overfitting is critical, as it may require weeks or months (for a dataset) to obtain the desired results. In Table 5, we list results from the literature (obtained for similar BT types with different layers, hyper-parameters, and architectures). The proposed structure gives the best prediction results compared with the other related studies, which shows the reliability of the proposed framework. In addition, the proposed GCNN is a segmentation-free technique: the BT images are simply loaded to obtain the classification results.
In some studies, combining two classifiers did not yield many favorable results even though pathological images were used to train the system. In others, feature engineering is used to extract features and then reduce their dimensionality before they are used in a separate detection and categorization stage. Other authors have used a Genetic Algorithm (GA) to design the system's architecture; however, GA did not give the best prediction results. In [33], the authors used just two convolutional layers with 64 kernels each. Additionally, they used four dropout layers, which is relatively high for the presented network.
The authors in [25] used coarse tumor boundaries as an extra input to help the system give better results. However, this requires additional steps to localize the tumor before a CNNA is trained. Although we have achieved a reasonable classification rate, the proposed framework needs to be tested on larger-scale datasets that include various other parameters, in order to improve its portability and extend it to other clinical applications in the future. Also, the framework cannot be trained from scratch on a small number of images, which is one of the limitations of DLA; instead, the framework can be fine-tuned on the small dataset after being trained on a large dataset.

VI. CONCLUSION
This paper presented a CAD approach for detecting and categorizing radiological BT images into three kinds (pituitary tumor, glioma tumor, and meningioma tumor). We also classified glioma tumors into grades (Grade-two, Grade-three, and Grade-four) using the proposed GCNN approach. First, pre-processing is done using a Gaussian imaging filter, and then a sixteen-layer network is built. The layers are ordered as follows: the input layer, the convolutional layers (along with activation functions), the dropout layers (for overfitting prevention), the FC and SFT layers, and the CLF layer (for output class categorization). Data augmentation proved helpful for obtaining good results even though the dataset is not large (because of the variety of imaging views). Using two datasets, the presented work achieved its highest accuracy rates of 97.14% and 99.8%.
He has authored or coauthored several peer-reviewed articles in professional journals and the proceedings of conferences. His research interests include the areas of machine learning algorithms, wireless sensor networks, mobile computing, self-organized networks, big data analytics, and the Internet of Things.
AYSHA SHABBIR is currently pursuing the M.S. degree with the Department of Computer Science, Kinnaird College for Woman, Lahore, Pakistan. Her research interests include machine learning, wireless sensor networks, mobile computing, and security issues in mobile cloud computing.