
RegionInpaint, Cutoff and RegionMix: Introducing Novel Augmentation Techniques for Enhancing the Generalization of Brain Tumor Identification


Abstract:

Brain tumors are considered one of the most crucial and threatening diseases in the world, as they affect the central nervous system and the main functionalities of the brain. Early diagnosis and identification of brain tumors can significantly enhance the likelihood of patient survival. Generally, deep neural networks require large samples of annotated data to achieve promising results, and most studies in the medical domain suffer from limited data, which negatively impacts model performance. Common ways to handle such problems are to generate new samples using basic augmentation techniques, generative adversarial networks, etc. In this study, we propose several novel augmentation techniques, named RegionInpaint augmentation, Cutoff augmentation, and RegionMix augmentation, to improve the performance of brain tumor identification and facilitate the training of deep learning models with limited samples. In addition, traditional augmentation techniques are used to extend the training samples. A pre-trained VGG19 model is evaluated along with the proposed augmentation techniques and achieves an accuracy of 100% on the unseen validation set of the small SPMRI dataset when using the RegionInpaint and Cutoff augmentation techniques together. On the other hand, the best testing accuracy achieved is 96.88% on the Br35H dataset, obtained when using all the augmentation techniques together (i.e., RegionInpaint, Cutoff, RegionMix, and basic augmentation techniques). Compared to state-of-the-art related studies, our results are superior, which demonstrates the efficiency of the proposed augmentation techniques and the overall methodology. The source code is available at https://github.com/omarsherif200/RegionInpaint-Cutoff-and-RegionMix-augmentation-techniques.
Published in: IEEE Access (Volume: 11)
Page(s): 83232 - 83250
Date of Publication: 04 August 2023
Electronic ISSN: 2169-3536

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.
SECTION I.

Introduction

In recent years, deep learning has shown a significant impact on medical imaging analysis in different fields [1], [2], [3], [4], and it has likewise had a significant impact on brain tumor analysis. The brain is considered the most complex and important organ in the human body, as it is responsible for main functions such as thinking, sensing, and memorization. Damage to brain cells affects the entire central nervous system, leading to disability in most body organs. A brain tumor is defined as a mass or group of abnormal cells in the brain and is considered one of the most dangerous diseases that damage brain cells [5], [6], [7]. There exist different types of brain tumors. Some tumors are benign (non-cancerous), whereas others are malignant (cancerous); benign tumors spread more slowly than malignant tumors and are less likely to recur after treatment [8], [9]. In addition, brain tumor severity and treatment depend on the type, location, and size of the tumor. It is also worth mentioning that brain tumors are more common in children than in adults, and their treatment in children is more challenging because the child’s brain is still developing [10]. Moreover, the growth rate of a tumor in the brain determines its severity and how it affects the brain functionalities and the central nervous system [11]. Because the central nervous system controls most body organs and brain tumors are among the most common diseases that negatively affect it, brain tumors can damage most body organs. According to the World Health Organization (WHO), brain tumors are the 10th leading cause of death worldwide [8], [12]. Consequently, early classification and detection of brain tumors can help much in the treatment process, lessen their damage, and reduce the overall death rate. The manual process of identifying tumors in MRI images is time-consuming and, in some cases, difficult. Thus, automating this process can ease the identification step and help detect the disease at its earliest stages.

In general, deep learning models require diverse and large amounts of data to guarantee a well-generalized model. However, most medical datasets are limited and suffer from class imbalance. Moreover, the collection of new medical data is a significant challenge: it is a time- and cost-intensive process and requires the collaboration of medical domain experts. Different approaches are commonly used to handle such problems (i.e., limited and imbalanced data); the most common in computer vision is data augmentation, which generates synthetic samples from existing ones, thereby increasing the diversity of the training data and enhancing the generalization and performance of the classification models. In the literature, most studies considered basic data augmentation techniques such as rotation, flipping, shearing, translation, and brightness and contrast adjustment [4], [13], [14], [15]. Other studies considered more effective augmentation techniques based on Generative Adversarial Networks (GANs) [16], [17], [18], [19], [20], which capture the distribution of the dataset samples and accordingly generate artificial samples that look realistic based on the learned distribution [21].

In this study, we introduce novel augmentation techniques called RegionInpaint augmentation, Cutoff augmentation, and RegionMix augmentation to significantly improve generalization by augmenting the training samples. In addition to these techniques, basic augmentation techniques (e.g., flipping, rotation, etc.) are also evaluated. The proposed augmentation techniques depend mainly on segmenting tumors from the MRI images. Thus, segmentation is a crucial step in our study: it generates segmentation masks corresponding to the input MRI images, on which the proposed augmentation techniques can then be applied. To obtain high-quality segmentation masks, a U-Net-like architecture named VGGUNET is utilized, in which a pre-trained encoder (VGG16 [22]) is combined with the U-Net architecture [23] to achieve promising results compared to the other segmentation models used (i.e., U-Net [23], SegNet [24], and ResUNet [25]).

Two public datasets are used in this study: the SPMRI [26] and Br35H [27] datasets. The SPMRI dataset is a small dataset used for training and validation, whereas the Br35H dataset is used for testing. Both datasets consist of two classes (tumor and non-tumor). The main focus of this research is to use a small training set in order to investigate the generalization ability of the classification model on the unseen validation and testing sets when the proposed augmentation techniques are used. In addition, different pre-trained classifiers are evaluated on the original small training set without augmentation, and the classifier that attains the best results is selected for the experiments with the proposed augmentation techniques.

To analyze the efficiency of the proposed augmentation techniques, the classification results are compared before and after extending the training set with the newly generated samples. Moreover, each proposed augmentation technique is first applied on its own to investigate its individual effect on the classification model and to select the best technique. Subsequently, combinations of these augmentation techniques are used to generate more diverse synthetic samples. The proposed data augmentation techniques showed better results than related studies that used other popular augmentation techniques, which demonstrates their superiority. It is also noteworthy that the proposed augmentation techniques can be applied to a wide range of tasks where a segmentation step is applicable.

In summary, we make the following contributions: 1) we introduce novel augmentation techniques named RegionInpaint and Cutoff, in addition to RegionMix augmentation; 2) to the best of our knowledge, we are the first to introduce new, effective augmentation techniques suited to medical tasks, as opposed to the existing popular augmentation techniques and GAN-based augmentation techniques; 3) we demonstrate that the VGGUNET network for segmentation achieves promising results compared to the other segmentation networks used; and 4) extensive experiments show the efficiency of the proposed augmentation techniques and their ability to significantly enhance generalization performance on unseen samples.

The rest of the paper is organized as follows: Section II briefly discusses studies related to our work. The full methodology and the details of each model are discussed in Section III, which is divided into four main parts (Preprocessing, Segmentation, Data Augmentation, and Classification). Section IV introduces the results and analysis for each method. Finally, Section V presents the conclusions of our study and future work.

SECTION II.

Related Work

Brain tumors are one of the most critical and fatal diseases. They affect and damage the central nervous system, which is responsible for most body functions. Many recent studies in the machine learning, image processing, and deep learning fields have introduced state-of-the-art techniques and methods, especially for classification, segmentation, and detection, to identify brain tumors in MRI images at an early stage. Most studies consider extending the training data by applying different data augmentation techniques to address the problem of limited and imbalanced data. Thus, in this paper, we focus on studies that have utilized different augmentation techniques. The remainder of this section briefly discusses recent related work on the identification of brain tumors in MRI images.

Asif et al. [14] proposed different transfer learning-based deep learning models for brain tumor detection in MRI images. First, they applied different preprocessing techniques, such as resizing images to a fixed size ($224\times 224\times 3$), normalizing the images to the range [0, 1], and cropping the brain area from the MRI image. Second, they used different augmentations, such as rotation, flipping, and translation, to increase the amount of training data. The preprocessed images were fed into different pre-trained models, such as DenseNet121, NASNet-Large, Xception, and InceptionResNetV2. Their experiments were conducted on two different datasets: the Br35H and SPMRI datasets. The Xception model performed better than the three other models on both datasets, with an accuracy of 99.67% on the Br35H dataset and 91.94% on the SPMRI dataset.

Younis et al. [28] proposed a deep learning method for brain tumor analysis using VGG-16 and ensemble learning approaches. They applied different preprocessing techniques, such as data normalization and thresholding, followed by a series of erosions and dilations to remove any small patches of noise, and then cropped the brain regions from the MRI images. Finally, they applied different data augmentation techniques such as shearing, rotation, and shifting. Three different approaches, including a custom CNN, VGG16, and an ensemble model, were tested on the SPMRI dataset and achieved accuracies of 96%, 98.5%, and 98.14%, respectively.

Ramtekkar et al. [29] proposed an optimized feature-selection method for accurate brain tumor detection using deep learning techniques. They worked on the small SPMRI dataset. For preprocessing, they used a compound filter, which is a combination of Gaussian, median, and mean filters. Image segmentation was then applied using thresholding and histogram techniques, and a gray-level co-occurrence matrix (GLCM) was used in the feature-extraction step. Subsequently, they used different optimization algorithms, such as particle swarm optimization, genetic optimization, whale optimization, and wolf optimization, for feature selection. Furthermore, the small dataset was augmented to reach 2318 samples. The best testing accuracy of 98.9% was achieved when using the whale optimization algorithm along with a custom CNN for classification.

Kang et al. [15] presented an approach for the classification of brain tumors in MRI images using an ensemble of deep features and machine learning classifiers. Their experiments were conducted on three different public datasets: BT-small-2c, BT-large-2c, and BT-large-4c. They cropped the brain region from the MRI images as a pre-processing step and then extended the datasets using flipping and rotation augmentation strategies. For classification, many different pre-trained CNN models were used for feature extraction together with different ML classifiers. In most cases, the SVM classifier with the RBF kernel outperformed the other ML classifiers. For feature extraction, the DenseNet-161 deep feature alone achieved the best results on the BT-small-2c dataset, the ensemble of InceptionV3, DenseNet-169, and ResNeXt-50 deep features achieved the best results on the BT-large-2c dataset, and the ensemble of ShuffleNetV2, DenseNet-169, and MnasNet deep features achieved the best results on the BT-large-4c dataset.

Sakib et al. [30] developed a deep CNN for brain tumor detection using MRI images. The SPMRI dataset was used in the study. The brain area was cropped from the MRI images, and normalization was applied to bring the intensity values into a stable range. Different augmentation techniques were applied to extend the limited data, including rotation, shifting, flipping, shearing, and brightness and darkness adjustment. Finally, a pre-trained VGG-16 network was used and achieved an accuracy of 96%.

Salama et al. [16] proposed a novel approach for brain tumor detection based on convolutional variational generative models. The experiments were conducted using the SPMRI dataset. The MRI images were resized to a fixed resolution of $224\times 224$, and min-max normalization was applied to map the intensity values to the range [0, 1]. They took advantage of generative models to extend the small existing dataset into a large balanced one by creating new synthetic samples. Finally, a pre-trained VGG-16 model was used as the classification model and achieved an accuracy of 96.88%.

Alsaif et al. [13] focused on developing a novel data augmentation-based brain tumor detection method using CNNs. They utilized the SPMRI dataset in their experiments. First, they applied different data augmentation techniques, such as flipping, rotation, and translation. The images were then fed to different pre-trained networks, including ResNet-50, ResNet-150, VGG16, VGG19, InceptionV3, and DenseNet121. The pre-trained VGG-16 network achieved the best results, with an accuracy of 96%.

Rai et al. [31] developed a novel LU-Net deep CNN model to detect brain abnormalities in MRI images. They used the SPMRI dataset in their experiments. Different preprocessing techniques were applied, such as converting the images to grayscale, cropping the brain region from the MRI image, and resizing the images to a fixed resolution of $224\times 224$. They generated 21 augmented images for each MRI image by applying basic transformations to extend the training dataset. Finally, different models, including Le-Net, VGG16, and LU-Net, were evaluated. The LU-Net model outperformed the other models with an overall accuracy of 98%.

SECTION III.

Methodology and Proposed Work

In this section, we illustrate the entire proposed method for classifying brain MRI images, as shown in Figure 1. The full methodology consists of four main steps: preprocessing the raw input MRI images, image segmentation, applying the proposed novel augmentation techniques, and finally the classification step. These steps are discussed in detail in the following sub-sections.

FIGURE 1. Block diagram of the full proposed method.

A. Preprocessing

Most brain MRI images contain unnecessary black pixels. Thus, the brain area is cropped from the MRI images, since the surrounding black area does not contain any relevant features for classification; in addition, this helps the CNN converge faster during training. The cropping process follows the steps shown in Figure 2. First, each input RGB image is smoothed with a $3\times 3$ Gaussian filter. The smoothed image is then converted to grayscale, and thresholding is applied to obtain a binary image. A set of erosion operations followed by dilation operations is applied to the binary image to remove noise. Subsequently, the object with the largest contour in the binary image (i.e., the brain) is extracted [32], and its four extreme points are computed. These points are used to crop the brain area and discard the surrounding black region. Finally, the cropped image is resized to a fixed resolution of $256\times 256$ using bicubic interpolation, and all input images are normalized to the range [0, 1] using min-max normalization.
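
The sketch below illustrates the cropping pipeline described above using OpenCV. It is a minimal sketch, not the authors' implementation: the function name, the threshold value of 45, and the kernel sizes and iteration counts are illustrative assumptions.

```python
import cv2
import numpy as np

def crop_brain_region(image_bgr, out_size=(256, 256)):
    """Crop the brain area from a raw MRI slice and normalize it to [0, 1]."""
    # 1) Smooth with a 3x3 Gaussian filter and convert to grayscale.
    blurred = cv2.GaussianBlur(image_bgr, (3, 3), 0)
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)

    # 2) Threshold to a binary image (fixed threshold of 45 is an assumption).
    _, binary = cv2.threshold(gray, 45, 255, cv2.THRESH_BINARY)

    # 3) Erosions followed by dilations to remove small noise blobs.
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.erode(binary, kernel, iterations=2)
    binary = cv2.dilate(binary, kernel, iterations=2)

    # 4) Keep the largest contour (the brain) and find its four extreme points.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    brain = max(contours, key=cv2.contourArea)
    left = tuple(brain[brain[:, :, 0].argmin()][0])
    right = tuple(brain[brain[:, :, 0].argmax()][0])
    top = tuple(brain[brain[:, :, 1].argmin()][0])
    bottom = tuple(brain[brain[:, :, 1].argmax()][0])

    # 5) Crop the bounding region, resize with bicubic interpolation, normalize.
    cropped = image_bgr[top[1]:bottom[1], left[0]:right[0]]
    resized = cv2.resize(cropped, out_size, interpolation=cv2.INTER_CUBIC)
    return resized.astype(np.float32) / 255.0
```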

FIGURE 2. Preprocessing steps for cropping the brain area from MRI images.

B. Segmentation

Image segmentation is a crucial part of our study. It is a preprocessing step needed for applying the proposed augmentation techniques: its main objective is to produce masks for the input training images, and these masks serve as the starting point for the proposed augmentation techniques. In this section, we discuss the segmentation models and the loss functions used.

U-Net [23] is one of the most popular and powerful convolutional neural networks; it was first introduced in 2015 for the segmentation of biomedical images and has since been used in many different segmentation tasks, such as medical image segmentation, self-driving cars, and satellite image segmentation. The network consists of two parts: an encoder followed by a decoder. The encoder is responsible for generating high-level semantic features from the input image through a sequence of encoder blocks, each consisting of a set of convolution and max-pooling operations. The decoder is responsible for mapping the dense high-level features generated by the encoder into the desired segmentation mask through a sequence of decoder blocks, each consisting of a set of up-sampling and convolution operations. Skip connections are utilized in the U-Net architecture to combine the encoder features with the features at the corresponding resolution in the decoder, which refines the segmentation results; this also allows the network to recover spatial information lost during downsampling and to enhance the fine-grained details learned by the encoder.

VGGUNET is similar to U-Net; however, it uses a VGG16 network pre-trained on the ImageNet dataset [33], without its fully connected layers, as its encoder. The main advantage of replacing the default U-Net encoder with a pre-trained network is that it enables the segmentation model to produce better features, thereby enhancing the overall segmentation results; it also helps the model converge faster. Some recent studies have likewise replaced the default encoder of the segmentation model with different versions of the VGG architecture [34], [35]. Figure 3 shows the full VGGUNET network. The VGGUNET encoder consists of 13 convolutional layers and 5 max-pooling layers. Each convolutional layer uses a $3\times 3$ kernel followed by a ReLU activation, whereas each max-pooling layer reduces the feature-map resolution by a factor of 2. The dense feature map produced by the last encoder layer is passed to the decoder, which reconstructs the segmentation mask corresponding to the given input image. The decoder consists of a set of convolution and transposed convolution operations. Each convolution uses a $3\times 3$ kernel followed by a BN layer and a ReLU activation, whereas each transposed convolution uses a $2\times 2$ kernel with a stride of 2 to double the resolution of the feature map. In addition, skip connections are used to concatenate the encoder features with their corresponding features in the decoder.

FIGURE 3. VGGUNET segmentation network.

Finally, a $1\times 1$ convolution with a single filter is applied, followed by a sigmoid activation, to produce a binary mask with the same resolution as the input image. Each pixel in the predicted mask indicates whether it belongs to a tumor or not.
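
The following Keras sketch shows one way to assemble a VGGUNET-style network from a pre-trained VGG16 encoder and a U-Net-style decoder. It is an illustration under assumptions, not the paper's exact model: the choice of skip-connection layers, the decoder filter counts, and the use of the final pooling layer as the bottleneck are plausible guesses consistent with the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by batch normalization and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x

def build_vggunet(input_shape=(256, 256, 3)):
    vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                      input_shape=input_shape)
    # Encoder feature maps used as skip connections (standard VGG16 layer names).
    skips = [vgg.get_layer(n).output for n in
             ("block1_conv2", "block2_conv2", "block3_conv3",
              "block4_conv3", "block5_conv3")]
    x = vgg.get_layer("block5_pool").output  # 8x8 bottleneck for a 256x256 input

    # Decoder: 2x2 transposed convolutions (stride 2) plus skip concatenation.
    for skip, filters in zip(reversed(skips), (512, 512, 256, 128, 64)):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, filters)

    # Final 1x1 convolution with a sigmoid to produce the binary tumor mask.
    mask = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(vgg.input, mask)
```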

The loss function used to optimize the VGGUNET segmentation model is a combination of two losses (the binary cross-entropy and Dice loss functions), defined as follows:
\begin{equation*} \text{Total loss} = \text{Dice loss} + 0.1 \cdot \text{BCE loss} \tag{1}\end{equation*}

The BCE loss is widely used in binary classification tasks as well as image segmentation tasks, where it optimizes the network to correctly classify each pixel individually; hence, it works well for the pixel-level classification of image segmentation tasks. It is defined as follows:
\begin{equation*}\text{BCE loss}=-\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\log\hat{y}_{i}+(1-y_{i})\log(1-\hat{y}_{i})\right) \tag{2}\end{equation*}

where $y_{i}$ denotes the actual value for each pixel and $\hat{y}_{i}$ denotes the predicted value for each pixel.

The Dice loss function is primarily based on the Dice coefficient, one of the most common metrics used to assess the performance of segmentation models. It is an overlap measure, very similar to the IoU metric, in that both quantify the similarity between the ground-truth mask and the predicted one; it was later slightly adapted for use as a loss function [36]. The Dice loss and Dice coefficient are defined in Equations (3) and (4), respectively:
\begin{align*} \text{Dice loss}&=1-\text{Dice coefficient} \tag{3}\\ \text{Dice coefficient}&=\frac{2\, y\, \hat{y}+\varepsilon}{y+\hat{y}+\varepsilon} \tag{4}\end{align*}

where $y$ denotes the actual output, $\hat{y}$ denotes the predicted output, and $\varepsilon$ is used to avoid division by zero (i.e., $y=\hat{y}=0$). The Dice loss is used to optimize the Dice coefficient; the two are inversely related, so the model is penalized little when the Dice coefficient approaches 1, which indicates a high similarity between the original mask and the predicted mask, and vice versa.
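
A minimal TensorFlow sketch of the combined segmentation objective in Equation (1) is shown below, assuming a soft Dice formulation over all pixels in the batch; the epsilon value and the reduction over the batch are assumptions.

```python
import tensorflow as tf

def dice_coefficient(y_true, y_pred, eps=1e-6):
    """Soft Dice coefficient over all pixels (Eq. 4)."""
    y_true = tf.reshape(y_true, [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)

def total_segmentation_loss(y_true, y_pred):
    """Dice loss (Eq. 3) plus 0.1 * pixel-wise binary cross-entropy (Eqs. 1-2)."""
    dice_loss = 1.0 - dice_coefficient(y_true, y_pred)
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return dice_loss + 0.1 * bce
```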

C. Data Augmentation

Data augmentation techniques are used to expand the training data by generating new synthetic samples from the existing ones. Thus, it helps to reduce overfitting and increases the generalization performance of the classification model for unseen samples. In this study, we introduce new effective augmentation techniques called RegionInpaint augmentation, Cutoff augmentation, and RegionMix augmentation. In addition, basic data augmentation techniques are used. These techniques are discussed in detail in the following sub-sections.

1) RegionInpaint Augmentation

Image inpainting is the task of reconstructing missing pixels in an image; the missing pixels are filled in a realistic-looking way to form a complete image. Many recent studies [37], [38], [39] have introduced novel image inpainting techniques that show promising results. This study introduces a novel augmentation technique based on the idea of image inpainting. Figure 4 illustrates the steps involved in applying the proposed technique. First, segmentation is applied to the training images, as discussed in the previous section.

FIGURE 4. Proposed RegionInpaint augmentation technique (the upper part generates the binary mask, which is then fed along with the tumor image to the inpainting network to generate the augmented image, as shown in the lower part).

The goal of segmentation is to generate binary masks for the training images in which the white pixels correspond to the tumor area and the black pixels correspond to the non-tumor area. Thereafter, a set of dilation and erosion operations is applied to the generated masks to fill any holes that may result from segmentation. These binary masks are then inverted so that the black pixels correspond to the tumor area. Finally, each inverted mask is fed to the inpainting network along with its corresponding original image; the inverted mask serves as the input mask for the inpainting network.

The goal of the inpainting network is to fill in the pixels of the input image that correspond to the black pixels in the given mask, i.e., to fill the tumor area with realistic non-tumor tissue. In this manner, the data are augmented by transforming tumor images into non-tumor images, thereby increasing the number of samples in the “non-tumor” class. The inpainting method used in the proposed augmentation is primarily based on the method introduced by Liu et al. [37]. The image inpainting network is depicted in Figure 5.
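
The sketch below summarizes the mask-preparation and inpainting step as described above. The `segmentation_model` and `inpainting_model` objects, their input/output conventions, the 0.5 binarization threshold, and the morphology kernel are all illustrative assumptions, not the authors' exact implementation.

```python
import cv2
import numpy as np

def regioninpaint_augment(tumor_image, segmentation_model, inpainting_model):
    """Turn a tumor image into a synthetic non-tumor image via inpainting."""
    # 1) Predict the tumor mask (values in [0, 1]) and binarize it.
    mask = segmentation_model.predict(tumor_image[None, ...])[0, ..., 0]
    mask = (mask > 0.5).astype(np.uint8)

    # 2) Dilation followed by erosion to fill small holes left by segmentation.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=2)
    mask = cv2.erode(mask, kernel, iterations=2)

    # 3) Invert the mask: black (0) now marks the tumor pixels to be filled.
    inpaint_mask = 1.0 - mask.astype(np.float32)
    inpaint_mask = np.repeat(inpaint_mask[..., None], 3, axis=-1)

    # 4) Feed the image and inverted mask to the inpainting network, which
    #    replaces the tumor region with realistic-looking healthy tissue.
    filled = inpainting_model.predict([tumor_image[None, ...], inpaint_mask[None, ...]])[0]
    return filled
```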

FIGURE 5. Image inpainting network (zoom in for a better view).

Similar to the U-Net architecture, the inpainting network is designed in an encoder-decoder fashion with skip connections between the encoder and decoder layers. The main differences are that the inpainting network uses partial convolutions instead of standard convolutions, and that the input image is stacked with its corresponding input mask before being fed into the network. The encoder comprises a set of partial convolution layers, each using a stride of $2\times 2$ to downsample the feature map by a factor of 2. Each partial convolution is followed by a batch normalization layer and a ReLU activation, except for the first partial convolution layer, which is followed only by a ReLU activation. The decoder comprises a set of up-sampling layers. Each up-sampling layer is followed by concatenating the feature map and its mask with their corresponding ones from the encoder, after which a partial convolution with a stride of $1\times 1$ is applied, followed by a batch normalization layer and a Leaky ReLU (alpha = 0.2) activation. Finally, a $1\times 1$ convolution with three filters is applied, followed by a sigmoid activation, to obtain the reconstructed image of size $H\times W\times 3$. The partial convolution layer and the loss functions used in the network are discussed in detail below.

The partial convolution layer performs two steps. The first step applies convolution using only the valid (non-missing) pixels, which is called partial convolution. The second step is the mask-update mechanism. Standard convolution considers all pixels in the sliding window, including the missing ones; it is therefore replaced with partial convolution, in which only valid pixels are considered, resulting in much better image quality. The partial convolution operation is expressed by the following equation:
\begin{align*}X^{\prime}=\begin{cases} W^{T}\left(X\odot M\right)\dfrac{\mathrm{sum}(1)}{\mathrm{sum}(M)}+b, & \text{if } \mathrm{sum}(M)>0 \\ 0, & \text{otherwise} \end{cases} \tag{5}\end{align*}

where $W$ refers to the kernel weights and $b$ to the kernel bias. $X$ represents the feature-map values (pixel values) at a specific sliding-window location and $M$ is the corresponding binary mask. $\odot$ is an element-wise multiplication that takes into account only the valid pixels, i.e., those marked with 1 in the mask. $\mathrm{sum}(1)$ is the sum of a matrix of ones with the same size as the kernel (sliding window), and $\mathrm{sum}(M)$ is the sum of all valid (non-missing) pixels in the sliding window. The scaling factor $\mathrm{sum}(1)/\mathrm{sum}(M)$ adjusts for the varying number of valid pixels in each sliding window.

The second step is the mask-update mechanism, expressed by the following equation:
\begin{align*}m^{\prime}=\begin{cases} 1, & \text{if } \mathrm{sum}(M)>0 \\ 0, & \text{otherwise} \end{cases} \tag{6}\end{align*}

In the update step, if the current convolution was conditioned on at least one valid pixel, the corresponding location in the mask is marked as valid for the next partial convolution layer; otherwise, it remains marked as missing. After applying the mask-update step several successive times, the mask eventually becomes all ones, provided the input contained any valid pixels.
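
To make Equations (5) and (6) concrete, the following NumPy sketch evaluates them for a single sliding-window position; a real partial convolution layer repeats this at every spatial location and channel. The function name, shapes, and example values are assumptions for illustration only.

```python
import numpy as np

def partial_conv_window(X, M, W, b):
    """X, M: window of features and binary mask (same shape); W: kernel; b: bias."""
    valid = M.sum()
    if valid > 0:
        # Convolve only the valid pixels and rescale by sum(1)/sum(M)  (Eq. 5).
        x_out = (W * (X * M)).sum() * (M.size / valid) + b
        m_out = 1.0                                        # mask update (Eq. 6)
    else:
        x_out, m_out = 0.0, 0.0
    return x_out, m_out

# Example: a 3x3 window where only 4 of the 9 pixels are valid.
X = np.random.rand(3, 3)
M = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=float)
W = np.ones((3, 3)) / 9.0
print(partial_conv_window(X, M, W, b=0.0))
```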

A combination of different loss functions is used in order to optimize the inpainting network parameters. These loss functions are given below.

Pixel-wise loss: This loss improves the pixel-wise reconstruction. The pixel-wise (L1) loss is computed using two equations: Equation (7) computes the loss for the reconstructed valid (non-hole) pixels and Equation (8) computes the loss for the reconstructed missing pixels.
\begin{align*}L_{valid}&=\frac{1}{N}\left\|M\odot(I_{out}-I_{gt})\right\|_{1} \tag{7}\\ L_{hole}&=\frac{1}{N}\left\|(1-M)\odot(I_{out}-I_{gt})\right\|_{1} \tag{8}\end{align*}

where $M$ is the input binary mask, $I_{out}$ is the reconstructed image generated by the inpainting network, $I_{gt}$ is the original ground-truth image, and $N$ is the number of elements in the original image ($N=C\times H\times W$).

Perceptual loss: The goal of the perceptual loss [40], [41] is to measure perceptual similarity by passing both the original ground-truth image and the reconstructed image through a pre-trained deep neural network (i.e., VGG16). Instead of minimizing a pixel-wise loss, it minimizes the $L^{1}$ distance between the semantic features of the original image and the reconstructed image. The perceptual loss is defined as follows:
\begin{align*} L_{perceptual}&=\sum_{n=0}^{N}\frac{\left\|\psi_{n}\left(I_{out}\right)-\psi_{n}\left(I_{gt}\right)\right\|_{1}}{N_{\psi_{n}}}+\sum_{n=0}^{N}\frac{\left\|\psi_{n}\left(I_{comp}\right)-\psi_{n}\left(I_{gt}\right)\right\|_{1}}{N_{\psi_{n}}} \tag{9}\end{align*}

where $I_{comp}$ is the same as $I_{out}$, except that the valid pixels in $I_{comp}$ are replaced with their corresponding pixels from the original ground-truth image; $\psi_{n}$ refers to the feature map at layer $n$ of the pre-trained VGG16 model, and $N_{\psi_{n}}$ is the number of elements in the feature map $\psi_{n}$. The layers used for this loss are pool1, pool2, and pool3. The first term minimizes the semantic differences between the original ground-truth image and the reconstructed one, whereas the second term focuses only on minimizing the semantic differences between the reconstructed missing pixels and their corresponding pixels in the original image.

Style loss: It is similar to the perceptual loss [40], [41], [42], as it is also computed from the feature maps generated by a pre-trained model (VGG16). The difference is that the style loss applies an autocorrelation (Gram matrix) to each feature map before taking the $L^{1}$ distance. The Gram matrix captures style information such as textures and colors and is defined as:
\begin{equation*}GM\left(X\right)=K_{n}\left(\psi_{n}\left(X\right)^{T}\psi_{n}(X)\right) \tag{10}\end{equation*}

where $K_{n}$ is a normalization factor ($K_{n}=\frac{1}{C_{n} H_{n} W_{n}}$, with $C_{n}, H_{n}, W_{n}$ the dimensions of the feature map at the $n$th layer).

$\psi_{n}(X)$ refers to the feature map at layer $n$ of the pre-trained VGG16 model when $X$ is the input image, with shape $(H_{n}, W_{n}, C_{n})$. The first two dimensions are flattened together, so the reshaped $\psi_{n}(X)$ has shape $(H_{n}W_{n}, C_{n})$ and its transpose $\psi_{n}(X)^{T}$ has shape $(C_{n}, H_{n}W_{n})$. The output $GM(X)$, computed as the dot product of these two matrices, has shape $(C_{n}, C_{n})$ and represents the autocorrelation of the feature map at that layer. The style loss is defined by the following equations:
\begin{align*} L_{style(out)}&=\sum_{n=0}^{N-1}\frac{1}{C_{n}^{2}}\left\|GM(I_{out})-GM(I_{gt})\right\|_{1} \tag{11}\\ L_{style(comp)}&=\sum_{n=0}^{N-1}\frac{1}{C_{n}^{2}}\left\|GM(I_{comp})-GM(I_{gt})\right\|_{1} \tag{12}\end{align*}

Equation (11) minimizes the distance between the style of the original ground truth and the reconstructed image, whereas Equation (12) focuses only on minimizing the distance between the style of the reconstructed missing pixels and their corresponding pixels in the original image.

Total variation loss (TV loss): The total variation loss encourages the network to reduce noise in the resulting image [41]. It sums the absolute differences between each pixel and its neighbors to ensure the smoothness of the missing pixels reconstructed by the inpainting network. It is defined by the following equation:
\begin{align*} L_{tv}&=\sum_{\left(i,j\right)\in P,\left(i,j+1\right)\in P}\frac{\left\|I_{comp}^{i,j+1}-I_{comp}^{i,j}\right\|_{1}}{N_{I_{comp}}}+\sum_{\left(i,j\right)\in P,\left(i+1,j\right)\in P}\frac{\left\|I_{comp}^{i+1,j}-I_{comp}^{i,j}\right\|_{1}}{N_{I_{comp}}} \tag{13}\end{align*}

where $N_{I_{comp}}$ is the number of pixels in $I_{comp}$.

The total loss is a weighted combination of all the discussed losses, with the weights given by Liu et al. [37]:
\begin{align*} L_{total}&=L_{valid}+6L_{hole}+0.05L_{perceptual}+120\left(L_{style(out)}+L_{style(comp)}\right)+0.1L_{tv} \tag{14}\end{align*}
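
The short sketch below shows the pixel-wise terms (Equations 7-8) and the weighted total in Equation (14), assuming the perceptual, style, and TV terms have already been computed elsewhere as scalar tensors; the function names are illustrative.

```python
import tensorflow as tf

def pixel_losses(mask, i_out, i_gt):
    """L1 losses over valid (non-hole) and hole pixels, normalized by image size."""
    n = tf.cast(tf.size(i_gt), tf.float32)
    l_valid = tf.reduce_sum(tf.abs(mask * (i_out - i_gt))) / n
    l_hole = tf.reduce_sum(tf.abs((1.0 - mask) * (i_out - i_gt))) / n
    return l_valid, l_hole

def total_inpainting_loss(l_valid, l_hole, l_perc, l_style_out, l_style_comp, l_tv):
    """Weights of 1, 6, 0.05, 120 and 0.1 as given by Liu et al. [37] (Eq. 14)."""
    return (l_valid + 6.0 * l_hole + 0.05 * l_perc
            + 120.0 * (l_style_out + l_style_comp) + 0.1 * l_tv)
```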

2) Cutoff Augmentation

The proposed Cutoff augmentation selects two images at random, one from the “tumor” class and the other from the “non-tumor” class. Figure 6 illustrates the steps of this augmentation technique. The first step applies segmentation to the tumor image to obtain its corresponding mask, in which the white pixels represent the tumor region. Afterwards, a set of dilations and erosions is applied to the predicted mask to fill any small holes that may result from the segmentation step. The predicted mask is then superimposed on the original image to obtain the segmented tumor. This segmented tumor is copied onto the non-tumor image to obtain a new augmented image; using this approach, we are able to increase the number of images in the “tumor” class. Finally, a Gaussian blur filter is applied to the resulting image so that the copied tumor blends with its background. Moreover, different transformations, such as rotation, flipping, and brightness and contrast adjustment, are applied so that the tumor in the resulting image looks different from the tumor in the original image, thereby increasing the variety of the training samples.
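
The sketch below illustrates the copy-paste core of Cutoff augmentation. The segmentation model interface, the blur kernel size, the binarization threshold, and the single flip used in place of the full set of transforms are all illustrative assumptions.

```python
import cv2
import numpy as np

def cutoff_augment(tumor_image, non_tumor_image, segmentation_model):
    """Copy the segmented tumor onto a non-tumor image to create a new tumor sample."""
    # 1) Segment the tumor and clean the mask with dilation/erosion.
    mask = segmentation_model.predict(tumor_image[None, ...])[0, ..., 0]
    mask = (mask > 0.5).astype(np.uint8)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=2)
    mask = cv2.erode(mask, kernel, iterations=2)

    # 2) Superimpose the mask on the tumor image to isolate the tumor pixels.
    mask3 = np.repeat(mask[..., None], 3, axis=-1).astype(np.float32)
    segmented_tumor = tumor_image * mask3

    # 3) Copy the segmented tumor into the non-tumor image.
    augmented = non_tumor_image * (1.0 - mask3) + segmented_tumor

    # 4) Gaussian blur so the pasted tumor blends with its new background,
    #    followed by simple transforms (only a horizontal flip is shown here).
    augmented = cv2.GaussianBlur(augmented.astype(np.float32), (3, 3), 0)
    if np.random.rand() < 0.5:
        augmented = np.fliplr(augmented)
    return augmented
```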

FIGURE 6. Proposed Cutoff augmentation technique (the upper part generates the segmented tumor, which is then copied onto the non-tumor image to generate the augmented image, as shown in the lower part).

3) RegionMix Augmentation

We introduce a new, effective augmentation technique called RegionMix. It is based mainly on segmentation and the Mixup approach [43], in which the region of interest (i.e., the tumor region) is first extracted from a tumor image through a segmentation network and then mixed with a non-tumor image. The Mixup approach was introduced by Zhang et al. [43] and has since been widely used in different deep learning tasks such as segmentation, image recognition, natural language processing, and speech recognition. Mixup also has the advantage of being data-agnostic, as it works with different types of data, such as images, text, speech, or any other source of data.

The main idea of Mixup is to extend the training distribution by generating new samples that are linear interpolations of existing training samples and their corresponding labels. It also acts as a regularization technique, as it helps reduce overfitting and increases the generalization and robustness of the model. In short, Mixup generates new samples as a weighted linear combination of random image pairs from the training set, as shown in Figure 7. The Mixup process is defined by the following equations:
\begin{align*} \overline{x}&=\lambda x_{i}+\left(1-\lambda\right)x_{j} \tag{15}\\ \overline{y}&=\lambda y_{i}+\left(1-\lambda\right)y_{j} \tag{16}\end{align*}

where $(x_{i}, x_{j})$ is the pair of random images selected from the training set, $(y_{i}, y_{j})$ are the corresponding one-hot encoded labels of the two selected samples $x_{i}$ and $x_{j}$, $\lambda \in [0,1]$ is the weighting hyperparameter, sampled from a beta distribution, that controls the contribution of each image, and $(\overline{x}, \overline{y})$ are the newly generated mixed image and its corresponding label, respectively.

FIGURE 7. Mixup approach.

Figure 8 illustrates the steps for applying the proposed RegionMix augmentation technique. Similar to the Cutoff augmentation technique, two random images are selected where the first image belongs to the “tumor class” and the second one belongs to the “non-tumor class”. The segmented tumor from the first image is mixed with the pixels that share the same tumor location in the second non-tumor image. The newly generated image is considered in-between both classes. Finally, different transformations are applied to the resulting image.
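
A minimal sketch of this mixing step is given below, assuming the tumor mask has already been produced by the segmentation network. The Beta(alpha, alpha) sampling of lambda and the scalar soft label follow the general Mixup formulation [43]; they are assumptions, not details stated by the paper.

```python
import numpy as np

def regionmix_augment(tumor_image, non_tumor_image, tumor_mask, alpha=0.4):
    """Blend the tumor region into a non-tumor image; returns image and soft label."""
    lam = np.random.beta(alpha, alpha)          # mixing weight sampled from Beta
    mask3 = np.repeat(tumor_mask[..., None], 3, axis=-1).astype(np.float32)

    # Outside the tumor region the non-tumor image is kept unchanged; inside it,
    # the tumor pixels and the co-located non-tumor pixels are linearly mixed.
    mixed_region = lam * tumor_image + (1.0 - lam) * non_tumor_image
    augmented = mask3 * mixed_region + (1.0 - mask3) * non_tumor_image

    # The label is interpolated in the same way (tumor = 1, non-tumor = 0).
    label = lam * 1.0 + (1.0 - lam) * 0.0
    return augmented, label
```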

FIGURE 8. Proposed RegionMix augmentation technique (the upper part generates the segmented tumor; the Mixup approach is then applied between the segmented tumor region and the non-tumor image, as shown in the lower part).

4) Basic Data Augmentation

In this work, basic data augmentation techniques are also used to obtain more images in each class. This is done simply by applying different transformations to the input images to generate new ones. Transformations that are applicable to medical image classification tasks are used, including horizontal and vertical flipping, image rotation, and brightness and contrast adjustment. Figure 9 shows a sample of the augmented images obtained using these basic transformations.
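
A minimal sketch of such basic transforms using tf.image is shown below; the parameter ranges, and the use of 90-degree rotation multiples in place of arbitrary-angle rotation, are assumptions for illustration.

```python
import tensorflow as tf

def basic_augment(image):
    """Apply random basic transforms to a single [0, 1] image tensor."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.rot90(image, k=tf.random.uniform([], 0, 4, dtype=tf.int32))
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return tf.clip_by_value(image, 0.0, 1.0)
```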

FIGURE 9. Sample brain tumor MRI image and its corresponding augmentation results: (A) original MRI, (B) vertical flipping, (C) brightness adjustment, and (D) rotation.

D. Classification

For the classification of brain MRI images, different pre-trained CNNs are evaluated, including VGG16 [22], VGG19 [22], ResNet50 [44], InceptionV3 [45], DenseNet121 [46], Xception [47], and MobileNetV2 [48]. First, these models are trained on the original training set only, without augmentation. The model with the best performance on the validation and testing sets is selected; it is considered the main classification model and is further used along with the proposed augmentation techniques. Figure 10 shows the architecture of the VGG19 network, which achieved the best performance. The input image size for the network is $256\times 256\times 3$. VGG19 comprises 16 convolution layers and 5 max-pooling layers. Each convolution layer uses a $3\times 3$ kernel with a stride of 1, followed by a ReLU activation, which adds non-linearity to the network. Each max-pooling layer decreases the resolution of the feature map by a factor of 2. We replaced the fully connected (dense) layers of the network with a global average pooling layer [49], which takes the spatial average of the feature maps and acts as a form of regularization. Global average pooling is in some cases preferable to fully connected layers, as it is less prone to overfitting and increases the generalization of the model; it also significantly reduces the number of parameters and the computational complexity of the model. Finally, a sigmoid layer classifies the resulting feature vector into the “tumor” or “non-tumor” class.
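
The following Keras sketch shows one way to build the described classifier: a pre-trained VGG19 backbone, a global average pooling layer in place of the dense layers, and a single sigmoid unit. The learning rate and compile settings are assumptions; Section IV only states that the Adam optimizer with a mini-batch size of 32 is used.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_vgg19_classifier(input_shape=(256, 256, 3)):
    base = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                       input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)   # replaces the FC layers
    output = layers.Dense(1, activation="sigmoid")(x)  # tumor vs. non-tumor
    model = Model(base.input, output)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # assumed learning rate
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```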

FIGURE 10. VGG19 network classifier.

SECTION IV.

Experiments & Results

All the experiments in this study are conducted on NVIDIA Tesla P100 GPU using TensorFlow and Keras frameworks. The next sub-sections provide all the details and discussions about the datasets used, the considered evaluation metrics, the segmentation results, the inpainting results, and the classification results.

A. Datasets

We performed our experiments using two different brain MRI datasets. The first is a small public dataset released on Kaggle in 2020 [26]; for simplicity, we refer to it as the SPMRI dataset. It consists of 253 images distributed between two classes: the “tumor” class contains 98 images and the “non-tumor” class contains 155 images.

The second dataset (the Br35H dataset) is also a public dataset released on Kaggle in 2020 [27]. It has the same two classes as the SPMRI dataset and consists of 3000 images distributed equally between the two classes, i.e., 1500 images per class. This dataset is more diverse and has many more images than the SPMRI dataset.

To ensure that the proposed augmentations enable the classification model to generalize better on unseen samples, we use two sets for evaluating the classification models (i.e., the validation and testing sets). The training set comprises 80% of the SPMRI dataset, whereas the validation set comprises the remaining 20% of the SPMRI samples. It should be noted that the validation set used in our experiments is completely unseen during training. The testing set comprises the samples of the Br35H dataset. We deliberately use a tiny training set to examine how well the classification model generalizes to both the validation and testing sets when the small training set is extended with new synthetic samples generated by the proposed augmentation techniques. In addition, we use the Br35H dataset for evaluation alongside the validation set in order to evaluate the model on a larger dataset with a distribution different from that of the SPMRI dataset; in this way, we can ensure that the proposed augmentation techniques enable the classification model to generalize well even to new samples from a different distribution. Table 1 shows the distribution of the training, validation, and testing sets.

TABLE 1. Distribution of the Training, Validation, and Testing Sets

B. Evaluation Metrics

To assess the effectiveness of the classification and segmentation models, four metrics are used: precision, recall, F1-score, and overall accuracy. These metrics are derived from the normalized confusion matrices, which are constructed from the four elements True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN), according to the following equations:
\begin{align*} Precision&=\frac{TP}{TP+FP} \tag{17}\\ Recall&=\frac{TP}{TP+FN} \tag{18}\\ F1\text{-}score&=\frac{2\cdot Precision\cdot Recall}{Precision+Recall} \tag{19}\\ Overall\ accuracy&=\frac{TP+TN}{TP+FP+TN+FN} \tag{20}\end{align*}

where TP refers to the number of samples correctly classified as positive and TN to the number of samples correctly classified as negative; FP is the number of samples incorrectly classified as positive and FN the number of samples incorrectly classified as negative.
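
Equations (17)-(20) translate directly into the short sketch below; the example counts are hypothetical and only illustrate the arithmetic.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F1-score, and overall accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy

# Example with hypothetical counts: 290 TP, 300 TN, 10 FP, 5 FN.
print(classification_metrics(290, 300, 10, 5))
```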

To evaluate the performance of the image inpainting model, two metrics are considered: the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM). PSNR is widely used for image reconstruction tasks. It is derived from the MSE, which compares the images pixel by pixel, and represents the ratio between the maximum possible power of the image signal and the power of the noise that distorts the image. A higher PSNR value indicates a better reconstruction relative to the original image. The PSNR metric is calculated as follows:
\begin{equation*} PSNR=20\cdot\log_{10}\left(\frac{MAX_{I}}{\sqrt{MSE}}\right) \tag{21}\end{equation*}

where $MAX_{I}$ is the maximum pixel intensity value.

The SSIM metric [50] is widely used to determine the similarity between two images. It is considered a perceptual metric, as it correlates with human visual system (HVS) perception. An SSIM value approaching 1 indicates that the reconstructed image is almost identical to the original image. The SSIM is calculated as follows:
\begin{equation*} SSIM\left(x,y\right)=\frac{\left(2\mu_{x}\mu_{y}+c_{1}\right)\left(2\sigma_{xy}+c_{2}\right)}{(\mu_{x}^{2}+\mu_{y}^{2}+c_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+c_{2})} \tag{22}\end{equation*}

where $\mu_{x}$, $\mu_{y}$ are the mean intensities of the two images, $\sigma_{x}^{2}$, $\sigma_{y}^{2}$ are their variances, $\sigma_{xy}$ is the covariance between the two images’ intensities, and $c_{1}$ and $c_{2}$ are constants that stabilize the division with a weak denominator.
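
Both metrics are available in TensorFlow, as the sketch below shows; it assumes the original and reconstructed images are float tensors of shape (H, W, C) with values in [0, 1].

```python
import tensorflow as tf

def inpainting_metrics(original, reconstructed):
    """PSNR (Eq. 21) and SSIM (Eq. 22) for a single reconstructed image."""
    psnr = tf.image.psnr(original, reconstructed, max_val=1.0)
    ssim = tf.image.ssim(original, reconstructed, max_val=1.0)
    return float(psnr), float(ssim)
```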

C. Segmentation Results

This section presents the configurations and results of the segmentation models. As mentioned previously, the main purpose of segmentation is to obtain segmentation masks for the training images; these masks are needed as an initial step for applying the proposed augmentation techniques. Different segmentation models are evaluated, namely U-Net [23], VGGUNET, SegNet [24], and ResUNet [25]. In the Br35H dataset, only 800 of the 3000 images are annotated with corresponding segmentation masks. These 800 annotated samples are used to train and evaluate the segmentation models: 500 images are used for training, 200 for validation, and the remaining 100 for testing. Table 2 lists the configurations used to train each segmentation model; the hyperparameters in Table 2 are tuned and selected based on the results achieved on the validation set. Table 3 shows the validation and testing results for each model using different evaluation metrics, including accuracy, precision, recall, and Dice coefficient score (DSC).

TABLE 2. The Configurations Used for Training the Segmentation Models

TABLE 3. Comparison Between the Performance of the Utilized Segmentation Models

The Dice coefficient follows Equation (4), while precision and recall are calculated using Equations (17) and (18), respectively. As shown in Table 3, U-Net achieved results comparable to VGGUNET; however, the VGGUNET model outperformed all other models, achieving the best results on all evaluation metrics. This is due to the use of a pre-trained VGG-16 as an encoder, which enabled the model to converge much faster and learn better feature representations. Since VGGUNET achieved the best results among the evaluated segmentation models, it is used to predict the segmentation masks for the training set of the SPMRI dataset as an initial step for applying the proposed augmentation techniques. Figure 11 shows random sample images from the Br35H dataset along with their corresponding annotation masks, and Figure 12 shows the VGGUNET prediction masks for some sample images from the SPMRI dataset.

FIGURE 11. Sample images from the Br35H dataset with their corresponding annotations.

FIGURE 12. Sample VGGUNET predictions on the SPMRI dataset.

D. Inpainting Results

1) Dataset Preparation and Training

RegionInpaint augmentation is one of the proposed techniques used to augment the training set of the SPMRI dataset in preparation for classification (see Section III-C.1).

Prior to augmentation, the inpainting network is first trained using the 155 images of the “non-tumor” class. These 155 images are split as follows: 100 images are used for training and the remaining 55 for validation. For training, five different random masks are generated for each training image, which enables the model to learn to fill different missing areas depending on each random mask; hence, a total of 500 image-mask pairs are used to train the inpainting model. Moreover, during training, each image has a 50% chance of being flipped to increase the variety of the images fed to the network. Two different techniques are used to generate the random masks.

The first technique generates masks with random small circles, ellipses, and lines of different sizes, as shown in the examples in Figure 13 (3rd and 4th rows). The second technique generates masks using random circles only, with varying sizes, as shown in the examples in Figure 13 (1st and 2nd rows). The first technique enables the model to learn how to fill small missing parts of different shapes with appropriate pixels based on the known surrounding context. Although this technique is used in most deep learning-based approaches to image inpainting, the second technique is more relevant to our study, because brain tumor regions tend to look like circles of varying radius; it therefore helps the model become more robust when filling in missing parts that resemble tumors of varying sizes.
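
The sketch below illustrates the second, circle-only mask-generation strategy described above; the radius range and the maximum number of circles per mask are assumptions, not the authors' settings.

```python
import cv2
import numpy as np

def random_circle_mask(height=256, width=256, max_circles=5):
    """Return a float mask where 1 marks known pixels and 0 marks holes to fill."""
    mask = np.ones((height, width), np.float32)
    for _ in range(np.random.randint(1, max_circles + 1)):
        center = (np.random.randint(0, width), np.random.randint(0, height))
        radius = np.random.randint(10, 60)                    # tumor-like sizes
        cv2.circle(mask, center, radius, 0.0, thickness=-1)   # filled circle
    return mask
```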

FIGURE 13. Sample of the generated masks for image inpainting.

Another important note is that inpainting models can easily make remarkable progress in restoring small missing holes in an image with appropriate pixels. However, when the holes become larger, the filled content begins to suffer from blurry textures and distortion due to the large gap between the known and unknown pixels; in such cases, the model tends to replace the large hole with a blurry area instead of an appropriate visual area. Accordingly, combining both mask-generation techniques allows the model to converge faster and learn a more natural filling of the missing areas/holes regardless of their size.

2) Results of Image Inpainting

This section presents the results and configurations of the image inpainting model. The inpainting network is trained using a combination of different losses (see Section III-C.1). The training setup is shown in Table 4. The Adam optimizer [51] is used to update the network weights, with a learning rate of 1e-5 and a batch size of 4. The model is pre-trained on the ImageNet dataset [33] and fine-tuned for 100 epochs on the training set. The PSNR and SSIM metrics are used to evaluate the model performance: the model achieved a PSNR of 28.9 and an SSIM of 0.875 on the validation set. Figures 14 and 15 show the PSNR and SSIM values over the number of epochs, respectively. After training, the inpainting network is used to generate the augmented images for the SPMRI training split (i.e., the non-tumor class) using the masks obtained from the segmentation task. The images and their binary masks, of size $256\times 256\times 3$, are fed into the inpainting network; the black pixels of the binary mask correspond to the missing parts of the input image that we want to fill. Figure 16 shows sample input images along with their corresponding segmentation masks obtained from the VGGUNET model, which highlight the tumor region, and the corresponding inpainting results.

TABLE 4. The Configurations Used for Training the Inpainting Model
FIGURE 14. Validation PSNR over the number of epochs.

FIGURE 15. Validation SSIM over the number of epochs.

FIGURE 16. Sample input images with their corresponding segmentation masks and their inpainting results.

E. Classification Results

1) Experiment I - Comparing Different Classification Models

For classification, different pre-trained classification models are used, including VGG16 [22], VGG19 [22], ResNet50 [44], DenseNet121 [46], InceptionV3 [45], Xception [47], and MobileNetV2 [48]. The Adam optimizer [51] is used to tune the model parameters with a mini-batch size of 32. All of these models are pre-trained on the ImageNet dataset [33], trained and validated on the SPMRI dataset, and finally tested on the Br35H dataset. The number of samples in the training, validation, and testing sets for each class is listed in Table 1. The training and validation sets are randomly selected from the SPMRI dataset with a ratio of 80% and 20%, respectively. Since the SPMRI dataset is not equally distributed between the two classes, the numbers of samples in the training and validation sets are also unbalanced across the two classes.
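As an illustration of this transfer-learning setup, the following Keras sketch builds a binary classifier on an ImageNet pre-trained VGG19 backbone; the input resolution, classification head, and learning rate are placeholder assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg19_classifier(input_shape=(224, 224, 3), learning_rate=1e-4):
    """Binary tumor / non-tumor classifier on top of a pre-trained VGG19 backbone."""
    backbone = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                           input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(backbone.output)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(backbone.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Training with the paper's mini-batch size of 32 (data loading omitted):
# model = build_vgg19_classifier()
# model.fit(x_train, y_train, batch_size=32, validation_data=(x_val, y_val))
```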

To compare the experimental models fairly, all of them are trained on the same training set and evaluated on the same validation and testing sets. In addition, to illustrate the efficiency of our proposed augmentation techniques, these classification models are first trained on the original training set without augmentation. The model that achieves the best results is selected for further training on the extended dataset, after applying the different proposed augmentation techniques, to investigate its generalization ability on the validation and testing data. Several metrics are used to evaluate our models, including accuracy, precision, recall, F1-score, and AUC-score. It can be observed from Table 5 that VGG19 outperformed the remaining models in terms of the considered evaluation metrics: it achieved the best overall accuracy, F1-score, and AUC-score on the unseen samples of the validation and testing sets. VGG16 and InceptionV3 also achieved results comparable to VGG19.
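The five evaluation metrics can be computed from predicted probabilities with scikit-learn, as in this small helper; the function name and the 0.5 decision threshold are assumptions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def classification_report_binary(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1-score, and AUC-score for a binary classifier."""
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_prob),  # AUC uses probabilities, not thresholded labels
    }
```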

TABLE 5. Comparison of the Different Pre-Trained Classification Networks on the Unbalanced SPMRI Dataset Without Augmentation

2) Experiment II - Effect of Balancing the Training Set

Since the training set is not balanced between the two classes, the second experiment investigates the effect of balancing it using the proposed augmentation techniques. Referring to the unbalanced training sample sizes of each class (see Table 1), the minor class (i.e., the non-tumor class) is extended with 40 samples using the basic and RegionInpaint augmentation techniques, which are the techniques applicable to extending the minor class. This brings the two classes to an almost equal distribution. Consequently, VGG19 is used along with both augmentation techniques, as depicted in Table 6, with the same hyperparameters used for training before augmentation. It is observed from Table 6 that the proposed RegionInpaint augmentation shows significantly better performance than the no-augmentation and basic-augmentation cases on both the validation and testing sets.
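The balancing step can be pictured as in the following sketch; `region_inpaint_augment` and `basic_augment` are hypothetical callables standing in for the paper's RegionInpaint and basic augmentation pipelines, and the 50/50 choice between them is our assumption.

```python
import random

def extend_non_tumor_class(tumor_images, tumor_masks, non_tumor_images,
                           region_inpaint_augment, basic_augment, num_extra=40):
    """Generate `num_extra` synthetic non-tumor samples to balance the training set.

    RegionInpaint inpaints the tumor region away from a tumor-class image, turning it
    into a non-tumor sample; basic augmentation transforms existing non-tumor images.
    """
    new_samples = []
    while len(new_samples) < num_extra:
        if random.random() < 0.5:
            i = random.randrange(len(tumor_images))
            new_samples.append(region_inpaint_augment(tumor_images[i], tumor_masks[i]))
        else:
            i = random.randrange(len(non_tumor_images))
            new_samples.append(basic_augment(non_tumor_images[i]))
    return new_samples
```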

TABLE 6. Comparison of the Classification Results When Using Different Augmentation Techniques for Balancing the Training Samples Among Both Classes

This demonstrates that the proposed RegionInpaint augmentation creates new synthetic samples capable of significantly improving the generalization ability of the model. Furthermore, RegionInpaint augmentation offers the advantage of letting the classification model see the same brain image twice: once with the tumor area and once without it. This helps the model focus on distinguishing images by the presence or absence of the discriminative features (i.e., the tumor region), regardless of the other non-informative features in the image.

3) Experiment III - Comparison of the Different Proposed Augmentation Techniques

In this sub-section, we investigate the effect of all the proposed augmentation techniques. The VGG19 model is used in this experiment along with the proposed augmentation techniques. First, each augmentation technique is applied individually to observe its effect on the classification model. Thereafter, two or more techniques are combined to generate additional and more diverse synthetic samples while trying to maintain the balance of the dataset between the two classes. As mentioned previously, the RegionInpaint augmentation technique is used to increase the number of samples in the “non-tumor class”, whereas the Cutoff augmentation technique is used to increase the number of samples in the “tumor class”. Thus, both techniques are used together to increase the number of samples in both classes.

The RegionMix augmentation technique is used to generate new synthetic samples that lie in-between the two classes, as illustrated in Figure 8. The value of the weighting parameter (λ) used in our RegionMix experiments is sampled from either [0, 0.3] or [0.7, 1]; these ranges achieved the best results when experimenting with the Mixup approach. The comparison results obtained using the different augmentations are given in Table 7.
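The λ sampling can be written down directly; the sketch below also shows a plain Mixup-style blend of two samples for illustration, whereas the full RegionMix additionally relies on the segmentation masks to mix region content, which is not reproduced here.

```python
import random
import numpy as np

def sample_lambda():
    """Sample the weighting parameter from [0, 0.3] or [0.7, 1.0], as used in the experiments."""
    if random.random() < 0.5:
        return random.uniform(0.0, 0.3)
    return random.uniform(0.7, 1.0)

def mixup_pair(image_a, label_a, image_b, label_b):
    """Mixup-style convex combination of two samples (images in [0, 1], one-hot or scalar labels)."""
    lam = sample_lambda()
    mixed_image = lam * image_a + (1.0 - lam) * image_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return np.asarray(mixed_image, dtype=np.float32), mixed_label
```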

TABLE 7. Comparison of the Classification Results Using the Different Proposed Augmentation Techniques

To observe the effect of each augmentation technique fairly, 150 samples are initially generated by each one. In this setting, the combination of the RegionInpaint and Cutoff techniques achieved the best results on the validation and testing sets compared to the basic and RegionMix augmentation techniques. Moreover, the RegionMix augmentation outperformed the basic data augmentation on the testing set.

Overall, the best validation accuracy (100%) is achieved when using the RegionInpaint and Cutoff augmentation techniques together, as well as when using the RegionInpaint, Cutoff, and basic augmentation techniques together. On the other hand, the best testing accuracy (96.88%) is achieved when using all augmentation techniques together (i.e., RegionInpaint, Cutoff, RegionMix, and basic augmentations). Finally, it is worth noting that when RegionMix augmentation is used during training, convergence is slower than with the other augmentation techniques because it explores new samples with different distributions in the data space.

4) Experiment IV - Comparison of the Proposed Work Results With Related Works

In this sub-section, our quantitative results are compared with other studies from the literature that work on the SPMRI dataset. Most studies have only considered extending the SPMRI dataset using traditional augmentation techniques. Moreover, all of these studies evaluated their experimental models on a validation set drawn from the same SPMRI dataset according to their train/test split.

As depicted in Table 8, our results on the validation set (i.e., when using VGG19 along with the Cutoff, RegionInpaint, and basic augmentation techniques) outperform those of the remaining studies, which demonstrates the efficiency and robustness of the proposed work. Table 8 also provides further details for comparison, including the model used, the applied augmentation techniques, and the overall accuracy.

TABLE 8. Comparison of Brain Tumor Classification Results With Related Works on the SPMRI Dataset

SECTION V.

Conclusion

In this study, we investigate the impact of training a deep CNN using a small dataset of MRI brain tumor images and how this adversely affects the generalization of the model. To address this issue, we introduce several novel augmentation techniques named RegionInpaint augmentation, Cutoff augmentation, and RegionMix augmentation. Traditional augmentation methods are also used in addition to the proposed ones. By using the proposed augmentation techniques to generate synthetic samples, the performance of the classifier improved significantly.

The full proposed approach can be described as follows. First, the brain area is cropped from the input MRI images to remove irrelevant background. Segmentation is then applied to generate the corresponding segmentation masks that highlight the tumor area. Segmentation is a crucial step in this study because the proposed augmentation techniques depend mainly on the generated segmentation masks. A U-Net-like architecture called VGGUNET is used, which takes advantage of a pre-trained VGG16 network in place of the default U-Net encoder. It achieved Dice coefficients of 85.82% and 82% on the validation and testing sets, respectively, and outperformed the other segmentation models used, including U-Net, SegNet, and ResUNet. Thereafter, VGGUNET is used to obtain the segmentation masks for the input training MRI images.
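For completeness, a minimal Keras sketch of a VGGUNET-style model is given below; the skip-connection layers, decoder widths, and input resolution are assumptions, not the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_vgg_unet(input_shape=(256, 256, 3)):
    """U-Net-style segmentation model with an ImageNet pre-trained VGG16 encoder."""
    vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                      input_shape=input_shape)
    skip_names = ("block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3")
    skips = [vgg.get_layer(name).output for name in skip_names]
    x = vgg.get_layer("block5_conv3").output  # bottleneck features

    # Decoder: upsample, concatenate the matching encoder feature map, then refine.
    for skip, filters in zip(reversed(skips), (512, 256, 128, 64)):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    output = layers.Conv2D(1, 1, activation="sigmoid")(x)  # binary tumor mask
    return models.Model(vgg.input, output)
```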

Finally, different pre-trained classifiers are used, including VGG16, VGG19, DenseNet121, ResNet50, InceptionV3, Xception, and MobileNetV2. They are trained and validated on the train split (80%) and validation split (20%) of the SPMRI dataset, respectively, and tested on the Br35H dataset. VGG19 achieved the best results among them and is therefore selected for further use with the proposed augmentation techniques. Initially, each augmentation technique is applied individually to observe its effect on the classification model. Afterwards, several augmentation techniques are combined to generate more diverse samples. The best validation accuracy obtained is 100% when using the Cutoff and RegionInpaint augmentation techniques, while the best testing accuracy achieved is 96.88% when using all the augmentation techniques together. The results reveal that the proposed augmentation techniques yield a well-generalized model with superior performance, surpassing other studies that applied common augmentation techniques.

Although our proposed augmentation techniques, together with the models used, achieved promising results, there are still a number of challenges and directions for future work. First, we aim to apply the proposed augmentation techniques to other applicable medical datasets. In addition, we plan to experiment with more deep learning models for the segmentation and classification steps with reduced time complexity to ease their deployment in real-time environments. Moreover, one limitation of the proposed RegionInpaint augmentation is that it can only generate a limited number of images, bounded by the number of samples in the class from which the region of interest is removed (i.e., the tumor area in our case). Another constraint of this study is that the proposed augmentation techniques depend mainly on the segmentation step; thus, if annotated segmentation masks are not available for a specific task, the proposed augmentation techniques cannot be applied.
