Brain Tumour Image Segmentation Using Deep Networks

Automated segmentation of brain tumours from multimodal MR images is pivotal for the analysis and monitoring of disease progression. As gliomas are malignant and heterogeneous, efficient and accurate segmentation techniques are needed for the successful delineation of tumours into intra-tumoural classes. Deep learning algorithms outperform the more conventional, context-based computer vision approaches on semantic segmentation tasks. Extensively used for biomedical image segmentation, Convolutional Neural Networks have significantly improved the state-of-the-art accuracy on the task of brain tumour segmentation. In this paper, we propose an ensemble of two segmentation networks, a 3D CNN and a U-Net, combined through a simple yet effective technique that yields more accurate predictions. Both models were trained separately on the BraTS-19 challenge dataset and evaluated to yield segmentation maps that differed considerably from each other in terms of segmented tumour sub-regions; these maps were ensembled variably to achieve the final prediction. The suggested ensemble achieved dice scores of 0.750, 0.906 and 0.846 for enhancing tumour, whole tumour, and tumour core, respectively, on the validation set, performing favourably in comparison to the state-of-the-art architectures currently available.


I. INTRODUCTION
Accurate segmentation of tumours in medical images is of particular importance, as it provides information essential for the analysis and diagnosis of cancer as well as for mapping out treatment options and monitoring the progression of the disease. Brain tumours are among the most fatal cancers worldwide and are categorised, depending upon their origin, into primary and secondary tumour types [1]. The most common histological form of primary brain cancer is the glioma, which originates from the brain glial cells [2] and accounts for about 80% of all malignant brain tumours [3]. Gliomas can be of the slow-progressing low-grade (LGG) subtype, with a better patient prognosis, or the more aggressive and infiltrative high-grade glioma (HGG) or glioblastoma, which requires immediate treatment [4]. These tumours are associated with substantial morbidity: the median survival for a patient with glioblastoma is only about 14 months, with a 5-year survival rate near zero despite maximal surgical and medical therapy [5]. A timely diagnosis, therefore, becomes imperative for effective treatment of the patients.
The associate editor coordinating the review of this manuscript and approving it for publication was Nilanjan Dey.
Magnetic Resonance Imaging (MRI) is a preferred technique widely employed by radiologists for the evaluation and assessment of brain tumours [1]. It provides several complementary 3D MRI modalities, acquired with different excitation and repetition times: T1-weighted, post-contrast T1-weighted (T1ce), T2-weighted and Fluid-Attenuated Inversion Recovery (FLAIR). These sequences highlight different tumour subregions at different intensities [6]. The whole tumour (the entire tumour inclusive of infiltrative oedema) is more prominent in the FLAIR and T2 modalities, whereas T1 and T1ce images show the tumour core exclusive of peritumoural oedema [7]. Using these scans in combination therefore exploits the complementary information they deliver for the detection of different tumour subregions.
The Multimodal Brain Tumour Segmentation Challenge (BraTS) is a platform to evaluate the development of machine learning models for the task of tumour segmentation. It provides participants with an extensive dataset of 3D MRI images of gliomas (both LGG and HGG) and associated ground truths annotated by expert physicians. The provided multimodal scans are used for both training and validating the neural networks designed for the particular segmentation task [6], [8]-[11]. Manually delineating brain tumour subregions from MRI scans is subjective, time-consuming and prone to variability [12]. Automated segmentation of gliomas from multimodal MRI images can consequently help physicians to speed up diagnosis and surgical planning, as well as provide an accurate, reproducible solution for further tumour analysis and monitoring [13], [14]. The classical methods of automated brain tumour segmentation rely on feature engineering, which involves the extraction of handcrafted features from input images followed by the training of a classifier [11], [15]. Unsupervised learning algorithms bypass the complexity of designing and selecting features by automatically learning a hierarchy of feature representations [16]-[19], with deep learning models excelling at the task [11]. Convolutional Neural Networks (CNNs) are regarded as the state-of-the-art methods for brain tumour image segmentation, as they learn the most useful and relevant features automatically [6].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
However, accurate segmentation of tumours remains a challenge due to the heterogeneity of gliomas in shape, size, and appearance, as well as the ambiguous and fuzzy boundaries between cancer and brain tissue [20]. The intensity variability of the MRI data further adds to this difficulty [13]. Therefore, the problem is still open to improvement, allowing further exploration of better segmentation techniques and accuracy.
In this work, we utilise multiple 3D CNN models for brain tumour segmentation from multimodal MRI scans and ensemble their probability maps for more stable predictions. The networks are trained separately, with hyperparameters optimised for each model, on the training dataset acquired from the 2019 Brain Tumour Segmentation (BraTS) challenge. In a rigorous evaluation on the BraTS validation set, the proposed ensemble achieved dice scores of 0.750, 0.906 and 0.846 for enhancing tumour, whole tumour, and tumour core, respectively.

II. LITERATURE REVIEW
Numerous research studies highlight the importance of machine learning (ML) to facilitate and improve the efficiency of human practices. From combining ML with ubiquitous computing [21] to employing it for foreign object detection [22], many techniques have emerged to automate otherwise challenging tasks. Pervasive as gliomas have become, it is imperative that they are monitored carefully and operated on, depending on the prognosis. Many ML algorithms can accurately segment the cancer regions and assist the neuroradiologists in disease monitoring and planning.
The data used for these techniques must illuminate the variable characteristics of the gliomas, from the infiltrative growth patterns of the tumour to its heterogeneity [23], to attain considerable accuracy during segmentation. One study demonstrates the use of multimodal MRI data in a tissue type mapping protocol that serves to identify the grade as well as acquire spatial information of the tumour [24]. Multi-sequence MRI data is also provided by the BraTS challenge, containing both HGG and LGG scans of patients from multiple institutions, to help users devise successful glioma delineation techniques [9]-[11].

A. MACHINE LEARNING TECHNIQUES
Supervised learning techniques with discriminative classifiers have been used for accurate delineation of gliomas, of which the most successful are random forests (RF) and support vector machines (SVM). Soltaninejad et al. [25] initially devised an approach to classify brain tumour grades using superpixels generated from bi-modal MRI data of patients, specifically FLAIR and T2-weighted MR data. The mean intensity of the superpixels was utilised to obtain the region of interest (ROI), from which the 1st and 2nd order feature representations were extracted and passed on to the SVM classifier to delineate and differentiate between tumour grades. They continued down this avenue of research and worked further with superpixels acquired using mono-modal MRI data of patients [26]. After their segmentation from the FLAIR MRI, statistical and textural features were extracted from these voxel-wise class labels and then fed into extremely randomised trees (ERT) as well as the SVM classifier to ascertain whether the voxels represented healthy or tumoural brain regions. The method performed well on the BraTS 2012 dataset, and a comparison of the classification results showed that ERT works marginally better than SVM on detection and segmentation of the tumour grades.
Expounding on their earlier work, Soltaninejad et al. [27] employed multi-sequence MRI images, along with diffusion tensor imaging (DTI) data, to obtain 3D supervoxels which provide clear tumour boundaries across the image modalities. The extracted texton and intensity-based statistical features were given to the RF classifier to classify the voxels. Inclusion of DTI components (isotropic (p) and anisotropic (q)) with the conventional MRI data resulted in considerable improvement of classification results. The method performed well and provided expert segmentations of the tumours when tested on the BraTS 2013 dataset. However, they were not the first to utilise DTI for refined tumour segmentation. Jones et al. [28] suggested the use of diffusion characteristics to semi-automatically segment lesions from volumetric MRI data in a method termed diffusion segmentation (D-SEG). After the voxels in the (p, q) space are grouped into clusters through k-means clustering, the boundaries separating the healthy brain tissue and tumour regions become apparent in the resulting tissue segments. This information is utilised to extract the volumes of interest (VOIs) from which the D-SEG spectrum is calculated, representing the variable proportion of diffusion within the VOIs. The spectra are then classified through SVM to achieve considerable classification accuracy.

B. DEEP LEARNING ARCHITECTURES
Deep learning algorithms outperform the more conventional, context-based computer vision approaches on semantic segmentation tasks [29]. Extensively used for biomedical image segmentation, Deep Convolutional Neural Networks have carved out a niche by achieving state-of-the-art accuracy on the task of brain tumour segmentation [30]-[35].
A 2D U-Net architecture was put forth for the automated segmentation of brain tumour [36]. For increased network efficiency, various data augmentation techniques were applied along with the soft dice loss function to mitigate the class imbalance issue in the data. Fidon et al. [37] refined a neural network previously used for the task of brain parcellation and adapted it for multimodal MRI data input. ScaleNet made use of a merging operation in place of concatenation to connect the frontend and backend of the network, thereby allowing it to be scalable and generalised. Le et al. [38] designed an architecture which combined the standard variational level set (VLS) with a fully convolutional network (FCN). The new model referred to as the deep recurrent level set (DRLS), performed well in segmenting the tumour in comparison to the other models of the time, improving the otherwise rudimentary VLS into a deep learnable framework. Qin et al. [39] introduced the autofocus layer, which enhanced the multi-scale processing of network and learned through an attention mechanism to select the optimal scale for object identification in medical images. The dilated convolution layer improved the interpretability and representation capacity of the network leading to improved tumour segmentation.
A fully convolutional network (FCN) was suggested by Shen et al. [40], trained to learn boundary and region tasks, and successfully extracted contextual information from MRI scans at considerably low computation cost. Working with a similar architecture, Pereira et al. [41] set forth an FCN which captured more sophisticated features through feature recombination and also introduced a recalibration block in the structure. Zhou et al. [42] proposed a multi-task CNN, which integrated and trained on the different tasks of brain tumour segmentation in terms of their correlation and simplified the inference process through a one-pass computational scheme. Ji et al. [43] proposed a weakly-supervised U-Net that employed a scribble-based approach. They initially trained the network on whole tumour scribbles before exposing it to global labels for accurate substructure segmentation. Another network was then trained on the results of the previously trained U-Net to segment the enhancing tumour and tumour core. Xu et al. [44] introduced a 3D deep cascaded attention network (DCAN), which is less complex than other cascaded models. It dealt with the multi-class segmentation task through separate branches and a shared feature extractor between them, and it extracted the correlational information between the sub-regions through a cascaded attention method for guidance.
Myronenko [45] ranked first among the top submissions of the BraTS 2018 challenge with an encoder-decoder based CNN architecture. It added a variational autoencoder (VAE) branch for regularisation, allowing the reconstruction of the original input images. During training, a crop size of 160 × 192 × 128 and a batch size of 1 were used, with no additional training dataset employed. The method proposed by Isensee et al. [46] placed second in the same challenge with minor alterations made to the original U-Net architecture. The 3D U-Net, or the no-new-Net (nnU-Net) as named by the authors, replaced ReLU activation functions with leaky ReLU and batch normalisation with instance normalisation. Training was performed with an image patch size of 128 × 128 × 128 and a batch size of 2. The same architecture, trained from scratch with changed hyperparameters, is used as part of our ensemble as well. Working with a U-Net-like structure, McKinley et al. [47] incorporated dilated convolutions into the DenseNet architecture and trained with a newly formulated label certainty loss function. The tensor fed into the network had dimensions 2 × 4 × 5 × 192 × 192, with a batch size of 2. Another noteworthy model is the ensemble proposed by Zhou et al. [48], which consisted of various improved CNN architectures (previously used by them, as mentioned above) trained to learn contextual information that served to produce robust predictions.
In this study, we propose an ensemble of two networks, a 3D CNN and a U-Net, combined through a different yet straightforward technique that results in more accurate predictions than uniform weighting. The task is to develop an automated brain tumour segmentation method for successful delineation of tumours into intra-tumoural classes with improved efficiency and accuracy in comparison to existing methods. Our proposed model shows results that are comparable to, and in some cases better than, those of the state-of-the-art models.

A. DATASET
We use the 2019 Brain Tumour Segmentation Challenge (BraTS) dataset [6], with the training set employed to train the models and the validation set for the evaluation of the proposed ensemble. The training set consists of 259 high-grade glioma and 76 low-grade glioma patients with expertly annotated ground truths. The validation set includes 125 cases of unknown grade (the labels are not made available to the public) [8]-[11].
The multi-institutional dataset, acquired from 19 different contributors, contains multimodal MRI scans of each patient, namely T1, T1 contrast-enhanced (T1ce), T2-weighted (T2), and Fluid Attenuated Inversion Recovery (FLAIR), from which the tumoural subregions are segmented. The data is processed to overcome discrepancies: the scans are skull-stripped, aligned to match an anatomical template, and resampled to a resolution of 1 mm³. Each sequence has a volume (dimension) of 240 × 240 × 155. Example images from the training set, as well as the corresponding ground truth, are shown in Figure 1. The manual ground truths (included in the training set) highlight the three tumour regions: the peritumoural oedema, the enhancing tumour, and the necrotic and non-enhancing core.
FIGURE 2. A schematic visualization of the 3D CNN architecture, where g represents the convolutional channels that are split into groups to reduce feature map connectivity. The multi-fiber (MF) blocks make use of a multiplexer allowing for the flow of information between groups. Each dilated multi-fiber (DMF) block is a dilated convolutional unit, with adaptive weighting, which serves to capture spatial information of the tumour.
It is worth mentioning that we did not use any external dataset in our experiments. Additionally, access to the BraTS-19 test set is limited to the challenge participants only. Therefore, we report test results on the BraTS-19 validation set. We first report the segmentation results of the proposed network on the validation set and later compare it to the existing state of the art architectures.

B. METHODOLOGY
Ensembling is often adopted for the task of brain tumour segmentation and has the advantage of improving both results and performance [47]-[49]. We propose a lightweight ensemble consisting of as few as two networks, each selectively trained on the training set. The outputs of these networks are segmentation maps that differ in terms of segmented tumour sub-regions. The segmentation maps are then combined to get the final prediction. In the following sections, we provide further details on these two networks.

1) NETWORK 1 (3D CNN)
The first model used in the ensemble is a 3D CNN, initially developed by Chen et al. [50]. It uses a multifiber unit (an array of 3D CNN, Figure 2) with weighted dilated convolutions to glean feature representation at multi-scale for volumetric segmentation. The network showed good results on the BraTS 2018 Challenge. Extending on their work, we finetune the model for improved segmentation.

a: PRE-PROCESSING
The data is augmented using a multitude of techniques (cropping, rotation, mirroring) before feeding it into the network for training.

b: TRAINING
We trained the model for 150 epochs with a patch size of 128 × 128 and a modified loss function that combines the generalised dice loss and the focal loss. The fine-tuned hyperparameters are shown in Table 1.
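As an illustration of how a generalised dice loss and a focal loss can be combined, the following NumPy sketch may be helpful. It is not the training code used in this work: the relative weighting of the two terms (`focal_weight`), the focusing parameter (`gamma`) and the function names are assumptions, since the text does not specify them.

```python
import numpy as np

def generalised_dice_loss(probs, onehot, eps=1e-6):
    """probs, onehot: float arrays of shape (C, N) holding per-class
    probabilities and one-hot labels for N voxels."""
    # Inverse squared label volume, so that small classes contribute more.
    w = 1.0 / (onehot.sum(axis=1) ** 2 + eps)
    intersect = (w * (probs * onehot).sum(axis=1)).sum()
    denom = (w * (probs + onehot).sum(axis=1)).sum()
    return 1.0 - 2.0 * intersect / (denom + eps)

def focal_loss(probs, onehot, gamma=2.0, eps=1e-6):
    # Probability that the model assigns to the true class of each voxel.
    p_true = (probs * onehot).sum(axis=0)
    # Well-classified voxels (p_true close to 1) are down-weighted.
    return float(np.mean(-((1.0 - p_true) ** gamma) * np.log(p_true + eps)))

def combined_loss(probs, onehot, focal_weight=1.0):
    # focal_weight is a hypothetical knob; equal weighting is assumed here.
    return generalised_dice_loss(probs, onehot) + focal_weight * focal_loss(probs, onehot)
```

The sketch assumes every class is present in the batch; a class with no labelled voxels would receive a very large weight and would need separate handling.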

c: INFERENCE
We applied zero-padding to the MRI data so that the original 240 × 240 × 155 voxels are converted to 240 × 240 × 160, a depth that is evenly divisible by the network's downsampling factor. Once the data is ready for inference, we pass it through the trained network to generate probability maps. The ensemble subsequently uses these maps for the final prediction.
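The padding step can be sketched with NumPy as follows. The helper names are hypothetical, and placing all padding at the end of the depth axis is an assumption; the text only states the resulting size.

```python
import numpy as np

def pad_depth(volume, target_depth=160):
    """Zero-pad the last (depth) axis of a (..., H, W, D) array to target_depth."""
    extra = target_depth - volume.shape[-1]
    if extra < 0:
        raise ValueError("volume is deeper than target_depth")
    # No padding on the leading axes; 'extra' zero slices appended to the depth axis.
    pad = [(0, 0)] * (volume.ndim - 1) + [(0, extra)]
    return np.pad(volume, pad, mode="constant", constant_values=0)

def crop_depth(volume, original_depth=155):
    """Undo pad_depth so that predictions match the original scan geometry."""
    return volume[..., :original_depth]
```

After inference, `crop_depth` would be applied to the probability maps so that they align again with the 240 × 240 × 155 input.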

2) NETWORK 2 (3D U-NET)
The second model of our ensemble is a 3D U-Net variant that differs from the classical U-Net architecture: the ReLU activation functions are replaced by leaky ReLUs, and instance normalisation is used in place of batch normalisation [37]. The network has shown comparable results on the Medical Segmentation Decathlon benchmark and the BraTS 2018 Challenge. The model is trained from scratch on our dataset with the same architecture (Figure 3) as reported in [37].

a: PRE-PROCESSING
We crop the data to reduce the size of the MRI slices. Afterwards, we resample the images to the median voxel spacing of the otherwise heterogeneous data and apply z-score normalisation.
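A minimal sketch of the z-score normalisation step is given below. Restricting the statistics to non-zero voxels is an assumption made here (zero is treated as skull-stripped background); the text does not state which voxels enter the mean and standard deviation.

```python
import numpy as np

def zscore_normalise(volume, eps=1e-8):
    """Normalise one MRI modality to zero mean and unit variance."""
    # Assumption: voxels equal to zero are background left by skull-stripping.
    mask = volume > 0
    out = np.zeros(volume.shape, dtype=np.float32)
    if not mask.any():
        return out
    mean = volume[mask].mean()
    std = volume[mask].std()
    # Background stays at zero; only brain voxels are rescaled.
    out[mask] = (volume[mask] - mean) / (std + eps)
    return out
```

Each modality (T1, T1ce, T2, FLAIR) would be normalised independently, since their intensity ranges are not comparable.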

b: TRAINING
For training the network, we use an input patch size of 128 × 128 × 128 voxels and a batch size of 2. Different data augmentation techniques (rotation, mirroring and gamma correction) are applied to the data at runtime to circumvent overfitting and to enhance the segmentation accuracy of the model. The loss function combines the binary cross-entropy and the dice loss. Table 2 details the hyperparameters used during training.

c: INFERENCE
Inference is patch-based: all patches overlap by half their size, and voxels near the patch centre are given a higher weight. Mirroring along the patch axes serves as additional data augmentation at test time. The outputs are probability maps for the ensemble.
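The overlapping, centre-weighted inference described above can be sketched as follows. This is an illustrative NumPy version under several assumptions: a Gaussian is used as the centre weighting, `sigma_scale` is a hypothetical parameter, `predict_fn` stands in for the trained network, the volume is assumed to be at least as large as the patch along every axis, and test-time mirroring is omitted for brevity.

```python
import itertools
import numpy as np

def gaussian_weight(patch_size, sigma_scale=0.125):
    """Separable 3D Gaussian that peaks at the patch centre."""
    axes = []
    for n in patch_size:
        x = np.arange(n) - (n - 1) / 2.0
        axes.append(np.exp(-0.5 * (x / (sigma_scale * n)) ** 2))
    return axes[0][:, None, None] * axes[1][None, :, None] * axes[2][None, None, :]

def window_starts(length, patch, step):
    """Start indices along one axis; the final window is flush with the edge."""
    last = length - patch
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)
    return starts

def sliding_window_predict(volume, predict_fn, patch_size, n_classes):
    """volume: (H, W, D). predict_fn maps a patch to (n_classes, *patch_size)."""
    weight = gaussian_weight(patch_size)
    probs = np.zeros((n_classes,) + volume.shape)
    norm = np.zeros(volume.shape)
    # Step of half the patch size gives the 50% overlap.
    starts = [window_starts(s, p, max(p // 2, 1))
              for s, p in zip(volume.shape, patch_size)]
    for i, j, k in itertools.product(*starts):
        sl = (slice(i, i + patch_size[0]),
              slice(j, j + patch_size[1]),
              slice(k, k + patch_size[2]))
        probs[(slice(None),) + sl] += predict_fn(volume[sl]) * weight
        norm[sl] += weight
    # Weighted average of all patch predictions covering each voxel.
    return probs / norm
```

Weighting by distance from the patch centre reduces the influence of predictions made near patch borders, where the network sees less context.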

3) ENSEMBLING
The ensemble is not built by simple averaging of the predictions (probability maps) generated by the two models. We merge the outputs of the two models after rigorously testing a strategy termed variable ensembling (illustrated in Figure 4).
FIGURE 4. A general representation of the ensembling technique used to generate the ensemble predictions. The 3D CNN (denoted N1) segments the enhancing tumour (ET) more accurately, while the 3D U-Net (denoted N2) performs better for the tumour core (TC); therefore, the respective model's segmentation for that particular subregion is used in the final prediction (Pf) of the ensemble. For the whole tumour (WT), both models contribute equally to the output.
We separately test these trained networks on the validation set to obtain the corresponding segmentation images. The predictions from the individual models are evaluated independently on the online BraTS server (https://ipp.cbica.upenn.edu) to determine how successfully each segments the tumour regions. We then compare the dice scores of the two models to identify which network is more accurate for each specific tumour region. Qualitative and quantitative (dice score) results demonstrate that the CNN performs better for segmenting the enhancing tumour, while the U-Net is more accurate for segmenting the tumour core. For the whole tumour, however, combining the predictions from both networks with equal weight outperforms the segmentation results of either network alone. Therefore, the final ensemble prediction is generated region by region: (1) for the tumour core, we used only the U-Net's output; (2) for the enhancing tumour, we used only the CNN's output; and (3) for the whole tumour, we weighted the outputs of both networks equally. The predictions were evaluated on the online server to obtain the dice scores for the ensemble. We discuss these results in more detail in the next section.
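One possible reading of the region-wise merging rule is sketched below in NumPy. Several details are assumptions rather than reported facts: the channel order of the probability maps, the 0.5 decision threshold, and the nesting constraint that forces the enhancing tumour to lie inside the tumour core and the core inside the whole tumour. The output uses the BraTS label convention (1 = necrotic and non-enhancing core, 2 = oedema, 4 = enhancing tumour).

```python
import numpy as np

# Assumed channel order of each network's probability map:
# 0 = background, 1 = necrotic/non-enhancing core, 2 = oedema, 3 = enhancing tumour.
def region_probs(p):
    et = p[3]                  # enhancing tumour
    tc = p[1] + p[3]           # tumour core
    wt = p[1] + p[2] + p[3]    # whole tumour
    return et, tc, wt

def variable_ensemble(p_cnn, p_unet, threshold=0.5):
    """p_cnn (N1) and p_unet (N2): probability maps of shape (4, ...)."""
    et_cnn, _, wt_cnn = region_probs(p_cnn)
    _, tc_unet, wt_unet = region_probs(p_unet)
    wt = 0.5 * (wt_cnn + wt_unet) > threshold   # both networks, equal weight
    tc = (tc_unet > threshold) & wt             # U-Net only (nesting is assumed)
    et = (et_cnn > threshold) & tc              # CNN only (nesting is assumed)
    labels = np.zeros(wt.shape, dtype=np.uint8)
    labels[wt] = 2   # oedema: whole tumour outside the core
    labels[tc] = 1   # necrotic and non-enhancing core
    labels[et] = 4   # enhancing tumour
    return labels
```

Painting the regions from largest to smallest means the smaller region always overrides the larger one where they overlap, which keeps the label map consistent with the nested WT, TC and ET definitions.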

IV. RESULTS
Here we present results from an ensemble of two networks, variants of a U-Net and a CNN, both selectively trained on the BraTS 2019 training set (n = 335) and tested on the provided BraTS 2019 validation set (n = 125). We then combine the segmentation maps from these models region by region, instead of by simple averaging, to give a final prediction for each tumour tissue type. The dice scores achieved by the ensemble (proposed) are 0.750 for enhancing tumour, 0.906 for the whole tumour, and 0.846 for tumour core. In Figure 5, we show the segmentation results of a single patient overlaid on the FLAIR MRI.
The segmentation maps are generated from both models separately, and then the final merged output is shown. The dice score for the patient was 0.930, 0.949 and 0.927 for enhancing tumour, whole tumour, and tumour core, respectively.
We further analysed different ensemble techniques (as shown in Table 3) to determine whether the methods differ and which of the two yields the most accurate segmentations. As shown in Table 3, the proposed ensembling scheme gives better accuracy than simple averaging.

A. COMPARISON WITH CHALLENGE PARTICIPANTS
We evaluated the proposed ensemble on the BraTS 2019 validation set and later compared it to top ranking architectures on the challenge website. Table 4 shows comparative dice scores obtained through the online BraTS server. The ensemble (proposed) achieved dice scores of 0.750, 0.906 and 0.846 for enhancing tumour, whole tumour, and tumour core, respectively.
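For reference, the dice score of a predicted region P against a ground-truth region T is 2|P ∩ T| / (|P| + |T|). The sketch below computes it for the three evaluated regions from BraTS label maps (1 = necrotic and non-enhancing core, 2 = oedema, 4 = enhancing tumour). It is an illustration only: the online server's own implementation is not reproduced here, and returning 1.0 when both regions are empty is a convention assumed for this sketch.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap between two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Labels that make up each evaluated region.
REGIONS = {"ET": (4,), "TC": (1, 4), "WT": (1, 2, 4)}

def brats_region_dice(pred_labels, truth_labels):
    """Dice per region for two label maps of identical shape."""
    return {name: dice_score(np.isin(pred_labels, labs), np.isin(truth_labels, labs))
            for name, labs in REGIONS.items()}
```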
The cascaded U-Net employed by Jiang et al. [51] achieved the best scores of the challenge, to which our results compare favourably, although a significant performance gap remains for the enhancing tumour. Our ensemble gives improved results for the tumour core compared with the DCNN used by Zhao et al. [52] and falls just short for the enhancing tumour, with a minor performance gap. Similarly, it segments the tumour core more accurately than the CNN developed by McKinley et al. [53].

B. RESULT COMPARISON WITH DIFFERENT FRAMEWORKS
Table 5 shows the comparison with various state-of-the-art methods (also validated on the BraTS 2019 dataset). None of the other frameworks used additional data during training. Except for the enhancing tumour, the proposed ensemble results in better segmentations than the other available networks for both the whole tumour and tumour core, as evidenced by the dice scores.
The promising performance of our simple ensemble of a U-Net and a CNN is indicative of its efficiency and potential usability to achieve comparable and often better segmentation accuracy than its contemporaries.

V. DISCUSSION
We propose an ensemble of a 3D U-Net and a CNN for the task of brain tumour segmentation on multimodal MRI data. We combine the outputs of the two networks through variable ensembling to attain competitive classification accuracy on the BraTS 2019 validation set. Our proposed method compares favourably with state-of-the-art methods, achieving mean dice scores of 0.750, 0.906 and 0.846 on enhancing tumour, whole tumour, and tumour core, respectively.
We experimented with a multitude of networks and their different combinations before deciding on the 3D U-Net and CNN. We also worked on different variants of CNN by changing the layers employed in the original architecture, but it did not result in improving the performance.
While our method performs favourably on the whole tumour and tumour core classes, the segmentation accuracy of the enhancing tumour needs improvement. Jiang et al. [51] implemented an interesting thresholding scheme in which if the enhancing tumour is less than the set threshold, the region is substituted with necrosis instead, which might cause a significant improvement in the accuracy of the enhancing tumour class.
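The thresholding idea can be illustrated with a short NumPy sketch. It follows the description given above rather than the implementation of Jiang et al. [51]; the function name and the `min_voxels` value are hypothetical, and the labels follow the BraTS convention (1 = necrosis and non-enhancing core, 4 = enhancing tumour).

```python
import numpy as np

def suppress_small_enhancing_tumour(labels, min_voxels=500):
    """Relabel enhancing tumour (4) as necrosis (1) when the predicted
    enhancing tumour volume is below min_voxels (a hypothetical threshold)."""
    out = labels.copy()
    et = out == 4
    if 0 < et.sum() < min_voxels:
        out[et] = 1
    return out
```

Such a rule targets cases where a few isolated voxels are predicted as enhancing tumour in a scan whose ground truth contains none, which would otherwise yield a dice score of zero for that class.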
Certain limitations still exist in the current work. Firstly, the proposed segmentation ensemble is only evaluated on the official validation set of the challenge. The soundness of the method can be validated further by testing on separate clinical MRI data, independent of the challenge. Secondly, we did not extensively pre-process the dataset and post-process the results. Many reported models prepare their imaging data through intensity normalisation [58], [59] and bias correction [60] schemes to minimise the variability in the data and make it analogous and comparable. Similarly, post-processing methods such as the use of conditional random fields [61] are shown to enhance segmentation accuracy. Nonetheless, the proposed ensemble exhibits efficient and robust tumour segmentation accuracies across multiple regions.
In future, we intend to add image processing (both pre- and post-processing) to the ensemble, along with further tuning of the hyperparameters.

VI. CONCLUSION
In this work, we have described an ensemble of two networks, both of which are individually used frequently on the task of biomedical image segmentation. The ensemble successfully generates highly accurate segmentations of brain tumours from the multimodal MRI scans provided by the BraTS 2019 challenge, which compare favourably with predictions from various other state-of-the-art models. We use a method of variable ensembling to combine the respective outputs of the models to achieve the best scores. The proposed ensemble offers an automated and objective method of generating brain tumour segmentations to aid clinical disease planning and patient management.