Multi-Model Medical Image Segmentation Using Multi-Stage Generative Adversarial Network

Image segmentation is a challenging problem in medical applications. Medical imaging has become an integral part of research, as it allows us to see inside the human body without surgical intervention. Many researchers have studied brain segmentation, typically using one-stage methods to segment the brain tissues. In this paper, we propose a multi-stage generative adversarial network (GAN) to address the information loss of one-stage methods. We follow a coarse-to-fine strategy to improve brain segmentation. In the first stage, our model generates a coarse outline for (i) background and (ii) brain tissues. Then, in the second stage, the model generates outlines for (i) white matter (WM), (ii) gray matter (GM) and (iii) cerebrospinal fluid (CSF). A good result can be achieved by fusing the coarse outline and the refined outline. We conclude that our model is more efficient and accurate in practice for both infant and adult brain segmentation. Moreover, we observe that the multi-stage model is faster than prior models. More specifically, the main goal of the multi-stage model is to assess performance in a few-shot learning setting, where only a few labeled samples are available. For medical images, the proposed model can be applied to a wide range of segmentation tasks where convolutional neural networks and one-stage methods have failed.


I. INTRODUCTION
Magnetic resonance imaging (MRI) uses magnetic fields to generate detailed images of tissues without using harmful radiation [1] [2]. Manual segmentation in clinical practice is time consuming and expensive [3].
Automated segmentation of infant and adult brains has received substantial research attention [4] [5]. Training deep models requires large sets of labeled images [6]. Because data sets in medical applications are small [7] [8], semi-supervised learning approaches address this problem by using unlabeled images [9] [10]. A good segmentation result can be achieved by adopting unlabeled images [11], or images with weak annotations such as image-level tags [12]. In object detection, one-stage methods predict the class probability and the position information [13] [14]. To take advantage of the recent success of two-stage methods, many models have been proposed for semantic segmentation. Xiaohao et al. proposed a two-stage image segmentation method using a convex variant of the Mumford-Shah model and thresholding. In computer vision tasks, two-stage methods generate global information in the first stage and local information in the second stage [15] [16]. A good result can be achieved by fusing the global and local information [17] [18].
The authors are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China. E-mail: afifakhaied@tju.edu.cn, asonhan@hust.edu.cn
In this paper, we follow a coarse-to-fine strategy to improve brain segmentation using multi-stage generative adversarial networks. The main contributions of this paper are as follows:
1- In the first stage, our model generates a coarse outline for (i) background and (ii) brain tissues. The main role of the first generator (G) is to produce a coarse segmentation that serves as guidance information for the third G.
2- The second generator takes two inputs, an image x and a random vector z. The main idea is to encourage the generator to produce as many different values for each x as there are values of z.
3- The third generator consists of an encoder and a decoder, in which we use dense skip connections to combine features from different scales. The third generator generates outlines for (i) white matter (WM), (ii) gray matter (GM) and (iii) cerebrospinal fluid (CSF). This is similar to the process of human learning in clinical practice. More specifically, the main role of the third G is to produce more detailed results by using the coarse segmentation from the first G.
We evaluate the proposed multi-stage generative adversarial model on two datasets of brain tissues, covering infant and adult brains. Empirically, our model achieves good results compared with state-of-the-art models.
The rest of this paper is organized as follows. Section II presents prior studies related to brain segmentation. Section III presents the methodology used in our paper. Section IV describes our experimental design, and Section V presents and discusses our results. Section VI discusses threats to the validity of our results. Finally, Section VII concludes the paper and discusses directions for future work.

II. RELATED WORK
The following subsections present prior studies and techniques related to brain segmentation. We start with a more detailed description of semi-supervised learning. In subsection B, we introduce generative adversarial networks. Then, we show how loss functions developed for GANs have improved the stability of GAN training. More specifically, the idea in semantic segmentation is to take an image and output a segmentation map [16] [24]. The target is then created by one-hot encoding the class labels [25]. Fig. 1 and Fig. 2 show the semantic segmentation labels and semantic segmentation classes, respectively.
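As an illustration of the one-hot target described above, the following is a minimal NumPy sketch rather than the pipeline used in this paper. The function name `one_hot_encode` and the integer encoding (0 = background, 1 = CSF, 2 = GM, 3 = WM) are assumptions made only for this example.

```python
import numpy as np

def one_hot_encode(label_map, num_classes):
    """Turn an integer label map (H, W) into a one-hot target (num_classes, H, W)."""
    label_map = np.asarray(label_map)
    # Compare every pixel against every class index via broadcasting.
    classes = np.arange(num_classes).reshape(-1, 1, 1)
    return (label_map[None, ...] == classes).astype(np.float32)

# Toy 2x3 label map; the encoding 0=background, 1=CSF, 2=GM, 3=WM is hypothetical.
labels = np.array([[0, 1, 2],
                   [3, 3, 0]])
target = one_hot_encode(labels, num_classes=4)
print(target.shape)  # (4, 2, 3)
```

Each pixel has exactly one active channel, so taking the argmax over the first axis recovers the original label map.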
To date, many researchers have applied semantic segmentation to the brain, and in particular to the segmentation of brain tissues. [26] proposed ROAM, a random layer mixup, which allows the network to be less confident for interpolated data points at any selected space. [27] proposed two novel architectures for brain tumor segmentation. Their results were evaluated on the BraTS 2017 challenge datasets. Similar to the above model, [28] proposed rethinking atrous convolution for semantic image segmentation. Different from the above model, the rethinking atrous convolution model targets long-range context. The model does not require convolution layers. Instead, it utilizes atrous convolution with upsampled filters to extract dense feature maps. The model was evaluated on the PASCAL VOC 2012 semantic image segmentation benchmark, including the 3475 finely annotated images and the extra 20000 coarsely annotated images. Their experimental results on the segmentation task show that atrous convolution is necessary when building more blocks in cascade. The authors also show that the performance improves as more blocks are added.
[Table I excerpt: (preceding row) CycleGAN; Chuquicusma et al. [37], DCGAN; Frid-Adar et al. [38], DCGAN/ACGAN]

B. Generative Adversarial Network
Generative adversarial networks have shown great promise for medical image diagnostics [29], and more specifically for brain segmentation [25] [21]. Fig. 3 shows an overview of generative adversarial networks.
To date, many researchers have applied generative adversarial networks to brain segmentation. [30] proposed a 3D volume-to-volume generative adversarial network (GAN) for segmentation of brain tumors. Their model achieved good results when the generator loss was weighted 5 times higher than the discriminator loss. The proposed model was evaluated on the BraTS 2018 datasets, where it outperformed previous models by 0.66% overall. [31] introduced a framework for super-resolution and segmentation of neonatal brain MRI using generative adversarial networks. It consists of training a generator network that estimates, for a given input image, its corresponding high-resolution (HR) image, and a discriminator network D designed to distinguish real HR and segmentation images from generated ones. In Table I, we list some generative adversarial network models applied to medical applications.

C. Loss Functions
To improve the stability of GAN training, many researchers have developed new loss functions [39] [40]. Because the choice of loss function strongly affects a given model implementation, in this section we summarize five loss functions for GANs.

1) Minimax GAN loss:
The discriminator tries to maximize the loss function and the generator tries to minimize it.
Generator loss function:
$$L_G = \mathbb{E}_z[\log(1 - D(G(z)))] \quad (1)$$
Discriminator loss function:
$$L_D = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))] \quad (2)$$
In these functions, D(x) denotes the discriminator's estimate of the probability that real data x is real, E_x denotes the expected value over all real data, G(z) denotes the generator's output when given noise z, D(G(z)) denotes the discriminator's estimate of the probability that fake data is real, and E_z denotes the expected value over all generated fake data G(z).

2) Non-saturating GAN loss:
The non-saturating loss is used to solve the saturation problem.
Generator loss function:
$$L_G = -\mathbb{E}_z[\log D(G(z))] \quad (3)$$
Discriminator loss function:
$$L_D = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))] \quad (4)$$
Here the generator minimizes (3), while the discriminator still maximizes (4).

3) Wasserstein loss (WGAN):
Generative adversarial networks (GANs) are useful in computer vision [41] [42], but their main problem is training instability [28]. Many researchers have developed loss functions for more stable GAN training [35]. The Wasserstein GAN (WGAN) makes good progress on training stability, but can still produce poor results. Many researchers have argued that these poor results are due to the use of weight clipping. Hence, [43] proposed an alternative to clipping weights. This model is a modification of the standard GAN. The output of the discriminator is a number, and discriminator training tries to make this output larger for real data than for fake data.
Discriminator loss:
$$L_D = D(x) - D(G(z))$$
Generator loss:
$$L_G = D(G(z))$$
In these functions, D(x) denotes the discriminator's output for real data, G(z) denotes the generator's output when given noise z, and D(G(z)) denotes the discriminator's output for fake data. The discriminator tries to maximize D(x) - D(G(z)), and the generator tries to maximize D(G(z)).
More specifically, the output of the discriminator does not have to be between 0 and 1. For more information, we refer the reader to [43].
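The minimax, non-saturating and Wasserstein objectives described above can be sketched in a few lines of NumPy. This is an illustrative sketch in their standard textbook form, not the implementation used in this paper. All function names are hypothetical, the inputs are assumed to be arrays of discriminator outputs on a batch, and `EPS` is an assumption added only to avoid taking the logarithm of zero.

```python
import numpy as np

EPS = 1e-12  # assumption: small constant to avoid log(0)

def minimax_d_value(d_real, d_fake):
    """Value the discriminator maximizes: E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    return np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS))

def minimax_g_loss(d_fake):
    """Saturating generator loss (minimized): E_z[log(1 - D(G(z)))]."""
    return np.mean(np.log(1.0 - d_fake + EPS))

def non_saturating_g_loss(d_fake):
    """Non-saturating generator loss (minimized): -E_z[log D(G(z))]."""
    return -np.mean(np.log(d_fake + EPS))

def wasserstein_d_value(c_real, c_fake):
    """Value the WGAN discriminator maximizes: mean D(x) - mean D(G(z))."""
    return np.mean(c_real) - np.mean(c_fake)

def wasserstein_g_value(c_fake):
    """Value the WGAN generator maximizes: mean D(G(z))."""
    return np.mean(c_fake)

# Probabilities for the first two losses, unbounded scores for the Wasserstein one.
d_real, d_fake = np.array([0.9, 0.8]), np.array([0.1, 0.2])
print(minimax_d_value(d_real, d_fake), non_saturating_g_loss(d_fake))
print(wasserstein_d_value(np.array([2.0, 4.0]), np.array([1.0, 1.0])))
```

When D(G(z)) is close to 0, the gradient of log(1 - D(G(z))) with respect to D(G(z)) stays near -1, whereas -log D(G(z)) grows without bound and gives a much larger gradient. This is the saturation problem that the non-saturating variant addresses.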

4) Least-squares loss (LSGAN):
This model uses an a-b coding scheme for the discriminator, where a and b denote the labels of fake and real data, respectively.
Discriminator loss:
$$L_D = \tfrac{1}{2}\,\mathbb{E}_x[(D(x) - b)^2] + \tfrac{1}{2}\,\mathbb{E}_z[(D(G(z)) - a)^2]$$
Generator loss:
$$L_G = \tfrac{1}{2}\,\mathbb{E}_z[(D(G(z)) - b)^2]$$
Both losses are minimized; the generator pushes the discriminator's output on fake data toward the real label b.

5) Wasserstein Gradient Penalty loss (AC-GAN):
In AC-GAN, every generated sample has a class label c ∼ p in addition to the noise z. This model is a modification of the standard GAN. In the standard GAN, X_fake = G(z), but in AC-GAN, X_fake = G(c, z). Moreover, the output of the standard GAN discriminator is a single probability distribution, P(S | X) = D(X), whereas in AC-GAN the output consists of two probability distributions: one over the source, P(S | X), and one over the class labels, P(C | X), i.e., P(S | X), P(C | X) = D(X).
Arnab et al. introduced a model that uses a generative adversarial network for brain segmentation. The authors use a dataset of 43 subjects. First, they generate fake images using the generator. Second, they use the labeled data, unlabeled data and fake data to train the discriminator to distinguish between generated data and true data, while the encoder is used to compute the predicted noise mean and log-variance. The approach proposed by Arnab et al. supports only one stage, while our model supports multiple stages. Our paper aims to solve the problem of information loss in one-stage methods. The first generator produces a coarse outline to be used in the third generator, and the encoder and decoder produce a fine outline. We also use dense skip connections to combine features from different scales. To validate the multi-stage idea, we use the Dice coefficient.

III. METHODOLOGY
In this section, we present the design of the multi-stage model. Subsection A gives a more detailed description of the generative adversarial network approach used in our work. Subsection B gives a more detailed description of the loss functions for the discriminator and the generator. Table II lists the symbols defined in the paper. D is trained to differentiate between true data x and generated data G(z). The core idea of a GAN is to play a two-player min-max game. Fig. 4 shows an overview of our proposed network, which mainly consists of three generator stages and a discriminator network.
In this work, the first generator G is mainly used to generate an outline for the background and brain tissues from the input images. The second generator takes two inputs, an image x and a random vector z. The idea is to encourage the generator to produce as many different values for each x as there are values of z. More specifically, training the network with both the random vector z and the image x encourages it to produce better outputs.
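One common way to feed both an image x and a random vector z to a generator is to tile z over the spatial grid and stack it with the image channels. The paper does not state how the two inputs are combined, so the sketch below is an assumption for illustration only; the function name `concat_image_and_noise`, the channel-first layout and the toy sizes are all hypothetical.

```python
import numpy as np

def concat_image_and_noise(x, z):
    """Stack an image x of shape (C, H, W) with a noise vector z of shape (Z,).

    Each entry of z becomes one constant (H, W) plane, giving (C + Z, H, W).
    """
    x = np.asarray(x, dtype=np.float32)
    z = np.asarray(z, dtype=np.float32)
    _, h, w = x.shape
    z_planes = np.broadcast_to(z[:, None, None], (z.shape[0], h, w))
    return np.concatenate([x, z_planes], axis=0)

rng = np.random.default_rng(0)
x = rng.random((2, 4, 4))       # e.g. two modalities such as T1 and T2
z = rng.standard_normal(8)      # random vector
g_input = concat_image_and_noise(x, z)
print(g_input.shape)  # (10, 4, 4)
```

Sampling several z vectors for the same x then yields several distinct generator inputs, which is what allows the generator to produce different outputs for a single image.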
The third G is used to generate outlines for (i) white matter (WM), (ii) gray matter (GM) and (iii) cerebrospinal fluid (CSF), while the discriminator is used to distinguish between true and generated data. The main role of the first G is to generate a coarse segmentation to be used as guidance information for the third G, whose main role is to generate more detailed results from that coarse segmentation. The third G consists of an encoder and a decoder, in which we use dense skip connections to combine features from different scales. Fig. 5 shows the network architecture of the encoder-decoder and the dense skip connections. The dense skip connections combine features from different layers so that they can complement each other.
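The coarse-to-fine idea can be illustrated with a simple fusion rule: the coarse brain mask from the first stage gates the tissue prediction of the third stage. The fusion actually used in this paper is not specified in the text, so the rule below is only one plausible sketch. The function name `fuse_coarse_and_fine`, the 0.5 threshold and the label encoding (0 = background, 1 = CSF, 2 = GM, 3 = WM) are assumptions.

```python
import numpy as np

def fuse_coarse_and_fine(brain_prob, tissue_probs, threshold=0.5):
    """Combine a coarse brain mask with fine tissue probabilities.

    brain_prob:   (H, W) probability of brain tissue from the first stage.
    tissue_probs: (3, H, W) probabilities for CSF, GM, WM from the third stage.
    Returns an (H, W) label map: 0 = background, 1 = CSF, 2 = GM, 3 = WM.
    """
    brain_mask = np.asarray(brain_prob) >= threshold
    tissue_labels = np.argmax(tissue_probs, axis=0) + 1  # shift past background
    return np.where(brain_mask, tissue_labels, 0)

brain_prob = np.array([[0.9, 0.2],
                       [0.7, 0.1]])
tissue_probs = np.array([
    [[0.1, 0.6], [0.2, 0.3]],   # CSF
    [[0.7, 0.3], [0.2, 0.3]],   # GM
    [[0.2, 0.1], [0.6, 0.4]],   # WM
])
fused = fuse_coarse_and_fine(brain_prob, tissue_probs)
print(fused)  # [[2 0]
              #  [3 0]]
```

Pixels that the coarse stage assigns to the background stay background, regardless of what the fine stage predicts there.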

B. Loss function
Discriminator loss function: The discriminator in our model has an unlabeled data loss, a labeled data loss and a refined prediction loss, which are combined into the overall loss function. For labeled data, we use the same loss function as in the standard segmentation network. In [35], it was shown that using l_{i,k+1} as a subtracted function changes the softmax function. More specifically, the idea is to introduce an unlabeled loss and a fake loss, which are analogous to the two components of the discriminator loss in the standard GAN, while the labeled loss is the cross-entropy. For more information, we refer the reader to [35].
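A widely used formulation of such a semi-supervised discriminator treats "fake" as an extra class whose logit is fixed to zero, which gives closed forms for the three terms. The sketch below follows that common formulation; whether it matches [35] or this paper's loss term by term is an assumption, as are all function names and the unweighted sum of the three terms.

```python
import numpy as np

def logsumexp(logits):
    """Numerically stable log(sum(exp(logits))) over the last axis."""
    m = np.max(logits, axis=-1, keepdims=True)
    return np.squeeze(m, axis=-1) + np.log(np.sum(np.exp(logits - m), axis=-1))

def softplus(x):
    """log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)

def labeled_loss(logits, labels):
    """Cross-entropy over the K real classes; logits (N, K), labels (N,)."""
    true_logit = logits[np.arange(len(labels)), labels]
    return np.mean(logsumexp(logits) - true_logit)

def unlabeled_loss(logits):
    """-log(1 - p_fake): unlabeled real samples should not look fake."""
    lse = logsumexp(logits)
    return np.mean(softplus(lse) - lse)

def fake_loss(logits):
    """-log(p_fake): generated samples should be assigned to the fake class."""
    return np.mean(softplus(logsumexp(logits)))

def discriminator_loss(lab_logits, labels, unl_logits, fake_logits):
    # Assumption: the three terms are simply added with equal weight.
    return (labeled_loss(lab_logits, labels)
            + unlabeled_loss(unl_logits)
            + fake_loss(fake_logits))

logits = np.zeros((2, 3))          # N = 2 samples, K = 3 tissue classes
labels = np.array([0, 2])
total = discriminator_loss(logits, labels, logits, logits)
print(total)
```

With the fake logit fixed at zero, the probability of "fake" is 1 / (1 + sum_k exp(l_k)), so the unlabeled and fake terms play the role of the two terms of the standard GAN discriminator loss.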

Generator loss function:
The loss function of the generator is as follows:

IV. EXPERIMENTS
The following subsections present our experimental design and evaluation. In subsection A, we present the evaluations and discussions. We start with a more detailed description of the datasets. Then, we describe the experimental setup of our work. Finally, we explain the Dice coefficient used for segmentation evaluation.

A. Evaluations and Discussions
1) Datasets:
In our work, we use two different datasets of brain images: the MICCAI iSEG dataset and the MRBrains dataset. We describe each of these datasets in the following.

3) MRBrains Dataset:
The MRBrains dataset contains 20 adult subjects for segmentation of (a) cortical gray matter, (b) basal ganglia, (c) white matter, (d) white matter lesions, (e) peripheral cerebrospinal fluid, (f) lateral ventricles, (g) cerebellum, and (h) brain stem on T1, T2, and FLAIR. Five subjects (2 male and 3 female) are provided for the training set and 15 subjects are provided for the testing set. For the evaluation of the segmentation, these structures are merged into gray matter (a-b), white matter (c-d), and cerebrospinal fluid (e-f). The cerebellum and brain stem are excluded from the evaluation.
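The merging of the eight annotated structures into three evaluation classes can be expressed as a lookup table. The sketch below is illustrative only: the integer encoding of the input labels (0 = background, then 1 to 8 in the order (a) to (h) listed above) and the output encoding (1 = CSF, 2 = GM, 3 = WM) are assumptions, and `merge_labels` is a hypothetical helper name.

```python
import numpy as np

# Assumed input encoding: 0 background, 1 cortical GM, 2 basal ganglia, 3 WM,
# 4 WM lesions, 5 peripheral CSF, 6 lateral ventricles, 7 cerebellum, 8 brain stem.
# Assumed output encoding: 0 background/excluded, 1 CSF, 2 GM, 3 WM.
MERGE_TABLE = np.array([0, 2, 2, 3, 3, 1, 1, 0, 0])

def merge_labels(label_map):
    """Map the eight detailed structures to the three evaluation classes."""
    return MERGE_TABLE[np.asarray(label_map)]

detailed = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 0]])
merged = merge_labels(detailed)
print(merged)  # [[2 2 3]
               #  [3 1 1]
               #  [0 0 0]]
```

Cerebellum and brain stem map to 0 here, which mirrors their exclusion from the evaluation.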

4) Experimental Setup:
The proposed model was implemented in Python on a PC

B. Segmentation Evaluation
1) Dice Coefficient (DC):
To better demonstrate the significance of our model, we use the Dice Coefficient (DC) metric for evaluation. The DC is widely used in the literature as a benchmark metric for comparing brain segmentation models. We use V_ref for the reference segmentation and V_auto for the automated segmentation. The DC is given by the following equation:
$$DC(V_{ref}, V_{auto}) = \frac{2\,|V_{ref} \cap V_{auto}|}{|V_{ref}| + |V_{auto}|}$$
where DC values lie in the range [0, 1], with 1 corresponding to perfect overlap and 0 indicating total mismatch.
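The Dice coefficient in its standard form, twice the overlap divided by the sum of the two volumes, can be computed for one tissue class as in the sketch below. It is an illustrative NumPy version, not the evaluation code of the challenge organizers. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions.

```python
import numpy as np

def dice_coefficient(v_ref, v_auto):
    """Dice overlap between a reference mask and an automated mask (binary arrays)."""
    v_ref = np.asarray(v_ref, dtype=bool)
    v_auto = np.asarray(v_auto, dtype=bool)
    denom = v_ref.sum() + v_auto.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * np.logical_and(v_ref, v_auto).sum() / denom

v_ref = np.array([[1, 1, 0],
                  [0, 1, 0]])
v_auto = np.array([[1, 0, 0],
                   [0, 1, 1]])
dc = dice_coefficient(v_ref, v_auto)
print(round(dc, 4))  # 0.6667
```

For a multi-class label map, the same function can be applied per tissue by passing `labels == k` for each class k.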

C. Evaluating the hyper-parameters in multi-stage
To evaluate the effectiveness of our model, we evaluated different hyper-parameters, e.g., batch size and learning rate. Table IV, Table V and Table VI show

V. RESULT AND DISCUSSION
To better demonstrate the significance of our model, we train and test the multi-stage GAN model on two datasets covering different ages (i.e., infants and adults). To train the multi-stage GAN model on the iSEG-2017 dataset, we used the 13 unlabeled subjects of the testing set to train the GAN; of the 10 labeled subjects, we used two for the training set, one for the validation set and 7 for the testing set. Similarly, for the 5 labeled subjects of the MRBrains 2013 dataset, we used one labeled subject for the training set, one labeled subject used [...] on GM segmentation, and 88% on WM segmentation. Such results are superior to those obtained using the state-of-the-art models. Therefore, we argue that our model can perform better in a few-shot learning setting. Fig. 7 shows visualization results of our model on the subject used as a validation set. We observe that the segmentation results obtained by the multi-stage model are fairly close to the manual reference contour provided by the MICCAI iSEG organizers.

VI. THREATS TO VALIDITY
Threats to external validity relate to the generalizability of our results. In our work, we used two datasets provided by two different organizers. The two datasets contain 43 subjects in total. One could argue that the datasets do not have enough samples. We mitigate this threat by using two datasets that (a) contain both infant and adult brain data and (b) were previously used in prior studies. Our model obtains higher performance than prior models. We believe that our model is similar to the process of human learning in clinical practice. Moreover, we have only targeted three tissues in our work. However, our proposed model can be easily extended to segment more tissues, as it does not require more labeled data.
The intuition behind this model is to examine how the multi-stage design performs in a few-shot learning setting, where only a few labeled samples are available.
Threats to internal validity relate to experimental errors and bias. Our model is built using data extracted from medical images in which contrast might be low. We use small-size kernels, a deconvolution layer (to upsample the input), PReLU, dropout and normalization methods to reduce the risk of overfitting. Hence, any potential deficiency in the data should affect all the implemented models. Nevertheless, our model obtains higher performance than prior models.

VII. CONCLUSION
In this study, we propose a multi-stage generative adversarial network (GAN) model for brain segmentation that first generates a coarse outline for (i) background and (ii) brain tissues, and then generates outlines for (i) white matter (WM), (ii) gray matter (GM) and (iii) cerebrospinal fluid (CSF).
Our results are evaluated on the infant and adult datasets and found to be fairly close to the manual reference. In addition, we compare our model with three state-of-the-art baseline models and observe that our model achieves an improvement of up to 4%. In particular, we obtain Dice coefficients that range between 88% and 95%. Such results indicate that adopting multi-stage generative adversarial networks significantly improves segmentation results. We argue that our model is more efficient and accurate in practice for both infant and adult brain segmentation.
Despite the promising results obtained from our proposed model, we believe that further improvements can be achieved. In the future, we aim to include more datasets in our study. Furthermore, we aim to extend our multi-stage model to a larger number of brain tissues and to examine how it segments them. Lastly, we note that the adoption of multi-stage generative adversarial networks in medical imaging is still in its infancy.

2) MICCAI iSEG Dataset:
The aim of the evaluation framework introduced by the MICCAI iSEG organizers is to compare segmentation models of WM, GM and CSF on T1 and T2. The MICCAI iSEG dataset contains 10 images, named subject-1 through subject-10, each with a T1-weighted image (subject T1), a T2-weighted image (subject T2), and a manual segmentation label, used as a training set. The dataset also contains 13 images, named subject-11 through subject-23, used as a testing set. An example of the MICCAI iSEG dataset (T1, T2, and manual reference contour) is shown in Fig. 6. Table III shows the parameters used to generate T1 and T2. The dataset has two different times (i.e., longitudinal relaxation time and transverse relaxation time), which are used to generate T1 and T2. The dataset has been interpolated, registered, and skull-removed by the MICCAI iSEG organizers. We present the evaluation equations in subsection B.

TABLE I. Some models of generative adversarial networks applied to medical applications.

TABLE II. List of defined symbols in the paper.

TABLE III. The parameters used to generate T1 and T2.

TABLE IV. Experiments on training epochs obtained on the MRBrains dataset. The best performance for each tissue class is highlighted in bold.

TABLE V. Experiments on learning rate obtained on the MRBrains dataset. The best performance for each tissue class is highlighted in bold.
We find that a batch size of 30 yields 95%, 94% and 92% for CSF, GM and WM, respectively. Too many training epochs cause over-fitting, and too few cause under-fitting. To mitigate this problem and check whether the number of training epochs significantly affects network performance, we ran experiments with 20, 40, 60 and 80 training epochs. With 80 epochs, we find that the network performs well. We followed the same approach to choose the learning rate. A large learning rate causes the network parameters to be updated quickly, whereas a small learning rate causes them to be updated slowly. First, we start with a learning rate of 1 × 10 1. Second, we perform multiple runs while changing the learning rate. Finally, our experimental results show that the multi-stage model achieves good results with a learning rate of 1 × 10^-4.

TABLE VI. Experiments on batch size obtained on the MRBrains dataset. The best performance for each tissue class is highlighted in bold.

TABLE VII. Segmentation performance in Dice Coefficient (DC) obtained on the MICCAI iSEG dataset. The best performance for each tissue class is highlighted in bold.

TABLE VIII. Segmentation performance in Dice Coefficient (DC) obtained on the MRBrains dataset. The best performance for each tissue class is highlighted in bold.
The 15 unlabeled subjects of the testing dataset are used to train the multi-stage GAN model. The main goal of the multi-stage GAN model is to assess performance in a few-shot learning setting. Table VII presents the results of our model for segmenting CSF, GM, and WM on the MICCAI iSEG dataset. Our model obtains a DC value of 95% for CSF segmentation, whereas the DC values obtained by state-of-the-art models for CSF range between 86% and 91%. In addition, our model obtains DC values of 94% and 92% for GM and WM, respectively. The state-of-the-art models, on the other hand, obtain DC values in the ranges of 80%-93% for GM segmentation and 81%-90% for WM segmentation. Such results highlight the remarkable efficiency gained by using the multi-stage GAN. Table VIII compares the results obtained using the MRBrains dataset. We observe that our model achieves a DC value of 93% on CSF segmentation, 93%

TABLE IX. Average execution time (in minutes) and standard deviation (SD) on the MRBrains dataset.