Automatic Pancreas Segmentation Using Double Adversarial Networks with Pyramidal Pooling Module

Owing to the irregular shape and high anatomical variability of the pancreas in abdominal CT images, pancreas segmentation is regarded as a challenging task. To address this issue, we propose an automatic segmentation model using double adversarial networks with a pyramidal pooling module. First, we introduce double adversarial networks that use the competing mechanism of adversarial learning to double-check whether the obtained segmentation results are similar to their ground truths. This helps capture spatial information for segmentation and prompts the obtained samples to be more realistic, thereby improving the segmentation performance of the network. Second, we design a pyramidal pooling module to collect multi-level features and retain substantial information for segmentation in order to further boost the network performance. Finally, we assess the segmentation performance of our model using the Dice similarity coefficient (DSC), Jaccard index, precision, and recall. Experimental results show that the proposed model outperforms most existing pancreas segmentation methods.


I. INTRODUCTION
Automatic segmentation of the pancreas from abdominal CT images is a challenging task owing to the limited volume and variable shape of the pancreas in abdominal scans. Accurate pancreas segmentation is crucial for disease diagnosis and clinical treatment. With the development and success of deep learning, several methods [1][2][3][4][5][6][7][8][9] for pancreas segmentation have emerged in recent years, yielding notable results. However, compared with other bulky abdominal organs, the limited volume of the pancreas makes it difficult to segment it from CT images using only simple deep-learning-based methods. There is considerable scope as well as a growing demand to upgrade existing methods and further improve the segmentation performance. A new architecture proposed by Goodfellow et al. [10], namely the generative adversarial network, is widely used in the field of organ segmentation [11][12][13][14][15][16][17][18][19]. A generative adversarial network consists of two competing networks, i.e., a discriminator and a generator, where the generator attempts to produce samples to deceive the discriminator while the discriminator aims to distinguish the synthetic samples from the real images regardless of how similar they are. This special competing mechanism helps capture high-dimensional dataset distributions, thus preserving more useful information for segmentation. In this paper, we propose an automatic segmentation model using double adversarial networks and a pyramidal pooling module to segment the pancreas from abdominal CT images. The main contributions of this study can be summarized as follows. (1) We introduce double adversarial networks into a conventional segmentation model to improve the pancreas segmentation performance. The first involvement of adversarial learning helps preserve the spatial information that is beneficial to segmentation by capturing high-dimensional data distributions and checks for nuances between the produced probability maps and the corresponding ground truths, thus prompting the obtained maps to resemble the original images. Accordingly, the second involvement of adversarial learning further accelerates the preservation of spatial information and causes the output segmentation results to be more realistic with respect to the standard volumes, i.e., the involved double adversarial networks repeat the above-mentioned process to further boost the segmentation network performance. (2) In addition, we include a novel pyramidal pooling module in the proposed architecture to help capture contextual information and obtain greater clues for segmentation. More specifically, we employ three different pooling structures in the pyramidal pooling module and verify their performance. Finally, we select the optimal architecture to obtain a segmentation model with the optimal performance. (3) To the best of our knowledge, this is the first application of double adversarial networks with a pyramidal pooling module to pancreas segmentation.
The remainder of this paper is organized as follows. Section II reviews related studies. Section III elaborates on the primary principles and network architecture of the proposed model, and introduces the network implementation details. Sections IV and V present and discuss the experimental results, respectively. Finally, Section VI summarizes our findings.

II. RELATED WORK
With the development of deep learning, many approaches have been designed to deal with the challenging task of segmenting the pancreas from abdominal images [1][2][3][4][5][6][7][8][9]. Roth et al. [1] proposed an automated segmentation method that consists of two steps: pancreas localization and pancreas segmentation. First, holistically-nested convolutional networks are introduced to localize the pancreas in 3D CT scans, which helps refine the segmentation by providing a reliable 3D bounding box. Second, mid-level cues are integrated within the obtained bounding box to generate boundary-preserving pixel-wise class label maps for pancreas segmentation. Tian et al. [2] proposed a Markov chain Monte Carlo (MCMC)-guided convolutional neural network (CNN) that involves three steps for pancreas segmentation. First, registration is introduced to address the issues of body weight and location variability. Second, an MCMC method is proposed to guide the adaptive selection of 3D patches that are used in the training process of the CNN. Finally, the same MCMC method guides the pancreas segmentation process together with patch-wise predictions from a Bayesian voting scheme. Li et al. [3] proposed UDCGAN, a novel segmentation algorithm that introduces two-tier constraints into a conventional network through adversarial learning for pancreas segmentation. Wang et al. [4] proposed a fully 3D cascaded framework for pancreas segmentation, where a 3D detection network is used to regress locations of the pancreas regions and two cascaded 3D segmentation networks are used to segment the pancreas on the basis of the obtained detection results. Zhou et al. [5] proposed a fixed-point model for pancreas segmentation that employs predicted segmentation masks to shrink the input regions. This approach is based on the idea that a small region tends to result in high segmentation performance. Cai et al. [6] designed a deep convolutional sub-network whose outputs are connected to recurrent layers in the proposed recurrent neural network architecture and obtained further refined results for contextual learning. Then, they employed a novel Jaccard loss and trained the deep networks under the Jaccard index directly for accurate pancreas segmentation. Although the above-mentioned models have achieved notable results for pancreas segmentation, there remains considerable scope for improvement to address this challenging problem.
The novel generative adversarial network proposed by Goodfellow et al. [10] is a powerful structure that has been applied to several tasks, yielding significant results [20][21][22][23][24][25]. Moreover, various studies have applied generative adversarial networks to organ segmentation tasks [11][12][13][14][15][16][17][18][19][26][27][28][29][30][31][32][33][34][35]. Zhang et al. [11] developed a CNN-based adversarial multi-residual and multi-scale pooling MRF-enhanced network for multi-organ segmentation from CT images as well as for accurate contour generation in pelvic CT images. Le et al. [12] proposed a network architecture in which a conditional generative adversarial network is used as the generative network and an FCN acts as the discriminative network for automated whole heart segmentation from CT images. Pang et al. [13] proposed CTumorGAN, a unified adversarial framework for automatic tumor segmentation from CT scans. It consists of a generator network and a discriminator network. The generator produces samples that are similar to the corresponding ground truths while the discriminator attempts to differentiate the synthetic volumes from the original images regardless of how similar they are. Brion et al. [14] adopted two strategies, namely adversarial networks and intensity-based data augmentation, to train neural networks for male pelvic organ segmentation from cone beam CT images. Wang et al. [26] proposed a semi-symmetric structure based on a novel multi-level adversarial feature method to maintain the segmentation performance during domain adaptation, and experiments have shown that this model achieves state-of-the-art performance in meningioma segmentation. Conze et al. [27] introduced cascaded partially pre-trained convolutional encoder-decoders as generators into standard conditional generative adversarial networks in order to alleviate data scarcity limitations. End-to-end training of such networks is useful for simultaneous multi-level segmentation refinements using auto-context features. This model won the first prize in three competition categories, namely liver CT, liver MR, and multi-organ MR segmentation, in the Combined Healthy Abdominal Organ Segmentation (CHAOS) challenge held at the IEEE International Symposium on Biomedical Imaging in 2019. Gao et al. [28] proposed a novel two-stage deep neural network with small-organ localization and segmentation sub-networks that automatically locate, ROI-pool, and segment small organs while maintaining the network segmentation performance for large organs. Accordingly, they imposed an adversarial shape constraint on small organs to further ensure that the output segmentation results are similar to the standard ground truths. This framework outperformed state-of-the-art head and neck OAR segmentation methods on both a self-collected dataset and the MICCAI Head and Neck Auto Segmentation Challenge 2015 dataset. The above-mentioned results indicate that the generative adversarial network is an efficient tool for organ segmentation tasks.

III. METHOD
A. DOUBLE ADVERSARIAL NETWORKS
Inspired by previous studies that applied generative adversarial networks to organ segmentation, we introduce adversarial learning into a conventional segmentation network (i.e., U-Net) to obtain an adversarial U-Net that prompts the obtained segmentation outputs to be more similar to their corresponding ground truths and thus achieves better segmentation results. The U-Net proposed by Ronneberger et al. [36] is used as the segmentation network in the proposed adversarial U-Net. To further boost the segmentation performance on the basis of the adversarial U-Net, we include adversarial learning in the upgraded segmentation network once again to obtain double adversarial networks. The proposed double adversarial networks help retain more information for segmentation and further prompt the output probability maps to resemble their corresponding ground truths in order to achieve a better segmentor. The network architecture is shown in Fig. 1. Specifically, in the double adversarial networks, the segmentation outputs from the last deconvolutional layer in the segmentation network and their corresponding ground truths are used as the input images for the discriminator D1; this segmentation network is defined as G1. Accordingly, the segmentation outputs from the penultimate deconvolutional layer in the segmentation network and their corresponding ground truths are used as the input images for the discriminator D2; this segmentation network is defined as G2. The two discriminators D1 and D2 have identical structures, each comprising five convolutional layers with kernel sizes of 7, 5, 4, 4, and 4, respectively, and a stride of 2 in every layer. The loss functions of D1 and D2 are given by Eq. (1) and Eq. (2), respectively. D(x) represents the probability that the input samples for D are from the original images while D(G(z)) represents the probability that the input samples for D are the produced images obtained from the generator.
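Eq. (1) and Eq. (2) are not reproduced in this text. As a hedged sketch only, assuming that both discriminators follow the standard adversarial objective of Goodfellow et al. [10], the losses would take a form such as the following. The symbols x (a ground-truth sample), z (an input CT slice), and the expectation notation are assumptions introduced for illustration, and the exact form used by the authors may differ.

```latex
% Assumed standard GAN discriminator objective, written once per discriminator.
\begin{align}
  l_{D_1} &= -\,\mathbb{E}_{x}\bigl[\log D_1(x)\bigr]
             -\,\mathbb{E}_{z}\bigl[\log\bigl(1 - D_1(G_1(z))\bigr)\bigr], \\
  l_{D_2} &= -\,\mathbb{E}_{x}\bigl[\log D_2(x)\bigr]
             -\,\mathbb{E}_{z}\bigl[\log\bigl(1 - D_2(G_2(z))\bigr)\bigr].
\end{align}
```

Under this form, each discriminator is rewarded for assigning a high probability to ground truths and a low probability to the outputs of its paired generator.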
The loss functions from the original segmentation network of the corresponding generators G1 and G2 are given by Eq. (3) and Eq. (4), while the additional constraints from the adversarial networks for G1 and G2 are given by Eq. (5) and Eq. (6). P represents the output probability maps from the segmentation network while T represents the corresponding ground truths.
The total loss function lg of the generator in the proposed model is given by Eq. (7). The constant terms α, β, θ, and δ are empirically set to 1, 0.1, 0.004, and 0.0004, respectively.
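The weighted combination described above can be illustrated with a minimal sketch. Only the four weights (1, 0.1, 0.004, and 0.0004) come from the text. The pairing of each weight with a particular term, the use of binary cross-entropy as the segmentation loss, and the function names are assumptions made for illustration, since Eq. (3) to Eq. (7) are not reproduced here.

```python
import math

# Weights quoted in the text: alpha, beta, theta, delta = 1, 0.1, 0.004, 0.0004.
# ASSUMPTION: alpha/beta weight the two segmentation terms (G1, G2) and
# theta/delta weight the two adversarial terms; the text does not state the pairing.


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities P and 0/1 ground truths T.

    ASSUMPTION: the form of the segmentation loss is not given in the text.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def adversarial_loss(disc_scores, eps=1e-7):
    """Generator-side adversarial term: small when D scores the outputs near 1."""
    clamped = [min(max(s, eps), 1.0) for s in disc_scores]
    return -sum(math.log(s) for s in clamped) / len(clamped)


def total_generator_loss(seg1, seg2, adv1, adv2,
                         alpha=1.0, beta=0.1, theta=0.004, delta=0.0004):
    """Weighted sum of the four generator terms (hypothetical pairing)."""
    return alpha * seg1 + beta * seg2 + theta * adv1 + delta * adv2
```

With these weights the segmentation terms dominate, so the adversarial terms act as small regularizers rather than as the main training signal.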

B. DOUBLE ADVERSARIAL NETWORKS WITH PYRAMIDAL POOLING MODULE
To further improve the segmentor performance, we introduce a pyramidal pooling module into the double adversarial networks to replace the original pooling layers in the U-Net [36]. Specifically, we build a pyramidal pooling block that consists of three parallel pooling layers to extract more features. The structure of this block is shown in Fig. 2. More specifically, each feature map input into the pyramidal pooling block is equally divided into 4, 16, and 64 blocks; then, these blocks are fused and used as the output of this pooling layer to obtain more information from different scales. Furthermore, we design three different frameworks for the proposed pyramidal pooling module to verify the effectiveness of the proposed pyramidal pooling block. First, we substitute the last pooling layer in the double adversarial networks with a pyramidal pooling block; this framework is defined as DAN_P1. Then, we substitute the last two pooling layers in the double adversarial networks with two pyramidal pooling blocks; this framework is defined as DAN_P2.
Finally, we substitute the last three pooling layers with three pyramidal pooling blocks; this framework is defined as DAN_P3.
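The division of a feature map into 4, 16, and 64 blocks can be sketched as three parallel grid poolings (2 x 2, 4 x 4, and 8 x 8). The following is a minimal illustration on a single square 2D map, not the authors' implementation. Max pooling within each block, nearest-neighbor resizing to the finest grid, and fusion by stacking the three maps are all assumptions, because the text does not specify the pooling operator or the fusion rule.

```python
def grid_max_pool(fmap, grid):
    """Split a square 2D map into grid x grid equal blocks; keep each block's maximum."""
    size = len(fmap)
    if size % grid != 0:
        raise ValueError("map size must be divisible by the grid size")
    step = size // grid
    return [
        [
            max(
                fmap[r][c]
                for r in range(i * step, (i + 1) * step)
                for c in range(j * step, (j + 1) * step)
            )
            for j in range(grid)
        ]
        for i in range(grid)
    ]


def upsample_nearest(pooled, size):
    """Resize a small square map to size x size by repeating each value."""
    step = size // len(pooled)
    return [[pooled[r // step][c // step] for c in range(size)] for r in range(size)]


def pyramidal_pool(fmap, grids=(2, 4, 8)):
    """Pool at 4, 16, and 64 blocks, then stack the results at the finest resolution.

    ASSUMPTION: stacking as separate channels stands in for the unspecified fusion.
    """
    target = max(grids)
    return [upsample_nearest(grid_max_pool(fmap, g), target) for g in grids]
```

For an 8 x 8 input, `pyramidal_pool` returns three 8 x 8 maps: one summarizing four coarse regions, one summarizing sixteen, and one that keeps every value.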

C. IMPLEMENTATION DETAILS
The public Pancreas-CT dataset [37,38] collected by the National Institutes of Health (NIH) Clinical Center was used to evaluate the segmentation performance of the proposed model. Following the training protocol in [3], [39], [40], and [41], we split the dataset via four-fold cross-validation, where three parts were used as training samples and the remaining part was used for testing. Then, we resized the original images to 208 × 208 according to the label regions to ensure that the pancreas parts were unbroken in each slice. The Dice similarity coefficient (DSC), Jaccard index, precision, and recall were used as metrics to evaluate the segmentation performance of the proposed model. In the training phase, the batch size in the proposed network was set to 1. Adam was used as an optimizer with a learning rate of 0.0001 and momentum values of 0.9 and 0.99. Our experiments were developed in the PyTorch [42] environment on a Windows system with an NVIDIA GeForce GTX 1080 Ti graphics card having 11 GB memory.
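The four evaluation metrics have standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). The sketch below computes them for flattened binary masks. The function name and the convention of returning 0.0 when a denominator is zero are assumptions, as the text does not describe how empty masks are handled.

```python
def segmentation_metrics(pred, truth):
    """Compute DSC, Jaccard index, precision, and recall for flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    def safe_div(num, den):
        # ASSUMPTION: report 0.0 when the denominator is zero (e.g., empty masks).
        return num / den if den else 0.0

    return {
        "dsc": safe_div(2 * tp, 2 * tp + fp + fn),
        "jaccard": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }
```

For a single mask pair, the DSC and the Jaccard index J are related by DSC = 2J / (1 + J), so the two always rank one prediction against another in the same order. This identity does not generally hold for scores averaged over many cases.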

IV. EXPERIMENTAL DESIGN AND RESULTS
We conducted two sets of experiments to verify the effectiveness of the proposed double adversarial networks and pyramidal pooling module. Then, we compared our model with state-of-the-art pancreas segmentation methods.

A. DOUBLE ADVERSARIAL NETWORKS
First, we compared three segmentation models, namely U-Net, adversarial U-Net, and double adversarial networks, on the NIH pancreas dataset. Fig. 3 shows the boxplots of the DSC, Jaccard index, precision, and recall for these three models. The visualization results of these models are shown in Fig. 4, and the corresponding numerical values are listed in Table 1.

B. DOUBLE ADVERSARIAL NETWORKS WITH PYRAMIDAL POOLING MODULE
Next, we designed three groups of structures with different numbers of pyramidal pooling blocks in the proposed pyramidal pooling module and compared the segmentation performances of these models to assess the effectiveness of the proposed pyramidal pooling block. Fig. 5 shows the boxplots of the double adversarial networks as well as the double adversarial networks with one, two, and three pyramidal pooling blocks. The corresponding segmentation results are shown in Fig. 6. The numerical values of the DSC, Jaccard index, precision, and recall for these four models are listed in Table 2.

C. COMPARISON WITH STATE-OF-THE-ART MODELS
To assess the validity of the proposed model, we consulted several papers in the literature on organ segmentation and compared the proposed model with state-of-the-art methods for pancreas segmentation. The specific comparison results are summarized in Table 3.

V. DISCUSSION
A. DOUBLE ADVERSARIAL NETWORKS
The average DSC scores of the U-Net, adversarial U-Net, and double adversarial networks were 77.37%, 80.83%, and 82.38%, respectively, as listed in Table 1. Thus, our double adversarial networks achieved improvements of 5.01 and 1.55 percentage points over the U-Net and adversarial U-Net, respectively, which demonstrates their effectiveness. As for the DSC shown in Fig. 3, the mean values of the boxplots for the double adversarial networks indicate an advantage over the U-Net and adversarial U-Net. Thus, our double adversarial networks exhibit more stable segmentation behavior than the other two models, which further verifies their effectiveness. Similarly, the boxplots of the Jaccard index, precision, and recall for the double adversarial networks are all higher than those for the U-Net and adversarial U-Net, which confirms the effectiveness of the proposed double adversarial networks. As shown in Fig. 4, the outputs obtained from the U-Net tend to lose large chunks while the segmentation results from the double adversarial networks capture substantial details and achieve better visualization results compared to the U-Net and adversarial U-Net.

B. DOUBLE ADVERSARIAL NETWORKS WITH PYRAMIDAL POOLING MODULE
To further improve the segmentation performance, we introduced a pyramidal pooling module into the proposed double adversarial networks. Specifically, we designed three different structures that employ one, two, and three pyramidal pooling blocks, namely DAN_P1, DAN_P2, and DAN_P3, respectively. As shown in Table 2, the DSC score of DAN_P3 is higher than that of DAN, DAN_P1, and DAN_P2 by 0.93%, 0.66%, and 0.37%, respectively. Fig. 5 shows boxplots of the DSC, Jaccard index, precision, and recall for DAN, DAN_P1, DAN_P2, and DAN_P3. The mean scores of the DSC, Jaccard index, precision, and recall for DAN_P3 are the best among these four models, which indicates that DAN_P3 is the optimal segmentor with the most stable performance. As shown in Fig. 6, the segmentation results obtained from DAN_P3 present smoother outlines in small regions, which indicates that this model can capture more information for segmentation than the other models.

C. COMPARISON WITH STATE-OF-THE-ART MODELS
As can be seen from Table 3, the DSC, precision, and recall achieved by the MCMC-guided CNN [2] are 5.18%, 9.45%, and 0.65% lower than those achieved by our model. Furthermore, the DSC and Jaccard index achieved by the automatic pancreas segmentor with morphological and multilevel geometrical descriptor analysis proposed by Asaturyan et al. [9] are 4.01% and 5.66% lower than those achieved by our method, while those achieved by the recurrent neural network architecture proposed in [6] are 0.91% and 1.16% lower than those achieved by our method. Meanwhile, the DSC, Jaccard index, and precision of the UDCGAN proposed in [3] are 0.25%, 0.35%, and 0.89% lower than those of our model. In addition, the DSC of our model is 2.04% higher than that of the holistically-nested convolutional neural network proposed by Roth et al. [1], 0.94% higher than that of the fixed-point model proposed by Zhou et al. [5], 0.84% higher than that of the iterative 3D feature enhancement network proposed by Mo et al. [43], and 0.2% higher than that of the global context and boundary structure-guided network proposed by Guo et al. [8]. The automatic pancreas segmentor with a combination of coarse locations and ensemble learning [44] achieved a DSC score of 84.10%, which is 0.79% higher than that of our model, while its optimal precision score of 83.60% is 0.49% lower than that of our model (84.09%). The above-mentioned instances indicate that the proposed double adversarial networks with a pyramidal pooling module are efficient and promising models for pancreas segmentation.
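As a quick consistency check on the figures quoted in the text, and assuming that "our model" in Table 3 refers to DAN_P3, the DSC of the proposed model can be recovered in two independent ways, and the precision in one. All differences are treated as percentage points.

```latex
\begin{align}
  \text{DSC}_{\text{DAN\_P3}} &= 82.38 + 0.93 = 83.31
      && \text{(DAN score plus the gain of DAN\_P3 over DAN)} \\
  \text{DSC}_{\text{ours}} &= 84.10 - 0.79 = 83.31
      && \text{(score of [44] minus its margin over our model)} \\
  \text{Precision}_{\text{ours}} &= 83.60 + 0.49 = 84.09
      && \text{(precision of [44] plus our margin over it)}
\end{align}
```

Both routes give a DSC of 83.31%, so the values stated in the discussion are mutually consistent.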
Although our model has achieved satisfactory results, it still has some limitations. As can be seen from Table 3, the deep learning framework with multi-atlas registration and 3D level-set [39] outperforms our model by 1.16% in terms of the DSC. Further, the end-to-end model with lightweight DCNN modules and spatial prior propagation proposed by Zhang et al. [40] outperforms our model by 2.25%. Future studies should attempt to adopt better solutions to upgrade the network architecture in order to collect more information and thus improve the segmentation performance of our model.

VI. CONCLUSION
We proposed an automatic segmentation model using double adversarial networks with a pyramidal pooling module for pancreas segmentation. The double adversarial networks prompt the probability maps obtained from the segmentation network to resemble the corresponding ground truths in order to improve the network segmentation performance. Then, the pyramidal pooling module contributes toward the preservation of the contextual information and further boosts the segmentation ability of the proposed model. In summary, the proposed automatic segmentation model outperforms most pancreas segmentation methods, which demonstrates its potential for pancreas segmentation.