A Novel Adaptive Weighted Loss Design in Adversarial Learning for Retinal Nerve Fiber Layer Defect Segmentation

Glaucoma is a chronic eye disease that can cause permanent visual loss and is difficult to detect early. Retinal nerve fiber layer defect (RNFLD) is clinical evidence for the diagnosis of glaucoma. Classical deep-learning-based methods can be used to segment RNFLD from fundus images. However, the segmentation results of these methods do not have the specific geometry of RNFLD, and the segmentation errors on fundus images with special styles are large. In this paper, we present a novel conditional adversarial shuffle U-shaped network (CASU-Net) to segment RNFLD, which consists of a generator and a discriminator. For the generator, a mixed loss is designed, which consists of an adaptive weighted segmentation loss and an adversarial loss. The adaptive weighted segmentation loss balances the segmentation accuracy of the target and background regions and assigns more attention to hard samples, thus ensuring a consistent improvement of the segmentation accuracy across all fundus images. The adversarial loss not only helps to improve the pixel-wise segmentation accuracy but also makes the geometry of the RNFLD segmentation closer to the ground truth. In addition, a shuffle module is designed in the generator to fully mine the information of all channels and improve the feature extraction capability of the model. The proposed CASU-Net is verified on an RNFLD dataset from Beijing Tongren Hospital. The experiments show that CASU-Net achieves state-of-the-art results on this dataset.


I. INTRODUCTION
Glaucoma is the second leading cause of blindness in the world, and it causes irreversible visual loss. By 2040, the number of glaucoma patients is expected to reach 110 million [1]. Patients usually have no obvious symptoms in the early stage of glaucoma until visual loss appears [2]. If early-stage glaucoma patients can be identified during screening, prompt treatment can effectively limit vision loss. Therefore, early diagnosis and treatment of glaucoma are important for protecting patients' vision. Generally, the diagnosis of glaucoma requires rich clinical experience. However, the small number of glaucoma specialists cannot meet the needs of large-scale glaucoma screening. Therefore, there is an urgent need for automatic and accurate diagnosis methods for glaucoma.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
Optical coherence tomography (OCT) and color fundus images are two methods of glaucoma screening [3]-[6]. Because OCT is expensive, whereas color fundus imaging is efficient and economical, color fundus images are more suitable for large-scale initial screening [6]. If the retinal nerve fiber layer defect (RNFLD) is detected in the color fundus image, it can be used as an indicator for glaucoma diagnosis. In the color fundus image, the optic disc is a bright yellow oval region, and the RNFLD is a wedge-shaped dark region close to the optic disc [7], as shown in Figure 1.
FIGURE 1. An example of a color fundus image with retinal nerve fiber layer defect (RNFLD). The green arrow points to the optic disc region, the white arrow points to the RNFLD, and the blue arrow points to the blood vessels in the fundus image.
A number of works have been proposed to segment the RNFLD. These methods mainly include image segmentation methods based on traditional techniques and segmentation methods based on deep learning. However, traditional techniques are mainly based on hand-crafted features, which lack effective representations and are susceptible to low contrast. Although existing deep learning methods for RNFLD segmentation can automatically extract features, they have three main problems: 1) The RNFLD segmentation obtained by classic CNNs cannot be trusted by doctors, because the segmentation results neither have a specific morphology nor conform to the intuitive perception of the doctor.
In contrast, RNFLD marked by the doctor generally has a wedge-shaped geometry and smooth boundaries. Therefore, a method that meets the morphological characteristics of true RNFLD needs to be proposed. 2) The prediction accuracy for a few fundus images with special styles is not high enough. When there are multiple pieces of RNFLD, or when the contrast between the inner and outer areas of the RNFLD outline is not obvious, the RNFLD extracted by the above deep learning methods has large errors compared with the RNFLD marked by glaucoma experts.
3) The information in the red and blue channels of color fundus images is not fully exploited, since the RNFLD information is mainly distributed in the green channel, and the green channel can easily dominate the training of the CNN. A method that can fully mine feature information needs to be proposed.
To solve the above three problems in segmenting RNFLD in fundus images, we propose a novel method based on adversarial learning. The main contributions of this work include: 1) We designed a novel conditional adversarial shuffle U-shaped network (CASU-Net), which consists of a generator and a discriminator. The discriminator is used to supervise the segmentation results of the generator. This design not only improves the pixel-wise segmentation accuracy, but also makes the geometry of the target area obtained by the generator closer to the ground truth, which helps to strengthen the doctor's trust in the segmentation result. 2) In the CASU-Net, an adaptive weighted (AW) segmentation loss was designed for the generator. This AW loss adaptively adjusts the weight of each fundus image, so that fundus images with large segmentation errors get more attention in training. The AW loss tends to improve the segmentation accuracy of all fundus images simultaneously during the entire training process, instead of improving the majority in the early stage and the remaining few in the later stage, so as to avoid overfitting on fundus images with special styles. 3) In the CASU-Net, a channel shuffle module was designed for the generator. Through feature rearrangement and random deactivation of connections between feature channels, the shuffle module can not only focus on the information of the green channel, but also fully explore the information of the red and blue channels, thus enhancing the feature extraction capability of the model.
4) We evaluate the effectiveness and generalization capability of the proposed CASU-Net and existing methods on an RNFLD dataset from Beijing Tongren Hospital. The proposed CASU-Net achieves state-of-the-art segmentation performance.
The rest of this paper is organized as follows. We first review techniques related to RNFLD segmentation in the second part. The framework of CASU-Net is described in the third part, and the experimental setup and results are presented in the fourth part. Conclusions are drawn in the fifth part.

II. RELATED WORK
Some traditional methods have been proposed for the detection of RNFLD in the fundus image. Muramatsu et al. applied the Gabor filters to the enhancement of RNFLDs after the removal of the major blood vessels. By using LDA and ANN classifier, true RNFLDs were identified from the darker bandlike regions [8]. In [9], Oh et al. applied Hough transformation to detect the candidates after illumination correction and polar transformation. Knowledge-based rules were used to reduce false detection candidates for RNFLD. Lamani et al. proposed a method based on texture and fractal description for glaucomatous retina detection and used a support vector machine classifier for classification [10]. In [11], Panda et al. classified RNFLD boundary pixels using random forest with cost-effective red-free fundus images. However, there is a large gap between the RNFLDs predicted by these methods and those marked by ophthalmologists.
In recent years, deep learning has developed rapidly in the field of computer vision. It has made great progress in image classification [12]-[18], object detection [19], [20] and image segmentation [22]-[27]. Compared with traditional methods, deep neural networks can automatically extract features from the input data and achieve higher accuracy. There have also been breakthroughs in the detection of RNFLDs. Panda et al. [28] proposed a deep learning method to detect RNFLD boundaries. In this method, the visibility of the RNFLD region was further enhanced by contrast-limited adaptive histogram equalization after the removal of blood vessels. The RNFLD boundary pixels were selected for training and testing. A patch-based deep convolutional neural network (DCNN) was initially used to detect RNFLD boundaries. The detected RNFLD boundary pixels were fitted into lines by the random sample consensus algorithm. In [29], Watanabe et al. proposed a DCNN with deconvolutional layers to detect RNFLD. DCNN training was carried out using different input image sets, such as original images of abnormal cases, original images of both normal and abnormal cases, and transformed half images.
FIGURE 2. Structure of the proposed CASU-Net. G represents the generator, D represents the discriminator, x represents a retinal fundus image, y represents the ground truth, and $D(x, y)$ and $D(x, G(x))$ represent the probability of true samples and false samples predicted as ground truth by the discriminator, respectively. $Loss_D$ represents the discriminator loss, $Loss_{adv}$ represents the adversarial loss of the generator, and $Loss_{seg}$ represents the segmentation loss of the generator.

III. METHODS
We propose a novel CASU-Net framework for the segmentation of RNFLD in fundus images. The CASU-Net consists of a generator and a discriminator, as shown in Figure 2. The generator is a U-shaped convolutional neural network whose input is the three-channel fundus image (x) and whose output is the probability map (ŷ = G(x)) of the RNFLD segmentation.
The overall loss function of the generator, $Loss_G$, includes an AW segmentation loss $Loss_{seg}$ and an adversarial loss $Loss_{adv}$. The parameters of the generator network are optimized by minimizing $Loss_G$. The discriminator is a convolutional classification network. Its input consists of the fundus image (x) and an RNFLD segmentation probability map, which is either the ground truth (y) annotated by glaucoma experts or the probability map (ŷ = G(x)) generated by the generator network. The output of the discriminator is the probability that the input is classified as ground truth. The discriminator loss $Loss_D$ is calculated from this output, and the parameters of the discriminator network are optimized by minimizing $Loss_D$.
During training, the parameters of the generator and the discriminator are optimized and updated alternately. Optimizing the parameters of the generator network yields segmentation with higher accuracy and makes it difficult for the discriminator to distinguish the source of the RNFLD segmentation probability map. Optimizing the parameters of the discriminator network improves the discriminator's ability to distinguish between the ground truth and the RNFLD segmentation generated by the generator network. Through the alternate optimization and update of the generator and discriminator parameters, the performance of both networks is enhanced. Finally, RNFLD segmentation results that have high accuracy and are highly consistent with the geometry of the ground truth are obtained.
FIGURE 3. Examples of the pixel intensity distribution in the three channels of RGB. Image represents the color fundus image, GT represents the RNFLD area marked by the doctor, and R, G, and B represent the pixel intensity distribution of the fundus image on the three channels of red, green, and blue, respectively.

A. DISCRIMINATIVE NETWORK
The discriminator network of the proposed CASU-Net is a classification network, which consists of ten convolutional layers, four pooling layers, and a global pooling layer. The convolutional layers are composed of 3 × 3 convolution kernels. Each convolutional layer is followed by a rectified linear unit (ReLU) activation function. Because padding is used, the convolution operation does not change the size of the feature map. After the feature map passes through a pooling layer, its height and width are reduced to half of the original size. In addition, we use the global pooling layer instead of the conventional fully connected layer. The network structure is shown in Figure 2. The input of the discriminator network is a four-channel structure composed of the fundus image (x) and the RNFLD segmentation probability map (y or G(x)). Herein, the sample labeled by the glaucoma expert is represented as (x, y), and the sample generated by the generator is represented as (x, G(x)). The sizes of x, y, and G(x) are (H, W, 3), (H, W, 1), and (H, W, 1), respectively, where H and W represent the height and width of the input. The output of the discriminator is a real value mapped by the sigmoid function to [0, 1], which represents the probability that the discriminator judges the input to be ground truth. The discriminator is trained with the loss function $Loss_D$.
During discriminator training, the parameters of the generator G remain unchanged, and the parameters of the discriminator D are optimized and updated by minimizing $Loss_D$. The optimization objective of the discriminator is to distinguish between the RNFLD probability map generated by the generator and the ground truth of RNFLD.
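The explicit expression for the discriminator loss does not appear in the text above. As a hedged illustration, the sketch below uses the binary cross-entropy form that is common for conditional adversarial networks and that is consistent with the stated roles of D(x, y) and D(x, G(x)). The function name `discriminator_loss` and the constant `eps` are assumptions made for this example, not the authors' exact formulation.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-7):
    """Binary cross-entropy discriminator loss (assumed form, for illustration).

    d_real: D(x, y) scores for expert-labelled pairs, values in [0, 1].
    d_fake: D(x, G(x)) scores for generator outputs, values in [0, 1].
    eps:    small constant that keeps the logarithm finite (assumption).
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    real_term = -np.log(d_real + eps)        # penalise low scores on real pairs
    fake_term = -np.log(1.0 - d_fake + eps)  # penalise high scores on generated pairs
    return float(np.mean(real_term) + np.mean(fake_term))

# A discriminator that separates the two sources well has a lower loss
# than one that outputs 0.5 for everything.
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
print(confident, confused)
```

Minimizing this quantity pushes D(x, y) toward 1 and D(x, G(x)) toward 0, which matches the optimization objective described for the discriminator.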

B. GENERATIVE NETWORK
The generator of the proposed CASU-Net is an end-to-end shuffle U-shaped network (SU-Net), which consists of three components, as shown in Figure 2. The first part is the shuffle module, which is mainly used to enhance the generalization ability of the network. The second part is an encoder network and a decoder network, which are used to generate multi-level representations and the final prediction. The third part is a mixed loss, which is used to optimize the generator.
Figure 3 shows two fundus images and their pixel intensity distributions in the three RGB channels. As can be seen from Figure 3, the pixel intensity distributions of the three channels of an image are strongly consistent but differ in detail. For most fundus images, the RNFLD is more obvious in the green channel, as shown in Figure 3(a); for a few fundus images, the RNFLD is more obvious in the blue or red channel, as shown in Figure 3(b). Because of the higher correlation between the green channel and the RNFLD, the parameter updates of a traditional CNN are dominated by the information in the green channel and ignore the information in the blue and red channels. To keep the network from over-relying on the green channel, we designed a shuffle module that mines the overall correlation information and the detailed difference information of the three channels at the same time.

1) SHUFFLE MODULE
In the shuffle module, the order of the three RGB channels is randomly changed to achieve reordering. Let $X = [x_1, x_2, x_3]$ be the original feature map of a fundus image. X is transformed into $\tilde{X}$ by a feature rearrangement with a probability of 0.5. Here, $\tilde{X}$ is obtained from the channels of X using random binary weights $e_{ij} \in \{0, 1\}$ that satisfy the following conditions: $\sum_i e_{ij} = 1$ and $\sum_j e_{ij} = 1$ ($i = 1, 2, 3$; $j = 1, 2, 3$), so that each channel of X appears exactly once in $\tilde{X}$. At the same time, inspired by dropout [30], connections between the feature channels of the shuffle module and the feature channels of the encoder network are randomly deactivated with a certain probability. Let $Y = [y_1, y_2, \ldots, y_{n-1}, y_n]$ represent the feature vector of the first layer in the encoder network, where n is the number of features of this layer. Each $y_j$ is computed from the rearranged channels, where $w_{ij}$ is the weight of the convolution kernel, $r_i$ follows a Bernoulli distribution, $b_j$ is the bias of the j-th feature, and f is the ReLU activation function. When $\sum_i r_i \geq 2$ ($i = 1, 2, 3$) holds, $r_i$ can be used for network training; otherwise, $r_i$ is randomly generated again. The shuffle module works only during training and not during prediction.
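The two random operations described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Bernoulli parameter is read as a keep probability of 0.8, and deactivating the connections from channel i is modelled as zeroing that input channel.

```python
import numpy as np

def shuffle_channels(image, rng, shuffle_prob=0.5):
    """Permute the channels of an (H, W, 3) image with probability shuffle_prob."""
    if rng.random() < shuffle_prob:
        order = rng.permutation(image.shape[-1])  # a random channel permutation
        return image[..., order]
    return image

def sample_channel_mask(rng, keep_prob=0.8, min_active=2, n_channels=3):
    """Draw r_i ~ Bernoulli(keep_prob); resample until at least min_active are 1."""
    while True:
        mask = (rng.random(n_channels) < keep_prob).astype(int)
        if mask.sum() >= min_active:
            return mask

def shuffle_module(image, rng, training=True):
    """Apply rearrangement and channel masking during training only."""
    if not training:
        return image                      # the module is inactive at prediction time
    shuffled = shuffle_channels(image, rng)
    mask = sample_channel_mask(rng)
    return shuffled * mask                # broadcasting zeroes deactivated channels

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
out = shuffle_module(image, rng)
print(out.shape)
```

Because at least two of the three channels stay active, the first encoder layer always receives information from more than one channel, which is the stated purpose of the resampling condition.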

2) FEATURE ENCODER AND DECODER MODULE
In our work, we modify the U-shaped convolutional network (U-Net) in [31] as the main part of the proposed SU-Net. The modified U-Net serves as the baseline model in our paper. The baseline contains an encoder network and a decoder network similar to U-Net. The last features of the decoder are processed by a 1 × 1 convolution and a sigmoid function to obtain the prediction of the RNFLD. Compared with U-Net, we make the following improvements in the feature encoder and decoder networks. First, our baseline replaces conventional convolutions and pooling layers with depthwise separable convolutions [32] and strided convolutions, respectively. Each separable convolution is followed by a ReLU activation function and batch normalization. Skip connections are introduced between two adjacent separable convolutions for residual correction. Second, multiple layers of information are combined for the final feature prediction.
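One practical effect of replacing conventional convolutions with separable ones is a much smaller parameter count. The short calculation below illustrates this for a single layer; the kernel size and channel counts (3 × 3, 64 input channels, 128 output channels) are hypothetical values chosen for the example and are not taken from the paper.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a standard k x k convolution."""
    return kernel * kernel * c_in * c_out + c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights plus biases of a depthwise separable convolution."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out + c_out     # 1 x 1 convolution with biases
    return depthwise + pointwise

standard = conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
print(standard, separable)  # 73856 8896
```

For this hypothetical layer the separable variant needs roughly one eighth of the parameters of the standard convolution.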

3) ADVERSARIAL LOSS AND ADAPTIVE WEIGHTED LOSS
In order to balance the background and target areas and to optimize the network at both the pixel level and the image level, we propose a mixed loss function for the generator ($Loss_G$), which consists of an adversarial loss $Loss_{adv}$ and a novel adaptive weighted segmentation loss $Loss_{seg}$. A parameter λ is used to balance $Loss_{adv}$ and $Loss_{seg}$. The adversarial loss is defined based on the predictions of the discriminator, and the adaptive weighted segmentation loss is defined based on the pixel-level difference between the probability map generated by the generator and the ground truth. In the expression of the adversarial loss $Loss_{adv}$, $1 - D(x, G(x))$ represents the probability that the discriminator judges G(x) to be a fake sample. The network parameters of the generator are adjusted by minimizing $Loss_{adv}$.
The segmentation loss $Loss_{seg}(G)$ is composed of an RNFLD segmentation loss and a background segmentation loss. The background and the RNFLD are defined as class 1 and class 2, respectively. First, the true positives $TP_k^i$, false negatives $FN_k^i$, and false positives $FP_k^i$ of the two classes are defined for each fundus image. Here, k represents the class label (k = 1, 2), i represents the index of the fundus image, j is the index of the pixel, and N represents the total number of pixels in a fundus image. $p_j^i(k) \in [0, 1]$ represents the probability, predicted by the generator network, that the j-th pixel of the i-th fundus image belongs to the k-th class; $g_j^i(k) \in \{0, 1\}$ represents the true label of the j-th pixel of the i-th fundus image for the k-th class. Based on the above definitions, the Dice value of the k-th class of the i-th fundus image is denoted $DE_k^i$. Because some fundus images contain no RNFLD, $TP_2^i$, $FN_2^i$, and $FP_2^i$ may all be 0. To avoid a zero denominator in the expression of $DE_k^i$, we add a small value $\epsilon_1$ to its denominator. Herein, $\alpha_k$ and $\beta_k$ are the penalties for false negatives and false positives of class k. The proposed $DE_k^i$ is an improved Dice coefficient, which describes the similarity between the prediction for the k-th class of the i-th fundus image and the corresponding ground truth.
Furthermore, we define the adaptive weighted segmentation loss of the generator, where $w_k^i$ represents the weight of the k-th class of the i-th sample. When there is no true RNFLD in a fundus image, [...]. In the [...] is a novel adaptive weighted method for sample weights. Here, θ and γ are parameters for balancing the weights of hard and easy samples.
In the training process of traditional deep neural networks, simple samples dominate the updating of network parameters, while hard samples cannot receive enough attention. The adaptive weighted loss can automatically increase the weight of hard samples while reducing the weight of easy samples, thereby uniformly improving the prediction accuracy of all samples.
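The following sketch illustrates the idea of an image-level weighted Dice loss for one class. It rests on several assumptions: the soft counts TP, FN, and FP are computed as sums of products of probabilities and labels, the generalized Dice has the form 2TP / (2TP + αFN + βFP + ε), which reduces to the standard Dice coefficient when α = β = 1, and the weighting rule `example_weight` is a hypothetical placeholder, since the paper's weight formula is not recoverable from the text. The parameters `scale` and `power` are illustrative and are not the paper's θ and γ.

```python
import numpy as np

def generalized_dice(prob, label, alpha=1.0, beta=1.0, eps=1e-7):
    """Soft Dice with penalties alpha (false negatives) and beta (false positives)."""
    prob = np.asarray(prob, dtype=float).ravel()
    label = np.asarray(label, dtype=float).ravel()
    tp = np.sum(prob * label)            # soft true positives
    fn = np.sum((1.0 - prob) * label)    # soft false negatives
    fp = np.sum(prob * (1.0 - label))    # soft false positives
    return 2.0 * tp / (2.0 * tp + alpha * fn + beta * fp + eps)

def example_weight(dice, scale=9.0, power=2.0):
    """HYPOTHETICAL weighting rule (not the paper's): larger for lower Dice."""
    return 1.0 + scale * (1.0 - dice) ** power

def weighted_dice_loss(probs, labels, weight_fn=example_weight):
    """Mean of w_i * (1 - Dice_i) over a batch of images; also returns the weights."""
    dices = np.array([generalized_dice(p, g) for p, g in zip(probs, labels)])
    weights = np.array([weight_fn(d) for d in dices])
    return float(np.mean(weights * (1.0 - dices))), weights

label = np.array([[0, 1], [1, 1]])
easy = np.array([[0.1, 0.9], [0.9, 0.9]])   # close to the label
hard = np.array([[0.6, 0.3], [0.4, 0.2]])   # far from the label
loss, weights = weighted_dice_loss([easy, hard], [label, label])
print(loss, weights)
```

In this toy batch the poorly segmented image receives the larger weight, so it contributes more to the gradient than it would under an unweighted Dice loss.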

4) COMPARISON WITH RELATED WORKS
a: COMPARISON WITH CONDITIONAL ADVERSARIAL NETWORK
The proposed method has the general architecture of a conditional adversarial network. The loss functions of the discriminator and the generator can be unified into a single objective function. In this objective function, the objective of the discriminator D is to maximize the objective function so as to accurately distinguish the RNFLD probability map generated by the generator from the ground truth, and the objective of the generator G is to minimize the objective function so as to generate an RNFLD probability map that the discriminator cannot distinguish from the ground truth and to reduce the pixel-level deviation between the RNFLD probability map and the ground truth. The training of the discriminator and the generator is performed alternately. The optimization process of the model is as follows: first, the parameters of the discriminator network and the generator network are randomly initialized. Then, the network parameters of the discriminator and the generator are optimized alternately for n rounds. In each round, the network parameters of the discriminator are optimized $k_1$ times, and then the network parameters of the generator are optimized $k_2$ times. The detailed optimization process is shown in Algorithm 1. Compared with the classic conditional adversarial network [33], the proposed CASU-Net has the following advantages:
1) The objective function contains not only the objective function of the conditional GAN, but also the segmentation loss of the RNFLD. It pays attention to the difference between the RNFLD prediction and the ground truth in both overall geometry and at the pixel level.
2) A strategy is designed to adaptively adjust the sample weight based on the segmentation accuracy of each sample. As a result, the model focuses more on fundus images that are difficult to segment. The final model is effectively improved in predicting both hard and easy samples.
3) The segmentation loss $Loss_{seg}(G)$ takes the segmentation performance of the RNFLD region and of the background region as two optimization objectives, in order to handle the absence of RNFLD and the data imbalance in pixel-wise segmentation.
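The alternating schedule described above (n rounds, each with k_1 discriminator updates followed by k_2 generator updates) can be sketched as a small generator function. The function name `alternating_schedule` is an assumption made for this example; the actual parameter updates are left to the caller.

```python
def alternating_schedule(n_rounds, k1, k2):
    """Yield ('D', round) k1 times, then ('G', round) k2 times, for each round."""
    for round_idx in range(n_rounds):
        for _ in range(k1):
            yield ("D", round_idx)   # update discriminator, generator frozen
        for _ in range(k2):
            yield ("G", round_idx)   # update generator, discriminator frozen

steps = list(alternating_schedule(n_rounds=2, k1=2, k2=1))
print(steps)
# [('D', 0), ('D', 0), ('G', 0), ('D', 1), ('D', 1), ('G', 1)]
```

A training loop would iterate over this schedule and call the discriminator or generator update step depending on the first element of each tuple.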

b: COMPARISON WITH FOCAL LOSS
The focal loss is a loss function designed to prevent easy samples from dominating CNN training [34]. It can be applied in object detection and image segmentation. For image segmentation, the focal loss is given by Eq. (7), where γ represents the focusing parameter, i represents the index of the fundus image, j represents the index of the pixel, N represents the total number of pixels in a fundus image, m represents the number of fundus images used in one iteration, $p_j^i \in [0, 1]$ represents the probability, predicted by the CNN, that the j-th pixel of the i-th fundus image belongs to the RNFLD region, and $g_j^i \in \{0, 1\}$ represents the RNFLD label of the j-th pixel of the i-th fundus image.
The basic design idea of the focal loss and of the proposed AW loss is the same: both give more attention to difficult objects during network training. However, there are obvious differences between the two loss functions. First, in the focal loss, the weight is assigned to pixels, and more attention is paid to pixels with large prediction deviations during training; in the AW loss, the weight is assigned to images, and more attention is paid to images with large prediction deviations. In addition, in order to overcome the unbalanced pixel distribution between the target area and the background area, we designed a weighting strategy for the segmentation accuracy of the two types of areas in the AW loss.
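For comparison, the pixel-level weighting of the focal loss can be sketched as follows. This uses the standard binary focal loss from [34] averaged over all pixels of all images in a batch; the normalization, the clipping constant `eps`, and the function name are assumptions for this illustration, since Eq. (7) itself is not reproduced in the text.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, eps=1e-7):
    """Binary focal loss averaged over every pixel of every image in the batch."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    # Each pixel's cross-entropy term is scaled by (1 - p_t)^gamma, where p_t
    # is the probability assigned to the true class of that pixel.
    pos = labels * (1.0 - probs) ** gamma * np.log(probs)
    neg = (1.0 - labels) * probs ** gamma * np.log(1.0 - probs)
    return float(-np.mean(pos + neg))

g = np.array([[1, 0], [1, 0]])
p = np.array([[0.9, 0.1], [0.6, 0.4]])
plain = focal_loss(p, g, gamma=0.0)     # gamma = 0 gives ordinary cross-entropy
focused = focal_loss(p, g, gamma=2.0)   # well-classified pixels are down-weighted
print(plain, focused)
```

The modulating factor acts on individual pixels, whereas the AW loss described in the paper assigns one weight per image and class.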

IV. EXPERIMENTS
A. DATASETS AND EVALUATION METHOD
In our experiments, we evaluated the proposed and compared algorithms on a dataset from Beijing Tongren Hospital, which included 474 fundus images with a resolution of 1924 × 1924. A glaucoma expert judged whether RNFLD was present in each fundus image and manually marked the boundary line of the RNFLD for the fundus images with RNFLD. We subtracted the original fundus image from the labeled fundus image pixel by pixel to obtain the RNFLD boundary. The boundary line divides a fundus image into multiple areas; the largest one was marked as the background, and the remaining areas were marked as RNFLD. There were 223 fundus images with RNFLD and 251 fundus images without RNFLD.
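The region-labeling rule described above (largest area is background, all other enclosed areas are RNFLD) can be sketched with a simple flood fill. This is an illustrative sketch only: the function names are hypothetical, regions are assumed to be 4-connected, and boundary pixels themselves are left out of the mask, none of which is specified in the paper.

```python
from collections import deque
import numpy as np

def regions_from_boundary(boundary):
    """Label 4-connected regions of non-boundary pixels; return (labels, sizes)."""
    h, w = boundary.shape
    labels = np.zeros((h, w), dtype=int)   # 0 means boundary or not yet visited
    sizes = {}
    next_label = 0
    for r in range(h):
        for c in range(w):
            if boundary[r, c] or labels[r, c]:
                continue
            next_label += 1
            labels[r, c] = next_label
            queue = deque([(r, c)])
            count = 0
            while queue:                   # breadth-first flood fill
                y, x = queue.popleft()
                count += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not boundary[ny, nx] and labels[ny, nx] == 0):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            sizes[next_label] = count
    return labels, sizes

def rnfld_mask(boundary):
    """Mark every region except the largest one (the background) as RNFLD."""
    labels, sizes = regions_from_boundary(boundary)
    background = max(sizes, key=sizes.get)
    return (labels != background) & (labels != 0)

# Toy example: a closed 3 x 3 ring encloses a single interior pixel.
boundary = np.zeros((7, 7), dtype=bool)
boundary[2:5, 2:5] = True
boundary[3, 3] = False
mask = rnfld_mask(boundary)
print(mask.astype(int))
```

On real annotations the boundary would first have to be extracted from the difference image, and a library routine for connected-component labeling would be faster than this pure-Python loop.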
In order to comprehensively evaluate the proposed and compared methods, we used the following evaluation metrics: F-score, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). Here, F-score, sensitivity, and specificity are defined as

$$F_{score} = \frac{2TP}{2TP + FN + FP}, \quad sensitivity = \frac{TP}{TP + FN}, \quad specificity = \frac{TN}{TN + FP},$$

where TP, TN, FN, and FP are the numbers of true positives, true negatives, false negatives, and false positives. In addition, considering that the target region is much smaller than the background region, we also used Mean Intersection over Union (MIoU) and Mean Average Precision (MAP) [35] to evaluate different methods. MAP and MIoU are defined in terms of $p_{ij}$ and $p_{ii}$, where $p_{ij}$ represents the number of pixels that belong to the i-th class and are predicted as the j-th class, and $p_{ii}$ represents the number of pixels that belong to the i-th class and are predicted as the i-th class.
We used two-fold cross-validation to evaluate the performance of the proposed method and the compared methods. The fundus image dataset was randomly divided into two folds. One fold contained 237 fundus images, of which 111 had RNFLD. The other fold contained 237 fundus images, of which 112 had RNFLD. The average value over the two folds was used as the evaluation metric. The two-fold cross-validation was repeated five times, and the average over the five repetitions was used as the final metric value.

Algorithm 1
for each of the n rounds do
    for $k_1$ steps do
        • Sample a minibatch {$x_1$, ..., $x_m$} from the retinal images dataset
        • Optimize the convolution kernel parameters in the discriminator by ascending its gradient
    end for
    for $k_2$ steps do
        • Sample a minibatch (($x_1$, $y_1$), ..., ($x_m$, $y_m$)) from the retinal images and corresponding ground truth dataset
        • Optimize the convolution kernel parameters in the generator by descending its gradient
    end for
end for
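The metrics above can be computed from a binary prediction and its ground truth as sketched below. The class-averaged functions follow the usual definitions of mean pixel accuracy and mean IoU over a confusion matrix; treating the paper's MAP as mean per-class pixel accuracy is an assumption, as are the function names. The sketch does not guard against empty classes, which would cause a division by zero.

```python
import numpy as np

def binary_metrics(pred, truth):
    """F-score, sensitivity, specificity, and the 2 x 2 confusion matrix."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return {
        "f_score": 2 * tp / (2 * tp + fn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # rows: true class (background, RNFLD); columns: predicted class
        "confusion": np.array([[tn, fp], [fn, tp]]),
    }

def mean_pixel_accuracy(confusion):
    """Average over classes of p_ii / sum_j p_ij (assumed meaning of MAP)."""
    return float(np.mean(np.diag(confusion) / confusion.sum(axis=1)))

def mean_iou(confusion):
    """Average over classes of p_ii / (sum_j p_ij + sum_j p_ji - p_ii)."""
    inter = np.diag(confusion)
    union = confusion.sum(axis=1) + confusion.sum(axis=0) - inter
    return float(np.mean(inter / union))

truth = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
prob = np.array([[0.1, 0.6, 0.8, 0.4], [0.2, 0.1, 0.9, 0.7]])
pred = prob >= 0.5                      # threshold the probability map at 0.5
m = binary_metrics(pred, truth)
mpa = mean_pixel_accuracy(m["confusion"])
miou = mean_iou(m["confusion"])
print(m["f_score"], m["sensitivity"], m["specificity"], mpa, miou)
```

In this toy example the prediction has three true positives, one false negative, one false positive, and three true negatives.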

B. EXPERIMENTAL SETUP
In the experiments, both the original fundus image and the RNFLD segmentation image marked by the doctor were downsampled to a resolution of 256 × 256 pixels. Standard techniques, such as random rotation, flipping, and translation, were used for data augmentation on the training set. The proposed CASU-Net model was implemented in the Keras framework with the TensorFlow backend. During network training, the Adam optimizer was used to optimize the model parameters. The initial learning rate was set to $10^{-4}$. For the shuffle module, p was set to 0.8. The parameters in the loss function were set as follows: $\alpha_1 = 4$, $\alpha_2 = 1$, $\beta_1 = 1$, $\beta_2 = 1$, $\epsilon_1 = 10^{-7}$, $\epsilon_2 = 10^{-7}$, λ = 10, θ = 20, and γ = 2. When the parameters are set as $\alpha_1 = \alpha_2 = \beta_1 = \beta_2 = 1$, the generalized Dice reduces to the standard Dice coefficient, $\frac{2TP}{2TP + FN + FP}$. In order to emphasize the false positives of the defect area, we set $\alpha_1$ higher than the other three parameters: $\alpha_1 = 4$ and $\alpha_2 = \beta_1 = \beta_2 = 1$. The parameters $\epsilon_1$ and $\epsilon_2$ are two infinitesimal quantities set to avoid meaningless expressions (a zero denominator and ln 0). We follow the default small constant recommended in the Keras deep learning framework and set these two parameters to $10^{-7}$. The parameters θ and γ belong to the proposed AW loss and reflect the importance of samples with large prediction errors. With θ = 20, the loss of a sample with Dice = 0 is roughly 10 times its original value. Therefore, in this study, we set θ = 20. The value of γ follows the setting used for the focal loss in [34]. The parameter λ is the weight used to balance the adversarial loss and the AW loss in the total loss function. It is set to 10 according to the weighting ratio of the multiple losses in the reference network [27].
In the comparative experiments in this paper, the hyperparameters of three methods (FCN, SegNet, and U-Net) are consistent with the literature [21], [22], and [29]. In this paper, 0.5 was used as the threshold to convert the probability map predicted by the network into a binary image.

C. EXPERIMENTAL RESULTS
In the experiments, we compared the proposed CASU-Net with the following methods: FCN [29], SegNet [21], U-Net [22], M-Net [6], and CE-Net [37]. Experimental results are presented in Table 1. As shown in Table 1, the proposed CASU-Net achieved the highest value on four metrics: F-score, MAP, MIoU, and sensitivity. For specificity, the proposed CASU-Net is slightly lower than FCN and CE-Net. Additionally, compared with the three classical image segmentation methods (FCN, SegNet, and U-Net), the proposed CASU-Net shows a clear advantage in prediction. The receiver operating characteristic (ROC) curves of the different methods in the first experiment on the first fold are shown in Figure 4.
As can be seen from Table 1, the differences in sensitivity among the methods are more pronounced than the differences in specificity. In other words, the methods differ more in segmentation accuracy on the target region (RNFLD) than on the background region. In order to show more details of the prediction accuracy of the different methods on fundus images with RNFLD, we made boxplots of the MAP and Dice values predicted by the different methods in the first experiment on the first fold. As can be seen in Figure [...].
Further, we show visualization examples of the RNFLD predicted by the proposed method and the other methods in the first experiment on the first fold, as shown in Figure 6. For fundus images with RNFLD (fundus images labeled A-D), the segmentation of the target area by CASU-Net is the most complete among all methods. FCN, SegNet, and U-Net clearly segment the target area incompletely. M-Net and CE-Net improve on these three methods, but still show an obvious gap compared with the ground truth. Additionally, the RNFLD region predicted by the proposed CASU-Net is closer to the ground truth in morphology: on the one hand, the outline of the RNFLD region segmented by CASU-Net is smoother; on the other hand, the shape also conforms to the wedge structure. The comparison methods perform especially poorly on the few fundus images with special styles. When there are multiple pieces of RNFLD, or when the contrast between the inner and outer areas of the RNFLD outline is not obvious, CASU-Net shows a more obvious improvement, as shown in the fundus images labeled A and D. For the fundus images without RNFLD (fundus images labeled E-G), the proposed CASU-Net makes a few wrong predictions, but predicts almost the entire fundus image as background.
In order to verify the effectiveness of the proposed shuffle module, AW loss, and GAN framework, we conducted further comparative experiments. The experimental results are presented in Table.2. The baseline (BL) is a modified U-shaped network with separable convolutions. Four BLs use four images of red channel (R), green channel (G), blue channel (B), and color fundus image (RGB) as their input for training. The experimental results show that when the green channel is used as the input, the BL can obtain better performance than when the red channel or the blue channel is used as the input. When the color fundus image is used  as the input of BL, compared with when the green channel is used as the input, the prediction results are improved in all the five metrics, but the improvement is not obvious. This is consistent with the results reported in [29]. When the proposed shuffle module is combined with the BL, the SU-Net has achieved significant improvements on four metrics: F-score, MAP, MIoU, and sensitivity. The effectiveness of the proposed GAN framework and AW loss is further verified. After SU-Net is combined with the AW loss, the performance of SU-Net has been improved. When the GAN with SU-Net is combined with the AW loss, the proposed CASU-Net (SU-Net + GAN + AW) has achieved the best prediction performance on four metrics: F-score, MAP, MIoU, and sensitivity. Additionally, the improvement on F-score is obvious.
We further compare the impact of different loss functions on the BL model; the results are shown in Table 3. The BL model with the proposed AW loss achieves the best prediction results on four metrics (F-score, MAP, MIoU, and sensitivity) by a clear margin. On the specificity metric, the BL model with the focal loss achieves the best result, but only by a slight margin.
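For reference, the focal loss used as one of the comparison losses can be sketched as follows. This is the standard binary focal loss, FL = -alpha_t (1 - p_t)^gamma log(p_t), not the proposed AW loss. The defaults gamma = 2 and alpha = 0.25 are the values commonly used with the focal loss and are an assumption here, since the settings used in the experiments are not stated in this section; the function name `binary_focal_loss` is hypothetical.

```python
import numpy as np

def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over all pixels.

    prob   : predicted foreground (RNFLD) probabilities in [0, 1]
    target : ground-truth labels, 1 = RNFLD, 0 = background
    gamma, alpha : assumed hyperparameters, not taken from the paper
    """
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    # p_t is the probability assigned to the true class of each pixel.
    p_t = np.where(target == 1, prob, 1.0 - prob)
    # alpha_t balances the foreground and background classes.
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights pixels that are already well classified.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# A confidently correct pixel contributes less than an uncertain one.
easy = binary_focal_loss([0.9], [1])
hard = binary_focal_loss([0.6], [1])
print(easy < hard)
```

With gamma = 0 and alpha = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check for an implementation.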

V. CONCLUSION
We designed a novel conditional adversarial shuffle U-shaped network (CASU-Net) to segment RNFLD from fundus images. The proposed CASU-Net consists of a generator and a discriminator. The SU-Net was proposed as the generator; it employs a U-shaped network with separable convolutions as its main structure. To achieve more efficient feature extraction, a shuffle module was constructed in SU-Net to make full use of the feature information in the three RGB channels. To obtain prediction results that are morphologically closer to the doctor's annotation, a discriminator was employed to supervise the spatial structure and geometry of the RNFLD predicted by the SU-Net. Furthermore, an adaptive weighted segmentation loss was designed to deal with the imbalanced data in pixel-wise segmentation. In addition, the loss adaptively adjusts the weight of each fundus image during training, so that fundus images with large segmentation errors receive more attention and the generalization performance of the model is enhanced. In the experiments, the proposed method obtained state-of-the-art results, which can promote the automatic localization of RNFLD for glaucoma screening and help alleviate the shortage of professional glaucoma physicians.
RUIRUI LI (Member, IEEE) received the Ph.D. degree in computer science and technology from Tsinghua University, Beijing, China, in 2014. She joined the College of Information Science and Technology, Beijing University of Chemical Technology, as a Postdoctoral Researcher, where she was appointed as a Lecturer, in 2017. She has published more than 20 peer-reviewed articles. She is the Inventor or Co-Inventor of two patents. Her research interests include image processing in remote sensing, machine learning and pattern analysis, and computer vision. She is a Peer Reviewer of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING.
YONGLI XU received the B.E. and Ph.D. degrees from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 2005 and 2010, respectively. He has been with the Department of Mathematics, Beijing University of Chemical Technology, Beijing, since 2010, where he is currently an Associate Professor. His current research interests include machine learning, system identification, and biomedical image analysis. He is a Reviewer for a number of international journals and conferences, such as the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, Neurocomputing, and the American Control Conference. VOLUME 8, 2020