SSCV-GANs: Semi-Supervised Complex-Valued GANs for PolSAR Image Classification

Polarimetric synthetic aperture radar (PolSAR) image classification has been widely applied in many fields, such as agriculture, meteorology, and the military. However, problems such as the deficiency of labeled data and the underutilization of data information remain challenges that cannot be ignored in PolSAR image classification. In this paper, a semi-supervised complex-valued generative adversarial network framework (SSCV-GANs) is proposed for the first time to address the two issues mentioned above simultaneously. On the one hand, the complex-valued model conforms to the physical mechanism of PolSAR data and plays an important role in retaining and utilizing the amplitude and phase information of PolSAR data. On the other hand, we also present a new complex-valued GAN together with semi-supervised learning to alleviate the problem of insufficient labeled data. Specifically, our complex-valued GAN expands the training data set by generating fake data. Flevoland data and San Francisco data are used to validate the effectiveness of our model. Experimental results show that our model outperforms existing state-of-the-art models in terms of classification accuracy, especially when labeled data are scarce. In particular, the analysis of the statistical distributions of the generated fake data and the real data further demonstrates the effectiveness of the proposed SSCV-GANs.


I. INTRODUCTION
With the rapid development of satellite sensors, polarimetric synthetic aperture radar (PolSAR), as an advanced and representative sensor in the remote sensing field, acquires a large amount of data with high spatial resolution and adequate information from the observed area regardless of time and weather conditions. It has been widely applied in the military, agriculture, forestry, natural disaster measurement, and so on. The scattered echoes of PolSAR on land contain abundant ground characteristics. In addition, the multi-polarimetric measurement of these echoes yields more feature information and can be used for further ground exploration. Owing to these advantages, the interpretation and analysis of PolSAR data have become paramount. However, PolSAR data, represented by complex vectors, are usually processed in the real number domain in conventional methods because of the limitations of existing networks and algorithms, which easily leads to information loss during classification or recognition. The support vector machine (SVM), for example, has been applied to the classification of PolSAR images [1]. Similarly, principal component analysis (PCA) is popular for processing PolSAR data because of its simplicity and high accuracy [2], [3]. Besides, a large amount of research has been done on PolSAR data classification, such as [4]-[8]. Compared to conventional algorithms, machine learning methods can produce better classification accuracy. However, the heavy dependence of machine learning on data and its large number of parameters have limited its widespread application in remote sensing image interpretation. As early as the 1970s, it was realized that the covariance matrix and coherency matrix follow the complex Wishart distribution [9], and this conclusion has facilitated the analysis and application of PolSAR data.
For example, a maximum likelihood classifier applied to the covariance matrix has been proposed to classify PolSAR data [10]. However, the breakthrough development of PolSAR classification benefits from the development and application of deep learning [11]-[13]. On the basis of typical deep learning network models, researchers have paid attention to the reasonable extraction and utilization of PolSAR data features. For example, [14] extracted PolSAR image features with a neighborhood-preserving deep neural network, and [15] exploited the spatial relationship of pixels through superpixels. Besides local spatial information, Liu et al. [16] also made full use of the prior knowledge of PolSAR data and proposed a novel unsupervised deep model, the Wishart DBN (W-DBN). This model is stacked from modified RBMs (WBRBM), which can model the data distribution in the corresponding training layers and achieve better classification results with very few labeled pixels. Similarly, [17]-[19] are also typical deep learning models for PolSAR image classification. However, the operations mentioned above are all carried out in the real number domain. They disregard the polarimetric correlation of phase and amplitude, or directly ignore phase information, which is extremely important in PolSAR image interpretation. Mandic et al. [20] proved that using real-valued data causes more information loss than operating directly in the complex domain. Lately, Wisdom et al. [21] and Arjovsky et al. [22] also proved that the complex-valued form has a richer representational capacity in recurrent neural networks and highlighted the advantages of complex-valued data. Besides representational capacity, complex-valued neural networks also have the characteristics of faster learning and easier optimization [21], [23].
Inspired by the extension of real-valued convolutional neural networks to complex-valued convolutional neural networks (CV-CNN) [24], and in order to make full use of the PolSAR data information, Zhang et al. [25] proposed applying CV-CNN to PolSAR data classification and obtained sound results. This was the beginning of using CV-CNN to classify PolSAR data. In the CV-CNN model, all elements, including the input-output layers, convolution layers, pooling layers, and activation layers, are in the complex field, and the network is trained by supervised learning, which leads to a high demand for labeled samples. However, labeled samples are extremely deficient in PolSAR data because of the difficulty of manual annotation. Before the application of complex-valued neural networks, this issue was addressed in PolSAR image classification with real-valued neural networks by unsupervised or semi-supervised learning [14], [16]. Meanwhile, generative adversarial networks (GANs) [26] can expand data and have been widely used in many fields, such as the generation of natural images [27] and neural dialogue [28], because of their capability of learning the potential distribution of actual data and generating fake data with the same distribution as the actual data.
Based on the studies above, in order to relieve the shortage of labeled data and utilize data information effectively, we propose a semi-supervised complex-valued GAN framework to classify PolSAR data. On the one hand, in order to retain the physical properties and rich information of PolSAR images, our model is based on the raw complex-valued matrix rather than a transformed real-valued vector. Besides normal complex-valued feature extraction and classification, a complex-valued GAN framework is also used to generate complex-valued data (consisting of amplitude and phase) as fake PolSAR data. On the other hand, the labeled data, unlabeled data, and generated fake data are all used to train and optimize the network by semi-supervised learning, which improves the classification performance of the model. Preliminary results of this paper appeared in [29]. In our experiments, two benchmark data sets, Flevoland and San Francisco, are used to verify the effectiveness of our method. The results indicate that our model achieves higher classification accuracy than existing state-of-the-art models, especially when labeled samples are insufficient.
The remainder of this paper is organized as follows. In Section II, we introduce related works on complex-valued networks and GANs. Section III describes our novel network framework and the classification process. In Section IV, we present the experimental results and analysis on different data sets, confirm the roles of CBN, the semi-supervised GAN, and the activation function in promoting the classification ability of the model through ablation experiments, and compare the similarity between the generated data and the actual data through their data distributions. Finally, Section V concludes this paper.

II. RELATED WORKS
A. COMPLEX-VALUED NEURAL NETWORKS
All classical deep neural networks, such as the convolutional neural network (CNN), deep belief network (DBN), recurrent neural network (RNN), fully convolutional network (FCN), residual network (ResNet), and generative adversarial networks (GANs), were initially proposed in the real number field. However, in nature, many objects are expressed by complex numbers whose amplitude and phase information are indivisible, such as electromagnetic, ultrasonic, and quantum waves. Because of these unique physical properties, existing real-valued neural networks are incapable of extracting such data features effectively. To address this issue, the theories and applications of complex-valued neural networks (CVNNs) have been proposed [30]. Before that, a great deal of research concentrated on how to make full use of complex-valued data with neural networks. Noest proposed a phasor neuron [31] in 1987 and a discrete-state phasor neural network [32] in 1988, respectively. These models process unit-length 2-vectors (phasors), which can be treated as complex numbers. In 1988, Chua presented the cellular neural network [33], which obtains sound results in image processing and recognition. However, this model is restricted to exclusively binary signals. Later, Aizenberg [34] proposed a multi-valued neural network to process multi-valued signals. Meanwhile, the dynamics of fully complex-valued neural networks were presented [35], and Hirose also studied a backpropagation learning method to optimize the proposed fully complex-valued neural network [36]. Nitta made outstanding contributions to optimizing CVNNs [37]-[39]. In [40], Suksmono et al. used an adaptive noise reduction method based on a complex-valued Markov random field (CMRF) to address the phase unwrapping problem in radar systems. Hirose presented the use of a complex-valued self-organizing map (CSOM) for dealing with multiple-frequency interferometric images for plastic mine detection [41].
In [42], Tanaka presented two alternative complex-valued activation functions, which were combined with a complex-valued Hopfield neural network (CHNN) to improve the performance of multistate associative memory for gray-level image reconstruction. In addition, Hansch et al. conducted research on PolSAR image interpretation using complex-valued neural networks [43], [44].
After more than ten years of development, great progress has been made in the theory and application of CVNNs [30], [45], [46]. To further study the performance of CVNNs, [47], [48] explored the stability of CVNNs with asynchronous time delays, and [49], [50] studied the stability of complex-valued recurrent neural networks (CRNNs). Simultaneously, the complex-valued convolutional neural network (CV-CNN) was proposed [24] and then used for the classification of PolSAR images [25].

B. GENERATIVE ADVERSARIAL NETWORKS
Inspired by the two-player game, generative adversarial networks (GANs) were proposed by Goodfellow et al. [26]. This model performs well in many applications, including the generation of images from noise. Specifically, the two adversarial modules, the generative module (G) and the discriminative module (D), are the kernels of GANs. The generative module (G) learns the distribution of actual images P(X) and generates analogical images G(x). The discriminative module (D) is trained to discriminate the authenticity of all input images, including real images from P(X) and generated images G(x). After a series of training steps, the generative ability of G and the discriminative ability of D improve simultaneously. The following minimax game is used to train this network:

min_G max_D V(D, G) = E_{x∼P(X)}[log D(x)] + E_{z∼P(z)}[log(1 − D(G(z)))]

After the alternating game of these two adversarial modules, G can generate more realistic data and D gains better discrimination. In this way, data generated by G can be used to train the model and improve its classification ability, especially when training data are insufficient. With its excellent performance in generating data, the GAN has developed rapidly in recent years.
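As a minimal sketch of this value function (with illustrative one-dimensional stand-ins for D and G rather than the paper's networks; the weights and scale below are hypothetical):

```python
import math

def discriminator(x, w=1.5, b=-0.5):
    """Toy 1-D discriminator: sigmoid of an affine score (w, b are illustrative)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def generator(z, scale=2.0):
    """Toy 1-D generator: maps a noise sample z to a data sample."""
    return scale * z

def value_fn(real_batch, noise_batch):
    """V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]; D maximizes V, G minimizes it."""
    real_term = sum(math.log(discriminator(x)) for x in real_batch) / len(real_batch)
    fake_term = sum(math.log(1.0 - discriminator(generator(z)))
                    for z in noise_batch) / len(noise_batch)
    return real_term + fake_term
```

In alternate training, D's parameters are updated to increase V while G's are updated to decrease it, which drives G(z) toward the real data distribution.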
Based on GANs, more and more optimization strategies have been proposed to improve network performance. Mirza and Osindero introduced a model that synthesizes image samples conditioned on class labels and obtains well-formed generated samples that can be used to train a classifier [51]. Besides class labels, modified GANs can also use other information to generate samples, such as text descriptions and object locations [52]. In addition, other variants of GANs have been proposed. Chen et al. [53] proposed the Information Maximizing Generative Adversarial Network (InfoGAN). This model can learn interpretable and disentangled representations in a completely unsupervised manner, and then successfully disentangle targets from complex data. Odena et al. [54] proposed the auxiliary classifier GAN (AC-GAN). It utilizes label conditioning to generate high-resolution images and proposes two evaluation criteria to assess the discriminability and diversity of samples from class-conditional image synthesis models. Other generative models, such as [55], present conditional image generation methods that use descriptive labels, tags, or latent embeddings to improve the quality of generated samples. In addition, Salimans et al. [56] proposed a semi-supervised learning method in which GANs are trained and obtain promising classification results with fewer labeled samples.

III. PROPOSED METHOD
A. NETWORK ARCHITECTURE
It is well known that labeled samples are insufficient for the supervised interpretation of remote sensing images. To make up for the scarcity of labeled samples in PolSAR images, GANs are selected to generate fake samples for training because of their data-generation ability. However, conventional real-valued GANs generate real-valued data, which differ from the features and distribution of PolSAR data. Therefore, to comply with the physical mechanism of PolSAR data and effectively exploit the PolSAR data information for classification, we extend GANs to the complex-valued field, yielding what we call complex-valued GANs. In our model, the entire complex-valued network is trained and optimized by a semi-supervised learning mechanism for the final classification. In the training process, besides generated fake samples and real labeled samples, a large number of real unlabeled samples are introduced, which relieves the insufficiency of labeled data to some extent. More details of network training are introduced in Section IV.
Different from other generative models, our method operates in the complex-valued field and can generate complex-valued data from random noise vectors. Figure 1 illustrates the main framework of our model, which is mainly composed of a ''Complex-valued Generator'' and a ''Complex-valued Discriminator''. This framework consists of complex-valued full connection, complex-valued deconvolution, complex-valued convolution, complex-valued activation functions, and complex-valued batch normalization, represented by ''CFC'', ''CDeConv'', ''CConv'', ''CA'', and ''CBN'', respectively. The input data of the ''Complex-valued Generator'' are two randomly generated vectors, shown as the green block and blue block in Figure 1. After a series of complex-valued operations, the two vectors are converted into a complex-valued matrix, which has the same shape and distribution as PolSAR data. Markers 1, 2, and 3 in Figure 1 denote the generated fake data, real labeled data, and unlabeled data, respectively, which are used to train the ''Complex-valued Discriminator''. In the ''Complex-valued Discriminator'', similarly, a series of complex-valued operations is used to extract complex-valued features. The features extracted by the network for classification are in the form of complex numbers; their real and imaginary parts are then concatenated into a real number vector for the final classification. As mentioned above, we use a semi-supervised learning mechanism to alternately train the complex-valued GANs with generated fake samples, actual unlabeled samples, and actual labeled samples. Finally, our classifier can effectively identify whether the input data are fake or real and distinguish the class label of real data. Table 1 shows an example of the ''Complex-valued Generator'', where we list the operation name and output data shape of each layer. When the output data are in complex-valued form, their description includes the real and imaginary part shapes.
Due to limited space, we use ''FC'' to represent full connection and ''DC'' to represent deconvolution; these are the traditional real-valued operations. ''+'' and ''−'' denote element-wise matrix addition and subtraction, respectively. Each gray part represents a complex-valued operation, described as ''CFC'' and ''CDeConv'' in Figure 1. Two vectors of length 100 are randomly sampled as input data, and they are transformed into two complex-valued matrices of size 9 × 32 × 32 after a series of complex-valued operations. In the description of the size, 9 is the channel number while 32 × 32 is the patch size.
In this model, the ''Complex-valued Discriminator'' mainly includes complex-valued convolution, complex-valued activation functions, and complex-valued batch normalization. Through these complex-valued operations, the final complex-valued features are extracted for classification. Table 2 lists the main framework of the ''Complex-valued Discriminator''. Its input shape is the same as the output shape of the ''Complex-valued Generator'': two 9 × 32 × 32 complex-valued matrices. At the first complex-valued convolution layer, we set a series of complex-valued convolution filters, in which each part has a size of 4 × 4 with stride 2 and padding 1. After three homologous complex-valued convolution flows, the real and imaginary parts of the obtained features are concatenated into a feature map with 128 channels and a patch size of 4 × 4. Those feature maps are then flattened into a vector and fed to a fully connected layer that outputs c + 1 probabilities, where c is the number of categories.
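The final stage of the discriminator can be sketched as follows (a toy illustration on plain Python lists; the feature sizes and weights below are hypothetical, not the 128-channel 4 × 4 features of Table 2):

```python
import math

def discriminator_head(feat_r, feat_i, weights, bias):
    """Concatenate real and imaginary feature maps, flatten them, and apply
    a fully connected layer followed by a softmax over c+1 outputs
    (c classes plus one 'fake' output).
    feat_r, feat_i: nested lists [channels][h][w]; weights: (c+1) rows."""
    flat = [v for fmap in (feat_r, feat_i) for ch in fmap for row in ch for v in row]
    logits = [sum(w * x for w, x in zip(wrow, flat)) + b
              for wrow, b in zip(weights, bias)]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]         # probabilities over c+1 outputs
```

The softmax output sums to 1, with the last entry playing the role of the ''fake'' probability used in the semi-supervised losses.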
The details of each complex-valued operation and its optimization are discussed in the following sections.

B. COMPLEX-VALUED OPERATION MASK
A complex number is usually represented in algebraic form or in modulus-argument form. In the algebraic form, the real part and the imaginary part are one-dimensional real numbers. Here, z1 = a + bi and z2 = c + di denote two complex numbers, and their multiplication and addition are defined as follows:

z1 · z2 = (ac − bd) + (ad + bc)i (4)

z1 + z2 = (a + c) + (b + d)i (5)

where i = √−1. From the formulas above, it can be seen that the multiplication of two complex numbers can be decomposed into four multiplication operations and two addition operations on real numbers, and the addition of two complex numbers can be decomposed into two addition operations on real numbers.
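This decomposition can be checked directly against Python's built-in complex arithmetic (the function names below are our own):

```python
def complex_mul(a, b, c, d):
    """Multiply z1 = a + bi by z2 = c + di using four real multiplications
    and two real additions/subtractions, as in formula (4)."""
    return (a * c - b * d, a * d + b * c)

def complex_add(a, b, c, d):
    """Add z1 = a + bi and z2 = c + di with two real additions, as in formula (5)."""
    return (a + c, b + d)
```

For example, complex_mul(1, 2, 3, 4) returns (-5, 10), which matches (1 + 2j) * (3 + 4j) = -5 + 10j.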
Besides the algebraic form, the modulus and argument can also be used to denote complex numbers, as in formulas 6 and 7:

z3 = r1 e^{iθ1} (6)

z4 = r2 e^{iθ2} (7)

where r1 and r2 are the moduli, and θ1 and θ2 are the arguments. They can also be converted to algebraic form by Euler's formula, e^{iθ} = cosθ + i sinθ.
From formula 8,

z3 · z4 = r1 r2 e^{i(θ1 + θ2)} (8)

the multiplication of z3 and z4 can be represented by one multiplication of the two moduli and one addition of the two arguments, so the computational complexity is reduced by at least three times. However, more complicated computation is needed for the addition of z3 and z4:

z3 ± z4 = r0 e^{iθ0} (9)

The modulus r0 is the non-negative square root computed from the real and imaginary parts after conversion to algebraic form, and the argument θ0 is computed by the arctan function, as in formulas 10 and 11:

r0 = √((r1 cosθ1 ± r2 cosθ2)² + (r1 sinθ1 ± r2 sinθ2)²) (10)

θ0 = arctan((r1 sinθ1 ± r2 sinθ2) / (r1 cosθ1 ± r2 cosθ2)) (11)

The calculations above have high computational complexity and may introduce calculation errors because of the many trigonometric functions involved, so such low-efficiency calculation is usually avoided in practice. Therefore, the algebraic form is used to represent complex-valued data, states, connection weights, and operations in this paper. Different from traditional operations, the input data and weights are complex-valued, and the corresponding computation in the network is more intricate. For a concrete description of the experimental process, we define several complex-valued operations; each one is the complex multiplication shown in formula 4. To represent these computation modes more specifically, we formalize the complex-valued computation with a complex-valued operation mask, shown in Figure 2. In this figure, the green blocks represent real parts and the blue blocks represent imaginary parts. The mask shows the calculation process for complex-valued numbers, whose input data (IN_r, IN_i), weights (W_r, W_i), and output data (OUT_r, OUT_i) are all composed of a real part and an imaginary part. Because i² = −1, there is a ''−'' in the real part (OUT_r) of the output. In short, this type of operation can be decomposed into four traditional real number multiplications, one addition operation, and one subtraction operation.
As mentioned above, there are many complex-valued layers in our model, such as complex-valued full connection, complex-valued convolution, and complex-valued deconvolution; all operations in these layers match the proposed complex-valued operation mask.
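As a minimal sketch of how the mask applies to a whole layer, the following implements a complex-valued fully connected layer on plain Python lists, assuming each of the four paths is an ordinary real matrix-vector product (the function name cfc and the toy shapes are our own):

```python
def cfc(in_r, in_i, w_r, w_i):
    """Complex-valued fully connected layer following the operation mask:
    OUT_r = IN_r * W_r - IN_i * W_i,  OUT_i = IN_r * W_i + IN_i * W_r.
    Four real matrix-vector products, one subtraction, one addition."""
    def matvec(w, x):
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    out_r = [a - b for a, b in zip(matvec(w_r, in_r), matvec(w_i, in_i))]
    out_i = [a + b for a, b in zip(matvec(w_i, in_r), matvec(w_r, in_i))]
    return out_r, out_i
```

Complex-valued convolution and deconvolution follow the same mask, with the matrix-vector product replaced by the corresponding real-valued convolution.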

C. COMPLEX-VALUED BATCH NORMALIZATION
Batch normalization [57] is used to standardize data and accelerate the convergence of networks, and it has been widely used in deep neural networks. In other words, this operation accelerates network convergence by shifting and scaling the input data so that its mean is 0 and its variance is 1. Furthermore, it has been shown that batch normalization can stabilize the training of GANs. However, batch normalization was first proposed for real-valued network models and cannot be applied in the complex-valued domain directly. This limitation remained until the emergence of complex-valued batch normalization [58]. With a similar principle, complex-valued batch normalization can effectively accelerate the convergence of complex-valued networks. However, insufficient training samples and a small batch size gradually cause significant differences among batches: the expectation and covariance matrix of one batch are independent of those of another, so the normalizations of two batches are equally independent. In summary, scarce training samples inevitably degrade the effect of batch normalization.
Instead, the average expectation and covariance matrix are used for batch normalization, because they can track the state of the model as training goes on. In other words, this method estimates the statistics of the global samples, which is very effective when training samples are not enough. The following formulation gives the normalization of the t-th batch x_t:

x̃_t = (V̄_t)^{−1/2} (x_t − σ̄_t)

where σ̄_t and V̄_t represent the average expectation and covariance matrix from batch t − m to batch t, respectively. They are computed as follows:

σ̄_t = (1/m) Σ_{k=t−m+1}^{t} E[x_k]

V̄_t = (1/m) Σ_{k=t−m+1}^{t} Cov(x_k)

where m denotes the length of the memory state and Cov(·) is the covariance function. Here Cov(x_k) is the 2 × 2 covariance matrix of the real and imaginary parts, with entries V̄_rr, V̄_ri, V̄_ir, and V̄_ii, where V̄_ri equals V̄_ir. The square root of the 2 × 2 matrix V̄_t is computed as in [59].
Similar to two-dimensional normalization, this operation translates the mean and variance of the data to 0 and 1, respectively. Finally, the following formula denotes complex-valued batch normalization:

BN(x̃_t) = γ x̃_t + β (19)

In formula (19), γ and β are two learnable parameters that reconstruct the distribution. Unlike real-valued batch normalization, γ is a 2 × 2 matrix with three learnable components (its off-diagonal entries are tied), and β is a complex-valued parameter.
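The whole operation can be sketched on a batch of complex scalars as follows, using the closed-form square root of a 2 × 2 symmetric positive-definite matrix (the identity defaults for γ and β and the eps term are illustrative choices, not values from the paper):

```python
import math

def complex_batch_norm(xr, xi, gamma=((1.0, 0.0), (0.0, 1.0)),
                       beta=(0.0, 0.0), eps=1e-5):
    """Normalize a batch of complex scalars: center, whiten by the inverse
    square root of the 2x2 real/imaginary covariance matrix, then apply
    the affine reconstruction y = gamma * x_tilde + beta."""
    n = len(xr)
    mr, mi = sum(xr) / n, sum(xi) / n
    cr = [v - mr for v in xr]
    ci = [v - mi for v in xi]
    vrr = sum(v * v for v in cr) / n + eps
    vii = sum(v * v for v in ci) / n + eps
    vri = sum(a * b for a, b in zip(cr, ci)) / n
    # closed-form square root of the SPD matrix [[vrr, vri], [vri, vii]]
    s = math.sqrt(vrr * vii - vri * vri)
    t = math.sqrt(vrr + vii + 2.0 * s)
    srr, sri, sii = (vrr + s) / t, vri / t, (vii + s) / t
    # invert the square root to obtain the whitening matrix
    det = srr * sii - sri * sri
    irr, iri, iii = sii / det, -sri / det, srr / det
    out_r, out_i = [], []
    for a, b in zip(cr, ci):
        wr = irr * a + iri * b  # whitened real part
        wi = iri * a + iii * b  # whitened imaginary part
        out_r.append(gamma[0][0] * wr + gamma[0][1] * wi + beta[0])
        out_i.append(gamma[1][0] * wr + gamma[1][1] * wi + beta[1])
    return out_r, out_i
```

In the running-average variant described above, the per-batch mean and covariance would be replaced by σ̄_t and V̄_t accumulated over the last m batches.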

D. COMPLEX-VALUED ACTIVATION FUNCTION
In a neural network, activation functions are used to enhance the nonlinear expressiveness of the network. Frequently used activation functions include sigmoid, tanh, ReLU, LeakyReLU, Maxout, Softplus, Softsign, and so on. However, these activation functions cannot be applied to complex-valued data directly. To date, some research has addressed complex-valued representations of activation functions. The most direct way is to process the real and imaginary parts separately with traditional real-valued activation functions, as in CReLU [38], [58].
In addition, zReLU [24] and modReLU [22], as typical complex-valued activation functions, are used in complex-valued neural networks. Different activation functions have their own advantages in specific networks; the literature [58] verified the superiority of CReLU over modReLU and zReLU in the complex-valued networks it studied. A large number of experiments show that LeakyReLU works better than other activation functions in GANs [53], [54], [60], because it can effectively improve the convergence rate. Based on the advantages of CReLU and LeakyReLU in complex-valued neural networks and GANs, in this paper we define a new complex-valued activation function called CLeakyReLU, in which LeakyReLU activates the real part and the imaginary part individually:

f(z) = LeakyReLU(R(z)) + i · LeakyReLU(I(z))

where f(·) is the complex-valued activation function, R(z) and I(z) denote the real and imaginary parts of the data z, respectively, and σ denotes the slope, which solves the problem of the ''dying ReLU''.
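A direct sketch using Python's built-in complex type (the slope value 0.2 is an illustrative default, not the paper's setting):

```python
def leaky_relu(x, slope=0.2):
    """Real-valued LeakyReLU: identity for x > 0, slope * x otherwise."""
    return x if x > 0 else slope * x

def cleaky_relu(z, slope=0.2):
    """CLeakyReLU: LeakyReLU applied to the real and imaginary parts
    separately, f(z) = LeakyReLU(R(z)) + i * LeakyReLU(I(z))."""
    return complex(leaky_relu(z.real, slope), leaky_relu(z.imag, slope))
```

For example, cleaky_relu(1 - 1j) gives (1 - 0.2j): the positive real part passes through unchanged while the negative imaginary part is scaled by the slope, so gradients never vanish entirely in either part.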

E. SEMI-SUPERVISED LEARNING METHOD
Semi-supervised GANs [56] are used to train and optimize our model. Besides labeled data, real unlabeled data and generated fake data are also used to train the network through special loss functions. Softmax is used as the classifier of the discriminator, and its output is a (K + 1)-dimensional vector {p_1, p_2, ..., p_K, p_{K+1}}. In this vector, each item p_k (k = 1, ..., K) is the probability of the input belonging to category k, and p_{K+1} is the probability of the input being fake. In order to train and optimize the entire network (the generator G and the discriminator D), the loss functions are defined as follows:

L = L_labeled + L_unlabeled + L_generated

L_labeled = −E_{x,y∼p_data} [log p(y | x, y < K + 1)]

L_unlabeled = −E_{x∼p_data} [log(1 − p(y = K + 1 | x))]

L_generated = −E_{x∼G} [log p(y = K + 1 | x)] (24)

where L_labeled represents the loss of labeled samples in the classification process, L_unlabeled denotes the loss of unlabeled samples, which updates the network by discriminating the authenticity of samples, and L_generated represents the loss of generated samples, which are assigned to the spurious category (C = K + 1) in formula (24). The negative log probability of labeled samples and generated samples is easy to compute with a traditional classifier because their labels are known. When training the network with unlabeled data, however, the loss function cannot be expressed precisely because of the lack of ground truth. To address this inevitable problem, the output probability of the softmax is processed as follows:

p_sum = Σ_{i=1}^{K} p_i (25)

where p_max denotes the maximum value among the p_i (i < K + 1), and logistic regression, as a binary classifier, is used to classify (p_sum − p_{K+1}). When the output approaches 1, the probability p_{K+1} is accordingly smaller than p_sum, which indicates that the authenticity of the data has been discriminated. By this method, unlabeled data can be used to update the parameters of the discriminator. In our model, the overall optimization contains four parts.
First, we keep the parameters of the discriminator unchanged and use the generated fake data to optimize the generator, with the purpose of producing more realistic data. Next, we use unlabeled actual data to optimize the discriminator. In the third step, with the same ideology as the first step, the parameters of the discriminator remain unchanged and a large number of real unlabeled samples are introduced for network training. Finally, the generated fake samples and labeled data are employed to optimize the discriminator by training the softmax classifier. In this way, generated fake samples, real unlabeled samples, and real labeled samples are all used to train the network, and this semi-supervised learning mechanism improves the classification performance of the network.
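The three loss terms above can be sketched per batch as follows (plain softmax outputs as input; the function name and toy probabilities are illustrative):

```python
import math

def semi_supervised_losses(probs_labeled, labels, probs_unlabeled, probs_generated):
    """Total loss L = L_labeled + L_unlabeled + L_generated for one batch.
    Each probs_* entry is a softmax output of length K+1, where the last
    index is the 'fake' class; labels holds true class indices (0..K-1)."""
    fake = len(probs_labeled[0]) - 1  # index of the (K+1)-th, 'fake' output
    l_labeled = -sum(math.log(p[y])
                     for p, y in zip(probs_labeled, labels)) / len(labels)
    l_unlabeled = -sum(math.log(1.0 - p[fake])
                       for p in probs_unlabeled) / len(probs_unlabeled)
    l_generated = -sum(math.log(p[fake])
                       for p in probs_generated) / len(probs_generated)
    return l_labeled + l_unlabeled + l_generated
```

Labeled samples are pushed toward their true class, unlabeled samples away from the fake class, and generated samples toward it, which is what drives the alternating optimization described above.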

IV. EXPERIMENTS
In this section, we introduce the characteristics of PolSAR images in detail and describe the two widely used PolSAR data sets adopted to validate our method. We use four methods to classify the PolSAR data sets: our proposed semi-supervised complex-valued GAN, the complex-valued convolutional neural network [24], the real-valued convolutional neural network [17], and the polarimetric convolutional network [19] mentioned in the introduction. Here, CC, RC, and PCN denote the three comparison methods. CC is given an architecture similar to the complex-valued discriminator used for classification in our model. RC, which has the same architecture and degrees of freedom as CC, is used as another comparison. In addition, the polarimetric convolutional network (PCN) classifies PolSAR data through a polarimetric scattering coding method. The overall accuracy (OA), average accuracy (AA), and Kappa coefficient are used to measure the performance of the experiments. Finally, we analyse the similarity between the generated fake data and the actual PolSAR data. Our experiments run on an HP workstation with Ubuntu 14.04 LTS and a GeForce GTX TITAN X graphics card. All the projects are implemented with MXNet [61].

A. PolSAR IMAGES DESCRIPTION
To exploit and utilize PolSAR images, it is necessary to consider the distribution and expression of the data. The polarimetric scattering matrix [62], composed of complex-valued elements, contains the full polarimetric information of every single pixel in a PolSAR image. This matrix is widely used for investigating PolSAR images and is shown in equation (26):

S = [ S_HH  S_HV
      S_VH  S_VV ] (26)

In this complex-valued matrix, S_HH and S_VV are the results of co-polarized measurement, while S_VH and S_HV are the results of cross-polarized measurement. Due to reciprocity in backscattering, the scattering matrix is symmetrical, with S_VH = S_HV. The scattering matrix is then replaced by the three-dimensional vector k, which follows a complex Gaussian distribution. Based on this theory and the Pauli decomposition, the polarimetric vector k and the corresponding n-look measured coherency matrix can be written as:

k = (1/√2) [S_HH + S_VV, S_HH − S_VV, 2S_HV]^T

T = (1/n) Σ_{q=1}^{n} k_q k_q^H

where 1/n ensures the conservation of power and n denotes the number of looks. We thereby acquire the coherency matrix T, a 3 × 3 conjugate-symmetric complex-valued matrix. For the sake of brevity, it is expressed as:

T = [ T_11  T_12  T_13
      T_21  T_22  T_23
      T_31  T_32  T_33 ]

It is well known that this matrix follows the complex Wishart distribution, and the nine complex values in the coherency matrix are used as nine channels of each pixel in a PolSAR image to express the pixel features. Besides its conjugate symmetry and Wishart distribution, the three real values on the leading diagonal also decrease the complexity of the algorithm in image processing. According to the conclusions above, the data in the upper triangle of this matrix are adequate for investigating the pixel specialty. In order to use richer data relationships, a new column vector is applied to express the full information of a pixel. This column vector is T̄ = [T_11, T_12, T_13, T_21, T_22, T_23, T_31, T_32, T_33]^T.
Specifically, this column vector comprises the detailed information of each pixel at coordinate (i, j) in a PolSAR image of m × n pixels, as shown in Figure 3.
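Under the definitions above, the 9-channel pixel feature can be sketched as follows (a simplified single-pixel illustration; the function name and interface are our own):

```python
import numpy as np

def coherency_vector(S_HH, S_HV, S_VV, n_look=4):
    """Form the 9-channel coherency feature of one pixel.

    S_* are length-n_look arrays of complex single-look scattering values
    (S_VH = S_HV by reciprocity). Returns the flattened 3x3 coherency matrix
    T = (1/n) sum_i k_i k_i^H as the 9-element complex vector
    [T11, T12, T13, T21, T22, T23, T31, T32, T33].
    """
    # Pauli scattering vector k for each look, columns of a (3, n_look) array.
    k = np.stack([S_HH + S_VV, S_HH - S_VV, 2.0 * S_HV]) / np.sqrt(2.0)
    # Multi-look average of the outer products k k^H.
    T = (k @ k.conj().T) / n_look
    return T.reshape(-1)
```

By construction, the diagonal entries are real and T_12 equals the conjugate of T_21, matching the conjugate symmetry described above.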

B. EXPERIMENTS ON FLEVOLAND DATA SET
Provided by the NASA/Jet Propulsion Laboratory, the Flevoland data, a fully polarimetric data set, is widely used for PolSAR classification. A visual RGB intensity image of Flevoland obtained by Pauli decomposition is shown in Figure 4, together with the corresponding ground truth and legend [63]. As an agricultural PolSAR data set, this image has 750 × 1024 pixels and a 15-class ground truth including forest, bare soil, grass, and the like. To evaluate the performance of our proposed method at different sampling rates, we randomly select 0.2%, 0.8%, 1.2%, and 2.0% of the labeled samples as training data, and the remaining labeled samples are used as test sets. For our proposed method, 10% of the real unlabeled samples are additionally used to train the semi-supervised complex-valued GANs. The experiment parameters are set as follows: the patch size is 32 × 32, the learning rate is 0.0005, and the optimizer is Adam with β1 = 0.5 and β2 = 0.999. To make a fair comparison, except for having c rather than c + 1 neurons in the last fully connected layer, the real-valued convolutional neural network has an architecture similar to our complex-valued discriminator and can extract reasonable features for PolSAR image classification. The input has size 32 × 32 × 18: the image patch size is 32 × 32, and the 18 input channels are obtained by concatenating the real and imaginary parts. Figure 5 shows how OA, AA, and Kappa change as the sampling rate increases. Our proposed method has obvious advantages when fewer than 3.0% of the samples are used for training; for example, at a sampling rate of 0.2% its OA is 91.59%, which outperforms the OA of 86.82% of the Polarimetric convolutional network, 81.42% of the complex-valued convolutional neural network, and 81.85% of the real-valued convolutional neural network. Besides OA, our model also shows marked superiority in AA and the Kappa coefficient.
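The 32 × 32 × 18 real-valued input mentioned above can be formed from a complex 9-channel patch as follows (a sketch; placing the nine real parts before the nine imaginary parts is our assumption, since the paper only states that the two are concatenated):

```python
import numpy as np

def to_real_input(patch_c):
    """Convert a complex (H, W, 9) coherency patch into the (H, W, 18)
    real-valued network input: the first 9 channels hold the real parts
    and the last 9 hold the imaginary parts (ordering assumed)."""
    return np.concatenate([patch_c.real, patch_c.imag], axis=-1)
```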
As the sampling rate increases, our model remains superior to the other models, although by a smaller margin. It can thus be seen that our method achieves superior performance with fewer labeled training samples. Both the complex-valued convolutional neural network and the Polarimetric convolutional network also perform well when the training sampling rate exceeds 3.0%, owing to their powerful representational ability. In short, our model obtains the best classification accuracy, while the real-valued convolutional neural network gives the worst classification results over the whole sampling range.
To further demonstrate the performance of our model, the classification accuracy of each category, together with OA, AA, and Kappa, is shown in Table III for sampling rates of 0.2%, 0.8%, 1.5%, and 2.0%. The bold numbers represent the best results among the four models. To visualize the performance advantage of our model in PolSAR classification, the classification maps obtained by the real-valued convolutional neural network (RC), the complex-valued convolutional neural network (CC), the Polarimetric convolutional network (PCN), and our model on the Flevoland data are shown in Figure 6.
From Table III, we can see that when the sampling rate is below 2%, the complex-valued GANs obtains better classification accuracies in most categories. The complex-valued networks, and especially the complex-valued GANs, bring a large improvement in the fifteenth category, ''Buildings'', since this category is scarce in labeled samples. Except for ''Buildings'', nearly all other classes reach an accuracy of at least 95% when the sampling rate is not smaller than 0.8%. ''Peas'', ''Forest'', ''Potatoes'', and ''Bare soil'' basically stay above 99%, while the accuracy of ''Water'' reaches 100%. The classification maps in Figure 6 confirm the superiority of our model once again. Compared with RC and PCN, CC and our model identify different categories well in this complex environment, such as ''Beet'', ''Potatoes'', and ''Water''. Moreover, from these classification maps we can see that the results of CC contain more misclassified points and blurrier edges than those of our model.

C. EXPERIMENTS ON SAN FRANCISCO DATA SET
To further confirm the superiority of the complex-valued GANs, a series of experiments is also performed on the NASA/JPL L-band four-look PolSAR image of San Francisco, a data set widely used in PolSAR research [64], [65], [66]. The San Francisco data, acquired at radar incidence angles of 5∼60 degrees, has 1800 × 1380 pixels with a spatial resolution of 10 × 10.
Its Pauli RGB image is divided into five classes, namely ''Water'', ''Vegetation'', ''High-Density Urban'', ''Low-Density Urban'', and ''Developed'', as shown in Figure 7. Unlike for the Flevoland data set, we choose a fixed number of labeled samples in each class to train the models, namely 20, 50, 120, and 300. For our semi-supervised complex-valued GANs, we randomly select 10% of the real unlabeled samples to train the network.
For the San Francisco data set, we apply the same comparative experiments and network frameworks as for the Flevoland data. We also use the same initialization parameters, such as patch size, learning rate, and optimization method, to validate the advantages of our proposed model. Figure 8 shows the curves of OA, AA, and Kappa. In particular, when the number of labeled samples per class is 10, the OA, AA, and Kappa of our model are about 89.23%, 85.41%, and 84.48%. These three indicators of the semi-supervised GANs are higher by 10.39%, 21.13%, and 16.67% than those of CC, by 10.6%, 7.0%, and 9.7% than those of PCN, and by 19.99%, 43.91%, and 33.31% than those of RC, respectively. We can see that CC and PCN give better results than RC. When the number of training samples per class exceeds 50, the classification performance of our model stabilizes at a higher level and maintains a steady advantage over CC, PCN, and RC. Table IV shows the detailed classification accuracies of each class, including OA, AA, and the Kappa coefficient; the best result among the four methods is highlighted in bold. To visualize the advantage of our model with fewer labeled samples, the classification maps obtained by the real-valued convolutional neural network (RC), the complex-valued convolutional neural network (CC), the Polarimetric convolutional network (PCN), and our model on the San Francisco data are shown in Figure 9 for the case of 20 labeled training samples per class. From this table and the classification maps, we can clearly see that our model yields better classification results. Because of its distinctive characteristics, the classification accuracy of ''Water'' is always higher than 99%, regardless of the number of labeled training samples. Owing to their similar features, ''High-Density Urban'' and ''Low-Density Urban'' cannot be distinguished well by the first three models when training samples are insufficient, whereas our model separates them.
Moreover, with fewer training samples, the real-valued convolutional neural network (RC), the complex-valued convolutional neural network (CC), and the Polarimetric convolutional network (PCN) struggle to achieve satisfactory classification performance on ''Low-Density Urban'' and ''Vegetation''. Compared with the other three methods, our semi-supervised GANs obtains better classification results in every category. As shown in Table IV, with 20 training samples per class our proposed model improves the accuracy of ''Low-Density Urban'' from 58.67% (RC) to 93.99% and increases the accuracy in every other category by more than 10%. From Figure 8, we can see that our model has an even more obvious advantage when the number of training samples is 10. In addition, as the number of training samples increases, our proposed model still shows significant advantages.

D. ABLATION EXPERIMENT ANALYSIS
In this paper, we extend real-valued neural networks to the complex-valued domain and define the complex-valued convolution (CC), the complex-valued batch normalization (CBN), and the complex-valued activation function. Meanwhile, we propose semi-supervised GANs to address the small-sample problem in the PolSAR classification task. To demonstrate the effectiveness of these operators and strategies, we take the Flevoland data set as an example, select 0.8% of the samples as training data, and carry out the six experiments shown in Table V.
Semi-supervised GANs is a strategy that trains the classification model with generated fake data, labeled real data, and unlabeled real data. By comparing experiments E1 and E3, or E2 and E4, we find that adding GANs to the complex-valued neural network significantly improves the model's ability to classify data with fewer labels. For example, OA, AA, and Kappa increase from 95.36%, 92.47%, and 94.95% in experiment E2 to 97.28%, 95.91%, and 97.04% in experiment E4. Comparing experiments E3 and E5, or E4 and E6, shows that adding a semi-supervised mechanism on top of the complex-valued GANs and introducing real unlabeled data into training further improves the robustness and classification ability of the model: OA, AA, and Kappa increase from 97.28%, 95.91%, and 97.04% in experiment E4 to 98.42%, 97.67%, and 98.28% in experiment E6. These experiments show that our semi-supervised GANs mechanism brings a significant improvement for the PolSAR classification task with fewer labeled data.
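The roles of the three data sources can be illustrated with a sketch of a standard (c + 1)-class semi-supervised discriminator loss. This follows the common formulation in the semi-supervised GAN literature, where the last logit column is the ''fake'' class; the paper's exact loss may differ:

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def discriminator_losses(logits_lab, labels, logits_unl, logits_fake):
    """Semi-supervised (c+1)-class discriminator losses (a sketch).

    - supervised: cross-entropy of labeled real data over the c real classes
    - unsupervised: real unlabeled data pushed away from the fake class,
      generated data pushed toward it
    """
    c = logits_lab.shape[1] - 1
    # Supervised term: standard cross-entropy restricted to the real classes.
    lsm = log_softmax(logits_lab[:, :c])
    loss_sup = -lsm[np.arange(len(labels)), labels].mean()
    # Unsupervised terms: log p(fake) under the (c+1)-way softmax.
    log_p_fake_unl = log_softmax(logits_unl)[:, c]
    log_p_fake_gen = log_softmax(logits_fake)[:, c]
    loss_unsup = -np.log1p(-np.exp(log_p_fake_unl)).mean() - log_p_fake_gen.mean()
    return loss_sup, loss_unsup
```

Both losses approach zero when the discriminator classifies labeled data correctly, assigns real unlabeled data a low fake-class probability, and assigns generated data a high fake-class probability.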
The proposed CBN considers the relationship between data across batches and uses the complex-valued mean and covariance matrix to normalize the complex-valued features in our network. We use three groups of comparative experiments to verify the effectiveness of CBN. These comparisons show that the proposed batch normalization improves the classification performance of models with different network structures. Taking E5 and E6 as an example, compared with the complex-valued semi-supervised GANs without CBN, the variant with CBN increases OA, AA, and Kappa by 0.9%, 2.95%, and 0.99%. Hence CBN makes better use of the relationships between data in different batches and alleviates the problem of insufficient training caused by fewer labeled samples.
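The whitening step that a complex batch normalization performs can be sketched with the 2 × 2 covariance formulation, treating each complex activation as a 2-D real vector (learnable scale and shift omitted; this is an illustration, not the paper's exact implementation):

```python
import numpy as np

def complex_batch_norm(z, eps=1e-5):
    """Whiten a 1-D batch of complex activations (a sketch of CBN).

    Stacks (Re z, Im z) as a 2-D real vector, subtracts the mean, and
    multiplies by the inverse square root of the 2x2 covariance matrix,
    so the normalized output has (approximately) identity covariance.
    """
    x = np.stack([z.real, z.imag])                # shape (2, N)
    x = x - x.mean(axis=1, keepdims=True)         # center real and imaginary parts
    V = x @ x.T / x.shape[1] + eps * np.eye(2)    # regularized 2x2 covariance
    # Inverse matrix square root via eigendecomposition of the symmetric V.
    w, U = np.linalg.eigh(V)
    V_inv_sqrt = U @ np.diag(w ** -0.5) @ U.T
    y = V_inv_sqrt @ x
    return y[0] + 1j * y[1]
```

Unlike normalizing real and imaginary parts independently, the joint 2 × 2 whitening also removes correlation between the two parts, which is the point of the covariance-matrix formulation.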
The activation function CLeakyReLU proposed in this paper is based on the principles of CReLU and LeakyReLU. LeakyReLU is known to improve the convergence speed of GANs [53], [54], [60], and CReLU has been reported to improve the classification accuracy of complex-valued neural networks [58]. To verify the effect of CLeakyReLU, we perform contrast experiments with the common complex-valued activations modReLU [22] and CReLU. The convergence curves of the complex-valued neural network with different activation functions are shown in Figure 10. From Figure 10, we can see that CReLU converges quickly and its convergence curve is stable, but its classification accuracy is mediocre. modReLU has the worst convergence and classification accuracy in the PolSAR classification task. The red line is the convergence curve of CLeakyReLU: although its initial convergence speed is not comparable to that of CReLU, it shows obvious advantages over the other activation functions after the 20th epoch, and its classification accuracy is the highest.
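Following the CReLU convention of acting on the real and imaginary parts separately, CLeakyReLU can be sketched as follows (our reconstruction from the description above; the slope value 0.2 is an assumption, not taken from the paper):

```python
import numpy as np

def cleaky_relu(z, slope=0.2):
    """CLeakyReLU sketch: apply LeakyReLU independently to the real and
    imaginary parts of a complex array, keeping a small slope for
    negative inputs instead of zeroing them."""
    def leaky(x):
        return np.where(x >= 0, x, slope * x)
    return leaky(z.real) + 1j * leaky(z.imag)
```

For example, with slope 0.2, the input 1 − 2j maps to 1 − 0.4j and −1 + 0.5j maps to −0.2 + 0.5j.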

E. GENERATED DATA ANALYSIS
To analyze the effectiveness of the data generated by our complex-valued GANs, we discuss the usefulness of the output of the ''Complex-valued Generator'' from two aspects: the visual similarity between the actual and generated data after visualization, and the statistical distributions of the two types of data.
The coherency matrix T of PolSAR data contains nine channels with complex structure. Taking the Flevoland data as an example, to visually represent the generated data we display a false-color composite of the real parts of the diagonal elements of T, as shown in Figure 11. We randomly select 100 image patches from the training samples to form a 10 × 10 image mosaic, shown as (a1)-(a4); the size of each image patch is 32 × 32. Meanwhile, images (b1)-(b4) show the visualization of the generated data, produced from two random vectors; this process and its principle are described in detail in Section III. Ideally, the generated images should be highly similar to the actual data, and Figure 11 indeed shows that the generated data have high surface similarity to the actual data.
As shown in [16], the real parts of the diagonal elements of the coherency matrix T follow a Gamma distribution, and their imaginary parts are all 0, whereas the off-diagonal elements, in both their real and imaginary parts, are approximately Gaussian. These distribution laws increase the complexity of PolSAR data and make its analysis inherently more difficult. To analyze the similarity of the data distributions, we separately collect the real and imaginary parts of T_11 and T_12 from the coherency matrix of the Flevoland data set. Figure 12 shows the histograms of these variables, in which (a1) and (a2) represent the statistics of the real and imaginary parts of T_11, respectively. Figure 12 (a1) clearly indicates that the real parts of the diagonal elements follow a Gamma distribution. Since T_11 carries no information in its imaginary part, (a2) shows a single column at 0. The statistics of the real and imaginary parts of T_12 are shown in (a3) and (a4), respectively; they are approximately Gaussian with zero mean.
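As a quick way to check the Gamma behavior of the diagonal real parts, the shape and scale implied by a set of positive samples can be estimated by the method of moments (an illustrative sketch, not a step taken in the paper):

```python
import numpy as np

def gamma_moment_fit(samples):
    """Method-of-moments fit of Gamma(shape k, scale theta) to positive
    samples, using mean = k * theta and variance = k * theta**2."""
    m, v = samples.mean(), samples.var()
    k = m * m / v        # shape parameter
    theta = v / m        # scale parameter
    return k, theta
```

Fitting such a model to the histogram of a diagonal channel (and comparing against a Gaussian fit for an off-diagonal channel) gives a simple quantitative counterpart to the visual comparison in Figure 12.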
Simultaneously, (b1)-(b4) in Figure 12 show the histograms of the elements generated by the ''Complex-valued Generator''. Its output has the same shape as the actual data, and each element represents the corresponding attribute. We use T^g to denote the coherency matrix of the generated fake data. The histogram of the real part of T^g_11 in (b1) follows a distribution similar to (a1) and is close to a Gamma distribution. From (b2), we can see that nearly all imaginary parts of T^g_11 are close to 0, though not exactly equal to 0. Across all histograms in Figure 12, there is a slight difference between the actual and the generated data, because generating data in the ''Complex-valued Generator'' is a regression problem and no hard threshold constrains the variables. In addition, the real and imaginary parts of T^g_12 follow Gaussian distributions, as shown in (b3) and (b4), which basically coincide with the distributions of the actual data in (a3) and (a4).

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed semi-supervised complex-valued GANs to address the lack of training samples in the classification of PolSAR images. Nearly all computations are performed in the complex-valued domain, which conforms with the physical mechanism of phase and amplitude in PolSAR data. The complex-valued GANs can produce generated data similar to the actual data; to the best of our knowledge, this is the first time a network generates complex-valued data whose real and imaginary parts follow different distributions. To exploit the information in labeled data, unlabeled data, and generated data, a semi-supervised learning method is used to train and optimize the network. In our model, the generalization ability and classification accuracy of the complex-valued discriminator are clearly better than those of the other models, especially when labeled samples are insufficient. This opens up a new way of solving the problem of the lack of complex-valued samples.
In this paper, we developed a series of complex-valued operations. However, studies on complex-valued neural networks are still limited. In future work, we will explore more effective complex-valued neural networks and their theoretical explanations.

He is in charge of about 40 important scientific research projects and has published more than 20 monographs and a hundred papers in international journals and conferences. His research interests include image processing, natural computation, machine learning, and intelligent information processing. He is a member of the IEEE Xi'an Section Execution Committee, the Chairman of the Awards and Recognition Committee, the Vice Board Chairperson of the Chinese Association of Artificial Intelligence, a Councilor of the Chinese Institute of Electronics, a committee member of the Chinese Committee of Neural Networks, and an Expert of the Academic Degrees Committee of the State Council.
FANG LIU (Senior Member, IEEE) received the B.S. degree in computer science and technology from Xi'an Jiaotong University, Xi'an, China, in 1984, and the M.S. degree in computer science and technology from Xidian University, Xi'an, in 1995. She is currently a Professor with the School of Computer Science, Xidian University. She is the author or coauthor of five books and more than 80 papers in journals and conferences. Her research interests include signal and image processing, synthetic aperture radar image processing, multiscale geometry analysis, learning theory and algorithms, optimization problems, and data mining.