Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network

Synthetic aperture radar (SAR) image change detection is a vital yet challenging task in the field of remote sensing image analysis. Most previous works adopt a self-supervised method which uses pseudo-labeled samples to guide subsequent training and testing. However, deep networks commonly require many high-quality samples for parameter optimization. The noise in pseudo-labels inevitably affects the final change detection performance. To solve the problem, we propose a Graph-based Knowledge Supplement Network (GKSNet). To be more specific, we extract discriminative information from the existing labeled dataset as additional knowledge, to suppress the adverse effects of noisy samples to some extent. Afterwards, we design a graph transfer module to distill contextual information attentively from the labeled dataset to the target dataset, which bridges feature correlation between datasets. To validate the proposed method, we conducted extensive experiments on four SAR datasets, which demonstrated the superiority of the proposed GKSNet as compared to several state-of-the-art baselines. Our codes are available at https://github.com/summitgao/SAR_CD_GKSNet.


I. INTRODUCTION
Owing to the rapid development of earth observation programs, a growing number of multitemporal synthetic aperture radar (SAR) images, captured over the same geographical area at different times, are available. Since SAR images can be acquired under all-weather and all-time conditions, they have become the most important data source for change detection. SAR image change detection aims to accurately detect the changed information by analyzing two images captured at different times. It is of high practical value to a large number of applications, such as flood detection [1], disaster monitoring [2], urban planning [3], and land cover data monitoring [4].
SAR images are inherently contaminated by multiplicative speckle noise, which makes SAR image change detection a very challenging task. Therefore, it is essential to develop robust change detection techniques that can cope with speckle noise, and researchers have devoted great efforts to this end. Existing methods can be broadly categorized into two main streams: supervised methods and unsupervised methods. A supervised method requires prior knowledge about land cover types or a large number of high-quality labeled samples [5] [6]. In theory, supervised methods may offer better performance since many detailed descriptions of the changed region are provided. However, high-quality labeled samples are generally difficult to obtain in real applications [7] [8]. Therefore, unsupervised methods are more popular and account for most existing methods.
Unsupervised SAR image change detection methods are commonly composed of the following two steps: difference image (DI) generation and DI classification. In DI generation, the log-ratio [10], Gauss-ratio [9] and neighborhood-based ratio [11] operators are generally used, since these methods are considered robust to calibration errors [12]. In the DI classification step, clustering methods are widely employed to classify pixels into changed and unchanged classes, such as the fuzzy c-means (FCM) [14], k-means [15] and multiple kernel clustering [16].
To enhance the performance of DI classification, researchers incorporate deep neural networks into the traditional unsupervised DI classification model. Hou et al. [18] presented a change detection method that combines deep features and saliency map computation using a low-rank method. Zhan et al. [19] proposed a deep siamese CNN model to extract discriminant features. Wang et al. [21] proposed an end-to-end 2D CNN framework for change detection. A mixed-affinity matrix was employed for DI analysis, and then CNNs were used to exploit the discriminant features. Du et al. [23] proposed a slow feature analysis-based (SFA) method for change detection. Two deep networks were established to extract multitemporal features, and SFA was employed to extract the most invariant component of multitemporal features. Chen et al. [25] presented a deep siamese multiple-layers recurrent neural network (RNN) for change detection. The multiple-layers RNN was designed to handle the features extracted by the CNN and mapped them into a new space. In [26], a pretrained deep fully convolutional network was used for DI analysis, and on this basis, multiscale superpixel segmentation was employed for robust change map generation. Zhao et al. [27] proposed a metric learning-based generative adversarial network (GAN) for change detection, where metric learning was incorporated to enhance the stability of the GAN model when the number of training samples was limited. Besides these methods, generative adversarial networks [28] [29], fully convolutional networks [30], bipartite differential networks [31], and local restricted CNNs [32] are also employed to solve the problem of remote sensing image change detection.
arXiv:2201.08954v2 [cs.CV] 9 Feb 2022
Existing deep learning-based change detection methods commonly require many training samples to optimize network parameters. These samples are obtained by the self-learning strategy, which generates pseudo-labels from unlabeled DI pixels. Gong et al. [17] assigned pseudo-labels to pixels in the DI by an FCM-based joint classifier, then restricted Boltzmann machines (RBMs) were trained to generate the final change map. In [20], a PCA-based neural network was introduced for SAR image change detection. PCA was employed as the cascaded filter for multitemporal feature analysis. Gao et al. [22] utilized convolutional-wavelet neural networks for SAR image change detection. The dual-tree complex wavelet transform was introduced for DI analysis, so that the speckle noise can be suppressed effectively. In [24], a nonnegative and Fisher-constrained autoencoder was designed to discover changed information from the DI. However, due to the limitations of clustering algorithms, the pseudo-labeled samples contain errors, and these errors will be amplified during training [33]. To solve the problem, we are dedicated to developing transfer learning-based change detection methods which can adapt additional knowledge from existing data with labels to new data without labels.
Some efforts have been made in transfer learning for change detection. Gao et al. [34] presented a CNN-based transfer learning model for SAR image change detection. Liu et al. [35] trained a U-Net on the source dataset and then transferred the pretrained model to the target dataset by minimizing a newly designed loss function. Yang et al. [36] proposed a multitask transfer learning scheme for change detection. Two tasks were learned simultaneously: one for the source domain with labels, and the other for the reconstruction of the unlabeled target data. The aforementioned change detection methods use a CNN or autoencoder model for knowledge transfer. We argue that if the knowledge from the source domain is distilled to the unlabeled target data in a more structured format, the change detection performance can be further improved.
Recently, there has been a surge of interest in graph-based methods. Graph reasoning has been shown to have substantial practical merits for object detection [37] [38] [39], image classification [40] [41] [42], semantic segmentation [43] [44] and change detection [45] [46] [47]. Graph neural networks are powerful tools that can perform relational inference through message passing. The domain knowledge modeled in a single graph can be transferred to other graphs. Therefore, graph-based methods are a natural choice for the transfer learning-based change detection task. However, two problems remain: 1) How to suppress noisy samples in the target dataset via a graph-based model? In the target dataset, the pseudo-labeled samples selected from the DI inevitably contain some errors. If the model is blindly confident in these incorrect samples, the error will be amplified during training. 2) How to transfer knowledge among datasets with different characteristics? SAR images captured by different satellite sensors have disparate feature representations of ground objects. It is challenging to transfer knowledge directly between datasets acquired by different sensors.
To handle the above-mentioned problems, we propose a Graph-based Knowledge Supplement Network (GKSNet) for SAR image change detection. On the one hand, the extracted image features from existing labeled dataset are projected into a graph. After message propagation via graph convolutions, the obtained features are more discriminative, and these features are employed as additional knowledge for the target dataset. By knowledge supplement, more reliable information is introduced, and the adverse effects of noisy samples can be suppressed to some extent. On the other hand, a graph transfer module is proposed to distill contextual information attentively from the labeled dataset to the target dataset as supplementary knowledge. The knowledge bridges feature correlation from different datasets.
In summary, the main contributions of this article are threefold:
1) We perform SAR image change detection via a graph-based knowledge supplement network. Existing SAR image change detection methods cannot handle errors in pseudo-labeled samples well. The proposed network can suppress the adverse effects of noisy samples by adding discriminative information from a labeled dataset.
2) In order to better integrate the supplementary knowledge, we propose a graph transfer module. Through feature fusion, the model can exploit the common knowledge and bridge the feature correlation between different datasets. Then, evolved features can be obtained to improve change detection performance.
3) We conducted extensive experiments on five SAR datasets to validate the effectiveness of GKSNet and the superiority of the evolved features. As a side contribution, we have released our codes to benefit other researchers.
The remainder of this paper is organized as follows. Section II presents the details of the proposed GKSNet, including intra-graph reasoning and inter-graph fusion. Section III provides the experimental results together with the corresponding analysis and discussion. Finally, conclusions are drawn in Section IV.

II. METHODOLOGY
To alleviate the impact of noisy samples and enhance features, we aim at incorporating underlying knowledge from the labeled dataset via a graph-based network. Fig. 1 gives an overview of the proposed GKSNet. The proposed model can be embedded in any CNN-based classification model by enhancing its original convolution features via graph transfer learning. Firstly, features extracted by CNNs are projected into a graph, and then the compact graph representation is learnt and propagated via intra-graph reasoning. Further, we transfer the graph representations across different datasets using an inter-graph fusion module.

A. Preclassification and Reliable Samples Selection
Given a pair of SAR images $I_1$ and $I_2$ that are captured over the same geographical area where an event of change happens, we aim to generate a binary change map $I_{cm}$ with $I_{cm}(i, j) \in \{0, 1\}$, where 1 denotes that the position $(i, j)$ is changed, and 0 denotes that the position $(i, j)$ is unchanged. To this end, we need to create an initial change map with pseudo-labels via pre-classification.
The first step of pre-classification is to generate a DI using the log-ratio operation. It is widely acknowledged that the log-ratio operator can reduce the influence of speckle noise, since it can transform multiplicative noise into additive noise and compress the range of values. After obtaining the DI, a pre-classification operation needs to be performed to obtain the pseudo-labels and training samples. Compared to other methods, the hierarchical clustering algorithm proposed in [20] can better obtain enough representative samples for subsequent network training. Therefore, the hierarchical clustering algorithm [20] is employed to divide the DI into three clusters $\{\omega_c, \omega_u, \omega_i\}$, where $\omega_c$ and $\omega_u$ represent the changed and unchanged classes, respectively, and $\omega_i$ represents the uncertain class. Pixels belonging to $\omega_c$ have high probabilities to be changed, while pixels belonging to $\omega_u$ have high probabilities to be unchanged. Therefore, $\omega_c$ and $\omega_u$ are selected as reliable training samples for GKSNet training. Pixels from $\omega_i$ will be further classified by the GKSNet. To suppress the adverse effects of noisy samples, we reduce the number of training samples, which will be discussed in detail later.
The contextual information is critical for robust feature representation. Therefore, image patches centered at the pixels of $\omega_c$ and $\omega_u$ are extracted from the original SAR images, and these patches are fed into the GKSNet as training samples. Let $R_k^1$ denote the image patch centered at pixel $k$ in $I_1$, and $R_k^2$ denote the corresponding image patch in $I_2$. The size of each image patch is $r \times r$. The two patches are combined to form a training sample $R_k$ with the size of $r \times r \times 2$. It should be noted that $r$ is an important parameter, which will be discussed in Section III.
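As a minimal sketch of the DI generation step described above, the log-ratio operator can be written in plain Python as follows. The function name `log_ratio_di` and the offset `eps` (added so that the logarithm stays finite for zero-valued pixels) are illustrative assumptions and are not taken from the released code; taking the absolute value of the log ratio is one common form of the operator.

```python
import math

def log_ratio_di(img1, img2, eps=1.0):
    """Return |log((I2 + eps) / (I1 + eps))| for two equally sized 2-D images.

    eps is a hypothetical offset that keeps the logarithm finite at zero pixels.
    """
    if len(img1) != len(img2) or any(len(a) != len(b) for a, b in zip(img1, img2)):
        raise ValueError("images must have the same shape")
    return [
        [abs(math.log((b + eps) / (a + eps))) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]

# Toy 2x2 example: only the bottom-right pixel changes strongly.
i1 = [[10.0, 20.0], [30.0, 40.0]]
i2 = [[10.0, 20.0], [30.0, 400.0]]
di = log_ratio_di(i1, i2)

# A common multiplicative factor on both acquisitions cancels in the ratio.
scaled = log_ratio_di([[3.0 * 2.0]], [[3.0 * 4.0]], eps=0.0)
```

Unchanged pixels map to zero, while the strongly changed pixel receives a large DI value. With `eps=0.0`, scaling both images by the same factor leaves the DI unchanged, which is the property that makes the ratio attractive under multiplicative noise.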
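The sample construction above can be sketched as follows. The helper name `extract_sample` is hypothetical, and the border handling (clamping indices, i.e. edge replication) is an assumption made for this sketch, since the text does not state how patches near the image border are treated.

```python
def extract_sample(img1, img2, row, col, r):
    """Build one r x r x 2 sample centered at (row, col).

    Border pixels are handled by clamping indices (an assumption of this sketch).
    """
    if r % 2 == 0:
        raise ValueError("r must be odd so that the patch has a center pixel")
    h, w = len(img1), len(img1[0])
    half = r // 2
    sample = []
    for dy in range(-half, half + 1):
        y = min(max(row + dy, 0), h - 1)
        line = []
        for dx in range(-half, half + 1):
            x = min(max(col + dx, 0), w - 1)
            # Channel 0 comes from the first image, channel 1 from the second.
            line.append((img1[y][x], img2[y][x]))
        sample.append(line)
    return sample

# Toy 5x5 images; the second one is the first shifted by a constant.
img_a = [[float(10 * y + x) for x in range(5)] for y in range(5)]
img_b = [[v + 100.0 for v in row] for row in img_a]
patch = extract_sample(img_a, img_b, row=2, col=2, r=3)
corner = extract_sample(img_a, img_b, row=0, col=0, r=3)
```

Each entry of `patch` holds the two co-located pixel values, so the result has the shape r × r × 2 described in the text.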

B. Intra-Graph Reasoning
Recently, many deep learning-based methods have been proposed for SAR image change detection. These methods commonly use a self-learning strategy to generate pseudo-labeled samples from unlabeled SAR data. However, the pseudo-labeled samples contain errors, which may be amplified during training [48]. Therefore, ensuring robust feature representation while suppressing the errors in pseudo-labeled samples is the key to improving the change detection performance.
To address this issue, we propose a Graph-based Knowledge Supplement Network (GKSNet) which can extract the common knowledge existing in the labeled dataset as a feature supplement to ensure robust parameter optimization, as illustrated in Fig. 1. Firstly, we extract features from the labeled dataset and the target dataset through CNNs, respectively. Features from the labeled dataset are defined as $X_l \in \mathbb{R}^{h \times w \times c}$, and features from the target dataset are defined as $X_t \in \mathbb{R}^{h \times w \times c}$, where $c$ is the number of channels, and $h$ and $w$ are the height and width of the feature map, respectively. Then, the feature maps $X_l$ and $X_t$ are projected into the high-level graph representations $Y_l \in \mathbb{R}^{N \times d}$ and $Y_t \in \mathbb{R}^{N \times d}$. Here $N = h \times w$ denotes the number of vertices of the graph, and $d$ denotes the desired feature dimension. The projection can be defined by the function $f(\cdot)$ as:
$$Y_l = f(X_l), \tag{1}$$
$$Y_t = f(X_t). \tag{2}$$
Subsequently, we leverage a learnable adjacency matrix to encode feature relations by graph reasoning. The graph representations focus on local features, so we carry out graph propagation over the representations $Y_l$ and $Y_t$ to generate the evolved features $Y_l^n$ and $Y_t^n$ by graph convolution [49] as:
$$Y_l^n = \sigma\left(A_l^e Y_l^{n-1} W_l^e\right), \tag{3}$$
$$Y_t^n = \sigma\left(A_t^e Y_t^{n-1} W_t^e\right), \tag{4}$$
where $W_l^e \in \mathbb{R}^{d \times d}$ and $W_t^e \in \mathbb{R}^{d \times d}$ are trainable weight matrices, $\sigma$ is the ReLU function, and $n = 1, 2, 3$ represents the first, second, and third graph convolution, respectively. It should be noted that $Y_l^0 = Y_l$ and $Y_t^0 = Y_t$. The node adjacency weight matrices $A_l^e$ and $A_t^e$ are learnable matrices that capture the correlation between different nodes in the graph. They are randomly initialized and learned during training.
The evolved features $Y_l^n$ and $Y_t^n$ are fused through the inter-graph fusion, resulting in the new target graph feature. To sufficiently propagate global information and produce hierarchical features, graph convolution is applied several times, as shown in Fig. 1. In our implementation, three graph convolutions are utilized.
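The intra-graph propagation can be illustrated with a small stdlib-only sketch. It assumes that one propagation step has the usual graph convolution form ReLU(A · Y · W) following [49]; using a separate weight matrix per layer while sharing one adjacency matrix, and the helper names `matmul`, `relu`, `graph_conv` and `random_matrix`, are assumptions of this sketch rather than details confirmed by the text. No training is performed here; the random matrices only stand in for learnable parameters at initialization.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]

def graph_conv(y, adj, weight):
    """One propagation step of the assumed form ReLU(adj @ y @ weight)."""
    return relu(matmul(matmul(adj, y), weight))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
N, d = 4, 3                      # N vertices with d-dimensional features
y = random_matrix(N, d, rng)     # stands in for a projected graph representation
adj = random_matrix(N, N, rng)   # randomly initialized adjacency (learned in training)
weights = [random_matrix(d, d, rng) for _ in range(3)]

for w in weights:                # three stacked graph convolutions
    y = graph_conv(y, adj, w)

# Deterministic check with an identity adjacency and a 1x1 weight.
check = graph_conv([[1.0], [-1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0]])
```

The representation keeps its N × d shape after every step, which is what allows the evolved features to be fused and propagated again in the next iteration.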
Finally, the evolved features are utilized to boost the image representation. Similar to Eq. 1 and 2, the final graph representation is reprojected to image features. Residual connections [50] are used to further enhance the visual representation with the original feature map to obtain the enhanced feature. The implementation details are shown in Algorithm 1.

Algorithm 1: The workflow of computing enhanced features by the graph-based knowledge supplement network
Input: Convolution features $X_t$ and $X_l$.
Output: Enhanced features $X_e$.
1: Apply projection to get the high-level graph representations $Y_t$ and $Y_l$
2: for n = 1, 2, 3 do
3:   Get the evolved features $Y_l^n$ and $Y_t^n$ by graph convolution (Eq. 3 and 4)
4:   Fuse the evolved features through inter-graph fusion to replace the original $Y_t^n$: $Y_t^n = \mathrm{fusion}(Y_l^n, Y_t^n)$
5: end for
6: Add $Y_t^n$ to the original feature map $X_t$ to form the final enhanced features $X_e$

C. Inter-Graph Fusion
To effectively supplement the knowledge extracted from the labeled dataset to the target dataset, a fusion module is essential to distill relevant semantics attentively from one source graph to another target graph. The straightforward solution is to pose them as different branches and combine them directly. However, the underlying contextual information and feature correlations are ignored.
In this paper, we design a graph dependency fusion module to bridge the features of different datasets, as shown in Fig. 2.
Fig. 2. Illustration of inter-graph fusion.
Let $G_l = (V_l, E_l)$ denote the labeled graph and $G_t = (V_t, E_t)$ denote the target graph. Each graph is represented by a matrix $Y \in \mathbb{R}^{N \times d}$, where $N$ is the number of vertices in the graph, and $d$ is the feature dimension. After graph convolution, the evolved graph features $Y_t^n$ and $Y_l^n$ can be obtained from the graph representations $Y_t$ and $Y_l$, respectively. Subsequently, graph dependency fusion is used to fuse the evolved graph features, which can be formulated as: where FC represents the fully connected layer, and $\sigma$ is the ReLU function. Here $Y_i^n$ is the intermediate graph, which represents a transition from the labeled graph to the target graph. A direct connection between the labeled graph and the target graph may ignore or dilute feature correlations to some extent. Therefore, the intermediate graph is introduced to enhance the feature correlations, which can be calculated as: where the transfer matrix is defined according to the feature similarity between vertices of $Y_t$ and $Y_l$, and $N_t$ and $N_l$ represent the numbers of vertices of $Y_t$ and $Y_l$, respectively. The node adjacency weight $a_{i,j}$ can be calculated as: where $\cos(v_i, v_j)$ is the cosine similarity between $v_i$ and $v_j$, $v_i$ is the feature of the $i$th node of the corresponding graph, and $v_j$ is the feature of the $j$th node.
With the well-defined dependency matrix, the labeled graph knowledge and target graph features can be fused and propagated by graph convolution, as expressed in Eq. 3 and 4. Accordingly, the supplementary knowledge and extracted features can be associated and propagated via the inter-graph fusion, which helps the whole network generate enhanced features for change detection. After the enhanced features are acquired, they are fed into a classifier consisting of two fully connected layers. The first fully connected layer maps the features into a low-dimensional feature space, and the second maps them into the changed or unchanged class, so as to obtain the final change detection results.
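To make the role of the similarity-based transfer matrix more concrete, the following stdlib-only sketch builds an $N_t \times N_l$ matrix from cosine similarities between target and labeled vertices and uses it to aggregate labeled-graph features for each target vertex. The row-wise softmax normalization and the product form `A @ Y_l` for the intermediate graph are assumptions made purely for illustration and may differ from the exact formulation of the method; the function names are hypothetical as well.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0

def transfer_matrix(y_t, y_l):
    """N_t x N_l similarity matrix, softmax-normalized per row (assumed)."""
    rows = []
    for v_i in y_t:
        sims = [math.exp(cosine(v_i, v_j)) for v_j in y_l]
        total = sum(sims)
        rows.append([s / total for s in sims])
    return rows

def intermediate_graph(y_t, y_l):
    """Aggregate labeled-graph features per target vertex (assumed form A @ Y_l)."""
    a = transfer_matrix(y_t, y_l)
    d = len(y_l[0])
    return [
        [sum(a[i][j] * y_l[j][k] for j in range(len(y_l))) for k in range(d)]
        for i in range(len(y_t))
    ]

# Two target vertices and three labeled vertices with 2-D features.
y_t = [[1.0, 0.0], [0.0, 1.0]]
y_l = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
a = transfer_matrix(y_t, y_l)
y_i = intermediate_graph(y_t, y_l)
```

In this toy case the first target vertex weights the labeled vertex pointing in the same direction most heavily, so labeled features that resemble a target vertex contribute more to its supplementary knowledge. The two graphs may have different numbers of vertices, and the result always has one row per target vertex.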

D. Embedded Feature Enhancement Model
As shown in Fig. 3, the proposed model can be combined with other CNN-based models. Since it does not change the size of its input, the proposed model can be embedded after any convolutional layer. When the number of training samples is reduced to limit the influence of noisy samples, the remaining samples may not meet the needs of parameter optimization if only the features of the target dataset are used. With the help of intra-graph reasoning and inter-graph fusion, the proposed GKSNet can alleviate the problem of noisy samples and stabilize the parameter optimization during joint training. When GKSNet is combined with an existing CNN-based network, the extracted features are enhanced through intra-graph reasoning and inter-graph fusion.
Another merit of the proposed GKSNet is the capability of training on two datasets simultaneously in an end-to-end way. Benefiting from the use of intra-graph reasoning and inter-graph fusion, features from two datasets can be trained simultaneously, rather than fine-tuning on the target dataset after training on the labeled dataset, which is different from other transfer learning-based studies.

III. EXPERIMENTAL RESULTS AND DISCUSSIONS

A. Dataset Description and Evaluation Criteria
The proposed method is validated on five multitemporal SAR datasets. Since the ground truth data is essential for accuracy assessment, the ground truth change maps were carefully annotated by hand with expert knowledge. It should be noted that geometric corrections and coregistration have been conducted on these datasets.
The first dataset is the Rome dataset. As illustrated in Fig. 4, the images were captured over an area near Rome, Italy, by the European Remote Sensing (ERS-2) satellite SAR sensor and have a size of 256 × 256 pixels. The images were collected in April 2003 and June 2003, respectively. The spatial resolution of the dataset is 25 m × 25 m. The ground truth change map was annotated by experts with prior knowledge and photo interpretation, as shown in Fig. 4(c).
The second dataset is the Ottawa dataset. The images were captured by the Radarsat sensor in May 1997 and August 1997, respectively. As illustrated in Fig. 5, the dataset contains images captured over the city of Ottawa. The images were provided by the National Defense Research and Development Canada, and they show the changed information in areas affected by floods. The available ground truth image was generated by integrating prior knowledge and photo interpretation. The spatial resolution of the Ottawa dataset is 10 m, and the size is 290 × 350 pixels.
The third dataset is the Seoul dataset, which is shown in Fig. 6. The two images were captured around Seoul by the ERS-2 satellite in August 2002 and October 2002, respectively. One typical region of 256 × 256 pixels is chosen to demonstrate the efficacy of the proposed GKSNet. The spatial resolution of the Seoul dataset is 25 m, and the dataset reflects the change of the river level before and after precipitation.
The fourth dataset is the Florence dataset (Fig. 7), which was captured over the city of Florence, Italy, with 25 m spatial resolution. Some parts of the river changed between the two acquisitions. The two images were captured in July 2004 and September 2004, respectively, by the ERS-2 satellite SAR sensor. To better display the change information, an area with the size of 256 × 256 pixels is selected. The ground truth image is shown in Fig. 7(c).
The last dataset is the Bern dataset (Fig. 8), which was captured over an area near the city of Bern, Switzerland, in April and May 1999, respectively. The two images show the geomorphic changes after the river Aare flooded parts of the cities of Thun and Bern and the airport of Bern entirely. A section (301 × 301 pixels) of the two SAR images acquired by the ERS-2 satellite SAR sensor was chosen to verify the effectiveness of various change detection methods. The ground truth image is shown in Fig. 8(c).
Quantitative evaluation indices including false positives (FP), false negatives (FN), overall errors (OE), percentage correct classification (PCC), Kappa coefficient (KC) and F1 score (F1) are used to evaluate the proposed method. FP is the number of pixels which are unchanged in the ground truth image but falsely identified as changed in the change detection result. FN denotes the number of pixels which are changed in the ground truth but falsely identified as unchanged in the change detection result. TP is the number of pixels which are changed in the ground truth image and correctly classified as changed. $N_u$ represents the number of unchanged pixels in the ground truth image, and $N_c$ represents the number of changed pixels in the ground truth image. Then, the OE can be computed as OE = FP + FN. The PCC can be computed by:
$$\mathrm{PCC} = \frac{N_u + N_c - \mathrm{FP} - \mathrm{FN}}{N_u + N_c}. \tag{8}$$
KC can be formulated as:
$$\mathrm{KC} = \frac{\mathrm{PCC} - \mathrm{PRE}}{1 - \mathrm{PRE}}, \tag{9}$$
where
$$\mathrm{PRE} = \frac{(N_c - \mathrm{FN} + \mathrm{FP}) \cdot N_c + (N_u - \mathrm{FP} + \mathrm{FN}) \cdot N_u}{(N_u + N_c)^2}. \tag{10}$$
F1 can be formulated as:
$$\mathrm{F1} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}. \tag{11}$$
All our experiments are implemented on one NVIDIA GeForce 2080Ti GPU. The model is trained for 300 epochs. The learning rate is kept at 0.0001 for the first 100 epochs and is then decayed by a factor of 0.5 every 50 epochs.
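The evaluation indices can be computed from the four counts FP, FN, $N_u$ and $N_c$. The sketch below assumes the standard definitions of PCC, the Kappa coefficient (with the expected agreement PRE computed from the predicted and true class sizes) and the F1 score; the function name `change_detection_metrics` and the toy counts are hypothetical, and values are returned as fractions rather than percentages.

```python
def change_detection_metrics(fp, fn, n_u, n_c):
    """Return OE, PCC, KC and F1 from error counts and ground truth class sizes.

    fp, fn: false positives and false negatives of the change map.
    n_u, n_c: numbers of unchanged and changed pixels in the ground truth.
    """
    total = n_u + n_c
    tp = n_c - fn                      # changed pixels that were detected
    oe = fp + fn                       # overall errors
    pcc = (total - oe) / total         # fraction of correctly classified pixels
    # Expected agreement by chance (standard Kappa definition, assumed here).
    pre = ((n_c - fn + fp) * n_c + (n_u - fp + fn) * n_u) / total ** 2
    kc = (pcc - pre) / (1.0 - pre)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"OE": oe, "PCC": pcc, "KC": kc, "F1": f1}

# Toy counts: 1000 pixels, 100 of them changed, 10 false alarms, 20 misses.
m = change_detection_metrics(fp=10, fn=20, n_u=900, n_c=100)
```

For these toy counts PCC is 0.97 while KC is roughly 0.83, which illustrates why the Kappa coefficient is reported alongside PCC: when unchanged pixels dominate, PCC stays high even if a noticeable share of the changed pixels is missed.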

B. Number of Graph Convolution
In the proposed GKSNet, intra-graph reasoning is a critical component. To sufficiently propagate global information and extract hierarchical features, we need to analyze the number of graph convolutions. The first experiment tests the graph convolution number n in the GKSNet. The performance of change detection is evaluated by taking n = 1, 2, 3, 4, and 5. The corresponding PCC values are employed as the validation criterion. Fig. 9 shows the quantitative analysis result of the graph convolution number n in the proposed GKSNet. It can be observed that when n = 3, the best performance is achieved on the Ottawa, Seoul, Florence and Bern datasets. On the Rome dataset, the best results are achieved when n = 4. As the graph convolution number increases, the changed information can be better identified with more global information. However, in most cases, when n > 3, stacking more layers in the GKSNet leads to over-smoothing of the output features. This is because graph convolution, as a low-pass filter, tends to homogenize the features of different nodes when multiple layers of graph convolution are stacked. In addition, fewer training samples are more likely to cause such problems. Therefore, considering the computational efficiency and accuracy, n is set to 3 in our following experiments. A visual comparison of different n values on the Seoul dataset is illustrated in Fig. 10. It can be observed that when n = 3, the generated change map is the most similar to the ground truth. When n > 3, multiple graph convolutions lead to excessive dissemination of information, which results in the confusion of detailed features. The regions marked in the blue box in Fig. 10 demonstrate the improvements when n = 3.
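The over-smoothing effect mentioned above can be reproduced on a toy graph. The sketch below is a simplification and not the GKSNet layer itself: it uses a fixed, row-normalized adjacency matrix without learnable weights or nonlinearity (both assumptions of this example), and tracks how far apart the node features remain after each propagation step. The helper names `propagate` and `spread` are hypothetical.

```python
def propagate(adj, feats):
    """One linear propagation step with a row-normalized adjacency matrix."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([
            sum(a * f[k] for a, f in zip(row, feats)) / s
            for k in range(len(feats[0]))
        ])
    return out

def spread(feats):
    """Largest per-dimension gap between any two nodes."""
    return max(max(col) - min(col) for col in zip(*feats))

# Path graph on 4 nodes with self-loops (hypothetical toy example).
adj = [[1, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 1, 1, 1],
       [0, 0, 1, 1]]
feats = [[1.0], [0.0], [0.0], [-1.0]]

spreads = [spread(feats)]
for _ in range(5):
    feats = propagate(adj, feats)
    spreads.append(spread(feats))
```

The recorded spread shrinks after every step, meaning that node features become increasingly similar as more propagation steps are stacked. This is the low-pass behavior that motivates keeping the number of graph convolutions small.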

C. Number of Training Samples
Deep learning-based methods commonly require a large number of samples for parameter optimization. Therefore, the number of training samples is a critical parameter in the proposed model. In this paper, we intend to suppress noisy samples while ensuring robust features. Hence, unlike other methods [17][20] [52] which generally take about 10% of the total pixels in the dataset as training samples, the proposed GKSNet requires fewer training samples. Accordingly, the proposed GKSNet effectively reduces the impact of less reliable samples.
We selected 1%, 2%, 3%, 4% and 5% of the pixels as training samples. Fig. 11 shows the relationship between PCC values and training sample numbers on the five datasets. It can be observed that the PCC values reach a satisfactory level when only 3% of the pixels are used as training samples for most datasets. For the Rome and Bern datasets, the best result is achieved when the training sample ratio is 2%. After that, the PCC value tends to be stable when the number of training samples grows. Thus, we select 2% of the pixels as training samples on the Rome and Bern datasets, and 3% on the other datasets. These ratios not only yield good performance, but also help to achieve computational efficiency. As mentioned before, some existing methods generally use about 10% of the pixels as training samples, which is roughly 3 to 5 times more than the proposed GKSNet. This indicates that the proposed GKSNet is capable of exploiting the common knowledge and does not require a large number of training samples.

D. Analysis of the Patch Size
The size of the patch is an important parameter that controls the spatial contextual information contained in the input data. When the size is small, the spatial information contained in the data is insufficient, which leads to a lack of discrimination among the samples. When the size is large, additional interference information will inevitably be introduced, which affects the final result. Therefore, to verify the effect of different patch sizes on the final change detection result, relevant experiments are carried out in this subsection to select the optimal patch size. Let r denote the size of patches taken from the original image for feature extraction. We evaluate the change detection performance by taking r = 3, 5, 7, 9, 11, and 13.
As illustrated in Fig. 12, the PCC value is not satisfactory when r < 7, because small patches may not contain enough contextual information for discrimination. On the Florence and Bern datasets, the GKSNet achieves the best result when r = 7, while on the Rome, Ottawa, and Seoul datasets, the proposed method achieves the best performance when r = 9. This indicates that different image scenes require different receptive fields for feature extraction. Therefore, in our implementation, we set r = 7 on the Florence and Bern datasets and r = 9 on the Rome, Ottawa, and Seoul datasets.

E. Combinations of Labeled and Target Datasets
The proposed GKSNet aims to extract discriminative information from the labeled dataset, and then this information is employed to supplement knowledge to the target dataset. However, due to different data distributions, different combinations of labeled and target datasets will generate different results. From Table I, it can be observed that different combinations of the labeled and target datasets produce different change detection results. Since the image feature distribution varies among datasets, datasets with similar distributions commonly perform better in knowledge supplement. The Ottawa, Florence and Bern datasets reflect the surrounding areas of cities, while the Seoul and Rome datasets account for more natural landforms, which provides a basis for their complementary features. In addition, it can be seen that GKSNet still produces competitive results with fewer training samples, even if the combination is not the optimal one, which also shows that the proposed method can effectively transfer knowledge among datasets.

F. Experimental Results and Discussion
To validate the performance of the proposed GKSNet, we compare it with several state-of-the-art methods, including PCAKM [15], NR-ELM [51], GaborPCANet [20], LR-CNN [13], MLFN [34], DBN [17] and DCNet [52]. For PCAKM, the contextual information is analyzed by principal component analysis, and the extracted features are clustered by k-means algorithm. NR-ELM utilizes the neighborhood-based ratio operator to obtain reliable training samples. Then, ELM is employed to train a model by using these samples. GaborP-CANet is a simplified deep learning model which is comprised of several PCA layers and binary hashing layers. LR-CNN is formed by imposing a spatial constraint on the output layer of CNN. MLFN proposed a transferred multilevel fusion network, which trained on a large dataset to transfer deep knowledge from the data set to the limited training data. In DBN, a deep belief network is utilized for SAR image change detection task. DCNet establishes a very deep cascade network to exploit discriminative features and introduce a fusion mechanism to combine the output of different hierarchical layers to further alleviate the exploding gradient problem. For fair comparison, compared methods are implemented by using the default parameters. Both visual and quantitative analyses are made to reflect the results of various methods more intuitively and effectively. 1) Results on the Rome Dataset: From Fig. 13, we can see that the dataset contains complex spatial structures, especially there are some unchanged pixels within the changed area. Besides, the Rome dataset is also seriously interfered by speckle noise. Thus, it is a challenging task to identify changed pixels accurately in this dataset. From Table II, we can see that in addition to NR-ELM and the proposed GKSNet, other  2) Results on the Ottawa Dataset: Fig.14 illustrates the change detection result by different methods on the Ottawa dataset. 
The corresponding evaluation metrics are listed in Table III, where it can be observed that PCAKM, NR-ELM, GaborPCANet, LR-CNN, MLFN and DBN suffer from high FP values, resulting in some noisy areas as marked in red circles. Moreover, deep learning-based methods (MLFN, DBN, DCNet and the proposed GKSNet) achieve better performance than shallow models. Based on visual comparisons, the proposed GKSNet provides results closer to the ground truth in the marked area. Specifically, in the marked area, PCAKM, NR-ELM, GaborPCANet, LR-CNN and DBN generate extra changed regions. DCNet ignores some detailed information, and many changed pixels are missed. In contrast, the change detection result of the proposed GKSNet is the most similar to the ground truth change map. Table III shows that GKSNet yields the best PCC value of 98.37%, which is at least 0.17% higher than that of the other methods. These results suggest that the proposed GKSNet can exploit the structural similarity and underlying common knowledge from different datasets. The comparisons demonstrate the superior performance of the proposed method on the Ottawa dataset.
3) Results on the Seoul Dataset: Fig. 15 presents the change detection results on the Seoul dataset. The corresponding quantitative metrics are listed in Table IV. The results of PCAKM, NR-ELM and GaborPCANet miss many changed regions, so these methods suffer from very high FN values. For LR-CNN, MLFN, DBN and DCNet, the FP values are relatively high because some unchanged regions are classified as changed, which can be partly attributed to noisy samples. In contrast, GKSNet effectively avoids the influence of noisy samples and improves the experimental performance by introducing additional knowledge from other data. Moreover, it can be seen that deep learning-based methods (MLFN, DBN, DCNet and GKSNet) perform better than classical shallow models. Compared with DBN, the KC value of the proposed GKSNet increases by 2.25%. Compared with DCNet, the KC value of the proposed GKSNet increases by 0.68%. This demonstrates that the proposed GKSNet is suitable for the Seoul dataset. From the marked regions in Fig. 15, we can observe that the changed area in the results of PCAKM, NR-ELM, GaborPCANet and LR-CNN is clearly reduced, which is consistent with the evaluation metrics in Table IV. Besides, the results of MLFN, DBN and DCNet contain many extra changed pixels, which produces contours inconsistent with the ground-truth map. Among all methods, the result of GKSNet is the closest to the ground-truth map. This further shows that GKSNet can integrate the features of different datasets well, enhancing the feature representation and yielding satisfying change detection results.
4) Results on the Florence Dataset: The change detection results on the Florence dataset are shown in Fig. 16. It can be noticed that there are many false alarms in the results of PCAKM, NR-ELM, GaborPCANet and DBN, which is consistent with the higher FP values in Table V. LR-CNN and DCNet miss many changed pixels.
Among these methods, the proposed GKSNet has the best PCC, KC and F1 values and generates the best change map, which is very similar to the reference ground truth map. The comparison shows that the proposed GKSNet is powerful in discriminative feature extraction and is effective on the Florence dataset.

5) Results on the Bern Dataset:
The visualization of the change detection results on the Bern dataset is shown in Fig. 17, and the quantitative evaluation metrics are listed in Table VI. From Table VI, we can see that GKSNet has the smallest OE value, which means that it produces the result with the fewest misclassified pixels. Moreover, the satisfactory results achieved by the proposed method on this open source dataset demonstrate the effectiveness of GKSNet. The visualization results also confirm this conclusion. The generated change detection map is similar to the ground truth map, which indicates that the proposed method can effectively deal with the noise in pseudo-labels.
Based on the experiments on these real SAR datasets, the proposed GKSNet offers better performance than classical models. Besides, by incorporating the supplementary knowledge, the proposed GKSNet yields superior performance over other deep learning-based methods in most cases with fewer training samples. Moreover, the inter-graph fusion exploits the feature similarity and underlying common knowledge among different datasets, which further improves the change detection performance.
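The metrics quoted throughout this subsection (FP, FN, OE, PCC, KC and F1) can be computed from a binary change map and its ground truth. The following is a minimal sketch that assumes the definitions commonly used in the SAR change detection literature (OE = FP + FN, PCC as overall accuracy, KC as Cohen's kappa); the function name `change_detection_metrics` is hypothetical and is not taken from the released code.

```python
import numpy as np


def change_detection_metrics(pred, truth):
    """Compute FP, FN, OE, PCC, KC and F1 for binary change maps.

    Assumes nonzero pixels mean "changed" and zero pixels mean "unchanged".
    These are the commonly used definitions, not necessarily the exact
    formulas used by the authors.
    """
    pred = np.asarray(pred).astype(bool).ravel()
    truth = np.asarray(truth).astype(bool).ravel()
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must contain the same number of pixels")

    n = pred.size
    tp = int(np.sum(pred & truth))    # changed pixels detected as changed
    tn = int(np.sum(~pred & ~truth))  # unchanged pixels detected as unchanged
    fp = int(np.sum(pred & ~truth))   # unchanged pixels detected as changed
    fn = int(np.sum(~pred & truth))   # changed pixels that were missed

    oe = fp + fn                      # overall error
    pcc = (tp + tn) / n               # percentage of correct classification

    # Agreement expected by chance, used by the kappa coefficient.
    pre = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kc = (pcc - pre) / (1.0 - pre) if pre < 1.0 else 1.0

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0

    return {"FP": fp, "FN": fn, "OE": oe, "PCC": pcc, "KC": kc, "F1": f1}


if __name__ == "__main__":
    pred = [[1, 1, 0, 0], [0, 0, 0, 0]]
    truth = [[1, 0, 0, 0], [1, 0, 0, 0]]
    print(change_detection_metrics(pred, truth))
```

PCC and KC are returned as fractions here, so they need to be multiplied by 100 to be compared with percentage values such as those in the tables.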

G. Ablation Studies
The proposed GKSNet is employed to obtain enhanced features on top of a simple but complete network, called the basic network. The basic network, which is used to extract the initial features, is described in Table VII. To further discuss and validate the effectiveness of the two components of GKSNet, we conducted ablation studies on the five SAR datasets. We compare four variants: 1) Basic Network denotes the backbone, which does not absorb the knowledge supplement provided by additional datasets, but is trained and tested on a single dataset. 2) w/o Inter-Graph Fusion refers to the GKSNet that uses intra-graph convolution to generate the evolved features but combines the graph convolutional features directly, instead of using inter-graph fusion. 3) Gaussian Kernel refers to using a Gaussian kernel to measure the similarity between data, instead of cosine similarity. 4) GKSNet denotes the complete model. As reported in Table VIII, by incorporating the supplementary knowledge distilled from other datasets, the results improve by approximately 0.5% compared with the basic network. The first row shows that, without knowledge supplement, a small number of training samples is insufficient for parameter optimization. Furthermore, the proposed GKSNet provides an inter-graph fusion mechanism to combine the features from different datasets. By employing the inter-graph fusion, the PCC values increase by at least 0.18%, which shows that inter-graph fusion improves change detection performance. This is reasonable, since direct fusion of features from different datasets cannot fully explore their correlations; the inter-graph fusion instead learns proper feature dependencies and knowledge integration among different datasets. Besides, in order to find a more appropriate dependency for inter-graph fusion, we use a Gaussian kernel instead of cosine similarity to explore the impact of different similarity measures on the experimental results.
As can be seen from Table VIII, the Gaussian kernel also achieves relatively satisfying performance, but the cosine similarity performs better. Finally, by comparing the results of the basic network with Table I, it can be observed that even if the data distribution of the labeled dataset is not very similar to that of the target dataset, the knowledge supplement still improves the change detection performance.
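To make the two similarity measures compared above concrete, the following sketch computes a cosine-similarity affinity and a Gaussian-kernel affinity between node features of a labeled (source) graph and a target graph, and then uses a row-wise softmax over the affinity to aggregate source features for each target node. This is only an illustration under stated assumptions: the function names, the bandwidth `sigma`, and the softmax aggregation are hypothetical choices, not the authors' implementation of the graph transfer module.

```python
import numpy as np


def cosine_affinity(src, tgt, eps=1e-12):
    """Cosine similarity between every target node and every source node.

    src: (n_src, d) source node features; tgt: (n_tgt, d) target node features.
    Returns an (n_tgt, n_src) matrix with values in [-1, 1].
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    src_n = src / np.maximum(np.linalg.norm(src, axis=1, keepdims=True), eps)
    tgt_n = tgt / np.maximum(np.linalg.norm(tgt, axis=1, keepdims=True), eps)
    return tgt_n @ src_n.T


def gaussian_affinity(src, tgt, sigma=1.0):
    """Gaussian (RBF) kernel affinity; sigma is an assumed bandwidth."""
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    sq_dist = (
        np.sum(tgt ** 2, axis=1)[:, None]
        + np.sum(src ** 2, axis=1)[None, :]
        - 2.0 * (tgt @ src.T)
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def transfer_features(src, affinity):
    """Aggregate source features per target node with softmax weights."""
    src = np.asarray(src, dtype=float)
    shifted = affinity - affinity.max(axis=1, keepdims=True)  # stable softmax
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ src


if __name__ == "__main__":
    source = np.array([[1.0, 0.0], [0.0, 1.0]])
    target = np.array([[2.0, 0.0]])
    print(cosine_affinity(source, target))
    print(gaussian_affinity(source, target))
    print(transfer_features(source, cosine_affinity(source, target)))
```

One difference between the two measures is visible in the sketch: cosine similarity ignores feature magnitude, while the Gaussian kernel depends on Euclidean distance and on the choice of `sigma`.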
To show the changes brought by GKSNet more intuitively, the results of Basic Network, w/o Inter-Graph Fusion, Gaussian Kernel and GKSNet are illustrated in Fig. 18. Some obvious differences are marked with red circles. In the case of the Rome dataset, we can see that the result of the basic network contains many noisy regions, since the features are not sufficient for parameter optimization. By contrast, the w/o Inter-Graph Fusion variant enriches the training features by introducing additional supplementary knowledge, which helps determine the change information of some complex areas and thus yields better results. When the Gaussian kernel is introduced, it also achieves good performance. Moreover, the proposed GKSNet not only adds additional knowledge, but also combines the supplementary knowledge with the extracted features. This improves the feature representation and yields better change detection results. Fig. 19 visualizes the features generated by the four variants. Samples of the same class are distributed more closely after intra-graph reasoning and inter-graph fusion, which indicates that the proposed GKSNet achieves the best performance.

H. Runtime Comparisons
Time consumption is one of the important factors that restrict the application of deep learning-based methods in change detection. Table IX compares the runtime of the proposed GKSNet with that of the other methods. We can see that traditional methods cost less time because of their relatively simple models. DBN and DCNet have complex models and require many training samples for parameter optimization, so their time consumption is greater than that of the other methods. However, compared with other deep learning-based methods, the proposed GKSNet is superior in computational time. The reason is that GKSNet uses only 20 to 30 percent of the training samples required by the other methods, which greatly reduces the training time. It should be noted that, although the construction of the graph network and the propagation over it commonly take about 20 s, the proposed GKSNet remains efficient.
IV. CONCLUSIONS
In this paper, we improve SAR image change detection by alleviating the effect of noisy samples and utilizing the common knowledge hidden in other datasets. To this end, we proposed a graph-based knowledge supplement network, termed GKSNet. On the one hand, image features from a labeled dataset are projected into a graph. After message propagation via graph convolution, the obtained features are employed as additional knowledge for the target dataset. On the other hand, a graph transfer module is proposed to distill related contextual information attentively from the labeled dataset to the target dataset as supplementary knowledge. The promising experimental results on five real SAR datasets verify the effectiveness of the proposed GKSNet in enhancing feature extraction. In the future, attempts will be made to enhance the interpretability of dependencies between multitemporal images, and we will try to reduce the computational burden caused by the graph convolution.