Multitarget Domain Adaptation for Remote Sensing Classiﬁcation Using Graph Neural Network

Abstract—Remote sensing deals with huge variations in geography, acquisition season, and a plethora of sensors. Considering the difficulty of collecting labeled data uniformly representing all scenarios, data-hungry deep learning models are often trained with labeled data from a source domain that is limited in the above-mentioned aspects. Domain adaptation (DA) methods can adapt such a model for application to target domains whose distributions differ from that of the source domain. However, most remote sensing DA methods are designed for a single target, thus requiring a separate classifier to be trained for each target domain. To mitigate this, we propose multitarget DA, in which a single classifier is learned for multiple unlabeled target domains. To build a multitarget classifier, it may be beneficial to effectively aggregate features from the labeled source and the different unlabeled target domains. Toward this, we exploit coteaching based on a graph neural network that is capable of leveraging unlabeled data. We use a sequential adaptation strategy that first adapts to the easier target domains, assuming that the network finds it easier to adapt to the closest target domain. We validate the proposed method on two different datasets, representing geographical and seasonal variation. Code is available at https://gitlab.lrz.de/ai4eo/da-multitarget-gnn/.


Sudipan Saha, Member, IEEE, Shan Zhao, and Xiao Xiang Zhu, Fellow, IEEE

Sudipan Saha and Shan Zhao are with the Department of Aerospace and Geodesy, Data Science in Earth Observation, Technical University of Munich, 85521 Ottobrunn, Germany (e-mail: sudipan.saha@tum.de; shan.zhao@tum.de).

Index Terms—Coteaching, domain adaptation (DA), graph neural network (GNN), multimodal learning, multitarget adaptation.
Xiao Xiang Zhu is with the Department of Aerospace and Geodesy, Data Science in Earth Observation, Technical University of Munich, 85521 Ottobrunn, Germany, and also with the Remote Sensing Technology Institute (IMF), German Aerospace Center (DLR), 82234 Weßling, Germany (e-mail: xiaoxiang.zhu@dlr.de).
Digital Object Identifier 10.1109/LGRS.2022.3149950

I. INTRODUCTION

Most deep learning-based methods assume that the training data and test data are drawn from the same distribution. However, such an assumption often does not hold in remote sensing, where differences are induced by geographic variation, differences in acquisition season, and sensor. There are works related to domain adaptation (DA) [1] that try to align the target distribution with the source distribution. Most DA methods adapt a single unlabeled target from a single labeled source domain. Such methods include those based on generative modeling [2], adversarial training [3], and statistical alignment [4], [5]. Such models are not suitable for practical settings in remote sensing, as we may come across many target domains and a separate model needs to be trained for each. For example, if the training data consist of images corresponding to a city, every other city can be considered a different domain. Recently, some works in the computer vision literature have addressed this issue by designing methods to adapt to multiple target domains simultaneously from a single source domain [6], a setting called multitarget domain adaptation (MTDA).
In the multitarget setting, it is important to learn a classifier that generalizes across multiple target domains. Given the intrinsic nature of this task, we argue that learning robust features in a unified space may be beneficial. Thus, feature aggregation can be a suitable direction for multitarget adaptation. Toward this, graph neural networks (GNNs) have been found effective [7]. This motivated us to design a GNN-based incremental approach for MTDA in the context of remote sensing image classification. A GNN with episodic training is integrated to mitigate the underlying domain shifts, and adversarial learning is further adopted to close the gap between the source and target distributions. Inspired by coteaching [8], which can exploit noisy labels, we design a dual-head classifier network that consists of a base feature extractor followed by two classifier heads, a multilayer perceptron (MLP) head and a GNN head. The network is first trained with the labeled source samples. Following this, one target domain at a time is processed to further train the network, ordered by level of difficulty. Training is accomplished with mini-batches comprising samples from both the source domain and the target domains. The MLP-based classifier focuses on individual samples, whereas the GNN-based classifier aggregates features from different samples in the mini-batch. Similar to coteaching [8], they help each other in an iterative manner to learn a more effective target classifier. An incremental training scheme aligns conditional distributions across domains by gradually obtaining pseudolabeled target data. Class-specific representations are further learned by aggregating the source and target features and passing them through deep GNNs.
The proposed MLP-GNN classifier is seamlessly equipped with a domain discriminator, which further closes the gap between the source and target distributions.
The contributions of this work are as follows.
1) We propose a GNN-based method for MTDA that starts by learning a classifier on the source domain and incrementally updates it on the target domains.
2) We introduce coteaching [7], [8] in the context of remote sensing DA.
3) We experiment on two multitarget scenarios, one with geographic variation (multiple cities) and the other with seasonal variation (multiple seasons). The datasets are derived from the LCZ42 dataset [9] and the Sen12-MS dataset [10], respectively. Our experiments indicate the efficacy of the proposed method.

We present the related works in Section II and the proposed method in Section III. Experimental results are presented in Section IV. Finally, the letter is concluded in Section V.
II. RELATED WORK

MTDA aims to learn a robust predictor for all target domains, given one labeled source dataset and multiple unlabeled target datasets that differ in data distribution [6], [17]. Few single-target DA methods can be applied in the multitarget setting. Multiteacher MTDA (MT-MTDA), proposed by Nguyen-Meidine et al. [6], uses knowledge distillation to iteratively distill domain knowledge from multiple target domains to a common classifier.
Adapting the source classifier to each target domain sequentially for multitarget adaptation is related to incremental learning. Incremental learning [19] gradually extends an existing model's capacity by digesting sequentially available data for upcoming new tasks. GNNs have an advantage over other neural networks in that they can capture the interaction between data. Nodes of a graph can represent objects/images, and the relationships between nodes are encoded in the edges. GNNs can leverage unlabeled data to improve the performance of supervised learning through label propagation and message passing [7], [20]. Many DA problems have also been addressed by hierarchical [21], active learning [22], and semisupervised [23] methods.

Coteaching is a deep learning paradigm, first introduced in [8], that simultaneously trains two deep neural networks and lets them teach each other.

III. PROPOSED METHOD
Let us assume we have a labeled source domain dataset $S$ containing $n_s$ labeled samples $\{(x_{s,i}, y_{s,i})\}_{i=1}^{n_s}$. Our goal is to exploit this dataset to train a classifier for $N$ target datasets $T = \{T_j\}_{j=1}^{N}$, each having $n_{t_j}$ unlabeled samples $\{x_{t_j,k}\}_{k=1}^{n_{t_j}}$. We assume that the source domain has sufficient samples to train an initial model (Section III-B). Furthermore, we do not assume any labeled samples from the target domains; however, we assume that the target domains share the same label space as the source domain, consisting of $n_c$ classes.

A. Model Key Components
Given an input image $x$, a feature extractor base network $F$ is used to extract features $f = F(x)$ from the input image. The extracted features are fed simultaneously to an MLP classifier $G_{mlp}$ and a GNN-based network $G_{gnn}$. $G_{gnn}$ consists of an edge network $f_{edge}$ and a node classifier $f_{node}$. In addition to $G_{mlp}$ and $G_{gnn}$, a domain discriminator network $D$ is used. The weights of $F$, $D$, $G_{mlp}$, $f_{edge}$, and $f_{node}$ are denoted $\theta$, $\psi$, $\phi$, $\varphi$, and $\varphi'$, respectively.
We use ResNet-18 as the feature extractor; however, any other suitable model can be used. $G_{mlp}$ consists of a fully connected (FC) layer mapping the 256-dimensional output of $F$ to an $n_c$-dimensional output. $f_{edge}$ and $f_{node}$ consist of a series of 1 × 1 convolutional layers (similar to [20]). The key network components are outlined in Table I.
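To make the roles of the two heads concrete, the following minimal NumPy sketch contrasts a per-sample MLP head with a batch-aggregating GNN-style head. All dimensions, the random weights, and the dot-product affinity are illustrative stand-ins, not the architecture from Table I.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

B, d, n_c = 32, 256, 6               # batch size, feature dim, classes (illustrative)
f = rng.normal(size=(B, d))          # stand-in for features from the extractor F

# MLP head: one FC layer, each sample classified independently of the batch
W_mlp = 0.01 * rng.normal(size=(d, n_c))
y_mlp = softmax(f @ W_mlp)

# GNN-style head: an affinity matrix over the mini-batch (stand-in for f_edge)
# mixes features across samples before classification (stand-in for f_node)
A_hat = softmax(f @ f.T / np.sqrt(d), axis=1)
f_agg = A_hat @ f                    # message passing along the edges
W_node = 0.01 * rng.normal(size=(d, n_c))
y_gnn = softmax(f_agg @ W_node)
```

The difference is visible in the data flow: `y_mlp[i]` depends only on `f[i]`, while `y_gnn[i]` depends on every sample in the mini-batch through `A_hat`.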

B. Pretraining on Source Dataset
The model is pretrained for $K_{source}$ iterations using only the labeled source samples $\{(x_{s,i}, y_{s,i})\}_{i=1}^{n_s}$. The cross-entropy loss $\ell_{ce}^{mlp}$ computed on the source samples is used to train $F$ and $G_{mlp}$, thus updating $\theta$ and $\phi$, respectively.
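As a sketch of the pretraining objective, the following computes the cross-entropy loss on a toy source mini-batch; the probabilities and labels are invented for illustration.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class, as used to pretrain
    F and G_mlp on the labeled source samples."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    return float(-np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean())

# toy source mini-batch: softmax outputs and true integer labels
probs = [[0.9, 0.05, 0.05],
         [0.1, 0.80, 0.10]]
loss = cross_entropy(probs, [0, 1])   # low loss: predictions match the labels
```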

C. Adaptation on Targets and Incremental Learning
The model's generalization ability to the target domains is progressively improved by incorporating more target samples at each step. An incremental learning approach is adopted by processing one target domain at a time, in order of difficulty. The easier target domains are processed first, followed by the harder ones, to avoid potential negative transfer. The difficulty of a target domain can be measured by its dissimilarity to the source domain. In our case, the entropy $H(T_j)$ of the predictions returned by the source-trained model on domain $T_j$ is used to determine the level of difficulty of that target domain [7]. $H(T_j)$ is computed as the mean cross-entropy over all samples belonging to domain $T_j$.
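A minimal sketch of this ordering step, assuming the source-trained model's softmax outputs on each target domain are already available; the toy prediction arrays are invented for illustration.

```python
import numpy as np

def mean_entropy(probs):
    """H(T_j): mean entropy of the source-trained model's predictions on T_j."""
    probs = np.asarray(probs, dtype=float)
    return float(-(probs * np.log(probs + 1e-12)).sum(axis=1).mean())

# toy softmax outputs on two hypothetical target domains
conf = np.array([[0.90, 0.05, 0.05]] * 10)    # confident -> low entropy -> easy
unsure = np.array([[0.40, 0.30, 0.30]] * 10)  # uncertain -> high entropy -> hard

domains = {"T1": unsure, "T2": conf}
# adapt to the easier (lower-entropy) domains first
order = sorted(domains, key=lambda name: mean_entropy(domains[name]))
```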
The set of labeled samples is denoted by $\hat{S}_{total}$ and initially consists of only the samples from the source domain dataset $S$.
Once the target domain (out of the $N$ domains) to be processed is fixed, adaptation is performed for $K$ iterations on that domain. In each iteration, $B_{n_s}$ source samples and $B_{n_t}$ target samples are drawn to form a mini-batch. Each mini-batch of images is fed to the feature extractor $F$ to obtain the corresponding features, which are then fed to $G_{mlp}$ and $G_{gnn}$. $G_{mlp}$ does not aggregate features from different samples; rather, it predicts based only on the sample of interest. On the other hand, the GNN-based classifier $G_{gnn}$ has a more sophisticated structure, with an edge network $f_{edge}$ and a node classifier $f_{node}$, which aggregates the features of the samples in the batch. The edge network $f_{edge}$ encodes the relationships between nodes and allows messages to pass along the edges, thus efficiently aggregating the information carried by the nodes. The node classifier $f_{node}$ provides more robust predictions by taking advantage of this context-aware learning paradigm. Instead of relying merely on the current sample $x$, $G_{gnn}$ is capable of making a prediction on a global scale based on the entire mini-batch. The GNN and the MLP capture different aspects of the information, and naively using either of them alone may lead to noisy features or unreliable predictions. This motivated a cooperation between the two classifiers to improve each other. The predictions of the MLP head and the GNN head are defined as

$$\hat{y} \leftarrow \mathrm{softmax}\big(G_{mlp}(F(x))\big) \quad (1)$$

$$\bar{y} \leftarrow \mathrm{softmax}\big(G_{gnn}(F(x))\big). \quad (2)$$

The cross-entropy losses $\ell_{ce}^{mlp}$ and $\ell_{ce}^{node}$ are minimized over all source samples to train the $G_{mlp}$ classifier and the $f_{node}$ of $G_{gnn}$.

1) Coteaching and Pseudolabeling of the MLP and GNN: Following the concept of coteaching [8], the MLP and the GNN are trained together to provide feedback to each other.
The first information flow is from the MLP to the GNN. The aim of $f_{edge}$ is to build an affinity matrix $\hat{A}$ that encodes the relationships between nodes, i.e., samples in a mini-batch. A binary representation is a simple but effective choice, where 1 indicates that the $i$th and $j$th samples share the same class label and 0 otherwise. Due to the lack of labels for the target samples, $\hat{A}$ would be a rather sparse matrix carrying little information. To solve this, $G_{mlp}$ provides pseudolabels for the unlabeled target samples, from which a target matrix $\hat{A}_{tar}$ is formed in the same way. $\hat{A}$ is learned by minimizing the binary cross-entropy edge loss $\ell_{bce}^{edge}$ between the elements of the current affinity matrix produced by $f_{edge}$ and the elements of the target matrix $\hat{A}_{tar}$ given by $G_{mlp}$. In this way, $G_{gnn}$ is able to encode the pairwise similarity between every two nodes by learning $\hat{A}$ under the guidance of $\hat{A}_{tar}$.
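The binary target matrix and the edge loss can be sketched as follows; the toy label vector mixes hypothetical source labels and MLP pseudolabels, and is invented for illustration.

```python
import numpy as np

def target_affinity(labels):
    """A_tar[i, j] = 1 if samples i and j carry the same (pseudo)label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def edge_bce(a_hat, a_tar, eps=1e-12):
    """Binary cross-entropy edge loss between the affinity matrix produced by
    f_edge and the target matrix built from labels and MLP pseudolabels."""
    return float(-(a_tar * np.log(a_hat + eps)
                   + (1 - a_tar) * np.log(1 - a_hat + eps)).mean())

# toy mini-batch: two source labels followed by two MLP pseudolabels
A_tar = target_affinity([0, 1, 0, 2])   # samples 0 and 2 share class 0
```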
The other information flow is from the GNN to the MLP. Generally, confident predictions produce a higher softmax value for one class in comparison to the other classes. Based on this, $f_{node}$ computes the score of each sample in the target domain as the maximum of its softmax values

$$w_j \leftarrow \max(\bar{y}_j). \quad (3)$$

Fig. 1. Proposed multitarget approach with source domain $S$ and assuming three target domains $T_1$, $T_2$, $T_3$ (in that order of closeness to the source domain).
A higher value of $w_j$ indicates that the pseudolabel is more reliable and can be accepted as an element of the source set. Thus, $w_j$ is compared to a threshold $\tau$, and if it is higher, the target sample, together with its pseudolabel, is regarded as an additional source sample, and the set of labeled samples $\hat{S}_{total}$ is updated. By doing this, $G_{gnn}$ creates a series of context-aware pseudosamples after processing a target domain, thus increasing the ability of $G_{mlp}$ to learn the subsequent target domains. Subsequently, target adaptation is performed on the next target domain, using the updated $\hat{S}_{total}$. Considering that the next target is more different from the source, and thus less likely to contain samples that can be included in the pseudosource, $\tau$ is increased by $\Delta\tau$.
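The confidence-based acceptance step can be sketched as follows; the probabilities, the threshold value 0.8, and the increment 0.05 are illustrative, not the values used in the letter.

```python
import numpy as np

def accept_pseudolabels(probs, tau):
    """Keep target samples whose confidence w_j = max softmax value exceeds tau.

    Returns the indices of accepted samples and their pseudolabels, which are
    then added to the pseudosource set S_total.
    """
    probs = np.asarray(probs, dtype=float)
    w = probs.max(axis=1)                  # w_j, one score per target sample
    keep = w > tau
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

# toy GNN-head softmax outputs for two target samples
probs = np.array([[0.95, 0.03, 0.02],      # confident -> accepted
                  [0.50, 0.30, 0.20]])     # uncertain -> rejected
idx, labels = accept_pseudolabels(probs, tau=0.8)

# before moving to the next (harder) target, the threshold is raised
tau = 0.8 + 0.05                           # illustrative delta-tau
```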
2) Domain Discriminator D: To further close the gap between the source domain and the target domains, a domain discriminator $D$ that predicts the domain of each sample is trained using an adversarial loss $\ell_{adv}$, following [11]. The prediction of $D$ is obtained as $d \leftarrow \mathrm{sigmoid}(D(F(x)))$.
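A minimal sketch of the discriminator loss, with source samples labeled 1 and target samples 0; the logits here stand in for $D(F(x))$ and are invented for illustration. The gradient reversal that lets $F$ fool $D$ is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_loss(logits, is_source, eps=1e-12):
    """Binary cross-entropy of the domain discriminator D: d = sigmoid(D(F(x))),
    trained toward 1 on source samples and 0 on target samples."""
    d = sigmoid(np.asarray(logits, dtype=float))
    y = np.asarray(is_source, dtype=float)
    return float(-(y * np.log(d + eps) + (1 - y) * np.log(1 - d + eps)).mean())

logits = np.array([2.0, -2.0])         # hypothetical D(F(x)) outputs
is_source = np.array([True, False])    # first sample source, second target
loss = adv_loss(logits, is_source)     # small: D separates the domains well
```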
Fig. 1 shows the proposed multitarget approach assuming three target domains. Furthermore, the training procedure is detailed in Algorithm 1.

D. Using Trained Model for Inference
Once the proposed MLP-GNN dual-head model has been trained on all target domains, the GNN head is used during inference to determine the class of a target test sample. Unlike [7], we do not use any final fine-tuning step.

IV. EXPERIMENTAL RESULTS

A. Datasets
We conducted experiments on the following datasets.

B. Evaluation Protocol and Settings
We use classification accuracy to evaluate the performance. The performance for a given source is obtained by setting the remaining domains as target domains. We use the ResNet-18 network as the feature extractor. We show quantitative comparisons to conditional adversarial domain adaptation (CDAN) [11], batch-instance normalization (BIN) [13], coteaching [7], and MT-MTDA [6]. While CDAN is based on adversarial training, BIN is based on statistical alignment; together, they cover both major approaches in DA, as discussed in Section II. CDAN is modified to exploit information from all target domains by combining them while adapting. Adaptation in BIN is batch-specific, and thus it takes into account one target domain at a time. The other two approaches are designed for MTDA, so no such modifications are required.

1) Multicity:
Table II shows the quantitative results obtained using Sydney as the labeled source domain and Moscow and Mumbai as the unlabeled target domains. A model trained only on the source domain (Sydney) obtains an average accuracy of 40.64% on the two target cities. While adversarial training-based CDAN significantly improves the result (47.07%), statistical alignment-based BIN does not. This shows that mere statistical alignment is not sufficient for mitigating distribution gaps between multiple cities. The proposed method outperforms all compared paradigms: it obtains an accuracy of 61.64% for Moscow and 45.18% for Mumbai, and thus an average accuracy of 53.42% over the two target cities. We also observe a reduction in Kullback-Leibler (KL) divergence after adaptation (Table III).
Table IV shows the quantitative results obtained using Mumbai as the source domain. The source-trained model obtains an average accuracy of 42.07%. The proposed method obtains an average accuracy of 68.37% and outperforms the source-trained model and all compared paradigms. The proposed MTDA improves classification accuracy on the targets by more than 26% over the source-trained model.
2) Multiseason: Average accuracies, considering each season as source and the other three seasons as targets, are shown in Table V. For the spring, summer, and winter seasons (as source), the proposed method outperforms the source-trained model and both CDAN and BIN. However, for spring and fall as source, the proposed method is slightly outperformed by coteaching and CDAN, respectively. The gains of the proposed method over the source-trained models (approximately 5%, 8%, and 15% for spring, summer, and winter, respectively) are smaller than on the multicity dataset. A visualization of the season variation is shown in Fig. 2.

V. CONCLUSION
We proposed a GNN-based method for remote sensing MTDA. The proposed method incrementally adapts a source-trained classifier to multiple targets. We validated the proposed method on two different types of domain shift, namely geographic shift and seasonal shift. Experimental results clearly indicate the potential of the proposed method. MTDA is a comparatively new area in remote sensing, and there is still scope for improving this paradigm. In the future, we plan to devise a method to identify irrelevant source samples whose removal may potentially benefit the adaptation process. Furthermore, we plan to extend the method to multisource multitarget adaptation and mixed target domain settings.
Manuscript received November 16, 2021; revised January 7, 2022; accepted January 16, 2022. Date of publication February 7, 2022; date of current version March 1, 2022. This work was supported in part by the German Federal Ministry of Education and Research (BMBF) in the framework of the international future AI Laboratory "AI4EO-Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond" under Grant 01DD20001; in part by the German Federal Ministry of Economics and Technology in the framework of the "National Center of Excellence ML4Earth" under Grant 50EE2201C; in part by the European Research Council (ERC) through the European Union's Horizon 2020 Research and Innovation Programme under Grant ERC-2016-StG-714087, Acronym: So2Sat; in part by the Helmholtz Association through the framework of Helmholtz AI under Grant ZT-I-PF-5-01, Local Unit "Munich Unit at Aeronautics, Space and Transport (MASTr)"; and in part by the Helmholtz Excellent Professorship "Data Science in Earth Observation-Big Data Fusion for Urban Research" under Grant W2-W3-100. (Corresponding author: Xiao Xiang Zhu.)

TABLE I
NETWORK ARCHITECTURE ASSUMING BATCH SIZE B. CONV(a, b, c) DENOTES A CONVOLUTIONAL FILTER WITH a INPUT FEATURES, b OUTPUT FEATURES, AND KERNEL SIZE c × c

Algorithm 1: Training Procedure of the Proposed MTDA Method
require: source dataset $S$, number of classes $n_c$
require: target datasets $T = \{T_j\}_{j=1}^{N}$
require: networks $F$, $D$, $G_{mlp}$, $f_{edge}$, $f_{node}$ with parameters $\theta$, $\psi$, $\phi$, $\varphi$, $\varphi'$, respectively. The $f_{edge}$ and $f_{node}$ form $G_{gnn}$.
require: hyperparameters $B$, $\tau$, $\Delta\tau$, $K_{source}$, $K$. We use $B = 32$, $K_{source} = 1000$, and $K = 5000$.
1) Multicity dataset containing three domains (Moscow, Mumbai, and Sydney) consisting of six classes each. There are approximately 800 images per class per city, sampled from the LCZ42 dataset [9] (Sentinel-2 images with 10 m/pixel resolution).

TABLE V
MULTISEASON: AVERAGE PERFORMANCE FOR EACH SEASON IS SHOWN, TAKING THE CONSIDERED SEASON AS SOURCE AND ALL OTHER SEASONS AS TARGETS

Fig. 2. Visualization of season variation on class cropland. From left to right: spring, summer, fall, and winter.