Data Augmentation and Few-Shot Change Detection in Forest Remote Sensing

Forest remote sensing change detection provides important technical support for forest management decisions and for analyzing forest disturbance factors. However, the scarcity of data in this specialized field makes it difficult to improve detection accuracy. In this study, a forest remote sensing change detection model for the few-shot learning setting is proposed. The model implements end-to-end change detection for forest scenes from two perspectives: data augmentation and an improved few-shot algorithm. First, forest fragment images are generated jointly by a feature extraction network and a generative adversarial network. Then, the forest fragments are blended into the original change detection dataset by Poisson blending to achieve effective augmentation. Furthermore, the end-to-end change detection network is updated using few-shot learning. In addition, a metalearning module is added to the slow feature analysis algorithm based on the multiattention mechanism to improve detection. The proposed model improves the F1 score from 86% to 91% on the two datasets. Moreover, it increases the F1 score by 6.52% on average.

prosperity [5], etc. Remote sensing change detection in forest scenes can help analyze deforestation and disasters in a given area over a given period and therefore guide forest management decisions in that area [6]. Because changes in forest scenes are slow and unfold over long periods, automated change detection methods for forest remote sensing are highly needed.
Deep learning-based approaches have greatly improved the efficiency of change detection [7], with significant advantages in automation and accuracy. Deep learning networks have powerful feature extraction and classification capabilities [8] and are more sensitive to the features of changed pixels. In recent years, deep learning-based change detection algorithms have attracted much attention [9], breaking through the bottleneck of traditional remote sensing techniques and making great progress [10]. Generally, accurate deep learning detection models require a large amount of effective data. The core idea of deep learning is to learn features from a large amount of data, summarize the underlying patterns used for prediction, and thereby build a more complete prediction model.
However, many real-world scenes lack effective training samples, and change detection in forest scenes is one of them. Available forest remote sensing datasets are limited. Remote sensing images are expensive to acquire, and labeling forest information requires domain expertise, which is difficult for nonprofessionals. In addition, unless affected by man-made disasters, forest change is a very long cyclical process that is vulnerable to seasonal influence, so images containing effective forest change information are difficult to obtain in a short time. This poses great difficulties for automated forest change detection. With current technical means, the few-sample problem can be studied from two aspects: data augmentation and few-shot learning algorithms.
Data augmentation [11] expands the training dataset by increasing the number of image samples, which reduces overfitting when training deep learning models and thus yields a network with stronger generalization ability. Goodfellow et al. [12] suggested that acquiring more valid data is more important than improving the algorithm. Therefore, it is important to study forest image data augmentation at the data level.
On the other hand, improved few-shot learning algorithms can solve machine learning problems with limited data [13]. Moreover, deep learning-based models that learn from a small number of samples are of great potential interest in numerous remote sensing applications that lack data. Few-shot learning algorithms not only save the cost of acquiring data but also narrow the gap between artificial and human intelligence by making change detection possible for special scenarios, which is a topic that must be tackled in the development of deep learning [14].
Most current data augmentation algorithms are based on transformations of the image itself, such as cropping, rotating, flipping, stitching, or adding noise to expand the samples [15]. However, for forest scenes with little change information, such methods cannot fundamentally increase the effective information about forest changes. Therefore, from the perspective of enhancing effective change information and balancing the ratio of positive and negative samples in the image, adding effective forest change samples to the original dataset is the key to solving the lack of forest change detection data. Few-shot learning can be understood as letting the model learn to learn, which is also a manifestation of metalearning. Many scholars currently conduct research under the N-way K-shot benchmark [16]. However, for the forest remote sensing change detection problem, few researchers have combined data augmentation and few-shot algorithm improvement. Therefore, we investigate both aspects in depth and construct a forest remote sensing change detection model based on data augmentation and a few-shot learning algorithm.
In this article, a forest remote sensing change detection model based on data augmentation and few-shot learning, namely FRSCD_DAFS, is proposed. First, we apply a convolutional neural network (CNN) to extract deep features of forest samples. Second, a deep convolutional generative adversarial network (DCGAN) is used to generate forest fragment images. Third, the obtained forest fragments are blended into the second-temporal background images of the original forest change detection dataset by Poisson blending, which effectively expands the change samples of the dataset. Finally, the change detection algorithm is improved based on few-shot learning without changing the number of images in the dataset. Together, these steps provide effective training data and algorithms for the few-shot forest remote sensing change detection task. The overall structure of the proposed model is shown in Fig. 1. The main contributions of this study can be summarized as follows.
1) A complete forest remote sensing change detection model for few-shot learning, FRSCD_DAFS, is constructed. It combines data augmentation with a few-shot learning algorithm to fill the gap in joint data and algorithm augmentation for forest scenarios and to improve the automation efficiency of forest remote sensing change detection.
2) A forest remote sensing image augmentation method is proposed that increases the number of forest change samples, balances the numbers of positive and negative samples in change detection, and achieves a real and effective information expansion of the dataset while keeping the number of images unchanged.
3) A change detection algorithm based on few-shot learning is proposed that jointly computes interclass and intraclass features and combines transfer learning and data augmentation to improve the accuracy of end-to-end forest remote sensing change detection.

A. Forest Change Detection Methods
Change detection methods for forest scenes have not yet been fully studied and account for only a small share of remote sensing change detection research. Traditional methods rely mainly on vegetation indices [17], and there are also machine learning-based methods [18]. Because data are severely scarce, few deep learning detection methods are specialized for forest scene change detection. Therefore, remote sensing change detection for forest scenes still requires further research on both data and models.

B. Remote Sensing Data Augmentation
A deep learning-based change detection model must learn, from a large amount of training data, the pixel features of forest remote sensing images and the difference features between the two temporal images [19]. However, because forest scenes change slowly, the images contain little change information, and even less effective data reach the model after image cropping. This greatly reduces the generalization ability of the model, so that it applies only to test environments that closely resemble the training data. Therefore, effective data augmentation should enhance the change information in the two temporal images: increase the number of change instances and change pixels in the training data, raise the proportion of change information in the images, and avoid "invalid training." To achieve this, we use a DCGAN to generate realistic forest change instances to be added to the images [20]. These generated change instances, called forest fragments, are blended into the temporal background images used for change detection, which increases the number of change samples and achieves effective positive sample augmentation.

C. Change Detection Based on Deep Learning
Deep learning-based change detection methods fall into two main categories: classification followed by detection [21], [22], [23], [24], and end-to-end direct detection [25], [26]. Owing to the feature extraction capabilities of convolutional neural networks, the approach of first classifying image pixels and then computing pixel-level differences is widely used [27], [28], [29]. To improve detection efficiency, end-to-end detection, also known as one-step change detection, has gradually attracted researchers' attention [30], with image transformation methods as the mainstream [31], [32], [33], [34]. For special application scenarios with limited data samples, few-shot learning has gradually become a research hotspot, so studying forest remote sensing change detection based on few-shot learning algorithms is a natural direction of development.
To address the remote sensing change detection problem, Jing et al. [35] introduced a change detection algorithm based on a multiattention mechanism. First, a bag-of-words model with automatic clustering was used to preclassify the images, and the number of classes was used as the channel parameter of the subsequent training network. Second, features were extracted with a residual network, while a 3-D attention mechanism computed and updated attention over the spatial, temporal, and channel dimensions of the remote sensing images to obtain more accurate and complex feature information. Finally, the high-dimensional features were passed into the slow feature analysis network, the feature values of pixels were sorted and compared, and change information was extracted according to slow feature analysis theory and a threshold calculation. In this article, we construct a feature extraction network and a feature projection network and propose a change detection algorithm based on few-shot learning, which performs well in forest remote sensing change detection.

III. METHODOLOGY
The prediction accuracy of a deep learning model relies on a large amount of training data. Consequently, increasing the number and utilization of samples is key to solving few-sample remote sensing change detection. The target of change detection in forest scenes is forest communities. Changes caused by human or natural factors are large and appear as regional, concentrated changes in remote sensing images, whereas changes caused by growth are slow and hard to monitor. As a result, positive and negative samples are unevenly distributed in real forest datasets. Therefore, in naturally acquired forest remote sensing images, the detection network learns limited features and effective training information accounts for only a small percentage, which affects the accuracy of the model. To improve the effectiveness of training information, the coverage of positive samples over the whole remote sensing image should be increased. In addition, forest image features should be fully extracted with deep learning to effectively detect forest scene changes with few-shot data.
This section describes the proposed few-shot data augmentation method in the context of forest scene change detection, using an adjusted DCGAN to generate new sample data in the form of forest image fragments. We then fuse the fragments with the temporal images in the forest change detection dataset to reconstruct a change detection dataset with a large number of valid samples, which provides a good data foundation for subsequent change detection tasks. The overall work is divided into the following three main steps.
1) Forest image fragment generation: The main focus of data augmentation is to increase the diversity of positive forest image samples. We refer to the newly generated positive forest samples as forest fragments. Considering the generalization ability of the final change detection model, the diversity of the forest image fragments needs to be evaluated. We employ an improved DCGAN to adversarially generate diverse forest fragments and render their colors and edges with an improved color transfer (CT) algorithm, so that they fuse better with the real dataset and avoid visual inconsistency.

2) Reconstruction of fragments and temporal images:
The generated forest fragments increase the number of positive samples for change detection. By fusing these fragments with the real change detection dataset, the coverage of positive samples is improved and the amount of valid information is extended. We use Poisson blending to fuse forest fragments with background data and construct a new few-shot dataset suited to forest scene change detection tasks.

3) Change detection network: Change detection in forest scenes is one of the important applications of Earth observation. We improve the change detection algorithm proposed by Jing et al. [35]: the slow feature analysis method is improved with few-shot learning, and the newly generated change detection dataset is used for change prediction and analysis.

A. Forest Image Fragment Generation
The generation of forest image fragments is achieved by a feature extraction network and a GAN. We use a CNN to extract diverse pixel features of the forest [36]; the extracted features are projected and passed to the discriminator of the DCGAN as discriminative feedback for forest fragment generation [37]. In addition, to increase the diversity of forest fragments and the robustness of the model, a constraint on the forest fragment area is added so that the model converges within a controlled range.
1) Forest Image Feature Extraction: To avoid the gradient explosion of GANs [38], a convolution-based feature extraction network is placed before the generative model. The essence of a convolutional neural network is to approximate a complex function by stacking layers of simple activation functions and to adjust the training parameters using feedback during fitting, so that the cost function is minimized and the optimal strategy is obtained. Convolutional neural networks have a strong representation learning ability and translation invariance for pixel classification [39]. In the forest feature extraction network, normalized 3-D pixel data, i.e., the 2-D pixel positions in the plane plus the RGB channels, are first passed to the input layer. The feature maps are then processed by convolutional, pooling, and fully connected layers. Finally, the feature classification of pixels is obtained by the logistic function of the output layer. The structure of the forest feature extraction network is shown in Fig. 2.
The convolution is calculated as shown in (1), where the summation is equivalent to a single cross-correlation. Here, b is the bias, Z^l and Z^{l+1} denote the input and output feature maps of the convolution in layers l and l + 1, respectively, and L_{l+1} is the size of Z^{l+1}, assuming that feature maps have equal length and width. Z(i, j) denotes a pixel of the feature map and K the number of channels of the feature map. Finally, f, s_0, and p are the convolution parameters corresponding to the kernel size, the stride, and the padding, respectively.
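Equation (1) itself does not appear in this text. A standard way to write the multichannel convolution with exactly these symbols is given below, assuming square f × f kernels with weights w_k^{l+1} and writing L_l for the size of Z^l (two symbols not defined above):

```latex
Z^{l+1}(i,j) = \left[ Z^{l} \otimes w^{l+1} \right](i,j) + b
  = \sum_{k=1}^{K} \sum_{x=1}^{f} \sum_{y=1}^{f}
    \left[ Z_{k}^{l}(s_0 i + x,\; s_0 j + y)\, w_{k}^{l+1}(x,y) \right] + b,
\qquad
L_{l+1} = \left\lfloor \frac{L_{l} + 2p - f}{s_0} \right\rfloor + 1
```

For example, a 64 × 64 feature map with f = 3, p = 1, and s_0 = 2 gives L_{l+1} = ⌊63/2⌋ + 1 = 32.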
2) Forest Fragment Generator: A GAN is a powerful network for generating realistic images. Taking random noise as input, it can generate new images that lie outside the training dataset but follow a distribution similar to that of the training set.
A GAN trains a generator and a discriminator jointly [40], [41], which makes it well suited to few-shot forest change detection scenarios where training data are limited. Among the many variants, DCGAN has been shown to be stable and to perform well [42]. Therefore, we select DCGAN as the backbone network for generating forest fragment images.
To avoid the repetitive, blurry samples of the traditional DCGAN model, we replace the activation functions of the generator and discriminator with the SELU function [43]. Unlike ReLU, SELU keeps a nonzero response for inputs less than 0 when performing the nonlinear activation of a convolutional layer, which preserves richer pixel features. SELU is self-normalizing: it automatically corrects shifts in the distribution of activations, so the model can adapt to changes in the input, which improves its overall robustness.
The SELU function is defined as
$$\mathrm{selu}(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} \qquad (2)$$
When the parameters are set to λ ≈ 1.0507 and α ≈ 1.6733 [44], the activations of each layer converge toward zero mean and unit variance, provided that the weights are initialized with zero mean and variance 1/n, where n is the number of inputs to the layer. This self-normalizing property helps avoid vanishing or exploding gradients when generating forest fragments.
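A minimal Python sketch of SELU, useful for checking the constants: selu(x) = λx for x > 0 and λα(e^x − 1) otherwise. It uses the full-precision constants that PyTorch's `torch.nn.SELU` also uses, which round to λ ≈ 1.0507 and α ≈ 1.6733; the function name `selu` is illustrative only.

```python
import math

# Full-precision SELU constants; they round to lambda ~ 1.0507, alpha ~ 1.6733.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def selu(x: float) -> float:
    """Scaled exponential linear unit.

    Positive inputs are scaled linearly by lambda; non-positive inputs are
    mapped to lambda * alpha * (exp(x) - 1), which saturates at
    -lambda * alpha instead of being zeroed out as in ReLU.
    """
    if x > 0:
        return SELU_LAMBDA * x
    return SELU_LAMBDA * SELU_ALPHA * math.expm1(x)


if __name__ == "__main__":
    for v in (-5.0, -1.0, 0.0, 1.0, 2.0):
        print(f"selu({v:+.1f}) = {selu(v):+.4f}")
```

Large negative inputs approach −λα ≈ −1.7581, which is the nonzero negative response that distinguishes SELU from ReLU.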
Because features are extracted from the forest samples in the source dataset before the generative model, the discriminator receives more detailed features. This makes the competition between the generator and the discriminator more intense and rigorous, forces the generator to produce more realistic images, and places higher quality requirements on the whole generative model. The structure of the improved DCGAN model is shown in Fig. 3.
Random noise is passed into the generator, and a feature map is obtained through projection and reshaping in the fully connected layer, followed by upsampling through multiple deconvolution layers to generate the final image G(Z). In the discriminator, the generated image is downsampled by convolution, processed in the fully connected layer, and then fed into the SELU function to estimate the probability that the image is real. When this probability approaches 0.5, the forest fragment image is considered acceptable.
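The generator path described above (projection and reshape, then repeated deconvolution upsampling) can be traced with the transposed-convolution size rule out = (in − 1)·s − 2p + k. The sketch below assumes a common DCGAN layout: 4 × 4 × 1024 after projection, kernel 4, stride 2, padding 1, and channel counts halving down to a 64 × 64 × 3 RGB output. These sizes and the helper names are illustrative assumptions, not the configuration reported for the forest fragment generator.

```python
from typing import List, Tuple


def deconv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a 2-D transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def trace_generator(
    base: int = 4,
    channels: Tuple[int, ...] = (1024, 512, 256, 128, 3),
    kernel: int = 4,
    stride: int = 2,
    padding: int = 1,
) -> List[Tuple[int, int, int]]:
    """Return (height, width, channels) after projection and each upsampling step.

    The noise vector is first projected by a fully connected layer and reshaped
    to base x base x channels[0]; each later entry is one transposed convolution.
    """
    size = base
    shapes = [(size, size, channels[0])]
    for ch in channels[1:]:
        size = deconv_out(size, kernel, stride, padding)
        shapes.append((size, size, ch))
    return shapes


if __name__ == "__main__":
    for h, w, c in trace_generator():
        print(f"{h:>3} x {w:>3} x {c}")
```

With kernel 4, stride 2, and padding 1, each transposed convolution exactly doubles the side length, which is why this setting is a popular choice for DCGAN generators.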
The loss functions of the generator and the discriminator are shown in (3) and (4), where m is the number of samples, D(x) is the probability that the discriminator assigns to a real image x being real, G(Z) is the fake image produced by the generator, and D(G(Z)) is the probability that the discriminator assigns to the generated image being real.
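Equations (3) and (4) are not reproduced in this text. Assuming they take the standard minimax GAN form, the discriminator maximizes (1/m) Σ [log D(x_i) + log(1 − D(G(z_i)))] and the generator minimizes (1/m) Σ log(1 − D(G(z_i))). The sketch below evaluates both quantities on given discriminator outputs; the function names are hypothetical.

```python
import math
from typing import Sequence


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Negative of the discriminator objective, averaged over m samples.

    d_real[i] = D(x_i), the probability that real image x_i is real;
    d_fake[i] = D(G(z_i)), the probability that generated image G(z_i) is real.
    Minimizing this maximizes log D(x) + log(1 - D(G(z))).
    """
    m = len(d_real)
    total = sum(math.log(r) + math.log(1.0 - f) for r, f in zip(d_real, d_fake))
    return -total / m


def generator_loss(d_fake: Sequence[float]) -> float:
    """Minimax generator loss: mean of log(1 - D(G(z_i))), to be minimized."""
    return sum(math.log(1.0 - f) for f in d_fake) / len(d_fake)
```

At D(x) = D(G(z)) = 0.5, the equilibrium mentioned above, the discriminator loss equals 2 ln 2 ≈ 1.386 and the generator loss equals −ln 2. Many implementations instead minimize −log D(G(z)) for the generator, which avoids saturating gradients early in training.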

3) Color and Style Transfer:
To make the generated forest fragment images diverse, the source dataset is selected to cover seasonal color changes of the forest, so the color of the background image must be considered when synthesizing the new dataset. This step makes the colors and edges of the generated forest fragments converge toward the pixels of the background image, so that the newly generated image looks more realistic and complete.
We employ the CT method introduced in [45] and [46], treating the forest fragment as the target image and the background image as the source image. We thus transform the color and style of the forest fragment to be consistent with the background image. Whereas the original algorithm computes the transformation matrix over the whole target image, in our scenario the matrix computation is restricted to the forest fragment, that is, only the pixels of the forest sample are used. In addition, when we train the GAN, labels for the generated forest fragments are produced correspondingly, so the actual extent of the fragments can be easily controlled.
The CT algorithm takes a statistical view: the pixel values of an image are treated as 3-D random variables and the image as a collection of vectors. The mean and covariance of the 3-D components are calculated, where the covariance captures the correlation between the components, and the covariance matrix is decomposed to obtain a rotation matrix. Finally, the pixel data of the target image are translated, rotated, and scaled in color space to obtain target forest fragments with similar colors.
The blending process is shown in (5), where I is the color of the final result, w_ij are weights calculated from the distance between the pixel data and the observation center, and a_i are adjustable weights [45]. I_j denotes the data of the intermediate image or the response to a separate target sample, and m and n are the numbers of samples in the source and target images, respectively. In the experiments, m and n take the same value.
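The statistical color transfer described above can be sketched as follows under stated assumptions. The fragment pixels (target) are centered, rotated into their principal axes, rescaled by the ratio of eigenvalue square roots, rotated into the background's (source) principal axes, and shifted to the background mean, so that their mean and covariance match those of the background. As in the text, only pixels inside the fragment mask are transformed. This is a generic mean-and-covariance variant written with NumPy, not the exact algorithm of [45], [46]; the weighted blending of (5) is omitted, and `color_transfer` is a hypothetical name.

```python
import numpy as np


def color_transfer(fragment: np.ndarray, background: np.ndarray, mask: np.ndarray,
                   eps: float = 1e-10) -> np.ndarray:
    """Match the color mean and covariance of masked fragment pixels to the background.

    fragment: (H, W, 3) float array (target); background: (H2, W2, 3) float
    array (source); mask: (H, W) bool array marking forest-fragment pixels.
    Pixels outside the mask are returned unchanged.
    """
    tgt = fragment[mask].astype(np.float64)             # (n, 3) fragment colors
    src = background.reshape(-1, 3).astype(np.float64)  # (m, 3) background colors
    mu_t, mu_s = tgt.mean(axis=0), src.mean(axis=0)
    # Eigendecomposition of each covariance: columns of u_* are rotation axes.
    lam_t, u_t = np.linalg.eigh(np.cov(tgt, rowvar=False))
    lam_s, u_s = np.linalg.eigh(np.cov(src, rowvar=False))
    scale = np.sqrt(np.clip(lam_s, eps, None) / np.clip(lam_t, eps, None))
    # Rotate into the fragment's axes, rescale, then rotate into the background's axes.
    transform = u_s @ np.diag(scale) @ u_t.T
    out = fragment.astype(np.float64).copy()
    out[mask] = (tgt - mu_t) @ transform.T + mu_s
    return out
```

Restricting the statistics to the mask keeps non-forest pixels of the generated image from biasing the transform, which matches the restriction to the forest fragment described above.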

B. Reconstruction of Fragments and Temporal Images
The change detection dataset contains two temporal image sets. When semantic information is not considered, the detection result does not depend on the order of the two images, so the change detection model is symmetric with respect to the two dates during training. In this reconstruction step, we fuse the generated forest fragments into the later-period images of the real change detection dataset using Poisson blending. Before fusion, the fragment positions must be determined so that the fragments do not obscure entities in the original images and are fused at suitable locations. The effect of the few-shot image augmentation on detection is verified under constraints on the fused fragment area.

1) Composition Position Selection:
The goal is to distribute the generated forest fragments randomly and uniformly over the background image, so that fragments can appear anywhere, including the corners of the image. We therefore combine random and uniform sampling to determine the location range, so that each training unit, cropped to a uniform size, still contains more positive training samples.
Another important principle in selecting locations for forest fragment synthesis is not to obscure the original physical objects in the background image. To achieve this, we annotate the background image and generate a label map with a semantic segmentation model. When positions in the image are sampled uniformly, the labeled positions are avoided and the updates are recorded until all positions on the image have been sampled. We use the mature and simple semantic segmentation model MultiResUnet [47] for label annotation; its structure is shown in Fig. 4.
MultiResUnet replaces the convolutional blocks of the U-Net model with MultiRes blocks [47]. The model reduces the semantic gap between encoder and decoder features by passing the encoded features through a series of convolutional layers with residual connections, and the number of convolutional filters within each block is controlled.
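The position-selection rule above (uniform random sampling that skips pixels labeled by the segmentation model and records each placement) can be sketched as follows. The label map is assumed to be a binary grid already produced by a model such as MultiResUnet; rectangular fragment footprints and the name `sample_positions` are simplifying assumptions.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 1 = pixel labeled by the segmentation model, 0 = free


def sample_positions(
    label_map: Grid, frag_h: int, frag_w: int, count: int, seed: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return up to `count` top-left corners for frag_h x frag_w fragments.

    Corners are visited in uniformly random order; a corner is accepted only if
    the fragment footprint covers no labeled pixel and no fragment placed earlier.
    """
    rng = random.Random(seed)
    h, w = len(label_map), len(label_map[0])
    occupied = [row[:] for row in label_map]  # copy so the input label map is untouched
    corners = [(r, c) for r in range(h - frag_h + 1) for c in range(w - frag_w + 1)]
    rng.shuffle(corners)
    placed: List[Tuple[int, int]] = []
    for r, c in corners:
        if len(placed) == count:
            break
        footprint = [(r + i, c + j) for i in range(frag_h) for j in range(frag_w)]
        if any(occupied[i][j] for i, j in footprint):
            continue
        for i, j in footprint:  # record the update so later fragments avoid this one
            occupied[i][j] = 1
        placed.append((r, c))
    return placed
```

Shuffling all candidate corners once and scanning them is equivalent to uniform sampling without replacement, so every free position has the same chance of being chosen, including those near the image corners.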
2) Image Composition: Having obtained suitable positions for the forest fragments (foreground) in the background images, we fuse them using Poisson blending [48], a method based on solving partial differential equations that performs excellently in image fusion. By solving these equations, the boundaries of the forest fragments and the background images are blended seamlessly, maintaining similar brightness and gradients in the synthesized image. In the 3-D Cartesian coordinate system, the Poisson equation is expressed as
$$\Delta\phi = \nabla^{2}\phi = \frac{\partial^{2}\phi}{\partial x^{2}} + \frac{\partial^{2}\phi}{\partial y^{2}} + \frac{\partial^{2}\phi}{\partial z^{2}} = f \qquad (6)$$
where ∂ denotes the partial derivative, and f and ϕ are real- or complex-valued functions on a manifold. The operator on the left-hand side is called the Laplace operator. If the right-hand side is 0, the equation becomes homogeneous and is called the Laplace equation.
Abrupt gradient changes at the seam can be smoothed by constraining the Laplace operator [49], [50]. The scalar function on the forest fragment region is uniquely determined by its boundary values and the Laplacian condition in the interior, so the Poisson equation with these boundary conditions has a unique solution. The gradient problem can be solved by interpolation and differentiation on each of the three image channels, which guarantees consistent brightness between the forest fragments and the background image and makes the generated image look more natural.
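A minimal single-channel sketch of Poisson blending (seamless cloning in the sense of Pérez et al.), assuming the source patch has already been placed at the chosen position so that target, source, and mask share one grid. Inside the mask it solves the discrete Poisson equation with the source Laplacian as guidance and the target values as a fixed (Dirichlet) boundary, using plain Gauss-Seidel iterations; for RGB it would be applied once per channel. This is an illustration, not the authors' implementation; practical code would typically use a sparse linear solver or OpenCV's `cv2.seamlessClone`.

```python
from typing import List

Channel = List[List[float]]


def poisson_blend(target: Channel, source: Channel, mask: List[List[int]],
                  iters: int = 500) -> Channel:
    """Seamlessly clone `source` into `target` (one channel, same size).

    Inside the mask, solve the discrete Poisson equation
        4 f(p) - sum_{q in N(p)} f(q) = 4 g(p) - sum_{q in N(p)} g(q),
    so the blended image f keeps the Laplacian of the source g, while pixels
    outside the mask (and on the image border) stay fixed at the target values.
    """
    h, w = len(target), len(target[0])
    out = [row[:] for row in target]
    inside = [(r, c) for r in range(1, h - 1) for c in range(1, w - 1) if mask[r][c]]
    for _ in range(iters):  # Gauss-Seidel sweeps, updating in place
        for r, c in inside:
            nbrs = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            lap_src = 4 * source[r][c] - sum(source[i][j] for i, j in nbrs)
            out[r][c] = (sum(out[i][j] for i, j in nbrs) + lap_src) / 4.0
    return out
```

Because only the source Laplacian enters the equation, a source that differs from the background by a constant brightness offset blends in without a visible seam, which is the brightness consistency described above.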
The number of forest fragments added to a background image is bounded by the pixel coverage area of the fragments. Experiments show that the best detection is achieved when the forest fragments cover 50% of the whole image, i.e., when the positive and negative sample areas are balanced.
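The 50% coverage criterion can be expressed as a stopping rule on the positive-pixel ratio of the change label map. The sketch assumes rectangular fragment footprints given as (top, left, height, width) and already positioned by the selection step; `coverage` and `paste_until` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # 1 = changed (positive) pixel, 0 = unchanged


def coverage(mask: Mask) -> float:
    """Fraction of positive pixels in a binary change mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(1 for v in row if v) for row in mask) / total


def paste_until(
    mask: Mask, footprints: Sequence[Tuple[int, int, int, int]], target: float = 0.5
) -> Tuple[Mask, int]:
    """Mark fragment footprints (top, left, height, width) as changed, in order,
    until positive coverage reaches `target`; return the new mask and count used."""
    out = [row[:] for row in mask]
    used = 0
    for top, left, fh, fw in footprints:
        if coverage(out) >= target:
            break
        for r in range(top, top + fh):
            for c in range(left, left + fw):
                out[r][c] = 1
        used += 1
    return out, used
```

For a 4 × 4 mask with no changes and four 2 × 2 footprints, two fragments bring the coverage to exactly 0.5, so the remaining two are not used.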

C. Change Detection Network
According to the characteristics of the constructed forest scene change detection dataset, we update the original change detection model with a slow feature analysis algorithm that combines few-shot learning theory with a multiattention mechanism. According to slow feature analysis theory, on the few-shot data, slowly changing pixels have smaller feature values, and changed pixels, which have larger feature values, are obtained with a threshold algorithm. Feature classification, extraction, and change detection are performed simultaneously to build an end-to-end detection system.

1) Few-Shot Learning:
Enabling neural networks to obtain accurate classification results from only a small number of training samples is a challenging task for many deep learning models. Few-shot learning uses prior knowledge to quickly generalize models to new tasks in which sample information is limited or scarce.
A few-shot learning structure contains two sets [51]: the support set and the query set. The support set holds a small number of labeled samples of new categories, which the pretrained model uses for generalization, whereas the query set holds samples of new categories and known data, which the model must handle using prior knowledge and the information learned from the support set [52], [53]. The number of new categories is denoted by N, and the number of labeled samples available per category by K, constituting an "N-way K-shot" learning scheme. When the number of samples is small, the model often needs to resort to metalearning or transfer learning to complete the training. As N increases, the classification task becomes harder and prediction accuracy decreases, whereas increasing K reduces the expected risk and gives better results for metric learning [54]. In addition, metric learning methods are particularly suitable when K is small [55].
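Episode construction for the N-way K-shot scheme can be sketched as follows. The sketch assumes the data are available as a dictionary from class to samples; `sample_episode` and the number of query samples per class, `n_query`, are assumptions not specified in the text, and the query labels are kept only so that predictions can be scored.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def sample_episode(
    data: Dict[Hashable, Sequence],
    n_way: int,
    k_shot: int,
    n_query: int,
    seed: Optional[int] = None,
) -> Tuple[List[Tuple[object, int]], List[Tuple[object, int]]]:
    """Draw one N-way K-shot episode from a {class: samples} mapping.

    Returns (support, query) as lists of (sample, episode_label) pairs, with
    episode labels re-indexed to 0..N-1 for this episode.
    """
    rng = random.Random(seed)
    classes = rng.sample(list(data), n_way)  # choose N classes for this episode
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw K + Q distinct samples so support and query never overlap.
        picks = rng.sample(list(data[cls]), k_shot + n_query)
        support += [(s, label) for s in picks[:k_shot]]
        query += [(s, label) for s in picks[k_shot:]]
    return support, query
```

With N = 4 and K = 20, each episode has 80 labeled support samples, and the query set adds n_query samples per chosen class.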
In this article, we apply the idea of few-shot learning to change detection in forest scenes. According to the scenes in our dataset, we set N to 4 and K to 20 in the experiments. Meanwhile, we add a feature extraction module and a feature projection module to the CNN model, corresponding respectively to the interclass and intraclass features of the small amount of sample data in change detection, to fully exploit transfer learning [56], [57], [58], [59]. Following the few-shot learning scheme, the sample dataset is divided into a labeled support set and an unlabeled query set. The generalization ability of the model is improved by extracting deep features and additional features. Furthermore, metric learning is performed on the two sets to calculate the distance-based similarity between points in the projected space. By comparing these similarities, the approximate classes of the query set are obtained. The optimized feature information is then passed into the slow feature analysis network with the multiattention mechanism for change detection. The structure of the few-shot learning module is shown in Fig. 5.
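The metric-learning step, which compares distances in the projected space to assign query samples to classes, can be illustrated with a nearest-prototype rule in the style of prototypical networks. This is an assumed stand-in: each class is represented by the mean of its support embeddings, and a query takes the label of the closest mean under Euclidean distance. The embeddings are assumed to come from the feature extraction and projection modules; the actual distance and objective of (7) may differ, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def prototypes(support: List[Tuple[Sequence[float], int]]) -> Dict[int, List[float]]:
    """Mean embedding (prototype) of each class in the support set."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, label in support:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(query: Sequence[float], protos: Dict[int, List[float]]) -> int:
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(protos, key=lambda lab: math.dist(query, protos[lab]))
```

Averaging the K support embeddings per class is what makes this rule attractive when K is small: it needs no extra trainable parameters beyond the embedding network.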
The goal of metric learning is formulated in (7), where W and B denote intraclass and interclass features, respectively. The objective function sums the intraclass distances, the interclass distances, and the relative distance between the two.
The backbone of the CNN feature extractor consists of four convolutional blocks and two max-pooling layers, where each convolutional block consists of one convolution, one batch normalization, and a ReLU activation function [60]. The feature extraction module consists of four bottleneck layers, each containing three 3 × 3 convolutions, three batch normalization operations, and a ReLU function. The feature projection module consists of three bottleneck layers and one convolutional layer.
2) Multiattention Slow Feature Analysis: The slow feature analysis algorithm provides an end-to-end change detection method, and we add a multidimensional attention mechanism [61] to the base model to learn better feature representations in the spatial, temporal, and channel dimensions and thus improve change detection. In this article, the change detection model proposed in [35] is further improved by using a few-shot learning approach for preclassification, followed by slow feature analysis change detection with an attention mechanism.
The optimized features are obtained from the few-shot classification, and the number of categories is again used as the channel parameter. The size and channels of the image are reset according to the number of classified categories; the channel, temporal, and spatial attention weights of the image are calculated; and the extracted complex features are used as the input of the slow feature analysis network. The attention mechanism focuses on particular local features within the global context by calculating an attention probability distribution over key vectors and the corresponding weights of local features. In addition, the attention mechanism helps select the important information to pass to the network and trains on it selectively, which reduces the network burden. Spatial and channel attention mechanisms can be integrated into any convolutional neural network to improve its performance [62]. By combining attention mechanisms in both spatial and temporal dimensions, attention weights can be computed more efficiently. In remote sensing images, the channel attention mechanism focuses on what information is useful, whereas the spatial attention mechanism focuses on where the useful information is located [35].
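As an illustration of how channel attention weights are computed and applied, the sketch below implements squeeze-and-excitation style channel attention in plain Python: global average pooling per channel, a ReLU bottleneck, a sigmoid gate, and channel-wise rescaling. This is an assumed, simplified stand-in; the spatial and temporal attention branches of [35] are omitted, and the bottleneck weights `w1` and `w2` are hypothetical inputs that would normally be learned.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def channel_attention(
    x: FeatureMap, w1: Sequence[Sequence[float]], w2: Sequence[Sequence[float]]
) -> Tuple[FeatureMap, List[float]]:
    """Squeeze-and-excitation style channel attention.

    w1 (r x C) and w2 (C x r) are bottleneck weight matrices. Returns the
    reweighted feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: ReLU bottleneck, then a sigmoid gate in (0, 1) per channel.
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden)))) for row in w2]
    # Scale: multiply every activation in a channel by that channel's gate.
    out = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]
    return out, gates
```

The gates depend only on channel-wide statistics, so this branch answers "which channels matter"; a spatial branch would instead produce one weight per pixel location.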
According to the theory of slow feature analysis, feature components with small eigenvalues change slowly or remain invariant. By suppressing these invariant components, that is, by minimizing their eigenvalues, the changed pixels are highlighted by contrast. Finally, the change map is obtained by threshold segmentation.
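A minimal two-band sketch of slow feature analysis change detection follows. It solves the generalized eigenproblem A w = λ B w, where A is the second-moment matrix of the bitemporal differences and B is the average second-moment matrix of the two dates. It then scores each pixel by summing its squared projections divided by the corresponding eigenvalues (a chi-square-style score often paired with SFA) and applies a fixed threshold. The two-band restriction (which gives a closed-form solution), the scoring rule, and the threshold value are our assumptions. For more bands, a generalized symmetric eigensolver such as SciPy's `scipy.linalg.eigh(a, b)` would replace `gen_eig_2x2`.

```python
import math


def second_moment(vectors):
    """Return (1/n) * sum of v v^T over a list of 2-D vectors."""
    n = len(vectors)
    s11 = sum(v[0] * v[0] for v in vectors) / n
    s12 = sum(v[0] * v[1] for v in vectors) / n
    s22 = sum(v[1] * v[1] for v in vectors) / n
    return [[s11, s12], [s12, s22]]


def gen_eig_2x2(a, b):
    """Solve A w = lam * B w for symmetric positive definite 2x2 A and B.

    Returns (lam, w) pairs in ascending order of lam, each w scaled so that
    w^T B w = 1. Assumes the two eigenvalues are distinct.
    """
    # det(A - lam * B) = 0 is the quadratic qa * lam^2 + qb * lam + qc = 0.
    qa = b[0][0] * b[1][1] - b[0][1] ** 2
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2 * a[0][1] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] ** 2
    disc = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    q = (-qb + disc) / 2  # qb < 0 here, so this form avoids cancellation
    pairs = []
    for lam in sorted((qc / q, q / qa)):
        m11 = a[0][0] - lam * b[0][0]
        m12 = a[0][1] - lam * b[0][1]
        m22 = a[1][1] - lam * b[1][1]
        # A - lam * B is singular; take the vector orthogonal to its larger row.
        if abs(m11) + abs(m12) >= abs(m12) + abs(m22):
            w = [-m12, m11]
        else:
            w = [m22, -m12]
        bw = [b[0][0] * w[0] + b[0][1] * w[1], b[0][1] * w[0] + b[1][1] * w[1]]
        scale = math.sqrt(w[0] * bw[0] + w[1] * bw[1])
        pairs.append((lam, [w[0] / scale, w[1] / scale]))
    return pairs


def sfa_change_intensity(x_pixels, y_pixels):
    """Chi-square-style change intensity from two-band SFA.

    x_pixels, y_pixels: lists of (band1, band2) values for the same pixels at
    times 1 and 2 (assumed already preprocessed; no normalization is done here).
    """
    diffs = [(x[0] - y[0], x[1] - y[1]) for x, y in zip(x_pixels, y_pixels)]
    a = second_moment(diffs)
    bx, by = second_moment(x_pixels), second_moment(y_pixels)
    b = [[(bx[i][j] + by[i][j]) / 2 for j in range(2)] for i in range(2)]
    intensity = [0.0] * len(diffs)
    for lam, w in gen_eig_2x2(a, b):
        for i, d in enumerate(diffs):
            # lam is the mean squared projection of all differences onto w,
            # so dividing by it puts every SFA component on a common scale.
            intensity[i] += (w[0] * d[0] + w[1] * d[1]) ** 2 / lam
    return intensity


def threshold_map(intensity, thresh):
    """Binary change map: 1 where the intensity exceeds a (hypothetical) threshold."""
    return [1 if v > thresh else 0 for v in intensity]


# Toy data: 16 pixels with tiny differences, 4 pixels with a large change.
eps, c = 0.01, 1.0
x = [(i % 5 + 1.0, (3 * i) % 7 + 1.0) for i in range(20)]
offsets = [(eps, 0.0), (-eps, 0.0), (0.0, eps), (0.0, -eps)] * 4 + [(c, 0.0)] * 4
y = [(p[0] - d[0], p[1] - d[1]) for p, d in zip(x, offsets)]
score = sfa_change_intensity(x, y)
print(threshold_map(score, 3.5))
```

When every component is kept, this score is algebraically equal to d^T A^{-1} d for each difference vector d. Reweighting or discarding components is where the slow-feature ordering begins to matter.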

IV. EXPERIMENTS
We select four real remote sensing image datasets containing forest scene changes and divide them by task according to image scale and size, forming two change detection datasets and two forest fragment generation datasets. The effectiveness of the proposed model is verified through comparative experiments and an ablation study. Sample images from the datasets are shown in Fig. 6.

A. Original Dataset for Change Detection
Labeled remote sensing change detection datasets for forest scenes are very scarce, so from the existing public datasets we select those with as large a proportion of forest scenes as possible. We use the SZTAKI [63] and LEVIR-CD [62] datasets as the original change detection datasets.
The SZTAKI AirChange Benchmark dataset [63] consists of optical aerial images of ground changes spanning 23 years, compiled by Csaba Benedek and Tamás Szirányi (MTA SZTAKI). It contains 13 pairs of bitemporal images with change reference maps. The images have a resolution of 1.5 m and a size of 952 × 640 pixels. The changes in the dataset include new urban areas, large-scale tree planting, bare arable land, and building changes.
LEVIR-CD is a large-scale change detection dataset released in 2020 by Chen et al. [62] at the LEVIR Lab of Beihang University. It contains 637 pairs of 1024 × 1024 Google Earth images with a spatial resolution of 0.5 m. Both datasets include vegetation cover and various changes affecting forest vegetation, which makes them suitable as original datasets for data augmentation in forest scene change detection.

B. Dataset for Forest Generator
The forest fragments are generated by a modified DCGAN network. Training the model requires a large amount of labeled forest scene data, but datasets with individually labeled forest scenes are difficult to find among existing public datasets. We therefore turn to labeled change detection datasets and observe that, in datasets containing forest changes, the prechange images and the labeled reference maps together form labeled data whose masks mark the changed parts of the forest scene. Accordingly, we choose two change detection datasets (i.e., CD_Data_GZ [64] and SYSU-CD [65]) and reorganize their first-temporal images and label maps into training sets for forest fragment generation, respectively.
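The reorganization described above can be sketched as follows: the changed region of the reference map selects pixels from the prechange (first-temporal) image, and the bounding-box crop becomes one masked training sample for the fragment generator. The crop-and-zero format, the single-channel pixel values, and the function name are assumptions; the article does not state the exact input format of its modified DCGAN.

```python
def masked_fragment(image, mask):
    """Cut the changed (forest) region out of a prechange image.

    image: H x W list of pixel values at time 1; mask: H x W list of 0/1 values
    from the change reference map. Returns the bounding-box crop of the changed
    region with pixels outside the mask set to 0, plus the cropped mask
    (an assumed sample format).
    """
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return None, None  # no change in this tile, so no training sample
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    r0, r1, c0, c1 = rows[0], rows[-1] + 1, cols[0], cols[-1] + 1
    crop_mask = [row[c0:c1] for row in mask[r0:r1]]
    crop = [[image[i][j] if mask[i][j] else 0 for j in range(c0, c1)]
            for i in range(r0, r1)]
    return crop, crop_mask


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0]]
crop, crop_mask = masked_fragment(image, mask)
print(crop, crop_mask)
```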
The CD_Data_GZ dataset [64] consists of 20 pairs of high-resolution images of the suburbs of Guangzhou, China, acquired by Peng's team from Google Earth. The images record changes from 2006 to 2019, have a resolution of 0.55 m, and range in size from 1006 × 1168 pixels for the smallest image to 493 × 5224 pixels for the largest. Changes in forest, farmland, and buildings are recorded in the images, whereas the reference maps mainly show building changes. Many image regions change from forest scenes to building scenes, so their binary reference maps also correspond to forest changes, providing training data for forest fragment generation.
The SYSU-CD dataset was established by a team from Sun Yat-sen University and captures the development of Hong Kong between 2007 and 2014. It includes 800 image pairs covering urban construction, road expansion, and vegetation changes, from which a total of 20 000 image pairs of 256 × 256 pixels at a resolution of 0.5 m are generated by flipping and rotation. According to our task requirements, images of scenes such as marine construction and road expansion are excluded by manual screening, and only samples of forest changes are retained. A total of 7500 pairs are used (4500/1500/1500 pairs for training/validation/testing, respectively).

C. Synthesis of Training Samples
Since change detection is symmetric, that is, inserting changed samples into either temporal phase 1 or temporal phase 2 does not affect the change detection result, we synthesize the forest fragments onto the second-temporal backgrounds of the change detection datasets. The pairing of fragment sources and backgrounds is chosen according to dataset size, spatial resolution, and other factors: forest fragments generated from CD_Data_GZ are synthesized onto the second-temporal backgrounds of the SZTAKI dataset, and those generated from SYSU-CD are synthesized onto the second-temporal backgrounds of LEVIR-CD.
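The abstract states that generated fragments are blended into the original change detection images by Poisson blending. Below is a minimal single-channel sketch of that technique using Jacobi iterations: inside the mask, the result reproduces the fragment's gradients, and at the mask boundary it takes the background's values. The fixed iteration count and list-of-lists image format are assumptions. RGB images would be processed per channel, and a production implementation would typically use a sparse linear solver rather than Jacobi iterations.

```python
def poisson_blend(target, source, mask, iterations=500):
    """Blend `source` into `target` inside `mask` (single channel, Jacobi solver).

    Solves the discrete Poisson equation: inside the mask the result keeps the
    source's Laplacian (its gradients), and on the mask boundary it matches the
    target. All inputs are H x W lists of lists; the mask must not touch the
    image border, and the iteration count is a hypothetical default.
    """
    h, w = len(target), len(target[0])
    out = [row[:] for row in target]
    inside = [(i, j) for i in range(h) for j in range(w) if mask[i][j]]
    for _ in range(iterations):
        new = [row[:] for row in out]
        for i, j in inside:
            total = 0.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                # Neighbor value (fixed target value if outside the mask)
                # plus the source gradient toward that neighbor.
                total += out[ni][nj] + source[i][j] - source[ni][nj]
            new[i][j] = total / 4.0
        out = new
    return out


# Toy check: a background that equals the fragment plus 10 outside the mask
# should pull the blended interior to fragment + 10.
n = 8
src = [[float(i + 2 * j) for j in range(n)] for i in range(n)]
msk = [[1 if 1 <= i <= 6 and 1 <= j <= 6 else 0 for j in range(n)] for i in range(n)]
tgt = [[0.0 if msk[i][j] else src[i][j] + 10.0 for j in range(n)] for i in range(n)]
out = poisson_blend(tgt, src, msk)
print(round(out[3][3], 6))
```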
The number of synthesized forest fragments must also be considered. Our aim is to balance positive and negative samples as far as possible to obtain more valid training data, so either too few or too many fragments will degrade the final detection results. We therefore constrain fragment synthesis by the coverage percentage of forest fragments. Because the change detection datasets inherently contain only a small amount of forest change, we set up three sets of experiments with coverage percentages of 15%, 35%, and 50%, respectively.
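The coverage constraint can be illustrated with a small Python sketch that pastes square fragment masks at random positions until a target fraction of the image is covered, then merges the pasted region into the change label (a fragment that appears only in the second-temporal image counts as change). The square fragment shape, random placement, union-based coverage measure, and label update are our assumptions, not details given in the article.

```python
import random


def place_fragments(height, width, frag_size, target_coverage, seed=0, max_tries=10000):
    """Paste square fragment masks at random positions until coverage is reached.

    Returns the binary mask of pasted fragments and the achieved coverage
    (fraction of pixels covered). Overlapping pastes are allowed, so coverage
    is measured on the union rather than by summing fragment areas.
    """
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    covered = 0
    goal = target_coverage * height * width
    tries = 0
    while covered < goal and tries < max_tries:
        tries += 1
        top = rng.randrange(height - frag_size + 1)
        left = rng.randrange(width - frag_size + 1)
        for i in range(top, top + frag_size):
            for j in range(left, left + frag_size):
                if not mask[i][j]:
                    mask[i][j] = 1
                    covered += 1
    return mask, covered / (height * width)


def merge_labels(label, frag_mask):
    """Hypothetical label update: pasted fragments are marked as changed."""
    return [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(label, frag_mask)]


for cov in (0.15, 0.35, 0.5):
    _, got = place_fragments(64, 64, 8, cov)
    print(cov, round(got, 4))
```

Because the loop stops as soon as the goal is met, the achieved coverage overshoots the target by at most one fragment's area.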
The SZTAKI dataset is cropped to 120 pairs of images (72/24/24 pairs for training/validation/testing, respectively). The LEVIR-CD dataset is cropped to 10 192 pairs of images (7120/1024/2048 pairs for training/validation/testing, respectively) according to the distribution of the original dataset. The CD_Data_GZ dataset is cropped to 3349 pairs (2009/670/670 pairs for training/validation/testing, respectively). Finally, the SYSU-CD dataset is cropped into a total of 7500 image pairs (4500/1500/1500 pairs for training/validation/testing, respectively). The specific dataset parameters are given in Table I.

D. Experiment Setup
We design eight experimental strategies as ablation experiments to verify the function and importance of each component of the model. The model has four important components: the feature extraction network, the forest fragment synthesis network, the few-shot learning network, and the multiattention slow feature analysis change detection network. The first three components are combined in all 2^3 = 8 possible ways and validated on the two datasets, and the best-performing model is selected for comparison with other methods.
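Since ASFA is present in every strategy and each of the other three components is either on or off, the eight strategies are exactly the subsets of {CNN, GP, FSL}. The sketch below enumerates them with `itertools.combinations`, using the component abbreviations defined in the results subsection. The enumeration order matches the order of the eight groups described there.

```python
from itertools import combinations

# Optional components, abbreviated as in the results subsection; ASFA is always on.
OPTIONAL = ("CNN", "GP", "FSL")


def ablation_strategies():
    """List every subset of the optional components, each paired with ASFA."""
    strategies = []
    for k in range(len(OPTIONAL) + 1):
        for subset in combinations(OPTIONAL, k):
            strategies.append(subset + ("ASFA",))
    return strategies


for n, s in enumerate(ablation_strategies(), start=1):
    print(n, " + ".join(s))
```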
To verify the performance and change detection effect of the proposed model, we compare it with mainstream end-to-end detection methods based on image transformation, including CVA, PCA, MAD, SFA, and DSFA. As representative unsupervised end-to-end change detection methods built on image transformation theory, they have proven effective in many applications. The effect of forest fragment coverage is also examined through three sets of experiments with coverage set to 15%, 35%, and 50%, respectively. Each set of experiments is run on both datasets.
For all experiments, we use the PyTorch framework. The input data are 256 × 256 pixel images, and the models are trained with momentum stochastic gradient descent [66]. The weights and bias matrices of each layer are randomly initialized, the initial learning rate is set to 0.01, and the decay coefficient is set to 0.9 [67]; after 100 training epochs, the learning rate has decayed to 0. The momentum and weight decay are set to 0.99 and 0.0005, respectively [68]. The batch size is set to 8. Finally, we keep the best-performing model as the final result.
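The schedule above (initial rate 0.01, decay coefficient 0.9, rate reaching 0 after 100 epochs) is consistent with polynomial ("poly") decay with power 0.9; that reading is our assumption. The sketch below shows that schedule together with one momentum SGD step with L2 weight decay, written in the same form as the PyTorch `torch.optim.SGD` update with zero dampening.

```python
# Assumption: "decay coefficient 0.9" is read as the power of a poly schedule.
def poly_lr(epoch, base_lr=0.01, max_epochs=100, power=0.9):
    """Polynomial decay: base_lr * (1 - epoch / max_epochs) ** power."""
    return base_lr * (1 - epoch / max_epochs) ** power


def sgd_momentum_step(weights, grads, velocity, lr, momentum=0.99, weight_decay=0.0005):
    """One SGD step with momentum and L2 weight decay on flat parameter lists."""
    new_w, new_v = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w  # add the weight-decay term to the gradient
        v = momentum * v + g      # accumulate the momentum buffer
        new_w.append(w - lr * v)  # move against the accumulated direction
        new_v.append(v)
    return new_w, new_v


for epoch in (0, 50, 100):
    print(epoch, poly_lr(epoch))
```

In PyTorch itself, these settings would map onto `torch.optim.SGD(params, lr=0.01, momentum=0.99, weight_decay=0.0005)` combined with `torch.optim.lr_scheduler.LambdaLR` using the multiplier `(1 - epoch / 100) ** 0.9`.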

E. Evaluation and Metrics
To evaluate the performance of the proposed model quantitatively, we select the following representative metrics [35]: mean intersection over union (MIoU), precision, recall, and F1 score. Recall is defined as

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$

where TP and FP denote true positives and false positives, and TN and FN denote true negatives and false negatives, respectively. MIoU reflects the degree of overlap between the predicted change map and the reference map; the higher the MIoU value, the more accurate the model prediction. Precision is the fraction of samples predicted as positive that are actually positive, whereas recall is the fraction of actual positive samples that are correctly predicted. The F1 score is a comprehensive evaluation index defined as the harmonic mean of precision and recall; because it accounts for both, it gives a balanced assessment of the model when the two conflict.
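For reference, the metrics above can be computed from a binary change map and its reference as follows, using Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2 · Precision · Recall / (Precision + Recall). Averaging the IoU of the changed and unchanged classes to obtain MIoU is our assumption, since the article does not list which classes enter the mean.

```python
def change_metrics(pred, ref):
    """Precision, recall, F1, and MIoU for binary change maps given as flat 0/1 lists."""
    tp = sum(1 for p, r in zip(pred, ref) if p and r)
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)
    tn = sum(1 for p, r in zip(pred, ref) if not p and not r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Assumption: MIoU averages the IoU of the changed and unchanged classes.
    iou_change = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_unchanged = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    miou = (iou_change + iou_unchanged) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}


# Toy maps: 10 changed reference pixels, 8 detected, 2 false alarms.
ref = [1] * 10 + [0] * 90
pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 88
print(change_metrics(pred, ref))
```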

F. Experimental Results and Analysis
The proposed few-shot forest remote sensing change detection model contains four main components: the forest feature extraction network (CNN), the fragment generation and synthesis network (GP), the few-shot learning network (FSL), and the improved multiattention change detection network (ASFA). To test the function and importance of each component, the components are combined into eight experimental strategies. The fragment synthesis constraint for data augmentation is set to 50% coverage, and the experiments are conducted on two datasets (LEVIR-CD and SZTAKI). The experimental results are given in Tables II and III, with the best performances highlighted in bold.
Validation on the two datasets leads to similar conclusions. The first set of experiments uses the ASFA model alone, and its evaluation metrics serve as reference values for the other seven sets. The second set adds the feature extraction network and feeds the forest features directly into the change detection model; the results show that training overfits and the accuracy is unsatisfactory. The third set adds the forest fragment generation and synthesis network, and the results show that data augmentation improves the overall detection accuracy. The fourth set uses only the few-shot learning detection strategy; the improved detection algorithm improves the overall model, but less than data augmentation does. In the fifth set, feature extraction is added to the forest fragment generation network, and the synthetically augmented data provide effective information to the detection model, yielding a large improvement in accuracy. The sixth set extracts forest features with convolutional networks and passes them to the few-shot change detection, which, like the second set, risks overfitting. The seventh set combines data augmentation and the few-shot algorithm to achieve better detection accuracy. The eighth set contains all components, forming a complete pipeline of forest change positive-sample augmentation and few-shot change detection; with positive and negative samples balanced, it significantly improves the F1 score as well as overall detection performance. Overall, the results show that the GP component makes the most significant contribution in the ablation experiments: the F1 score increases by 2.01% on average after data augmentation.
The results also show that the four components are complementary and must work together to achieve the highest performance.
The experimental results show that the four important components work together to achieve data-enhanced change detection, and demonstrate that the proposed data-enhanced few-shot multiattention slow feature analysis change detection model (FRSCD_DAFS) achieves better detection results on the forest scene datasets. Next, we compare the algorithm with other image transformation methods on the two datasets and verify the effect of the forest fragment coverage percentage on the results. The results of the comparison experiments on the two datasets (LEVIR-CD and SZTAKI) are given in Tables IV and V, and the change maps on the two datasets are shown in Figs. 7 and 8.
To verify the effect of forest fragment coverage on the results, we run experiments with coverage of 15%, 35%, and 50% on the two datasets. A comparison of the datasets with data augmentation is shown in Fig. 9, and the experimental results are given in Tables VI and VII. In general, the best results are obtained at a forest fragment coverage of 50%, which is consistent with the theoretical expectation. In addition, we apply the data augmentation model to several image transformation detection algorithms with a forest fragment coverage of 50%. The results, given in Tables VIII and IX, demonstrate that the proposed image augmentation method can be transferred to other change detection models and improve their detection results. Table VIII gives the change detection results with the augmented LEVIR-CD dataset, and Table IX gives those with the augmented SZTAKI dataset.

V. DISCUSSION
We have proposed a forest remote sensing change detection model based on data augmentation and few-shot learning. In this section, we highlight the uniqueness of our work and the rationale for our choice of method and data, and discuss its limitations and future work in detail.
Compared with other end-to-end change detection methods, we introduce the idea of few-shot learning and use metric learning to calculate distances. The detection model is thus improved at both the algorithm level and the data level. The uniqueness of our data augmentation lies in directly increasing the number of changed samples without increasing the number of images. Unlike transformation-based data augmentation methods, this approach keeps the remote sensing images authentic and intact. Somewhat surprisingly, the models without the GP component perform poorly. This experiment provides new insight into the relationship between data augmentation and feature extraction.
Data selection provides important support for the training and performance of deep learning algorithms. Appropriate datasets can improve an algorithm's accuracy, generalization ability, and repeatability. For this forest change detection model, we chose data with simple backgrounds in which the main object of detection is prominent. The four datasets we selected are diverse, covering different categories, viewing angles, and lighting conditions, which helps the algorithm generalize better to new scenes. Among optical images, RGB is a standard color image format that typically offers relatively high spatial resolution, and its full-color representation allows image elements to be analyzed accurately. Given the four standard datasets we selected, RGB data are well suited to change detection in this scenario.
However, the proposed model still has some limitations. Hyperspectral images contain richer spectral and channel information than the RGB images used here, which is advantageous for discriminating seasonal changes of the forest. In addition, the feature extraction network added to the forest fragment generation process improves detection but slightly increases model complexity. Given these limitations, our future research will proceed in two directions. First, we will further optimize the forest change detection model, study a lighter model, and generalize it to other detection scenarios. Second, we will study the effects of illumination and shadows on images, extend the model's time-series detection capability, and study long time-series forest change detection based on Landsat images, which is more instructive for practical applications.

VI. CONCLUSION
This study provides a complete change detection model for forest remote sensing under few-shot conditions and improves change detection accuracy through both data augmentation and a few-shot learning algorithm. We propose a forest feature extraction network, a forest fragment generation network, and a seamless image synthesis method. Together, these increase the effective change information of real forest scenes without increasing the number of images, achieving data expansion at the positive-sample level. At the same time, we propose a dual-network change detection algorithm for few-shot learning and further improve the slow feature analysis network to achieve end-to-end forest change detection. More importantly, because the training datasets contain rich remote sensing elements, the proposed model can be transferred to other forest and grassland scenarios. Experiments verify that this model increases the F1 score by 6.52% on average, demonstrating the benefit of this two-pronged approach.
ACKNOWLEDGMENT
This article was produced by the IEEE Publication Technology Group, Piscataway, NJ, USA.