A High-Performance Multispectral Adaptation GAN for Harmonizing Dense Time Series of Landsat-8 and Sentinel-2 images

—The combination of data acquired by Landsat-8 and Sentinel-2 Earth Observation (EO) missions produces dense Time Series (TSs) of multispectral images that are essential for monitoring the dynamics of land-cover and land-use classes across the Earth’s surface with high temporal resolution. However, the optical sensors of the two missions have different spectral and spatial properties, thus they require a harmonization processing step before they can be exploited in Remote Sensing (RS) applications. In this work, we propose a workﬂow based on a Deep Learning (DL) approach to harmonize these two products developed and deployed on an High-Performance Computing (HPC) environment. In particular, we use a multispectral Gener-ative Adversarial Network (GAN) with a U-Net generator and a PatchGan discriminator to integrate existing Landsat-8 TSs with data sensed by the Sentinel-2 mission. We show a qualitative and quantitative comparison with an existing physical method (National Aeronautics and Space Administration (NASA) Har-monized Landsat and Sentinel (HLS)) and analyze original and generated data in different experimental setups with the support of spectral distortion metrics. To demonstrate the effectiveness of the proposed approach, a crop type mapping task is addressed using the harmonized dense TS of images, which achieved an Overall Accuracy (OA) of 87.83% compared to 81.66% of the state-of-the-art method.


I. INTRODUCTION
T HE availability of multispectral images systematically acquired by Remote Sensing (RS) satellites is pivotal for the observation of land surface change and dynamic processes [1], such as changes resulting from natural calamities [2], expansion of urban areas [3], vegetation anomaly and phenology changes [4], distribution of surface water resources [5], deforestation [6], etc.The time series (TS) of multispectral images acquired by the NASA/United States Geological Survey (USGS)'s Landsat-8 [7] and the European Space Agency (ESA)'s Sentinel-2 [8] missions are the most Rocco Sedona and Morris Riedel are with the Jülich Supercomputing Centre, Wilhelm-Johnen Straße 52428 Jülich, Germany, and the University of Iceland, 107 Reykjavik, Iceland (e-mail: r.sedona@fz-juelich.de;m.riedel@fzjuelich.de).
Claudia Paris and Lorenzo Bruzzone are with the Department of Information Engineering and Computer Science, University of Trento, 38122 Trento, Italy (e-mail: claudia.paris@unitn.it;lorenzo.bruzzone@ing.unitn.it).
widely accessible moderate-to-high spatial resolution RS satellite images.
Landsat-8 was launched in 2013 and carries the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS).It acquires multispectral images at 30 m spatial resolution, which is suitable for a wide variety of tasks.However, Landsat-8 can only revisit the same area every 16 days, which is not sufficient in applications requiring more frequent observations (e.g., near real-time monitoring of continuous processes [9]).The Sentinel-2A and Sentinel-2B are the two polar orbiting satellites of the Sentinel-2 constellation that were launched in 2015 and 2017, respectively.This constellation can reach a revisit time of 5 days at the equator (and even less for areas covered by more than one orbit) and acquire 13 optical bands with 10, 20 and 60 m spatial resolution.
The starting of the Sentinel-2 mission has opened potential opportunities for combining its data with the ones acquired by Landsat-8 to achieve more dense observations.In particular, their integration can densify the acquired TSs and increase the revisit time up to 3-5 days [10] and obtain more frequent cloud-free surface observations.Furthermore, the spatial resolution and spectral configuration (i.e., placement and number of spectral bands) of the Sentinel-2 sensor were designed to be compatible to analogous bands in Satellite Pour l'Observation de la Terre (SPOT) and Landsat sensors [11].Consequently, many research works have exploited virtual constellations of Sentinel-2 and Landsat-8 for addressing different types of applications, for example to assess winter wheat yields at regional scale [12], estimate number and timing of mowing events of grasslands [13], monitor aquatic systems [14], retrieve the temporal variations in biochemical and structural vegetation properties [15], estimate inland water quality [16], detect irrigated areas [17], analyse land productivity and yield assessment [18], map land surface phenology at continental scale [19], determine the spatial distribution of evergreen forest in cloudy and rainy areas [20], etc.
Despite the similarity between Sentinel-2 and Landsat-8 observations, the two missions have different spatial resolution, field of view spectral bandwidth, and spectral response function.Consequently, before using together Sentinel-2 and Landsat-8 images, it is necessary to apply models for cross-sensor data integration [21] [22] [23].Linear regression is the most widely used approach to reduce the spectral differences between the two sensors.The authors in [24] used Bidirectional Reflectance Distribution Function (BRDF) correction and data re-sampling to attenuate the difference introduced by the different field of view and spatial resolution, respectively.Other studies designed regional fixed per-band transformation coefficients for applying reflectance adjustment in Australia [25], southern Africa [26], and United States [27].
Since 2018, NASA is producing a Harmonized Landsat and Sentinel (HLS) dataset1 to further improve the temporal resolution of the combined product [28].NASA proposed a method that creates global fixed per-band transformation coefficients to reduce the reflectance difference between Landsat-8 and Sentinel-2 and generate smooth spectral TSs.In particular, the approach takes into account the differences in spatial resolution, atmospheric correction approaches, view geometry and radiometric characteristics of spectral bands.ESA has considered this approach as a reference work for the definition of the Sen2Like framework [29].The objective of Sen2Like is to generate Sentinel-2 like harmonised/fused surface reflectances with higher periodicity by integrating additional compatible optical mission sensors.The current implementation (November 2020) can harmonize Landsat-8 and Sentinel-2 data products2 .The authors in [30] observed that these methods can reduce the reflectance difference to only some degree.It is possible that the regional or global scale fixed per-band transformation coefficients may not be suitable for all land cover types and at all geographical locations.To mitigate this problem, they proposed a time-series-based approach 3 to improve the consistency of the HLS datasets, which uses the TSs of matched Landsat-8 and Sentinel-2 observations to build linear regression models for each pixel.They then conducted the reflectance adjustment for each individual pixel separately.
Instead of using a physical method or fitting the transformation coefficients of a linear regression, in our work we developed an approach based on Machine Learning (ML), and more specifically on a Generative Adversarial Network (GAN) architecture to harmonize the Sentinel-2 and Landsat-8 products, transforming the data acquired by the Sentinel-2 Multi-Spectral Instrument (MSI) sensor into Landsat-8 OLIlike data.In the last decade DL has enabled a leap in the quality of a wide variety of applications in RS [31].In particular, GANs were first presented by [32] in 2014 and are based on the training with the backpropagation algorithm of two sub-models, a generator and a discriminator.An extension of GANs are the Conditional GANs [33], in which the generator is given additional information to better approximate the distribution of the real samples.The competitive game of one model against the other pushes the generator to create new fake examples that are indistinguishable from real ones.While the generator creates new data from an input distribution, the discriminator is devoted to discern the real and generated examples looking at their distribution.For these reasons, GAN have attracted much research efforts to computer-vision-related tasks [34].
GANs have been employed also in different RS applications.Among those, a promising application is super-resolution, where GANs offer the ability to retrieve high-frequency components that seem not to be captured by existing Convolutional Neural Networks (CNNs) [35], thanks to the contribution of the adversarial loss [36] [37] [38].Chen et al. [39] proposed a GAN-based approach to super-resolve Landsat-8 images and reconstruct them to be Sentinel-2-like using the true color composite of RGB bands.In our approach we propose the opposite direction of the data flow, from Sentinel-2 to Landsat-8 data, as our proposed method focuses on radiometric consistency rather than spatial resolution.Moreover, we also use the near infrared (NIR) and the short wave infrared (SWIR) bands, which are extremely important to perform environmental monitoring (e.g., vegetation biophysical and biochemical variable retrieval, ice detection, etc. ).In particular, the NIR and SWIR spectral channels provide key information on vegetation and crops status.GANs have been applied also to other tasks, such as to enhance the detection of small objects in RS data with an adaptation of the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) [40], or to change detection with multi-sensor data with the use of a CycleGAN [41].Conditional GANs were used also for the fusion of acquisitions from Synthetic Aperture Radar (SAR) and optical sensors, e.g., in [42] optical data were reconstructed from SAR and in [43] a GAN was used to fuse SAR and optical multispectral data for cloud removal.
A well known bottleneck of employing DL models is the large amount of computational resources that are needed for the training phase.DL models require to be fed with large amounts of data in order to learn meaningful features, thus implying the need for dedicated pipelines for extraction and handling of such data, which can impact severely the performances of the methods.Despite the great success of CNNs, their deployment on commodity hardware (e.g., desktop computers, laptops) is often challenging, given their computational power and memory constraints.High-Performance Computing (HPC) systems can come at aid in that regard, offering dedicated hardware accelerators to efficiently deploy and scale-up processing workflows and significantly enhancing their computational performance (i.e, reported as Floating Point Operations per Second (FLOPS)).HPC systems are on the verge of entering into the new era of exascale computing in the coming years, as currently the most powerful computers can reach hundreds of PetaFLOPS 4 .A large number of fields of research use HPC systems for addressing data storage challenges and developing scalable data processing workflows: from climatology to astrophysics, medicine and industrial applications [44].In RS, HPC has been an essential component from the very beginning in the field of EO since its technology and applications include unique data processing, storage or transmission requirements [45] [46].In the current era of Artificial Intelligence (AI) supercomputers (i.e., HPC systems equipped with specialized hardware accelerators [47]), applications from RS also use them to speed-up the processing of DL models that include a high number of trainable parameters [48].
From this brief analysis of the literature, it turns out that the integration of the multispectral images acquired by Landsat-8 and Sentinel-2 is extremely interesting from the operational view point due to the complementary properties of the two sensors.While Landsat satellites are approaching 50 years of continuous global data collection with a temporal revisit of 16 days, the recent launch of Sentinel-2 allows for the acquisition of images having a very high revisit time (i.e., 5 days at the equator with 2 satellites which results in 2-3 days at mid-latitudes).In this context, Sentinel-2 images can be used to generate Landsat-8 like images (from the spectral and spatial view point) with the aim of having dense TSs of images compatible with the TSs of real Landsat-8 available in the past.Such long and dense TSs of images allow for longterm environmental analyses, which are extremely important for several applications (i.e., climate change, deforestation analysis, desertification, urban monitoring, etc.).
In the literature, the integration of Landsat-8 and Sentinel-2 images has been mainly addressed by the RS community considering physical methods or regression models due to: (i) their capability of properly handling the harmonization problem from the physical view point, and (ii) their low computational burden.The latter is particularly important when working at country or continental scale, where the optical pre-processing has to be applied over a hundred of images.However, such methods can only partially mitigate the reflectance difference and may fail in heterogeneous areas where complex non-linear harmonization problems have to be solved.In this framework, it is necessary to define an automatic system suitable for all land cover types and at all geographical locations, which is able perform the integration of these data in a fast and efficient way.This paper presents an automatic work-flow which aims to facilitate the integration of the optical satellite images acquired by Landsat-8 and Sentinel-2 spectral sensors at operational level.Differently from the literature, the proposed system architecture takes advantage from the capability of the GAN to accurately learn and model the considered non-linear problem, while preserving the spectral and spatial properties of the two satellite sensors.To mitigate the computational cost of the required DL models, we take advantage of HPC systems to deploy a parallel and scalable processing workflow that encompasses the extraction of the features from the input tiles, the training of the model and the reconstruction of the harmonized Landsat-8 and Sentinel-2 data product.The speedup of the training of the DL model is obtained thanks to the adoption of a data parallel strategy, which distributes the training of the GAN on multiple GPUs.The main contributions of this work are: (1) the definition of a multispectral adaptation GAN tailored to the peculiar properties of Sentinel-2 and Landsat-8 in terms of spatial resolution, spectral bandwidth, and spectral response function, (2) the implementation of a fully automatic and unsupervised dedicated pipeline, ready-to-use, being able to ingest Sentinel-2 and Landsat-8 data and to produce a dense TS of optical satellite images, and (3) the efficient implementation of a parallel and scalable processing workflow developed and deployed on an HPC environment on up to 16 GPUs thanks to the adoption of a data distributed strategy, which contributes to mitigate the computational burden of the training.

II. PROPOSED MULTISPECTRAL ADAPTATION GAN
The aim of this work is to generate harmonized dense TSs of Landsat-8 and Sentinel-2 images.To this end, we propose a Multispectral adaptation GAN (MGAN) model tailored to the specific properties of the considered satellite optical data.Our objective is to model the spatial and spectral properties (Point Spread Function) of the two sensors in order to adapt the Sentinel-2 data to be Landsat-8 like.Indeed, the proposed GAN is tailored to the specific spectral and spatial properties of the considered sensors to facilitate the adaptation of the Sentinel-2 images to the Landsat-8 ones.In particular, the proposed architecture is build upon the established pix2pix conditional GAN [49] that was designed for color and grayscale image-to-image translation.Based on the GAN concept, the adversarial game played by the two models of the original pix2pix architecture [49] can be represented by the formula: where E is the expected value, X and Y are the source and target images (having the same resolution), z the input noise of the generator and V (G, D) is the value function.In particular, the generator G and the discriminator D of pix2pix are a Unet encoder-decoder architecture with skip connections and a PatchGAN, respectively.In the U-net encoder-decoder generator [50], the first part contains a number of downsampling convolution layers.The second part is a mirrored version of the first, with a transposed convolution for upsampling the data, which flows from the bottom to the top of the U-net through a bottleneck.The skip connections, which link the inner layers of the encoder and decoder, allow low-level information to pass directly from the first to the last layers of the U-net.
Differently from the original implementation of the pix2pix, the input data are no more RGB natural images, but multiresolution and multiband images with different spectral properties.To handle the peculiarities of the considered RS data, we trained the proposed MGAN from scratch using paired Landsat-8 and Sentinel-2 images.Table I reports the properties of the considered spectral bands in terms of spatial and spectral resolutions for both the considered optical sensors.According to the spectral characteristic of Sentinel-2 and Landsat-8, we focused the attention on the four 10 m bands and the two shortwave infrared spectral channels acquired at 20 m by Sentinel-2 (i.e., the spectral bands consistent with the Landsat-8 ones).Let us focus on the multiresolution Sentinel-2 images.Let X HR ∈ R d1×d1×LHR and X LR ∈ R d2×d2×LLR be the set of high resolution (10 m) and low resolution (20 m) spectral channels of Sentinel-2, respectively, where X HR has d 1 × d 1 pixels and L HR bands while X LR has d 2 × d 2 pixels and L LR bands.Let Y ∈ R d3×d3×L3 be the real Landsat-8 image contemporary to the Sentinel-2 one, having d 3 × d 3 pixels and a number of bands equal to L 3 = L HR + L LR .
In the considered implementation of the proposed MGAN, the bottom of the generator has been modified to take as input The PatchGAN discriminator is designed to capture the patterns at the scale of the input image.Its objective is to classify N × N patches of G(X, z) (the input synthetic patch created by the generator) and Y (the target Landsat-8 patch) as fake or true, encouraging the generator to produce more accurate and realistic outputs.Differently from the standard pix2pix implementation, the considered MGAN does not perform the instance normalization [51], since it is not suited to multispectral images.Indeed, similarly to the case of the standard batch normalization typically used in computer vision, the patches may not be consistent from the spectral view point.For this reason, in the model we added the spectral normalization right after the instance normalization in the downsampling blocks of the discriminator [52].The addition of those layers in the discriminator is beneficial for the stability of the training and the spectral content of the obtained synthetic Landsat-8 images.The addition of those layers in the discriminator is beneficial for the stability of the training and the spectral content of the obtained synthetic Landsat-8 images.In greater details, we train the generator and discriminator jointly, employing two losses.The L 1 loss is used in for the training of the generator to learn a lowfrequency representation: where Ŷ = G(X, z) is the generated image obtained considering as input the Sentinel-2 image X and Y is the target Landsat-8 image.We adopted a relativistic adversarial loss for the discriminator (shown in equation 3) as a replacement of the original adversarial loss employed in pix2pix.Using the relativistic loss, instead of the absolute probability that one input image is real or fake, the relative probability that a real image is more realistic than a fake one is computed [53].The adoption of the relativistic adversarial loss increases for the stability of the training [54].The discriminator loss is: and the generator loss: where Y and Ŷ are the real and the fake generated images, respectively, and D Ra is the output of the discriminator.
To properly train the considered MGAN from scratch, we implemented data augmentation.The lack of large amount of data is known to pose several challenges during the training of GAN, since in that setting the discriminator tends to fool the generator easily, which in turn gets stuck and cannot improve anymore.This is particularly true when dealing with RS data.[55] recently introduced a data augmentation technique specifically designed to work with GANs.Differentiable augmentation addresses this issue by applying the same set of transformations on both the generated and real images, regularizing the discriminator and reducing training instability.We adopted the color (contrast, brightness, saturation), translation (the images are translated and zero padded) and cutout (masking a region of the images) policies.

III. DATASET DESCRIPTION AND DESIGN OF EXPERIMENTS
In this section, we present the considered study area and the RS data employed to test the proposed approach.Then, we describe in detail the procedure designed to generate the harmonized TS of Sentinel-2 and Landsat-8 images.

A. Dataset Description
Fig. 2 presents the considered study area, which covers the valley of the Donau in the proximities of Linz, Austria (tile 33UVP of Sentinel-2, tile 191/026 of Landsat-8).Such area is characterized by a heterogeneous landscape typical of the Alpine region, where the topography ranges from high mountain to lowlands areas.The land cover is characterized by the presence of many crop types, which model a complex scenario since crops rapidly change their textural and spectral features.Moreover, the study area is heavily affected by cloud and snow coverage.Due to high temporal resolution of Sentinel-2, several pairs of real Landsat-8 and Sentinel-2 images acquired at the same date (or a one day of distance) are used to train the GAN network from scratch.Table II reports the acquisition dates of the considered images collected in Spring and Autumn.Only images having low cloud coverage (smaller than 30%) were used to train the MGAN.
To assess the capability of the trained MGAN to correctly generate synthetic Landsat-8 data from Sentinel-2 images, the Sentinel-2 data acquired on 03/07/2018 was not involved in

B. Design of the Experiments
To train the considered MGAN, both the Landsat-8 and Sentinel-2 images are split into patches.Fig. 3 reports the different stages of our method, from the training of the model to the prediction and reconstruction of the entire tiles.First, the Landsat-8 images are warped to extract the region overlapping with the Sentinel-2 tiles, by applying a nearest neighbor resampling strategy that does not affect the spectral content of the image.Then, possible spectral outliers are removed from both images.To this end, we considered the standard procedure of saturating the pixel values below and above the 1 and 99 percentiles of the spectral distribution computed per band.Finally, the paired patches were extracted from the two original TSs of Landsat-8 and Sentinel-2.For the Landsat-8 data, the dimension of the patches is 128 px × 128 px (30 m resolution), while for Sentinel-2 they are 192 px × 192 px (for the 20 m resolution bands) and 394 px × 394 px (for the 10 m resolution bands).In particular, a stride of half the dimension of the patch is considered to generate overlapping patches, thus increasing the number of samples.Patches with a significant cloud or snow coverage are not used during training and are excluded with the usage of the available masks.The information provided by the cloud masks of Landsat-8 (i.e., pixel qa band) and Sentinel-2 (i.e., SCL band) are used to define the valid patches for training.The pixel values of each patch are normalized per band by subtracting the mean and dividing by the maximum value.Once that the GAN is trained, it can be used to predict synthetic Landsat-8 images by using Sentinel-2 data.During prediction, each original Sentinel-2 patch is fed into the generator and the corresponding synthetic Landsat-8 patch is produced.The final step is the reconstruction of the entire image from the predicted patches.We applied a buffer equal to 1/4 of the dimension of the patch when fusing them.The tile is then reconstructed using only the central part of the patches, skipping the buffers to limit distortions caused by the convolution operations at the edges of the patches.

IV. IMPLEMENTATION AND EXPERIMENTAL SETUP
In this section, details are given on the implementation and computational setup.Moreover, the quality indexes used to quantitatively evaluate the proposed method are reported in the experimental setup section.

A. Implementation Setup
Of the two main families to distribute the training of a model [56], we used the data distribution approach (i.e., data parallelism).Among the different frameworks that exist to integrate a data distributed strategy into existing code we adopted Horovod [57], a library that offers a flexible API that works on top of most DL libraries, i.e., TensorFlow, Keras, PyTorch and MXNet.Horovod makes use of Message Passing Interface (MPI) and the NVIDIA Collective Communication Library (NCCL) to implement a decentralized and efficient ring-allreduce algorithm [57], which allows the computation of the gradients in a distributed fashion.We used ADAM with base learning rate lr = 0.0001 for the optimization of both the generator and the discriminator, which we scaled linearly w.r.t. the number of Graphics Processing Units (GPUs), without warm-up phase and learning rate schedulers.The training was performed for 100 epochs, as after that point the L 1 loss begins to diverge and the quality of the predicted patches deteriorates.The weights of the U-Net and of the PatchGAN were initialized with the default Glorot uniform distribution [58].The local batch size used for each GPU is 16, therefore the resulting maximum global batch size used in the present work, computed as global batch size = number gpus × local batch size, is equal to 256.

B. Experimental Setup
The experiments were carried out on the Extreme Scale Booster (ESB) partition of the of the Dynamic Exascale Entry Platform -Extreme Scale Technologies (DEEP-EST) and on the booster partition of the Jülich Wizard for European Leadership Science (JUWELS) supercomputers at the Jülich Supercomputing Centre (JSC) [59].The training was scaled on up to 16 Nvidia Tesla V100 and A100 Graphics Processing Units (GPUs).We used Horovod data-parallel framework on top of TensorFlow2, with a custom made training loop.The data preprocessing was deployed on the JUWELS system [47].We used the Geospatial Data Abstraction Library GDAL 2.3.2 through its Python API.
To quantitatively evaluate the results obtained we considered several spectral distortion metrics typically used in the literature.In particular, we considered the Relative Dimensionless Global Error (ERGAS), the Spectral Angle Mapper (SAM), the Root Mean Square Error (RMSE), the Universal Image Quality Index (UIQI) and the Peak Signal-to-Noise Ratio (PSNR) measures on the valid patches (i.e., low cloud coverage).SAM [60] measures the spectral distortion in terms of angle between the vectors of the reference image and generated image: where Y is the real input and Ŷ the predicted input.The lower is the value of SAM, the lower the presence of spectral deviations between the two images.ERGAS measures the quality of the generated image compared to the reference image as a normalized mean square error between each band of the two images [61]: where 1 S is the ratio between the pixel sizes (i.e., equal to one in our case), Y l and Ŷl are the lth bands of the generated image and of the reference image, respectively; the MSE(Y l , Ŷl ) is the mean squared error between Y l and Ŷl and µ Ŷl is the mean of Ŷl .As for SAM, a low value of ERGAS implies a low presence of distortion in the generated image compared to the reference.The RMSE is defined as: where Y is the original input, Ŷ the predicted input.
The UIQI [62] has been computed on a sliding window of size 32 × 32 pixels, and averaged over all window positions per band.Let y j and ŷj denote the jth windowed segment of a single band of the reference and the simulated images, respectively.The UIQI is given by: where σ yj ŷj is the covariance between y j and ŷj , σ yj and µ yj are the standard deviation and the mean value of y j , while σ ŷj and µ ŷj are the standard deviation and the mean value of ŷj , respectively.This index has a range of [-1, 1], being equal to 1 when y = ŷ.To extend the UIQI index to the multiband case, we average the band indexes as follows: The PSNR is defined as: where λ is the number of levels of the images.

V. EXPERIMENTAL RESULTS
In this section, first we present the quantitative results obtained in terms of spectral distortion metrics.The quantitative evaluation is provided together with qualitative examples of the obtained synthetic Landsat-8 images.Finally, the generated TS of synthetic Landsat-8 images is used to perform a crop type mapping task to assess the capability of the network to accurately reproduce the spectral properties of the data.The proposed approach is compared with the physical method HLS [28] developed for reducing the reflectance differences between Landsat-8 and Sentinel-2, thus generating smooth spectral TSs.Please note that such method is widely used from the operational view point [29].

A. Quantitative and Qualitative Results
Table IV reports the results obtained for different spectral distortion metrics comparing the original Landsat-8 images and: (1) the synthetic Landsat-8 images produced by the proposed MGAN, (2) the harmonized Landsat-8 images generated using the baseline method HLS, and (3) the original contemporary Sentinel-2 images.The best results are highlighted in bold.Please note that the evaluation of the spectral difference between real Landsat-8 data and Sentinel-2 data is reported to evaluate the capability of the methods to reduce the spectral difference of these data.
From the results obtained, one can notice that the metrics computed between Landsat-8 and Sentinel-2 images demonstrate the need of harmonizing these data from the spectral view point.The HLS reduces the spectral distortion for some spectral bands.However, for all the metrics, the best results are achieve by the synthetic Landsat images generated with the proposed MGAN.In particular, the MGAN is able to correctly reproduce the spectral properties of Landsat-8 regardless of the spectral bands.Indeed, similar error metrics are achieved in both the RGB spectral channels (i.e., Band2, Band3 and Band4) as well as the near infrared (Band5) and shortwave infrared bands (Band6 ad Band7).
The results obtained from the quantitative view point are confirmed from the qualitative ones.In order to assess the consistency between the generated and the target data, Fig. 4 reports some portions of the: (1) original Landsat-8 image (target), (2) syntethic Landsat-8 data produced by the MGAN, (3) harmonized Landsat-8 data produced by the baseline method (HLS), and (4) contemporary Sentinel-2 image used to generate the Landsat-8 data.The synthetic image produced by the MGAN looks more similar to the original Landsat-8 image than the original Sentinel-2 input data and the harmonized Landsat8 data produced by the HLS method.These results also confirm that the quality of the generated images is good and does not suffer from significant distortions and artifacts.From the results obtained, one can notice that the generated data looks more similar to the original Landsat 8 image than the original Sentinel-2 input data and the harmonized Landsat-8 data produced by the HLS method.For instance, the presence of bright buildings absent in the real Landasat-8 images (see Fig. 4a) is visible in the harmonized data produced by the HLS method (Fig. 4c) but not present in the synthetic data produced my the MGAN (see Fig. 4b).

B. Crop Type Mapping Results
To assess the capability of the proposed MGAN to accurately model the spectral information of Landsat-8, a crop type mapping task was carried out using the obtained TS of produced synthetic images.This peculiar classification task requires the availability of accurate multitemporal and multispectral information to properly retrieve the crop types present in the scene.Indeed, differently from other land-cover classification tasks that can be performed using mono-temporal data, the temporal information is fundamental to accurately model the phenological trend of the crop types.
Table V reports the classification results obtained by considering TSs of: (1) 5 synthetic Landsat-8 images produced by the proposed MGAN, (2) 5 harmonized Landsat-8 images obtained by using the baseline methods (HLS), and (3) 6 synthetic Landsat-8 images produced by the proposed MGAN.The TSs of 5 images were produced the proposed and the baseline methods using the original Sentinel-2 and images.To generate the TS of 6 images, we considered the Sentinel-2 image acquired on 03/07/2018 for which no corresponding cloudless Landsat-8 data are available.Since no cloud-less images were acquired by the Landsat-8 sensor in July 2018 for the considered tile, no quantitative evaluation can be performed in terms of spectral distortion metrics.However, the PA%, UA%, F1% and OA% confirm the quality of the added image.The classification is performed by training a standard Support Vector Machine (SVM) with RBF kernels [63].The optimal kernel parameters (i.e., the regularization parameter C and the spread of the kernel γ) were selected by a 5-fold cross-validation.This test case demonstrates the need to densify existing TSs of satellite data.The temporal and spectral information provided by the satellite acquisition of July 2018 sharply increases the classification results by improving the modelling of the phenological trends of the considered crop types.This increases the OA% from 87.83% (TS of 5 synthetic Landsat-8 images) to 91.53 % (TS of 6 synthetic Landsat-8 images).From these results, we can conclude that the proposed MGAN can be used to generate harmonized dense TSs of Landsat-8 and Sentinel-2 images.

C. Scaling efficiency
The adoption of Horovod allowed us to distribute the training on multiple GPUs and significantly reduce the time required to complete the optimization of the model.The maximum number of GPUs used in the present work is 16, a configuration with which we obtained a speed-up of 14x on the JUWELS-BOOSTER and 12x on the DEEP-ESB partitions compared to the use of a single GPU (shown in Fig. 5).The scaling efficiency was close to 90% on the JUWELS-BOOSTER and above 75% on the DEEP-ESB partitions, respectively.In both cases, the scaling efficiency declined more steeply with 8 and more GPUs, possibly due to the increased communication time (time spent to synchronize the gradient among the GPUs) w.r.t. the computation time (time spent to optimize the model on each local GPU, which decreases proportionally to the increase of the number of GPUs, since each GPU is fed a smaller portion of the entire dataset).It can be noted that the efficiency shrinks more prominently on the DEEP-ESB partition.This behaviour could be explained by the fact that on the DEEP-ESB partition each node is equipped with only one V100 GPU, while each node of the JUWELS-BOOSTER partition has 4 GPUs.This means that when using the DEEP-ESB partition the communication is only internode (the nodes are connected through InfiniBand), while on the JUWELS-BOOSTER partition the communication takes place both inter-and intra-node (faster NVLink connections).We performed 3 runs for each experiment, and the reported results are the average and standard deviation.Fig. 6 shows the training time that was reduced from 175 and more than 200 seconds using 1 GPU to 12 and 14 seconds per epoch (16 GPUs) on the DEEP-ESB and JUWELS-BOOSTER partitions, respectively.The JUWELS-BOOSTER, which features the newer A100 GPUs, allowed us to obtain a 20% increase in performances in terms of training time compared to the V100 installed on the DEEP-ESB partition.

VI. CONCLUSION
In this paper we introduced a method to densify and harmonize TSs of images acquired by Landsat-8 and Sentinel-2 satellite.The proposed method, which is based on a multispectral adaptation GAN, was applied to a TS which covers 6 acquisitions in 2018.We designed an experimental setup to validate our approach by comparing it with the well established HLS.The results obtained demonstrate that the proposed GAN is able to accurately reconstruct the spectral properties of Landsat-8 by using the Sentinel-2 images.Moreover, the qualitative comparison with the baseline method confirms the quantitative evaluation of the spectral distortion metrics.Although the physical model employed to harmonize Sentinel-2 and Landsat-8 is a powerful tool to generate long and dense TSs of optical satellite images, the proposed method achieves more accurate results from the spectral view point.Another  important result is provided by the classification accuracy obtained when considering the TS of 6 images, which allow us to test the capability of the network to accurately predict synthetic Landsat-8 images never used to train the MGAN.The OA% was increased from 87.83% (TS of 5 synthetic Landsat-8 images) to 91.53 % (TS of 6 synthetic Landsat-8 images).Moreover, we deployed the entire workflow in an HPC environment, and with the utilization of Horovod we could make an efficient use of the resources provided by such system, reducing the time required for the training of the model.
Although in this work we demonstrated that our approach can successfully densify TSs of Landsat-8 images, several challenges remain open.We focused our attention on one single region where we could validate our method also in terms classification; however, our approach should be also extended to include different areas in the future.A strategy to ingest new data from different TSs and scale the training should be drawn up, in order to make the training of the with larger amount of data feasible in a reasonable amount of time.Further effort should be also put on finding the optimal hyperparameters of the training, such as the optimizers, learning rate, scheduler.Neural Architecture Search (NAS) could be employed to optimize the structure of the model, i.e., the number and type of layers, the activation functions, etc.Further loss functions should be also added, although this would significantly increase the space of the hyperparameters search, and a trade-off with available computational resources should be found.A repository with the code is available at https://gitlab.jsc.fz-juelich.de/sedona3/mgan.

Fig. 1 .
Fig. 1.Flowchart of the modified U-Net tailored to the peculiar spectral and spatial properties of Sentinel-2 and Landsat-8.

Fig. 2 .
Fig.2.True color representation of the Sentinel-2 image acquired on the 21/04/2018 over the considered study area (coordinates are reported in the UTM WGS84 33N system).An example of the reference data used to perform the crop type classification task is reported in the zoom area highlighted in red.

Fig. 3 .
Fig. 3. Flowchart of the different stages of the proposed method.It receives as input TSs of Landsat-8 and Sentinel-2 acquired over the same geographical area.Firstly, the warping step aligns the two TSs.Then, the patch extractor generates paired and overlapping patches (i.e., training samples) that are trained by the proposed MGAN.The final stage reconstructs the whole synthetic Landsat-8 image.

Fig. 5 .
Fig. 5. Training time per epoch w.r.t. the number of GPUs on the JUWELS-BOOSTER and DEEP-ESB partitions.

Fig. 6 .
Fig. 6.Time per epoch w.r.t. the number of GPUs on the JUWELS-BOOSTER and DEEP-ESB partitions.

TABLE I SPECTRAL
BANDS OF LANDSAT-8 AND SENTINEL-2 SELECTED ACCORDING TO THE SPECTRAL AGREEMENT OF THE OPTICAL SENSORS.

TABLE II LANDSAT
-8 (TILE 191/026) AND SENTINEL 2 (TILE 33UVP) IMAGES USED IN THE EXPERIMENTS.FIVE IMAGES WERE USED TO TRAIN THE MGAN, WHILE THE SENTINEL 2 IMAGE ACQUIRED ON THE 03/07/2018 WAS USED FOR PREDICTION ONLY.

TABLE IV SPECTRAL
(1)TORTION METRICS BETWEEN THE ORIGINAL LANDSAT-8 DATA AND:(1)THE SYNTHETIC LANDSAT-8 IMAGES GENERATED USING THE PROPOSED MGAN, (2) THE HARMONIZED LANDSAT-8 IMAGES GENERATE USING THE BASELINE METHOD HLS, AND (3) THE ORIGINAL CONTEMPORARY SENTINEL-2 IMAGES.THE OBTAINED RESULTS ARE THE AVERAGE VALUES OVER THE 5 IMAGES OF THE CONSIDERED DATASET.RESULTS ARE PROVIDED PER SPECTRAL BAND AND OVERALL.THE BEST RESULTS ARE HIGHLIGHTED IN BOLD.

TABLE V CROP
TYPE MAPPING RESULTS OBTAINED BY CONSIDERING TSS OF: (1) 5 SYNTHETIC LANDSAT-8 IMAGES PRODUCED BY THE PROPOSED MGAN, (2) 5 HARMONIZED LANDSAT-8 IMAGES OBTAINED BY USING THE BASELINE METHOD (HLS), AND (3) 6 SYNTHETIC LANDSA-8 IMAGES PRODUCED BY THE PROPOSED MGAN.THE TS OF 5 IMAGES WERE PRODUCED BY THE PROPOSED AND THE BASELINE METHODS USING BOTH THE ORIGINAL SENTINEL 2 AND LANDSAT 8 IMAGES.TO GENERATE THE TS OF 6 IMAGES, A SENTINEL-2 IMAGE ACQUIRED ON THE 03/07/2018 FOR WHICH NO CORRESPONDING CLOUDLESS LANDSAT-8 DATA ARE AVAILABLE WAS USED.PA IS THE PRODUCER'S ACCURACY OR RECALL, UA IS THE USER'S ACCURACY OR PRECISION, F1 IS THE F1-SCORE AND OA IS THE OVERALL ACCURACY.