Spectral Synthesis for Geostationary Satellite-to-Satellite Translation

Earth-observing satellites carrying multispectral sensors are widely used to monitor the physical and biological states of the atmosphere, land, and oceans. These satellites have different vantage points above the Earth and different spectral imaging bands resulting in inconsistent imagery from one to another. This presents challenges in building downstream applications. What if we could generate synthetic bands for existing satellites from the union of all domains? We tackle the problem of generating synthetic spectral imagery for multispectral sensors as an unsupervised image-to-image translation problem modeled with a variational autoencoder (VAE) and generative adversarial network (GAN) architecture. Our approach introduces a novel shared spectral reconstruction loss to constrain the high-dimensional feature space of multispectral images. Simulated experiments performed by dropping one or more spectral bands show that cross-domain reconstruction outperforms measurements obtained from a second vantage point. Our proposed approach enables the synchronization of multispectral data and provides a basis for more homogeneous remote sensing datasets.


I. INTRODUCTION
Climate change and related environmental issues, including the loss of biodiversity and extreme weather, are listed by the World Economic Forum as the most important risks to our planet [1]. Monitoring the Earth is critical to mitigating these risks, understanding the effects, and making future predictions [2]. Multispectral and hyperspectral satellite-based remote sensing enables global observation of the Earth, allowing scientists to study large-scale system dynamics and inform general circulation models [3]. In weather forecasts, satellite data initialize the atmospheric state for future predictions. On longer time scales, these data are used to measure the effects of climate change, such as land-cover variations, temperature trends, solar radiation levels, and the rate of snow/ice melt. In the coming decades, increased investments from the public and private sectors in satellite-based observations will continue to improve global monitoring, as highlighted in NASA's decadal survey [4].
Satellites are designed based on specifications for a given set of applications with fiscal, technological, and physical constraints that limit their temporal, spatial, and spectral resolutions. Geostationary (GEO) satellites rotate with the Earth to stay over a constant position above the equator at a high elevation of 35 786 km. This position enables GEO satellites with onboard multispectral imagers to take continuous and high-temporal snapshots over large spatial regions and is ideal for monitoring diurnal and fast moving events. Spectral bands measure the brightness and radiance intensities of the electromagnetic spectrum at a specified center wavelength and bandwidth. Bands are selected to satisfy defined variables of interest constrained by technological cost and accuracy. Applications of GEO sensors include atmospheric winds measurement [5], tropical cyclone tracking [6], wildfire monitoring [7], and short-term forecasting [8]. Multiple GEO satellites are needed to generate global high-temporal resolution datasets to better monitor these events around the world. However, variations in resolutions, sensor uncertainties, and temporal life spans lead to a set of separate datasets that are not consistent, making this process very challenging [3]. Developing consistent and homogeneous global datasets would relieve many of these challenges.
The current generation of GEO satellites (shown in Fig. 1) is no exception. The GOES-16/17 satellites operated by NASA/NOAA (cost: $11 billion) have a set of 16 imaging bands covering the visible, near-, and thermal-infrared spectral range [9]. The Himawari-8 satellite operated by the Japanese Space Agency (cost: $800 million) similarly has 16 bands but swaps an NIR (1.38 μm) band for a green channel (0.51 μm), enabling the construction of true-color images [10]. The 1.38-μm band is ideal for measuring cirrus clouds, composed of ice particles in the upper troposphere, a major contributor to regulating the Earth's climate that is not yet well understood [11], [12]. Without capturing this band, directly observing cirrus clouds over Japan, East Asia, and the Western Pacific region from Himawari-8 is not possible. Synthetic observations via virtual spectral sensors could be a low-cost solution to improving coverage availability and consistency with current satellites. We present an approach to generate synthetic spectral channels from a multidomain unpaired satellite dataset. We treat satellites with either dissimilar spectral coverage or varying vantage points as separate spectral sets. In this way, the problem closely resembles that of colorization [13] and image-to-image translation tasks [14]-[16] in the case where paired images are not available but with the added complexity of a large number of spectral bands. We use a combination of variational autoencoder (VAE) and generative adversarial network (GAN) [17] architectures adapted to our problem to model a shared latent space, as in unsupervised image-to-image translation [14]. Generating synthetic bands is an underconstrained problem that, when paired with an adversarial loss in high dimensions, promotes overfitting. Our approach mitigates these challenges by leveraging a weak supervision signal based on partial overlap in spectral bands between domains. 
By including a reconstruction loss on overlapping spectral bands between domain pairs, we can substantially improve spectral band synthesis.
To summarize our contributions, we: 1) introduce a shared spectral reconstruction loss to a VAE-GAN architecture for synthetic band generation; 2) test our methodology on real-world scenarios; and 3) present and release a test dataset of 2000 tiles of paired observations from the GeoNEX L1G GEO imagery for future research. In Sections II-IV, we introduce related work in remote sensing and image-to-image translation, describe the architecture, and review experiments. Finally, in Section V, we discuss the implications of this work and conclude with future directions.

II. RELATED WORK
A. Remote Sensing
Current generation GEO satellites observe 16 spectral bands over large regions every 10-15 min at a 0.5-2-km resolution. At a suboptimal 2 km, this produces full-disk images of size 5424×5424×16, which strains storage and is computationally expensive to process. Physical and statistical models are used to convert these images into more easily interpreted variables, such as precipitation, cloud cover, and surface temperature [18]. Multiple GEO satellites currently in orbit extend the spatial range to actively monitor larger regions. However, differences in spectral bands and sensor uncertainties/biases present challenges to commonly used sensor-specific models; in particular, existing downstream models do not generalize well to missing spectral information.
Spectral band adjustment is often applied to cross calibration of sensors using relative spectral responses with a hyperspectral sensor, such as Hyperion [19], [20]. This approach uses Hyperion to calculate spectral band adjustment factors (SBAFs) from relative spectral responses and has been applied to a number of datasets. For instance, SBAF was applied to Harmonized Landsat and Sentinel-2 for accurate cross comparison with the MODIS dataset [21]. Similarly, the work in [20] used SBAF to evaluate long-term AVHRR surface reflectance datasets with a quadratic normalized difference vegetation index (NDVI). However, as outlined in these studies, applying SBAF requires an intermediate simulated or hyperspectral sensor to perform this translation. In contrast, our approach learns this intermediate sensor as a "shared" latent space capturing dependent information content between sensors. The following sections will develop our approach in the context of neural networks (NNs) and deep generative modeling.
Neural models have long been applied to process remote sensing data and generate downstream products. Hsu et al. [22] presented some of the first work that showed NNs could generate accurate and high-resolution precipitation products from satellite observations. In recent years, convolutional neural networks (CNNs) have been found to further improve this task [23]. Similarly, CNNs have successfully been applied to poverty mapping [24], super-resolution [25], subpixel classification [26], model emulation [27], and land-cover classification [28], all from low-level satellite products. In terms of spectral synthesis, a few studies have explored reconstruction of hyperspectral bands from RGB bands with supervised approaches [29], [30]. While many of these problems are within the class of image-to-image translation, they generally assume that labels are widely available and focus on individual sensors. To the best of our knowledge, no studies have developed approaches to synthesize spectral information by learning across satellites in the unsupervised setting.

B. Image-to-Image Translation
Many problems can be defined as an image-to-image translation task, including super-resolution, style transfer, and colorization. Approaches to image-to-image translation have been developed for both supervised and unsupervised settings to map images from one domain to another. In the supervised setting, image pairs are available to learn a direct mapping from one to the other. GANs have been shown to be highly successful at this task [31], [32]. Numerous unsupervised learning methods have been developed for the common case of large unpaired datasets [14], [15], [33], [34]. CycleGAN, for instance, proposed an approach to directly map from one domain to another and back by incorporating a cycle-consistency loss with a GAN [15]. UNIT [14] proposed a probabilistic approach that uses an intermediate latent space between domains with a VAE [35] and GANs [17]. In contrast to prior work on image-to-image translation, our scenario specifically requires spectral translation across multiple domains. Rather than translating between relatively low-dimensional RGB images and segmentation maps, as is found in traditional multimodal image-to-image translation [36]-[38], satellite imagery contains tens to hundreds of spectral bands. Domain adaptation is another active area of research, which considers effectiveness in unseen environments using cycle consistency and domain-invariant representations [39], [40]. Lee et al. [41] used a shared content loss to translate between RGB image styles. Sanchez et al. [42] presented an application of image-to-image translation for four-band Sentinel-2 images between different times of day. Using VAEs and GANs, their approach provided 94.5% accuracy in classifying ten land-cover types in the EuroSat dataset. Our approach is based on the proven fundamental techniques of learning a shared latent space using cycle-consistency and adversarial losses extended in the spectral dimension. 
We also use the prior understanding of spatial consistency between domains to implement a partial skip connection.

C. Variational Autoencoders
Autoencoders (AEs) are widely used in deep learning to encode high-dimensional data to a lower dimensional representation [43]. AEs consist of two stages, encoding and decoding networks, and are learned in an unsupervised manner. The encoder network E(x) maps an input x to a low-dimensional representation, which is then passed to a decoder, written as G(E(x)). AEs can then be trained with a mean square error loss to reconstruct the input, $\|x - G(E(x))\|_2^2$. While AEs are well suited for compressing a high-dimensional feature space, there is little control over the distribution of the latent space, which can lead to severe overfitting.
In contrast to AEs, VAEs are generative models that aim to learn an intermediate "latent" variable z as a compressed representation of the input. This is done by first encoding the data to z followed by a sampling operation and decoding to the original domain [35]. More formally, VAEs model the latent space as a probability distribution such that the posterior is written as p(z|x) ∝ p(x|z) p(z), where p(x|z) is the likelihood. The prior distribution p(z) is generally assumed to be Gaussian, where p(z) = N(0, I). The posterior distribution p(z|x) is then approximated as q(z|x) = N(E(x), c · I), where c is a constant. Conditioning on the latent space z to reconstruct x is then written as p(x|z) = G(z). The loss of a VAE is then written as

$$\mathcal{L}_{\mathrm{VAE}} = \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big) - \mathbb{E}_{z \sim q(z|x)}\big[\log p(x|z)\big]$$

where KL is the Kullback-Leibler divergence measuring the relative entropy between two probability distributions. KL is formally defined as

$$\mathrm{KL}\big(q\,\|\,p\big) = \int q(z) \log \frac{q(z)}{p(z)}\, dz.$$

With this approach, we are able to generate previously unseen examples from a constrained probabilistic latent space. Previous examples of VAEs in remote sensing include classification [44], hyperspectral unmixing [45], and feature extraction [46].
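For the Gaussian choices described above, the KL term has a closed form, which the following minimal sketch illustrates. It assumes a diagonal posterior N(mu, c · I) and uses a squared-error reconstruction term as a stand-in for the negative log-likelihood (up to scale and constants); the function names are illustrative only and are not taken from the paper's code.

```python
import math

def kl_to_standard_normal(mu, c=1.0):
    """KL( N(mu, c*I) || N(0, I) ) for a latent mean vector `mu` and variance `c`."""
    return 0.5 * sum(c + m * m - 1.0 - math.log(c) for m in mu)

def vae_loss(x, x_hat, mu, c=1.0):
    """KL term plus a squared-error reconstruction term (assumed likelihood)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return kl_to_standard_normal(mu, c) + recon

# A posterior that matches the prior has zero divergence.
zero_kl = kl_to_standard_normal([0.0, 0.0])
# Moving the mean away from zero is penalized quadratically.
shifted_kl = kl_to_standard_normal([1.0, 2.0])
```

With c fixed to a constant, only the encoder mean is penalized, which is what keeps the latent codes close to the prior.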

D. Generative Adversarial Networks
GANs have been found to be an effective approach to generate outputs that could realistically exist in the training set [17]. These models have been widely applied in computer vision for generating high-resolution images [47], medical imaging [48], and many others. In remote sensing, GANs have been shown to be effective at super-resolution [49], pan sharpening [50], and hyperspectral classification [51]. The idea of GANs is to learn a classifier, named an adversarial network, that discriminates between real and generated examples. The generator then aims to fool the discriminator D(x) by learning to produce realistic looking outputs from a random variable z. To learn these functions, the generator G(z) and the discriminator D(x) compete with each other in a minimax optimization problem. The corresponding loss for GANs can be written as

$$\min_G \max_D \; \mathbb{E}_{x}\big[\log D(x)\big] + \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big]$$

where z is a random variable. This loss can then be optimized through gradient descent.

III. APPROACH
VAEs and GANs are effective for image-to-image translation where pairs of images are not available [14]. This is the case for satellites with no space-time overlap. However, as in [14], a shared latent variable z can be used to approximate the joint distribution from the marginals. An adversarial loss applied to cross reconstructions satisfies the shared latent space assumption but is underconstrained for high-dimensional, multispectral images. We shall observe that this leads to large errors in our task. To address this, we introduce a shared spectral reconstruction loss and skip connection to effectively generate synthetic spectral bands (see Fig. 2), resulting in a 50%-80% reduction in mean absolute error (MAE).
In the spectral domain, we consider the case of K satellites, each observing a set of spectral bands $S_k$, as in the diagram in Fig. 2. The union of all sets, $\bigcup_{k=1}^{K} S_k$, represents the complete set of spectral channels in the data. We denote the intersection of two spectral sets as overlapping bands. Our goal is to generate synthetic bands well where $S_i \cap S_j^c \neq \emptyset$ for $\forall (i, j)$, with c the complement. A shared latent variable z is modeled with a Gaussian prior to learn a general representation for mapping between sets such that the assumptions of shared spectral reconstruction, weight sharing, cycle-consistency, and cross-domain adversarial losses are satisfied.
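The set notation above can be made concrete with a small sketch. The band labels below are hypothetical placeholders; only the counts (two or three visible, three or four near-infrared, ten thermal) and the green/cirrus difference come from the satellite descriptions in this paper.

```python
# Hypothetical band labels; only "green" and "cirrus" differ between the sets.
shared = (
    {"blue", "red"}
    | {f"nir_{i}" for i in range(1, 4)}
    | {f"tir_{i}" for i in range(1, 11)}
)
spectral_sets = {
    "G16": shared | {"cirrus"},
    "G17": shared | {"cirrus"},
    "H8": shared | {"green"},
}

# Union of all sets: the complete set of spectral channels in the data.
all_bands = set().union(*spectral_sets.values())

def overlapping(i, j):
    """Overlapping bands: the intersection of two spectral sets."""
    return spectral_sets[i] & spectral_sets[j]

def to_synthesize(target, source):
    """Bands observed by `source` but missing from `target`."""
    return spectral_sets[source] - spectral_sets[target]
```

Here `to_synthesize("H8", "G16")` returns the cirrus label and `to_synthesize("G16", "H8")` returns the green label, while the 15 overlapping bands are the ones available for weak supervision.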

A. VAE-GAN
For a given spectral set k, we define encoder-generator pairs $\{E_k, G_k\}$ such that $q(z_k|s_k) = \mathcal{N}(E_k(s_k), I)$ and $\hat{s}_k^{k \to k} = G_k(z_k \sim q_k(z_k|s_k))$ for $s_k \in S_k$. For any set j, $\hat{s}_k^{k \to j}$ corresponds to reconstruction from set k to j. The set of encoders $\{E_1, E_2, \ldots, E_K\}$ shares its last layer of weights to constrain the latent space to high-level representations. Using the prior $p_\eta(z) = \mathcal{N}(0, I)$, the VAE likelihood is defined for each spectral set k. Distributions $p_{G_k}$ are modeled as Laplacian distributions, and the latent space is Gaussian with prior $z \sim \mathcal{N}(0, I)$. GANs are used to enforce realistic spatial/spectral distributions of reconstructed images from the latent space. Discriminator networks $D_1, \ldots, D_K$ compare observations with cross reconstructions from the latent space. Network architectures for $E_k$, $G_k$, and $D_k$ follow those used in [14]. Our encoder, $E_k$, is a CNN with one downsampling layer and 64 units in the latent space. Weights of the last layer are shared between encoders to constrain the latent space. The decoder, $G_k$, is also a CNN with four residual blocks and one convolutional transpose layer for upsampling. The discriminator, $D_k$, consists of two hidden layers with leaky ReLU activation and average pooling and is learned with a least-squares GAN loss. For details, we refer the reader to the code in the Supplementary Material.
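The least-squares GAN loss mentioned for the discriminator can be sketched as follows. This is the commonly used least-squares formulation (up to a constant factor), operating on plain lists of discriminator scores; it is an illustrative assumption and not the authors' implementation, and the function names are hypothetical.

```python
def lsgan_discriminator_loss(d_real, d_fake):
    """Push scores toward 1 on observations and toward 0 on cross reconstructions."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return real_term + fake_term

def lsgan_generator_loss(d_fake):
    """The generator is rewarded when its outputs are scored as real (1)."""
    return sum((d - 1.0) ** 2 for d in d_fake) / len(d_fake)

# A perfect discriminator has zero loss; an undecided one (all 0.5) does not.
perfect = lsgan_discriminator_loss([1.0, 1.0], [0.0, 0.0])
undecided = lsgan_discriminator_loss([0.5], [0.5])
```

In the multi-satellite setting, `d_real` would hold the scores of $D_k$ on observations from satellite k and `d_fake` its scores on reconstructions translated from other satellites.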

B. Cycle Consistency
VAE and GAN losses are underconstrained and do not satisfy the shared latent space constraint alone. As in [14], a cycle-consistent loss is used such that $s_k = F_{j \to k}(F_{k \to j}(s_k))$ for all satellite pairs (j, k), where $F_{k \to j}(s_k) = G_j(E_k(s_k))$. A loss is defined between $s_k$ and the cycled reconstruction $\hat{s}_k^{k \to j \to k}$. With multiple domains, each domain should cycle through every other domain. Applying the cycle-consistency loss to each permutation results in a complete cyclical graph.
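The pairing of domains and the cycled reconstruction can be sketched as below. The mean absolute difference is an assumption made here because it matches a Laplacian likelihood; the encoders and decoders are passed in as plain callables, and all names are illustrative.

```python
from itertools import permutations

def cycle_pairs(satellites):
    """All ordered (k, j) pairs with k != j: each domain cycles through every other."""
    return list(permutations(satellites, 2))

def cycle_loss(s_k, encode_k, decode_j, encode_j, decode_k):
    """Mean absolute difference between s_k and its k -> j -> k reconstruction."""
    s_kj = decode_j(encode_k(s_k))    # translate k -> j
    s_kjk = decode_k(encode_j(s_kj))  # translate back j -> k
    return sum(abs(a - b) for a, b in zip(s_k, s_kjk)) / len(s_k)

# Three satellites give six ordered pairs, i.e. a complete directed graph.
pairs = cycle_pairs(["G16", "G17", "H8"])
```

A perfectly invertible encoder/decoder pair yields a cycle loss of zero; any information lost in the intermediate domain shows up as a positive value.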

C. Shared Spectral Reconstruction Loss
Adversarial losses can be easily fooled with increased dimensions. To help avoid this, we introduce an additional loss, $L^{\mathrm{SSR}}_k$. In this problem, if the intersection of spectral channels $S_{k,j} = S_k \cap S_j$ between domains is not empty, then the difference between $p(\hat{s}_k^{k \to k}|z_k)$ and $p(\hat{s}_k^{k \to j}|z_k)$ can be minimized with KL divergence, where $\hat{s}_k \in S_{k,j}$. The SSR loss encourages decoders to reconstruct identical spectral wavelengths with similar distributions while still synthesizing dissimilar bands. In this scenario, partial constraints are placed between domains and allow sampling of unobserved spectra from the shared latent space. By decreasing $\theta_6$, the bias between bands will be relaxed, which may reduce the effect of more uncertain domains.

Algorithm 1 Generate a Synthetic Band by Translating From One Satellite to Another
Result: Synthetic spectral band
  Image $s_k$ from satellite k;
  Encode to latent space, $z = E_k(s_k)$;
  Decode to other satellite, $\hat{s}_j = G_j(z)$;
  Select synthetic band from $\hat{s}_j$;
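The shared spectral reconstruction idea in this subsection can be sketched as follows. The text defines the loss through a KL divergence; this sketch substitutes a mean absolute difference on the overlapping bands as a simple stand-in, which is an assumption, and the dictionary-based image representation and names are purely illustrative.

```python
def ssr_loss(recon_kk, recon_kj, bands_k, bands_j):
    """Mean absolute difference between two reconstructions on their shared bands.

    recon_kk / recon_kj map a band label to a flat list of pixel values.
    """
    shared_bands = sorted(set(bands_k) & set(bands_j))
    if not shared_bands:
        return 0.0  # no spectral overlap, so no constraint is applied
    total, count = 0.0, 0
    for band in shared_bands:
        for a, b in zip(recon_kk[band], recon_kj[band]):
            total += abs(a - b)
            count += 1
    return total / count

recon_kk = {"blue": [0.2, 0.4], "cirrus": [0.1, 0.1]}
recon_kj = {"blue": [0.2, 0.6], "green": [0.3, 0.3]}
loss = ssr_loss(recon_kk, recon_kj, recon_kk.keys(), recon_kj.keys())
```

Only the shared "blue" band contributes here; the "cirrus" and "green" bands are left unconstrained, which is what allows them to be synthesized rather than copied.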

D. Total Loss
The likelihood is maximized by optimizing the GAN minimax problem such that the generator aims to fool the discriminator, alternating updates between (E, G) and (G, D). The hyperparameters used correspond to those in [14] and are set as $\theta_1 = 1$, $\theta_2 = 0.01$, $\theta_3 = 1$, $\theta_4 = 1$, $\theta_5 = 0.01$, and $\theta_6 = 0.1$. The Adam optimizer is used to train the networks for 200 000 steps with a batch size of 8, with parameters $\beta_1 = 0.5$, $\beta_2 = 0.999$, and learning rate 1e-5. The reader can find detailed information in the Supplementary Material. Algorithm 1 shows the steps for generating a new band.
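The steps of Algorithm 1 can be sketched in a few lines. The encoder and decoder below are toy stand-ins (hypothetical, not trained networks) so that the example runs on its own; an image is represented as a list of bands, each a flat list of pixels.

```python
def synthesize_band(s_k, encode_k, decode_j, band_index):
    """Translate an image from satellite k to satellite j and pick one band."""
    z = encode_k(s_k)            # encode to the shared latent space
    s_j_hat = decode_j(z)        # decode with the other satellite's generator
    return s_j_hat[band_index]   # select the synthetic band

def toy_encoder(image):
    """Toy stand-in: per-pixel mean across bands."""
    n = len(image)
    return [sum(band[p] for band in image) / n for p in range(len(image[0]))]

def toy_decoder(z, n_bands=3):
    """Toy stand-in: band b is the latent code scaled by (b + 1)."""
    return [[v * (b + 1) for v in z] for b in range(n_bands)]

image = [[1.0, 2.0], [3.0, 4.0]]  # two bands, two pixels each
synthetic = synthesize_band(image, toy_encoder, toy_decoder, band_index=2)
```

The toy decoder emits three bands from a two-band input, mirroring the case where the target satellite has a band that the source satellite never observed.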

E. Data
Three GEO satellite imagery datasets, GOES-16 (G16), GOES-17 (G17), and Himawari-8 (H8), are used in our experiments. Each satellite captures hemispheric (full-disk) snapshots from a constant vantage point over time but of different regions. Examples are shown in Fig. 1. Images contain 16 bands (channels) in the visible, near-infrared, and thermal spectrum at 0.5-2-km spatial resolution (see Table I). G16 and G17 have identical specifications viewing the east and west regions of North America and include two visible (blue, red), four near-infrared (including cirrus), and ten thermal infrared bands. H8 has 15 overlapping bands with G16/G17 viewing the Pacific Ocean and East Asia, and this ensures similar information content. H8 captures three visible (blue, green, and red), three near-infrared (missing cirrus), and the same ten thermal infrared bands as G16/G17. Visible and near-infrared bands are measured as unitless reflectances. The cirrus band (1.37 μm) exists in G16 and G17 but is not in H8, and green (0.51 μm) exists in H8 but not in G16 or G17. These differences cause difficulties when applying models relying on green or cirrus bands across satellite sets. G16 observes the North, Central, and South Americas, capturing a good distribution of land and ocean. G17 observes the Pacific Ocean as well as most of North and Central America. However, G17 has known problems with its thermal cooling system causing the near- and thermal-infrared channels to be unusable during periods of high heat and biased throughout [52]. This further highlights the gain in replacing low-quality bands of G17 with a virtual sensor. Periods of high heat are filtered out of our training and test sets with quality control checks to eliminate the temporal periods of known uncertainty. After quality control, considerable space-time overlap between G16 and G17 can be used for testing. H8 observes East Asia, Australia, and the Western Pacific, partially overlapping with G17. 
Discrepancies are expected between sensors caused by different solar and sensor viewing angles, but we are not aware of a more appropriate dataset for evaluation. The data generated by Wang et al. [53] are used; these data normalize G16, G17, and H8 to a common georeferenced gridding system to facilitate intercomparisons and are processed with the bidirectional reflectance distribution function (BRDF). Bands have resolutions varying from 500 m to 2 km, which we interpolate to a common suboptimal resolution of 2 km. Full-disk images are on a common grid with tiles of size 300 × 300 × 16. Training data are generated from the multipetabyte datasets. We randomly sample images to build a well-distributed and large training dataset from years 2018 (G16, H8) and 2019 (G17), which totaled 359 GB of data. Each tile is split into 64 × 64 × 16 nonoverlapping patches for training, generating millions of samples. A test set includes 2000 randomly selected tiles from 2020 from overlapping G16 and G17 observations (see Fig. 3). The random set of tiles assures a range of solar angles, system patterns, and land-cover types. This dataset, consisting of tiles from each satellite, will be made publicly available. Similarly, a pair of overlapping tiles of data from G17 and H8 on January 2, 2019, at 04:00 UTC are selected to compare synthetically generated green and cirrus bands (spatial overlap of G17/H8 is mostly ocean).
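The tile-to-patch split can be sketched as follows. The sketch assumes that only patches fitting fully inside a tile are kept, so the 44-pixel remainder along each side of a 300-pixel tile is discarded; how the remainder is actually handled is not stated in the text, and the function names are illustrative.

```python
def patch_origins(tile_size=300, patch_size=64):
    """Top-left corners of non-overlapping patches that fit fully inside a tile."""
    per_side = tile_size // patch_size
    return [(r * patch_size, c * patch_size)
            for r in range(per_side) for c in range(per_side)]

def extract_patch(tile, row, col, patch_size=64):
    """tile is a list of rows; each row is a list of pixels (or per-pixel band vectors)."""
    return [line[col:col + patch_size] for line in tile[row:row + patch_size]]

origins = patch_origins()  # 4 x 4 grid of patch corners per tile
```

Under this assumption each 300 × 300 tile yields 16 patches of 64 × 64 pixels, each carrying all 16 bands.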

IV. EXPERIMENTS AND DISCUSSION
In this section, we present a set of experiments to explore the properties of our approach by testing which bands can be robustly synthesized, how many bands can be generated, and how effectively the proposed loss performs. The metrics Bias, MAE, and Precision are used for evaluation. Relative metrics, RBias and RMAE, are computed by dividing by the mean intensity of the relevant band.
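The bias and error metrics can be sketched as below, assuming that "mean intensity of the relevant band" refers to the mean of the observed values. Precision is left out because its exact definition is not given in this section; the function names are illustrative.

```python
def bias(pred, obs):
    """Mean signed error between synthesized and observed pixels."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def mae(pred, obs):
    """Mean absolute error between synthesized and observed pixels."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def relative(metric_value, obs):
    """Divide a metric by the mean observed intensity of the band."""
    return metric_value / (sum(obs) / len(obs))

obs = [2.0, 4.0, 6.0]
pred = [3.0, 4.0, 5.0]
rbias = relative(bias(pred, obs), obs)
rmae = relative(mae(pred, obs), obs)
```

In this toy case the errors cancel in the bias but not in the MAE, which is why both are reported: a near-zero bias alone does not imply an accurate reconstruction.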

A. Individual Band Synthesis
Our experiments start with testing how well each spectral band can be synthesized. To do this, we remove individual bands from one satellite (G16) during training, synthesize these bands, and compare with the ground-truth observations. We use the full set of bands from the other two satellites during training. This approach is applied on G16 such that each model takes 15 bands of G16 and 16 of G17 and H8.
Three comparisons are used to help put the accuracy of synthesized bands for G16 into context. Ours refers to bands generated using our proposed approach with both SSR loss and skip connection. Ours w/o SSR refers to bands generated with our proposed approach without the shared spectral reconstruction loss. Ours w/o Skip refers to bands generated with our proposed approach without the skip connection between the input and generator. UNIT refers to the unsupervised image-to-image translation baseline as presented in [14] and is equivalent to ours without SSR and skip connection. Sensor refers to the performance if we simply use overlapping observations from another satellite (G17). This acts as our lower bound in performance and is the status quo (essentially substituting the missing band with images of the same band from another satellite, which, as we shall see, is a suboptimal solution). Reconstruction refers to the images reconstructed from a full model trained on all satellites with no missing bands, and this acts as our upper bound in performance. Each of these signals is computed on our test set of 2000 overlapping tiles from G16/G17. Fig. 4 shows the RBias, RMAE, and Precision for each condition. Similarly, Table II shows the average metrics for VIS/NIR and TIR of each method. The MAE in the sensor condition is substantial and largely caused by clouds/aerosols in the vertical direction (see gif in the Supplementary Material). On the other hand, synthetically generating bands using our approach reduces MAE by over 40% compared to both this baseline and UNIT (see Table II). Similarly, synthesized bands also improve upon the view from G17 even though during training they did not see examples of the corresponding band from G16. Mean biases for all approaches are near zero, but cross-sensor, UNIT, and ours w/o SSR each have high variance, corresponding to similarly low precision. 
In contrast, our approach performs similarly to reconstruction with lower variance around the mean bias and high precision. Ablation experiments removing the shared spectral reconstruction loss and the skip connection show their effectiveness. SSR is critical to learning a robust latent space and the skip connection improves both VIS/NIR and TIR predictions in terms of RBias, RMAE, and precision. We observe that without introducing the SSR loss, performance is even worse than the sensor baseline. From this, we learn that applying an existing image-to-image translation model [14] to our task, without adaptation, performs poorly. We find that band 7, the shortwave infrared band (3.9 μm), is particularly difficult to synthesize with RMAE, RBias, and precision significantly above that of the full reconstruction. This result suggests that the shortwave infrared band captures information that cannot be inferred from the others. Notice how the wavelength gap between bands 7 and 8 is relatively large (2.3 μm), and this may explain why the performance is poor. A similar and more extensive analysis could be used to inform future satellite design configurations.
We show qualitative examples of generating synthetic bands in Figs. 1 and 7. GOES-16 and GOES-17 images in Fig. 1 show examples of true-color images generated from a synthetic green band. This process is applied to Himawari-8 to generate a cirrus band (shown in Fig. 7). While there may be challenges in synthetically generating all bands, most can be reconstructed with a high signal-to-noise ratio, which suggests that our approach could be used to make software updates to current satellite datasets.

B. Land Versus Ocean Conditions
We analyze land versus ocean conditions using the models trained in the previous experiment synthesizing individual bands. Specifically, for each sample, we compute RBias, RMAE, and Precision for land and ocean pixels separately using the MODIS land/water mask shown in Fig. 3. The results are shown in Fig. 5 boxplots such that the columns represent metrics and rows correspond to VIS/NIR and TIR bands.
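Computing a metric separately for land and ocean pixels amounts to filtering by the mask before averaging. The sketch below assumes a binary mask with 1 for land and 0 for ocean over flattened pixel lists; this encoding and the function name are illustrative and are not a description of the MODIS mask format.

```python
def masked_mae(pred, obs, mask, keep):
    """MAE over pixels whose mask value equals `keep` (assumed 1 = land, 0 = ocean)."""
    errors = [abs(p - o) for p, o, m in zip(pred, obs, mask) if m == keep]
    return sum(errors) / len(errors) if errors else float("nan")

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.0, 4.0]
land_mask = [1, 1, 0, 0]
land_mae = masked_mae(pred, obs, land_mask, keep=1)
ocean_mae = masked_mae(pred, obs, land_mask, keep=0)
```

Returning NaN for an empty selection keeps tiles that contain only land or only ocean from silently contributing a zero error.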
Overall, we find larger errors and reduced precision over land conditions compared to ocean areas. However, the bias, error, and precision of our approach consistently outperform the cross-sensor and UNIT baselines. Precision for VIS/NIR suggests that land areas are more challenging to recover than ocean areas, although our approach still consistently outperforms the baselines. Future work extended to surface reflectance may consider a similar analysis over different land-cover types. Results on TIR bands show high precision with very low bias.

C. Synthesizing Multiple Bands
Generating synthetic channels from satellites with a limited number of spectral bands could be of significant value for long-term analysis. For example, older generation satellites often have fewer channels and could provide greater utility in downstream tasks if it were possible to generate images in additional frequency bands. Therefore, we set up an experiment to test how many additional bands can be synthesized reliably and how many initial bands are required. A set of synthesis models was trained on G16, removing bands one by one until just one band was left while keeping all 16 G17 and H8 bands. For simplicity, and to reduce computation, we dropped bands in a fixed order: 9, 4, 13, 2, 15, 12, 6, 3, 10, 8, 14, 5, 11, 7, and 16. In the most extreme case, we use visible band 1 and attempt to synthesize the remaining 15. As above, results are computed on the test set of 59 overlapping G16 and G17 tiles. The results presented in Fig. 6 (left) show how the number of available input bands affects the MAE for VIS/NIR (bands 1-6) and TIR (bands 7-16). As expected, MAE falls more or less monotonically as more bands are given as inputs. When just two bands, 1 (blue) and 16 (TIR), are used as inputs, the synthetic TIR reconstruction of G16 still has a lower error than the observed sensor difference between G16 and G17. These results show that few bands are needed to synthesize images that improve upon the status quo. In the TIR range, we find that MAE plateaus after 3-4 bands are used as inputs. These results suggest that the information content in a subset of bands may be sufficient for many applications. However, some bands may contain specific information useful for monitoring rare events. Overall, these results show that a good proportion of bands can be synthesized remarkably well.
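The fixed drop order determines which G16 bands remain as inputs at each stage of the experiment. A short sketch of that bookkeeping, using only the order and the VIS/NIR versus TIR split stated above (the function names are illustrative):

```python
DROP_ORDER = [9, 4, 13, 2, 15, 12, 6, 3, 10, 8, 14, 5, 11, 7, 16]

def input_bands_after(n_dropped, n_bands=16):
    """Bands still available as model inputs after dropping the first n_dropped bands."""
    dropped = set(DROP_ORDER[:n_dropped])
    return [b for b in range(1, n_bands + 1) if b not in dropped]

def split_vis_tir(bands):
    """Split into VIS/NIR (bands 1-6) and TIR (bands 7-16)."""
    return [b for b in bands if b <= 6], [b for b in bands if b >= 7]

two_band_case = input_bands_after(14)  # inputs left before the final drop
extreme_case = input_bands_after(15)   # only visible band 1 remains
```

This reproduces the two cases highlighted in the text: bands 1 and 16 as the two-band input, and band 1 alone in the most extreme case.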

D. Sharing Spectral Losses
The effectiveness of the shared spectral reconstruction loss is tested by gradually increasing the number of shared bands included in the loss one by one. Mathematically, this corresponds to the number of bands included in the set $S_{k,j}$. In all runs, 16 bands of G16, G17, and H8 are used even if ignored by the SSR loss. Fig. 6(iii) and (iv) show that adding shared bands during training leads to a dramatic decrease in MAE. Corresponding cross-sensor signals are shown as horizontal lines. In this setting, we find that using the SSR loss is critical to learning this model. Sharing two spectral bands in the loss function improves the signal and is almost all that is needed for accurate reconstruction. This further reinforces our insight above that a large amount of the information is captured in just a few spectral bands. In Table III, we further explore the SSR loss by testing a range of values for $\theta_6$ from 0.01 to 10. Our results suggest that increasing the SSR weighting factor improves the performance on the test set.

E. Synthesizing Cirrus for Himawari-8
As discussed above, the cirrus band (1.38 μm) monitors ice particles in the upper troposphere which regulate the climate, and H8 is missing this band. These ice particles are often seen as thin clouds high in the atmosphere, which may be viewed in the visible range, along with other clouds. To generate a synthetic cirrus band, an H8 observation is translated to G17. In Fig. 7, we show four images where G17 and H8 have space-time overlap but different viewing angles. Fig. 7(b)-(e) show the cross-sensor G17 cirrus band, corresponding H8 synthetic cirrus band, absolute difference between cross sensor and synthetic, and histogram of differences, respectively. This scene consists of clouds of multiple types and atmospheric heights on June 10, 2019, at 04:00 UTC. Cirrus clouds are found high in the atmosphere and are seen as thin or wispy (see the lower right portion of the images). Visually comparing Fig. 7(b) and (c) shows the similarity between synthetic bands and observations. Lower level clouds, which can be seen throughout the true-color image, are ignored by both the observed and synthetic cirrus bands. However, from Fig. 4, we know that cross-sensor errors are large relative to real observations. Hence, the quantitative difference between synthetic and cross sensor is unsurprising. The differences are normally distributed, which indicates random noise. These results suggest that our learned latent space can represent unobserved bands, distinguishing different types of material.

F. Limitations
While the VAE-GAN architecture performs well overall, it does present some limitations. VAEs aim to explicitly model the data as a multivariate Gaussian and often produce blurry outputs. The GAN counteracts this effect by discriminating between real and generated images. However, there is a concern that this reduces data precision and fails to detect rare and anomalous events which may affect scientific applications. Extending our work to use normalizing flows, as in [16], may reduce this limitation.

V. CONCLUSION
We have presented an unsupervised learning approach for satellite-to-satellite translation that can be used to synthesize unobserved spectral bands. A novel shared spectral reconstruction loss is presented to further constrain learning and conserve spectral information, and a partial skip connection maintains spatial consistency. Experiments with sensors on the GOES-16/17 and Himawari-8 satellites show that synthetic spectral bands can be generated through reconstruction from a shared latent space. For the first time, we are able to generate true-color images from GOES-16/17 and the cirrus band from Himawari-8, generating further value from these satellites. Future work may consider conditioning the shared latent space with known physical properties and extending to additional tasks.