Temporal Interpolation of Geostationary Satellite Imagery With Optical Flow

Applications of satellite data in areas such as weather tracking and modeling, ecosystem monitoring, wildfire detection, and land-cover change depend heavily on the tradeoffs among the spatial, spectral, and temporal resolutions of observations. In weather tracking, high-frequency temporal observations are critical and are used to improve forecasts, study severe events, and extract atmospheric motion, among other uses. However, while the current generation of geostationary (GEO) satellites has hemispheric coverage at 10–15-min intervals, higher temporal frequency observations are ideal for studying mesoscale severe weather events. In this work, we present a novel application of deep learning-based optical flow to temporal upsampling of GEO satellite imagery. We apply this technique to 16 bands of the GOES-R/Advanced Baseline Imager mesoscale dataset to temporally enhance full-disk hemispheric snapshots of different spatial resolutions from 10 to 1 min. Experiments show the effectiveness of task-specific optical flow and multiscale blocks for interpolating high-frequency severe weather events relative to bilinear and global optical flow baselines. Finally, we demonstrate strong performance in capturing variability during convective precipitation events.


I. INTRODUCTION
EVERY second, satellites around the earth are generating valuable data to monitor weather, land cover, infrastructure, and human activity. Satellite sensors capture reflectance/radiance intensities at designated spectral wavelengths and at given spatial and temporal resolutions. Properties of the sensors, including wavelengths and resolutions, are optimized for particular applications. Most commonly, satellites are built to capture the visible wavelengths, which are essentially RGB images. Science-specific sensors capture a larger range of wavelengths, such as microwave, infrared, and thermal, providing information to many applications at temporal frequencies from 1 min at mesoscale to multiple years at climate scales. Satellites carrying sensors typically follow one of two orbits, geostationary (GEO) and polar. Polar-orbiting satellites most often have daily or longer revisit times and include NASA's A-Train [1], Landsat-8 [2], and Sentinel [3]. Data provided by the Moderate Resolution Imaging Spectroradiometer (MODIS) [4] carried on Terra/Aqua and Landsat are widely used for quantifying effects of climate change, land-cover usage, and air pollution, among others, but are not well suited to monitoring high-frequency events. On the other hand, GEO satellites are well suited for subdaily events, such as tracking weather and studying diurnal cycles. While a high altitude reduces spatial resolution, current-generation multispectral sensors on GEO satellites are able to provide 1–15-min observations, enabling immense opportunity for understanding atmospheric, land cover, and oceanic dynamics. High temporal frequency observations from satellites are critical for studying extreme environmental events, such as storm tracking and wildfire detection, but, due to physical constraints, are not available at global scales. In this work, we present an approach to temporally interpolate global 10-min observations to 1 min by learning from spatially isolated mesoscale observations.
Within a few years, a constellation of GEO satellites operated by multiple international institutions will provide global coverage of the earth's state. The latest generation of GEO satellites includes NOAA/NASA's GOES-16/17 [5], Japan's Himawari-8/9 [6], China's Fengyun-4 [7], and Korea's GEO-KOMPSAT-2A, with further missions in development. Full-disk coverage from such satellites has revisit times of 10–15 min, allowing applications to real-time detection and observation of wildfires [8], hurricane tracking, air quality, flooding, precipitation estimation, flood risk, and others [9]. Furthermore, given improved spectral and spatial resolutions in current-generation sensors, GEO satellites open future opportunities to incorporate and learn from less frequent observations from polar orbiters.
While 10-15-min revisit times are temporally sufficient for many applications, higher frequency snapshots can aid a variety of tasks. For instance, understanding rapidly evolving convective events is a high priority for improving atmospheric models, which are notoriously poor at simulating heavy precipitation, as highlighted in NASA's Earth Science Decadal Survey [10]. However, data for analyzing such events are often not available at the desired frequency. Similarly, comparing multiple satellite observations is dependent on their corresponding timestamps. This leads to an interpolation task between observations in a multispectral spatiotemporal sequence, similar to that of video interpolation.
Temporal interpolation requires a weighted combination of two images. The simplest approach to interpolation is to directly weight each pixel in the image by the relative difference in time. Modern approaches to this problem use deep learning and optical flow to warp images based on movement captured by tracking apparent motion [11]-[13]. In this work, we present a novel application of deep learning-based optical flow to the problem of temporal interpolation between GEO satellite imagery. We compare properties of global and spectral-specific optical flow interpolation using the SuperSlomo method (SSM) [13] with baselines. Our experiments show that spectral-specific SSM is capable of interpolating high-frequency atmospheric events. Furthermore, visual analysis suggests that the learned flow resembles atmospheric motion and dynamic visibility maps.
The remainder of this article is outlined as follows. Section II introduces the GOES-R dataset. Section III discusses related work, including resolution enhancement in the earth sciences and the current state of video frame interpolation. Section IV details the methodology. Experiments on a large-scale dataset and a severe storm case study are presented in Section V. Finally, Section VI concludes with challenges and future work.
II. GOES-R SATELLITE DATASET
GEO satellites are synchronized in orbit with the earth's spin to hover over a single location. Given this location, the sensor, measuring radiation as often as possible, can frequently capture data over a continuous and large region. This feature makes GEO satellites ideal for capturing environmental dynamics. The GOES-R series satellites, namely, GOES-16/17 (east and west sides of the Americas), operated by NASA and NOAA, provide scientists with unprecedented temporal frequency, enabling real-time environmental monitoring using the Advanced Baseline Imager (ABI) [14]. GOES-16/17 senses 16 bands of data that are shown in Fig. 2 and listed in Table I. These GEO satellites are particularly useful in tracking weather, monitoring high-intensity events, estimating rainfall rates, fire detection, and many others in near real time. The mesoscale mode gives forecasters the ability to "point" the satellite at a user-specific subregion for near-constant monitoring of severe events. For example, GOES-16 and GOES-17 actively provide emergency response units tools for decision-making during wildfires in the Western United States. These high-frequency data provide valuable information on environmental dynamics and retrospective analysis, such as studying convective events [15]. Furthermore, mesoscale data can be used to inform techniques to produce higher temporal resolution CONUS and full-disk coverage. In this work, we develop a model to improve the temporal resolutions of CONUS and full disk by learning an optical flow model to interpolate between consecutive frames. With this, we are able to generate 1-min full-disk artificially enhanced data.

III. RELATED WORK
In this section, we begin by reviewing previous work in the areas of data fusion and resolution enhancement as applied generally to remote sensing satellite imagery and some recent successes of deep learning in the area. Second, we provide a brief review of optical flow and video frame interpolation techniques.

A. Resolution Enhancement of Satellite Data
Earth science datasets are complex and often require extensive preprocessing and domain knowledge before they are useful for large-scale applications or monitoring. Such datasets may contain frequent missing values due to sensor limitations, low-quality pixel intensities, incomplete global coverage, and contamination with atmospheric processes related to clouds and aerosols. Furthermore, spatial and temporal resolution enhancement is often applied to improve analysis precision. Techniques to handle these challenges have been developed and are widely applied across the remote sensing community. Many statistical and machine learning methodologies for improving spatial resolution have been explored and are an active area of research. Data fusion is one area where two or more datasets are fused to generate an enhanced product, often with both higher spatial and temporal resolutions [16]. The Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) algorithm, for example, uses Landsat and MODIS to produce a daily 30-m reflectance product by using a spectral-wise weighting model [17]. Using CubeSats, Houborg and McCabe [18] present an approach to leverage Landsat and MODIS for further spatiotemporal improvements. Similarly, nearest neighbor analog multiscale patch-decomposition data-driven models are used as state-of-the-art interpolation techniques for developing global sea surface temperature (SST) datasets [19]. In recent years, super-resolution techniques have presented state-of-the-art results for spatial enhancement of satellite images [20]-[22].
Approaches for temporal resolution enhancement of individual satellite observations have not been as well studied. Liebmann and Smith [23] presented the first linearly interpolated datasets filling in missing and erroneous longwave radiation many days apart to improve global coverage. Similarly, Kandasamy et al. [24] presented a comparison of multiple methods for interpolating between MODIS observations to generate a synthetic leaf area index dataset. A number of statistical techniques, including long-term climatology measures and time-series decomposition, were applied to smooth observation and fill gaps. Reference [25] presented an approach using linear interpolation on subdaily GEO imagery to match timestamps between multiple satellites. However, given more frequent observations by the recent generation of GEO observations, more complex methods beyond linear interpolation may be more applicable and accurate in the temporal domain.
Our work proposes to apply deep learning-based optical flow methodologies to optimize the interpolation problem. In recent years, a number of applications in processing and learning from satellite data have shown state-of-the-art results using deep learning. For example, Benedetti et al. [26] showed that recurrent and convolutional neural networks effectively assimilate multiple satellite images. Lanaras et al. [22] presented a global deep learning super-resolution approach for Sentinel-2 with a 50% improvement beyond traditional techniques. In terms of classification, DeepSat showed that tuned, normalized deep belief networks were able to outperform traditional techniques for image classification [27]. Convolutional neural networks have been shown to effectively classify land use in remotely sensed images, from urban areas [28] to crop types [29].
While many studies have explored resolution enhancement spatially and temporally, the authors are not aware of any prior work exploring temporal interpolation at the minute-to-minute scale. Prior approaches on longer time scales have applied linear interpolation and nearest-neighbor techniques. We explore the applicability of a more complex optical flow approach to temporal interpolation at very high resolutions and use linear interpolation as our baseline, as applied in prior work. Atmospheric motion vectors (AMVs) are a related line of work focused on tracking the movement of clouds and corresponding heights to initialize numerical weather prediction and data assimilation models [30], [31]. Traditional AMV techniques use spatial similarities across time to approximate lateral movement and often produce low yields and coarsened resolution, which makes them unsuitable for temporal interpolation. Recently, optical flow has been shown to be effective at this task [32], which could motivate future work.

B. Optical Flow and Video Frame Interpolation
Temporally interpolating between frames of images can be computed with a weighted average of two frames, warped or not, to a defined intermediate time, $t$. Linear interpolation is the simplest approach and can be written as $I_t = (1 - t) I_0 + t I_1$, given two frames $I_0$ and $I_1$. However, this technique fails to account for any motion, physical phenomena, and corresponding occlusion that occur between two frames, giving poor performance. Temporal interpolation techniques applied to video have shown high skill at generating slow-motion footage by generating intermediate frames in spatially and temporally coherent sequences [11]-[13], [33]. These approaches are designed to learn the dynamics by inferring displacement of spatial structure between consecutive images. Optical flow, which estimates spatial displacement by comparing movement between two images, is widely used for this task.
In recent years, deep learning architectures have shown promising results for both optical flow and video interpolation. Supervised learning of optical flow is often constrained by the availability of training data as motion is rarely quantified in real images. Datasets, such as Flying Chairs [34] and MPI Sintel [35], have been generated synthetically and are used for methods development but may not be representative of real-world scenes. High-performing architectures for supervised optical flow learning include FlowNet [36], [37], PWC-Net [38], and RAFT [39], which take advantage of encoder-decoder, volume-based correlation, pyramid, and recurrent layers. However, since optical flow labels are rarely available, as is the case in our GEO temporal interpolation, unsupervised learning techniques are often applied to real-world naturally generated datasets. UnFlow [40] presented an approach to training FlowNet using a bidirectional occlusion-aware unsupervised reconstruction loss. Recent studies have improved unsupervised learning further, such as GeoNet [41] with depth perception and [42] by predicting occlusion directly. In practice, we have found training fully unsupervised optical flow networks on satellite imagery to be a nontrivial exercise that gave poor performance in our unreported experiments, likely due to physical complexity in the dataset.
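To make the motion problem concrete, the following is a minimal sketch on a 1-D signal, assuming the convention that the blend returns $I_0$ at $t = 0$ and $I_1$ at $t = 1$. The helper names and the toy "cloud" are illustrative only. A single bright pixel moves four pixels between frames; blending at $t = 0.5$ leaves two half-intensity ghosts at the start and end positions, whereas a motion-aware interpolator would aim for one full-intensity pixel halfway along the path.

```python
def linear_interp(frame0, frame1, t):
    """Pixelwise blend that returns frame0 at t=0 and frame1 at t=1."""
    return [(1.0 - t) * a + t * b for a, b in zip(frame0, frame1)]


def shift_right(frame, pixels):
    """Move a 1-D signal right by a whole number of pixels, zero-filling the left."""
    return [0.0] * pixels + frame[: len(frame) - pixels]


# A single bright "cloud" pixel that moves 4 pixels between the two frames.
frame0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
frame1 = shift_right(frame0, 4)

# Linear blending: half-intensity ghosts at positions 2 and 6, nothing at 4.
blended = linear_interp(frame0, frame1, 0.5)

# Motion-aware target: the pixel has travelled half the distance (2 pixels).
moved = shift_right(frame0, 2)
```

The difference between `blended` and `moved` is the error that optical flow based warping is meant to remove.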
Many video interpolation techniques focus on single frame interpolation, meaning that a single frame is estimated by the model directly between two consecutive frames at t = 0.5 [11], [33], [43]. However, when interpolating satellite imagery, time-dependent and multiframe estimation is preferred for more flexibility. Jiang et al. [13] presented SSM that combines both optical flow and occlusion models for time-dependent estimation between consecutive frames. The time-dependent nature of this approach produces spatially and temporally coherent predictions of any time between 0 and 1.
In their experiments, Jiang et al. [13] show that 240-frames/s video clips can be estimated from 30-frames/s inputs. Further details of this work will be presented in Section IV where we apply their architecture with an extension to multiscale optical flows.
High-frequency satellite imagery can take advantage of these techniques to extract the dynamics of different physical processes. Our application requires a time-dependent methodology with no labels available for supervised optical flow learning. Furthermore, our imagery has 16 spectral bands in the visible, near-infrared, and thermal-infrared spectra, with varying spatial resolution, to capture physical phenomena. Between frames, processes such as convection break optical flow's consistency assumption, making unsupervised learning a challenge. SSM provides a fundamental technique for temporal interpolation that is well suited to our application. We generalize this approach with task-specific models for each spectral band in our dataset. In the remainder of this work, we study how SSM can be effectively applied to this problem by experimenting with global and task-specific models.

IV. METHODOLOGY
Temporal upsampling of GEO satellite data is similar to intermediate video frame interpolation but has domain-specific characteristics. In video interpolation, the goal is to estimate an intermediate frame given two or more consecutive images. A single set of optical flows is sufficient for interpolating between RGB images as objects captured in the visible spectrum are reasonably consistent across frames. However, as discussed above, satellite imagery often consists of tens or even hundreds of spectral channels with varying spatial resolutions. Furthermore, each channel captures different physical properties with heterogeneous motion, including severe events, such as convection leading to heavy precipitation and tornadoes. The goals of the proposed methodologies include interpolating to a user-defined point in time, capturing varying spatial dynamics, and computational efficiency at scale. In this section, we describe the SSM framework [13] for temporal upsampling with optical flow, along with our domain-specific adaptations of global versus task-specific networks. The global model performs interpolation on all channels simultaneously, while task-specific models are learned for each channel independently.

A. Intermediate Frame Interpolation
SSM intermediate frame interpolation considers the case of frame estimation at a user-defined point in continuous time [13]. To ensure smooth transitions and structural similarity between frames, SSM is designed to predict optical flows between two input images as a function of time. The approach, which can be seen in Fig. 3, consists of two deep neural networks. The first estimates forward and backward flows between two input images. The second network, depending on time, updates the forward and backward flows and generates visibility maps to handle occlusion. These features of SSM are well suited to GEO data by enabling arbitrary temporal upsampling and synchronization of multiple datasets.
Let $I_0, I_1, I_t \in \mathbb{R}^{H \times W \times C}$, where $t \in (0, 1)$, $H$ is the image height, $W$ is the image width, and $C$ is the number of spectral bands. Task-specific optical flow is defined when $C = 1$. The goal is then to construct an intermediate frame $\hat{I}_t$ with a linear combination of warped $I_0$ and $I_1$ as defined by

$$\hat{I}_t = \alpha \, g(I_0, F_{0 \to t}) + (1 - \alpha) \, g(I_1, F_{t \to 1}) \tag{1}$$

where $F_{0 \to t}$ and $F_{t \to 1}$ are the optical flows from $I_0$ to $I_t$ and $I_t$ to $I_1$, respectively. $g$ is defined as the backward warping function, implemented with bilinear interpolation, and $\alpha$ represents a scalar weight coefficient to enforce temporal consistency and allow for occlusion reasoning. In the case of high temporal resolution satellite imagery, the interpolation is virtually estimating the state of atmospheric variables (clouds, water vapor, and so on) over a static land surface. If a given pixel in $I_0$ captures land surface, but the same pixel in $I_1$ sees a cloud, the occlusion principle is used to estimate at what time $t$ the cloud covers the pixel. Furthermore, atmospheric dynamics cause physical characteristics to change over time. One example is convection, in which warm and cold air mix vertically and rapidly in the atmosphere, causing severe weather events. In the context of interpolation, dynamics between $I_0$ and $I_1$ cause cloud temperature to rapidly decrease, leading to a drastic change in brightness intensity and breaking the consistency assumptions of optical flow. Visibility maps, $V_{0 \to t}, V_{1 \to t} \in (0, 1)^{H \times W}$, weight brightness importance to account for both occlusion and intensity changes. Equation (1) is then redefined as

$$\hat{I}_t = \frac{1}{Z} \odot \left( (1 - t) \, V_{0 \to t} \odot g(I_0, F_{0 \to t}) + t \, V_{1 \to t} \odot g(I_1, F_{t \to 1}) \right) \tag{2}$$

where $Z = (1 - t) V_{0 \to t} + t V_{1 \to t}$ is a normalization factor and $\odot$ denotes elementwise multiplication, following the formulation in [13].
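A minimal sketch of the visibility-weighted synthesis step, assuming the normalized form used by SuperSloMo [13]: each input frame is backward warped with bilinear sampling, weighted by its visibility map and its temporal distance to $t$, and the result is divided by the sum of the weights. The function names, the list-of-lists image layout, the border clamping, and the `eps` stabilizer are illustrative assumptions, not details taken from the released code.

```python
import math


def bilinear_sample(image, y, x):
    """Sample a 2-D list-of-lists image at fractional (y, x), clamping to the border."""
    h, w = len(image), len(image[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


def backward_warp(image, flow):
    """g(I, F): each output pixel reads I at its own position displaced by F.

    flow[y][x] is a (dy, dx) displacement in pixels.
    """
    h, w = len(image), len(image[0])
    return [[bilinear_sample(image, y + flow[y][x][0], x + flow[y][x][1])
             for x in range(w)] for y in range(h)]


def synthesize(i0, i1, flow_for_i0, flow_for_i1, vis0, vis1, t, eps=1e-8):
    """Visibility-weighted, normalized blend of the two warped inputs at time t."""
    warped0 = backward_warp(i0, flow_for_i0)
    warped1 = backward_warp(i1, flow_for_i1)
    out = []
    for y in range(len(i0)):
        row = []
        for x in range(len(i0[0])):
            a = (1.0 - t) * vis0[y][x]   # weight on the warped first frame
            b = t * vis1[y][x]           # weight on the warped second frame
            row.append((a * warped0[y][x] + b * warped1[y][x]) / (a + b + eps))
        out.append(row)
    return out


# Toy example: a bright pixel moves from x=1 to x=3; at t=0.5 it should sit at x=2.
i0 = [[0.0, 1.0, 0.0, 0.0, 0.0]]
i1 = [[0.0, 0.0, 0.0, 1.0, 0.0]]
flow_for_i0 = [[(0.0, -1.0)] * 5]   # output pixel x reads I0 at x - 1
flow_for_i1 = [[(0.0, 1.0)] * 5]    # output pixel x reads I1 at x + 1
ones = [[1.0] * 5]
mid_frame = synthesize(i0, i1, flow_for_i0, flow_for_i1, ones, ones, 0.5)
```

Setting a visibility value close to zero at a pixel removes that input frame's contribution there, which is how occlusion and abrupt brightness changes can be down-weighted.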

B. Task-Specific Interpolation
Multispectral satellite imagery has multiple spectral channels, each observing varying phenomena, such as clouds moving faster at higher levels of the atmosphere. The movement of objects within images of varying spatial resolutions can have dramatic effects on the performance of optical flow networks. In traditional optical flow interpolation, features are tracked using a single model for single- or three-channel images in the visible range. The optical flow assumption of brightness consistency is relatively well satisfied at the pixel level in high frame-rate sequences. In satellite images, different movements appear in each channel of the data, with underlying physical processes affecting brightness intensity. Rather than modeling all the channels in a single SSM model, we propose to model each channel independently. We denote SSM-G as the global SSM model where all channels are modeled simultaneously, as presented in Fig. 4. SSM-T$_c$ denotes a task-specific network that is trained for an individual channel $c$. Formally, SSM-T learns a set of SSM models $[I_t^c]$ for $c \in \{1, \ldots, C\}$. While requirements for graphics processing unit (GPU) computation multiply with SSM-T, we will show that task-specific models improve results substantially.

C. Network Architecture
Deep neural networks with encoding and decoding are well suited to model both local and global spatial structures. Architectures of this type include FlowNet [36] and U-Net [13], which have been shown to perform well in the task of optical flow. We follow this approach using a U-Net architecture for each of the flow and interpolation networks. The U-Net architecture applied has four downsampling layers followed by four upsampling layers with skip connections between each corresponding layer. A convolution layer maps the input to 64 channels with a kernel size of 7. The following downsampling layers are of size 128, 256, 512, and 512 with kernel sizes 5, 5, 3, and 3. Each downsampling layer performs average pooling and two convolutions with rectified linear unit (ReLU) activations. Upsampling layers of sizes 256, 128, 64, and 32, all with kernel sizes of 3, are then applied. Each upsampling layer performs bilinear interpolation followed by two convolutions with ReLU activations. Finally, the 32 channels in the last hidden layer are mapped to the number of output channels using a convolution operation of kernel size 3. Flow and interpolation networks use the same architecture with different input and output dimensions as discussed above.
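The layer sizes listed above can be traced with a short bookkeeping sketch. It only tracks channel widths, kernel sizes, and spatial size per stage; it does not build a network. The 2x pooling and 2x upsampling factors, the 256-pixel input, and the example channel counts (two stacked single-band frames in, two 2-component flows out) are assumptions made for illustration.

```python
def unet_plan(in_channels, out_channels, size=256):
    """Return (stage, channels, kernel, spatial_size) for each stage in the text."""
    plan = [("input", in_channels, None, size)]
    plan.append(("conv_in", 64, 7, size))
    for width, kernel in zip((128, 256, 512, 512), (5, 5, 3, 3)):
        size //= 2  # average pooling is assumed to halve height and width
        plan.append(("down", width, kernel, size))
    for width in (256, 128, 64, 32):
        size *= 2   # bilinear upsampling is assumed to double height and width
        plan.append(("up", width, 3, size))
    plan.append(("conv_out", out_channels, 3, size))
    return plan


# Hypothetical task-specific flow network: I0 and I1 stacked (2 channels) in,
# forward and backward flows with (dy, dx) components (4 channels) out.
flow_plan = unet_plan(in_channels=2, out_channels=4)
for stage in flow_plan:
    print(stage)
```

With these assumptions a 256 × 256 crop reaches a 16 × 16 bottleneck, and skip connections would pair encoder and decoder stages that share the same spatial size.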
Tracking both small and large displacements continues to be a challenge, even with encoder-decoder network architectures. Other approaches have shown that using a stack of networks performing small and large displacements performs well [37]. In this work, we explore the applicability of multiscale hidden layers to track local and global features. We follow a similar approach applied in [44] where hidden layers are defined to have multiple convolution operations with different sized kernels followed by a concatenation layer. In our networks, kernels of sizes 3, 5, and 7 conserve high-frequency spatial details while abstracting global motion for improved optical flows and visibility maps.
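As a rough illustration of a multiscale layer, the sketch below applies three filters of sizes 3, 5, and 7 to the same 1-D input in parallel and stacks the results along a channel axis. Box (averaging) filters stand in for learned weights, and the 1-D setting and function names are simplifications chosen for readability rather than details from the networks used here.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding so the output length matches the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def multiscale_block(signal, kernel_sizes=(3, 5, 7)):
    """Run one filter per kernel size in parallel and concatenate the outputs as channels."""
    return [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
channels = multiscale_block(impulse)
```

The size-3 channel keeps the impulse localized, while the size-7 channel spreads it across the whole signal, which mirrors the intent of preserving fine detail while also seeing larger displacements.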

D. Training Loss
As all variables in the architecture are differentiable, the model can be learned in an end-to-end manner. Given two input frames $I_0$ and $I_1$ with $N$ intermediate frames and corresponding predictions $\{\hat{I}_{t_i}\}_{i=1}^{N}$, a loss function can be defined as a weighted combination of reconstruction, warping, and smoothness losses such that

$$l = \lambda_r l_r + \lambda_w l_w + \lambda_s l_s. \tag{3}$$

We note that Jiang et al. [13] include a fourth term, a perceptual loss, that relies on image classes that are not available for this satellite dataset. Similarly to [13], we employ $L_1$ loss functions for each loss term unless noted otherwise.
The reconstruction loss is defined as the distance between observed and predicted intermediate frames

$$l_r = \frac{1}{N} \sum_{i=1}^{N} \left\lVert \hat{I}_{t_i} - I_{t_i} \right\rVert_1.$$

A warping loss, $l_w$, is used to optimize estimated optical flows between input and intermediate frames for channel $c$. A smoothness loss is applied to forward and backward flows from $I_0$ to $I_1$, denoted $F_{0 \to 1}$ and $F_{1 \to 0}$, to satisfy the smoothness assumption of optical flows in the first network such that

$$l_s = \left\lVert \nabla F_{0 \to 1} \right\rVert_1 + \left\lVert \nabla F_{1 \to 0} \right\rVert_1.$$

In practice, this training setup requires optimization over multiple hyperparameters, including $\lambda_r$, $\lambda_s$, $\lambda_w$, and a learning rate.
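A minimal sketch of how the three loss terms could be combined, assuming mean absolute differences for every term and finite differences for the flow gradient. The warping term here simply compares each reference frame with a warped counterpart supplied by the caller; the exact set of warped pairs, the default weights, and all function names are placeholders rather than the values or code used in this work.

```python
def l1(a, b):
    """Mean absolute difference between two equally shaped 2-D images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def smoothness(flow):
    """Mean absolute finite difference of one 2-D flow component."""
    h, w = len(flow), len(flow[0])
    dx = sum(abs(flow[y][x + 1] - flow[y][x]) for y in range(h) for x in range(w - 1))
    dy = sum(abs(flow[y + 1][x] - flow[y][x]) for y in range(h - 1) for x in range(w))
    return (dx + dy) / (h * w)


def total_loss(preds, targets, warped_pairs, flows, lam_r=1.0, lam_w=0.5, lam_s=0.5):
    """Weighted sum of reconstruction, warping, and smoothness terms.

    lam_w and lam_s are placeholder values; in the paper they are tuned.
    """
    l_r = sum(l1(p, t) for p, t in zip(preds, targets)) / len(preds)
    l_w = sum(l1(ref, warped) for ref, warped in warped_pairs)
    l_s = sum(smoothness(f) for f in flows)
    return lam_r * l_r + lam_w * l_w + lam_s * l_s


target = [[1.0, 2.0], [3.0, 4.0]]
off_by_one = [[2.0, 3.0], [4.0, 5.0]]
flat_flow = [[0.5, 0.5], [0.5, 0.5]]
perfect = total_loss([target], [target], [(target, target)], [flat_flow])
biased = total_loss([off_by_one], [target], [(target, target)], [flat_flow])
```

A spatially constant flow contributes nothing to the smoothness term, so in this toy case the total reduces to the reconstruction error.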

V. EXPERIMENTS
We demonstrate the effectiveness of a set of SSM models on a large-scale dataset using a high-performance computing system with a cluster of GPUs. The goal of our experiments is to show that optical flow is highly applicable to temporal interpolation of satellite imagery and to compare it with the baseline of linear interpolation, as traditionally applied. Sections V-A through V-C outline the training process, compare methodologies, and study performance on severe convective precipitation events. Code for this work can be found in the Supplementary Material (see footnote 1).

A. Training
Data for training and testing were taken from the GOES-16 Mesoscale 1-min imagery. These images are of identical spatial and spectral resolution to the North America and full-disk imagery, so the learned models are directly applicable to these datasets. Training data were selected using all samples from every fifth day of the year 2018, and testing data from a randomly selected set of examples from 2019. Samples were generated as 264 × 264 subimages and randomly cropped to 256 × 256 during training. Standardized normalization was applied independently to each channel to ensure similar pixel intensity distributions across bands. Temporally, samples are selected from a sequence of 15 time steps such that inputs ($I_0$, $I_1$) are 10 min apart with a random label $I_t$ in between. Furthermore, during training, images are randomly flipped and rotated to improve generality in the U-Net architecture. A random training/validation split of 20% was used to monitor learning. We select cloud top temperature tracked by band 13 (10.3 μm) in ablation and demonstration experiments, as used in studies of convection and AMVs. Experiments for this study leveraged NASA's Pleiades Supercomputer and the NASA Earth eXchange (NEX) to process large-scale GOES-16 data and train individual networks for each of the 16 channels.
Footnote 1: https://github.com/tjvandal/geostationary-superslomo
Adam optimization is used to minimize (3) with default parameters β1 = 0.9, β2 = 0.999, and eps = 1e-8 in PyTorch. We found that learning is sensitive to the hyperparameters λs and λw, which are optimized using probabilistic grid search and constrained Bayesian optimization [45]. Constrained Bayesian optimization applies efficient randomized Monte Carlo simulations over λs and λw, holding λr = 1. We perform this process using the open-source Ax library [46].
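The temporal and spatial sampling described above can be sketched as follows, assuming the 15 time steps are consecutive 1-min frames so that a 10-min gap corresponds to 10 index steps. The function name, the dictionary layout, and the use of Python's `random` module are illustrative assumptions.

```python
import random


def draw_training_sample(rng, sequence_length=15, gap=10, tile=264, crop=256):
    """Pick frame indices for (I0, It, I1) and a random crop offset."""
    start = rng.randrange(sequence_length - gap)   # index of I0: 0..4 by default
    end = start + gap                              # I1 is `gap` steps after I0
    mid = rng.randrange(start + 1, end)            # label frame strictly in between
    t = (mid - start) / gap                        # fractional time in (0, 1)
    top = rng.randrange(tile - crop + 1)           # crop offsets: 0..8 by default
    left = rng.randrange(tile - crop + 1)
    return {"i0": start, "it": mid, "i1": end, "t": t, "top": top, "left": left}


rng = random.Random(0)
samples = [draw_training_sample(rng) for _ in range(100)]
```

Each draw yields a different intermediate time, so a single 15-step sequence provides many distinct training targets for the time-dependent interpolation network.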

B. Model Comparison
This section compares variations of SuperSlomo with a linear interpolation baseline for interpolation of GEO images. Linear interpolation between frames is performed by taking a linear combination of two input images weighted by time, $I_t = (1 - t) I_0 + t I_1$. A set of three SuperSlomo models is explored, including global (SSM-G), task-specific (SSM-T), and task-specific with multiscale layers (SSM-TMS). SSM-T models are trained for each band separately. SSM-G is trained using training data from all bands and, hence, a substantially larger training set. The root mean square error (RMSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) are used to evaluate performance.
We first study the inherent properties of SSM on band 13, including time dependence and sensitivity to larger displacements. Interpolation between two frames is expected to have smooth transitions from one frame to another. Generally, interpolation will have the largest error where the distance to the input frames is maximum (i.e., directly between the input frames). In Fig. 5, we compare PSNR as a function of t ∈ [0, 1] between models and see this effect. The gap between linear and SSM models is pronounced. Between SSM models, SSM-T and SSM-TMS have similar performance. SSM-G, which is a more generalized model, does not perform quite as well as SSM-T and SSM-TMS, suggesting that task-specific models across bands perform better. Fig. 6 shows PSNR at t = 0.5 while increasing the gap between $I_0$ and $I_1$ from 5 to 45 min. A 45-min gap contains roughly nine times the displacement of a 5-min gap, making the optical flow problem more difficult. Over the first 15 min, SSM models perform similarly and better than linear. As the gap widens, SSM-TMS and SSM-G begin performing better than SSM-T. This suggests that SSM-TMS multiscale layers may be capturing more motion. SSM-G's more diverse dataset includes 500-m data, which has larger displacements than the 2-km band 13. In Table II, errors are in terms of brightness temperatures measured by the sensor. As a whole, our results find that task-specific SSM models, SSM-T and SSM-TMS, outperform linear interpolation and a single global interpolation network, SSM-G. Interpolation of the visible bands (1 and 2) with optical flow provides modest improvements in all metrics. Performance improvements are not found for NIR bands 3 ("veggie") and 5 (snow/ice), where each method produces high PSNRs and low RMSEs. On the other hand, thermal bands see large improvements from SSM optical flow, with SSIM increasing from approximately 0.77 to 0.93, on average.
We find that task-specific models outperform the global model throughout, even with the reduced training data size. Errors do vary across bands, which is largely associated with the radiance distribution of that particular band. For instance, band 4 (1.37 μm) is typically sparse such that predicting low intensities is an easy task.
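A minimal sketch of the linear baseline and two of the evaluation metrics on flattened pixel lists. The `data_range` argument used for PSNR (the span of valid pixel values) is an assumption, since the normalization used for the reported PSNR values is not stated here; SSIM is omitted because it requires windowed local statistics.

```python
import math


def linear_baseline(i0, i1, t):
    """Pixelwise I_t = (1 - t) * I_0 + t * I_1 on flattened frames."""
    return [(1.0 - t) * a + t * b for a, b in zip(i0, i1)]


def rmse(pred, obs):
    """Root mean square error between two equally long pixel lists."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def psnr(pred, obs, data_range):
    """Peak signal-to-noise ratio in dB for a given valid data range."""
    err = rmse(pred, obs)
    if err == 0.0:
        return math.inf
    return 20.0 * math.log10(data_range / err)


# Two-pixel toy frames in kelvin, interpolated 30% of the way from I0 to I1.
estimate = linear_baseline([280.0, 290.0], [290.0, 270.0], 0.3)
```

Because PSNR depends on the chosen data range, values are only comparable between methods evaluated on the same band and normalization.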

C. Severe Weather Event
This section studies examples of convective precipitation events visualized in Figs. 7 and 8. In the context of severe weather, convection is vertical motion in the atmosphere that occurs when warm air at the surface forces cold air in the atmosphere down, often causing supercells and heavy precipitation. Apke et al. [47] were the first to study this process using GOES-14 1-min imagery for a set of supercells. The authors found that atmospheric motion can help define signatures of supercell events to better inform weather forecasting models. Here, we show that cloud top brightness from the long-wave clean IR band can be interpolated from 10 to 1 min during two convective events.
The 1-min mesoscale (M1) data from May 23, 2019, from 2:00 to 3:00 UTC at −95° longitude and 37° latitude are used for the analysis. In this region, a convective storm is occurring and moving east. The data are downsampled to 10 min and interpolated back to a 1-min time series. Fig. 7(a) shows the region of interest with predictions ($\hat{I}_t$), optical flows ($F_{0 \to t}$), and visibility maps ($V_{0 \to t}$) between times 0 and t. The optical flows show the storm moving east and slightly rotating, with a maximum displacement around the storm edges. Visibility pixels correspond to edges of clouds, which allows (1) to be a nonlinear combination relative to time. The flows and visibility maps in Fig. 7(h)-(j) and (m)-(o) show an increasingly apparent horizontal strip of contrasting flows caused by two artifacts of the approach. First, the approach uses both flows and occlusion maps to take a nonlinear combination of beginning and end images and can weight some areas more than others. Second, U-Net depends on constant input size, so images are interpolated in blocks with stride 20, which can cause artifacts. As our approach is not optimizing flows directly, but rather the interpolation, these artifacts are less pronounced in the predictions. Fig. 8 presents time series of two severe events with the corresponding observations. The first row depicts a tornado outbreak in the southeastern United States on March 3, 2019, where we see variability in cloud top brightness. A dashed line shows the 10-min time series and is equivalent to linear interpolation. The SSM-T interpolation, overlaid on the observations, captures the variability of a drastic 15 K temperature drop and generates a highly correlated time series with an R-squared of 0.955. The second row shows a time series through the eye of Hurricane Dorian on September 1, 2019. Similarly, we find that SSM-T is highly correlated with an R-squared of 0.986.
These results suggest that optical flow may be a promising approach for interpolating GEO imagery for applications to severe events.
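The evaluation protocol for a single-pixel time series can be sketched as follows: subsample the 1-min observations to every tenth sample, interpolate back to 1 min, and score the result against the original series. The linear upsampler stands in for the dashed 10-min line, and R-squared is taken here to be the squared Pearson correlation, which is an assumption since the definition is not stated. The synthetic series with a brief 15 K dip is made up purely for illustration.

```python
def downsample(series, step=10):
    """Keep every `step`-th sample of a 1-min series."""
    return series[::step]


def linear_upsample(coarse, step=10):
    """Linearly interpolate between consecutive coarse samples back to fine spacing."""
    fine = []
    for a, b in zip(coarse, coarse[1:]):
        fine.extend(a + (b - a) * k / step for k in range(step))
    fine.append(coarse[-1])
    return fine


def r_squared(x, y):
    """Squared Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Synthetic 21-min brightness temperature series (K) with a brief 15 K dip at minute 5.
observed = [300.0 - 0.5 * i for i in range(21)]
observed[5] -= 15.0
linear_estimate = linear_upsample(downsample(observed))
score = r_squared(linear_estimate, observed)
```

Because the dip falls between the retained 10-min samples, the linear estimate misses it entirely, which is the kind of sub-interval variability the learned interpolation is intended to recover.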

VI. CONCLUSION
This work proposes that temporal interpolation with optical flow is capable of modeling high-frequency events between GEO images with high accuracy by learning from mesoscale rapid-scan observations. Experiments showed that learning independent weights of SSM for each band improves performance beyond one global SSM and the linear interpolation baseline. Multiscale blocks in SSM-TMS perform well for larger displacements and are comparable to SSM-T overall. The interpolation captured the temporal variability of cloud top brightness well during multiple severe convective events. This interpolation has direct applications to improving precipitation estimation and the analysis of weather variability. Furthermore, deep learning-based optical flow routines are able to better harness graphics processing units with a single feedforward pass [36] relative to traditional and more expensive routines, such as the polynomial expansion method applied in [32]. Improvements in accuracy from deep learning approaches to optical flow present a promising direction for future work.
While further analysis is necessary, our results suggest that dynamics of atmospheric motion are learned by the network using displacement flows and visibility maps, which would have direct implications for weather forecasting. Second, the internal dynamics captured may provide knowledge on how to predict future states, as applied in video-frame prediction. In future work, we will explore the accuracy of optical flow in estimating atmospheric motion relative to large-scale observations, as well as model interpretability, to better understand which physical dynamics are captured.