MPL-GAN: Toward Realistic Meteorological Predictive Learning Using Conditional GAN



I. INTRODUCTION
Weather forecasting is one of the main applications of meteorological prediction. It is important for daily life as well as for industrial and agricultural production. Common uses include precipitation nowcasting [1], [2], streamflow prediction [3]-[7], wind speed simulation [8], radiation estimation [9], and temperature forecasting [10], [11]. Numerous techniques have been proposed to predict weather measurements more accurately, including Numerical Weather Prediction (NWP), radar map based methods, and satellite imagery based methods. Recently, with the advances of deep learning, researchers have adopted Recurrent Neural Network (RNN) based methods to improve on these traditional approaches. For example, the authors of [1] formulated precipitation nowcasting as a spatio-temporal sequence forecasting problem, and proposed an LSTM-based model named ConvLSTM for radar echo map prediction. A Seq2Seq-LSTM based model [11] was proposed to improve NWP performance using historical observations. A study [12] proposed an adversarial model to predict cyclone trajectories from satellite imagery. These studies reveal that radar and satellite imageries play an increasingly important role in meteorological prediction, not only because they are more robust, but also because they provide end-users with more sequential information and better visualisations of the evolution from the current to the predicted atmosphere.

The associate editor coordinating the review of this manuscript and approving it for publication was Huiling Chen.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

However, these approaches share some common drawbacks: they do not generalise well on real-world meteorological datasets, especially for long term predictions. More specifically, the pioneering work, ConvLSTM, produces blurry radar imagery predictions that get worse as the time step moves forward. The predicted imageries look blurry rather than realistic, which makes for poor visualisations. These drawbacks have two main causes. First, these models optimise pixel-wise regression losses such as Mean Absolute Error (MAE) and Mean Square Error (MSE) across the whole length of the imagery sequence. Several studies that trained models with MAE and MSE reported blurry images [13], [14]. This is mainly due to the assumption that the data is drawn from a Gaussian distribution, which only works on continuous portions of the image while ignoring small isolated regions. Second, due to the nature of the RNN architecture, small errors quickly accumulate into large errors along the generated sequence because of the gap between training and inference [15], [16]. These two causes indicate that it is crucial to include an uncertainty handling procedure to generate realistic meteorological predictions.

Meanwhile, video frame prediction can also be modelled as a spatio-temporal sequence forecasting problem. For instance, the authors of [17] extracted sequences of images from video frames, and proposed an encoder-decoder RNN based model named PredRNN [17], later improved as PredRNN++ [18].
Nevertheless, these models suffer from the same drawback as ConvLSTM: they produce blurry predictions. Recently, Generative Adversarial Networks (GANs) [19] have been used to handle uncertainty in video frame prediction [13], [20]-[22]. A GAN matches two distributions through a generator and a discriminator playing a minimax game, where the generator learns to generate samples that fool the discriminator and the discriminator learns to distinguish these fake samples from real ones. These unconditional GAN based models are able to produce realistic looking videos by learning the high dimensional distribution of complex datasets. However, even though they produce realistic looking and temporally coherent video frames, these models are not suitable for meteorological predictive learning, because the generated frames do not model the real-world meteorological changes implied by the source imagery frames. Note that meteorological prediction needs to consider each moving entity's (pixel-wise) direction, speed, rotation, acceleration and other motion information.
To sum up, on the one hand, RNN based meteorological predictive models trained with MAE and MSE produce blurry predictions. On the other hand, GAN based models are able to generate realistic looking video frames but fail to capture the actual atmospheric movement, missing local variations and patterns. In this work, we propose a Conditional GAN based model named Meteorological Predictive Learning GAN (MPL-GAN for short) that optimises both a regression loss and a GAN loss, and aims at generating realistic meteorological predictions. Optimising the regression loss models the real-world movement of atmospheric imagery, which is crucial for weather forecasting. The GAN loss is used to estimate the data distribution and handle uncertainty, so as to produce non-blurry predictions.
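The blurriness attributed to MSE above can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes two equally likely future frames (a small bright cell either on the left or on the right of a tiny synthetic frame) and compares the expected MSE of a sharp guess with that of the pixel-wise average. All names and the toy data are hypothetical.

```python
import numpy as np

# Two equally likely "futures": a bright 2x2 cell on the left or on the right
# of a tiny 4x8 frame (toy data, not radar imagery).
left = np.zeros((4, 8))
left[1:3, 1:3] = 1.0
right = np.zeros((4, 8))
right[1:3, 5:7] = 1.0
futures = np.stack([left, right])


def expected_mse(pred, outcomes):
    """Mean squared error of one prediction, averaged over all outcomes."""
    return float(np.mean((outcomes - pred) ** 2))


# The single prediction that minimises expected MSE is the pixel-wise mean
# of the possible outcomes: two half-intensity cells, i.e. a blurry frame.
mean_pred = futures.mean(axis=0)

mse_mean = expected_mse(mean_pred, futures)  # blurry average
mse_left = expected_mse(left, futures)       # sharp but committed guess
```

In this toy setting the averaged frame scores a lower expected MSE than either sharp frame, even though it matches neither possible future. This is one way to see why a model trained purely on MSE can drift toward washed-out predictions.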
Our main contributions are summarised as follows:
• To the best of our knowledge, this is the first model that combines a regression loss with a GAN loss to generate realistic meteorological predictions that provide better visualisations.
• We conduct extensive experiments on a real-world radar imagery dataset. Experimental results demonstrate that our model generates non-blurry predictions even in the long term while capturing real-world atmospheric changes.
• We provide an extensive experimental analysis showing that a pure GAN model without the predictive learning module fails to capture the actual atmospheric movement, which demonstrates the effectiveness of the proposed MPL-GAN model in detecting meteorological changes.

II. RELATED WORK
This section briefly reviews related studies.

A. METEOROLOGICAL PREDICTIVE LEARNING
Optical flow based methods [23], [24] have a long history in the meteorological predictive learning literature. With the recent advances in deep learning, the authors of [1] explored the possibility of applying RNNs, and proposed a model called Convolutional LSTM (ConvLSTM) [1], followed by its improved version TrajGRU [2], for radar echo imagery prediction. Both approaches optimise the MSE loss. Video frame prediction and traffic flow prediction can be considered instances of the same problem. PredRNN and its improved version PredRNN++ were proposed by [17] to tackle this problem. These methods also optimise the MSE loss, and they share the same issue: the predictions get more blurry over time.
Beyond meteorological imagery predictive learning, neural network based methods have also been used in numerical weather forecasting. For example, a study [11] proposed a Seq2Seq LSTM to predict temperature, wind speed, and relative humidity. Another study [10] improved this method by introducing a temporal progressive growing scheduled sampling strategy. Nevertheless, these approaches suffer from the same degradation of long term prediction accuracy.

B. GAN FOR IMAGE AND VIDEO GENERATION
GAN [19] has been the most popular generative model since it was first released in 2014. Since then, GAN models have shown superior abilities, especially in image generation, from hand-written digit generation [19], [25] to large scale image set generation [26], [27]. Recently, researchers have pushed the limits of GAN by generating photo-realistic videos using unconditional GANs [13], [20]-[22], [28]. These video GANs aim at producing photo-realistic and temporally coherent videos by matching the high-dimensional distribution of the generated videos to that of the real data. They take no other considerations into account. That is, for given initial frames, the generated frames do not need to respect the moving entities' direction, speed and other motion information. However, these motion properties play an important role in our study, and our GAN takes them into account, unlike traditional unconditional GANs.

III. PROPOSED MPL-GAN
In this section, we describe our proposed MPL-GAN model, which aims to produce realistic looking meteorological imageries. Figure 1 shows the overall architecture of the proposed model, which contains a predictive learning module to generate predictions and a Conditional GAN module to map those predictions back to the photorealistic distribution. We formulate meteorological predictive learning as follows. Let $X_t$ denote the meteorological imagery at time-step $t$, and let $M$ denote the sequence of observed past frames. Meteorological predictive learning is to predict the sequence of meteorological imageries for the following $K$ time-steps based on the past frames $M$, denoted as $\hat{P}_1^K \equiv \hat{X}_1, \hat{X}_2, \ldots, \hat{X}_K$. Note that a meteorological imagery carries important weather information such as rainfall, temperature, and wind speed.

A. PREDICTIVE LEARNING
In order to model meteorological changes, we adopt an encoder-decoder ConvLSTM [1] as the predictive learning module. As investigated by previous studies [1], [2], predictive learning aims to capture local spatio-temporal pattern movement such as rotation and scaling. As mentioned earlier, existing GAN-based next frame prediction models are not suitable for meteorological predictive learning, as they do not capture real-world meteorological changes. Furthermore, GAN models without such a predictive learning module are not able to produce long term predictions. For example, the model in [13] can only produce a maximum of two future frames. In contrast, our MPL-GAN model generates the next prediction conditioned on the previous ground-truth frame and the current predictive output using a conditional GAN. We manage to generate more than 10 frames of non-blurry and realistic meteorological imagery predictions while still modelling real-world atmospheric changes with the help of the predictive learning module. This demonstrates that the predictive learning module is crucial for modelling meteorological changes. Note that we use ConvLSTM for evaluation purposes in this study, but it can be replaced by other advanced models such as TrajGRU [2] and PredRNN++ [18]. On the one hand, the predictive learning module is required for modelling meteorological changes; on the other hand, naive predictive learning models suffer from the blurry image issue and need to be refined for meteorological change analysis. In the next section, we introduce the Conditional GAN that addresses the blurriness caused by the naive predictive learning module.

B. CONDITIONAL GAN
GAN [19] attempts to learn a mapping function $G$ that maps a random noise vector $z$ to an image $X$, $G: z \to X$. In our setting, we aim to map the blurry prediction produced by ConvLSTM back to the original non-blurry distribution. Let $\hat{P}_1^K$ denote the sequence generated by ConvLSTM and $M_1^K$ denote the observed ground-truth frames. Our goal is to train a conditional generator $G: \{z, \hat{P}_1^K\} \to M_1^K$.

1) CONDITIONAL GENERATOR
As the prediction sequence is generated recursively by the ConvLSTM, we train the Conditional Generator $G: \{z, \hat{X}_t\} \to X_t$ per frame along the ConvLSTM time steps, instead of training the GAN generator on the whole generated sequence. Here $\hat{X}_t$ denotes the ConvLSTM output at time $t$ and $X_t$ is the ground-truth frame at time $t$. However, as the ConvLSTM prediction gets more blurry in later time steps, it becomes harder for the Conditional GAN to map between the two conditional distributions. To solve this problem, we additionally condition the generator on the previous frame $X_{t-1}$, i.e. $G: \{z, X_{t-1}, \hat{X}_t\} \to X_t$. Ideally, we would condition the generator on the current ground-truth frame $X_t$ and the current ConvLSTM prediction $\hat{X}_t$. We use the previous frame instead, based on the observation that two consecutive meteorological frames are very similar in terms of data distribution; they even look very similar, since the atmosphere normally changes gradually and slowly. Moreover, no ground-truth frame, previous or current, is available during the inference stage. We therefore replace the previous ground-truth frame $X_{t-1}$ with the generator output of the previous time step during inference, i.e. $\tilde{X}_t = G(z, \tilde{X}_{t-1}, \hat{X}_t)$, where $\tilde{X}_t$ denotes the generator output at time $t$. Viewed the other way around, if we assume that the output distribution of our generator perfectly matches the actual data distribution, then the data distribution can be carried forward recursively from the last known ground-truth frame to the frame being predicted. Therefore, we use the ground-truth frame $X_{t-1}$ instead of $\tilde{X}_{t-1}$ during training for stability, and replace $X_{t-1}$ with $\tilde{X}_{t-1}$ during inference.
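The difference between the training-time and inference-time conditioning described above can be sketched as a simple loop. This is a minimal illustration in NumPy, not the authors' implementation: `rollout` and `toy_generator` are hypothetical names, and the toy generator is a stand-in for a trained network.

```python
import numpy as np


def rollout(generator, pl_preds, last_observed, ground_truth=None, rng=None):
    """Apply a conditional generator frame by frame.

    pl_preds      : (K, H, W) blurry predictive-learning outputs, one per step
    last_observed : (H, W) last known ground-truth frame before the first step
    ground_truth  : (K, H, W) true frames; only available during training
    """
    rng = np.random.default_rng(0) if rng is None else rng
    prev = last_observed
    outputs = []
    for t, blurry in enumerate(pl_preds):
        z = rng.standard_normal(blurry.shape)  # noise input of the generator
        refined = generator(z, prev, blurry)
        outputs.append(refined)
        if ground_truth is not None:
            prev = ground_truth[t]  # training: condition on the true frame
        else:
            prev = refined          # inference: feed back the generator output
    return np.stack(outputs)


def toy_generator(z, prev, blurry):
    """Hypothetical placeholder: blends both conditions and ignores the noise."""
    return 0.5 * (prev + blurry)


pl_preds = np.ones((3, 2, 2))
last_observed = np.zeros((2, 2))
inferred = rollout(toy_generator, pl_preds, last_observed)
trained = rollout(toy_generator, pl_preds, last_observed,
                  ground_truth=np.full((3, 2, 2), 2.0))
```

During inference each output depends on the previous output, so any mismatch between generated and real frames is carried forward; conditioning on ground-truth frames during training avoids this feedback while the generator is still poor.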

2) FRAME DISCRIMINATOR
We randomly select $N$ frames among the $K$ time steps to train the frame discriminator $D_{Fr}$. $D_{Fr}$ outputs 1 for the true frame $X_t$ and 0 for the fake frame $\tilde{X}_t$ produced by the generator. We train the frame discriminator by optimising the minimax game defined in the original GAN, using the Hinge Loss [29] for $L_{D_{Fr}}$, defined as follows:

$$L_{D_{Fr}} = \mathbb{E}[\max(0, 1 - D_{Fr}(X_t))] + \mathbb{E}[\max(0, 1 + D_{Fr}(\tilde{X}_t))]. \quad (1)$$
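The hinge loss for a discriminator can be sketched in a few lines. The function below is an illustrative NumPy version that operates on raw discriminator scores; the name `d_hinge_loss` and the use of batch means to approximate the expectations are assumptions made for this sketch.

```python
import numpy as np


def d_hinge_loss(real_scores, fake_scores):
    """Hinge loss of a discriminator, given its raw scores.

    real_scores : scores assigned to ground-truth frames
    fake_scores : scores assigned to generated frames
    """
    real_term = np.maximum(0.0, 1.0 - real_scores).mean()
    fake_term = np.maximum(0.0, 1.0 + fake_scores).mean()
    return float(real_term + fake_term)


# Confident, correct scores incur no loss; undecided scores (0) incur 1 each.
confident = d_hinge_loss(np.array([2.0, 1.5]), np.array([-2.0, -1.0]))
undecided = d_hinge_loss(np.array([0.0]), np.array([0.0]))
```

The loss vanishes once real frames score at least 1 and generated frames score at most -1, so the discriminator stops receiving gradient from samples it already separates with a margin.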

3) FLOW DISCRIMINATOR
The frame discriminator aims to ensure that the generator produces samples matching the actual data distribution, i.e. that the produced samples look realistic. On top of this, similar to the video discriminator proposed by [22], we propose a flow discriminator $D_{Fl}$ to ensure that the generator produces temporally coherent frames. Similarly, $D_{Fl}$ outputs 1 for the real sequence $M_1^K$ and 0 for the generated sequence $(M_1^k; \tilde{P}_k^K)$, where we concatenate the initial source sequence $M_1^k$ and the conditionally generated sequence $\tilde{P}_k^K$. $L_{D_{Fl}}$ is defined as follows:

$$L_{D_{Fl}} = \mathbb{E}[\max(0, 1 - D_{Fl}(M_1^K))] + \mathbb{E}[\max(0, 1 + D_{Fl}((M_1^k; \tilde{P}_k^K)))]. \quad (2)$$

Again, a predictive learning module is essential for modelling the real-world meteorological movement patterns, and the conditional GAN is used to map the blurry predictions generated by predictive learning back to non-blurry imagery distributions. Therefore, we divide the training process into two stages: we first train the predictive learning module, and once its training is almost stable we start training the GAN module.

1) PREDICTIVE LEARNING TRAINING
Following the settings of ConvLSTM [1] and TrajGRU [2], we train our Predictive Learning (PL) module by minimising the balanced MSE and MAE loss (B-MSE-MAE) with Stochastic Gradient Descent (SGD) and Back-propagation Through Time (BPTT) [30]. We train with the B-MSE-MAE loss until it becomes stable before we start training the Conditional GAN, so that the Conditional GAN learns from a stable data distribution. However, we continue to train the PL module along with the GAN module even if the loss no longer decreases. The intuition is to expose the GAN to variations in the input distribution, which makes it more robust.
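The two-stage schedule can be expressed as a small helper that decides which modules are updated for a given batch. This is a hedged sketch: the function and module names are hypothetical, and the default warm-up length of 10,000 batches is the value reported in the implementation details of this paper.

```python
def modules_to_update(batch_idx, pl_warmup_batches=10_000):
    """Return the names of the modules that are trained on this batch.

    Stage 1 (batch_idx < pl_warmup_batches): predictive learning (PL) only.
    Stage 2: PL keeps training alongside the generator and both discriminators.
    """
    if batch_idx < pl_warmup_batches:
        return ("pl",)
    return ("pl", "generator", "frame_discriminator", "flow_discriminator")


first_stage = modules_to_update(0)
second_stage = modules_to_update(10_000)
```

Keeping `"pl"` in the second-stage tuple reflects the choice to continue updating the PL module after the GAN starts training, rather than freezing it.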

2) CONDITIONAL GAN TRAINING
Training GAN models requires training both the generator and the discriminator by optimising the minimax game [19], where the generator learns to fool the discriminator with generated fake samples and the discriminator learns to tell true samples from fake ones. We follow the same spirit and extend it to training one conditional generator and two separate discriminators. The losses of the Frame Discriminator $D_{Fr}$ and the Flow Discriminator $D_{Fl}$ are defined in Equation 1 and Equation 2. The loss function of the conditional generator is denoted $L_G$; in it, $\phi$ denotes the process of applying the generator recursively along the ConvLSTM time steps to generate a sequence of frames. Our overall optimisation goal is therefore to minimise $L_{D_{Fr}}$ and $L_{D_{Fl}}$, which maximises the probability of the discriminators identifying fake frames and fake sequences, and to minimise $L_G$, which maximises the probability of the generator producing samples that the discriminators judge to be true.
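For illustration only, a generator objective that is commonly paired with hinge discriminator losses is the negated mean discriminator score on generated samples. The sketch below assumes this form, summed over the two discriminators; it is an assumption of this example and should not be read as the exact definition of $L_G$. The function name is hypothetical.

```python
import numpy as np


def g_hinge_loss(frame_scores_fake, flow_scores_fake):
    """Assumed generator objective: negated mean scores of both discriminators.

    frame_scores_fake : frame discriminator scores on generated frames
    flow_scores_fake  : flow discriminator scores on generated sequences
    """
    return float(-np.mean(frame_scores_fake) - np.mean(flow_scores_fake))


loss_fooling = g_hinge_loss(np.array([1.0, 3.0]), np.array([0.5]))
loss_caught = g_hinge_loss(np.array([-1.0, -3.0]), np.array([-0.5]))
```

Under this assumed form, the generator loss decreases as either discriminator assigns higher scores to generated samples, which matches the stated goal of producing samples the discriminators judge to be true.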
Note that $N$ random frames are selected for training $D_{Fr}$, so $D_{Fr}$ is trained $N$ times for each training batch $b$, whereas $D_{Fl}$ is trained once for each $b$. Moreover, the gradient of $G$ is back-propagated recursively multiple times through the operation $\phi$ when training $G$ with $D_{Fl}$, which makes training the GAN extremely difficult. Following the principle of [20] and [27], we downsample each frame of the sequence passed to $D_{Fl}$ to ease training.
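The two input-preparation steps above, picking $N$ random frames for the frame discriminator and downsampling the sequence for the flow discriminator, can be sketched as follows. Sampling without replacement and average pooling are assumptions of this sketch, since the text does not specify either; the function names are hypothetical.

```python
import numpy as np


def sample_frame_indices(num_steps, n, rng):
    """Pick n distinct time steps out of num_steps (assumed: no replacement)."""
    return rng.choice(num_steps, size=n, replace=False)


def downsample_sequence(seq, factor):
    """Average-pool each frame of a (T, H, W) sequence by an integer factor."""
    t, h, w = seq.shape
    if h % factor or w % factor:
        raise ValueError("frame size must be divisible by factor")
    blocks = seq.reshape(t, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(2, 4))


rng = np.random.default_rng(0)
frame_ids = sample_frame_indices(num_steps=10, n=2, rng=rng)

seq = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
small = downsample_sequence(seq, factor=2)  # shape (2, 2, 2)
```

Downsampling by a factor of 2 reduces the number of pixels per frame by a factor of 4, which shrinks the input of the flow discriminator accordingly.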

IV. EXPERIMENT
In this section, we briefly describe the dataset used and provide experimental results.

A. DATASET
We use the HKO-7 [1], [2] radar echo imagery dataset to evaluate our proposed MPL-GAN model. The radar echo imagery is recorded every 6 minutes, so there are 240 frames per day. Each frame contains 480 × 480 pixels covering a 512 km × 512 km area. We sample the data into sequences of 15 frames with a sliding window, 5 frames for the encoder and 10 for the decoder. Of the 993 days of data, we randomly select 80% for the training set, 5% for the validation set, and 15% for the test set. Unlike the original experimental setting of ConvLSTM and TrajGRU, which predict pixel values and report precipitation predictions based on them, we focus on predicting imagery frames that are realistic, for better visualisation.
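The sampling scheme above can be sketched with a small generator function. The stride of the sliding window is not stated in the text, so the default of 1 below is an assumption, as are the function and variable names.

```python
import numpy as np

FRAMES_PER_DAY = 24 * 60 // 6  # one frame every 6 minutes


def sliding_windows(frames, in_len=5, out_len=10, stride=1):
    """Yield (encoder_input, decoder_target) pairs from a frame sequence."""
    total = in_len + out_len
    for start in range(0, len(frames) - total + 1, stride):
        window = frames[start:start + total]
        yield window[:in_len], window[in_len:]


# Frame indices stand in for the actual 480 x 480 radar frames.
day = np.arange(FRAMES_PER_DAY)
pairs = list(sliding_windows(day))
first_input, first_target = pairs[0]
```

With a stride of 1, a single day of 240 frames yields 240 - 15 + 1 = 226 overlapping sequences; a larger stride would produce fewer, less correlated samples.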

B. IMPLEMENTATION AND PARAMETERS

1) PREDICTIVE LEARNING
We use ConvLSTM as the predictive learning module. We implement a three layer ConvLSTM encoder-decoder with the following parameters for each layer:

2) CONDITIONAL GAN
Training GANs is challenging, thus we carefully choose the architecture of the generator and discriminators. For the generator, our architecture is similar to PG-GAN [27]. In order to match the resolution of the generated samples, we upsample the original resolution from 480 × 480 to 512 × 512. We randomly select $N = 2$ frames to train the Frame Discriminator $D_{Fr}$. The 3D-Conv of the Flow Discriminator consists of three layers set to the following parameters:

We implement our model using PyTorch 1.4, a well known deep learning library developed by Facebook. Our model is trained and evaluated on a server with an Nvidia V100 32GB GPU and an Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (24 cores). We train our model with the Adam optimiser [31], with a learning rate of $10^{-4}$ for ConvLSTM and $10^{-3}$ for the generator and discriminators. The batch size is set to 2 due to the resource consumption of the conditional GAN model. For all models, including the baseline methods, we train for 100,000 batches and select the best model based on the minimum MSE on the validation set. For MPL-GAN, we train the PL module for 10,000 batches before training the conditional GAN. All experimental results are reported on the test set.

C. OVERALL EVALUATION

1) BASELINES
In order to evaluate the effectiveness of our proposed model, we compare our model to two baseline methods.
• PL with MSE. In order to show that the PL with MSE produces blurry predictions, we compare our model to the pure ConvLSTM without the conditional GAN module [1].
• PG-GAN. We extend the PG-GAN [27] from image generation to sequence generation with the same architecture of our conditional GAN module, that has a Frame discriminator and a Flow discriminator. This is also very similar to DVD-GAN [22].

2) EVALUATION METRIC
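We use the sharpness measure based on the difference of gradients between two images, defined in [13] as follows:

$$\text{Sharp.} = 10 \log_{10} \frac{\max_{\hat{X}}^2}{\frac{1}{N} \sum_i \sum_j \left| (\nabla_i X + \nabla_j X) - (\nabla_i \hat{X} + \nabla_j \hat{X}) \right|}$$

where $X$ is the ground-truth frame and $\hat{X}$ is the output frame, $\nabla_i X = |X_{i,j} - X_{i-1,j}|$ and $\nabla_j X = |X_{i,j} - X_{i,j-1}|$, and $\max_{\hat{X}}$ is the maximum possible value of the image intensities. We report the average sharpness on the test set across the $K$ frames in the table, as well as the per-frame evaluations as a line chart in Figure 2.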
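A gradient-difference sharpness score of this kind can be computed as in the NumPy sketch below. It assumes single-channel frames with intensities in [0, `max_val`], uses absolute differences between neighbouring pixels as gradients, and clips the denominator with a small constant to avoid division by zero; the clipping constant and the function name are choices made for this example.

```python
import numpy as np


def sharpness(truth, pred, max_val=1.0, eps=1e-12):
    """Gradient-difference sharpness in dB between two (H, W) frames."""

    def grad_sum(img):
        grad_i = np.abs(img[1:, 1:] - img[:-1, 1:])  # difference along rows
        grad_j = np.abs(img[1:, 1:] - img[1:, :-1])  # difference along columns
        return grad_i + grad_j

    diff = np.abs(grad_sum(truth) - grad_sum(pred)).mean()
    return float(10.0 * np.log10(max_val ** 2 / max(diff, eps)))


truth = np.zeros((4, 4))
truth[1:3, 1:3] = 1.0              # one sharp cell
faded = 0.5 * truth                # same edges at half strength
flat = np.full((4, 4), 0.25)       # no edges at all

score_faded = sharpness(truth, faded)
score_flat = sharpness(truth, flat)
```

A prediction that keeps the edges of the ground truth, even at reduced strength, scores higher than a flat prediction with the same mean intensity, which is the behaviour the metric is meant to reward.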

3) EXPERIMENT ANALYSIS
We conduct both quantitative and qualitative evaluations against the two baseline approaches. As shown in Table 1, the GAN based models not only beat PL with MSE in the average score but also in the long term predictions. MPL-GAN and PG-GAN achieve a similar score on the first frame; however, the performance of PG-GAN drops quickly in the long term. Besides the quantitative evaluation, we visualise a sample prediction sequence in Figure 3 (please view the animation by clicking the figure using Adobe Acrobat Reader). As shown in the animation, both PL with MSE and MPL-GAN capture the real-world meteorological movement patterns. However, PL with MSE generates blurry predictions that get more blurry over time, especially in the long term, whereas our proposed MPL-GAN continues to generate realistic looking and sharp predictions. Furthermore, if we look at small regions of the prediction frames, PL with MSE tends to omit small areas as a result of the MSE loss, whereas our model retains more regional details thanks to the uncertainty handling of the GAN.
Furthermore, the experiment as a whole aims to find out whether a GAN model is able to solve the blurry prediction problem of PL with MSE. In fact, our proposed model can be seen as PL with MSE combined with an advanced version of PG-GAN with two discriminator heads. As stated previously, GAN based models are able to produce sharper predictions than PL with MSE. However, as the samples in Figure 4 show, PG-GAN is not able to model the meteorological movements. More specifically, the first generated frame of PG-GAN looks very close to the first generated frame of MPL-GAN, but the later frames are just expansions of the first frame, which is clearly not the real-world scenario. On the other hand, with the constraint of the predictive learning module, MPL-GAN continues to generate realistic looking and diverse meteorological frames that capture the real-world meteorological movement pattern. We summarise the findings above in Table 2.
In summary, the quality of meteorological imagery prediction is of crucial importance in weather forecasting and in monitoring climate change. Figure 3 shows that MPL-GAN produces quality predictions that identify both global trends and local variations, whilst the predictions of PL with MSE are blurry and coarse, missing local variations and details. PL with MSE is less useful in practice since it misses many localised patterns, which results in inaccurate predictions. MPL-GAN, in contrast, is practically useful for forecasting weather and monitoring local and global climate change, as evidenced in Figure 3 and Figure 4.

V. CONCLUSION
We propose MPL-GAN to solve the blurry prediction problem of predictive learning methods such as ConvLSTM and TrajGRU. We use a conditional GAN to map the blurry predictions generated by predictive learning methods back to their original non-blurry data distributions. To do so, we recursively apply a conditional generator conditioned on its own previous output and on the current output of the predictive learning module. Through the novel design of the Frame Discriminator and the Flow Discriminator, the generator learns to produce temporally coherent and realistic frames. Experiments on a real-world radar echo dataset demonstrate that our proposed MPL-GAN model not only produces sharp and realistic looking meteorological predictions, but also models the real-world meteorological movement patterns under the constraint of the predictive learning module. Although our model is able to generate non-blurry predictions, there is room to improve the prediction accuracy: the uncertainty introduced by the GAN improves the sharpness of the predictions but deteriorates their accuracy. Our future work will investigate this trade-off and evaluate the proposed model on various real-world datasets.