Spatiotemporal Deep-Learning-Based Algal Bloom Prediction for Lake Okeechobee Using Multisource Data Fusion

This study focuses on predicting harmful algal bloom (HAB) events in Lake Okeechobee, a shallow lake in Florida. A spatiotemporal deep learning model is employed to predict the levels of cyanobacteria Microcystis aeruginosa present in the lake for a single-day and a 14-day prediction horizon. Datasets collected from remote sensing (i.e., satellite images from January 2018 to December 2020) and from a physics-based simulation model (i.e., daily simulation from January 2018 to December 2020) are available. Owing to the low quality of remote sensing data caused by various environmental and technical issues, the two available datasets are fused together to create a multisource hybrid dataset for deep learning model training. A convolutional long-short term memory (ConvLSTM) deep neural model is trained on the datasets, and the results of the predictions are compared to the true cyanobacterial index for that time period. Findings include the following: 1) the deep learning model, ConvLSTM, shows promising performance for short- and mid-term HAB forecasting; and 2) the hybrid dataset that fuses remote sensing with physics-based modeling (a.k.a. modeling based on fundamental physical and biogeochemical principles) speeds up the model learning and improves its performance significantly. The proposed methodologies are reliable and cost-effective and could be used to forecast algal bloom occurrences in shallow lakes with limited sparse observations.


A. Research Motivation
For the past few decades, many major lakes have experienced an increased occurrence of algal blooms, including Lake Victoria in Africa [1], Lake Taihu in China [2], and Lake Okeechobee in Manuscript  the United States [3]. Specifically, Florida's Lake Okeechobee is the second-largest lake within the contiguous United States. A huge number of microorganisms reside in the waters of Lake Okeechobee. Some microorganisms are often the cause of algal blooms, which include cyanobacteria, also known as blue-green algae. Algal blooms are high concentrations of phytoplankton and harmful algal blooms (HABs) are problematic algal blooms that produce toxic or harmful effects on the ocean environment and human health. The dominant HABs in Lake Okeechobee are the toxic cyanobacteria blooms, such as the Microcystis aeruginosa (M. aeruginosa), which not only produces a dense surface mat that blocks waterways and emits a foul smell that cause hypoxia events but also produces microcystin, which is a potent hepatotoxin related to skin diseases, respiratory distress, and liver damage in animals and humans [4], [5]. This common freshwater species is quickly becoming a global health threat, with reported increases in both the frequency and intensity of blooms around the world [6], [7]. Improved understandings of the ecology and persistence of M. aeruginosa blooms and the distributions and bioaccumulation of their toxins represent a key challenge for scientists and water managers [8], [9]. To combat freshwater eutrophication and protect human and ecosystem health, this article develops an HAB prediction model based on state-of-the-art deep learning algorithms, historical remote sensing data, and physics-based numerical simulation. This enhances the existing HAB monitoring program by predicting the bloom distributions and timing and helps management efforts to control the spread of HABs. Moreover, the proposed methodology in this research is applicable to other shallow water or lake algal bloom forecasting.

B. Machine Learning for HAB Modeling and Prediction
Deep learning methods and non-deep-learning methods have been developed for HAB detection and prediction.
1) Classical Machine Learning Models: HAB monitoring is divided into detecting the occurrence and tracking motion caused by wind. Currently, available HAB detection systems are mainly based on empirical relationships obtained from previous observations. These methods have a high false alarm rate because they do not consider any temporal aspect or spatiotemporal dependencies. It is critical to understand the spatiotemporal This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ dynamic behavior of HABs, especially for data-driven modeling approaches. Previous work shows that the spatiotemporal hybrid model can improve the predictive ability of traditional neural network (NN) and multivariate regression models [10]. The combination of kernel principal component analysis and support vector machine (SVM) has been proposed for generalized and improved HAB detection [11]. The backpropagation (BP) NN, the generalized regression neural network (GRNN), and the SVM are then compared to demonstrate that the improved BP algorithm and SVM are better than the GRNN for HAB detection [12]. The random forest model using MODIS and MERIS satellite data while applying a threshold filter to balance the training inputs and labels has also been proposed, showing significantly better performance [13].
2) Deep Learning Models: Deep learning models have been recently employed for HAB detection and prediction. For example, multilayer perceptron, recurrent neural network, and long short-term memory (LSTM) have been proposed for modeling the HABs [7]. Results demonstrated that the LSTM model has the highest prediction rate, which reveals the potential for predicting algal blooms using LSTM and deep learning. Following this work, a variety of deep learning architectures that utilized the state-of-the-art spatiotemporal analysis methods based on convolutional neural networks (CNNs), LSTM components together with random forest, and SVM classification methods have been investigated [14]. Results were favorable with about 91% accuracy for detection and 86% accuracy for prediction. From the analysis of the above articles, the prediction of HAB must take into account its spatiotemporal dependencies. In similar research, a convolutional long short-term memory (ConvLSTM) is used to build a trainable model for spatiotemporal sequence forecasting problems and applied to end-to-end precipitation nowcasting [15] and Chlorophyll-a concentration prediction [16], [17]. Results show that the ConvLSTM network can capture spatiotemporal correlations well, and this inspires the use of ConvLSTM for HAB prediction in this article.

C. Challenges and Contributions
The following scientific challenges are identified to develop an HAB prediction model for Lake Okeechobee.
1) Complex and fast lake phytoplankton dynamics: Lake Okeechobee is a shallow lake in southern Florida that is highly eutrophic. Phytoplankton lives in a seemingly chaotic environment where winds, waves, tides, and convection act to mix up the water that surrounds them. Phytoplankton in the lake often forms patchy structures on a wide range of scales, in spite of the turbulent mixing. 2) Spatial and temporal dependencies: HABs growing in lakes are spatially and temporally correlated yet highly variable. Its spatiotemporal footprint varies, ranging from weeks to months, from a few square kilometers to thousands of square kilometers, with patterns that vary daily, seasonally, and yearly [18]. It is critical to consider those factors when developing a spatiotemporal deep learning model that can precisely and timely pinpoint the areas that are or will be affected.
3) Sparse and noisy remote sensing data: Owing to the required working conditions of satellite sensors and the influence of the atmospheric environment, remote sensing images often suffer from missing information and low quality, such as dead pixels and thick clouds [19]. As a result, satellite images only represent snapshots of the lake blooms, and there is no useful information for relatively long periods of time. Therefore, it is inadequate for effectively training a deep learning model alone. To address these challenges, this article develops a novel deep-learning-based HAB forecasting model using multisource datasets. The major contribution is twofold.
1) Creating a hybrid dataset from multisource: Two datasets are available to train a deep learning model to predict the cyanobacterial index (CI) levels in the lake. The remote sensing dataset is collected from images taken by the satellites, which represents critical yet discontinuous information about the lake blooms. The second dataset is generated by a coupled hydrodynamic-biological model that simulates chlorophyll levels in the lake based on the environmental conditions of the lake. This dataset is clean and continuous. However, because it is based on a numerical model, it is a simplification of the real blooms. The remote sensing images and physics-based model data are fused together to form a hybrid dataset, enabling physicsinformed data-driven modeling. The resulted dataset has the clean and continuous properties of the simulation along with the close to true-value information from the remote sensing. 2) Developing a spatiotemporal prediction model using deep learning: HABs have an inherent aspect entrenched in space and time; therefore, a combination of spatial and temporal analysis is required for the most effective prediction [14]. Utilizing the hybrid dataset as a continuous sequence of images, we develop an HAB prediction model based on ConvLSTM [15] that heritages the advantage of LSTM for capturing the temporal correlations in the sequence of daily lake condition and the advantage of CNN for extracting spatial feature in the images. The rest of this article is organized as follows. Section II first discusses the preliminaries of datasets and then presents the hybrid dataset generated from remote sensing and physicsbased simulation. Section III presents the proposed HAB prediction model based on ConvLSTM. Section IV presents the comparative results. Finally, Section V concludes this article. The mathematical background of the deep learning model is presented in the Appendix.

II. MULTISOURCE HAB DATA FUSION
A. Background of the Datasets 1) Remote Sensing: Satellite remote sensing technology enables real-time, large-scale, routine monitoring and prediction of HABs, significantly reducing unnecessarily wasted expense and time [20]. Although the detection of HABs in thin water surface layers is challenging, remote sensing still provides an effective tool for identifying high biomass HABs such as red tides [21]  and M. aeruginosa [22]. M. aeruginosa is the HAB of most concern in Lake Okeechobee. The organism can outcompete other phytoplankton for light by regulating its own buoyancy, allowing it to float at the surface in turbid waters. This fortuitously allows detection via satellite remote and, thus, inclusion of these data for a predictive model. For example, Fig. 1 shows a typical remote sensing image from satellites, where the greenish streaks and patches at the surface across different parts of the lake are the cyanobacteria. During bloom peaks, thick green mats of highly concentrated cyanobacteria will aggregate at the surface.
2) Physics-Based Simulation: The concept of the physical model is to combine a hydrodynamic model with a biological model to simulate all of the major processes including hydrodynamics and nutrient cycling to predict phytoplankton blooms. The Regional Ocean Model System (ROMS) is a finitedifference and terrain-following ocean model that has been widely applied to coastal and regional ocean modeling [23]. The modeling system is flexible, allowing a modular implementation of many important oceanic components such as tides, circulation, biology, and sediment transport. However, its applications to inland waters, such as lakes and reservoirs, remain limited.
In the physical model, each of these variables interacts with several variables at once. Each variable has a dynamic equation with several terms describing how interactions take place. For example, the phytoplankton equation is as follows: (1) where on the right-hand side of this equation describes, in order of the terms, the net growth rate of phytoplankton (P), where μ is dependent on light, temperature, and nutrient concentration, grazing by zooplankton (Z), coagulation of phytoplankton (PON), and sinking. One can calculate each of these terms to find which of them dominates.
In order to simulate phytoplankton blooms, a biogeochemical model is usually co-simulated. For example, one of the widely used models is the eight-component Fennel model [24], which includes seven variables for simulating nitrogen cycle and phytoplankton blooms, and one variable for dissolved oxygen (DO). Dissolved inorganic nitrogen is represented by nitrate (NO 3 ), which also includes nitrite (NO 2 ), and ammonia (NH 4 ). There is only one group of phytoplankton and one group of zooplankton, but chlorophyll is also directly simulated. Chlorophyll (Chl) concentration is treated as an independent variable, but it is closely related to phytoplankton biomass. The biogeochemical model is coupled with the ROMS hydrodynamic module to simulate the physical-biological coupled dynamics. Dynamics of phytoplankton blooms is complex and highly nonlinear. While the model is designed to capture all of the major processes, its predictive capability is limited due to the limited knowledge of the biology and biogeochemistry and imperfect numerical solutions.

B. Remote Sensing Dataset
Following the idea of Hill et al. [14], the remote sensing data are initially used as the input data for our model. The used satellite products were imagery from the Ocean Land Color Imagery (OLCI) sensor on board the twin satellite pair Sentinel-3A/B operated by the European Space Agency. Sentinel-3 provides near-real-time basic information for oceans and weather forecasts. The mission is based on two identical satellites operating in a constellation for optimal global coverage and data transfer. The 1270-km-wide OLCI provides global coverage every two days. For nominal orbit, at subsatellite point, OLCI full resolution is about 300 m above ground [25]. Acquired images were then processed through NASA's SeaDAS image processing package, producing two ocean color products for every image-turbidity and the CI. For our research object, we mainly focus on the CI product. The CI is an optical product based on a line height form using three bands in the red/near-infrared (NIR) region [26], specifically the reflectance at 665, 681, and 709 nm. The CI captures an optical feature that is related to particles in suspension near the surface of the water with a combination of spectral absorbing and scattering properties [27]. Here, the CI is calculated as where ρ xxx is the partial reflectance at that wavelength. A positive CI indicates the presence of cyanobacteria, which we used as a flag in data plots. Note that the negative sign for CI asserts a positive value when there is a trough detected at 681 nm. The CI product uses wavelengths in the red and NIR spectrum, shown in Fig. 2, and is insensitive to aerosol loads in the atmosphere. To derive this product, images were produced from partially atmospheric correction where only the molecular scattering is removed, leaving aerosols. These partially corrected images, which resulted in spectral reflectance products, were then subsequently used in the CI algorithm developed in [26]. The cloud and turbidity masks for the CI product were applied using the NOAA operational scheme [28]. These image products were saved into a NetCDF file. Owing to the type of algorithm applied for the surface cyanobacteria detection, the partial atmospheric correction could be applied for the CI product, avoiding some limitations associated with the full atmospheric correction. However, only cyanobacteria immediately at the surface could be detected with any degree of accuracy with this method. Nonetheless, the patterns in the images were useful for observing the spatial scale of events and were very useful in detecting surface cyanobacteria populations. While these are more qualitative, the expression of CI indicating surface cyanobacteria in and of itself has quantitative information. For turbidity, a full atmospheric correction is required. We used a variation of the NASA standard scheme called the Management Unit of the North Sea Mathematical Models (herein MUMM) algorithm. This scheme was developed to account for NIR backscattering using a different water model and assumed a constant shape in the NIR and was derived from waters where particles are dominated by inorganic types [29], [30]. The MUMM was used for image processing and generating spectral remote sensing (Rrs) products. A turbidity product was generated from a semiempirical single-band turbidity retrieval algorithm [31], [32] for turbid waters with Rrs as input. The algorithm relates turbidity (T ) and water reflectance ρ w (λ) at wavelength λ using where A T and C are two wavelength-dependent calibration coefficients. In our study, we used the λ = 865-nm wavelength. The CI product was generated for the same image but with the partial atmospheric correction using rho (reflectance) as input. This dataset can be considered large, containing 191 × 216 pixel remote sensing images from October 2016 to the present. Although the dataset is large, there are a few issues that render some samples inoperable. On the one hand, although Sentinel-3 consists of two satellites, Sentinel-3A and Sentinel-3B, the two satellites in orbit make the revisit time of Lake Okeechobee only less than two days; there is no way to guarantee that we will get remote sensing images of the lake every day. There are also environmental sources that degrade image quality, such as clouds, sun glint, and turbidity. It is often cloudy over Lake Okeechobee, and there is additional haze from the atmosphere, e.g., the sugar cane farms surrounding the lake routinely light fires as part of the cultivation process. The types of satellites used to detect cyanobacteria cannot see through the clouds, as they are passive sensors that detect the visible light field (much like our eyes). The near-total reflection of the sun off the water directly into the viewing sensors of the satellite is known as sun glint, and it is particularly acute between April and September. The extreme level of turbidity in Lake Okeechobee further limits the capabilities of the satellites, in particular, detecting cyanobacteria beneath the surface of the water. All these factors cause random discontinuities in the dataset. Fig. 3 shows some remote sensing data from June 19 to June 28, 2021. It can be seen that these samples contain missing and bad data, which compromises the integrity of the entire dataset.
Because HABs are related to space and time, their prediction requires high temporal continuity. Lack of critical information and discontinuous datasets will greatly affect the performance of deep learning models and lead to inaccurate predictions. A remedy to this problem involves the reconstruction of the remote sensing data. Li et al. [33] proposed a method to determine total algal biomass in shallow lakes by combining remote sensing image data and hydrological data under nonalgal bloom conditions. Zhang et al. [19] used a framework called STS-CNN to do the reconstruction. However, the amount of missing data in our dataset is too influential and would cause significant discrepancies between the reconstructed remote sensing data and the actual data. The deviation from the real data would negatively impact the prediction results or even render the model not trainable.

C. Physics-Based Modeling Dataset
The model is driven by rivers' inputs and outputs of water, phytoplankton, nutrients, and organic matter, and surface forces including winds, heat fluxes, and water fluxes. Considering the input and output of the rivers surrounding the lake, our physical model domain needs to cover the entire lake and part of the watershed. Therefore, our physical model uses coastlines to delineate lakes, and the model domain includes western wetlands. We choose to start with the ROMS model where the Lake Okeechobee ROMS domain covers the entire lake with a 386 × 386 horizontal grid (resolution 150 m) and ten vertical layers. River flow and water quality data are derived from the observations by South Florida Water Management District (SFWMD), which include daily averaged water flow and biweekly or monthly water quality data. Atmospheric forcing is derived from the NCEP North American Regional Reanalysis (NARR) products, which has a spatial resolution about 30 km. Temporally, the NARR model output has a 3-h interval, which resolves most of the diurnal variations.
Sediment fluxes are likely an important source of nutrients and a sink for DO. Sediment processes, however, are not directly simulated. Rather, particulate organic matter is allowed to sink out of the bottom, and, at the same time, the sediment input of nutrients and sediment oxygen demand are specified based on simple empirical formulations. Fisher et al. [34] estimate the DIN sediment flux to be 4500 metric tons annually with some spatial variation noted. This roughly equates to 0.5 mmol/m 2 /day. Therefore, we use a simple empirical model to specify the sediment DIN flux as spatially variable depending on water depth as follows: where H is the water depth. Note that this is constant throughout the year. Several modifications have been made to ROMS codes to accommodate specific needs for modeling inland water such as Lake Okeechobee because ROMS was designed for coastal and regional oceans. These include: 1) allowing outflows of water from the modeling domain for canals or rivers; 2) including surface water fluxes (precipitation minus evaporation) in the water volume budget; and 3) using model tracer (e.g., temperature or NO3) concentration at the river mouth for tracer export (for outgoing flow only). In addition, the ROMS evaporation algorithm underestimates evaporation in the lake. An empirical evaporation formula is adopted instead.
The values of biological parameters, such as phytoplankton growth rate and zooplankton grazing rate, are adapted from [35]. Concentrations of both particles and organic matter are very high (e.g., DON ≥ 50 umol/L), which includes a significant amount of CDOM. Therefore, the light attenuation is high. Model results are calibrated against available historical water quality data collected from Lake Okeechobee by the SFWMD, which include monthly measurements of NO 3 , NH 4 , Chl, DO, and organic matter at a number of monitoring stations around the lake. We chose eight representative stations for model calibration purpose. Model results at the same locations as the monitoring stations are directly compared with observations. Once a basic calibration has been undertaken for the year 2018, a three-year simulation is ran from January 2018 to December 2020. The result of this three-year simulation makes up the physics-based model dataset, which includes daily averaged output of key physical and biogeochemical variables including currents (u, v) and chlorophyll. The physics-based model Chl values are empirically converted to CI through the relationship CI = (Chl − 24.2)/(3.083 × 10 3 ). Note that this empirical linear relationship was developed using summer data because Microcystis blooms in the lake usually take place during the summer season. As an example, Fig. 4 shows a comparison of model Chl and remote sensing CI for a few days when relatively complete remote sensing data images were available. It can be seen that there are differences between the physics-based model results and the remote sensing data. Although the physics-based model dataset is less reflective of the real HAB conditions than the remote sensing dataset, it provides a continuous and clean dataset as additional information that can be used to create a hybrid dataset for training the deep learning model. In the following, these converted physics-based daily CI will be fused with remote sensing CI to derive the hybrid training dataset.

D. Fusion of Multisource Data as a Hybrid Dataset
Here, we attempt to fuse the remote sensing and physicsbased model data as a hybrid dataset. We have a total of 1093 physics-based model data from January 2018 to December 2020; therefore, we selected remote sensing data of the same time period for data fusion. Three types of data samples are combined: remote sensing Satellite A denoted as X A , remote sensing Satellite B denoted as X B , and physics-based model denoted as X P . After our manual statistics, there are 310 days of remote sensing data that are completely missing and 178 days of bad remote sensing data. So, in total, we have 605 days of remote sensing data available. Among them, there are 100 days when both the satellites have data. The remote sensing samples are manually scrutinized for the removal of corrupt data. On days that include remote sensing data from both the satellites, the samples X A and X B are combined such that the resulting where a and b are the horizontal and vertical coordinates of the remote sensing images, respectively.
The final remote sensing sample X RS is either X A or X B if only one exists, X C if both exist, or missing if neither exists. For each day, a hybrid data sample X H is then assembled by the imputation of X P (the physics-based model generated data) onto X RS . On days with missing remote sensing data, X H = X P ; otherwise, imputation is applied such that where α and β are weights to balance the importance of different data sources. In this article, we empirically set α = 0.8 and β = 0.2 to count more on remote sensing when quality images can be obtained. This results in a hybrid dataset that utilizes the physics-based model information while considering quality information from remote sensing. Fig. 5 shows an example of the data fusion process and result. As mentioned above, when remote sensing data are missing, the physics-based model data are used for the hybrid dataset. The spread and growth of algal blooms have certain rules, but due to the differences between physics-based model data and remote sensing data, this causes some parts of our data appear irregular. In order to improve this aspect, in the hybrid dataset, if the physics-based model data are used on a certain day d, that is, X H (d) = X P , and the data of the day before and the day after this certain day d are all hybrid data, X H (d ± 1) = α * X RS + β * X P , and we then take the average of the data before and after of this certain day d as X H (d) = (X H (d − 1) + X H (d + 1))/2.
The size of the physics-based model image is 386 × 386, while the remote sensing images have a lower resolution of 191 × 216. In order to fuse the data, the resolution of the physics-based model data needs to be reduced to match the remote sensing data. The image size of our hybrid dataset is 191 × 216. Moreover, both the physics and hybrid datasets contain a large amount of information that requires downsampling before it can be used as input to the machine learning models. Furthermore, all data undergo min-max normalization. The physics and hybrid data samples are individually resized to 112 × 112 × 1 and 83 × 88 × 1, respectively. Although the sharpness of the pictures is reduced, it is beneficial for speeding up the deep learning model training and online deployment when computational resources are limited.

A. Mathematical Background of ConvLSTM
We develop our HAB prediction model based on ConvLSTM, a spatiotemporal deep learning model that heritages the advantage of LSTM for capturing the temporal correlations in the daily lake condition sequence and the advantage of convolutional neural network for extracting spatial feature in the images. The foundation and principles of ConvLSTM are presented in the Appendix.

B. Model Implementation Details
An overview of the deep-learning-based HAB prediction model development is shown in Fig. 6. ConvLSTM implementation for HAB prediction in this article is based on [36] and [37], where the model is built using five ConvLSTM2D layers with batch normalization and is then followed by a Conv3D layer for spatiotemporal outputs. Fig. 7 shows the diagram of the model training and rolling window prediction. Since the ConvLSTM2D layer only accepts the inputs that have a specific shape (batch size, sequence, width, height, channels), our hybrid dataset needs to be preprocessed before using for deep learning model training. The dataset contains 1093 samples (i.e., 1093 days of lake images), but in order to reshape the data more easily, only 1080 were used. For each training sample, the input to the model consists of 14 days of consecutive daily images, which are processed to output a one-day image prediction. Specifically, Fig. 6. Flowchart of our proposed approach, including data fusion, preprocessing, deep learning model training and testing, and results reporting.
the hybrid dataset images have been reshaped to (72, 15, 83, 88, and 1) corresponding to the numbers of batch size, sequence, width, height, and channels. This produces 72 batches of data, each batch has a sequence of 15 data images, and each image has a size of 83 ×88 × 1. The overall dataset is divided into Fig. 7. Implementation diagram of the HAB prediction model based on ConvLSTM. For each training sample, the input to the model consists of 14 days of consecutive daily images, which are processed to output a one-day image prediction. To ensure a sequence learning and prediction, a rolling window mechanism is used. a training set and a validation/testing set, where 80% are for training and 20% are for validation/testing. We do the same processing for the physics-based model dataset, and the only difference is the width and height of the images. Note that the size of the images in the hybrid dataset is 83 × 88, while the size of the images in the physics-based model dataset is 112 × 112. The loss function used for model training is the mean squared error (MSE) (i.e., this will be further discussed in the following section), and the optimizer used is the Adam optimizer. It can be seen from Fig. 8 that the loss values (both training and validation) decrease to 0, in which the model convergences demonstrate successful learning and prediction.

A. Evaluation Metrics
The following image similarity measures are used [38]: 1) Root-mean-square error (RMSE) is a common comparison metric for measuring pixelwise differences. A value of 0 indicates that the ground truth image and the predicted image are the same. The formula is as follows: where G is the ground truth image and P is the predicted image. M represents the numbers of rows of pixels of the images and i represents the index of that row. N represents the number of columns of pixels of the image and j represents the index of that column. 2) Peak signal-to-noise ratio (PSNR) is often used to quantify the reconstructed quality of images and videos affected by compression. It measures the ratio between the maximum possible power of a signal and the destructive noise power that affects the fidelity of its representation, usually expressed on a logarithmic decibel scale. The higher the PSNR, the better the quality of the compressed or reconstructed image. To calculate the PSNR, first calculate the MSE where R is the maximum pixel value of the image. 3) Structural similar index measure (SSIM) quantifies image quality degradation caused by processing or data transfer losses [39]. SSIM measures the perceptual difference between two similar images. The value of SSIM is between −1 and 1, where a value of 1 means that the two given images are very similar or identical and a value of −1 means that the two given images are very different. The SSIM index is calculated on various windows of an image. The measure between two windows x and y of common size N × N is where μ x is the average of x, μ y is the average of y, σ 2 x is the variance of x, σ 2 y is the variance of y, σ xy is the covariance of x and y, and c 1 = (k 1 L) 2 and c 2 = (k 2 L) 2 are two variables to stabilize the division with weak denominator. L is the dynamic range of the pixel values. k 1 = 0.01 and k 2 = 0.03 by default. 4) Feature similarity indexing (FSIM) is mainly used to compare the structural and feature similarity measures between the recovered object and the original object [40]. Phase congruency (PC) is used as the primary feature in FSIM, and image gradient magnitude (GM) as a secondary  feature. The FSIM value is between 0 and 1, where 1 is perfect feature similarity. If the similarity between images f1 and f2 is to be calculated, the following formula is needed: where PC 1 and PC 2 represent the PC maps extracted from f 1 and f 2 , respectively, and G 1 and G 2 represent the GM maps extracted. T 1 is a constant that increases the stability of S PC and T 2 is a constant that depends on the dynamic range of the GM value. S PC (x) and S G (x) are combined to obtain the similarity S L (x) of f 1 (x) and f 2 (x). 5) Signal-to-reconstruction-error ratio (SRE) measures the error related to signal power [41]. Using SRE is more suitable for making errors comparable between images of different brightness. SRE is calculated as SRE = 10 log 10 μ 2 where μ x is the average value of x.  RMSE, PSNR, and SRE are measures of how different two images are, which can help us judge whether the predicted image is similar to the "ground truth" image, but they do not take into account the quality of the image itself. This is solved by considering image structure (SSIM) and display features (FSIM) [38]. To sum up, the closer the RMSE is to 0, the higher the PSNR and SRE, and the closer the SSIM and FSIM are to 1, indicating that the ground truth image and the predicted image are more similar. Table I shows the prediction performance on the benchmark datasets using the five similarity measures. There are two trials of experiments using two different testing datasets. In the first trial, we use the physics-based simulation data for the testing of the deep-learning-based HAB prediction model. This can help to judge whether the ConvLSTM model is able to capture the spatiotemporal information of HABs in the Lake and make predictions. We carry out a single-day prediction using July 3, 2020, since the remote sensing data on this day is complete with high quality. The ultimate goal is to make a two-week (14-day) prediction, so a rolling window is created and the prediction result is used as input to predict the next day. More specifically, when using the trained deep learning model for two-week predictions (i.e., days d to d + 13), the first day d will be predicted using the data in days d − 1 to d − 14, then these predicted data in day d will be used together with data in days d − 1 to d − 13 to predict the second day d + 1. This will continue until all 14 days have been predicted.

1) Physics Data for HAB Prediction Model Validation:
For ease of display and comparison, Fig. 9 shows the first four days of the 14-day prediction results for the model using the physics-based model dataset only. Since rolling predictions inevitably have information superposition, this results in the predicted image being blurry compared to the original image. Downsampling is used, which further causes the image resolution to decrease and become blurry. This "blur" can accumulate and further degrade the performance of the prediction. We observe that the model can successfully capture the spatiotemporal 'evolving" patterns in the whole lake. In terms of quantitative metrics in Table I, the RMSE is very close to 0, and the SSIM and FSIM are also fairly close to 1, which further demonstrates the effectiveness of our proposed prediction method.
2) Remote Sensing Data for Hybrid Data Validation: Physics-based model simulation data may be different from the real distribution of the HABs since it is a simplification of the real mechanism. Although it suffers from sparse observation issues due to large quantities of missing data, using remote sensing data is still the best choice to reflect the real situation. To validate the effectiveness of our proposed hybrid data for deep learning model training, we use sparse but complete remote sensing data as the testing dataset in the second trial. In other words, we use the hybrid dataset to train our deep learning model and use the remote sensing data for testing. We also compare it with using the physics-based model simulation as training and remote sensing as testing.
As mentioned above, some relatively complete remote sensing data are selected. Fig. 10 shows the remote sensing data on July 3, 2020, and the prediction results using the physics-based model dataset as training or the hybrid dataset as training. Fig. 11 shows the remote sensing data from June 22 to June 25, 2020, and four days of the 14-day model prediction results for both the datasets. June 22 to July 5 are chosen as the dates to be predicted since relatively complete and continuous remote sensing data can be found in this date range. From the figures, the prediction results of the model using the hybrid dataset are closer to the remote sensing data, whether it is a single-day prediction or a 14-day prediction. In the comparison metrics, since there are only four days of complete and continuous remote sensing data, the first four days from the 14-day prediction results of the two dataset prediction models are compared. In terms of metrics in Table I, all the metrics using the hybrid dataset outperform those using the physics-based model dataset. This result also shows that although the model using the physics-based model dataset learns better, the prediction results deviate from the real situation due to the discrepancy between the dataset and the real situation, while the model predictions using the hybrid dataset are closer to the ground truth (i.e., the remote sensing data).

V. CONCLUSION
Our study shows that the HAB prediction model built by ConvLSTM was effective and that ConvLSTM can capture the spatiotemporal dependencies for prediction. Moreover, multisource data fusion as a new solution to the problems of a large number of missing, discontinuous, and low-quality remote sensing datasets in prediction was proposed. The hydrodynamic model was combined with the biological model to create a physical model to simulate the growth and spread of algal blooms, generating a physics-based model dataset that is continuous and clean. Owing to the discrepancy between the simulation results of the physics-based model and the real situation, the physics-based model dataset was combined with the remote sensing dataset to generate a hybrid dataset for physics-informed data-driven modeling. It turns out that the prediction results of the model trained using the hybrid dataset are close to the remote sensing data, proving that data fusion is a promising approach.
There are also plans to improve the quality of the prediction, such as fixing the blurred prediction images caused by the model rolling prediction, improving the image blur caused by downsampling, and increasing the quantity of usable remote sensing data by patching and reconstructing the corrupt images. In terms of data fusion, a simple interpolation was used for this experiment, and the edges of some data are not as smooth as the original images. This issue will be addressed in future work.
Wind and temperature play a key role in driving the phytoplankton blooms, mainly M. aeruginosa. It appears that M. aeruginosa migration, while important, is not strong enough to overcome strong wind mixing. Field observations suggest that most of the strongest bloom events occur under low-wind conditions. This is particularly true for the central lake where water is deeper and strong winds should disrupt any potential blooms. Temperature is also important in affecting the growth directly. One combined effect of these two factors is that strong blooms mostly occur in the summer and early fall when water is warm and winds are typically weak. In other words, the growing patterns of algal are seasonal dependent, and this factor will also be considered when developing the prediction model in the future.

Mathematical Background of ConvLSTM:
ConvLSTM is a deep learning model built from the LSTM network and the CNN. The idea is the same as LSTM, which utilizes the previous layer's output as the input for the following layer (i.e., recurrent operation). The most significant change is that each layer's weight computations are a convolutional operation. The inner structure of ConvLSTM is shown in Fig. 12. With the addition of the convolution operation, ConvLSTM can not only establish a timing relationship similar to LSTM but also has a spatial feature  extraction capability similar to the CNN. The key equations of ConvLSTM are shown as follows [15]: where σ is the logistic sigmoid function, t represents the time step, i, f , and o are the input gate, forget gate, and output gate, respectively, X is the input, H is the hidden state, C is the cell output, and b is the bias. W is the weight matrix, where different subscripts have different meanings, for example, W hi is the hidden-input gate matrix. * is the convolution operator and • is the Hadamard product. The encoding-forecasting ConvLSTM structure illustrated in Fig. 13 is used to solve the spatiotemporal prediction. It is made up of two networks: an encoding network and a forecasting network. The forecasting network's initial state and cell output are replicated from the encoding network's final state. Both the networks are created by stacking several ConvLSTM layers. All the states are connected in the forecasting network and fed to the 1 × 1 convolutional layer to construct the final prediction since the target and input of spatiotemporal prediction generally have the same dimensions.