Multivariate Multi-step Agrometeorological Forecast Model for Rapid Spray

The timing of spray application plays an essential role in daily pesticide management. Suitable wind speed, air temperature, and relative humidity are the main external factors that improve the efficacy of pesticides and reduce spray drift and environmental pollution. Very few previous studies have focused on the need for short-term weather prediction in rapid spraying decisions. In this paper, a Convolutional-LSTM encoder-decoder (ConvLSTM-AE) hybrid model for multivariate, multi-step prediction at short time intervals is proposed to predict these three agrometeorological variables in advance. The model can predict daily weather conditions at 15-minute intervals and effectively track the changes of time-varying systems. The method was also compared with CNN, multi-head CNN, LSTM encoder-decoder, and CNN-LSTM encoder-decoder models. The results show that the proposed model outperforms the others and is suitable for daily weather forecasting at short time intervals. The rapid and accurate predictions it provides form a reliable basis for precise spray timing in actual farming.


I. INTRODUCTION
The development of energy-saving and environmentally friendly agriculture is highly recommended in contemporary society. In agricultural spraying operations, spray drift has a negative impact on the environment; losses as high as 85% have been reported for different vineyard structures and sprayers [1]. Thus, reducing spray drift in orchards is a major concern for researchers. Some researchers have focused on the form of spray vehicles, such as airblast sprayers or air-assisted sprayers, and developed spraying apparatus to improve spray quality and reduce drift loss [2,3]. Carpenter et al. [4] proposed a new spray system called the permanent pesticide application system, representing a fixed spray system model. As a further development, the solid set canopy delivery system (SSCDS) was widely studied as a fixed spray system that offered performance comparable to the airblast sprayer, improving the efficiency of the spray operation and reducing drift [5][6][7][8][9]. The influence of temperature, humidity, and wind speed on drift has also been analyzed to some degree [10][11][12]. Some researchers have applied intelligent spray systems for targeted spraying to control drift [13,14]. Rakesh et al. [15] developed an automatic SSCDS that took advantage of a relatively short time window to control spraying automatically within minutes, allowing rapid spray operations under suitable ambient temperature, humidity, and wind speed. New spraying automation technology for precision agriculture makes the spraying process more rapid, significantly reducing the spraying operation time. Spraying quality, however, depends strongly on external weather conditions. Previous studies have shown that spraying quality can be accurately controlled under appropriate external temperature, humidity, and wind speed conditions.
Still, the proper timing of spraying must be judged according to the weather conditions, which requires weather prediction in advance and corresponding preparation before spraying.

II. LITERATURE REVIEW
As a time series problem, agrometeorological forecast data contain air temperature, dew point, relative humidity, wind speed, solar radiation, etc., which are affected by the complex interplay of several factors. Because of the importance of meteorological conditions for spraying operations, it is necessary to develop a reliable short-interval forecast of meteorological conditions to provide guidance before automatic spraying. Researchers have carried out extensive studies on temperature, humidity, and wind speed prediction in recent years. Chen et al. [16] combined an unscented Kalman filter with a support vector regression model to update short-term estimates of wind speed sequences. Cao et al. [17] compared a recurrent neural network with ARIMA and found that multivariate models perform better than univariate models, and that recurrent neural network models outperform ARIMA models. Chen et al. [18] proposed an EnsemLSTM model that achieved better performance in short-term wind speed forecasting. Park et al. [19] constructed an LSTM-based hybrid model that predicts temperature better than DNN-based or LSTM-based models. Kreuzer et al. [20] showed that ConvLSTM performs well in short-term temperature forecasting with multivariate inputs. However, these prediction models only achieve univariate output. Liu et al. [21] built a large VAR model to forecast three weather variables (hourly solar radiation, temperature, and wind speed) in a very short-term forecasting style. Nevertheless, these predictions rest on the assumption that the variables are stationary and that the model can learn directly from lagged observations, which limits the application scope of the method. Lu et al. applied an NNARX hybrid model [22], and Shi et al. [23] used an improved BP neural network to forecast indoor air temperature and relative humidity.
Still, external environmental factors fluctuate more significantly in the natural environment than indoors, so the adaptability of such models needs further study. Hewage et al. [24] used stacked LSTM layers to forecast weather parameters over a given time. Zhang et al. [25] developed a new method for temperature and solar radiation prediction that uses a linear combination of observations and forecast data. These methods use hourly data and cannot meet more granular forecasting needs. Hence, it is necessary to predict at finer-grained time units, build appropriate methods, and optimize and compare the performance of prediction models. Few previous studies have focused on short-term agrometeorological temperature, humidity, and wind speed, or made a combined prediction of multiple factors to guide rapid spray decisions in the orchard. Therefore, it is necessary to study this problem to meet the needs of spraying decisions. Table 1 highlights the main characteristics of existing schemes developed for weather forecasting.
A neural network can automatically incorporate nonlinear relationships between predictor variables and forecast distribution parameters in a data-driven way without prespecified link functions. A trained neural network can gain insight into the relationships among meteorological variables, which enables accurate predictions. Deep learning algorithms such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are potential solutions for time series prediction. CNN has been widely used for spatio-temporal feature extraction in images and video, as well as in time series prediction. For time series prediction, CNN inherits the strengths of the multilayer perceptron: it supports multivariate input and multivariate output and can automatically learn complex functional relationships. By learning and extracting the features of the input sequences most relevant to the prediction, the model achieves satisfactory results [26]. As a variant of CNN, multi-head convolution is a CNN in which each time series is processed by an entirely independent convolution. Each head is a one-dimensional CNN that extracts features from one input channel; the resulting features are then combined. Multi-head CNN combines multi-dimensional input data well to represent the relationships between variables and can be used for multivariate prediction [27]. The LSTM model was first proposed by Hochreiter and Schmidhuber [28,29]. As an extension of RNN, the unique structure of LSTM can capture the dependency features of sequential data and solves the vanishing gradient problem in neural network training. The LSTM model, its variants, and other improved methods have been widely used in water quality forecasting [30], air quality prediction [31], traffic prediction [32,33], and so on. Hybrid methodologies such as LSTM-AE, CNN-LSTM, and ConvLSTM have shown better performance than single models [34][35][36].
These deep learning algorithms provide an ideal choice for solving time series problems.
Weather data at long intervals, such as several hours, cannot reflect actual conditions in time, so continuous prediction at intervals of several minutes is needed to meet the needs of rapid spraying. However, short time intervals increase the number of daily prediction steps and the cumulative prediction error, so suitable models should be built to handle this peculiar time series data and obtain accurate predictions. Such time series data usually have the following characteristics: 1) nonlinear relationships among variables; 2) periodicity and outliers in the data; 3) prediction needs that are multi-input, multi-output, and multi-step. Therefore, the model should handle the periodic, short-interval, nonlinear, multi-step, and multivariable nature of meteorological data. Based on the analysis above, this study established a multi-step, short-interval, multivariable combined prediction model to forecast weather conditions in advance, and compared the performance of the following neural network-based algorithms: CNN, multi-head CNN, LSTM encoder-decoder (LSTM-AE), a hybrid CNN with LSTM-AE (CNN-LSTM-AE) model, and a hybrid convolutional LSTM encoder-decoder (ConvLSTM-AE) model. These models are investigated, evaluated, and compared using different statistical indicators to ensure that the final model meets the goal it is built for.
The statistical indicators include the mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), root mean square error (RMSE), and symmetric mean absolute percentage error (SMAPE). The final results of this study clarify the feasibility of the models, verify the excellence of the method, and provide an ideal weather prediction model to guide spraying operations in actual farming.

A. CNN
A CNN with multivariate input is called a multi-channel deep convolutional neural network. In this multi-channel input model, each individual one-dimensional time series forms one input channel, and the model learns features from the input time series variables by using different kernel mappings. This setting allows the output sequence to express characteristics of the observations as well as the trend of future values.
The inputs are fed into a feature extractor that learns features through a combination of convolution, activation, and pooling layers. After the pooling layer, the distilled feature maps are flattened into a 1-D tensor with a multivariable combined output. The final output is reshaped into a prediction of 96 steps ahead for three variables. 32 and 16 filters were selected for the two convolution layers, respectively; all layers are separated by Rectified Linear Units (ReLU), the kernel size is set to 24, the batch size is set to 96, and training stops at 100 epochs.
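As an illustration only, the multi-channel CNN described above can be sketched in Keras roughly as follows. The 192-step, 8-variable input window and the pooling and dense layer details are assumptions (the text specifies only the filter counts, kernel size, batch size, and epochs), so this is a sketch of the described layout, not the authors' code.

```python
from tensorflow.keras import layers, models

N_IN, N_VARS_IN = 192, 8    # assumed: two days of 15-min records, 8 input channels
N_OUT, N_VARS_OUT = 96, 3   # one day ahead: AT, RH, WS

model = models.Sequential([
    layers.Input(shape=(N_IN, N_VARS_IN)),
    layers.Conv1D(32, kernel_size=24, activation="relu"),  # 32 filters, kernel 24
    layers.Conv1D(16, kernel_size=24, activation="relu"),  # 16 filters
    layers.MaxPooling1D(pool_size=2),                      # pooling layer (assumed size)
    layers.Flatten(),                                      # distilled features as 1-D tensor
    layers.Dense(N_OUT * N_VARS_OUT),
    layers.Reshape((N_OUT, N_VARS_OUT)),                   # 96 steps x 3 variables
])
model.compile(optimizer="adam", loss="mse")
```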

B. MULTI-HEAD CNN
A specially designed structure is responsible for extracting features from the 8 input sequences. Each input variable has a separate sub-model that extracts features from its sequence independently. The feature outputs from each sub-model are then merged into one long vector. Finally, the data are interpreted by fully connected layers and reshaped into multi-step, multi-output variables; this processing is the same as in the CNN model. Parameters of the multi-head CNN model are set as follows: the first two convolution layers use 32 filters with a kernel size of 16, the next convolution layer uses 128 filters with a kernel size of 5, all layers use ReLU activation, and the training epoch is set to 100.
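A minimal sketch of this multi-head layout, assuming the same 192-step window and 8 input variables as above; each head processes one channel with its own convolutions before the flattened features are concatenated. The output head is an assumption based on the shared CNN processing described in the text.

```python
from tensorflow.keras import Input, layers, models

N_IN, N_VARS_IN, N_OUT, N_VARS_OUT = 192, 8, 96, 3

inputs, heads = [], []
for _ in range(N_VARS_IN):
    inp = Input(shape=(N_IN, 1))                    # one channel per head
    x = layers.Conv1D(32, 16, activation="relu")(inp)
    x = layers.Conv1D(32, 16, activation="relu")(x)
    x = layers.Conv1D(128, 5, activation="relu")(x)
    inputs.append(inp)
    heads.append(layers.Flatten()(x))
merged = layers.concatenate(heads)                  # one long feature vector
out = layers.Dense(N_OUT * N_VARS_OUT)(merged)
out = layers.Reshape((N_OUT, N_VARS_OUT))(out)
model = models.Model(inputs, out)
model.compile(optimizer="adam", loss="mse")
```

Because every head stacks its own convolutional layers, the parameter count grows quickly, which matches the observation later in the paper that the multi-head CNN has the most parameters.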

C. LSTM ENCODER-DECODER
The LSTM algorithm performs well for time series data prediction. An LSTM cell contains an input gate, a forget gate, and an output gate. This gated structure enables LSTM to solve learning problems related to sequential data. It can be calculated using the following equations:

Input gate:   i_t = σ(W_i·x_t + U_i·h_(t-1) + b_i)    (1)
Forget gate:  f_t = σ(W_f·x_t + U_f·h_(t-1) + b_f)    (2)
Cell:         C̃_t = tanh(W_c·x_t + U_c·h_(t-1) + b_c)    (3)
              C_t = f_t ∘ C_(t-1) + i_t ∘ C̃_t    (4)
Output gate:  o_t = σ(W_o·x_t + U_o·h_(t-1) + b_o)    (5)
              h_t = o_t ∘ tanh(C_t)    (6)

where i_t, f_t, and o_t are the input, forget, and output gate activations; W_i, W_f, W_c, W_o and U_i, U_c, U_f, U_o are weight matrices; b_i, b_f, b_o, b_c are bias vectors; h_t and h_(t-1) are the layer outputs at time steps t and t-1, respectively; C_t, C̃_t, and C_(t-1) are the cell state, the candidate (alternative) cell state, and the previous cell state, respectively; σ is the gate activation function, normally the sigmoid function; tanh is the hyperbolic tangent function; and ∘ denotes element-wise multiplication.
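The gate computations above can be checked with a minimal NumPy implementation of a single LSTM cell step; this is a sketch for illustration, not the paper's code, with `W`, `U`, and `b` stored as dicts keyed by gate name.

```python
import numpy as np

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by 'i', 'f', 'c', 'o'."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                          # new cell state
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # hidden state / output
    return h_t, c_t
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), the hidden state is always bounded in (-1, 1), which is part of what keeps gradients stable.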
LSTM-AE adds an autoencoder (AE) to the traditional multi-layer LSTM [35]. The AE consists of an encoder and a decoder, which provides a solution for multi-step, multi-output problems. For the encoder, features from the input sequence are learned through the LSTM and encoded into feature vectors. The decoder reconstructs the input to accumulate internal state while outputting the prediction sequence. The final optimized model uses one LSTM layer followed by an encoder-decoder LSTM hybrid architecture. The first LSTM layer receives the time series data and extracts features. The encoder-decoder architecture reconstructs the previous outputs with ninety-six time steps and one feature, then outputs a sequence with ninety-six time steps and three features; a repeat vector layer and a time-distributed dense layer achieve this. An LSTM layer inside the encoder-decoder architecture generates the output values sequentially. Model parameters are set as follows: the number of LSTM units is set to 200, sigmoid is used as the activation function, the batch size is set to 96, and the number of epochs is set to 100.
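A hedged Keras sketch of the LSTM-AE layout described above (200 units, sigmoid activation, a repeat vector, and a time-distributed dense output). The 192-step, 8-variable input window is an assumption carried over from the walk-forward setup; the exact layer arrangement may differ from the authors' implementation.

```python
from tensorflow.keras import layers, models

N_IN, N_VARS_IN, N_OUT, N_VARS_OUT = 192, 8, 96, 3

model = models.Sequential([
    layers.Input(shape=(N_IN, N_VARS_IN)),
    layers.LSTM(200, activation="sigmoid"),                          # encoder: sequence -> feature vector
    layers.RepeatVector(N_OUT),                                      # repeat context for 96 output steps
    layers.LSTM(200, activation="sigmoid", return_sequences=True),   # decoder: step-by-step generation
    layers.TimeDistributed(layers.Dense(N_VARS_OUT)),                # 3 variables per output step
])
model.compile(optimizer="adam", loss="mse")
```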

D. CNN-LSTM ENCODER-DECODER
CNN-LSTM-AE utilizes a network structure similar to LSTM-AE. It captures the trend of the time series through LSTM-AE and realizes multi-step, multivariate output. By inserting a CNN structure before the LSTM-AE, the critical features of the input variables are further extracted to obtain the complex relationships among the variables [37]. The CNN output is flattened into a 1-D tensor and sent to a repeat vector layer, which repeats the incoming input for a specific number of time steps to produce multi-step results. The subsequent traditional LSTM layer is replaced by a CuDNNLSTM layer with 100 units, which can be GPU-accelerated to speed up training. Finally, one time-distributed dense layer is added for the multi-output values, and the number of epochs is set to 50.
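An illustrative Keras sketch of this hybrid. The convolutional filter count and kernel size are assumptions (the text does not specify them), and a standard LSTM layer stands in for the CuDNNLSTM layer, since TF 2.x selects the cuDNN kernel automatically when a GPU is available.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(192, 8)),
    layers.Conv1D(64, 9, activation="relu"),   # CNN front end (filters/kernel assumed)
    layers.MaxPooling1D(2),
    layers.Flatten(),                          # CNN output as 1-D tensor
    layers.RepeatVector(96),                   # repeat features for 96 output steps
    layers.LSTM(100, return_sequences=True),   # stands in for CuDNNLSTM(100)
    layers.TimeDistributed(layers.Dense(3)),   # multi-output values per step
])
model.compile(optimizer="adam", loss="mse")
```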

E. CONVLSTM ENCODER-DECODER
In contrast to the CNN-LSTM model, the proposed ConvLSTM and LSTM-AE hybrid structure is shown in Fig. 1. ConvLSTM is a variant of LSTM that includes convolution operations inside the LSTM cell. This unique RNN model can capture spatial relations between multiple inputs as well as long-term dependence in time series data; ConvLSTM replaces matrix multiplication with convolution operations at each gate of the LSTM cell [36]. This is done with a ConvLSTM2D layer; the ConvLSTM outputs are flattened into a 1-D tensor and passed through an encoder-decoder structure. By performing convolution operations on multidimensional data, the model captures underlying multidimensional relationships. In this paper, the data were reshaped into a 5-D tensor and fed to a single ConvLSTM2D layer with eight filters and a kernel size of 1×5; the sigmoid function is used as the activation function, the batch size is set to 1024, the number of CuDNNLSTM units is set to 10, and training runs for 50 epochs.
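A hedged sketch of the proposed ConvLSTM-AE. The exact 5-D reshape is not given in the text, so here the 192 input steps are assumed to be split into 2 sub-sequences laid out as 1×96 "images" with the 8 weather variables as channels; as before, a plain LSTM stands in for CuDNNLSTM.

```python
from tensorflow.keras import layers, models

# Assumed 5-D layout: (sub-sequences, rows, cols, channels) = (2, 1, 96, 8)
model = models.Sequential([
    layers.Input(shape=(2, 1, 96, 8)),
    layers.ConvLSTM2D(8, kernel_size=(1, 5),     # 8 filters, 1x5 kernel per the text
                      activation="sigmoid"),
    layers.Flatten(),                            # ConvLSTM output as 1-D tensor
    layers.RepeatVector(96),                     # encoder-decoder bridge
    layers.LSTM(10, return_sequences=True),      # 10 units per the text
    layers.TimeDistributed(layers.Dense(3)),     # AT, RH, WS per step
])
model.compile(optimizer="adam", loss="mse")
```

The convolution inside the cell is what lets the model relate the eight input channels spatially at every time step, rather than mixing them only through dense recurrent weights.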

F. WALK FORWARD VALIDATION
Multi-step prediction can be achieved by iterating a single-step forecast, but the error accumulated one step ahead significantly degrades prediction performance. A specialized walk-forward validation method was designed to handle the multi-step time series forecasting problem before the data were fed into the model. Records with a minimum available interval of 15 minutes were divided on a daily basis, and the models were provided with data at intervals of days to predict multiple future steps accurately. The implementation is as follows: the in-sample data are set to 192 steps and the out-of-sample data to 96 steps; that is, the method uses two days (192 steps) of in-sample data and walks forward one day (96 steps) at a time.
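The walk-forward split described above can be sketched as follows, assuming a `(timesteps, features)` array whose first three columns are the target variables AT, RH, and WS (that column ordering is an assumption for illustration).

```python
import numpy as np

def walk_forward_windows(series, n_in=192, n_out=96):
    """Split a (timesteps, features) array into (input, target) pairs,
    using two days (192 steps) in-sample and walking forward one day
    (96 steps) at a time."""
    X, Y = [], []
    start = 0
    while start + n_in + n_out <= len(series):
        X.append(series[start:start + n_in])                      # two days of inputs
        Y.append(series[start + n_in:start + n_in + n_out, :3])   # next day, 3 targets
        start += n_out                                            # advance one day
    return np.array(X), np.array(Y)
```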

G. EVALUATION METHOD
In order to evaluate the experimental results of the models mentioned above, MSE, RMSE, MAE, MAPE (%), and SMAPE (%) were chosen as statistical metrics. They are defined as follows:

MSE = (1/N) Σ_(i=1..N) (Y_Pred(i) − Y_True(i))²
RMSE = √MSE
MAE = (1/N) Σ_(i=1..N) |Y_Pred(i) − Y_True(i)|
MAPE = (100/N) Σ_(i=1..N) |(Y_Pred(i) − Y_True(i)) / Y_True(i)|
SMAPE = (100/N) Σ_(i=1..N) 2|Y_Pred(i) − Y_True(i)| / (|Y_Pred(i)| + |Y_True(i)|)

where Y_Pred(i) and Y_True(i) are the predicted value and actual value, respectively, and N is the number of all predicted values.
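The five metrics can be computed directly with NumPy; this helper follows the standard definitions (MAPE assumes nonzero actual values).

```python
import numpy as np

def metrics(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (%), SMAPE (%) over all predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),          # assumes y_true != 0
        "SMAPE": 100.0 * np.mean(2.0 * np.abs(err)
                                 / (np.abs(y_true) + np.abs(y_pred))),
    }
```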

A. INPUT VARIABILITY ANALYSIS
All experiments were conducted on the Google Colab platform, which provided free access to a 2×2.2 GHz CPU and a Tesla T4 GPU environment. Meteorological datasets were collected from the Washington State Agricultural Weather Network (AgWeatherNet, http://weather.wsu.edu/); data were recorded every 15 minutes. Data preprocessing deleted redundant data, replaced a small number of outliers and missing values (such as NaN or values beyond a given range) with linear interpolation between normal adjacent values, and screened out parts with large amounts of abnormal data; the resulting variables are summarized in Table 2. Many methods can be used to analyze time series characteristics, and time series decomposition is one of them. The decomposition model reduces a time series into three components: trend, seasonal effects, and noise. One primary multiplicative decomposition model is shown in Formula (13):

Y(t) = T(t) × S(t) × N(t)    (13)

where the trend T(t) is the increasing or decreasing value in the series, the seasonality S(t) is the repeating short-term cycle in the series, and the noise N(t) is the random variation in the series. This multiplicative model can explain the increasing trend of the long-term AT observations (20k data points) in Fig. 2(c). The AT series in Fig. 2(a), 2(b), and 2(c) represent decompositions with different periodicity, with the frequency set to daily, weekly, and monthly, respectively. The decomposed data clearly show multi-periodicity, a nonlinear trend, and multiple seasonality in the AT subseries, and likewise in the RH subseries. Therefore, it is suitable to extract features from the daily data to build a prediction model.
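A minimal NumPy sketch of a multiplicative decomposition with a daily period of 96 steps. A simple centered moving-average trend and phase-averaged seasonal factors are used here; this is not necessarily the decomposition routine the authors used, only an illustration of Y(t) = T(t) × S(t) × N(t).

```python
import numpy as np

def multiplicative_decompose(y, period=96):
    """Decompose a positive series into trend, seasonal, and noise factors
    such that trend * seasonal * noise reproduces y."""
    y = np.asarray(y, dtype=float)
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")   # moving-average trend estimate
    detrended = y / trend
    # average each phase of the cycle to get one seasonal factor per step
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, len(y) // period + 1)[:len(y)]
    noise = y / (trend * seasonal)                # residual random variation
    return trend, seasonal, noise
```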
Pearson's correlation coefficient was also used to analyze the correlation between AT, RH, WS, and other environmental factors. The correlations were measured after the data were normalized, and the results are shown in Table 3. The analysis showed that AT had a strong correlation with RH (-0.78) and DP (0.72), a moderate positive correlation with SR (0.52), a weak correlation with LW (-0.38), and a very weak correlation with WS (0.19). RH had a moderate correlation with LW (0.58) and SR (-0.56), and a weak correlation with WS (-0.33) and DP (-0.21). WS had a weak correlation with SR (0.22). The correlation analysis of the multiple input variables showed an apparent nonlinear relationship among the variables.
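A correlation matrix of this kind can be reproduced with a few lines of NumPy: standardize each variable, then take normalized inner products between columns.

```python
import numpy as np

def pearson_matrix(data):
    """Pairwise Pearson correlation of column variables (rows = records)."""
    x = np.asarray(data, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)   # z-score each variable
    return (x.T @ x) / len(x)                  # inner products of z-scores
```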
As a result, traditional time series prediction techniques such as ARIMA are inappropriate for such nonlinear problems; artificial neural networks (ANN) such as CNN, LSTM, and hybrid models are better choices. The interquartile ranges (IQR) were computed to compare the variability of the input variables. The results in Table 4 indicate that the IQR of the normalized input variables was in the range of 0 to 0.95. RH had the largest IQR (0.57), indicating relatively high variability and a potentially greater impact than the other variables in the prediction model. The IQR of AT (0.28) and SR (0.26) was lower than that of RH, and WS (0.08) had the lowest IQR. As an example, Fig. 3 shows the AT curves from April 1, 2018, to April 7, 2018. In general, the AT curves for seven consecutive days reached their lowest values at about four to seven o'clock and then increased gradually, reaching the daily maximum around 12 to 14 o'clock before declining again to the lowest value; this pattern depends mainly on direct solar radiation. However, the range and timing of daily AT variation are also affected by other factors such as RH, wind currents, and cloud cover. Some of these factors can be measured by instruments, whereas others are unsuitable for measurement in a small region. These factors cause the differences in daily AT variation, and a multivariable prediction model can reflect the correlation between variables. Fig. 3 shows that the maximum daily AT difference was 22.8 ℉ (12.7 ℃), and the maximum daily average AT difference was 12.2 ℉ (6.8 ℃). These significant differences require the prediction model to have a strong ability to follow the trend and predict effectively.
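The IQR comparison above, computed on min-max normalized variables, can be sketched as:

```python
import numpy as np

def normalized_iqr(x):
    """IQR of a min-max normalized variable: spread between the 75th
    and 25th percentiles after scaling to [0, 1]."""
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())    # min-max normalization
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25
```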

B. PARAMETER SETTING
Five network models were trained on the preprocessed dataset to compare their performance. The hyperparameters of the models were tuned with the grid search method during training. Candidate parameter values were combined exhaustively, and the performance metric was measured on the validation dataset. The ideal parameter combination was finally determined to learn features and reduce overfitting, and the relevant parameters were tested more than 20 times to achieve the best model results. The models above use MSE as the loss function and the Adam optimizer as the minimization algorithm. The final parameter selections for these multivariate output models are based on minimizing the RMSE of AT, RH, and WS, respectively. Table 5 shows the model structures and a comparison of their parameter counts. The ConvLSTM-AE model had the smallest number of parameters, reducing model training time; in contrast, the multi-head CNN model had the largest number of parameters due to its stack of many convolutional layers.
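A generic grid search loop of the kind described can be sketched as follows; `build_and_eval` is a hypothetical user-supplied function that trains a model with the given parameters and returns its validation RMSE.

```python
import itertools

def grid_search(build_and_eval, grid):
    """Try every hyperparameter combination in `grid` (a dict of
    parameter-name -> list of candidate values) and keep the one with
    the lowest validation RMSE."""
    best_params, best_rmse = None, float("inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        rmse = build_and_eval(params)          # train + validate one candidate
        if rmse < best_rmse:
            best_params, best_rmse = params, rmse
    return best_params, best_rmse
```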

C. MODELING PREDICTION ASSESSMENT
Based on the analysis above, multi-step-ahead prediction models were applied. Multi-step predicted and observed AT values of the different models are shown in Fig. 4: Fig. 4(a) shows the predicted versus observed values of the five models, and Fig. 4(b-f) show scatter plots of the AT predictions of the different models. Compared with the actual values, the CNN scatter in Fig. 4(b) is the most dispersed, which leads to a relatively large prediction error. The multi-head CNN model (R²=0.81) in Fig. 4(c) performs better than the CNN model (R²=0.37), with a better correlation, because the multi-head structure with multiple convolutional networks in each channel can learn more features. For LSTM-AE, the scatter in Fig. 4(d) is closer to the standard line (slope of 1.04), but the predicted values deviate too much from the observed values. The CNN-LSTM-AE (R²=0.86) in Fig. 4(e) has a better correlation than the multi-head CNN model, with the overall predicted values higher than the observed values. Fig. 4(f) shows that the predicted values of ConvLSTM-AE agree well with the observed values, with a slope of 0.72 and an R² of 0.92. ConvLSTM-AE in Fig. 4(f) shows a slimmer distribution than CNN-LSTM-AE, and the apparent data aggregation indicates that it follows the range of predicted values more effectively; the scatter is closer to the standard line, and the cumulative error of multi-step prediction does not increase significantly. Therefore, ConvLSTM-AE is more reasonable for AT prediction. Fig. 6 shows the WS forecasts of these models. Fig. 6(a) indicates that the stochastic uncertainty and fluctuations of WS change frequently, and this instability affects the corresponding predictions. The predictions of CNN (R²=-0.01) in Fig. 6(b) and multi-head CNN (R²=0.11) in Fig. 6(c) are more dispersed than those of the other models, generating a larger error than LSTM-AE in Fig. 6(d) and the hybrid models. The scatter plots in Fig. 6(b-f) indicate that LSTM-AE and the hybrid models better learn the time series trend and reduce the influence of abnormal fluctuations. Although the performance of ConvLSTM-AE in terms of R² (0.17) in Fig. 6(f) is lower than that of CNN-LSTM-AE (R²=0.26) in Fig. 6(e), it achieves a higher data density than CNN-LSTM-AE, and the predicted values are normally distributed.
The optimal performance metrics of the five predictive models are listed in Table 6. For multi-input, multi-output (MIMO) prediction, the multivariate output evaluation demonstrates comparative performance among these models. ConvLSTM-AE has the lowest RMSE of AT (1.88) and WS (0.95), outperforming the other models; SMAPE leads to the same conclusion. As for the RMSE of RH, CNN-LSTM-AE and LSTM-AE performed better than the ConvLSTM-AE model in this single test. This result indicates that the ConvLSTM model balances the multivariable output results for each indicator so that the error of the continuous prediction outputs of each variable does not change too much. Comparing MIMO with the single-input, single-output (SISO) ConvLSTM-AE model, the MIMO prediction still performed better than the SISO prediction with the help of the correlation between variables, with MIMO indicators outperforming SISO indicators. These results demonstrate that multivariate models perform better than univariate models. Multi-step-ahead prediction was also used to measure the stability of these models over seven consecutive days of tests. The average RMSE of AT, RH, and WS in Table 7 indicates that ConvLSTM-AE has the lowest RMSE, superior to the other models. The results also indicate that ConvLSTM-AE is stable; the model maintains good predictive ability and can be used for multi-step-ahead prediction on consecutive days. Different in-sample data with walk-forward validation were also tested, as shown in Table 8.
In-sample data of one day, two days, four days, and six days were used for one-day-ahead prediction with the ConvLSTM-AE model. In general, the RMSE of the proposed model decreased as the in-sample data increased from one day to two days: the additional continuous training data are beneficial for the model to learn data features.
However, when the in-sample data increased from two days to four and six days, more and more varying values were involved, leading to a messier variation trend, a larger prediction error, and worse performance. Therefore, the relevant parameters need to be adjusted according to the specific prediction needs, and two days of in-sample data are best for this prediction purpose.

V. CONCLUSION
In this work, five models with different parameters were designed, tested, and optimized on the Colab platform with GPU acceleration, and the optimized models' average training times were significantly reduced: CNN 136 s, multi-head CNN 875 s, LSTM-AE 1096 s, CNN-LSTM-AE 100 s, and ConvLSTM-AE 126 s. Although the ConvLSTM-AE model is relatively complex, with GPU acceleration and parameter optimization it has an obvious advantage in training time, being 1 to 9 times faster than the other models. The test results also indicate that a ConvLSTM-AE model with larger filter and kernel sizes learns more abnormal weather values, leading to overfitting of the forecast results and increased forecast error. Therefore, the ConvLSTM-AE model achieved the desired prediction results by using smaller, odd-sized filter and kernel sizes. Walk-forward validation and multi-step, multi-input, multi-output prediction were also realized with all five models, and their performance was obtained. The prediction results show that the proposed model can better capture the internal relationships of spatio-temporal series data and solve the trend prediction of time-varying signals. The proposed ConvLSTM-AE model generated more precise and reliable forecasts than the other reference models and could satisfy the practical requirements of spray timing selection. For future work, we will investigate further improving the model's performance by using hybrid algorithms to preprocess the data more effectively and filter outliers from the final forecasting series.