An Enhanced Naive Bayes Model for Dissolved Oxygen Forecasting in Shellfish Aquaculture

It is difficult to predict dissolved oxygen values because they are disordered and nonlinear. Accurate prediction of dissolved oxygen plays an important role in improving shellfish production in aquaculture, so a reliable forecasting model is needed. Therefore, this paper proposes an enhanced naive Bayes (NB) model. Because dissolved oxygen takes a very large number of distinct values, using these values directly as categories would leave too few training samples for each category, which reduces prediction accuracy. The dissolved oxygen differential series is therefore used as the input data to reduce the number of categories and improve training accuracy. To increase the number of training samples, the sliding window concept from network communication protocols is used to partition the differential sequence dataset and generate the features and labels of the training set. The values are predicted as categories: the label with the maximum posterior probability, estimated from all training samples, is selected as the predicted dissolved oxygen difference. The algorithm is applied to dissolved oxygen data collected at a shellfish farm in Yantai, Shandong Province, China, from February 18, 2016, to January 31, 2020, and the best feature length is determined by analyzing its effect on the predicted dissolved oxygen values. Compared with several advanced algorithms, the proposed algorithm significantly reduces the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The results of the Diebold-Mariano test and 10-fold cross-validation also show that the proposed algorithm achieves higher prediction accuracy.


I. INTRODUCTION
The numerical prediction of dissolved oxygen in water bodies has been studied extensively. Dissolved oxygen data are nonlinear, cyclical, and nonstationary. Ahmed [1] combined a feedforward neural network (FFNN) and a radial basis function neural network (RBFNN) to evaluate and predict dissolved oxygen in the Surma River. Ji et al. [2] designed a support vector machine (SVM)-based model to predict dissolved oxygen in anoxic river systems. Raheli et al. [3] used a multilayer perceptron integrated with the firefly algorithm (MLP-FFA) to predict water quality parameters collected at a Malaysian hydrological station. Huan et al. [4] combined ensemble empirical mode decomposition (EEMD) and a least-squares support vector machine (LSSVM) to predict dissolved oxygen sequences. Li et al. [5] proposed a hybrid multiscale-feature model based on EEMD for dissolved oxygen prediction in aquaculture. Ren et al. [6] used a fuzzy neural network optimized by a genetic algorithm to predict dissolved oxygen in a hydroponic system. Although these methods predict dissolved oxygen reasonably well, they lack interpretability. Neural network-based learning algorithms are prone to overfitting and underfitting, and their training can easily become trapped in local optima, so they may fail to accurately predict changes in dissolved oxygen content in practical marine fishery applications.

The associate editor coordinating the review of this manuscript and approving it for publication was Wentao Fan.
Bayes' theorem provides a generative model for data classification from a statistical viewpoint [7]. Building on it, the naive Bayes (NB) algorithm adds a strong independence assumption; it shows good results and stable classification efficiency in classification problems. The algorithm is also robust to missing data and has low computational complexity. Saritas and Yasar [8] used the algorithm for breast cancer diagnosis. Granik and Mesyura [9] used it to identify fake news. For multiclass problems, Jiang et al. [10] classified Chinese texts into five categories and predicted the categories of new texts with a combination of NB equations, while Al-Khurayji and Sameh [11] classified Arabic texts. Xu [12] discussed how the text classification performance of NB classifiers differs across event models. Mubarok et al. [13] modeled product reviews and classified user sentiment into four categories. Karthick and Harikumar [14] used an NB model to classify oral X-rays to predict different types of oral diseases.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In multiclass classification, NB algorithms usually fix the set of categories in advance and then make predictions, which works well for classification. However, traditional NB algorithms cannot be applied directly to continuous-value prediction, and, to our knowledge, no previous study has used an NB algorithm for dissolved oxygen prediction. Based on the above analysis, two problems need to be solved: the low accuracy of traditional algorithms in predicting dissolved oxygen, and the inability of the NB algorithm to predict continuous values [15]-[21]. Building on previous studies, this paper therefore proposes an enhanced NB model to predict dissolved oxygen in shellfish aquaculture. By taking the differences of the dissolved oxygen sequence as classification categories and applying Laplacian correction to the probabilities estimated for the observed values of the differential sequence, the model produces predicted dissolved oxygen values that are closer to the real values than those of previous models.
Predictions of water quality data, especially dissolved oxygen data, are often made by passing sensor-acquired values directly to predictive models as input parameters. Prediction accuracy depends on the size of the historical dataset; therefore, a suitable way to expand the training dataset can effectively improve prediction accuracy. Moreover, model parameters depend on the monitoring site: choosing a different region can change the model parameters substantially, and the parameter choice in turn affects prediction accuracy. Traditional methods have long suffered from many parameters and complex tuning procedures, and the model-building process of many algorithms involves substantial randomness. Therefore, an enhanced naive Bayesian prediction model is proposed that requires no parameters other than the sliding window length.
Traditional naive Bayesian algorithms are often used for classification problems with a small number of categories; however, continuous values such as dissolved oxygen usually cannot be predicted with them. This is because naive Bayes requires a sufficiently large number of training samples for each predicted category, whereas continuous values give rise to a very large number of categories. In this paper, the continuous values are transformed into a sequence of difference values, which makes the naive Bayesian method applicable. The contributions of this paper consist of the following three parts:
1) Continuous dissolved oxygen values are predicted by an improved naive Bayesian algorithm. The traditional naive Bayesian algorithm can only classify a finite number of categories; in this paper, continuous dissolved oxygen values, used as model input, are predicted with the naive Bayesian algorithm.
2) A differential series method is proposed to enhance the regularity of the data samples. The prediction performance of the naive Bayesian algorithm depends on the selection of historical data samples. The traditional method is limited by the irregular distribution of the number of samples per category, resulting in low prediction accuracy. In this paper, the differential series is used as the classification category for prediction. This decreases the number of categories and increases the number of samples per category, which makes the distribution of the data samples more regular.
3) A sliding-window data generation method is proposed to increase the number of training samples. The traditional method directly divides the time series into two parts, a training set and a test set, which leaves too few training samples. The proposed sliding-window data generation method effectively increases the number of training samples and thus improves prediction accuracy.
The remainder of the paper is organized as follows. Section II presents the derivation of the NB method and the Laplacian correction method. Section III describes the proposed method in detail. In Section IV, the enhanced NB algorithm is used to predict dissolved oxygen levels, and its prediction performance is verified by comparing its errors with those of similar algorithms; the Diebold-Mariano test and 10-fold cross-validation are used to further verify the advantages of the algorithm. Section V summarizes the main ideas of the paper and outlines future work.

II. RELATED WORK
A. NAIVE BAYES
Bayes' theorem [22]-[25] relates the "probability of event Y occurring conditional on event X occurring" to the "probability of event X occurring conditional on event Y occurring", as shown in equation (1), where P(Y|X) is the posterior probability, P(Y) is the prior probability, and P(X|Y)/P(X) is the standardized likelihood, which can be considered an adjustment factor.
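The extracted text does not include equation (1). In this section's notation, the textbook statement of Bayes' theorem reads as follows; this is a reconstruction of the standard formula, not a copy of the original typesetting.

```latex
P(Y \mid X) = \frac{P(Y)\, P(X \mid Y)}{P(X)} \tag{1}
```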
In reality, many factors influence an event, and the known events are closely related to one another, so it is very difficult to estimate the joint conditional probability P(X|Y) over all events that Bayes' theorem requires.
The NB model therefore no longer considers the interrelationships among the known events. A strong independence constraint is imposed on the set of events X: given Y, each event in the set is assumed to be independent of the others, which leads to the general form of the NB model (2).
Substituting this expression into Bayes' theorem yields the NB discriminant, as shown in equation (3).
For a feature sequence Feature = {f_1, f_2, ..., f_L}, f_i is the value of the i-th attribute of the feature. For a given set of feature values, the denominator P(X) is the same for every class. Therefore, in the actual calculation, the denominator is ignored, and the class with the largest numerator is selected as the predicted value, as in equation (4).
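Equations (2)-(4) are also missing from the extracted text. Assuming the L attributes X_1, ..., X_L are conditionally independent given the class, as described above, the standard NB forms would be the following (a reconstruction, not the authors' exact typesetting):

```latex
P(X \mid Y = y_c) = \prod_{i=1}^{L} P(X_i = x_i \mid Y = y_c) \tag{2}

P(Y = y_c \mid X) = \frac{P(Y = y_c)\prod_{i=1}^{L} P(X_i = x_i \mid Y = y_c)}{P(X)} \tag{3}

\hat{y} = \arg\max_{y_c}\; P(Y = y_c)\prod_{i=1}^{L} P(X_i = x_i \mid Y = y_c) \tag{4}
```

Dropping P(X) in (4) does not change the argmax, because P(X) is the same for every candidate class y_c.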

B. LAPLACIAN CORRECTION
When dissolved oxygen levels are predicted, some values of the difference sequence in the test set may not appear in the training set. The estimated probability P(X_i = x_i | Y = y_c) is then 0, which forces P(Y = y_c | X) = 0 regardless of the other attributes, so class y_c can never be predicted and the final prediction is distorted. Laplacian correction (5) is introduced to correct this effect [26]-[30]. In this equation, x_i is the observed value of the attribute in the test set, |D_{c,x_i}| is the number of training samples with label Y = y_c whose i-th attribute X_i equals x_i, |D| is the number of samples in the training set, and N_{X_i} is the number of possible values of attribute X_i.
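The effect of the correction can be checked numerically. The sketch below is an illustration under assumptions, not the authors' code: it uses the common textbook form P(X_i = x_i | Y = y_c) = (|D_{c,x_i}| + 1) / (|D_c| + N_{X_i}), where |D_c| is the number of training samples labelled y_c; the extracted text lists |D| instead, so the authors' exact denominator may differ. The function name `laplace_conditional` is hypothetical.

```python
from collections import Counter


def laplace_conditional(class_values, x, n_values):
    """Laplace-corrected estimate of P(X_i = x | Y = y_c).

    class_values: values of attribute X_i in the training samples labelled y_c
        (so len(class_values) plays the role of |D_c|).
    x: the observed value x_i of X_i in the test sample.
    n_values: N_{X_i}, the number of possible values of X_i.
    """
    counts = Counter(class_values)  # a missing key counts as 0
    return (counts[x] + 1) / (len(class_values) + n_values)


observed = [0.0, 0.0, 0.0, 0.01]                 # X_i values seen with label y_c
print(laplace_conditional(observed, 0.0, 3))     # 4/7, about 0.571
print(laplace_conditional(observed, -0.01, 3))   # 1/7: never seen, yet nonzero
```

Without the "+ 1" and "+ N_{X_i}" terms, the second call would return 0 and eliminate class y_c from the argmax entirely.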

III. CONSTRUCTION OF THE ENHANCED NAIVE BAYES ALGORITHM
A. DIFFERENTIAL SEQUENCE
Dissolved oxygen values are easily affected by many natural factors, such as climate, season, altitude, and time [31]. Therefore, the overall dissolved oxygen data show nonlinear characteristics [32]. Dissolved oxygen data collected at different times can deviate considerably from one another, so these data must be preprocessed before they are used for prediction.
The most common data preprocessing method is normalization [33]-[37], i.e., mapping the data to the range 0-1. However, normalized values have many decimal places and cannot be grouped into a limited number of categories by the NB algorithm. Therefore, normalization is not used with this algorithm; instead, the dissolved oxygen sequence is transformed into a differential sequence, and the differential sequence is predicted.
For a given dissolved oxygen sequence S = {s_1, s_2, ..., s_{N_train}}, the differential sequence Diff is computed according to equation (6):
Diff_k = s_k − s_{k−1}, k = 2, 3, ..., N_train    (6)
where N_train is the length of the dissolved oxygen sequence.
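Differencing and its inverse, equation (10) in Section III-C, can be sketched in a few lines of Python. The rounding to two decimals is an assumption (that readings have a resolution of 0.01 mg/L), added so that equal steps map to exactly the same category despite floating-point error; the function names are hypothetical.

```python
def difference(series, decimals=2):
    """Equation (6): Diff_k = s_k - s_{k-1} for k = 2, ..., N.

    Rounding (assumed 0.01 resolution) makes equal steps compare equal.
    """
    return [round(series[k] - series[k - 1], decimals) for k in range(1, len(series))]


def undifference(first_value, diffs, decimals=2):
    """Inverse of differencing: s_k = Diff_k + s_{k-1}, starting from s_1."""
    values = [first_value]
    for d in diffs:
        values.append(round(values[-1] + d, decimals))
    return values


s = [7.35, 7.36, 7.36, 7.34]
d = difference(s)
print(d)                       # [0.01, 0.0, -0.02]
print(undifference(s[0], d))   # [7.35, 7.36, 7.36, 7.34]
```

Without the rounding, 7.36 - 7.35 evaluates to a float slightly different from 0.01, and identical physical steps would be counted as different categories.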
To illustrate how the differential series changes the distribution of the dissolved oxygen data, the frequency distribution histograms of the original dissolved oxygen series and of the differential series are plotted in Figure 1. The upper part of Figure 1 shows the histogram of the dissolved oxygen values themselves, which are all positive; the lower part shows the histogram of the dissolved oxygen differences calculated according to equation (6), which take both positive and negative values and are centered on 0.
FIGURE 1. Histograms of the frequency distributions of the raw and differential dissolved oxygen data sequences.
Comparing the two histograms shows that the differential data have a smaller distribution range, which makes it possible to use the differential values as classification categories when the NB classifier calculates probabilities. Because the sensor samples every 10 minutes, dissolved oxygen changes little between two adjacent samples, and the differences are concentrated near zero. Occasional large differences are probably caused by interrupted acquisition due to unexpected equipment power failures in some periods, after which the dissolved oxygen had changed considerably by the time the equipment restarted. To remove the error caused by this factor and improve prediction accuracy, only differences whose absolute value is less than or equal to 0.01 were included in the model.
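The filtering rule and the frequency count behind a histogram such as Figure 1 can be sketched as below. This assumes the differences have already been rounded to the sensor's resolution (an assumption, so that the comparison with 0.01 is exact); the function name and the sample values, including the two large jumps standing in for post-outage readings, are hypothetical.

```python
from collections import Counter


def filter_small_differences(diffs, threshold=0.01):
    """Keep differences with |d| <= threshold and count how often each occurs."""
    kept = [d for d in diffs if abs(d) <= threshold]
    return kept, Counter(kept)


diffs = [0.0, 0.01, -0.01, 0.0, 0.35, 0.0, -0.4]   # 0.35 and -0.4 mimic outage jumps
kept, freq = filter_small_differences(diffs)
print(kept)        # [0.0, 0.01, -0.01, 0.0, 0.0]
print(freq[0.0])   # 3
```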

B. SLIDING WINDOW METHOD TO CONSTRUCT DATASETS
The sliding window technique [38]-[40] was originally a flow control technique in computer network communication protocols. In the Transmission Control Protocol (TCP), the two communicating parties negotiate the size of the sliding window, which determines the number of bytes of data that can be sent. As shown in Figure 2, the window slides forward continuously during data transmission until the entire message has been transmitted. Applying the same idea, a new dataset can be generated by sliding a window along a data sequence; this is a common way of preprocessing time series data. In this paper, the method is used to preprocess the sequence of dissolved oxygen differences. As Figure 3 shows, a sliding window size k is specified, and the window slides forward one record at a time from the start of the difference sequence of length n. In each window, the first k − 1 dissolved oxygen difference records are used as features and the last record is used as the label; sliding the window to the end of the difference sequence generates n − k + 1 data segments with features and labels. All data segments together form the dissolved oxygen differential sequence dataset. Let L = k − 1 denote the feature length. A larger L makes the algorithm consider more features, but an excessively large L makes the model overfit. A smaller L makes the algorithm focus more on abrupt changes in the dissolved oxygen difference sequence, but an overly small L biases the algorithm toward predicting these abrupt values, which reduces prediction accuracy. An appropriate L must therefore be selected.
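The windowing step can be sketched as follows. This is a minimal Python illustration, not the authors' code; the function name `sliding_window_segments` is hypothetical. With a stride of one record, a sequence of length n yields n − k + 1 windows of size k.

```python
def sliding_window_segments(diffs, k):
    """Split a difference sequence into (features, label) segments.

    A window of size k moves one record at a time; the first L = k - 1 values
    are the features and the last value is the label.
    """
    if k < 2:
        raise ValueError("window size k must be at least 2")
    segments = []
    for start in range(len(diffs) - k + 1):
        window = diffs[start:start + k]
        segments.append((tuple(window[:-1]), window[-1]))
    return segments


print(sliding_window_segments([0.0, 0.01, -0.01, 0.0, 0.01], k=4))
# [((0.0, 0.01, -0.01), 0.0), ((0.01, -0.01, 0.0), 0.01)]
```

Storing the features as a tuple keeps each segment hashable, which is convenient when features are later counted per label.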

C. DESCRIPTION OF THE ALGORITHM
The feature length used when generating the dataset with the sliding window method is L, and the fraction of the dataset used as the training set is Rate_train. The enhanced NB prediction algorithm proceeds as follows.
Step 1: Dissolved oxygen data preprocessing. First, the dissolved oxygen data sequence is converted into a differential sequence. Then, the first Rate_train of the records of the dissolved oxygen differential sequence are used as the training set, and the remaining 1 − Rate_train are used as the test set. Finally, the sliding window method is applied to the training set to generate data segments with features and labels, and the occurrence probability of each label is calculated.
Step 2: Construction of the enhanced NB prediction model. First, every label value Y_i in the training set whose absolute value is less than or equal to 0.01 is taken as a class, and a feature space of length L is allocated for it in memory; this completes the initialization of model space T. Then, for each training segment with label Y_i, the values of the L feature elements preceding Y_i are saved to the feature space corresponding to Y_i in model space T.
Step 3: Iteration over the labels to compute the probability of each label. First, a data segment with features and a label (i.e., the actual value) is selected from the test set using the sliding window method. Next, to obtain the predicted dissolved oxygen difference from the features, every label in model space T is traversed. For each label, the number of stored feature vectors in its feature space whose value at position i equals the feature x_i of the test segment is recorded as D_{c,x_i}. The number of distinct values that occur at feature position i is denoted by N_x, and the total number of occurrences of all values at feature position i is denoted by D. The conditional probability of that candidate prediction is then calculated according to equation (7). Next, the number of occurrences of that label in the training set is recorded as D_c, the total number of labels in model space T is recorded as N, and the total number of data segments generated from the training set is recorded as D. The probability of that label appearing is calculated according to equation (8).
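Equations (7)-(9) are missing from the extracted text. One reading that is consistent with the Laplacian correction of Section II-B and with the symbols defined in Step 3 is given below; it is a reconstruction, so the authors' exact forms may differ. In (7), the per-position total that Step 3 calls D is written D_c, because every training segment with label y_c contributes exactly one value at each feature position; in (8), D is the number of training segments and N is the number of labels in T.

```latex
P(X_i = x_i \mid Y = y_c) = \frac{D_{c,x_i} + 1}{D_c + N_x} \tag{7}

P(Y = y_c) = \frac{D_c + 1}{D + N} \tag{8}

P(Y = y_c \mid X) \propto P(Y = y_c)\prod_{i=1}^{L} P(X_i = x_i \mid Y = y_c) \tag{9}
```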
Step 4: Completion of the traversal of all labels to make predictions for the given data segment.
Step 3 is repeated to calculate the probability of each label in model space T for the dissolved oxygen difference data segment according to equation (9). The label with the maximum probability is used as the predicted value for the selected data segment. Then, equation (10) is used to convert the predicted difference back to the original dissolved oxygen scale:
s^test_k = Diff^test_k + s^test_{k−1}, k = 2, 3, ..., N_test    (10)
In this equation, s^test_{k−1} is the dissolved oxygen value at the previous moment, to which the predicted difference Diff^test_k is added to obtain the predicted dissolved oxygen value s^test_k at the next moment, and N_test is the size of the test set.
Step 5: Prediction effect evaluation. Steps 3 and 4 are repeated until the entire test set has been predicted. The error functions are then used to compare the real and predicted sequences.
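Steps 1-5 can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the authors' program: all class and function names are hypothetical; differences are rounded to an assumed 0.01 mg/L resolution so that they form discrete categories; equations (7) and (8) are taken in the Laplace-corrected form (count + 1 over total + number of possible values); scores are summed as log-probabilities to avoid underflow; and equation (10) adds the predicted difference to the previous measured value. A label that never appears in training cannot be predicted.

```python
from collections import Counter
import math


def diff_series(series, decimals=2):
    """Equation (6), rounded to an assumed 0.01 mg/L sensor resolution."""
    return [round(series[k] - series[k - 1], decimals) for k in range(1, len(series))]


def make_segments(diffs, L):
    """Windows of size k = L + 1: L features followed by one label."""
    return [(tuple(diffs[s:s + L]), diffs[s + L]) for s in range(len(diffs) - L)]


class EnhancedNB:
    """Sketch of model space T (Steps 2-4); names are hypothetical."""

    def __init__(self, L, threshold=0.01):
        self.L = L
        self.threshold = threshold

    def fit(self, segments):
        # Step 2: keep training labels with |label| <= threshold and store their features.
        kept = [(f, y) for f, y in segments if abs(y) <= self.threshold]
        self.n_segments = len(kept)                        # D
        self.label_counts = Counter(y for _, y in kept)    # D_c for each label
        self.pos_counts = {y: [Counter() for _ in range(self.L)] for y in self.label_counts}
        for features, y in kept:
            for i, v in enumerate(features):
                self.pos_counts[y][i][v] += 1              # basis of D_{c,x_i}
        # N_x: number of distinct values observed at each feature position.
        self.n_values = [len({f[i] for f, _ in kept}) for i in range(self.L)]
        return self

    def predict_one(self, features):
        # Steps 3-4: Laplace-smoothed log-posterior for every label, then argmax.
        n_labels = len(self.label_counts)                  # N
        best_label, best_score = None, -math.inf
        for y, d_c in self.label_counts.items():
            score = math.log((d_c + 1) / (self.n_segments + n_labels))
            for i, v in enumerate(features):
                score += math.log((self.pos_counts[y][i][v] + 1) / (d_c + self.n_values[i]))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def forecast(series, L=3, train_rate=0.995):
    """Steps 1-5: one-step-ahead forecasts for the test part of `series`."""
    diffs = diff_series(series)
    split = int(len(diffs) * train_rate)
    model = EnhancedNB(L).fit(make_segments(diffs[:split], L))
    predicted, actual = [], []
    # Test windows lie entirely inside the test part of the difference sequence.
    for t in range(split + L, len(diffs)):
        d_hat = model.predict_one(tuple(diffs[t - L:t]))
        # Equation (10): previous measured value + predicted difference.
        predicted.append(round(series[t] + d_hat, 2))
        actual.append(series[t + 1])
    return predicted, actual
```

For example, on a toy series that alternates between 7.00 and 7.01, `forecast(series, L=3, train_rate=0.9)` learns the alternation and reproduces the test values; on real data, the Step 5 error measures would be computed from the two returned lists.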
In this marine pasture application, the enhanced NB algorithm is trained on historical data. After the sensor reads new water parameters (such as the dissolved oxygen level), the model can quickly assess the water condition, and abnormal conditions are promptly reported to experts for evaluation. The experts can then take appropriate measures to avoid risks and effectively reduce losses.

A. DATA DESCRIPTION
The dissolved oxygen in the water of a shellfish farm in Yantai, Shandong Province (Figure 4), is affected by atmospheric temperature, humidity, and other weather factors. As a result, the sequence of dissolved oxygen differential values in the marine pasture changes markedly and shows nonlinear and nonstationary characteristics, with pronounced peaks and troughs.
Pseudocode of the enhanced NB algorithm (excerpt):
12: for TrainData_i in TrainData, T_i in T do
13: if TrainData_i = T_i then
14: temp_i ← (TrainData_i).index − L
15: for i = 0 → L do
16: value ← TrainData[i]
...
26: for T_i in T do
27: for i = 0 → L do
28: N ← length(T)
...
39: end for
...
42: predictValue ← max(result_list.values).index
43: PredictData.append(predictValue)
44: end for
45: Error ← calculate_error(PredictData, TestData)
46: return PredictData, Error

B. ALGORITHM IMPLEMENTATION AND TESTING
A simulation program for the enhanced NB algorithm is written in Python 3. First, the differential sequence of the dissolved oxygen data is generated; then, the feature length, i.e., the length of the differential subsequence used to predict the next difference, is set to L = 3.
All data segments form the dissolved oxygen differential sequence dataset, as shown in Table 1. The first 99.5% of the records in the dataset were used as the training set, and the last 0.5% were used as the test set. Finally, the training set and test set were used to model and predict, respectively, the dissolved oxygen differential sequence of the shellfish marine pasture.
TABLE 1. Dissolved oxygen differential sequence dataset when the length of the differential sequence to be predicted is L = 3.
Figure 6 compares the measured and predicted values of the dissolved oxygen differential sequence. The predictions of the enhanced NB algorithm agree with the actual dissolved oxygen differential sequence of the marine pasture and reflect the nonlinear variation pattern of dissolved oxygen well. Figure 7 shows the relative error between the measured and predicted values of the dissolved oxygen difference sequence; the small relative deviations indicate that the enhanced NB algorithm predicts the dissolved oxygen difference sequence accurately.
Equation (10) is used to convert the predicted dissolved oxygen differential sequence back into a dissolved oxygen sequence. The predicted dissolved oxygen levels show that the algorithm generally predicts the trend in dissolved oxygen values with high accuracy.

C. COMPARISON OF THE EFFECTS OF DIFFERENT PREDICTIVE MODELS
Scikit-learn is a Python library that integrates a wide range of machine learning algorithms [41]. In this paper, the multilayer perceptron regressor (MPR) and support vector regression (SVR) algorithms provided by Scikit-learn are used to predict the dissolved oxygen time series of the marine pasture. In addition, the same dataset is predicted using the RBFNN, long short-term memory (LSTM), and autoregressive integrated moving average with exogenous variables (ARIMAX) algorithms [42]. The prediction results are compared with those of the proposed enhanced NB model; as shown in Figure 8, the proposed algorithm obtains comparatively good prediction results.
To further quantify prediction performance, the errors of each algorithm were evaluated using the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) [43]-[45]. As Table 2 shows, on the same dataset and test set, the enhanced NB model reduces the MAE, RMSE, and MAPE by 0.0141, 0.0043, and 0.002, respectively, compared with the MPR. Compared with the RBFNN model [46]-[50] with 10 hidden layers, the enhanced NB model reduces the MAE, RMSE, and MAPE by 0.0713, 0.3463, and 0.0076, respectively. Compared with the ARIMAX algorithm, it reduces the MAE, RMSE, and MAPE by 0.0327, 0.0402, and 0.0038, respectively. Compared with the SVR algorithm, the MAE, RMSE, and MAPE decrease by 0.03, 0.0099, and 0.0034, respectively. Compared with the LSTM algorithm, the MAE, RMSE, and MAPE decrease by 0.03, 0.0099, and 0.0034, respectively. The results were 0.0713, 0.1236 and 0.01. Thus, the enhanced NB algorithm can effectively predict dissolved oxygen data in marine pastures.
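The three error measures follow their standard definitions and can be written as below. MAPE is returned as a fraction rather than a percentage, which appears consistent with the magnitudes reported in this paper (e.g., 0.02173), though that is an inference; the function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [8.0, 10.0]
predicted = [7.0, 10.5]
print(mae(actual, predicted))    # 0.75
print(rmse(actual, predicted))   # about 0.7906
print(mape(actual, predicted))   # about 0.0875
```

RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does.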
The Diebold-Mariano (DM) test is a statistical test whose statistic asymptotically follows a standard normal distribution under the null hypothesis of equal predictive accuracy [51]-[53]; by examining the associated p-values, one can determine whether two time series prediction algorithms differ in predictive accuracy. The MAE or MAPE is commonly used as the loss function and combined with the DM test to evaluate the predictive ability of a model. Table 3 reports the DM test results under the MAE loss. At the significance level α = 0.05, the p-value associated with each DM-MAE statistic is less than α; i.e., in the MAE sense, the predictive ability of each comparison algorithm on the dissolved oxygen data differs significantly from that of the proposed model. Table 4 reports the DM test results under the MAPE loss: at α = 0.05, the p-value associated with each DM-MAPE statistic is also less than α, so each comparison algorithm differs significantly in predictive ability from the enhanced naive Bayesian model. Combined with the lower errors in Table 2, these significant differences indicate that the predictive ability of the enhanced naive Bayesian model is indeed better than that of the comparison algorithms.
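A basic DM test can be sketched with the standard library alone. This sketch assumes one-step-ahead forecasts (h = 1), so the long-run variance reduces to the sample variance of the loss differential; it uses the asymptotic normal approximation and does not apply the Harvey-Leybourne-Newbold small-sample correction. The paper does not state which variant it used, and the function name is hypothetical.

```python
import math


def diebold_mariano(actual, pred1, pred2, loss="mae"):
    """One-step-ahead (h = 1) DM test with a normal approximation.

    Returns (DM statistic, two-sided p-value). A positive statistic means
    forecast 1 has the larger average loss, i.e. forecast 2 is more accurate.
    """
    if loss == "mae":
        d = [abs(a - p1) - abs(a - p2) for a, p1, p2 in zip(actual, pred1, pred2)]
    elif loss == "mape":
        d = [abs((a - p1) / a) - abs((a - p2) / a) for a, p1, p2 in zip(actual, pred1, pred2)]
    else:
        raise ValueError("loss must be 'mae' or 'mape'")
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n   # gamma_0; no lag terms for h = 1
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    dm = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2))     # equals 2 * (1 - Phi(|dm|))
    return dm, p_value


actual = [10.0] * 6
worse = [10.5, 9.3] * 3     # absolute errors about 0.5 and 0.7
better = [10.1, 9.9] * 3    # absolute errors about 0.1
print(diebold_mariano(actual, worse, better))   # large positive statistic, tiny p-value
```

A small p-value alone only says the two forecasts differ; the sign of the statistic shows which one has the lower loss.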

D. 10-FOLD CROSS-VALIDATION
In the previous section, the first 99.5% of the dataset was used as the training set and the last 0.5% as the test set, and the sliding window method was used to expand the training samples. The prediction results were then compared with those of a variety of existing time series prediction algorithms, and with this partitioning the proposed method achieved a large improvement in prediction accuracy over the other algorithms. To rule out the possibility that this improvement is an artifact of the particular training/test partition, and thus to further demonstrate that the proposed enhanced naive Bayesian algorithm predicts dissolved oxygen time series more accurately, the dissolved oxygen dataset was partitioned again using 10-fold cross-validation. The dataset contains 125,883 valid dissolved oxygen records, which were divided into 10 approximately equal parts; in each fold, one part was used as the test set and the remaining parts as the training set. The MAE, RMSE, and MAPE of the predictions were then calculated for the proposed enhanced naive Bayesian algorithm. With 10-fold cross-validation, the MAE, RMSE, and MAPE of the proposed algorithm are 0.06365, 0.13039, and 0.02173, respectively.
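The fold partitioning can be sketched as follows. The paper does not say whether records were shuffled before splitting; this sketch assumes contiguous folds in time order, and the function name is hypothetical. For each fold, the enhanced NB model would be retrained on the training indices and evaluated on the test indices, and the per-fold errors combined (the paper does not state how, e.g., by averaging).

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, nearly equal folds."""
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)   # first `extra` folds get one more record
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


sizes = [len(test) for _, test in k_fold_indices(125_883)]
print(sizes)  # [12589, 12589, 12589, 12588, 12588, 12588, 12588, 12588, 12588, 12588]
```

Because 125,883 is not divisible by 10, the first three folds receive one extra record. scikit-learn's `KFold(n_splits=10)` with its default `shuffle=False` assigns fold sizes the same way.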

E. EFFECT OF DIFFERENT FEATURE LENGTHS ON THE PREDICTION PERFORMANCE
To further verify the prediction performance of the enhanced NB algorithm under different feature lengths L, enhanced NB models with different differential sequence lengths L were used to predict the samples collected from February 18, 2016, to January 31, 2020; their MAEs, RMSEs, and MAPEs are compared in Table 6. As L increases, the prediction accuracy of the enhanced NB algorithm for dissolved oxygen first increases and then decreases, which indicates that the differential sequence length L affects prediction performance under different water conditions. When the training set contains few samples, a large L leaves too few samples to fit the prediction model, which decreases prediction accuracy. Therefore, when the amount of data is large enough, appropriately increasing L can improve prediction accuracy.

V. CONCLUSION
To address the inability of the traditional NB algorithm to predict continuous variables with many categories, this paper proposes an enhanced NB algorithm and compares it with other models. The results show that, compared with the traditional RBFNN algorithm, the proposed algorithm 1) greatly improves prediction accuracy and 2) can predict continuous attributes. In addition, the proposed prediction model provides an important reference for dissolved oxygen forecasting in shellfish pastures. The following two factors account for the improved predictive accuracy of the enhanced NB algorithm:
1) The dissolved oxygen (DO) differential series is predicted as a classification category, which enables the enhanced naive Bayesian algorithm to predict continuous values. The advantage of naive Bayes is its ability to predict future trends from a probabilistic perspective based on historical experience, but it requires a large number of training samples for each prediction category. Differencing the dissolved oxygen values reduces the number of prediction categories enough to meet this requirement. It also overcomes the problem that, because dissolved oxygen takes so many distinct values, using them directly as input samples would leave too few training samples per category; this improves prediction accuracy. At the same time, the distribution of the differenced data tends to be more Gaussian and more regular, which helps the algorithm learn the regularity of the data itself and thus improves prediction accuracy.
2) A novel training sample generation method is used to increase the number of training samples. To increase the number of training samples and improve the accuracy of dissolved oxygen prediction, the sliding window concept from network communication protocols is used to partition the dissolved oxygen differential sequence dataset and generate the features and labels of the training set. The values are predicted as categories, and the dissolved oxygen data are accurately predicted by selecting the label with the maximum posterior probability estimated from all training samples.
In this paper, the enhanced NB model shows good results for predicting dissolved oxygen in shellfish pastures; the generalization ability of the algorithm will be investigated further in future work. Because the strong independence assumption in the NB equation affects the final prediction accuracy, future studies will further investigate how the NB equation is applied in this algorithm to obtain better prediction results.

XUEYING WANG is currently pursuing the master's degree with the School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong. Her current research interests include computer applications, artificial intelligence, data mining, and computational intelligence.