Short-Term Traffic Flow Prediction Using the Modified Elman Recurrent Neural Network Optimized Through a Genetic Algorithm

Traffic flow forecasting is an essential part of intelligent transportation management systems. Precise prediction of traffic flow provides a basis for other tasks, such as forecasting travel time. While traditional methods improve traffic prediction precision in some respects, high precision across different circumstances is still difficult to achieve. This article presents a short-term traffic flow prediction model based on the Modified Elman Recurrent Neural Network optimized through a Genetic Algorithm (GA-MENN) to deal with this practical problem. In GA-MENN, the Elman Recurrent Neural Network algorithm is modified, optimized through the Genetic Algorithm (GA), and supplied with weather conditions, weekday, hour, and the day's classification to forecast vehicle velocity on Tehran streets and highways. The traffic data were collected from the online Google Map API service for 139 routes in 7 districts of Tehran. According to the experimental results, the method improves prediction precision and lowers the prediction error rate. The experimental outcomes verify the superior performance of the proposed traffic condition prediction model over Regression Multi-layer Perceptron, Linear Regression, Logistic Regression, Probabilistic Neural Network, Regression Generalized Feedforward, Time-lag Recurrent Network, Support Vector Machine model, Elman neural network, K-NN model, ARIMA, Kalman filter model, Convolutional Neural Networks (CNNs), SARIMA, and Long Short-Term Memory (LSTM) model. To the best of our knowledge, this is the first time that traffic flow on urban roads and highways has been predicted in this specific way.


I. INTRODUCTION
With the growth of city-dwelling populations, cities need increasingly sophisticated transportation systems to address the transport needs of their citizens [1], [2]. In many cities, the failure to address this issue has led to increased use of private motor vehicles, which has exacerbated traffic congestion [3]. Given the consequences of traffic congestion, such as air pollution, prolonged travel time, and dissatisfaction of citizens, the prediction of traffic flows on city streets has become a popular subject of research.
The associate editor coordinating the review of this manuscript and approving it for publication was Liang Hu.
Traffic flow forecasting methods can be categorized based on whether they are model-based or data-driven, direct or indirect, and parametric or nonparametric. In the data-driven approach, predictions are based on historical data, or more specifically on a comparison between current traffic conditions and the most similar historical instances in terms of average speed, traffic volume, the hour of the day, etc. Models and algorithms that follow this approach include ARIMA [4], [5], linear models, neural networks, the support vector regression model [6], and non-linear time-series models.
The model-based approach makes predictions from demand estimates and capacity estimates for road segments. Given its reliance on estimates, this approach is more complex than the others and requires more careful supervision. Notable examples of this approach include DynaMIT-R, SBOTTP, TOPL (CTM) and OLSM [7].
In the indirect approach, traffic flow predictions are based on the differences in inputs (traffic flow, velocity, volume, etc.), whereas in the direct approach, traffic flow is predicted from past data. The parametric approach involves formulating a mathematical model for the traffic flows. The methods following this approach include Markov chain algorithms, Kalman filtering [8], the autoregressive integrated moving average (ARIMA) model [9], the exponential smoothing algorithm [10], chaos theory, the wavelet algorithm [11], and Bayesian networks. The nonparametric approach involves analyzing the historical data to extract rules for the relationships within them and then predicting future traffic conditions accordingly. This approach is used in many methods, including artificial neural networks (ANNs) [12], support vector regression (SVR) [13], k-nearest neighbor (k-NN) [14], and reinforcement learning [15]. Reference [16] optimized and predicted traffic flow congestion, using a Markov Decision Process and Q-Learning to learn policies. Reference [17] proposed a computational intelligence model.
Over the years, short-term flow prediction has been addressed by many studies with many interesting results, but no reliable study predicts traffic flow using weather conditions, weekday, hour, and the day's classification at the same time, all of which affect traffic. This is the first attempt to produce a model for traffic flow prediction based on traffic information, spatial-temporal information, weather data, the day type in terms of being a working/weekend day, and the day type in terms of holiday status, all together. Considering the substantial impact of weather conditions and day type (working days/weekends, holidays) on traffic flows, incorporating these variables into the prediction process can yield far more accurate short-term traffic flow projections. Therefore, this article introduces a GA-MENN model for traffic flow prediction that considers these factors together. The model uses the modified Elman recurrent neural network with delay effects incorporated into the formulation to achieve a higher degree of realism. The genetic algorithm (GA) is used for process optimization. Thanks to GA, the model not only avoids falling into local minima but also finds solutions quickly. In addition, GA-MENN is a dynamic NN with a dynamic mapping capability: it stores internal state history, which lets it adapt to time-varying characteristics. The model uses road data, weather information, working/non-working status (holiday status), day of the week, and hour of the day for a more accurate short-term traffic flow prediction. To account for holidays in the traffic prediction, days are classified into four types of working and non-working days. Traffic data of the roads are obtained from the Google Map API service, which is chosen because of its high accuracy and broad coverage.
The model performance is assessed on the basis of the modified mean absolute percent error (MMAPE) [75], Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean-Squared Error (MSE), and is compared with regression multi-layer perceptron, linear regression, logistic regression, probabilistic neural network, generalized feedforward regression, time-lag recurrent network, support vector machine models, Elman neural network, K-NN model, ARIMA, Kalman filter model, Convolutional Neural Networks (CNNs), SARIMA, and Long Short-Term Memory (LSTM) model.
The rest of this document is organized as follows. Section 2 examines the history of research on short-term traffic flow prediction. Section 3 describes the data collection method and the developed model. In Section 4, the model is applied to the existing data and the results are compared with those of the other models. Finally, conclusions and suggestions for future work are provided in Section 5.

II. LITERATURE REVIEW
The history of research on short-term traffic flow prediction dates back to the 1980s. The most important part of predicting traffic flow is constructing the prediction model, which can be done in many ways. In [18], a variable-order Markov (VOM) model and a probability suffix tree were used to develop an association-rule-based method with enhanced prediction performance. The authors calculated the overall transport transition of a region instead of the velocity of a specific region, and used the AR algorithm to extract road relations. Reference [19] proposed a novel model called the pattern sensitive network, which uses adversarial training to accurately predict traffic flow in typical and atypical conditions. In another work, [20] introduced a method called online learning weighted support-vector regression (OLWSVR) for short-term traffic prediction. Neural network models have also long been used extensively in traffic flow prediction, as they allow researchers to use nonlinear equations and offer universal approximation of unknown functions. Compared with traditional statistical forecasting models, neural networks possess some inherent properties that make them more suitable for a variety of modeling applications ranging from multivariate modeling to flow modeling. First, they are self-adaptive, data-driven methods that can identify relationships with minimal a priori information [21]. Second, given their capacity to learn from data (even when the underlying relationships between variables are not known), their nonlinear nature, and their capacity for generalization (after training on sample data, they can make predictions for the part of the data not used in training), these models are well suited to working with noisy databases, which are not uncommon in real-time time-series modeling for traffic prediction.
VOLUME 8, 2020
The above characteristics show how artificial neural networks can, in theory, model temporal as well as spatial traffic patterns. The literature in this area contains many neural network models, with structures ranging from fully static to highly dynamic. A large number of these models have been used in traffic congestion modeling. For example, [22] used a combination of neural networks and Bayesian inference for traffic prediction, coupling back propagation and radial basis function networks into a Bayesian mixed neural network model. In another work, [23] introduced a technique that includes a fuzzy neural network structure for traffic prediction, based on data collected by three microwave traffic sensors in Beijing. In addition, they used a weighted recursive least squares estimator to optimize the parameters.
Recently, a number of improved algorithms such as Long Short-Term Memory (LSTM) neural networks [24], [25], the state-space neural network model [26], ensemble learning algorithms [27]-[30], and hybrid algorithms [31], [32] have shown superior effectiveness and computational performance over traditional methods. Reference [33] proposed an LSTM method with feature enhancement to predict traffic flow. The algorithm compensates for LSTM deficiencies at excessively long distances. Reference [38] suggests an attention-based recurrent neural network built on long short-term memory (LSTM). The authors assess the forecasting architecture on real-time data from the Gary-Chicago-Milwaukee (GCM) transport corridor in Chicagoland.
In another work, a novel fuzzy-based convolutional neural network (F-CNN) approach was proposed for more accurate traffic flow prediction; it applies a fuzzy approach to represent the features of traffic accidents and is the first to introduce uncertain road accident information into the CNN [36].
Besides the NN models mentioned above, other supervised learning methods have also been used in different studies. Reference [34] combined swarm optimization with a genetic algorithm to optimize a least squares support vector machine and forecast traffic flow. The model was compared with non-heuristic and heuristic algorithms and proved its ability to predict traffic flow. Reference [35] proposed a prediction method that combines a support vector machine model with five different denoising algorithms to improve traffic flow forecast precision. A method based on date patterns and likelihood ratio trials for urban short-term traffic speed forecasting from large-scale taxi GPS data is proposed in [37].
Reference [39] extends the applicability of the KNN technique to short-term traffic volume projections and offers a viable strategy to asymmetrical loss projections. Reference [40] implemented an enhanced K-nearest neighbors (K-NN) algorithm to propose a data-driven and non-parametric methodology in order to predict traffic flow congestion by identification of comparable patterns of traffic. While these mentioned studies have proposed different types of methods that have their own advantages and drawbacks to predict short-term traffic flow, they have not evaluated the Elman type of the recurrent neural network, which is a powerful network for dynamic and real-time applications [76].
Several researchers have worked to replace traditional regressions and time-series models [41], [42] with neural networks. All these works have pointed out the potential superiority of neural networks over the aforementioned approaches and recommended further research using larger and more realistic databases in different fields. In 1990, Elman proposed a neural network model for speech processing, which today is known as the Elman Recurrent Neural Network (ERNN) [43]. The outputs of the hidden layer in ERNN are passed on to a buffer layer (the recurrent layer) and then returned as feedback to the hidden layer. This feedback enables ERNN to track and learn spatial and temporal patterns. Each hidden neuron is linked to one recurrent neuron with a fixed weight of 1, and the number of recurrent neurons equals the number of hidden neurons. Hence, the recurrent layer is basically a copy of the hidden layer from the previous time step [44], [45]. Although [44], [45] proposed an ERNN-based model for short-term traffic prediction, they did not address the delays of traffic management systems. Given the presence of delays in real-world systems, this study uses a modified version of ERNN, which is described in later sections.
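The feedback mechanism of the classical ERNN described above can be illustrated with a minimal sketch. The function name `elman_step`, the tanh hidden activation, and the linear output layer are assumptions made for illustration; the source does not specify activations, and this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def elman_step(u: Vector, context: Vector, w_in: Matrix, w_ctx: Matrix,
               w_out: Matrix, b_h: Vector, b_o: Vector) -> Tuple[Vector, Vector]:
    """One time step of a classical Elman network (illustrative sketch).

    u        current input vector
    context  copy of the hidden outputs from the previous step
    w_in     hidden-by-input weights, w_ctx hidden-by-hidden context weights
    w_out    output-by-hidden weights, b_h / b_o hidden and output biases
    """
    hidden = []
    for i in range(len(b_h)):
        s = b_h[i]
        s += sum(w * x for w, x in zip(w_in[i], u))        # input contribution
        s += sum(w * c for w, c in zip(w_ctx[i], context))  # feedback contribution
        hidden.append(math.tanh(s))                         # assumed activation
    output = [b + sum(w * h for w, h in zip(row, hidden))
              for row, b in zip(w_out, b_o)]
    # The recurrent (context) layer is a plain copy of the hidden layer,
    # which corresponds to the fixed feedback weight of 1.
    return output, list(hidden)
```

Calling `elman_step` repeatedly and passing the returned context back in makes each hidden state depend on the previous one, which is how the network carries temporal information from step to step.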
The search for the optimal network structure often involves a process of trial and error. Research on network structure optimization has suggested that genetic algorithms can serve as a highly effective tool for finding near-optimal network structures [46], [47]. The most important concerns in the learning optimization process are the speed of convergence and convergence to local minima. The genetic algorithm, as an archetype of evolutionary computing, has shown superior performance in solving complex, non-linear, and parallel problems without a priori information [48].
In recent years, many researchers have used genetic algorithms to optimize neural network structures in different applications, including real-time applications: water temperature prediction [77], parking space prediction [78], cognitive maps [79], spatial accuracy [80], and short-term traffic prediction [45], [50], [81]. Reference [49] used the genetic algorithm in specific neural network structures and showed its superiority over backpropagation. Reference [50] developed a new method for short-term traffic prediction using an advanced time delay neural network and then used a genetic algorithm to optimize it. Zhan Lijun combined the genetic algorithm and the Elman recurrent neural network to predict stock prices [51]. In another work, Wang Tianee used genetic algorithms to optimize the weights and structure of the Elman recurrent neural network and to obtain good predictions of Dongfeng Motor stock prices [52]. Zhang Xiuling employed a genetic algorithm to optimize the initial weights and thresholds of the Elman recurrent neural network and used the developed model to predict the capacity of MH-Ni batteries [53]. In another work, the genetic algorithm was combined with a multilayer perceptron for short-term traffic prediction [54]. The research of [50], [54] focused on optimizing neural networks to predict short-term traffic flow using spatial and temporal information, but did not consider other factors affecting traffic, e.g. weather, weekday, and special events. The combination of genetic and neural network algorithms has also been evaluated in other research fields. Kim et al. combined the genetic algorithm with the back-propagation network (BPN) model for use in cost prediction [55].
Considering the different conditions that influence short-term traffic forecasting, such as work days and weekends [23], [36], weather [36], and special days (holidays) [13], [20], [39], [57], is crucial for accurate prediction of short-term traffic flow. Reference [31] proposed a hybrid algorithm using support vector regression, random forest, and genetic algorithms, while [33] proposed an LSTM method with feature enhancement to compensate for LSTM deficiencies at excessively long distances, but both used only weekday data and removed weekend data. The F-CNN approach of [36], which introduces uncertain road accident information into a CNN through a fuzzy approach, also considered external factors including weather, wind speed, temperature, and weekday/weekend information, but ignored the holiday condition. Although holiday conditions were addressed in some studies, they were not used as a model input. Reference [39] extends the applicability of the KNN technique to short-term traffic volume projections and offers a viable strategy for asymmetrical loss projections, evaluating the proposed technique on holidays and normal days. Reference [57] applied convolutional neural network (CNN) deep learning to predict short-term traffic flow and proposed weather and holiday periods as future work. Other studies are listed in Table 1.
In conclusion, several research studies have predicted short-term traffic flow using different factors and methods. Although the literature has evaluated different NN models while considering some of the factors affecting traffic, this research considers several factors together, the GA-MENN model, and Google Maps data to predict traffic velocity. Most studies considered temporal [61], [62] and spatial-temporal [18], [56]-[58], [60], [63] features of vehicle velocity in short-term traffic prediction, but did not consider atypical conditions including weather conditions, weekdays, hour, and holidays. Some studies addressed these factors; however, they have not been considered all together to predict vehicle velocity. Among the four factors, the day's classification is defined in this research with regard to the impact of weekends and holidays on traffic. These parameters, along with the vehicle velocities, are modeled using the ERNN model modified with delay effects and the GA optimization algorithm to improve the accuracy and degree of realism of short-term traffic flow prediction; based on the previous research, this combination has not been applied in this field. Finally, traffic data from the Google Map API are used to train and test the model, whereas previous research has mostly used GPS, sensor, or simulated traffic data. Google Maps data is a good source of traffic information, especially for countries that do not have suitable infrastructure to collect traffic data.

A. DATA DESCRIPTION
Accurate prediction requires reliable traffic information about the area of interest, which was not always easily available. Nowadays, this information can be collected in several ways, for example by sensors and using online data transfer services. In recent years, a growing number of researchers have started to use Google Maps to collect traffic information. For example, [64] introduced a system based on Google Maps and Charts Application Programming Interfaces (API) to help traffic engineers identify traffic nodes without data mining. In another work, the Distance Matrix API of Google Maps was used to predict the urban traffic in Austin [65].
In this work, traffic data were collected from the Google Map API. This API provides various services in the form of map diagrams and route information (between 2 points). In this study, the Direction API was used to collect route information, including travel time and route length. Using this API, fairly accurate information about the routes on different days of the week and at different hours of the day was collected.
In total, data for 139 routes in 7 districts of Tehran were collected. The routes were chosen to include 46 highways and 61 roads that had the most traffic throughout the day. On these routes, such as Hemmat highway, Imam Ali highway, and Bagheri expressway, the average speed during the day is 30-50 km/h, which is below the allowed speed range of 70-120 km/h. To achieve comprehensive results, 32 low-traffic routes were also included, among them Golbarg street, Janbazan street, and Farjam street. According to the collected data, the average speed on these routes is 20-40 km/h, which is below the maximum allowed speed on streets, 50 km/h. Figure 1 shows the traffic network and the selected routes. A web-based service written in C# was used to automatically collect data every 30 minutes between 06:00 and 23:00 for 11 days and store them in a SQL Server database. In the end, 56,741 records were obtained.
In addition to route information, traffic information, time, the weather data and the day's classification were also added to the model. The half-hourly weather data collected from an online source (www.timeanddate.com) in four categories of Rainy, Clear, Cloudy, and Foggy were added to their corresponding records. Given the impact of weekends and holidays on the traffic in the preceding and following days, all days were divided into four classes (for easier comprehension, all weekend days and holidays, regardless of their nature, are simply referred to as non-working days). Figure 2 demonstrates the classes more clearly.
I: a working day that is followed by another working day;
II: a working day that is followed by a non-working day;
III: a non-working day that is followed by a working day;
IV: a non-working day that is followed by another non-working day.
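The four day classes can be expressed as a small lookup on the working status of a day and of the day that follows it. The sketch below is illustrative only; the function names `day_class` and `classify_days` and the boolean input convention are assumptions, not part of the original system.

```python
from typing import List


def day_class(today_is_working: bool, next_day_is_working: bool) -> str:
    """Return the class (I to IV) of a day from its own and the next day's status."""
    if today_is_working:
        return "I" if next_day_is_working else "II"
    return "III" if next_day_is_working else "IV"


def classify_days(working_flags: List[bool]) -> List[str]:
    """Classify every day that has a known following day.

    working_flags[i] is True for a working day and False for a weekend
    day or holiday (both are treated as non-working days).
    """
    return [day_class(today, nxt)
            for today, nxt in zip(working_flags, working_flags[1:])]
```

For example, `classify_days([True, True, False, False, True])` yields `["I", "II", "IV", "III"]`; the last day is left unclassified because its following day is unknown.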
The day's class in terms of working/non-working status was also added to the records. For the sake of simplicity, the records were adjusted to hold speed instead of time wherever possible. The speed on each route was computed with the formula v = x/t, where v is the velocity, x is the route distance, and t is the travel time. Figure 3 displays the average speed in different weather conditions. The highest traffic volume is related to rainy weather and the lowest is related to foggy weather. Figure 4 shows the average speed on different days of the week. As can be seen, the highest traffic volume (lowest speed = 31.45 km/h) was recorded on Tuesdays (traffic versus time of the day is plotted for 20 random routes). Figure 5 illustrates the average speed versus the day's working/non-working status. As this figure shows, days of class II had the most traffic and days of class III had the least. Figure 6 shows the average speed at different hours of the day. The hours between 17:00 and 20:30 had the highest amount of traffic and the hours between 06:00 and 08:00 had the lowest [45]. Figure 7 shows the average speed on different routes. As depicted in the figure, 8% of the routes have the highest traffic flow, 39% have average traffic flow, and 53% have the lowest traffic congestion. Figure 8 shows scatterplots of sample data to represent patterns in the data. Figure 8a plots sample velocities versus routes and weekdays. It shows that the velocity varied from around 15 to 80 km/h on different days of the week. Figure 8b displays the patterns of velocities for different routes and weather conditions. As the weather is seldom foggy in Tehran, the amount of sample data for foggy conditions is very small compared with the other weather conditions. Figure 8c shows the velocity patterns versus routes and hours of the day.
It shows that between 6:00 and 7:00 no velocity below 20 km/h was recorded, except on some roads such as those with IDs 38, 68, 80, 101, and 138. Figure 9 shows scatterplots of the velocities predicted by GA-MENN. Comparing the scatterplots of the predicted velocities with those of the sample data (Fig. 8) reveals that the patterns of the predicted velocities and the input velocities are quite similar, as expected. Therefore, GA-MENN is a suitable model for predicting traffic based on velocity changes on different routes, in different weather conditions, and on different days of the week.
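The conversion from travel time to speed described above (v = x/t) can be sketched as follows. The units are an assumption: the source does not state them, so the example takes the route length in metres and the travel time in seconds and returns km/h. The function name `speed_kmh` is hypothetical.

```python
def speed_kmh(distance_m: float, duration_s: float) -> float:
    """Average speed v = x / t, converted from m/s to km/h.

    distance_m  route length in metres (assumed unit)
    duration_s  travel time in seconds (assumed unit)
    """
    if duration_s <= 0:
        raise ValueError("travel time must be positive")
    return distance_m * 3.6 / duration_s
```

Under these assumptions, a 10 km route traversed in 20 minutes, `speed_kmh(10_000, 1_200)`, corresponds to an average speed of about 30 km/h, which falls in the range reported for the congested highways.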

B. GA-MENN METHOD
This article introduces an algorithm called GA-MENN, which is developed by combining the Elman recurrent neural network with the genetic algorithm. The work addresses the fact that in many dynamic systems the effects of influential phenomena may appear with a delay, and this delay can have a significant impact on the accuracy of the analysis and results. The best way to deal with this issue is to consider the delays in the Elman recurrent neural network [66]. In this work, a modified Elman recurrent neural network model is used in combination with the genetic algorithm for short-term traffic flow prediction. GA is used to optimize the parameters of MERNN. GA improves the generalization ability of the NN and, in turn, the prediction accuracy [82], [83]. Although GA does not guarantee finding a globally optimal solution [84], it compensates for the limitations of the static property of NN models and steers the training process toward a near-optimal solution [82]. In addition, GA makes it possible to model complex nonlinear functions using NN models [85].
1. Feature selection. The original data are assumed to contain multiple features. Principal component analysis is used to retain the components whose cumulative contribution rate exceeds 90%, giving the dataset $S = \{(x_k, y_k) \mid k = 1, 2, \ldots, n\}$, in which $x_k \in \mathbb{R}^r$, $y_k \in \mathbb{R}^s$, and $n$ is the number of data points.
3. The modified ERNN model is used to predict travel time. The structure of this modified Elman recurrent neural network is illustrated in Figure 11. In the modified ERNN model, the input signal $u(k-\tau)$ explicitly incorporates the pure time delay $\tau$ (whereas in the original structure, the neural model itself has to approximate the delay of the dynamic system). The network weights are the same as in the classical ERNN structure. The output of the modified ERNN model is defined as in [66], where $x_i(k)$ is the output signal of hidden node $i$ ($i = 1, \ldots, K$), $x_0(k) = 1$, and $b_h$ and $b_o$ are the biases of the hidden units and output units. The weighted sum at each hidden node $i$ is defined in the same way, and combining these expressions yields the final equation of the modified ERNN model.
4. The genetic algorithm is then used to optimize the modified ERNN model through the following process.
1. Population initialization: To optimize ERNN using GA, the weights and thresholds are encoded in the chromosomes. A population of $N$ chromosomes is generated randomly. Every gene of a chromosome corresponds to an initial weight or threshold of ERNN. The length of the chromosome under floating-point coding is

$$K = P \cdot m + P + P \cdot n + n + P \cdot P \qquad (5)$$

2. Fitness function: Each chromosome in the population is evaluated for its fitness. In the fitness formula, $N$ is the number of chromosomes in the population; $y^t_{j,i}$ and $y_{j,i}$ are the predicted and observed speed on the current route; and $T$ is the number of output neurons, which is set to 1 because speed is the only variable being predicted. At this point, the minimum fitness value is determined. If it is lower than the global minimum fitness value, the global minimum is updated.
3. Selection, crossover, and mutation are applied to the parent population to create the next-generation population.
a) Selection operation: After the fitness of each chromosome has been calculated, the new population is selected from the chromosomes. The chance of selecting a chromosome depends on its fitness. Given the formulation used in this study, chromosomes with low fitness values are chosen and those with high fitness values are discarded. Roulette wheel selection is applied, where $F_j$ is the fitness value and $p_i$ is the chance of chromosome $i$ being chosen.
b) Crossover operation: Because real-number coding is used, the real-number crossover method is applied. In the crossover of chromosome $c_k$ and chromosome $c_l$ at position $j$, $c_{kj}$ and $c_{lj}$ are the $j$th genes of the $k$th and $l$th chromosomes, each representing an ERNN weight or threshold, and $r$ is a random number between 0 and 1.
c) Mutation operation: The role of the mutation operation is to diversify the population. In the mutation of the $j$th gene of the $i$th chromosome ($m_{ij}$), $m_{\min}$ and $m_{\max}$ are the minimum and maximum values of the weight or threshold in that gene, and $r$ is a random number between 0 and 1. The factor $f(g)$ depends on a random number $r_1$, the current iteration number $g$, and the maximum number of evolutionary generations $G$.
5. When the number of evolutionary generations reaches its maximum, the process stops and the latest weight and threshold values are extracted. Otherwise, steps (b) and (c) are repeated.
6. The obtained weights and thresholds are applied to ERNN; after training, the network can reach the desired accuracy or stopping condition.
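The GA steps above can be sketched in code. The exact fitness, selection, crossover, and mutation equations are not reproduced in this text, so the forms below are assumptions: a sum of absolute errors as fitness, roulette wheel selection with probability proportional to inverse fitness, arithmetic crossover of a single gene, and a non-uniform mutation whose step size shrinks as the generation counter g approaches G. These are commonly used formulations that are consistent with the symbols defined above, not necessarily the authors' exact equations. The reading of m, P, and n in Eq. (5) as the numbers of input, hidden, and output neurons is also an assumption.

```python
import random
from typing import List, Sequence, Tuple


def chromosome_length(m: int, P: int, n: int) -> int:
    """Eq. (5): K = P*m + P + P*n + n + P*P.

    Assumed meaning: m inputs, P hidden (and context) neurons, n outputs.
    """
    return P * m + P + P * n + n + P * P


def fitness(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Assumed fitness: sum of absolute prediction errors (lower is better)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed))


def roulette_select(fitnesses: Sequence[float], rng: random.Random,
                    eps: float = 1e-12) -> int:
    """Pick an index with probability proportional to 1 / fitness."""
    inverse = [1.0 / (f + eps) for f in fitnesses]
    pick = rng.random() * sum(inverse)
    acc = 0.0
    for i, w in enumerate(inverse):
        acc += w
        if pick <= acc:
            return i
    return len(inverse) - 1


def crossover(c_k: Sequence[float], c_l: Sequence[float], j: int,
              r: float) -> Tuple[List[float], List[float]]:
    """Arithmetic crossover of gene j with mixing coefficient r in [0, 1]."""
    a, b = list(c_k), list(c_l)
    a[j] = c_k[j] * (1.0 - r) + c_l[j] * r
    b[j] = c_l[j] * (1.0 - r) + c_k[j] * r
    return a, b


def mutate_gene(value: float, m_min: float, m_max: float, g: int, G: int,
                rng: random.Random) -> float:
    """Non-uniform mutation; the step factor f(g) vanishes when g reaches G."""
    f_g = rng.random() * (1.0 - g / G) ** 2
    if rng.random() > 0.5:
        return value + (m_max - value) * f_g   # move toward the upper bound
    return value - (value - m_min) * f_g       # move toward the lower bound
```

With this sign convention a mutated gene always stays inside [m_min, m_max], and low-fitness (low-error) chromosomes are favoured during selection, matching the description that chromosomes with low fitness values are the ones retained.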

A. ANALYSIS OF THE PREDICTION RESULTS
The data obtained from the Google Map API were used to measure the performance of the proposed model, both alone and in comparison with several other prediction models. Table 2 shows a sample of the data used for training, evaluating, and testing the prediction models. In the table, the integers from 1 to 7 represent the weekdays from Saturday to Friday. For weather status, the numbers 1, 2, 3, and 4 refer to rainy, clear, cloudy, and foggy weather, respectively. The time is expressed in hours. The holiday status is represented by 0 (class I), 1 (class II), 2 (class III), and 3 (class IV).
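The coding scheme described above can be sketched as a small encoder. The function names, the field order of the returned list, and the representation of time as a decimal hour (for example 17.5 for 17:30) are assumptions made for illustration; Table 2 may arrange the columns differently.

```python
from datetime import date
from typing import List

WEATHER_CODES = {"rainy": 1, "clear": 2, "cloudy": 3, "foggy": 4}
DAY_CLASS_CODES = {"I": 0, "II": 1, "III": 2, "IV": 3}


def weekday_code(d: date) -> int:
    """Map a date to 1..7 with Saturday = 1 and Friday = 7.

    date.weekday() returns Monday = 0 ... Sunday = 6, so the value is
    shifted by two days and wrapped.
    """
    return (d.weekday() + 2) % 7 + 1


def encode_record(d: date, hour: float, weather: str, day_class: str) -> List[float]:
    """Return [weekday, hour, weather, holiday status] as numeric codes."""
    return [weekday_code(d), hour,
            WEATHER_CODES[weather.lower()], DAY_CLASS_CODES[day_class]]
```

For instance, a rainy Saturday at 17:30 that is a working day followed by a non-working day would be encoded as `encode_record(date(2020, 1, 4), 17.5, "Rainy", "II")`.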
In this evaluation, 60% of the records were used for training, 15% of them were used for validation, and 25% were used for testing. Figure 12 displays the proposed model outputs in comparison with the actual records for the training set. Figure 13 makes a similar comparison for the records of the validation set. In Figure 14, the outputs of the model are compared with the observed values in the testing set.
As is apparent, the proposed model has a high degree of accuracy, as its predictions are very close to real values. This will be discussed in detail in the next section. In addition, the optimized weights and thresholds allow the model to overcome the problem of falling into local minima and enhance its ability to trace optimal solutions, which significantly improves the reliability of predictions.
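The 60/15/25 partition of the records into training, validation, and testing sets can be sketched as below. Whether the original split was random or chronological is not stated, so the seeded shuffle here is an assumption, as are the function name `split_records` and its parameters.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], train_pct: int = 60, val_pct: int = 15,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Split records into training, validation, and testing subsets.

    The remaining percentage (25 by default) goes to the testing set.
    A seeded shuffle is used so the split is reproducible (assumption:
    the source does not say how the records were assigned).
    """
    items = list(records)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

If the temporal order of the records matters for a recurrent model, the shuffle could be replaced by a chronological cut; that choice is left open here.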

B. ERROR ANALYSIS
The metrics used in the evaluation of the proposed model are the modified mean absolute percent error (MMAPE) [75], Root Mean Square Error (RMSE) [68], and Mean Absolute Error (MAE) [68]. MMAPE follows the definition in [75]; RMSE and MAE are computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - M_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|E_i - M_i\right|,$$

where $E_i$ denotes the predicted values, $M_i$ denotes the observed values, $\bar{E}$ and $\bar{M}$ are the means of $E_i$ and $M_i$, and $n$ is the total number of records. RMSE is used to evaluate the performance of the traffic flow prediction models: the smaller the RMSE, the more accurate the forecast [27]. MMAPE represents the error between expected values and real values and is typically appropriate for the measurement of non-linear large-scale data sets [31]. The MAE measures the resemblance between the anticipated vehicle velocity and the real vehicle velocity [40]. For a better evaluation of the model, its performance was compared against regression multi-layer perceptron, linear regression, logistic regression, probabilistic neural network, regression generalized feedforward, time-lag recurrent network, support vector machine model, Elman neural network, K-NN model, ARIMA, SARIMA, Kalman filter model, Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) model. KNN is one of the simplest pattern recognition techniques. Its prediction stage is slow because the distance to each training data point needs to be calculated. Reducing the number of features or using several prototypes could improve its speed [69]. The ARIMA model has been widely used as a predictive model and works well for data with a short-range recurring pattern [70]. Kalman filtering is used widely in traffic monitoring systems. This technique does not ensure complete precision, but it is the most used technique for estimating traffic systems [71]. LSTM neural networks learn time series over long periods and determine the optimal time lag for prediction automatically [72].
The CNNs have demonstrated their ability to extract spatial characteristics in a local or urban region [73]. The results of this comparison are presented in Table 3.
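As an illustration of two of the metrics named in this section, the following minimal sketch computes RMSE and MAE under their standard definitions, using E_i for predicted and M_i for observed values. The function names and the sample velocities are hypothetical and are not taken from the paper; MMAPE is omitted because its definition in [75] is not reproduced here.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predicted (E_i) and observed (M_i) values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - m) ** 2 for e, m in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(predicted, observed):
    """Mean absolute error between predicted (E_i) and observed (M_i) values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - m) for e, m in zip(predicted, observed)) / len(predicted)


# Hypothetical vehicle velocities (km/h), for illustration only.
E = [42.0, 38.5, 51.0, 47.5]  # predicted
M = [40.0, 39.5, 50.0, 45.5]  # observed

print("RMSE:", rmse(E, M))
print("MAE:", mae(E, M))
```

Because RMSE squares each error before averaging, it penalizes large individual deviations more strongly than MAE, which is one reason the two are usually reported together.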
In addition, the model was also evaluated against time series models, including Seasonal ARIMA (SARIMA). SARIMA is a time-series prediction model for periodic time series and seasonally differenced data, particularly traffic speed time series [74]. Because the SARIMA formulation takes a single time-dependent input, this model was implemented using only the time and velocity parameters.
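The seasonal differencing that SARIMA relies on can be sketched as follows. This shows only the differencing step, not a full SARIMA fit, and the function name, the period, and the sample speeds are hypothetical values chosen for illustration.

```python
def seasonal_difference(series, period):
    """Return y_t - y_(t-period) for every t with a full season of history."""
    if period <= 0 or period >= len(series):
        raise ValueError("period must be positive and shorter than the series")
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical speeds (km/h) with a repeating pattern of length 3.
speeds = [30, 50, 40, 32, 52, 42]
diffed = seasonal_difference(speeds, period=3)
print(diffed)
```

Subtracting the value from one season earlier removes the repeating pattern, so the remaining series reflects only the change from one cycle to the next.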
As shown in Table 3, the GA-MENN model has the best performance among the compared models. The second best performance belongs to the generalized feedforward regression model, with an RMSE 2.04939 units higher and an MAE 2.32369 units higher than the best model. The errors of the CNN and LSTM models for training, validation, and testing are also higher than those of GA-MENN. Based on the results, the genetic algorithm improved the prediction model performance by 20%, reduced the error between the expected velocity and the real velocity by 13%, and enhanced the similarity between the anticipated vehicle velocity and the real vehicle velocity by 28%. By avoiding local minima and exploiting its parallel search ability, the genetic algorithm helps the Elman neural network achieve stability and high accuracy. Given the extremely low MAE of the GA-MENN model with respect to the experimental data, it can be claimed that this model has a strong generalization capability. The experimental results also confirm the ability of the proposed model to learn spatial-temporal features for traffic flow prediction. All other models also performed better than Logistic Regression, which is a simple linear model that cannot properly reflect the nonlinear features of traffic flows.
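The idea of a population-based search over weights that avoids local minima can be sketched with a generic genetic algorithm. This is not the authors' GA-MENN implementation: the Rastrigin function stands in for the network error surface, and the function names, population size, mutation settings, and seed are all assumptions made for illustration.

```python
import math
import random


def error(weights):
    """Rastrigin function: many local minima, global minimum 0 at the origin.
    Used here as a stand-in for a network's prediction error."""
    return sum(w * w - 10.0 * math.cos(2.0 * math.pi * w) + 10.0 for w in weights)


def evolve(n_weights=4, pop_size=40, generations=150, mutation_rate=0.2, seed=0):
    """Minimize error() with selection, one-point crossover, and mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    initial_best = min(error(ind) for ind in population)

    for _ in range(generations):
        population.sort(key=error)
        parents = population[: pop_size // 4]          # selection: best quarter
        children = [list(ind) for ind in parents[:2]]  # elitism: keep top two
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [w + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate else w
                     for w in child]                   # Gaussian mutation
            children.append(child)
        population = children

    best = min(population, key=error)
    return initial_best, best, error(best)


initial_best, best_weights, final_error = evolve()
print("best error in initial population:", initial_best)
print("best error after evolution:", final_error)
```

Because many candidate weight vectors are evaluated in parallel and recombined, a single poor starting point does not trap the search the way it can with gradient descent from one initialization. In a GA-optimized network, the candidate vectors would encode the initial weights and thresholds, and `error` would be the prediction error of the network.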
According to [59], RNNs and deep learning methods such as LSTM are two types of networks that perform well in time series prediction because of their ability to memorize previous data. Previous studies that proposed deep learning methods ([57], [59], [60]) have confirmed the performance of deep learning networks in predicting short-term traffic flow using spatial-information features of the vehicle velocity. The results of this research confirm the good performance of the RNN-based GA-MENN model with various input variables (including spatial-temporal information, weather, working/weekend days, and holiday status), especially when the volume of available data is not large, as shown by its better training, validation, and testing precision compared to the CNN and LSTM methods.
For a closer examination, the mean square error (MSE) was calculated over different epochs for the training and validation datasets. The results are plotted in Figure 13.
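For reference, the standard definition of the MSE, written with the symbols of the error analysis (E_i for predicted values, M_i for observed values, and n for the number of records), is given below. It is assumed, not confirmed, that this matches the formula used by the authors.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( E_i - M_i \right)^{2}
```

Under this definition, the RMSE is simply the square root of the MSE, so both quantities rank models in the same order.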
In Figure 15, it can be seen that from epoch 30 onwards, the MSE decreases at a slower pace and approaches its final value. Overall, it can be claimed that the proposed model is among the best models for short-term traffic prediction and exhibits excellent performance under variable conditions.

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed the GA-MENN model to enhance the accuracy of short-term traffic flow prediction with consideration of climate conditions, day's status, weekday, and hour. Although the paper centers on the forecasting of traffic flow, our models also incorporate data on actual weather features, weekday, and day's classes. We compared the GA-MENN model with classical machine learning techniques, both parametric and non-parametric, including Regression Multi-layer Perceptron, Linear Regression, Logistic Regression, Probabilistic Neural Network, Regression Generalized Feedforward, Time-lag Recurrent Network, Support Vector Machine model, Elman neural network, K-NN model, ARIMA, Kalman filter model, Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) model. Our model reaches the lowest error rate based on three evaluation criteria: MMAPE, MAE, and RMSE. The prediction precision improved by up to 12.3% in comparison to the best model among them (Kalman Filter) with the same hyperparameters. In addition, the lowest RMSE among all models is achieved. Accordingly, we believe that our paper could be valuable for the modelling, analysis, and mining of large-scale traffic data and for short-term traffic prediction.
A comparison of the results showed that the GA-MENN algorithm improved the prediction precision over the ENN algorithm by about 16.7% on average. This result indicates the merit of the genetic-based modified algorithm for improving the ENN model in traffic flow prediction.
Future work should address the need for more advanced architectures for the prediction of short-term traffic flow. More sophisticated machine learning models can also be used in other sub-branches of transportation research, including the prediction and identification of accidents. Moreover, researchers can take advantage of other deep learning methods such as Deep Reinforcement Learning models. Given the low cost of Google services, more extensive data could be purchased to develop more accurate models. Researchers can also try combinations with other machine learning models in order to reach more sophisticated and more widely applicable models.