Prediction of EV Charging Behavior Using Machine Learning

As a key pillar of smart transportation in smart city applications, electric vehicles (EVs) are becoming increasingly popular for their contribution to reducing greenhouse gas emissions. One of the key challenges, however, is the strain on power grid infrastructure that comes with large-scale EV deployment. One solution lies in using smart scheduling algorithms to manage the growing public charging demand. Using data-driven tools and machine learning algorithms to learn EV charging behavior can improve such scheduling algorithms. Researchers have focused on using historical charging data to predict behaviors such as departure time and energy needs. However, variables such as weather, traffic, and nearby events have largely been neglected, and they may add meaningful representations and provide better predictions. Therefore, in this paper we propose using historical charging data in conjunction with weather, traffic, and events data to predict EV session duration and energy consumption with popular machine learning algorithms, including random forest, SVM, XGBoost and deep neural networks. The best predictive performance is achieved by an ensemble learning model, with SMAPE scores of 9.9% and 11.6% for session duration and energy consumption, respectively, which improves upon existing works in the literature. For both predictions, we demonstrate a significant improvement over previous work on the same dataset, and we highlight the importance of traffic and weather information for charging behavior predictions.


I. INTRODUCTION
Climate change has become a growing concern in recent years, with thirty-three countries jointly declaring a climate emergency as of January 2021 [1]. Global energy consumption is a major contributor to the climate crisis, and the transportation sector in particular accounts for over a quarter of global energy consumption [2]. The United Nations (UN) projects that two thirds of the world's population will reside in urban areas by 2050 [3]. This would increase the demand for urban mobility, leading to further energy consumption and greenhouse gas emissions. Studies have shown that electric vehicles (EVs) have the potential to reduce carbon emissions by 45% compared to conventional internal combustion engine (ICE) vehicles [4]. EVs were initially limited by factors such as reliability and battery range, which have significantly improved in recent years and led to an increase in EV popularity [5]. As a result, trust in EV reliability has grown and satisfaction among EV owners is higher [6]. Driver flexibility has also increased with the addition of charging stations in many parts of the world, often led by government initiatives encouraging further adoption of EVs. These factors have placed EVs in pole position for providing a clean source of transportation.
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Cusano.
Despite this promising potential, a few challenges remain, most notably charging time and public charging needs. Although EV charging time has decreased significantly over the years, it is still on average much longer than the refueling time of ICE vehicles. Emerging charging technologies such as extreme fast charging [7] and wireless charging [8] are promising, but they still face various challenges and will take years to be widely adopted. Constraints on charging infrastructure mean that most EV owners rely on public charging stations, which strain the power distribution grid because of the high power requirements of EVs [9]. To avoid power grid degradation and failures, uncoordinated charging must be avoided. A key solution is better scheduling of charging stations. Research on smart scheduling using data-driven approaches is plentiful and includes optimization [10] and metaheuristic [11] approaches. Furthermore, psychological factors influencing charging behavior [12], as well as transaction data and interviews with EV drivers [13], have been used for charging behavior analysis. A comprehensive review of charging behavior analysis using machine learning and data-driven approaches is presented in [14]. It concludes that machine learning based approaches are more suitable for scheduling because they can provide quantification and a more realistic representation.

A. RELATED WORKS
Although EV charging behavior prediction spans various categories, this work focuses on session duration and energy consumption. Examples of other charging behaviors include whether an EV will be charged the next day [15], identification of fast charging use [16], the time to next plug-in [17], the charge profile [18], charging speed [19], and the charging capacity and daily charging times [20]. These behaviors provide valuable insights, but predictions of session duration and energy consumption are more valuable for scheduling purposes.
As defined in the following sections, session duration is directly related to departure time: it is the departure time minus the arrival time, which is a known variable. Therefore, predicting either the session duration or the departure time serves the same application. Lee et al. [21] introduced a novel dataset for non-residential EV charging consisting of over 30,000 charging sessions. They used Gaussian mixture models (GMMs) to predict session duration and energy needs by considering the distribution of the known arrival times. The test set covered December 2018, and the reported symmetric mean absolute percentage errors (SMAPEs) were 14.4% and 15.9% for session duration and energy consumption, respectively. Only historical charging data was considered in that work. In [22], the authors used support vector machines (SVMs) to predict the arrival and departure times of EV commuters on a university campus. Using historical arrival and departure times and temporal features (week, day, and hour), the reported mean absolute percentage errors (MAPEs) were 2.9% and 3.7% for arrival and departure times, respectively. A simple persistence model served as the reference for comparison, and SVM hyperparameter tuning was not addressed. Frendo et al. [23] predicted the departure time of EVs using regression models. Historical charging data was utilized with eight features, including car ID, car type, weekday, charging point, car park location, parking floor and arrival time. Three regression models were trained, namely linear regression, XGBoost and an artificial neural network (ANN). XGBoost achieved the best results, with a mean absolute error (MAE) of 82 minutes. In [24], ensemble machine learning using SVM, random forest (RF) and a diffusion-based kernel density estimator (DKDE) was used to predict session length and energy consumption.
Historical charging records from two separate datasets were used for training, one from public charging and the other from residential charging. The ensemble model outperformed the individual models in both predictions, with reported SMAPEs of 10.4% for duration and 7.5% for consumption.
Xiong et al. [25] predicted the start time and session duration using mean estimation. The predicted session duration was then used to predict energy consumption via linear regression. The charging behavior predictions were used to flatten the charging load profile and stabilize the power grid; however, the prediction performance was not evaluated quantitatively. In [26], several regression models were used to predict energy requirements from public charging station data for the US state of Nebraska. Besides historical charging data, parameters such as season, weekday, location type and charging fees were used as input features. On the test set, the XGBoost model outperformed linear regression, RF and SVM, obtaining an R² score of 0.52 and an MAE of 4.6 kWh. The authors of [27] used k-nearest neighbors (k-NN) to predict the energy consumption at a charging outlet using data from a university campus. The problem was formulated as time-series forecasting, in which energy consumption for the next day (next 24 hours) was predicted from the energy consumption of previous days. The most accurate result, a SMAPE of 15.3%, was obtained with k = 1 (1-NN) and a time-weighted dot product dissimilarity measure. Similarly, Majidpour et al. [28] predicted the next-day energy needs of a charging station from previous days' energy consumption using various algorithms, including SVM and RF. They also experimented with pattern sequence-based forecasting (PSF) [29], in which days are first clustered and predictions are then made based on the resulting cluster labels. The PSF-based approach provided the most accurate results, with an average SMAPE of 14.1%. Table 1 provides a summary of the related works in the literature.

B. OBJECTIVES
Although the above works have successfully applied machine learning to predict session duration and energy consumption, they have mainly relied on historical charging data. In some cases, additional derived features such as vehicle information, charging location information and seasonal information were used. This motivated us to investigate additional input features, including weather, traffic and local events, and to observe their impact on the accuracy of charging behavior predictions. The key contributions of this work are the following: 1) We propose a novel approach to EV charging behavior prediction that utilizes weather, traffic, and local events data along with historical charging records. 2) We use several machine learning algorithms, including RF, SVM, XGBoost and ANN, to predict session duration and energy consumption on the Adaptive Charging Network (ACN) dataset. 3) We empirically show that the additional data has a positive impact on prediction accuracy and significantly improves upon previous work on the same dataset that used only historical charging information.
The rest of the paper is organized as follows. Background information, including key concepts in machine learning, is provided in Section II. This is followed by a detailed explanation of the methodology, including the dataset description and experimental setup, in Section III. Section IV presents and discusses the results of this work. Future research directions are provided in Section V, and Section VI concludes the paper.

II. BACKGROUND
This section summarizes the background information including the algorithms used in this work and the evaluation metrics for predictions.

A. SUPERVISED MACHINE LEARNING
The main objective in machine learning (ML) is to develop a framework that can learn from experience, i.e., the training dataset, without being explicitly programmed. ML algorithms are primarily classified as either supervised or unsupervised. In unsupervised ML, the training data is not labeled, and the goal of the algorithm is to group similar data points. Conversely, in supervised learning, models are trained on a labeled dataset that contains the specified output or target variable, i.e., the variable to be predicted. The mapping between the input and target variables is learned iteratively by optimizing a specific objective function. In this work, the target variables, session duration and energy consumption, are both labeled, and thus supervised learning is used. Furthermore, since both target variables are continuous, we use regression models, as opposed to classification models, which deal with categorical targets. The four regression models used in this work are RF, SVM, XGBoost and deep ANN. The following paragraphs describe each briefly.
A decision tree (DT) separates a complex decision into a combination of simpler decisions using split points on the input features. A decision node is a point where a split takes place, whereas a leaf node is a point where no further split is made. In regression, the prediction is the average target value of the training samples in the leaf node. Although simple to implement, a single DT is prone to overfitting. To overcome this problem, multiple DTs can be aggregated, and this is the essence of the random forest (RF) algorithm. RF uses bagging, in which each tree is trained on a different bootstrap sample, i.e., a sample drawn with replacement from the training data. For regression problems, the final prediction is the average of the predictions across all trees [30].
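The bagging procedure described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the configuration used in this work. scikit-learn's `RandomForestRegressor` additionally considers a random subset of features at each split, which this sketch omits for clarity.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagged_trees_predict(X_train, y_train, X_test, n_trees=50, seed=0):
    """Bagging for regression: fit each tree on a bootstrap sample, then average."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    predictions = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X_train[idx], y_train[idx])
        predictions.append(tree.predict(X_test))
    # Regression: the final prediction is the mean over all trees.
    return np.mean(predictions, axis=0)


# Synthetic data for illustration only.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, size=300)

y_hat = bagged_trees_predict(X[:240], y[:240], X[240:])
print(y_hat.shape)  # (60,)
```

In practice one would use `RandomForestRegressor` directly. Its `n_estimators` and `max_features` parameters control the number of trees and the per-split feature sampling.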
Like RF, a gradient boosting algorithm [31] makes use of multiple DTs. However, the trees are built sequentially, and each new tree is fitted to correct the errors made by the preceding trees, which often leads to superior performance. XGBoost [32] is a more recent variant of gradient boosting. XGBoost has gained popularity over the last few years for its success in machine learning competitions, largely because it manages the bias-variance tradeoff effectively [33]. That is, the algorithm avoids overfitting the training data while maintaining enough complexity to learn meaningful representations.
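The following from-scratch sketch makes the sequential idea concrete. It implements plain gradient boosting with squared-error loss, where each shallow tree is fitted to the residuals of the ensemble built so far. The data and hyperparameters are illustrative assumptions. XGBoost builds on the same principle but adds a regularized objective and second-order gradient information, so this sketch is not XGBoost itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting with squared error: each tree fits the current residuals."""
    base = float(np.mean(y))             # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred             # negative gradient of 0.5 * (y - pred) ** 2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrunken step toward the residuals
        trees.append(tree)
    return {"base": base, "learning_rate": learning_rate, "trees": trees}


def predict_gradient_boosting(model, X):
    """Sum the constant base and every tree's shrunken contribution."""
    out = np.full(len(X), model["base"])
    for tree in model["trees"]:
        out += model["learning_rate"] * tree.predict(X)
    return out


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 2))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.1, size=200)

model = fit_gradient_boosting(X, y)
train_mse = np.mean((y - predict_gradient_boosting(model, X)) ** 2)
print(train_mse)
```

The learning rate shrinks each tree's contribution. Smaller values generally need more rounds but tend to generalize better.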
A support vector machine (SVM) [34] can be used for both classification and regression problems. When applied to regression, it is often referred to as support vector regression (SVR) [35]. For classification, SVM separates the classes with the hyperplane that maximizes the margin between them. The key idea is to map the inputs to a high-dimensional feature space in which they become linearly separable. This is achieved implicitly using kernels such as linear, polynomial, and radial basis function (RBF) kernels. SVM is less suitable for large datasets because its training time grows rapidly with the number of samples.
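As an illustration of kernel-based regression, the sketch below fits scikit-learn's `SVR` with an RBF kernel to a synthetic one-dimensional signal. The values of `C`, `epsilon`, and `gamma` are placeholders, not the grid-searched values from this work. In regression, SVR fits a function that keeps most training points inside a tube of width epsilon and penalizes only the points that fall outside it.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic nonlinear data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2); gamma="scale" uses 1 / (n_features * X.var()).
# epsilon is the half-width of the tube inside which errors are not penalized.
# Features are standardized first because RBF distances are scale-sensitive.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"))
model.fit(X[:150], y[:150])
y_hat = model.predict(X[150:])
mae = np.mean(np.abs(y_hat - y[150:]))
print(mae)
```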
Deep learning models are compositions of many learned functions. Using a layered hierarchy of concepts, complex concepts are defined in terms of simpler ones, and more abstract representations are built from less abstract ones [36]. Variants of deep learning include convolutional and recurrent neural networks, which have been successful in image and audio classification tasks. In this work, we consider artificial neural networks (ANNs), often referred to as multilayer perceptrons (MLPs). MLPs perform nonlinear function approximation given a set of input features and can be used for both regression and classification. An MLP consists of an input layer, which is fed with the input features; hidden layers, which learn the representations; and an output layer, which makes the final predictions. When the number of hidden layers is two or more, the model is referred to as a deep ANN.
In ensemble learning, a set of individually trained models is combined to predict new instances, often providing more accurate predictions than the individual models [37]. Figure 1 illustrates the concept of ensemble learning. Both RF and XGBoost are examples of ensemble learning, where individual models (in these cases DTs) are first built and then integrated into a single model. The motivation behind this approach is similar to asking multiple experts for their opinion and then taking a vote to make the final decision [38].
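The layered structure of an MLP can be illustrated with a NumPy forward pass through an untrained network. Its hidden sizes (64, 32, 16) match the session duration model described in Section IV. The input width of 10 and the He-style random initialization are assumptions for illustration, and training (e.g., with Adam) is not shown.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def init_mlp(layer_sizes, seed=0):
    """He-style random weights for a fully connected network; sizes include input and output."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params


def mlp_forward(params, X):
    """ReLU on every hidden layer; identity (linear) output for regression."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W_out, b_out = params[-1]
    return h @ W_out + b_out


# 10 input features (hypothetical), 3 hidden layers, 1 numeric output.
params = init_mlp([10, 64, 32, 16, 1])
X = np.random.default_rng(1).normal(size=(5, 10))
out = mlp_forward(params, X)
print(out.shape)  # (5, 1)
```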

B. EVALUATION OF REGRESSION MODELS
Numerous metrics are available for assessing the predictions of regression models, as discussed in [39]. In this work, we define and use four measures that are common in related works. Equations (1)-(4) define the metrics used in this work.
Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2} \tag{1}$$
Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \bar{y}_i\right| \tag{2}$$
Coefficient of determination, or R²:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \mu\right)^2} \tag{3}$$
Symmetric mean absolute percentage error (SMAPE):
$$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|\bar{y}_i - y_i\right|}{\left|y_i\right| + \left|\bar{y}_i\right|} \tag{4}$$
Here, $y_i$ represents the actual value, $\bar{y}_i$ is the predicted value, $\mu$ is the average of the actual values and $n$ is the number of values in the dataset. Lower RMSE, MAE and SMAPE scores indicate more accurate predictions, which occur when each predicted value $\bar{y}_i$ is close to the actual value $y_i$. The R² value measures goodness of fit for regression and usually lies between 0 and 1; it can become negative when a model performs worse than simply predicting $\mu$. A score of 1 indicates perfect predictions, and a higher value generally represents better performance. We do not consider the mean absolute percentage error (MAPE) because it becomes unstable when the actual value $y$ is close to 0, which biases the metric. Instead, we use SMAPE, which is more suitable for EV charging prediction because both the actual and the predicted values appear in the denominator [24].
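The four metrics are straightforward to compute. The sketch below implements them with NumPy, following the notation above: actual values y, predicted values ȳ, and mean µ of the actual values. Several SMAPE variants exist. The function assumes the bounded form with |y| + |ȳ| in the denominator, which ranges from 0% to 100% and is undefined when an actual and a predicted value are both zero.

```python
import numpy as np


def rmse(y, y_pred):
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


def mae(y, y_pred):
    return float(np.mean(np.abs(y - y_pred)))


def r2(y, y_pred):
    mu = np.mean(y)  # average of the actual values
    return float(1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - mu) ** 2))


def smape(y, y_pred):
    # Bounded variant (assumption): 100% * mean(|y_pred - y| / (|y| + |y_pred|)).
    return float(100.0 * np.mean(np.abs(y_pred - y) / (np.abs(y) + np.abs(y_pred))))


y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([3.0, 4.0, 5.0])
print(round(r2(y_true, y_pred), 2))     # 0.75
print(round(smape(y_true, y_pred), 2))  # 9.7
```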

III. METHODOLOGY
In this section, we define the approach used for the prediction of charging behavior. We formulate the problem, describe the dataset, highlight the preprocessing steps, and discuss the methods for training the learning models.

A. EV CHARGING BEHAVIOR
Let $t_{con}$ denote the connection time, when the car first plugs in, and let $t_{discon}$ denote the disconnection time, when the car unplugs and leaves the station. Let $e$ denote the energy delivered to the car during the session. We define the session charging behavior $B_{session}$ as follows:
$$B_{session} = \left(t_{con},\ t_{discon},\ e\right) \tag{5}$$
Based on the above, we define the length of the charging session, or the session duration, $S_{dur}$, as follows:
$$S_{dur} = t_{discon} - t_{con} \tag{6}$$
In this work, we predict both the session duration and the session energy consumption of an individual charging record, and we assume that the connection time is known.
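Computing the session duration from timestamps takes a single pandas operation. In this sketch, the two sessions are hypothetical, and the column names `connectionTime`, `disconnectTime`, and `kWhDelivered` are modeled on ACN-Data fields but should be treated as assumptions.

```python
import pandas as pd

# Two hypothetical sessions (column names are assumptions modeled on ACN-Data).
sessions = pd.DataFrame({
    "connectionTime": pd.to_datetime(["2019-03-04 08:12", "2019-03-04 13:45"]),
    "disconnectTime": pd.to_datetime(["2019-03-04 17:30", "2019-03-04 15:15"]),
    "kWhDelivered": [12.4, 5.1],
})

# S_dur = t_discon - t_con, expressed in hours.
sessions["S_dur_hours"] = (
    sessions["disconnectTime"] - sessions["connectionTime"]
).dt.total_seconds() / 3600

# Each record's behavior tuple (t_con, t_discon, e).
behavior = list(sessions[["connectionTime", "disconnectTime", "kWhDelivered"]]
                .itertuples(index=False, name=None))
print(sessions["S_dur_hours"].tolist())
```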

B. DATASET DESCRIPTION
Besides the charging dataset, we also use weather, traffic, and local events data to predict charging behavior. We briefly describe the datasets used and highlight their attributes.
VOLUME 9, 2021
Scheduling of EV charging is more critical at public charging facilities because charging behavior there is unpredictable, especially in places like shopping malls. The ACN dataset [21] is among the few publicly available datasets for non-residential EV charging and is utilized in this work. The dataset contains charging records from two sites, namely JPL and the Caltech campus. Unlike the Caltech station, which is open to the public, the JPL station is accessible only to employees and is therefore not considered in this work. Registered users can manually enter additional details, such as their estimated departure time and requested energy, by scanning a QR code with the mobile application. The dataset can be accessed from [40] through either a web portal or a Python application programming interface (API).
Although there is a small weather station on the Caltech campus [41], we did not use it in this work because of missing values and irregular recording intervals for the wind variable. Additionally, this station did not record variables such as rainfall and snowfall, which could potentially impact charging behavior. We therefore used weather data from NASA's Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) [42], which provides gridded data covering the location of the charging station. The accuracy of satellite-based weather data relative to ground stations has been assessed in [43]. Ground stations may measure some weather parameters more accurately at a specific location. However, this work does not require a high level of accuracy, only a general indication of how weather affects charging behavior. For example, we are interested in how charging behavior changes during heavy rainfall as opposed to drier conditions.
Obtaining historical traffic data for specific roads and regions is challenging. Conventional traffic data collection methods include intrusive approaches, such as road tubes and piezoelectric sensors, and non-intrusive approaches, such as microwave radar and video image detection [44]. Most of these approaches do not scale well, and in many cases specific roads are not covered. For instance, the city of Pasadena (where the charging data originates) provides an open data site [45] with traffic counts around the city. However, for most roads it contains traffic counts only for limited periods, so it is unusable in our case, where we require data at regular intervals. Additionally, not all roads and streets are covered. As a result, we decided to use traffic data from Google Maps, which has also been used in previous machine learning applications [46]. The data is collected from the mobile devices of commuters who use the application and have agreed to share their location. The data collected from individuals is anonymized and aggregated to address privacy concerns [47]. The Google Maps Distance Matrix API can be used to retrieve the data. Given source and destination coordinates and a departure time, it returns the travel distance and travel time. We retrieved historical trip times for the 9 closest roads and streets that one must take to access the charging station.
Since the charging station is located on the Caltech campus, we decided to include campus events to determine whether the number of events has an impact on charging behavior. The number of events per hour was obtained from the Caltech website calendar [48]. For simplicity, event start times were rounded to the nearest hour; for example, an event starting at 10:20 am was counted as starting at 10 am.

C. DATA PREPROCESSING
Cleaning and preprocessing the dataset are vital to ensuring the quality of the predictive models. This includes removing faulty records and outliers.
The presence of outliers can negatively impact model performance. A common graphical technique for detecting outliers is the boxplot [49]. The boxplots of both target variables contained outliers, as shown in Figure 2. The outliers of the two variables are not consistent: there are far more outlier points for energy consumption than for session duration. It is possible that certain vehicles consume a far greater amount of energy even if the session duration is not particularly long. As a result, we opted to perform multivariate outlier detection using the isolation forest algorithm, which constructs an ensemble of isolation trees (iTrees) for a given dataset. Outliers are the instances that have short average path lengths in the iTrees [50]. Observations are 'isolated' by randomly selecting a variable and then a random split value between the minimum and maximum of that variable. This partitioning is repeated recursively until all observations have been isolated. Observations with shorter path lengths, i.e., those isolated after fewer splits, are likely to be outliers. Figure 3 illustrates the outlier detection for the target variables, with the axes normalized for both response variables. A total of 697 outliers were detected, accounting for 4% of the total observations.
For the charging data, we only considered charging records that were registered, i.e., contained user IDs, which accounted for 97% of the records. The weather data timestamps were recorded in Coordinated Universal Time (UTC), and we used the pytz [51] library in Python to convert them to the time zone of the charging records. We also converted temperatures from kelvin to degrees Celsius. Then, for each hour, we computed the average weather over the previous 7 hours and over the next 10 hours. These window lengths were determined experimentally to provide an accurate representation. This allows us to understand how the weather before and after the start of charging impacts charging decisions. For instance, heavy snowfall in the preceding hours may lead to shorter charging sessions. The traffic data timestamps were likewise converted from UTC to local time. We then aggregated the traffic for each hour across the nine selected roads and streets, considering both the average and the maximum trip time as estimated by Google Maps. Finally, we aggregated the total number of campus events for each hour.
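The isolation forest step can be sketched with scikit-learn's `IsolationForest`, applied jointly to both target variables. The data below is synthetic with a few injected anomalies. `contamination=0.04` is an assumption chosen to mirror the roughly 4% outlier share reported above, since the text does not state how the decision threshold was set.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic targets: session duration (h) and energy (kWh), loosely correlated.
duration = rng.normal(6.0, 2.0, size=500)
energy = 1.2 * duration + rng.normal(0.0, 1.0, size=500)
# Inject a few implausible (duration, energy) combinations.
duration[:5] = [30.0, 28.0, 0.1, 25.0, 0.2]
energy[:5] = [2.0, 60.0, 40.0, 1.0, 55.0]
targets = np.column_stack([duration, energy])

# contamination=0.04 is an assumption mirroring the ~4% share reported in the text.
iso = IsolationForest(n_estimators=200, contamination=0.04, random_state=0)
labels = iso.fit_predict(targets)      # -1 = outlier, 1 = inlier
scores = iso.score_samples(targets)    # lower = shorter average path = more anomalous
clean = targets[labels == 1]
print((labels == -1).sum(), clean.shape)
```

Each split is drawn uniformly between a feature's observed minimum and maximum, so the method does not depend on the targets being rescaled. The normalized axes in Figure 3 serve visualization only.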
To merge the various data sources, the time-series fields were converted to date-time objects using the pandas library [52]. Then, to obtain the weather, traffic, and events for a particular charging record, we first determined the hour to which its connection time belongs; for example, a connection time of 22:11 belongs to the 10 pm hour. This makes it easy to extract the other information. Instead of simply selecting the traffic level at a given time, we used the total traffic from arrival until the end of the day. If a vehicle arrived at 2 pm, for instance, we accumulated the traffic from 2 pm until the end of that day. This allows the model to learn how the traffic level impacts charging behavior. Similarly, we considered the total number of events from arrival until the end of the day.
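The hourly alignment and end-of-day accumulation described above can be sketched with pandas. All data here is synthetic. The time zone `America/Los_Angeles`, the column names, and the choice to exclude the current hour from the 7-hour and 10-hour weather windows are assumptions. pandas' `tz_convert` is used here in place of calling pytz directly.

```python
import numpy as np
import pandas as pd

TZ = "America/Los_Angeles"  # local time zone of the charging site (Pasadena, CA)

# Weather: hourly values stamped in UTC (synthetic stand-in for MERRA-2), temperature in kelvin.
idx_utc = pd.date_range("2019-01-15 08:00", periods=48, freq="h", tz="UTC")
weather = pd.DataFrame({"temp_K": 283.15 + 0.1 * np.arange(48)}, index=idx_utc)
weather.index = weather.index.tz_convert(TZ)      # UTC -> local (08:00 UTC = 00:00 PST)
weather["temp_C"] = weather["temp_K"] - 273.15    # kelvin -> degrees Celsius
# Assumption: both windows exclude the current hour.
weather["temp_prev7"] = weather["temp_C"].shift(1).rolling(7).mean()      # hours t-7 .. t-1
weather["temp_next10"] = weather["temp_C"].shift(-10).rolling(10).mean()  # hours t+1 .. t+10

# Traffic: hourly trip times in minutes for several roads (synthetic), already in local time.
rng = np.random.default_rng(0)
trips = pd.DataFrame(rng.uniform(5, 20, size=(48, 3)), index=weather.index,
                     columns=["road_a", "road_b", "road_c"])
traffic = pd.DataFrame({"trip_avg": trips.mean(axis=1), "trip_max": trips.max(axis=1)})

# Events: start times rounded to the nearest hour, then counted per hour.
starts = pd.Series(pd.to_datetime(["2019-01-15 10:20", "2019-01-15 10:50", "2019-01-15 16:05"]))
events_per_hour = (starts.dt.tz_localize(TZ).dt.round("h")
                   .value_counts().reindex(weather.index, fill_value=0))


def sum_until_end_of_day(s):
    """For each hour, sum the values from that hour through the last hour of the same day."""
    return s.groupby(s.index.date).transform(lambda day: day[::-1].cumsum()[::-1])


hourly = pd.DataFrame({
    "temp_C": weather["temp_C"],
    "temp_prev7": weather["temp_prev7"],
    "temp_next10": weather["temp_next10"],
    "trip_max_until_eod": sum_until_end_of_day(traffic["trip_max"]),
    "events_until_eod": sum_until_end_of_day(events_per_hour),
})

# Charging sessions: attach the hourly features of the hour the connection falls in.
sessions = pd.DataFrame({"connectionTime": pd.to_datetime(
    ["2019-01-15 09:40", "2019-01-15 22:11"]).tz_localize(TZ)})
sessions["hour"] = sessions["connectionTime"].dt.floor("h")  # 22:11 -> 22:00
merged = sessions.merge(hourly, left_on="hour", right_index=True, how="left")
print(merged["events_until_eod"].tolist())  # [3, 0]
```

Flooring the connection time assigns 22:11 to the 22:00 hour, matching the example in the text. Event start times use `round`, matching the nearest-hour rule for events.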

D. FEATURE ENGINEERING
Feature engineering refers to transforming data into meaningful representations using human knowledge. This process is labor-intensive but important. It relies on human ingenuity and prior knowledge to compensate for the inability of learning algorithms to extract and organize the discriminative information in the data [53]. We discuss the feature engineering steps next.
First, we convert the time fields used by the models into numeric format by dividing the minutes by 60 and adding the result to the hour (e.g., 14:30 becomes 14.5). Then, for each charging record, we compute the user's average departure time, session duration and energy consumption by identifying the record's user ID and aggregating that user's previous records. We use the arrival time as a numeric feature. However, the arrival time also carries date information, from which we extract the hour of the day, day of the month, month of the year, day of the week, whether the day is a weekend, and whether the day is a US federal holiday. Temporal features such as day, hour, and month are cyclic ordinal features. For example, the hour value 23, corresponding to 11 pm, is actually close to the hour value 0, which corresponds to 12 am. To represent the proximity of these values, a trigonometric transformation is performed as follows:
$$f_x = \sin\left(\frac{2\pi f}{P}\right), \qquad f_y = \cos\left(\frac{2\pi f}{P}\right) \tag{7}$$
where $f$ represents the cyclic feature to be transformed, $P$ is the period of that feature (e.g., 24 for the hour of the day), and $f_x$ and $f_y$ represent the first and second components of the transformed feature, respectively. Other categorical variables were transformed using one-hot encoding, where a single variable with n points and k distinct classes is transformed into k binary variables with n points each. For numeric variables, feature scaling is a common transformation whose goal is to normalize the range of the features. There are various scaling techniques. Scaling by domain maps all features to a specific range such as [0, 1]. Scaling to minmax maps the features to the range [0, R], where the radius R of the sphere is the minimum over all directions of the maximum feature value [54]. In this work, however, we used standardization, which transforms each feature to zero mean and unit variance.
The transformations were performed using the preprocessing package of the Scikit-learn [55] library. Table 2 lists the features used for training.
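The feature engineering steps above can be sketched as follows. The records are hypothetical, `spaceID` stands in for an unspecified categorical variable, and only the energy average is shown for the user-history features. The cyclic encoding divides by the feature's period P (24 for hours, 7 for weekdays), which places hour 23 next to hour 0 on the unit circle. Dividing by the maximum observed value instead would map 23 and 0 to the same point.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical charging records; "spaceID" stands in for an unspecified categorical variable.
df = pd.DataFrame({
    "userID": ["u1", "u2", "u1", "u2", "u1"],
    "spaceID": ["CA-1", "CA-2", "CA-1", "CA-3", "CA-2"],
    "connectionTime": pd.to_datetime(["2019-01-07 08:30", "2019-01-07 23:15", "2019-01-08 09:00",
                                      "2019-01-09 00:30", "2019-01-09 07:45"]),
    "kWhDelivered": [10.0, 4.0, 12.0, 6.0, 8.0],
}).sort_values("connectionTime")

t = df["connectionTime"]
df["arrival"] = t.dt.hour + t.dt.minute / 60  # e.g., 23:15 -> 23.25

# Average of each user's *previous* sessions only (shift(1) excludes the current record).
df["user_avg_kWh"] = df.groupby("userID")["kWhDelivered"].transform(
    lambda s: s.shift(1).expanding().mean())


def cyclic(f, period):
    """Map a cyclic feature onto the unit circle: (sin(2*pi*f/P), cos(2*pi*f/P))."""
    angle = 2 * np.pi * f / period
    return np.sin(angle), np.cos(angle)


df["hour_x"], df["hour_y"] = cyclic(t.dt.hour, 24)
df["dow_x"], df["dow_y"] = cyclic(t.dt.dayofweek, 7)

# Standardize numeric features (zero mean, unit variance); one-hot encode the categorical one.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["arrival", "hour_x", "hour_y", "dow_x", "dow_y"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["spaceID"]),
], sparse_threshold=0.0)
X = pre.fit_transform(df)
print(X.shape)  # (5, 8)
```

In a full pipeline, the scaler and encoder should be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.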

E. MODEL SELECTION AND EXPERIMENTAL SETUP
We selected all charging sessions from the ACN dataset belonging to the 2019 calendar year, so that seasonal factors are taken into consideration during training. The dataset was split such that 80% of the records were used for model training and 20% for evaluation. During training, we performed K-fold cross-validation. Each algorithm is trained K times, each time leaving out a different 1/K of the training examples for validation [56]. We selected the common value K = 10. To determine model hyperparameters, we used grid search, which tries every combination of the specified parameter values and selects the best-performing set [57]. To speed up the grid search, it used K = 5 folds. We then evaluated all models using the aforementioned regression metrics. Inspired by the success of ensemble learning methods in previous works, we also experimented with ensemble learning. We used two ensemble variants, a voting regressor and a stacking regressor, from the ensemble module of the scikit-learn library. In a voting regressor, several base regressors are each trained on the entire training set, and the average of their predictions is taken as the final prediction. A stacking regressor is based on the concept of stacked generalization: the base models' predictions serve as inputs to a final estimator, which is trained on cross-validated predictions of the base models [58]. Figure 4 provides a graphical representation of the framework.
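The split, grid search, and cross-validation protocol can be sketched with scikit-learn's model-selection utilities. The random forest, the parameter grid, and the MAE scoring are illustrative assumptions on synthetic data, and the grids actually used in this work are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.3, size=400)

# 80% training, 20% held-out evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search with 5 folds over an illustrative grid (not the paper's grid).
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)

# 10-fold cross-validation of the selected model on the training set.
cv10 = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(search.best_estimator_, X_train, y_train,
                         cv=cv10, scoring="neg_mean_absolute_error")
print(search.best_params_, -scores.mean())
```

Wrapping preprocessing steps in a `Pipeline` passed to `GridSearchCV` keeps them inside each fold, so that validation folds never influence the fitted scalers.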

IV. RESULTS AND DISCUSSION
We begin the experiments with the RF algorithm, which can be used to compute and visualize variable importance [30]. Variable importance can serve as a feature selection method, in which unimportant variables that may hinder performance are removed. In our case, including even the least important variables yielded a small performance increase, so we retained them in model training. Variables can also be ranked by relative importance, which is determined by each feature's contribution to the most effective splits. In Figures 5 and 6, we plot the top 10 most important variables for session duration and energy consumption, respectively. The two most important predictors of session duration are the maximum traffic after arrival and the connection time, which indicates the usefulness of traffic information for predicting session duration. For energy consumption, however, the historical average consumption is by far the most significant predictor. This is likely because a given vehicle tends to consume similar energy when its session duration is consistent.
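The importance ranking can be reproduced in outline with the `feature_importances_` attribute of a fitted `RandomForestRegressor`. The features `x0` to `x4` and the target are synthetic placeholders, constructed so that `x0` matters most. They are not the features or results of this work.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
names = [f"x{i}" for i in range(5)]  # placeholder feature names
X = pd.DataFrame(rng.normal(size=(500, 5)), columns=names)
# Synthetic target: x0 matters most, x1 less, x2 a little, x3/x4 not at all.
y = 3.0 * X["x0"] + 1.5 * X["x1"] + 0.3 * X["x2"] + rng.normal(0, 0.5, size=500)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# Impurity-based importances: mean decrease in impurity across trees, normalized to sum to 1.
importances = pd.Series(rf.feature_importances_, index=names).sort_values(ascending=False)
print(importances.head(10))
```

Impurity-based importances can favor features with many distinct values, and scikit-learn's `permutation_importance` is a common cross-check.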

A. SESSION DURATION PREDICTIONS
As mentioned earlier, the hyperparameters for the models were determined using the grid search approach. For the deep ANN, we experimentally determined that an architecture with three hidden layers of 64, 32 and 16 nodes, respectively, was the most suitable. Rectified linear units (ReLU) [59] were used as the activation function for all hidden layers, and the output layer used a linear activation since the prediction is a numeric value. The learning rate was set to 0.001, and we used the Adam [60] algorithm for model optimization. The training batch size was 32, and the model was trained for 15 epochs. Appendix 1 displays the training loss curve, and Table 3 summarizes the 10-fold cross validation scores on the training set. The training scores are very similar for RF, SVM and XGBoost, whereas deep ANN performs slightly worse. We therefore combined these three best-performing models into two ensemble models, which improved the cross validation scores. Next, we present the results on the test set. For reference, we also used the users' own departure estimates as predictions. These estimates were collected through a smartphone app in which users were asked, upon arrival, to enter their estimated departure time and energy consumption. We summarize the results on the test set in Table 4. As highlighted, the best results are obtained using the ensemble learning approach, which is consistent with previous works [15], [24]. The voting regressor performs best on two metrics and the stacking regressor performs best in terms of RMSE, while both achieve the same R² score. The results are consistent with the training performance: RF, SVM and XGBoost achieve similar performance, and deep ANN performs the worst of the four base models. The users' predictions of their own session length are also far from the actual session length. This indicates that relying on users to estimate their own departure time may not be suitable.
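The two ensembles described above can be sketched with scikit-learn's `VotingRegressor` and `StackingRegressor`. This is a minimal sketch under several assumptions: the paper does not state the library, the voting weights, or the stacking meta-learner, so equal weights and a `RidgeCV` meta-learner (the scikit-learn default for `StackingRegressor`) are assumed; `GradientBoostingRegressor` is swapped in for XGBoost so that the example depends only on scikit-learn; and the base-model hyperparameters and data are placeholders.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor, VotingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def base_models(seed=0):
    """The three base learners; GradientBoostingRegressor stands in for XGBoost."""
    return [
        ("rf", RandomForestRegressor(n_estimators=100, random_state=seed)),
        ("svm", make_pipeline(StandardScaler(), SVR(C=10.0))),
        ("gb", GradientBoostingRegressor(random_state=seed)),
    ]


# Voting: unweighted average of the three base-model predictions (equal weights assumed).
voting = VotingRegressor(estimators=base_models())
# Stacking: a RidgeCV meta-learner fitted on 5-fold out-of-fold base-model predictions.
stacking = StackingRegressor(estimators=base_models(), final_estimator=RidgeCV(), cv=5)

# Synthetic, non-negative target standing in for session duration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = np.abs(2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.3, size=300))
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

for name, model in [("voting", voting), ("stacking", stacking)]:
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))  # R^2 on the held-out rows
```

In the voting ensemble each base model contributes equally, whereas stacking learns a weighting of the base models from out-of-fold predictions; this difference can lead the two ensembles to rank differently across metrics.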

B. ENERGY CONSUMPTION PREDICTIONS
A similar approach to the session duration prediction was used here. The only exception was the deep ANN configuration: in this case, the network contained two hidden layers with 64 and 16 nodes, respectively, the training batch size was 64, and the number of epochs was set to 20. Appendix 2 presents the loss curve from the training phase. Table 5 summarizes the 10-fold cross validation scores on the training set.
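Both deep ANN configurations (Section A: hidden layers of 64, 32 and 16 nodes, batch size 32, 15 epochs; here: 64 and 16 nodes, batch size 64, 20 epochs) can be approximated with scikit-learn's `MLPRegressor`. The paper does not name its deep learning framework, so this is a sketch under assumptions: `MLPRegressor` is a stand-in, the input standardization step and the disabled L2 penalty (`alpha=0.0`) are our choices, and the data are synthetic. `MLPRegressor` applies ReLU to the hidden layers and an identity (linear) output for regression, matching the description, and its `loss_curve_` attribute records one training-loss value per epoch, similar in kind to the curves in the appendices.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_deep_ann(hidden, batch_size, epochs, seed=0):
    """ReLU hidden layers, identity output, Adam with learning rate 0.001."""
    mlp = MLPRegressor(
        hidden_layer_sizes=hidden,
        activation="relu",        # used for the hidden layers only
        solver="adam",
        learning_rate_init=0.001,
        batch_size=batch_size,
        max_iter=epochs,          # with Adam, max_iter is the number of epochs
        alpha=0.0,                # no L2 penalty (assumption)
        random_state=seed,
    )
    # Standardizing the inputs is our assumption; the text does not mention it.
    return make_pipeline(StandardScaler(), mlp)


# Synthetic stand-in data with 12 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=400)

duration_model = build_deep_ann(hidden=(64, 32, 16), batch_size=32, epochs=15)
energy_model = build_deep_ann(hidden=(64, 16), batch_size=64, epochs=20)

with warnings.catch_warnings():
    # Training stops after a fixed, short number of epochs, so scikit-learn
    # may emit a ConvergenceWarning; it is silenced here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    duration_model.fit(X, y)
    energy_model.fit(X, y)

loss_curve = duration_model[-1].loss_curve_  # one training-loss value per epoch
print(len(loss_curve), round(loss_curve[-1], 4))
```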
RF has the best cross validation scores, whereas the other three models have similar scores. We selected the top three models, i.e., RF, SVM and XGBoost, to form the two ensemble models. In this case, the ensemble models did not improve upon the best-performing RF model but achieved similar results in training. The results on the test set are presented in Table 6, where we also compare them with the users' predictions of their consumption.
As highlighted, the best results were obtained using the stacking ensemble model. The improvement from ensemble learning appears less significant for energy consumption than for session duration. The users' predictions of their consumption are also inaccurate in this case.
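The grid search and 10-fold cross validation used in both Sections A and B can be sketched with scikit-learn's `GridSearchCV` and `cross_val_score`. The grid values, the RMSE scoring choice, and the synthetic data are assumptions for illustration, since the searched values are not listed here; the random forest is shown, but the same pattern would apply to the other base models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 5.0 + 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=300)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
# Hypothetical grid; the values actually searched are not reported in this section.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence "neg_"
    cv=cv,
)
search.fit(X, y)

# 10-fold CV scores of the selected configuration on the training set,
# with the sign flipped back so that lower RMSE is better.
rmse_folds = -cross_val_score(search.best_estimator_, X, y, cv=cv,
                              scoring="neg_root_mean_squared_error")
print(search.best_params_, round(rmse_folds.mean(), 3))
```

Using the same `KFold` object for every model keeps the folds identical, so per-model cross validation scores such as those in Tables 3 and 5 are compared on the same splits.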

C. COMPARISON AND DISCUSSION
When comparing the two predictions in terms of overall R² and SMAPE, energy consumption appears to be the more difficult quantity to predict. This is consistent with previous work on the ACN data [21]. However, the opposite was observed in [24], where energy consumption was easier to predict. Moreover, in both scenarios, users' predictions of their own behavior differed considerably from their actual behavior, which further emphasizes the need for predictive analytics. Users' estimates of their energy consumption are slightly more accurate than their estimates of session duration, as indicated by better R² and SMAPE values. The poor accuracy of user estimates could be due to users' lack of interest in entering them every time they charge their vehicles. We also noticed that the deep ANN was the least accurate model in both cases. Although deep learning models have proven superior on image and audio data, where manual feature extraction is not performed, traditional ML models usually perform better in applications such as this one, where features are extracted explicitly. Furthermore, ensemble learning outperformed the individual ML models in both scenarios, although the improvement was more significant for session duration prediction. This is most likely because, for session duration, the top three models had similar training performance and combining their predictions resulted in an improvement, whereas for energy consumption one model clearly outperformed the rest in training, so ensemble learning offered little additional gain. Compared with previous works in the literature, the results in this work outperform all previous works that reported similar evaluation metrics ([21], [23], and [26]-[28]).
We summarize the results of previous works in comparison to those achieved in this work in Table 7. Compared with [24], our session duration results are more accurate, although we do not improve upon their energy consumption results. This is most likely because the authors in [24] utilized both residential and non-residential data for their predictions, and residential charging behavior is in most cases more consistent. However, it must be noted that all previous works except [21] used a different dataset from this work, so a direct comparison may not be appropriate. Restricting the comparison to the same dataset, we can conclude that utilizing the additional weather, traffic and events data improved EV charging behavior predictions.
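The comparison above rests on SMAPE and R². Several SMAPE variants exist, and this section does not restate the exact formula used, so the sketch below assumes a common definition, SMAPE = (100/n) Σ |F_t − A_t| / ((|A_t| + |F_t|)/2), alongside scikit-learn's `r2_score`, `mean_absolute_error` and `mean_squared_error`. It scores a model against a user-estimate baseline on a handful of invented session durations, mirroring the model-versus-user comparison in Tables 4 and 6; the numbers are not from the ACN data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def smape(y_true, y_pred):
    """Symmetric MAPE in percent: 100/n * sum(|F - A| / ((|A| + |F|) / 2)).
    Pairs where both values are zero contribute 0 to avoid division by zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    ratio = np.divide(np.abs(y_pred - y_true), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return 100.0 * ratio.mean()


def report(y_true, y_pred):
    """Metrics of the kind reported in the results tables."""
    return {
        "SMAPE": smape(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "R2": r2_score(y_true, y_pred),
    }


# Invented session durations in hours, for illustration only.
actual = np.array([2.0, 4.0, 8.0, 3.0, 6.0])
model_pred = np.array([2.2, 3.8, 7.5, 3.1, 6.4])
user_estimate = np.array([4.0, 4.0, 4.0, 8.0, 4.0])

for name, pred in [("model", model_pred), ("user", user_estimate)]:
    print(name, {k: round(float(v), 3) for k, v in report(actual, pred).items()})
```

Note that R² can be negative when predictions are worse than always predicting the mean, which is how a poor user-estimate baseline can score below zero while SMAPE stays bounded between 0% and 200%.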

V. RECOMMENDATIONS AND FUTURE WORK
We have quantitatively shown in the previous section that traffic and weather data were important predictors of EV charging behavior, particularly for session duration. Although the local events data (campus events in this case) had an insignificant impact on performance, their use should not be ruled out in future work. In this work, we obtained all campus events from the university calendar; however, perhaps only significant events that draw larger crowds should be considered. It is also possible that events data have little effect on predictions in a university campus setting, whereas for other public spaces, such as shopping malls, events like end-of-year sales could be important predictors. Therefore, similar experiments should be carried out at other public charging locations to determine the impact of local events. Social media could also be explored as a means of obtaining information about local events and driver behavior. For instance, social media has been shown to be a good tool for estimating human behavior [61] and to be a significant predictor of truck drivers' travel time [62]. Vehicle information, such as the vehicle model and type, is also likely to improve predictions, especially of energy consumption. Some previous works have utilized vehicle information [23], but not in conjunction with weather, traffic, and events data. Finally, to better understand charging behavior during the COVID-19 pandemic, a case study should be conducted using the proposed approach to validate its predictive performance in uncertain situations.

VI. CONCLUSION
In this work, we presented a framework for predicting two of the most important EV charging behaviors with regard to scheduling, namely session duration and energy consumption. Unlike previous work, we utilized weather, traffic, and events data along with historical charging data. We trained four popular ML models along with two ensemble learning algorithms to predict charging behavior. The prediction performance obtained is superior to that reported in most previous works. We have also achieved a significant improvement in charging behavior prediction on the ACN dataset and demonstrated the potential of utilizing traffic and weather information for charging behavior prediction.