Electric Vehicle Supply Equipment Day-Ahead Power Forecast Based on Deep Learning and the Attention Mechanism

Transport is one of the sectors with the highest CO2 emissions; over the last ten years, a decarbonization process has led to a considerable increase in Electric Vehicles (EVs). However, the sudden introduction of a large number of Electric Vehicle Supply Equipment (EVSE) units supplying electrical energy to EVs could cause problems in the management of the electric grid, which must cope with the consequent increase in electrical load demand. In this context, the 24-hour-ahead forecast of the power curve associated with EV recharging becomes vitally important to ensure the reliability of the electric grid. In this paper, different Machine Learning models based on Recurrent Neural Networks (LSTM, GRU), with different architectures, are compared on their ability to accurately predict the power curve of an EV charging station one day in advance. A Sequence-to-Sequence model has been implemented, and an Attention layer is analyzed in detail. The models are tested on a real-world open dataset.


I. INTRODUCTION
ELECTRIC load forecasting has gained more and more attention during the last ten years, along with the increase in the global electric car stock, which reached 26 million units in 2022 [1]. The main reason behind this phenomenon is the growing awareness of climate change, which has led consumers to adopt more environmentally friendly choices. In 2022, the transport sector alone produced 7.98 Gt of CO2 [2]; the only ways to reduce its impact are the decarbonization and electrification of transport. However, this evolution has its own downsides: the large increase in the EV fleet will require more charging stations, and numerous studies have been conducted to model the penetration of electric vehicles and their impact on the national grid [3]. Electric vehicle supply equipment (EVSE) supplies electricity to an electric vehicle (EV) [4]. It is usually called a "charging station" or "charging dock", and it provides electric power to the vehicle for recharging the EV batteries. The new power load demand will cause a significant rise in the peak load and a decrease in the reserve margin [5], representing a threat to the security of national power grids. In this context, forecasting the power load demand from EVSEs becomes a key element in managing the balance of the national grid.
EV load forecast is a form of Demand Side Management (DSM) [6], [7], [8] that can be crucial to balance load demand and power production. It is particularly relevant when the vehicles are integrated into Distributed Generation (DG) such as a micro-grid [9]. The management of energy dispatch is the main issue when dealing with DG; the optimization of energy flows is not trivial and is largely addressed in the literature using several different techniques [10], [11]. Renewable Energy Sources (RES) play a key role in DG, so coupling the EV load forecast with the RES generation forecast is strategic; a common example is the coupling with photovoltaic power forecast [12]. In the framework of a micro-grid, an EV could also play a double role, behaving alternately as a load and as a source of energy (Vehicle-to-Grid), or even acting as a decentralized source for energy trading, becoming a Connected Electric Vehicle [13], [14].
Based on the time horizon of the prediction, short-term, medium-term, and long-term forecasting can be defined. Although this classification is not applied consistently in the literature, Short-Term Load Forecast (STLF) usually refers to predictions of the load demand between one hour and one week ahead (crucial in the optimization of grid operation); Medium-Term Load Forecast (MTLF) covers horizons from one week to one year and is mainly important in the planning of maintenance operations; lastly, Long-Term Load Forecast (LTLF) concerns predictions beyond one year, which mainly support decisions on large investments in new infrastructure and generation units [15]. Another category that has gained importance in recent years is Super-Short-Term Load Forecasting (SSTLF) [16], also called Ultra-Short-Term Load Forecasting [17] or Very-Short-Term Load Forecasting [18], corresponding to predictions at the minute level or, in general, under one hour.
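The horizon classes above can be summarized in a small helper. This is only an illustrative sketch: the function name `classify_horizon` is hypothetical, and the exact treatment of the boundaries (for instance, whether exactly one week counts as STLF or MTLF) is an assumption, since the text itself notes that the literature is not consistent on this point.

```python
from datetime import timedelta


def classify_horizon(horizon: timedelta) -> str:
    """Map a forecast horizon to the category names used in the text.

    Boundary handling is an assumption made for this sketch.
    """
    if horizon <= timedelta(0):
        raise ValueError("horizon must be positive")
    if horizon < timedelta(hours=1):
        return "SSTLF"  # minute-level, under one hour
    if horizon <= timedelta(weeks=1):
        return "STLF"   # one hour to one week
    if horizon <= timedelta(days=365):
        return "MTLF"   # one week to one year
    return "LTLF"       # beyond one year


# The day-ahead forecast studied in this paper falls in the STLF class.
day_ahead = classify_horizon(timedelta(hours=24))
```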
Electric load forecasting methods can be categorized into traditional statistical models and artificial intelligence models [17]. Traditional models include time series analysis methods, Autoregressive Integrated Moving Average (ARIMA), regression analysis, Kalman filtering, and statistical methods, while artificial intelligence methods include Artificial Neural Networks (ANN), Support Vector Machine, and Deep Learning (DL) models. At the very early stage of EV load forecasting studies, statistical models were the most suitable choice, as the lack of real, organized data about EV charging made it necessary to build realistic scenario data through computational algorithms. With the availability of actual EVSE load data, the approach to EV load forecasting shifted from a probabilistic to a data-driven one. Nowadays, research in EV power load forecasting concentrates mainly on ANNs in all their variants [19], [20].
In this paper, S2S (Sequence-to-Sequence) models based on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) are proposed, and an attention mechanism inspired by Bahdanau [21], with a dot alignment score, is added. Our contributions in this paper are:
1) A review of the state-of-the-art models for electric and EV load forecast.
2) The application of the proposed models to a real-world EV charging dataset, with the aim of testing and comparing the performance of EV charging load forecasts 24 hours in advance.
3) A detailed analysis of the forecasting accuracy of more complex models compared to simpler RNN architectures, with a focus on the "Attention" mechanism.
4) A thorough appraisal of hyperparameter tuning and its influence on forecasting performance.
The research is based on one of the few available public datasets found in the literature. The remainder of this paper is organized as follows. Section II describes the related works. Section III outlines the methodology adopted in the study, with a detailed presentation of attention mechanisms. Section IV presents the case study analyzed in the paper: it describes how the dataset is composed, how the data has been treated and reshaped to enter the models, and how the results will be evaluated. Section V displays and discusses the results of both study cases, while Section VI draws the conclusions of the study.

II. RELATED WORKS
Although it is a common opinion that Recurrent Neural Networks (RNNs) have outperformed all other techniques, no single model surpasses the others in every task; this has led to a large literature of comparative studies among different ML models, listed in Table I by increasing time horizon. In the framework of SSTLF, Machine Learning, and LSTM in particular, seems to be highly effective in forecasting the load at minute level. In [16], the LSTM model is superior to all other models, even on a second, more random dataset that is less representative of typical usage. In [17], a novel LSTM-based model is the best-performing model on two different time scales. The authors of [22] present a model based on the Machine Theory of Mind composed of three networks, the first two of which are built on LSTM networks. The authors of [23] propose a self-attention-based machine theory of mind, assessing its superiority over five other state-of-the-art models using a quantile forecast evaluation metric as the loss function.
Moving to slightly longer horizons (24 hours) in the realm of STLF, a larger variety of models and techniques has been analyzed in the literature. RNNs remain highly relevant, forming the basis of more complex architectures such as Encoder-Decoder (ED) models. Reference [24] presents a performance comparison of four DL-based methods; after tuning the number of hidden layers, the GRU model with only one layer turns out to have the best performance. In [25], the compared models are tested on three synthetically generated time series plus three real-life datasets. The authors of [26] propose an LSTM model coupled with feature engineering based on Empirical Mode Decomposition (EMD).
In [29], an S2S model is proposed; additionally, two different types of attention (Bahdanau [21] and Luong [43]) are added to the S2S. The authors concluded that accuracy decreases as the prediction length increases, that additional layers do not improve the forecasting accuracy, and that overall S2S models with attention performed better than non-S2S models for all input lengths. A two-stage STLF model based on LSTM and Multi-Layer Perceptron (MLP) is presented in [35]. In [31], a model based on WaveNet is described and evaluated through a comparison with state-of-the-art ANNs. Bedi and Toshniwal [33] propose a framework (D-FED) to forecast electricity demand; it is based on LSTM with a novel moving-window-based Multi-Input-Multi-Output (MIMO) mapping approach.
Examples of Reinforcement Learning (RL) are proposed by [28] and [37]. In the literature, RL has also been applied to EVs to integrate driving behavior into the Energy Management System (EMS), as in [44], to manage the forecasted charging load in vehicle-based mobility-on-demand systems [45], and for the coordinated charging management of EVs based on local information [46]. Prophet-BiLSTM is employed in the day-ahead forecast of EV charging load in [47], where it performed better than transformer and DNN models. Reference [48] is a review on EV scheduling, clustering, and forecasting. Another review, on both load and occupancy day-ahead forecasts, is presented in [49], which covers direct and bottom-up approaches with both statistical and machine learning models. It finds that Machine Learning models are mostly preferred over probabilistic models. For example, the authors of [50] present a novel spatiotemporal Deep Learning network for traffic flow prediction, which is capable of modeling the periodicity of traffic flow data with a well-designed time embedding strategy, from 15 minutes to 1 hour. Regarding MTLF, reaching a monthly horizon, more complex model structures are proposed to counter the reduced accuracy that results from the longer look-ahead interval. A medium-term demand forecasting S2S model is presented in [39], while [41] proposes a trilinear deep residual network (TDResNet) structure. Finally, in [51], the authors propose an optimization procedure exploiting EV forecasts to schedule the recharging processes under dynamic power prices such as real-time pricing, time of use, critical peak pricing, and prime times refunds.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I BIBLIOGRAPHIC REVIEW ON ELECTRIC LOAD FORECAST SORTED BY THE INCREASING TIME HORIZON

In this study, we compare forecasting models employing RNNs for STLF with a prediction horizon of 24 hours. This approach aligns with similar investigations in the literature, as indicated in Table I. The method presented here can be similarly applied to charging stations, mirroring how Machine Learning has been employed to enhance sensors' intelligence [52]. In our work, the data used for training and testing the models consist of publicly available records from EV charging sessions [27]. To enhance the accuracy of the RNN models, we introduce an ED structure and incorporate an attention mechanism. The performance of these enhanced models is then directly compared with the baseline RNN model, with persistence as a benchmark. Additionally, we conduct a sensitivity analysis on the number of layers and the number of RNN neurons in each layer to evaluate their impact on the final forecasting results.

A. Encoder-Decoder/Sequence to Sequence
The power load forecast of the charging station is a Many-to-Many prediction: it takes as input the time series of power sampled at a certain time-step and tries to predict another sequence of values, and the two sequences may have different lengths. A proper way to handle a Many-to-Many prediction is an S2S model. S2S models are based on an ED architecture that consists of three components: an encoder, a context vector, and a decoder. Both the encoder and the decoder are constructed as a stack of several RNN units. The encoder takes a sequence as input and transforms it into a context vector, an element with a fixed shape that acts as a condensed representation of the input time series. The output of the encoder, the hidden state, is usually the state of the last RNN time step. The length of the vector depends on the number of RNN cells in the encoder. The context vector acts as the initial hidden state of the decoder, and it is also fed repeatedly as input to the decoder at each time step. The decoder interprets the context vector to make predictions for each time step required in the output.
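The data flow just described (encoder, fixed-size context vector, decoder fed with the repeated context) can be illustrated with a deliberately small sketch. It is not the model used in the paper: a plain tanh RNN cell is swapped in for the LSTM/GRU cells, the weights are random and untrained, and all names (`rnn_step`, `encode`, `decode`) are hypothetical. The aim is only to show the shapes involved for a 96-step input and a 96-step output.

```python
import math
import random


def rnn_step(x, h, w_x, w_h):
    """One tanh RNN step: h_new[j] = tanh(w_x[j] . x + w_h[j] . h)."""
    return [
        math.tanh(
            sum(wx * xi for wx, xi in zip(w_x[j], x))
            + sum(wh * hi for wh, hi in zip(w_h[j], h))
        )
        for j in range(len(h))
    ]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(inputs, w_x, w_h, units):
    """Run the encoder; the last hidden state is the context vector."""
    h = [0.0] * units
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h)
    return h


def decode(context, steps, w_c, w_h, w_out):
    """Decoder: state initialised with the context, which is also the
    (repeated) input at every output step."""
    s = list(context)
    outputs = []
    for _ in range(steps):
        s = rnn_step(context, s, w_c, w_h)
        outputs.append(sum(w * si for w, si in zip(w_out, s)))
    return outputs


rng = random.Random(0)
units, n_in, n_out = 4, 96, 96  # 96 steps of 15 minutes = 24 hours
enc_wx = rand_matrix(units, 1, rng)       # one input feature (power)
enc_wh = rand_matrix(units, units, rng)
dec_wc = rand_matrix(units, units, rng)
dec_wh = rand_matrix(units, units, rng)
dec_out = [rng.uniform(-0.5, 0.5) for _ in range(units)]

# Synthetic "previous day" profile, standing in for real measurements.
past_day = [[math.sin(2 * math.pi * k / n_in)] for k in range(n_in)]
context = encode(past_day, enc_wx, enc_wh, units)
forecast = decode(context, n_out, dec_wc, dec_wh, dec_out)
```

Whatever the input length, the context vector has a fixed size equal to the number of units, which is the limitation the attention mechanism of the next subsection addresses.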

B. Attention Mechanisms
Attention mechanisms were first introduced in 2014 by Bahdanau et al. [21], and a similar, modified version was later presented by Luong [43]. They were developed to solve the main issue that arises when dealing with S2S models: the inability of the network to remember long input sequences, due to the limited length of the context vector. In an ED model, at each time step of the decoder, important elements for the creation of the current output could come from any point of the input sequence, but the context vector is not capable of encapsulating the data "seen" at the beginning of the input. Attention mechanisms try to replicate the human brain's capability of focusing on small portions of the provided information at a time, paying attention only to those that are relevant to the required answer. The encoder functions exactly as in a simple Encoder-Decoder model, producing a hidden state $h_i$ for each time-step. The context vector is not constant but changes at each time step of the decoder; it is calculated as a weighted mean of all the hidden states of the encoder, as shown in (1):

$$c_t = \sum_{i=1}^{m} \alpha_{t,i}\, h_i \tag{1}$$

where $c_t$ is the context vector at the current decoder time-step $t$, $h_i$ is the encoder hidden state at input time-step $i$, and $h_m$ is the hidden state of the last input time-step, with $m$ being the length of the input sequence. The weight $\alpha_{t,i}$ is the alignment weight computed for the $i$-th time-step. Each $\alpha_{t,i}$ results from applying a softmax function to the attention scores $e_{t,i}$ to normalize them. The attention scores are computed through an alignment model $a(\cdot)$, an operation that scores how well the encoder hidden state $h_i$ matches the decoder hidden state $s_{t-1}$ at the previous time-step. Equations (2) and (3) illustrate how alignment scores and alignment weights are calculated:

$$e_{t,i} = a(s_{t-1}, h_i) \tag{2}$$

$$\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k=1}^{m} \exp(e_{t,k})} \tag{3}$$

The weights favor the hidden states that match the previous decoder state; in this way, the context vector pays attention only to the parts of the input that the alignment function considers relevant, solving the problem of the limited length.
The decoder takes as input the context vector, the previous decoder hidden state, and the previous output to compute the final prediction. The attention model described is the one proposed by Bahdanau [21]; in this case the alignment function is a single-layer perceptron, as shown in (4):

$$a(s_{t-1}, h_i) = v_a^{\top} \tanh\left(W_a s_{t-1} + U_a h_i\right) \tag{4}$$

where $W_a \in \mathbb{R}^{n \times n}$, $U_a \in \mathbb{R}^{n \times 2n}$, and $v_a \in \mathbb{R}^{n}$ are weights that are optimized during the training process.
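As a numerical illustration of the mechanism described above, the following sketch computes the alignment scores with a single-layer perceptron, normalizes them with a softmax, and forms the context vector as the weighted mean of the encoder states. The toy dimensions (n = 2, encoder states of size 2n = 4), the weight values, and the function names are assumptions chosen for readability; in a trained model the weights would be learned.

```python
import math


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_score(s_prev, h_i, w_a, u_a, v_a):
    """Perceptron alignment: v_a . tanh(W_a s_prev + U_a h_i)."""
    ws = matvec(w_a, s_prev)
    uh = matvec(u_a, h_i)
    return sum(v * math.tanh(a + b) for v, a, b in zip(v_a, ws, uh))


def attention_context(s_prev, enc_states, w_a, u_a, v_a):
    """Return (alignment weights, context vector) for one decoder step."""
    scores = [additive_score(s_prev, h, w_a, u_a, v_a) for h in enc_states]
    weights = softmax(scores)
    dim = len(enc_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, enc_states)) for k in range(dim)
    ]
    return weights, context


# Toy, hand-picked values: n = 2, encoder states of size 2n = 4, m = 3 steps.
w_a = [[0.5, -0.3], [0.2, 0.1]]
u_a = [[0.1, 0.2, -0.1, 0.0], [0.3, -0.2, 0.1, 0.4]]
v_a = [1.0, -0.5]
s_prev = [0.1, -0.2]
enc_states = [
    [0.2, 0.1, 0.0, -0.1],
    [0.5, -0.4, 0.3, 0.2],
    [-0.3, 0.6, 0.1, 0.0],
]
weights, context = attention_context(s_prev, enc_states, w_a, u_a, v_a)
```

Because the weights are positive and sum to one, each component of the context vector lies between the smallest and largest corresponding component of the encoder states.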

IV. CASE STUDY
Despite the growing popularity of EVs, the main issue when dealing with EV charging load forecasting is the scarcity of data. Only a limited number of sources provide complete data on EV load demand, and most of them are not publicly available. The case study proposed in this work is based on one EVSE site recording EV charging sessions: the JPL dataset from the Adaptive Charging Network (ACN).

A. JPL Dataset Description
The data collected by ACN have been made available to provide researchers working in the field of electric vehicle charging with a complete set of real data coming from existing facilities. For the purpose of this study, the records from the JPL (Jet Propulsion Laboratory) database are used. This site currently has 52 EVSEs, or charging stations, and it is only open to employees, making it a good representation of a typical workplace schedule. Only a subset of all the available data has been considered suitable, due to the anomalies associated with the pandemic; it includes the sessions from 31/12/2018 to 04/01/2020. The following values provided for each session are relevant to the study: session ID, station ID, connection time, disconnection time, done charging time, and time zone. The session ID is a unique code identifying the single charging event; in the same way, the station ID is a unique code associated with a charging spot. The connection time is the date and time at which the vehicle connected to the station; similarly, the disconnection time is the instant at which the EV disconnected from the charger, while the done charging time is the date and time at which the charging procedure actually stopped. The time zone is used to calculate the UTC time. The stations in the JPL parking lot are all close to one another and are used mainly by the site's employees, so the power curve of the whole parking lot is analyzed as aggregated power. The multi-step forecast of the power load is performed using Machine Learning models based on different Recurrent Neural Networks (RNNs) and architectures. The time horizon is 24 hours with 15-minute time-steps, for a total of 96 steps; the data of the previous day (96 steps back) is used as input to make predictions for the next day.
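The aggregated power curve of the parking lot can be sketched as follows. This is a hedged illustration rather than the authors' procedure: each session is represented by a hypothetical tuple (connection time, done charging time, delivered energy in kWh), the energy field is an assumption since it is not among the fields listed above, the average power of a session is taken as energy divided by charging duration, and a session contributes to a 15-minute bin in proportion to its overlap with that bin.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)


def aggregate_power(sessions, start, n_steps):
    """Sum the average charging power (kW) of all sessions per 15-min bin.

    sessions: iterable of (begin, done_charging, energy_kwh) tuples
              (hypothetical layout, see the assumptions in the text).
    """
    power = [0.0] * n_steps
    for begin, done, energy_kwh in sessions:
        hours = (done - begin) / timedelta(hours=1)
        if hours <= 0:
            continue  # skip sessions with no charging interval
        avg_kw = energy_kwh / hours
        for k in range(n_steps):
            bin_start = start + k * STEP
            bin_end = bin_start + STEP
            overlap = min(done, bin_end) - max(begin, bin_start)
            if overlap > timedelta(0):
                power[k] += avg_kw * (overlap / STEP)
    return power


day = datetime(2019, 10, 28)
sessions = [
    (day + timedelta(hours=8), day + timedelta(hours=10), 12.0),
    (day + timedelta(hours=9), day + timedelta(hours=9, minutes=30), 3.0),
]
profile = aggregate_power(sessions, day, 96)  # one day = 96 steps
```

With the proportional-overlap choice, the energy implied by the profile (sum of power times 0.25 h) equals the total energy of the sessions, which is a convenient sanity check.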

B. Pre-Processing on JPL Data
The data has been organized as a time series of 15-minute time steps, each associated with the sum of the average power required by each vehicle that was in charging mode during that time interval. The time series has been re-framed as an ML problem with input patterns (X) and output patterns (Y). The current time is denoted by (t), future time steps (t + 1, ..., t + N) are the ones to be forecasted, and past observations (t − 1, ..., t − n) are used to make forecasts. At each time step, X and Y are computed by shifting the original dataset n steps back and N steps forward. After the reshaping, the input X is a 3D matrix with dimensions (samples, n, f), and the corresponding output Y is a similar 3D matrix with dimensions (samples, N, F), where f and F are respectively the number of features of the input samples and of the ones to be forecasted. The lag l is the number of time-steps between the current time t and the first forecasted step t + l. The generation of training samples is schematized in Figure 2. After the reshaping, the input data is scaled between 0 and 1 using a MinMax scaler to achieve higher stability and shorter training times. Then the data is split into a training set and a test set; since this is a time-series problem, the sets have been split sequentially.
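The re-framing, scaling, and sequential split can be sketched in plain Python as below. The function names are hypothetical, and the indexing convention is an assumption: each input window holds the n most recent values up to and including time t, and the target window starts at t + lag. For a single feature, scaling before or after the windowing gives the same result, so the series is scaled first here for brevity.

```python
def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] given the minimum and maximum."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def make_windows(series, n, N, lag=1):
    """Build X with shape (samples, n, 1) and Y with shape (samples, N, 1)."""
    X, Y = [], []
    for t in range(n - 1, len(series) - lag - N + 1):
        X.append([[v] for v in series[t - n + 1 : t + 1]])
        Y.append([[v] for v in series[t + lag : t + lag + N]])
    return X, Y


def sequential_split(X, Y, train_fraction=0.8):
    """Split without shuffling, so the test set follows the training set."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], Y[:cut]), (X[cut:], Y[cut:])


# Tiny synthetic series; the paper's setting would use n = N = 96.
series = [float(v) for v in range(10)]
scaled = min_max_scale(series, min(series), max(series))
X, Y = make_windows(scaled, n=3, N=2, lag=1)
(X_train, Y_train), (X_test, Y_test) = sequential_split(X, Y, 0.5)
```

In a stricter setup one might compute the scaling bounds on the training portion only, to avoid leaking information from the test period; the text does not say which variant was used.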

C. Model Tuning and Evaluation on JPL Dataset
The models used to make predictions are all based on RNNs: an S2S model with LSTM/GRU layers has been implemented, and it is compared to a simple LSTM and a simple GRU. Additionally, an attention layer inspired by the one described by Bahdanau [21] is introduced in the ED model to check whether it can improve its performance. Tuning is performed on the models to identify the best architecture for each one. The hyperparameters undergoing tuning are the number of units in the RNN layers and the number of RNN layers. In the case of S2S structures, both the number of layers in the encoder and the number of layers in the decoder are varied. Three values are considered for the number of units in each layer: 32, 64, and 128; up to three layers are considered in simple models and two layers in the ED ones, for a total of 36 different models. The main characteristics of each model are summarized in Table II.
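The text does not give the breakdown of the 36 configurations. One decomposition that is consistent with the stated ranges, offered purely as an assumption, is: simple LSTM and GRU with one to three layers (18 models), ED LSTM and ED GRU with the same number of layers, one or two, in encoder and decoder (12 models), and the LSTM model with attention under the same ED ranges (6 models). The enumeration below only checks that arithmetic.

```python
from itertools import product

UNITS = (32, 64, 128)

# Assumed breakdown; the source only states the total of 36 models.
simple = [
    (cell, layers, units)
    for cell, layers, units in product(("LSTM", "GRU"), (1, 2, 3), UNITS)
]
enc_dec = [
    (f"ED {cell}", layers, units)
    for cell, layers, units in product(("LSTM", "GRU"), (1, 2), UNITS)
]
attention = [
    ("LSTM Attention", layers, units)
    for layers, units in product((1, 2), UNITS)
]
grid = simple + enc_dec + attention
```

Under these assumptions the grid has 18 + 12 + 6 = 36 entries; other breakdowns may also match the stated total.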
The first architectures to be analyzed in the simulations were the S2S models with LSTM (ED LSTM) and GRU (ED GRU). The built model has an encoder, a context vector, and a decoder. The encoder consists of one or more layers of LSTM or GRU, depending on the case. After the analysis of the ED models, their corresponding simple models were simulated to understand whether, and to what extent, the S2S model could improve the performance.
Finally, an attention mechanism is introduced in the S2S models (LSTM Attention) with the aim of assessing the performances with respect to the previously analyzed models.
The attention model has been built with LSTM, since it has shown better performance and higher stability in the previous cases. The employed attention layer is inspired by the one described by Bahdanau [21]. The dot alignment score is used; it is calculated as in (5):

$$e_{t,i} = h_i^{\top} s_{t-1} \tag{5}$$

where $h_i^{\top}$ is the transpose of the $i$-th hidden state in the encoder and $s_{t-1}$ is the previous hidden state in the decoder.
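A minimal numerical sketch of the dot alignment score follows. The values are toy numbers and the function names are hypothetical; the sketch also assumes that encoder and decoder hidden states have the same size, which the dot product requires.

```python
import math


def dot_score(h_i, s_prev):
    """Dot alignment score between an encoder state and the decoder state."""
    return sum(a * b for a, b in zip(h_i, s_prev))


def dot_attention(enc_states, s_prev):
    """Softmax-normalised dot scores and the resulting context vector."""
    scores = [dot_score(h, s_prev) for h in enc_states]
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(enc_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, enc_states)) for k in range(dim)
    ]
    return scores, weights, context


enc_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [2.0, 0.0]
scores, weights, context = dot_attention(enc_states, s_prev)
```

Here the first and third encoder states are aligned with the decoder state and receive equal, larger weights, while the orthogonal second state is down-weighted. Unlike the perceptron alignment, the dot score has no trainable parameters of its own.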
The benchmark used to evaluate the models is the simplest forecast baseline model, Naive Persistence (Persistence). It is based on the most trivial assumption that can be made when forecasting: it takes the value at the current time-step (t) and assigns it to the predicted value for the next step (t + 1). In our case, the value at time-step (t) is assigned to the time-step to be predicted 24 hours ahead (t + 96), as described in (6):

$$y_p(t + 96) = y(t) \tag{6}$$

where $y_p$ is the predicted value and $y$ the actual one.
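The persistence benchmark amounts to shifting the series by one day. A minimal sketch, with a hypothetical function name and `None` marking the first 96 steps for which no earlier value exists:

```python
def persistence_forecast(series, horizon=96):
    """Naive persistence: the forecast for step t is the value at t - horizon.

    The first `horizon` steps have no earlier observation and are left as None.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return [
        series[t - horizon] if t >= horizon else None
        for t in range(len(series))
    ]


# Two synthetic days at 15-minute resolution (2 x 96 steps).
two_days = [float(v) for v in range(192)]
predicted = persistence_forecast(two_days)
```

When scoring this baseline, only the steps with a defined forecast (from index 96 onward) should enter the error metrics.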

D. Performance Metrics
The performance of all the models is assessed using four different evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Symmetric Mean Absolute Percentage Error (SMAPE). They are among the most widely used performance metrics for time-series forecasting evaluation in the literature. MAE, MSE, and RMSE are scale-dependent errors, which means that the resulting error is on the same scale as the data; their values therefore cannot express the performance of a model in an absolute way, but only relative to other models on the same problem. SMAPE tries to solve this problem by giving a share instead of an absolute value, leading to a scale-independent error.
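For reference, the four metrics can be written in a few lines. The MAE, MSE, and RMSE definitions are standard; SMAPE has several variants in the literature, and the one below (absolute error divided by the mean of the absolute actual and predicted values, in percent, skipping terms where both are zero) is an assumption, since the paper's exact formula is not shown here.

```python
import math


def mae(y, y_hat):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Mean Squared Error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root Mean Squared Error."""
    return math.sqrt(mse(y, y_hat))


def smape(y, y_hat):
    """Symmetric MAPE in percent (one common variant, see the note above)."""
    total = 0.0
    for a, b in zip(y, y_hat):
        denom = (abs(a) + abs(b)) / 2
        if denom > 0:  # both values zero: the term contributes no error
            total += abs(a - b) / denom
    return 100 * total / len(y)


actual = [10.0, 0.0, 20.0, 30.0]
predicted = [12.0, 0.0, 18.0, 33.0]
metrics = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "SMAPE": smape(actual, predicted),
}
```

The zero-denominator handling matters for EV charging data, where the aggregated power is often exactly zero at night and on weekends.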
It should also be underlined that the training loss function is the MSE computed on data normalized between the minimum and maximum values of the time series, in order to speed up the training process; therefore, the training and test errors shown in Figures 3 and 4 are a normalized MSE (nMSE). The optimizer employed in the machine learning models is Adam. Finally, the proposed algorithms have been tested on a workstation at Politecnico di Milano, equipped with an Intel® Core™ i9-10900KF CPU with 10 cores and a base frequency of 3.7 GHz.

V. RESULTS
Table II reports the performance comparison of the most accurate models among those analyzed. They are listed according to their architecture (naive, simple, S2S, or mixed with the attention mechanism), model (Encoder-Decoder or not, LSTM, GRU, and Persistence), and number of layers and units per layer. In addition, the time elapsed during training is reported. The best-performing model includes the attention mechanism (LSTM Attention) and consists of an S2S architecture with a double-layer LSTM in both the encoder and the decoder, with 128 units in each layer. This model scored the best MAE, MSE, and RMSE, with a slight margin over the simple GRU and LSTM. However, these simple models suffer from poor stability of the error trend during training, compared to the corresponding models with attention mechanisms. This can be clearly seen in Figure 4, where the training error curve (in blue) of the LSTM with attention shows an almost monotonically descending trend, which is smoother than the corresponding S2S LSTM training curve in Figure 3. The LSTM Attention model shows one of the lowest SMAPE values in Table II; in terms of training time, it is not fast per training epoch compared to the others. This result is reasonable considering the higher number of trainable parameters needed for this model. However, it ran for 26 epochs before the training was stopped, which is fewer than the corresponding model without the Attention mechanism. After the tuning of the ED LSTM and the simple LSTM, it could be noticed that the simple model had outperformed the corresponding ED one, leading to the conclusion that a more complex model was not advantageous. The same observation could be made for the ED GRU and the simple GRU. However, the introduction of the attention layer, in particular in the LSTM Attention model, completely overturned these results, improving the performance of the ED LSTM: MAE decreased by 8%, MSE by 19%, RMSE by 10%, and SMAPE by 10%.

TABLE II PERFORMANCE COMPARISON OF THE BEST CASES FOR EACH FORECASTING MODEL
Some general considerations can be drawn from the analysis of the tuning procedure, which is shown in Figure 5. Regarding simple models, their performance metrics MAE, RMSE, and SMAPE tend to decrease at first when increasing the complexity of the model and then rise when reaching 128 units; both LSTM and GRU reach their best performance with the 1-layer, 128-unit model, as can be observed in Figure 5.
In ED structures, increasing the number of units and layers has different effects on LSTM and GRU, but both perform worse when reaching 128 units, similarly to the simple structures. The higher complexity obtained by increasing the number of layers in the encoder and the decoder improves the performance with 32 and 64 units, but not in the case of the biggest model with 128 units. In general, the increased complexity due to a higher number of units is not associated with an increase in forecast accuracy, while it does lead to longer training times. The model with attention is the only one that benefits from a high number of trainable parameters. Regarding the direct comparison between LSTM and GRU, the latter performs worse on average; this is less evident in the ED architectures, but GRU still consistently shows lower training stability, making LSTM more advisable.
Figure 6 shows the trend of the actual power and the trend of the forecasts from the LSTM Attention model (in red) and the ED LSTM (in green) during the week spanning from 28/10/2019 to 3/11/2019.
During this week, both models generally underestimated the actual time series. However, the LSTM Attention model learned the daily pattern more accurately and is capable of forecasting the start and the end of the daily load. It also learned the weekly pattern, being capable of identifying the start and the end of the working days. During the weekend the performance is lower than expected, because user behavior is significantly less periodic and therefore less predictable. This drop in performance is more pronounced on peculiar days such as national holidays.
Particularly low performance can be observed during the week from 25/11/2019 to 1/12/2019 (Figure 7); it can be easily attributed to the celebration of Thanksgiving Day that year, and the same is true for the Christmas holidays. This behavior is reasonably due to the lack of holiday "examples" in the training set, which therefore have a lower statistical significance. Finally, Figure 8 shows the comparison between the best simple LSTM model and the best ED LSTM one. The performance of the two models is comparable, with the ED being slightly worse than the simple LSTM; on the other hand, the simple LSTM is not able to properly predict values close to zero and during the weekend, and its training stability is much lower than that of the ED model (ED LSTM).

VI. CONCLUSION
This paper presents and applies the LSTM model with the attention mechanism to EV charging load forecasting 24 hours in advance. The model is applied to a real-world public electric vehicle supply equipment dataset, to test and compare its performance with other state-of-the-art Deep Learning models.
From the comparison with the Naive Persistence benchmark, the best-performing model in terms of MAE, MSE, and RMSE is the proposed LSTM with attention mechanism. In addition, its SMAPE is comparable to that of the simple LSTM and GRU models, but it showed better stability in terms of training error trend.
More in detail, the increased effectiveness of higher complexity models compared to simpler RNN architectures and the key role of the attention mechanism in improving S2S results are here inspected.Finally, the tuning of hyperparameters and their influence on forecasting performances is critically analyzed.
The obtained results highlight how the presence of the attention mechanism represents a turning point in EV charging load forecast models justifying the increased complexity of these models.
A notable limitation of this study is its dependence on the availability of limited data, thereby constraining the extent to which validation can be achieved.The lack of accessible open and expansive datasets impedes the progress of scientific research in this domain.However, further studies are directed towards enhancing the precision of forecasting models through the incorporation of supplementary inputs.
In future work, other inputs could be fed to the model to complete the available information. In particular, exogenous variables that could affect the periodic influx of electric vehicles and the normal operation of the relevant supply equipment are of particular interest, such as weather forecast, ambient temperature, and rainfall. Furthermore, inspecting the influence of peculiar days affecting EV users' behavior, such as holidays, mass events, maintenance, or roadworks, would be extremely useful to better model the occupancy trend of the EV charging station and the overall associated power supplied. These events occur occasionally and, even if they are generally planned well in advance by the municipalities, they are very difficult to detect and predict without prior knowledge, because the Machine Learning model is data-driven and experience-based.
In conclusion, the presence of additional data would allow us to build a more complex model, with several trainable parameters of different types (numeric, boolean, etc.), improving the final electric vehicle supply equipment day-ahead power forecast accuracy.

Fig. 1. Context vector comparison between Classic ED and Attention ED.

Fig. 2. Input and Target represented as 3D matrices after the reshaping procedure.

Fig. 3. Trend of the nMSE during the train (blue) and test (orange) of the best performing model (LSTM Attention).

Fig. 4. Trend of the nMSE during the train (blue) and test (orange) of the best performing model (ED LSTM).

Fig. 5. Trend of the error metrics used to evaluate the models (MAE, RMSE, and SMAPE) as a function of the complexity of the models during the tuning procedure; the number of layers and the number of units within the layers increase along the x axis.

Fig. 6. Trend comparison between the actual total power of all EVSEs in the JPL dataset and the total power forecasted by the best model (LSTM Attention) and the direct competitor model (ED LSTM), during a typical week.

Fig. 7. Trend comparison between the actual total power of all EVSEs in the JPL dataset and the total power forecasted by the best model (LSTM Attention) and the direct competitor model (ED LSTM), on the Thanksgiving weekend.

Fig. 8. Comparison between the trend of the actual total power of all EVSEs in the JPL dataset and the total power forecasted by the simple LSTM model and the ED LSTM on a typical week.