Customized Uncertainty Quantification of Parking Duration Predictions for EV Smart Charging

As electric vehicle (EV) demand increases, so does the demand for efficient smart charging (SC) applications. However, SC is only acceptable if the EV user’s mobility requirements and risk preferences are fulfilled, i.e., their respective EV has enough charge to make their planned journey. To fulfill these requirements and risk preferences, the SC application must consider the predicted parking duration at a given location and the uncertainty associated with this prediction. However, certain regions of uncertainty are more critical than others for user-centric SC applications, and therefore, such uncertainty must be explicitly quantified. Therefore, this article presents multiple approaches to customize the uncertainty quantification of parking duration predictions specifically for EV user-centric SC applications. We decompose parking duration prediction errors into a critical component, which results in undercharging, and a non-critical component. Furthermore, we derive quantile-based security levels that can minimize the probability of a critical error given a user’s risk preferences. We evaluate our customized uncertainty quantification with four different probabilistic prediction models on an openly available semi-synthetic mobility data set and a data set consisting of real EV trips. We show that our customized uncertainty quantification can regulate critical errors, even in challenging real-world data with high fluctuation and uncertainty.

MITIGATING climate change is a major global challenge and, as observed in our previous conference paper [1], Electric Vehicles (EVs) could play a major role in reaching important climate targets, especially when charged by a highly renewable energy mix [2]. However, coupling this energy mix with an increased share of EVs causes new strains on our electrical system and can lead to grid instabilities [3]. As a result, coordinated and intelligent charging approaches, so-called Smart Charging (SC), of EVs are required [3]-[5]. These intelligent charging approaches involve integrating EVs into a smart Internet of Things (IoT) electrical grid, enabling bi-directional communication to manage power flow, and optimizing charging schedules [6]. However, these SC approaches cannot be allowed to inconvenience the user by, e.g., resulting in extra charging stops due to insufficient state of charge or forcing the user to charge at an unknown destination. Therefore, SC applications can only be successfully applied if information regarding a user's mobility behavior is available and integrated into the SC application [7]. This mobility behavior includes common destinations, travel frequency, distance traveled, and how long a user stays at a specific location [7]. Additionally, for SC to be fully accepted, this mobility behavior should be integrated into the SC application without the user manually feeding parameters, such as planned parking duration, next destinations, or risk preferences, into the SC algorithm [8]. As a result, such mobility behavior must be automatically predicted.
Although mobility behavior typically follows certain patterns, there is always an aspect of this behavior that is random and unpredictable [9]. For example, an EV user may leave for work every morning at a regular time, but due to fluctuating traffic conditions or unforeseen vehicle problems, the trip duration varies [10], [11]. Similarly, fluctuations in parking duration may be caused by external factors, such as a varying meeting schedule or after-work commitments [10]. The amplitude of these fluctuations depends on the individual EV user and their typical mobility habits [9]. Furthermore, this individuality also extends to a user's risk preference, e.g., some users may be willing to sacrifice a fully charged EV for a flexible SC schedule to maximize profit [12].
As a result, any prediction of a user's mobility behavior must quantify this uncertainty, account for a user's individual risk preferences, and integrate this information into the SC algorithm. However, a general uncertainty quantification with a probabilistic prediction may not be sufficient to enable SC acceptance. Not only does a general probabilistic prediction ignore a user's individual risk preferences, such predictions also treat all regions of uncertainty as equal. However, for a user-centric SC application, the uncertainty resulting in undercharging, i.e., the EV leaves earlier than expected, is more important than the uncertainty that results in a later than expected departure time. Therefore, a customized uncertainty quantification that specifically quantifies uncertainty for SC applications is required. Furthermore, this uncertainty quantification must account for known and frequently visited locations and allow for different levels of uncertainty in these locations. With such a customized uncertainty quantification and location information, stochastic SC algorithms can be applied to improve SC performance [13]-[15].
Thus, the present paper extends our previous work [1] and presents a methodology for customizing the uncertainty quantification of an EV user's predicted mobility requirements specifically for SC applications. More precisely, the present paper focuses on quantifying the uncertainty for predictions of the time a user spends in a given location, i.e., the parking duration [15]. Our methodology first consists of data preprocessing to derive known and frequent locations for EV charging and to obtain two prediction labels: parking duration and departure time. We then create a general quantification of the uncertainty associated with these labels through probabilistic predictions. Given these probabilistic predictions, we perform a further customized uncertainty quantification to determine critical errors resulting in undercharging and non-critical errors. Furthermore, we define quantile-based security levels that can be used to minimize the probability of the EV being undercharged, given a user's individual risk preferences. We evaluate our methodology with four different probabilistic prediction models on an openly available semi-synthetic benchmark data set with reduced uncertainty previously presented in [1] and a real-world mobility data set from a single user.
The rest of the present paper is structured as follows. Section II considers existing literature and highlights the identified research gap. In Section III, we present our methodology for customizing uncertainty quantification in parking duration predictions. We discuss the case study used in Section IV before reporting all results in Section V. We analyze and discuss these results in Section VI, before concluding and providing an overview of future work in Section VII.

II. RELATED WORK
As observed in our previous conference paper [1], few researchers have focused on parking duration prediction for a single user [16]. Instead, most research has forecast the demand of an EV either at a charging station, parking lot, or for a fleet of vehicles [27]-[31]. Furthermore, when considering probabilistic predictions, almost all work focuses on electric vehicle charging demand [20]-[26] and does not consider the associated user-specific parking duration.
Considering arrival and departure time, machine learning methods are used to predict these labels in [15]. However, the paper's main focus is on the effects of this prediction on the scheduling and not on the accuracy of the prediction itself. Furthermore, [15] focuses on deterministic departure time predictions for a single location (a workplace) and a fleet of vehicles without quantifying the associated uncertainty. Uncertainty in parking duration and energy demand is considered in [16] via quantile predictions for both quantities. These predictions, however, are only performed for a single location, i.e., the home location [16]. Furthermore, the quantile predictions do not customize the uncertainty quantification for SC applications. The first daily departure time is predicted in [17], and whilst prediction intervals based on an assumed Gaussian distribution of the errors are created, the uncertainty associated with these intervals is not specifically quantified for SC applications. Mobility prediction for many vehicles is considered in [18] to analyze effects on a distribution grid; however, only deterministic predictions are considered. A review of scheduling, forecasting, and clustering strategies for EV charging is provided in [4], focusing on typical scheduling problems and coordinating the charging of multiple vehicles. Whilst probabilistic methods are discussed in [4], they again focus on EV charging demand or EV charging scheduling and do not consider the individual user's parking duration. Further deterministic mobility predictions are considered for a fleet of vehicles in [19] and in the form of next-place prediction in [32]-[34].
As shown in Table I, none of the above papers specifically quantifies the uncertainty in parking duration predictions for user-centric SC applications. Most papers are deterministic and only focus on a single location, or the locations are pre-labeled and not provided as GPS coordinates. When uncertainty is included, it is limited to a single location, only used to compare different predictions, and not customized specifically for use in SC applications. Therefore, we identify a clear requirement for customized uncertainty quantification of parking duration predictions designed specifically for SC applications.

III. METHODOLOGY
Our methodology for customizing the uncertainty quantification of parking duration predictions specifically for SC applications is shown in Figure 1. The first step is data preprocessing, which involves cleaning the data, performing spatial clustering to determine key locations, and data engineering to calculate the prediction labels. The second step creates a general quantification of the uncertainty through probabilistic predictions. Then, in the third step, we customize this uncertainty quantification specifically for SC applications. The final step, which is not considered in the present paper, is merging the probabilistic predictions for both labels and integrating this uncertainty into the SC application. In this section, we describe each of the first three steps in detail.¹

A. Data Preprocessing
The data preprocessing includes three steps: data cleaning, spatial clustering, and data engineering. In the following, we describe these steps and provide an overview of the associated hyperparameters in Table II.

¹ An overview of the notation used in all mathematical equations is provided in the Notation section at the beginning of the paper.
Data Cleaning: The first step in the preprocessing chain is data cleaning. The data cleaning initially involves removing all trips with measurement errors, for example, trips with invalid GPS locations or corrupt time stamps. After removing invalid trips, we calculate the parking duration, i.e., how long the vehicle is stationary before the next trip begins. We then remove all trips shorter than 15 s, since we assume these to be measurement errors and unrealistic trip times. Finally, we aim to focus on parking durations relevant for SC applications. For a parking time of less than 2 h, there is not enough flexibility to enable SC. On the other hand, if the duration exceeds 24 h, there is too much flexibility, meaning SC is trivial. Therefore, we filter the data to only include parking durations between 2 h and 24 h.
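The duration calculation and the 2 h-24 h filter described above can be sketched with pandas; the column names (`arrival`, `next_departure`) are illustrative assumptions, not the names used in our implementation.

```python
import pandas as pd

def clean_trips(trips: pd.DataFrame) -> pd.DataFrame:
    """Remove invalid trips and keep only SC-relevant parking durations.

    Assumes one row per trip with 'arrival' and 'next_departure'
    timestamps (hypothetical column names).
    """
    trips = trips.dropna(subset=["arrival", "next_departure"])
    # Parking duration: time the vehicle is stationary before the next trip.
    parking_h = (trips["next_departure"] - trips["arrival"]).dt.total_seconds() / 3600
    trips = trips.assign(parking_duration_h=parking_h)
    # Keep only parking durations between 2 h and 24 h (SC flexibility window).
    return trips[trips["parking_duration_h"].between(2, 24)]
```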
Spatial Clustering: Given clean data, the next aspect of preprocessing is spatial clustering. Spatial clustering is necessary to determine key locations and account for small fluctuations in GPS coordinates. These fluctuations can occur when the destination is the same, but the exact parking location is slightly different, e.g., a different parking spot at the same supermarket. The spatial clustering consists of two steps. In the first step, we apply standard DBSCAN [36] to the GPS parking location of all trips. Although this initial clustering generates several suitable clusters, it creates multiple clusters less than 500 m apart. Therefore, in the second spatial clustering step, we join clusters whose centroids are less than this predefined threshold of 500 m apart. Such clusters count as a single location for our charging purpose, since a user would either walk between them (without using the EV) or, if they choose to drive, the energy consumption is negligible. Once these clusters are determined, we consider the noise points that do not belong to any location cluster. We assign a location labeled as noise to a given cluster if it is within a predefined neighborhood radius of 300 m. We assume this neighborhood radius is a reasonable distance for users to walk when parking at a known location. Finally, we label the clusters according to the frequency of their occurrence, i.e., the cluster that contains the most data points is "cluster 1", that with the second-most data points "cluster 2", and so on. We also label the data points considered as noise with "-1".
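The two-step clustering can be sketched with Scikit-Learn's DBSCAN; the DBSCAN hyperparameters below are illustrative assumptions, and the centroid-merging step uses a rough planar distance (sufficient at the sub-kilometer scale of the 500 m threshold).

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_locations(coords_deg, eps_m=150, min_samples=5, merge_m=500):
    """Two-step spatial clustering sketch (eps_m/min_samples are assumptions).

    coords_deg: (n, 2) array of (lat, lon) in degrees.
    Returns one label per point; -1 marks noise.
    """
    earth_r = 6_371_000  # meters
    # Step 1: DBSCAN with the haversine metric (expects radians).
    labels = DBSCAN(eps=eps_m / earth_r, min_samples=min_samples,
                    metric="haversine").fit_predict(np.radians(coords_deg))
    # Step 2: join clusters whose centroids are closer than merge_m.
    ids = [c for c in np.unique(labels) if c != -1]
    centroids = {c: coords_deg[labels == c].mean(axis=0) for c in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Rough planar distance (ignores the cos(lat) scaling of longitude).
            d = np.radians(centroids[a] - centroids[b]) * earth_r
            if np.hypot(*d) < merge_m:
                labels[labels == b] = a
    return labels
```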
In the present paper, we only consider the eight most frequently visited locations for both the training and evaluation of the parking duration prediction. All trips with unknown end locations, i.e., those trips assigned to the noise cluster, are removed. This decision is made because a SC application is not possible if the location, and as a result the available charging infrastructure, is unknown. Therefore, we only predict parking duration for commonly visited and known locations.
Data Engineering: We engineer specific features from the end time of each trip which are designed to provide useful information for the parking duration prediction. These additional features are shown and explained in Table III.
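Time-based features of the kind listed in Table III can be derived from the trip end time as follows; the exact feature set shown here is an illustrative assumption.

```python
import pandas as pd

def add_time_features(trips: pd.DataFrame) -> pd.DataFrame:
    """Engineer illustrative time features from each trip's end time ('arrival')."""
    t = trips["arrival"]
    return trips.assign(
        hour=t.dt.hour,                 # hour of day of arrival
        weekday=t.dt.weekday,           # 0 = Monday
        is_weekend=t.dt.weekday >= 5,   # Saturday/Sunday flag
        month=t.dt.month,               # coarse seasonal information
    )
```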
We also generate labels for the parking duration prediction. We select two separate labels for prediction, i.e.,

Label A (Parking Duration): The time delta between arriving at the current location and departing for the next destination.

Label B (Departure Time): The point in time at which the next departure will occur, given the current location.

B. General Uncertainty Quantification
We create a general uncertainty quantification for parking duration predictions through probabilistic forecasts. Before defining such probabilistic forecasts, it is important to understand that deterministic predictions implicitly include uncertainty. Such deterministic predictions are formally defined as conditional expectations given the available information [37], i.e.,

ŷ = E[Y | X] = g(X; Θ),     (1)

where ŷ is the predicted expected value, Y a random variable modeling the parking duration y, g an arbitrary prediction model with estimated parameters Θ, and X the exogenous variables used in the prediction. However, whilst this prediction is an expected value that implicitly indicates the presence of variance and uncertainty, deterministic predictions fail to quantify this uncertainty. Therefore, probabilistic predictions actively quantify the underlying uncertainty. For probabilistic predictions, we consider y as a realization of the random variable Y ∼ f_y, with the Probability Density Function (PDF) f_y and the Cumulative Distribution Function (CDF) F_y. Instead of a single expected value, we predict the entire PDF f_y or a subset of information that conveys the uncertainty contained in f_y [37].
Since the information contained in a PDF or CDF is difficult to interpret without expert knowledge [38] and also difficult to integrate into SC algorithms, we create probabilistic predictions in the form of quantile predictions and prediction intervals. We describe both of these prediction forms in the following.
Quantile Prediction: A quantile prediction ŷ(α), with nominal level α, is a point prediction with the probability α that the observation y is smaller than the quantile prediction ŷ(α) [37], i.e.,

P(y ≤ ŷ(α)) = F_y(ŷ(α)) = α.     (2)
For example, with α = 0.5, the probability of the observation being smaller than the quantile prediction is 50%, equivalent to the median prediction. Whilst the quantile prediction is a point prediction, we can predict multiple quantiles and later combine these predictions to convey information on the resulting PDF or CDF.
Prediction Interval: A prediction interval Î(β), with nominal coverage rate 1 − β, is a range of potential values with the probability 1 − β of the observation y being contained in this range [37], i.e.,

P(y ∈ Î(β)) = 1 − β.     (3)
Usually, these prediction intervals are formed by considering the range between two quantile predictions, i.e.,

Î(β) = [ŷ(α̲), ŷ(ᾱ)],     (4)

where α̲ is the lower quantile, and ᾱ the upper quantile. To ensure the prediction interval is centered on the PDF, we select symmetrical quantiles around the median, i.e.,

α̲ = β/2,   ᾱ = 1 − β/2.     (5)

Ideal prediction intervals include (1 − β)% of the observations.

Fig. 2: A schematic representation of two options to customize the uncertainty quantification of parking duration predictions specifically for SC applications. The error outside the prediction interval, shown in red in (a), assumes that the SC application is only at a disadvantage if the EV departs at a time outside the given prediction interval and thus only penalizes observations outside this interval (see Equation (6)). The critical and non-critical error decomposition, shown in (b), makes a further distinction between predictions that overestimate the parking duration, resulting in the EV leaving earlier than expected and possibly undercharged (critical errors, shown in blue), and predictions that underestimate the parking duration (non-critical errors, shown in green). This error decomposition is introduced in Equation (9).
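Forming such a symmetric interval from a set of predictive samples can be sketched in a few lines of NumPy; the empirical-quantile approach here is an illustrative assumption.

```python
import numpy as np

def prediction_interval(samples: np.ndarray, beta: float = 0.1):
    """Symmetric (1 - beta) prediction interval from predictive samples.

    Uses the beta/2 and 1 - beta/2 empirical quantiles so the interval
    is centered on the predictive distribution.
    """
    lower = np.quantile(samples, beta / 2)
    upper = np.quantile(samples, 1 - beta / 2)
    return lower, upper
```

For β = 0.1, this yields the 90% interval between the 0.05 and 0.95 quantiles.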

C. Customized Uncertainty Quantification
Whilst the above-introduced probabilistic predictions quantify the uncertainty of parking duration predictions, they fail to account for regions of uncertainty that may be critical for user-centric SC applications. Therefore, we now introduce three options to customize the uncertainty quantification specifically for SC applications, namely the error outside the prediction interval, an error decomposition, and security levels. We describe each of these options in the following.
Error Outside the Prediction Interval: The first option to customize uncertainty quantification is to compare the error outside of the prediction interval with the width of that prediction interval. The idea behind this quantification is shown in Figure 2 (a). More precisely, we define the error outside of the prediction interval E_t,PI as

E_t,PI = max(0, ŷ_t(α̲) − y_t) + max(0, y_t − ŷ_t(ᾱ)),     (6)

for trip t, with the lower and upper quantile predictions for that trip ŷ_t(α̲) and ŷ_t(ᾱ), respectively. This quantification assumes that the only errors relevant for SC applications are those outside a given prediction interval, i.e., the red sections in Figure 2 (a). Therefore, the SC application is only at a disadvantage if the EV departs at a time that is not included in the predicted interval. The E_t,PI is particularly useful when combined with the average width W of the prediction interval, defined as

W = (1/N) Σ_{t=1}^{N} (ŷ_t(ᾱ) − ŷ_t(α̲)),     (7)

for N considered trips. By jointly considering E_PI and the width of the prediction interval, a SC application can manage the trade-off between a range of possible parking durations and the possible error.
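Equations (6) and (7) translate directly into vectorized NumPy; a minimal sketch:

```python
import numpy as np

def interval_errors(y, lo, hi):
    """Mean error outside the prediction interval and average interval width.

    y:  observed parking durations, shape (N,)
    lo: lower quantile predictions, hi: upper quantile predictions.
    """
    # Penalize only observations falling outside [lo, hi] (Equation (6)).
    e_pi = np.maximum(0, lo - y) + np.maximum(0, y - hi)
    # Average interval width over all trips (Equation (7)).
    width = np.mean(hi - lo)
    return e_pi.mean(), width
```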
Error Decomposition: Whilst the combination of the error outside the prediction interval and the width of the prediction interval is useful for SC applications, it assumes that errors on both sides of the prediction interval are equal. However, in the case of a user-centric SC application, predictions that overestimate the parking duration, i.e., they predict the EV will depart later than it actually does, are far more problematic. Such predictions could lead to an incomplete charging cycle and an EV that cannot reach its destination without additional charging stops.
Therefore, we define the critical and non-critical errors for a user-centric SC application, shown in Figure 2 (b). Beginning with the deterministic case, we consider the total mean error for a parking duration prediction as the mean absolute error between the prediction and actual parking duration,

E = (1/N) Σ_{t=1}^{N} |ŷ_t − y_t|,     (8)

for each considered trip t = 1, …, N. With this definition, we define the critical and non-critical components by rewriting Equation (8) as

E = (1/N) Σ_{t=1}^{N} (ŷ_t − y_t) · 1(ŷ_t > y_t) + (1/N) Σ_{t=1}^{N} (y_t − ŷ_t) · 1(ŷ_t < y_t) = E_c + E_nc,     (9)

where E_c is the critical error for the parking duration prediction, E_nc the non-critical error, and 1 the indicator function.
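The decomposition in Equation (9) can be sketched as follows; by construction, the two components sum to the mean absolute error.

```python
import numpy as np

def decompose_error(y_pred, y_true):
    """Decompose the mean absolute error into critical and non-critical parts.

    Critical: the parking duration is overestimated (the EV leaves earlier
    than predicted and may be undercharged).
    """
    diff = y_pred - y_true
    e_c = np.mean(np.where(diff > 0, diff, 0.0))    # overestimation (critical)
    e_nc = np.mean(np.where(diff < 0, -diff, 0.0))  # underestimation (non-critical)
    return e_c, e_nc
```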
With this definition of critical and non-critical errors, we can customize the uncertainty quantification to minimize a certain error, as shown in Figure 2 (b); i.e., we aim to minimize the probability of a critical error occurring whilst ensuring that the total non-critical error remains under a given threshold T. This threshold is, however, highly dependent on a user's individual risk preferences, and therefore a method is required to determine appropriate thresholds.
Security Levels: To help identify an appropriate threshold, we define quantile-based security levels that can be used to minimize the critical error depending on an individual EV user's risk preferences. Per definition, a quantile prediction ŷ(α) ensures that the probability of the observation y being smaller than the quantile prediction is α. For example, a quantile prediction with α = 0.9 should be larger than the observation 90% of the time. Regarding parking duration prediction, this concrete example would also result in a critical error 90% of the time. Therefore, there is a clear mathematical relationship between quantile predictions and the chance of a critical error. To take advantage of this relationship, we define security levels (SL) at level η as

SL(η) = ŷ(1 − η),

which reduce the critical error E_c with increasing η. With this definition, a security level of η should guarantee that for η% of the observations, only non-critical errors occur. Whilst security levels do not determine a user-specific threshold T, they provide a general starting point that can be used to determine approximate thresholds given a user's risk preferences.
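Given predictive samples (or predicted quantiles), applying a security level amounts to planning with the (1 − η) quantile of the predicted parking duration; a minimal sketch:

```python
import numpy as np

def security_level(samples: np.ndarray, eta: float) -> float:
    """Return the parking duration to plan with at security level eta.

    Planning with the (1 - eta) quantile means the true parking duration
    exceeds the planned value (a non-critical error) for ~eta of the
    observations.
    """
    return float(np.quantile(samples, 1 - eta))
```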

IV. CASE STUDY
We evaluate our methodology for customizing the uncertainty quantification of parking duration predictions specifically for SC applications on two data sets and with four probabilistic prediction models.In this section, we introduce these data sets and prediction models before explaining the evaluation metrics applied.

A. Data
To evaluate the proposed approach, we consider two data sets: an openly available semi-synthetic data set with reduced uncertainty that we created for [1], and a real data set based on two years of real mobility behavior that contains the full uncertainty of an EV user.
Semi-Synthetic Data: We generate a semi-synthetic data set to create data representing a typical and predictable EV user.The semi-synthetic data set aims to replicate real user behavior, excluding unpredictable events that cannot be accounted for in the optimization process; for example, randomly visiting an unknown location.The semi-synthetic data set thus contains reduced uncertainty compared to the real data.
To achieve this goal, the semi-synthetic data set contains eight locations representing the eight most commonly visited places for a real EV user. Furthermore, there is no "noise" in this semi-synthetic data set, as we assume only known locations are visited. To generate the semi-synthetic data set, we take real travel times between locations from a routing service and multiply them with a normally distributed random factor k ∼ N(1, 0.05) to account for stochastic fluctuations in travel times. Given these trip times, we generate semi-synthetic sequences of trips, including time and location scatter with normally distributed offsets, to replicate the temporal and spatial variation in the trips. Furthermore, the trip sequences include recurrences on four levels (daily, weekly, monthly, and seasonal) and two random trips per week to the grocery store, each occurring with a probability of 50%. As a result, the semi-synthetic data set still includes uncertainty, but unexpected events or trips to unknown locations are removed.

Real Data: The real data is recorded from a personal vehicle over two years, from 2018 to 2019. During these two years, the vehicle was only used by a single person and was also the prioritized means of transport during this time frame. The vehicle was equipped with an onboard computer to record trip data and parking duration. This data was then communicated via an IoT system (using the MQTT protocol) to a backend database for storage (see also [6]). The final data set in the database consists of 2906 trips. Each trip includes GPS coordinates (which we use for spatial clustering) and timestamps (which we use to calculate the time-dependent features and labels). It is important to note that this real data set only mirrors the behavior of one individual and is therefore not necessarily representative of other EV users. Furthermore, since this real data is collected from an individual using their EV to fulfill their personal mobility requirements, the data set contains all uncertainty associated with an EV user.

B. Probabilistic Prediction Models
We use probabilistic predictions to generally quantify the uncertainty associated with parking duration predictions. When selecting probabilistic prediction models, we focus on robust models that are computationally inexpensive, openly available, and proven to perform well. Therefore, we exclude complex deep learning-based regression models that rely on extensive automated feature extraction, are computationally expensive to train, and are not openly available, e.g., [39]-[44]. Furthermore, since there is no clear correlation between successive trips, we also exclude time-series-based prediction models that consider auto-regressive terms, e.g., [45]-[48].
Based on our selection criteria, we identify four probabilistic prediction models, shown in Table IV: Bayesian Ridge Regression (BRR), Gaussian Process Regression (GPR), Natural Gradient Boosting (NGBoost), and a quantile regression Neural Network (NN). Additionally, for each of these models, we consider a location-dependent ensemble similar to our previous work [1]. The following briefly describes the general idea behind these probabilistic prediction methods and how the location ensemble is created. We refer to the existing literature for detailed mathematical descriptions of the applied models and present an overview of the used hyperparameters in Table V.
Bayesian Ridge Regression: The simplest probabilistic prediction model we apply is BRR. BRR is a Bayesian statistics approach to linear regression that incorporates prior distributions over the model parameters to regularize the estimates [50], [51]. Assuming a linear relationship between the input features and the parking duration target, BRR uses a prior distribution over the coefficients to quantify uncertainty. Since this assumed prior is a parametric distribution, BRR is classified as a parametric prediction method. We implement the BRR in Python [52] using Scikit-Learn [35] and assume the prior distribution to be a spherical Gaussian. For detailed information regarding BRR, we refer to [53] and [54].
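A minimal sketch of such a BRR-based probabilistic prediction with Scikit-Learn; the synthetic features (e.g., arrival hour) and data-generating process are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(42)
# Illustrative features (e.g. arrival hour, weekday) and parking durations in hours.
X = rng.uniform(0, 24, size=(200, 2))
y = 4 + 0.3 * X[:, 0] + rng.normal(0, 0.5, size=200)

model = BayesianRidge().fit(X, y)
# return_std=True yields the predictive standard deviation, i.e. a
# parametric (Gaussian) quantification of the uncertainty.
mean, std = model.predict(X[:5], return_std=True)
```

Quantiles for the security levels of Section III-C then follow from the predicted Gaussian, e.g., mean + z·std for the corresponding z-score.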
Gaussian Process Regression: Another simple probabilistic prediction model is GPR. The GPR is also based on Bayesian statistics; however, instead of assuming a specific distribution for the prior, it assumes a Gaussian process prior [55]. A Gaussian process is a collection of random variables of which any finite number has a joint Gaussian distribution [56]. As a result, GPR calculates the probability distribution over all admissible functions that fit the data and can therefore model complex non-linear relationships. The considered Gaussian process is defined by a mean function and a covariance function, with the uncertainty quantified by the covariance function. We implement the GPR in Python [52] using Scikit-Learn [35] with a Gaussian process prior with a constant mean equal to that of the training data, and a radial basis function kernel with a length-scale parameter equal to 1. For detailed information regarding GPR, we refer to [56].
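A comparable GPR sketch with Scikit-Learn; here `normalize_y=True` is used as one way to obtain the constant training-data prior mean, `optimizer=None` keeps the length scale fixed at 1, and `alpha` models observation noise — all illustrative assumptions for the synthetic data below.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(100, 1))
y = 4 + 0.3 * X[:, 0] + rng.normal(0, 0.2, size=100)

# RBF kernel with length scale 1; optimizer=None keeps it fixed.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                               alpha=0.04, normalize_y=True,
                               optimizer=None).fit(X, y)
mean, std = gpr.predict(X[:5], return_std=True)  # posterior mean and std
```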
Natural Gradient Boosting: The third probabilistic prediction model is NGBoost, proposed by Duan et al. [57]. NGBoost applies gradient boosting [58] to optimize a probabilistic loss function. More specifically, NGBoost uses multiparameter boosting and natural gradients to estimate the parameters of an assumed parametric probability distribution. NGBoost is based on an arbitrary deterministic base learner capable of modeling complex non-linear relationships [57]. In the training process, a separate base learner is trained for each parameter of the selected probability distribution using natural gradient boosting to minimize a proper scoring rule [57], such as the logarithmic score or continuous ranked probability score [38]. We implement NGBoost in Python [52], with the same random-forest base learner as in our previous work [1] using Scikit-Learn [35], and the NGBoost [57] Python package. We assume a Gaussian distribution [57] and apply the logarithmic proper scoring rule [38] for training. For detailed information regarding NGBoost and gradient boosting in general, we refer to [57] and [58].
Quantile Regression Neural Network: The final probabilistic prediction model is a simple feed-forward quantile NN [59]. A quantile NN is trained like any feed-forward NN with gradient back-propagation; however, this training is designed to directly approximate a given target quantile. This approximation is achieved by training a NN with the pinball loss, a proper scoring rule whose minimization results in optimal quantile predictions [60]. For a given quantile α, the pinball loss L_pinball,α is defined as

L_pinball,α(ŷ(α), y) = α · (y − ŷ(α))        if y ≥ ŷ(α),
L_pinball,α(ŷ(α), y) = (1 − α) · (ŷ(α) − y)  otherwise,

where ŷ(α) is the prediction for the α quantile, and y the observation [60]. In the present paper, we train multiple NNs to predict multiple quantiles and combine these predicted quantiles by sorting overlapping quantiles to achieve a non-parametric approximation of the full probability distribution. We implement the quantile feed-forward NN with two hidden layers of 100 and 50 neurons, respectively. The hidden layers both use the rectified linear unit (ReLU) activation function [61], whilst the output layer takes a linear activation function. Similar to [62], we predict 99 quantiles α ∈ {0.01, …, 0.99} with individual NNs. The NN is implemented in Python [52] using TensorFlow [63] with Keras [64]. For detailed information on quantile neural networks, we refer to [59], [65], and [66].

Location Ensemble: The location ensemble uses a separate prediction model for each location to generate probabilistic parking duration predictions. Although this allows the models to learn the varying levels of uncertainty at each location, it also leads to higher computational complexity, increasing with the number of known locations. Furthermore, the amount of training data available decreases when only a single location is considered, which may lead to an inaccurate representation of the uncertainty at this location. For each of the four probabilistic prediction models, we create a location ensemble, resulting in the BRR Ens, GPR Ens, NGBoost Ens, and NN Ens models.
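The pinball loss and the subsequent sorting of overlapping quantile predictions can be sketched framework-independently in NumPy:

```python
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Pinball loss for the alpha quantile; minimized by the true quantile."""
    diff = y_true - y_pred
    # Underprediction is weighted by alpha, overprediction by (1 - alpha).
    return np.mean(np.where(diff >= 0, alpha * diff, (alpha - 1) * diff))

def sort_quantiles(quantile_preds):
    """Fix quantile crossing by sorting predictions along the quantile axis.

    quantile_preds: array of shape (n_samples, n_quantiles), one column per
    nominal level; columns may overlap since each NN is trained separately.
    """
    return np.sort(quantile_preds, axis=1)
```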

C. Evaluation Metrics
As evaluation metrics, we first consider a qualitative evaluation of the general uncertainty quantification by visualizing prediction intervals. To further evaluate the general uncertainty quantification, we compare the predicted CDF F̂(y) with the observed empirical CDF F(y). To this end, we integrate over the absolute difference between the two CDFs, i.e.,

E_INT = ∫₂²⁴ |F̂(y) − F(y)| dy,

where the integration between 2 h and 24 h is due to the filtered considered parking durations. In this case, a perfectly predicted CDF identical to the empirical CDF would result in an E_INT of zero, whilst the theoretical maximum E_INT is 22.⁸ To evaluate our methodology for customizing the uncertainty quantification, we consider the metrics defined in Section III-C and the mean error outside the prediction interval for all trips, i.e.,

E_PI = (1/N) Σ_{t=1}^{N} E_t,PI.

⁸ Note, this theoretical maximum can only occur in an unrealistic scenario: for example, a predicted CDF stating that the EV will directly depart, and thus has a chance of departure of 1 for all parking durations, whilst in reality the EV never departed in the considered time period, resulting in an empirical CDF with a chance of departure of 0 for all considered parking durations. Since this scenario is unrealistic, we expect E_INT values lower than this maximum.
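E_INT can be approximated numerically by evaluating the two CDFs on a grid and applying the trapezoidal rule; a minimal sketch:

```python
import numpy as np

def e_int(pred_cdf, emp_cdf, lo=2.0, hi=24.0, n=1000):
    """Approximate the integrated absolute CDF difference on [lo, hi].

    pred_cdf and emp_cdf are callables mapping parking duration (h) to
    cumulative probability.
    """
    grid = np.linspace(lo, hi, n)
    diff = np.abs(pred_cdf(grid) - emp_cdf(grid))
    # Trapezoidal rule over the (possibly non-uniform) grid.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)))
```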

V. RESULTS
In this section, we first compare the general uncertainty quantification from the probabilistic prediction models before reporting the results of our methodology for customizing uncertainty quantification specifically for SC applications.

A. General Uncertainty Quantification
To analyze the general uncertainty quantification from the four probabilistic prediction models we first compare the prediction intervals before reporting differences between the predicted and empirical PDFs.
Prediction Intervals: A comparison of NGBoost predicted and observed values for 25 randomly selected trips from the test data set is shown in Figure 3. The blue crosses are the deterministic predictions obtained as the mean of the probabilistic prediction, and the 90% prediction intervals are shown by light blue bars. For both labels on the semi-synthetic data set, the mean predictions always lie on or close to the diagonal, which indicates a theoretically perfect prediction. Furthermore, the prediction intervals for the semi-synthetic data are relatively narrow. Interestingly, prediction intervals for trips occurring in the middle of the considered time range, i.e., around 10 h for Label A and 12 h for Label B, are narrower than those for values at the edge of the considered time range. In contrast to the semi-synthetic data set, the mean predictions for the real data set do not often lie on the desired diagonal. Furthermore, for both Label A and B, the mean predictions overestimate the parking duration for short stops and underestimate this duration for long stops. The prediction intervals on the real data set are also much wider than those from the semi-synthetic data set.
Probability Distribution: To compare the true and predicted probability distributions, we plot the predicted CDF and observed empirical CDF for two locations using NGBoost in Figure 4. For both data sets, the accuracy of the predicted CDF is highly dependent on the location. For example, on the semi-synthetic data set in Location 1, the predicted CDF is underdispersed and struggles to predict trips with either a very short or very long parking duration. On the other hand, the CDF for Location 2 on the semi-synthetic data set is highly accurate. For the real data, similar results are observed. For this data set, Location 1 results in an accurate CDF, whilst Location 5 is difficult to predict. Code to replicate the visualizations and error metrics presented in the present paper is available via GitHub: https://github.com/KIT-IAI/Customized-UQ-of-Parking-Duration-Predictions.
To further analyze the deviations in the predicted and true PDF for each location, we report the mean integral error E INT for Label A and B in Table VI. With regard to the data sets, the errors are generally lower for the semi-synthetic data than the real data. However, for certain combinations of labels and locations, the errors are lower for the real data set. For both data sets, the location plays a major role. Not only do the errors differ noticeably across the locations, but the best-performing label also changes. For example, Label A is generally more accurate than Label B for the semi-synthetic data, but Label B delivers better results for Locations 3 and 7. Since there was only one observation from Location 8 for the real data, we cannot calculate the empirical CDF for this location.

B. Customized Uncertainty Quantification
The main focus of our evaluation is the customized uncertainty quantification specifically designed for SC applications. Therefore, in this section, we first present results for the error outside the prediction interval, before reporting the error decomposition results combined with different security levels.
Error Outside the Prediction Interval: We report the trade-off between E PI and W in Table VII for Label A, and Table VIII for Label B. Comparing these tables, the errors on the semi-synthetic data set are lower than those from the real data, and at the same time, the width of the prediction intervals is also smaller. As expected, as the width of the prediction intervals increases, the error outside the prediction intervals decreases.
Considering Label A and the semi-synthetic data set, a minimum E PI of 0.03 h (approximately 1.8 min) is achieved with the BRR location ensemble with a W of 10.39 h. However, on this semi-synthetic data set, the NGBoost location ensemble performs similarly with an E PI of 0.04 h (approximately 2.4 min) and a lower W of 2.33 h. On the real data set, the smallest E PI of 0.09 h (approximately 5.4 min) is achieved with the GPR location ensemble; however, the W is 15.92 h.

For Label B, the lowest E PI of 0.07 h (approximately 4.2 min) on the semi-synthetic data is achieved by NGBoost with a W of 3.67 h. For the real data, both BRR and the BRR location ensemble achieve the lowest E PI of 0.07 h (approximately 4.2 min), although the Ws of 13.92 h and 13.77 h, respectively, are far larger.
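The quantities in these tables can be computed per model as follows (a sketch assuming E PI is the mean distance of observations outside the prediction interval, zero inside, consistent with the definition above; the function names are illustrative):

```python
import numpy as np

def error_outside_interval(y, lower, upper):
    """Mean distance of observations to the prediction interval; an
    observation inside [lower, upper] contributes zero error."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    outside = np.maximum(lower - y, 0.0) + np.maximum(y - upper, 0.0)
    return float(np.mean(outside))

def mean_interval_width(lower, upper):
    """Average width W of the prediction intervals in hours."""
    return float(np.mean(np.asarray(upper, dtype=float)
                         - np.asarray(lower, dtype=float)))
```

Widening the intervals can only decrease `error_outside_interval` while increasing `mean_interval_width`, which is exactly the trade-off reported in Tables VII and VIII.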
Error Decomposition & Security Levels: The error decomposition into E c and E nc for security levels from 10% to 90%, calculated with the NGBoost location ensemble, is shown in Figure 5. Although the total error for the real data set is much larger than for the semi-synthetic data, a high security level of 90% results in a similarly small critical error. Furthermore, for both data sets, the minimal total error occurs at a security level between 40% and 60%.
To further analyze this error decomposition, we report the mean critical error E c and non-critical error E nc for both the semi-synthetic and real data set for Label A and Label B in Table IX and Table X, respectively. Although the total error is much larger on the real data set, a high security level results in a small critical error for both sets. For example, for Label A, we can achieve an average critical error of only 0.03 h (approximately 1.8 min) on the semi-synthetic data set and 0.1 h (approximately 6 min) on the real data set, with a security level of 90%. Similarly, for Label B, the same security level can result in an average critical error of 0.01 h (approximately 0.6 min) on the semi-synthetic data set and 0.09 h (approximately 5.4 min) on the real data set. Comparing both labels, we observe that the error is not consistently lower for any one label but depends on the considered probabilistic prediction model.
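The security-level selection and error decomposition can be sketched as follows (an illustrative sketch under the assumption that the scheduled duration is the (1 − security level) quantile of the predicted parking-duration distribution, so that overestimating the parking duration, which risks undercharging, forms the critical error):

```python
import numpy as np

def duration_at_security_level(quantile_preds, security_level):
    """Pick the (1 - security_level) quantile of the predicted parking
    duration, so the EV stays at least this long with a probability of
    roughly the security level (assumed convention, not from the paper)."""
    return float(np.quantile(np.asarray(quantile_preds, dtype=float),
                             1.0 - security_level))

def decompose_error(y_true, y_sched):
    """Split the mean absolute error into a critical component
    (overestimated parking duration -> risk of undercharging) and a
    non-critical component (underestimated parking duration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_sched = np.asarray(y_sched, dtype=float)
    e_c = float(np.mean(np.maximum(y_sched - y_true, 0.0)))   # critical
    e_nc = float(np.mean(np.maximum(y_true - y_sched, 0.0)))  # non-critical
    return e_c, e_nc
```

Raising the security level moves the scheduled duration toward lower quantiles, shrinking the critical error at the cost of a larger non-critical error, matching the trend in Figure 5.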

VI. DISCUSSION
In this section, we first discuss the general uncertainty quantification before analyzing our customized uncertainty quantification specifically for SC applications.
General Uncertainty Quantification: The first observation from the general uncertainty quantification is that the increased uncertainty in the real data set is directly visible. When comparing the two data sets, the prediction intervals for the real data set are far wider than those for the semi-synthetic data set for all prediction models and labels.
The second observation is that uncertainty is highly location-dependent. For certain locations, the predicted CDF is similar to the observed empirical CDF, whilst in other locations, this prediction differs noticeably. Furthermore, both Label A and Label B can be beneficial depending on the location. This result is not surprising, since some locations are characterized by a regular parking duration (e.g., visiting the gym), whilst other locations are characterized by a regular departure time (e.g., leaving work at the end of the working day).
Finally, the general uncertainty quantification emphasizes the need for an uncertainty quantification customized specifically for SC applications. More specifically, although a prediction interval or CDF is useful for visualizing uncertainty, a SC algorithm will have difficulty optimizing a charging schedule based on this general information alone.

Customized Uncertainty Quantification: We first observe that quantifying the trade-off between the error outside the prediction interval and the prediction interval width only has limited benefit. To achieve small errors outside the prediction interval, we observe that large prediction intervals are required, which may not be viable in SC algorithms. Furthermore, the error outside the prediction interval does not explicitly quantify critical scenarios that may lead to undercharging.
Therefore, we observe that security levels based on probabilistic predictions combined with an error decomposition provide the most useful quantification for SC applications. This customized quantification method reduces the critical error to acceptable levels, even for real data exhibiting high uncertainty. Furthermore, using security levels can account for an individual user's risk preferences and can be combined with SC scheduling to optimize the SC application for that individual user. Given information regarding a user's risk attitude, it may be possible to select a SC schedule that optimally fits their profile.
However, we currently focus on user-centric SC applications, and the impact on other participants in an IoT smart grid should also be considered. For example, an EV charging station owner who aims to optimally schedule charging slots will consider errors resulting in an underestimated parking duration critical, since the charging slot is not free for an additional EV when expected. Therefore, multiple definitions of critical errors, their associated security levels, and their impacts on an IoT smart grid should be analyzed.

VII. CONCLUSION
To increase acceptance of Smart Charging (SC) applications, the present paper extends our previous work [1] by introducing a methodology to customize the uncertainty quantification of parking duration predictions specifically for SC applications. We first quantify the uncertainty with probabilistic forecasts before customizing this uncertainty quantification by decomposing prediction errors into critical errors that result in undercharging and non-critical errors. Furthermore, we define quantile-based security levels, which can minimize the probability of an Electric Vehicle (EV) being undercharged, given a user's risk preferences.
Using four probabilistic prediction methods, we evaluate our approach on an openly available semi-synthetic data set and a real data set. We show that uncertainty is highly location-dependent and that a general uncertainty quantification does not provide the specific information required by SC applications. However, our customized uncertainty quantification does provide such information by enabling critical errors to be reduced to acceptable levels for SC algorithms, even when high uncertainty exists in the data.
In light of these findings, probabilistic prediction models that automatically select the optimal label based on location-dependent uncertainty could be considered. Since the present paper focuses on user-centric SC applications, future work should consider all participants in an Internet of Things (IoT) smart grid. Specifically, this work should investigate how the definition of critical errors varies for each participant and how these, perhaps contradictory, preferences can be combined to mutually benefit all participants. Furthermore, future work should focus on integrating the customized uncertainty quantification presented in this paper into stochastic optimization problems, similar to [67], or using it to detect unusual behavior, similar to [68].
This article has been accepted for publication in IEEE Internet of Things Journal. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2023.3299201. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

Fig. 3 :
Fig. 3: A comparison of the observed and predicted values from the NGBoost model for both Label A and B, on a subset of trips from the test data set. A theoretically perfect deterministic prediction is indicated by the red dotted line, the mean deterministic predictions are blue crosses, and the light blue lines indicate the width of the 90% prediction interval.

Fig. 4 :
Fig. 4: The predicted CDF and the observed empirical CDF for the semi-synthetic data set (a) and real data set (b), calculated with the test data for two different locations using the NGBoost model. The error between the two distributions is the highlighted area between the two curves.

Fig. 5 :
Fig. 5: The total mean error broken down into critical and non-critical error components for different security levels for the NGBoost ensemble, calculated on the test data using all locations. The critical error decreases as the security level increases.

TABLE I :
Overview of related work that considers predicting a user's mobility behavior. None of the identified papers customize the uncertainty quantification of parking duration predictions specifically for SC applications.

TABLE II :
An overview of the hyperparameters for the data preprocessing.

TABLE III :
Additional features generated to provide useful information for the parking duration prediction.

TABLE IV :
Overview of the selected probabilistic prediction models.

TABLE V :
An overview of the hyperparameters used for the selected probabilistic prediction models.

TABLE VI :
The integral error E INT between the predicted distribution function and the observed empirical distribution function for both labels. There was only one trip for Location 8 in the test data set, making the calculation of an observed empirical CDF for this location impossible.

TABLE VII :
The mean error E PI outside of the prediction interval and the average width W of this prediction interval in hours, calculated on the test data for the parking duration (Label A). All prediction models and their associated location ensembles (Ens) are compared.

TABLE VIII :
The mean error E PI outside of the prediction interval and the average width W of this prediction interval in hours, calculated on the test data for the departure time (Label B). All prediction models and their associated location ensembles (Ens) are compared.

TABLE IX :
The mean critical error E c and non-critical error E nc in hours on the test data set for different security levels (SL), when predicting the parking duration (Label A). All prediction models and their associated location ensembles (Ens) are compared.

TABLE X :
The mean critical error E c and non-critical error E nc in hours on the test data set for different security levels (SL), when predicting the departure time (Label B). All prediction models and their associated location ensembles (Ens) are compared.