Visual Eco-Routing (VER): XGBoost Based Eco-Route Selection From Road Scenes and Vehicle Emissions

Traffic-related pollution contributes significantly to environmental degradation. The escalating demand for vehicles, coupled with consumer preferences for larger utility vehicles, poses a challenge to achieving targeted carbon emission reductions in common fleet vehicles. To address the inefficiencies of existing route planning methods, this paper introduces a novel eco-routing approach called Visual Eco-Routing (VER). VER is designed to capture the non-linear relationships between road scenes and emissions data, providing comprehensive insight into the real-time dynamics of roads and their influence on vehicle performance characteristics. On-road experimental cycles are conducted to gather data, creating a new dataset called the Vehicle Activity Dataset (VAD). To assess the viability of the VER approach, a model named VER-XGB based on eXtreme Gradient Boosting (XGBoost) is proposed. Performance comparisons are made by individually training and benchmarking three selected models, both without and with VER association. The comparison reveals significantly lower prediction errors for models with VER, with VER-XGB exhibiting enhanced reliability and yielding a MAPE of 4.83% on VAD. Additionally, an aggregate measure termed the emission factor is introduced to explore the correlation between emission gases and the distinct groups of visual features defined in the study. The analysis indicates a high correlation between vehicle emissions and infrastructure features such as traffic signals and stop signs. Finally, a qualitative examination is undertaken to evaluate the real-world applicability of the model by predicting an eco-route for a given origin-destination pair. The MAPE of the VER-XGB predictions for this route is 6.21%, affirming the practical utility of the proposed VER-XGB model in real-world scenarios.


I. INTRODUCTION
Urban transportation is evolving, causing significant ecological problems from undesired emissions and high energy demands. With the rapid rise in travel demand in recent years, traffic problems are impeding efforts to reduce congestion, improve safety, increase fuel efficiency, and reduce emissions [1], [2], [3]. The World Health Organization estimates that the total number of vehicles will grow to 2.5 billion by 2050.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Quan.
Transportation is a serious and growing contributor to air pollution, accounting for up to 30% of total CO2 emissions and Particulate Matter (PM) in the European Union (EU), with road transport contributing 75% of that share [4], [5], [6]. The CO2 emission target for new vehicle sales established by the EU for 2020 to 2024 is 95 g/km [7]. This target might be difficult to achieve due to the growing popularity of Sport Utility Vehicles (SUVs) and the low market penetration of alternative propulsion technologies such as hybrid, plug-in hybrid, electric, and fuel cell vehicles [8].
The related issues of high energy demand and emissions need to be addressed not only by improving vehicle efficiency and developing alternative fuels but also by making roadway travel more efficient. This can be achieved by improving road infrastructure and deploying various Intelligent Transportation System (ITS) technologies. One of the major successes of ITS technology in recent years has been the proliferation of navigation tools that provide route guidance for drivers. Several internet-based navigation tools provide efficient directions from any origin to any destination in the roadway network. The shortest-duration or shortest-distance route is usually calculated based on typical vehicle speeds and real-time road data.
In certain scenarios, the route with the shortest distance or duration may indeed result in minimum fuel consumption and emissions. However, there are instances where this relationship is reversed, especially when routes involve congested roads and substantial variations in road gradients. The route with the shortest duration might cover longer distances, including road segments with high speed limits, which typically leads to higher fuel consumption and emissions than a more direct route at moderate speeds. Conversely, the shortest-distance route might require passing through heavily congested areas, resulting in increased fuel consumption and emissions.
This paper builds on a navigation concept called "eco-routing", first introduced in [9], [10], and [11]. The aim is to find the route that requires the least fuel and produces the least emissions. The objective of this paper is to present a Machine Learning (ML) approach, using real-world data collected from numerous driving cycles, that enables the selection of eco-routes when multiple routes are available for an origin-destination pair.
While numerous approaches exist, this paper presents a new, pragmatic approach named Visual Eco-Routing (VER). VER takes advantage of visual features extracted from road scenes to establish functional relationships between Real Driving Emissions (RDE) data and GPS coordinates. By analyzing the interactions between the road environment and vehicle emissions, VER aims to offer a comprehensive understanding of the real-world factors influencing eco-routing decisions. To the best of our knowledge, no previous study has explored this approach, making VER a novel solution in the field of eco-routing.

II. BACKGROUND
A. ECO-ROUTING
Studies have indicated that choosing different travel routes for the same origin-destination pair can lead to notable variations in fuel consumption and emissions [12], [13], [14], [15]. Efforts have been made in the past decade to develop eco-routing navigation systems that find the route producing the least emissions and/or requiring the least fuel [9], [10], [11], [16], [17]. It has been demonstrated that these eco-routes are not always the same as the shortest-duration route [10], [11]. The variability in fuel consumption and emissions across different routes for the same origin-destination pair can be attributed to several factors. These include the non-linear relationship between travel speed and vehicle fuel consumption/emissions, specific characteristics of the vehicle, features of the routes being traversed, prevailing traffic conditions, and driver behavior. The complex interplay among these variables produces the observed variations in fuel consumption and emissions when different routes are chosen [2], [9], [10], [18].
It is crucial to distinguish between eco-routing and eco-driving and to recognize their complementary roles in reducing energy consumption and mitigating the environmental impact of road travel. Eco-driving involves operating a vehicle in a fuel-efficient, low-emission manner, without any explicit focus on the safety of the driver and other road users. Therefore, it is not advisable, for instance, to drive below the prevailing traffic speed on a freeway in order to maximize fuel efficiency. Such a practice may compromise the safety of other road users, since speed variation has been identified as a contributing factor to vehicle crashes [19], [20], [21]. In contrast, eco-routing is based on the historical, current, and predicted characteristics of each available route, not on driver behavior.

B. EMISSIONS ESTIMATIONS
Cost factors for roadways are required to calculate the energy consumption and emissions of each available route. These cost factors are estimated from several static and dynamic features of each roadway link, determined from the available road and traffic information. Several tools [22], [23], [24], [25], [26] are available to accurately estimate energy consumption and emissions from vehicles. However, these microscopic tools require an extensive set of data recorded at high frequencies, which in turn demands greater computing resources. Consequently, they may not be suitable for real-time applications such as navigation. Alternatively, an adaptive approach that estimates energy consumption or emissions at the roadway link level, as a function of a set of dynamically available explanatory variables, can be a more feasible and pragmatic option.
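To make the link-level idea concrete, the sketch below treats each roadway link as a graph edge whose cost is an estimated emission value and selects the eco-route as the minimum-total-cost path using Dijkstra's algorithm. The network, node names, and per-link emission values are hypothetical placeholders; in practice the link costs would come from an emission model evaluated on dynamically available link-level variables.

```python
import heapq

def eco_route(links, origin, destination):
    """Return (total_cost, path) minimising the summed link emission cost.

    links: dict mapping node -> list of (neighbour, cost) pairs, where cost is a
    non-negative link-level emission estimate (hypothetical units, e.g. g CO2).
    """
    frontier = [(0.0, origin, [origin])]   # (accumulated cost, node, path so far)
    best = {origin: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == destination:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                       # stale queue entry, a cheaper path was found
        for neighbour, link_cost in links.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(frontier, (new_cost, neighbour, path + [neighbour]))
    return float("inf"), []                # destination unreachable

# Hypothetical network: the direct link A->D is short but congested (high emissions).
links = {
    "A": [("D", 950.0), ("B", 300.0)],
    "B": [("C", 250.0)],
    "C": [("D", 200.0)],
}
print(eco_route(links, "A", "D"))  # (750.0, ['A', 'B', 'C', 'D'])
```

Dijkstra's algorithm is only valid for non-negative costs, which holds for emission estimates; the longer three-link path wins here because its summed emissions are lower.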
Numerous research efforts have been dedicated to accurately predicting vehicle emissions. These methods can be grouped into two main categories based on the data acquisition method. The first category involves studies that gather data through laboratory experiments, in which vehicles are subjected to standardized cycles. For example, Wang et al. [27] constructed a Composite Line Source Emission (CLSE) model using data from dynamometer tests to explore emissions in traffic-interrupted micro-environments. However, such laboratory testing may not fully represent the complex driving scenarios of real-world road conditions, leading to potential estimation inaccuracies [28].
9670 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

Other research approaches measure instantaneous emission data and driving states using on-vehicle equipment called Portable Emissions Monitoring Systems (PEMS) equipped with the Global Positioning System (GPS). Liu et al. [29] collected data from vehicles using PEMS and Electric Low-Pressure Impactors (ELPI), revealing significant differences between lab certification cycles and real test cycles. Zhai et al. [30] used PEMS data to predict average emissions for diesel-propelled transit buses on roadway links, establishing a correlation between average emission rates and link mean speed. Similarly, Wang et al. [29] analyzed the influence of Vehicle-Specific Power (VSP) for public buses in Beijing, indicating a positive correlation of VSP with emission rates and fuel consumption factors.
VSP serves as a road load model that assesses RDE trips by considering the longitudinal dynamics of the vehicle on a second-by-second basis.VSP takes into account various factors related to the vehicle's behavior and performance, enabling a detailed evaluation of the road load during real-world driving scenarios [7], [30], [31], [32], [33], [34].In other works, researchers progressively focused on vehicle emission characteristics on different road segments [35], [36], [37], [38].
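As an illustration of how VSP can be derived from 1 Hz speed and altitude traces (both recorded by PEMS units such as the one used in this study), the sketch below applies the widely cited light-duty approximation VSP ≈ v(1.1a + 9.81·grade + 0.132) + 0.000302v³, with v in m/s, a in m/s², and VSP in kW/tonne. These coefficients are an assumption taken from the general VSP literature, not values calibrated in this paper, and grade is approximated as rise over distance travelled.

```python
def vsp_series(speed_kmh, altitude_m, dt=1.0):
    """Approximate VSP (kW/tonne) for every sample after the first.

    Assumes light-duty coefficients from the general VSP literature and
    approximates road grade as altitude change over distance covered.
    """
    out = []
    for k in range(1, len(speed_kmh)):
        v = speed_kmh[k] / 3.6                      # current speed, m/s
        v_prev = speed_kmh[k - 1] / 3.6             # previous speed, m/s
        a = (v - v_prev) / dt                       # backward-difference acceleration, m/s^2
        ds = 0.5 * (v + v_prev) * dt                # distance covered in this step, m
        grade = (altitude_m[k] - altitude_m[k - 1]) / ds if ds > 0 else 0.0
        out.append(v * (1.1 * a + 9.81 * grade + 0.132) + 0.000302 * v ** 3)
    return out

print(vsp_series([36, 36, 54], [100.0, 101.0, 101.0]))
```

The first value corresponds to cruising at 10 m/s up a 10% grade; the second to accelerating on flat ground.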
Various studies have shown that vehicular emissions are influenced by both the current operating status and past driving states. For instance, Qi et al. [39] developed a regression model using the value and duration of acceleration and deceleration to capture operating dynamics. Various machine-learning techniques have also been proposed for emission estimation. Jaikumar et al. [28] demonstrated the feasibility of an Artificial Neural Network (ANN) model for estimating real-time emissions from PEMS data. Wang et al. [40] extended this approach by building a VSP-based ANN model to predict emission rates for urban public buses with different fuel types. Pan et al. [41] addressed non-linearity by applying a Gradient Boosting Regression Tree (GBRT) to predict emissions for Liquefied Natural Gas (LNG) fueled heavy vehicles, accounting for time dependence and revealing significant differences in exhaust components between LNG-fueled heavy vehicles and other vehicles.
However, the models discussed so far that address time dependence typically use historical driving-state data, such as speed and acceleration, only within a limited time window as inputs. This approach is simplistic and non-adaptive, as not all past driving states significantly affect current emission rates [42]. To address this limitation, Sun et al. [37] introduced the Long Short-Term Memory (LSTM) architecture to account for adaptive forgetting of past states at each time step. The results showed that LSTM significantly improved precision compared to traditional models. LSTM and its variant, the Gated Recurrent Unit (GRU), have also been applied to time-series forecasting of air quality pollutants, with favorable predictive performance [43], [44], [45].
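For comparison, the fixed-window scheme described above can be sketched in a few lines: each sample is augmented with the speed and acceleration of the previous `window` seconds, and anything older is discarded regardless of its relevance. The function name and window size are hypothetical; LSTM and GRU models replace this hard cutoff with learned gating.

```python
def lagged_features(speed, accel, window=3):
    """Build rows [v_t, a_t, v_{t-1}, a_{t-1}, ..., v_{t-window}, a_{t-window}].

    Rows are produced only once a full window of history exists; driving
    states older than `window` steps are ignored entirely (the fixed cutoff).
    """
    rows = []
    for t in range(window, len(speed)):
        row = []
        for lag in range(window + 1):
            row.extend([speed[t - lag], accel[t - lag]])
        rows.append(row)
    return rows

rows = lagged_features([10, 12, 13, 13, 11], [0, 2, 1, 0, -2], window=2)
# rows[0] == [13, 1, 12, 2, 10, 0]
```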
Based on the above discussion, several other significant factors affecting vehicle emissions have not been fully considered and quantified in these models. Different weather conditions can change operational resistance and gas density, thereby influencing driving patterns and emissions. In addition, road infrastructure, speed limits, time of day, maintenance works or road incidents, other road users, and traffic density can trigger significant variations in vehicle emissions [46].

C. PROPOSED ECO-ROUTING APPROACH
Establishing functional relationships between collected road scenes and emissions data for various road links provides a holistic understanding of the real-time dynamics of the road and how these dynamics influence the vehicle's performance characteristics.This comprehensive analysis offers valuable insights into the interactions between the road environment and vehicle emissions, enabling a deeper understanding of how road conditions impact the vehicle's efficiency and emissions.
Figure 1 illustrates the architecture of the proposed approach. The Vehicle Activity Dataset (VAD) encompasses a diverse array of real-world scenarios with a substantial volume of second-by-second data, offering valuable insights into vehicle operation during different driving cycles.
The VER model is designed to predict emissions based on GPS coordinates, visual features from road scenes, and any vehicle-specific data.Trained on the VAD, the model utilizes historical information to generate emission predictions for multiple possible routes.As shown in Figure 1, when a request for routes between an origin-destination pair is available, the routing algorithms generate multiple route options for the journey.
The VER model then searches VAD to retrieve relevant historical features associated with each of these routes. These historical features are combined with visual features extracted from live road scenes to predict emissions at the GPS coordinates of the selected routes. For route segments that cannot be associated with data in VAD, historical features are omitted and only the road scenes are taken into account.
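The selection step can be sketched as follows: each candidate route is a list of GPS points, live road-scene features are always used, historical VAD features are merged in only when the point falls inside a known evaluation zone, and the route with the lowest summed prediction is selected. The `predict` callable, the feature dictionaries, and the `lookup_zone` helper are hypothetical stand-ins for the trained VER model and the VAD query.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

def score_route(points: List[Point],
                live_visual: List[Dict[str, float]],
                lookup_zone: Callable[[Point], Optional[Dict[str, float]]],
                predict: Callable[[Dict[str, float]], float]) -> float:
    """Sum the predicted emissions over all points of one candidate route."""
    total = 0.0
    for point, visual in zip(points, live_visual):
        features = dict(visual)            # live road-scene features are always used
        historic = lookup_zone(point)      # None when the point is not covered by VAD
        if historic is not None:
            features.update(historic)      # add historical features of the matched zone
        total += predict(features)
    return total

def select_eco_route(routes, lookup_zone, predict):
    """routes: list of (points, live_visual) pairs; returns (best_index, scores)."""
    scores = [score_route(pts, vis, lookup_zone, predict) for pts, vis in routes]
    return min(range(len(scores)), key=scores.__getitem__), scores
```

A toy model such as `predict = lambda f: 1.0 + f.get("infrastructure", 0.0)` is enough to exercise the logic; in a real system `predict` would wrap the trained regressor and receive features in the order it was trained on.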

III. VAD CONSTRUCTION
A. DATA COLLECTION
Vehicle operational attributes primarily depend on both vehicle-specific parameters and operational environment [32], [33], [47].For this study, the data was collected in Rouen, Normandy, France, to develop VAD.The data was gathered from various driving cycles within the region, using a popular multi-utility diesel vehicle.The vehicle was equipped with a camera and PEMS to facilitate data collection.
The field tests were conducted on various days during both the winter and summer seasons of 2022-23. Rouen, located on the banks of the River Seine, has an estimated population of over 0.47 million and encompasses a diverse range of road conditions, including highways, urban areas, and rural roads with flat and hilly zones. These conditions are representative of a typical medium-sized city in the EU. The estimated yearly per-km cost of driving in Rouen is 83 kg of CO2 emissions, and during rush hours, 5 h out of 15 h of driving are spent in congestion [48].
Throughout the tests, data from four distinct sources were recorded for VAD. Real-time emission data for CO, CO2, NOx, and HC were collected using PEMSLAB [49], a PEMS unit developed by CERTAM. This unit also provided geographic information and vehicle data, including longitude, latitude, altitude, vehicle speed, ambient temperature, and humidity. Simultaneously, road scenes were captured as RGB images at 1 frame per second with a resolution of 1920×1080 using an Intel RealSense D435 camera. The data from these four sources were synchronized at 1 Hz. The instrumented vehicle used for the RDE cycles is depicted in Figure 2.
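One simple way to perform the 1 Hz synchronization is to floor every timestamp to the whole second and keep only the seconds present in every stream. The stream layout (lists of `(timestamp_s, payload)` pairs), the last-sample-wins rule within a second, and the function name are assumptions for illustration; the paper does not specify its exact alignment procedure.

```python
import math
from typing import Dict, List, Tuple

def sync_1hz(streams: Dict[str, List[Tuple[float, dict]]]) -> List[dict]:
    """Align several (timestamp_s, payload) streams on whole-second bins.

    Within each stream, the last sample falling in a given second is kept.
    Only seconds observed in every stream are returned, in time order.
    Payload keys are assumed to be unique across streams.
    """
    binned = {}
    for name, samples in streams.items():
        per_second = {}
        for ts, payload in samples:
            per_second[math.floor(ts)] = payload   # later samples overwrite earlier ones
        binned[name] = per_second
    common = set.intersection(*(set(b) for b in binned.values()))
    records = []
    for sec in sorted(common):
        record = {"t": sec}
        for name in binned:
            record.update(binned[name][sec])
        records.append(record)
    return records
```

Dropping seconds missing from any stream keeps every record complete, at the cost of losing samples when a sensor briefly drops out; interpolation is the usual alternative.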
To ensure consistency, all driving cycles were performed by the same driver with normal, relaxed driving behavior. During the tests, the vehicle's HVAC systems were turned off, and the driving cycles were conducted on non-rainy days to ensure good image quality. Weather conditions in the operational environment were characterized using ambient temperature measurements. The recorded operational data span ambient temperatures from 11 °C to 24 °C and humidity from 33% to 70%, encompassing a wide spectrum of real-world conditions in the test region. Furthermore, the origin-destination pairs for the driving cycles were selected to include various types of roads, to offer multiple route options, and to represent diverse driving scenarios commonly encountered in real-world situations. Descriptive statistics of the collected data are shown in Table 1.
Data cleaning methods were implemented to eliminate abnormal data and outliers. Subsequently, MinMaxScaler was employed to scale all the data using equation (1):

$$ m = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \left( \max - \min \right) + \min \tag{1} $$

Equation (1) transforms each feature x into a new value m such that the minimum value of m is min and the maximum value is max, where x_min and x_max are the minimum and maximum observed values of x. All other values between the minimum and maximum of x are linearly scaled to the range between min and max according to their relative positions. This process brings all features to a common scale, improving comparability and preventing any single feature from dominating.
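Equation (1) can be written directly as a short function. The function name and the constant-feature guard are illustrative additions; scikit-learn's `MinMaxScaler(feature_range=(min, max))` performs the same mapping. In practice, x_min and x_max would be taken from the training split only, so that the test data do not leak into the scaling.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Apply equation (1): map the observed range of `values` onto [new_min, new_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:                     # constant feature: avoid division by zero
        return [new_min for _ in values]
    return [(x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min for x in values]

print(min_max_scale([11.0, 17.5, 24.0]))   # [0.0, 0.5, 1.0]
```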

B. VISUAL ROAD FEATURES FOR VAD
Vehicle performance and driver behavior are significantly influenced by changes in the characteristics of the operational road environment. By capturing visual scenes from the driver's perspective, a comprehensive understanding of the operational environment can be achieved. Thanks to advances in environment perception models, particularly for road scenes, various classes of objects of interest can be readily identified in real time.
For this study, the features identified from road scenes are classified into four groups of features.

1) OBSERVED FLOW FEATURES
The total count of vehicles, motorcycles, and bicycles identified in each frame is aggregated and recorded as a feature for the corresponding vehicle position.

2) SPEED LIMIT FEATURES
Speed limit traffic signs are detected and recorded as the actual speed limit for the corresponding route segment in VAD until a new speed limit sign is recognized.

3) INFRASTRUCTURE FEATURES
Traffic lights, tollbooths, pedestrian crossings, animal crossings, stop signs, give-way signs, and school zone signs are detected and grouped together to correspond to each segment of the road.These features are deemed relevant for the next 200 meters of the route segment.

4) INTERSECTION FEATURES
Traffic signs for road intersections, including roundabouts, railway crossings, merge lanes, etc., are identified and associated with each positional coordinate of the records. These associations are maintained only while the traffic signs remain detectable in the images.
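The four grouping rules can be sketched over a time-ordered sequence of per-frame detections. The detector label names, the `speed_limit_<value>` label convention, the frame record layout, and counting distinct infrastructure classes still within the horizon are all assumptions. Only the per-frame flow count, the carry-forward of speed limits, the 200 m infrastructure horizon, and the visibility-bound intersection features come from the description above.

```python
FLOW = {"car", "motorcycle", "bicycle"}                 # hypothetical detector labels
INFRA = {"traffic_light", "tollbooth", "pedestrian_crossing", "animal_crossing",
         "stop_sign", "give_way", "school_zone"}
INTERSECTION = {"roundabout", "railway_crossing", "merge_lane"}
INFRA_HORIZON_M = 200.0

def associate(frames):
    """frames: time-ordered dicts with 'dist_m' (cumulative distance) and 'labels'.

    Returns one feature dict per frame following the four grouping rules.
    """
    out = []
    speed_limit = None
    infra_active = []                                   # (distance when seen, label)
    for frame in frames:
        labels, d = frame["labels"], frame["dist_m"]
        # 1) observed flow: vehicles, motorcycles and bicycles in this frame
        flow = sum(1 for lab in labels if lab in FLOW)
        # 2) speed limit: carried forward until a new limit sign is recognised
        for lab in labels:
            if lab.startswith("speed_limit_"):
                speed_limit = int(lab.rsplit("_", 1)[1])
        # 3) infrastructure: each detection stays relevant for the next 200 m
        infra_active += [(d, lab) for lab in labels if lab in INFRA]
        infra_active = [(d0, lab) for d0, lab in infra_active if d - d0 <= INFRA_HORIZON_M]
        # 4) intersection: counted only while the sign is visible in this frame
        intersection = sum(1 for lab in labels if lab in INTERSECTION)
        out.append({"flow": flow, "speed_limit": speed_limit,
                    "infrastructure": len({lab for _, lab in infra_active}),
                    "intersection": intersection})
    return out
```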

C. VAD EVALUATION ZONES AND FEATURE ASSOCIATION
As VAD constitutes a historical dataset of vehicle activity, the potential for recurring and updated data points exists.Consequently, it becomes crucial to gather and arrange these data points according to their relevance and association principles.For instance, vehicles engaged in daily commutes might traverse repeated segments of roads.These commutes may accumulate updated information based on the time of day and various incidents or events on the road.Effectively associating this new information is essential to preserve its relevance while considering it collectively.
To facilitate this, the data in VAD are further grouped into evaluation zones based on GPS coordinates. These Evaluation Zones (EZ) are characterized by an experimentally determined fixed radius and a unique ID for each zone. Any new features within a previously defined EZ are consistently associated with the same EZ. Features whose GPS coordinates fall outside the radius of every historically defined EZ establish a new EZ that encompasses them. This approach preserves relevant information while organizing and managing the dataset effectively. For this study, the radius of EZ is set to 50 meters, and the great-circle distance d between two GPS points is calculated with the Haversine formula (4):

$$ d = 2r \arcsin\left( \sqrt{ \sin^2\left( \frac{\varphi_2 - \varphi_1}{2} \right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left( \frac{\lambda_2 - \lambda_1}{2} \right) } \right) \tag{4} $$

where φ1 and φ2 are the latitudes and λ1 and λ2 are the longitudes of the two points (in radians), and r is the radius of the Earth.
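The zone-assignment rule can be sketched with a direct implementation of the Haversine formula. The mean Earth radius of 6,371 km, the choice of the nearest zone when a point lies within 50 m of several zone centres, fixing each zone centre at its first point, and the function names are assumptions that the paper does not state. The linear scan over zones is kept for clarity; a large VAD would typically use a spatial index instead.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius (assumed value)
EZ_RADIUS_M = 50.0             # evaluation-zone radius used in this study

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def assign_zone(lat, lon, zones):
    """Return the ID of the nearest zone within EZ_RADIUS_M, creating one if none exists.

    zones: dict mapping zone ID -> (centre_lat, centre_lon); mutated in place.
    """
    best_id, best_d = None, float("inf")
    for zone_id, (zlat, zlon) in zones.items():
        d = haversine_m(lat, lon, zlat, zlon)
        if d <= EZ_RADIUS_M and d < best_d:
            best_id, best_d = zone_id, d
    if best_id is None:
        best_id = len(zones)               # new unique ID (zones are never deleted)
        zones[best_id] = (lat, lon)
    return best_id
```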
Boosting is an ensemble technique that creates new models to correct the errors made by existing models. This iteration continues until no further improvement can be detected. Gradient boosting creates new models that predict the residuals of previous models. XGBoost employs a binary decision tree, referred to as a Classification And Regression Tree (CART), as its fundamental learner. Additionally, regularization is incorporated into the loss function to enhance performance and mitigate overfitting [51]. The loss function is shown in equation (5):

$$ \mathrm{Obj}(\theta) = \sum_{i=1}^{n} L\left( y_i, \hat{y}_i \right) + \sum_{k=1}^{K} \omega\left( f_k \right) \tag{5} $$
where L is a loss function that measures the difference between the actual value y_i and the predicted value ŷ_i for each sample i, and the additional term ω penalizes the complexity of each model f_k added at iteration k. Mean Squared Error (MSE) is selected as the loss function because this task is essentially a regression problem. A series of decision trees is used for regression, and the prediction for each data point is a weighted sum of the outputs of these trees, as shown in equation (6):

$$ \hat{y}_i = \sum_{j} y_j \, I\left( x_i \in R_j \right) \tag{6} $$

where I(x_i ∈ R_j) is an indicator function that checks whether the data point x_i falls in leaf R_j of the tree, and y_j is the value assigned to that leaf.

XGBoost uses two types of regularization, namely Lasso (L1) and Ridge (L2), which control the complexity of the decision trees. L1 regularization encourages sparse leaf weights by adding a penalty term to the objective function based on the absolute values of the leaf weights γ_j, as in equation (7). L2 regularization controls the magnitude of the leaf weights by adding a penalty to the objective function based on their squared values, as in equation (8).
The overall complexity of each model from each iteration is found by the combination of L1 and L2 regularization terms.The overall objective function Obj(θ) to be minimized during the iteration includes both the MSE loss term and the two regularization terms, as shown in equation (9).
At each boosting round, a new decision tree is then constructed to minimize, within each leaf R_j, the sum of the loss terms of its data points and the regularization terms on the leaf weight w, as in equation (10). The optimization step finds the best feature and threshold for splitting the data points so as to minimize the Obj(t) function for each decision tree t.
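To illustrate how the L1 and L2 penalties enter the construction of a single tree, the sketch below follows the standard second-order XGBoost formulation. It is an illustration, not a transcription of equations (7) to (10). With squared-error loss written as ½(y − ŷ)², each sample has gradient g = ŷ − y and hessian h = 1. A leaf with gradient sum G and hessian sum H receives the weight −T_α(G)/(H + λ), where T_α soft-thresholds G by the L1 strength α (XGBoost's `alpha`) and λ is the L2 strength (`lambda`). A split is worth keeping only when its gain is positive after subtracting `gamma`, the minimum loss reduction. Note that the paper writes leaf weights as γ_j, whereas `gamma` here is the split parameter. Function names are hypothetical, and the library's internals differ in detail.

```python
def soft_threshold(G, alpha):
    """L1 shrinkage of a gradient sum: sign(G) * max(|G| - alpha, 0)."""
    if G > alpha:
        return G - alpha
    if G < -alpha:
        return G + alpha
    return 0.0

def leaf_weight(grads, hessians, lam=1.0, alpha=0.0):
    """Optimal weight w* of a leaf under L2 (lam) and L1 (alpha) penalties."""
    G, H = sum(grads), sum(hessians)
    return -soft_threshold(G, alpha) / (H + lam)

def leaf_score(G, H, lam, alpha):
    """Objective reduction achieved by a leaf with gradient sum G and hessian sum H."""
    return soft_threshold(G, alpha) ** 2 / (H + lam)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, alpha=0.0, gamma=0.0):
    """Gain of splitting a node into left/right children; split only if the result is > 0."""
    GL, HL, GR, HR = sum(g_left), sum(h_left), sum(g_right), sum(h_right)
    return 0.5 * (leaf_score(GL, HL, lam, alpha) + leaf_score(GR, HR, lam, alpha)
                  - leaf_score(GL + GR, HL + HR, lam, alpha)) - gamma

# Targets [1, 1, 5, 5] with current prediction 0: g = y_hat - y, h = 1.
g, h = [-1.0, -1.0, -5.0, -5.0], [1.0] * 4
print(leaf_weight(g, h), split_gain(g[:2], h[:2], g[2:], h[2:]))
```

Raising `alpha` shrinks the leaf weight toward zero, and a large enough `gamma` turns the split gain negative so the node stays a leaf.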
Gradient boosting is the core of XGBoost. As shown in equation (11), the model is updated during each boosting iteration t by fitting a decision tree to the negative gradient g for each data point i of the objective function. This negative gradient is used as the residual to fit a new decision tree, which is added to the ensemble. The model is updated by adding the output of the new decision tree, scaled using learning rate η to the current prediction, as shown in equation (12).
The learning rate controls the step size during optimization and helps to avoid overfitting. After the model is trained and optimized, the prediction for a new data point is obtained from equation (13).
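The update rule can be illustrated with a deliberately small gradient-boosting loop on one feature. Under squared-error loss the negative gradient is simply the residual. Each round fits a depth-1 tree (a stump) to that residual, and the stump's output, scaled by the learning rate η, is added to the running prediction. The sketch uses unregularized stumps and a single feature, so it shows the shrunken additive update, not XGBoost's full tree-growing procedure. It requires at least two distinct x values.

```python
def fit_stump(x, residuals):
    """Best single-threshold split of x minimising squared error on the residuals."""
    best = None
    for thr in sorted(set(x))[:-1]:              # needs at least two distinct x values
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda xi: lv if xi <= thr else rv

def boost(x, y, n_rounds=50, eta=0.3):
    """Gradient boosting for squared loss: each stump fits the negative gradient."""
    base = sum(y) / len(y)                       # initial constant prediction
    trees, pred = [], [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]       # -d/dy_hat of 1/2 (y - y_hat)^2
        tree = fit_stump(x, residuals)
        trees.append(tree)
        pred = [pi + eta * tree(xi) for pi, xi in zip(pred, x)]  # shrunken update
    return lambda xi: base + eta * sum(t(xi) for t in trees)    # final additive prediction

model = boost([1, 2, 3, 4], [1, 1, 5, 5])
```

With η = 0.3 the remaining residual shrinks by a factor of 0.7 per round here, which shows why a smaller learning rate needs more boosting rounds.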

B. IMPLEMENTATION
Many studies have shown that past driving states and live operational conditions affect vehicle emissions [30], [31], [34]. In this study, historical data from VAD were selected to address location and time dependence, because the values in VAD vary with previous driving patterns and with the various features detected in road scenes. Essentially, this is a regression problem in which such factors are used to predict emissions along a route.
The implementation framework is illustrated in Figure 3. Series data from PEMS and the camera are processed to create data points in VAD. The camera images, which represent road scenes, are used to detect various classes of objects. Vehicle-specific data from PEMS and object detections from the road scenes are synchronized based on feature association rules, timestamps, and GPS coordinates. Each type of feature listed in Figure 3 is stored in VAD in recording order. These data points in VAD are further grouped into evaluation zones according to the evaluation zone association rules. All the yellow blocks in the figure represent the rules defined in Section III-C. The VER model, based on XGBoost, combines all the features in VAD to capture the non-linear relationships and predict emissions for each GPS coordinate along the routes for an origin-destination pair.

A. MODEL DEVELOPMENT
This section details the VER-XGB model-building process. For model development, VAD is split into 80% of the data for training and the remaining 20% for testing.
For the construction of the optimal version of VER-XGB, the parameters listed in Table 2 were estimated experimentally. The effect of changing each parameter on model performance can be summarized as follows:
• learning_rate: Controls the step size during optimization. A smaller learning rate necessitates more boosting rounds but often enhances generalization. Lower values may improve accuracy, albeit at the cost of increased training time.
• max_depth: Determines the maximum depth of a tree. Deeper trees can capture more intricate relationships in the data, but as the value increases, the model becomes more complex and computationally more resource-intensive.
• n_estimators: Represents the total number of boosting rounds. While increasing the number of trees may enhance performance, there is a point of diminishing returns, and excessive trees can lead to overfitting.
• min_child_weight: Specifies the minimum sum of instance weight (hessian) required in a child. Higher values imply stronger regularization, preventing splits that would create children with a low total instance weight (i.e., few instances).
• gamma: Dictates the minimum loss reduction required to make a further partition on a leaf node. Increasing this value introduces stronger regularization by avoiding splits that do not significantly reduce the loss.
• colsample_bytree: Represents the fraction of features (columns) randomly sampled for each tree. Decreasing this value introduces additional randomness, potentially improving generalization.
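A search space over these parameters can be written as a plain grid and expanded with `itertools.product`. The keys follow the keyword arguments of the XGBoost scikit-learn wrapper `xgboost.XGBRegressor`, so each expanded dict could be passed as `XGBRegressor(**params)`. The candidate values and random seed are hypothetical placeholders, not the tuned values reported in Table 2. The helper reproduces the 80/20 split described above with a shuffled index.

```python
import itertools
import random

# Hypothetical candidate values; the tuned values used in the paper are in Table 2.
PARAM_GRID = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [4, 6, 8],
    "n_estimators": [200, 500],
    "min_child_weight": [1, 5],
    "gamma": [0.0, 0.1],
    "colsample_bytree": [0.8, 1.0],
}

def expand_grid(grid):
    """Yield one parameter dict per combination of candidate values."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def train_test_split_80_20(n_rows, seed=42):
    """Shuffle row indices and return (train_idx, test_idx) with an 80/20 split."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * n_rows)
    return idx[:cut], idx[cut:]

combos = list(expand_grid(PARAM_GRID))
print(len(combos))  # 96
```

The grid grows multiplicatively (2 × 3 × 2 × 2 × 2 × 2 = 96 settings here), and each setting is then evaluated with cross-validation, which is why such grids are usually kept coarse.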

B. MODEL VALIDATION
To assess the effectiveness of the VER-XGB model, two additional state-of-the-art models are considered. The primary objective of this study is to evaluate the feasibility of the VER approach. Consequently, results were obtained using all three models independently of the VER association, meaning that visual features derived from camera data were not considered for these models. The selected models and their functions are outlined below:

1) XGB: XGBoost, a high-performance gradient boosting algorithm, sequentially combines weak learners, primarily trees, employing effective regularization to prevent overfitting. An XGB estimator that does not consider visual features was constructed to quantify the influence of PEMS data on GPS data for predicting emissions.

2) GBDT: Gradient Boosting Decision Trees (GBDT) is an ensemble learning method that constructs trees sequentially to minimize residual errors. A GBDT estimator that does not consider visual features is built to assess the influence of PEMS data on GPS data for predicting emissions.

visual features generated from camera data. VER versions of the models were built to quantify the influence of various visual features and features from PEMS data (as listed in Fig. 3) on GPS data for predicting emissions. For a fair comparison, all the models are built and benchmarked on the VAD dataset. All essential parameters of these models are experimentally calibrated to achieve the best performance. For XGB, the calibration process begins with fine-tuning the learning rate to balance training time and model accuracy. The number of trees (n_estimators) is optimized through cross-validation while avoiding overfitting. Tree-specific parameters such as max_depth, min_child_weight, and gamma are tuned experimentally to achieve an optimal trade-off in model complexity. GBDT follows a calibration approach similar to that of XGBoost.
In the case of SVR, a Gaussian (RBF) kernel is employed to capture complex, non-linear patterns in VAD. The regularization parameter and the kernel coefficient (gamma) are then tuned through cross-validation to achieve optimal values.
During the calibration procedure for each model, parameter tuning employs a grid search that systematically explores a range of candidate values. The optimal value is determined through 5-fold cross-validation, ensuring a robust evaluation of model performance across different subsets of the dataset. All models are implemented in a Python 3.11 environment.
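The tuning loop can be sketched independently of the model: each candidate value is scored by its mean validation error across five folds, and the lowest-error value is kept. To stay dependency-free, the sketch substitutes a closed-form one-feature ridge regression for XGB, GBDT, and SVR, uses contiguous (unshuffled) folds, and scores folds with MAE. All three are assumptions for illustration. In scikit-learn, the same procedure is provided by `GridSearchCV` with `cv=5`.

```python
def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def fit_ridge_1d(x, y, lam):
    """Closed-form ridge fit y ~ w*x + b with an unpenalised intercept (stand-in model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    w = sxy / (sxx + lam)
    return lambda xi: w * xi + (my - w * mx)

def grid_search_cv(x, y, grid, k=5):
    """Return (best_value, best_mean_mae) over the candidate regularisation values."""
    folds = k_fold_indices(len(x), k)
    best = None
    for lam in grid:
        maes = []
        for val in folds:
            val_set = set(val)
            tr = [i for i in range(len(x)) if i not in val_set]
            model = fit_ridge_1d([x[i] for i in tr], [y[i] for i in tr], lam)
            maes.append(sum(abs(y[i] - model(x[i])) for i in val) / len(val))
        score = sum(maes) / k                  # mean validation error over the k folds
        if best is None or score < best[1]:
            best = (lam, score)
    return best
```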
To quantify the performance of each selected model, three common evaluation metrics are used, namely Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). They are defined in equations (14), (15), and (16):

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right| \tag{14} $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 } \tag{15} $$

$$ \mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right| \tag{16} $$

In these equations, Y_i represents the actual values, Ŷ_i the predicted values, and n the number of observations in the VAD. A lower value of these metrics indicates more accurate and dependable predictions from the model.
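The three metrics translate directly into code. The function names are illustrative. MAPE is undefined when an actual value is zero, which matters for emission signals that can drop to near zero at idle, so such samples need to be filtered or handled before computing it.

```python
import math

def mae(y, y_hat):
    """Equation (14): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Equation (15): root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def mape(y, y_hat):
    """Equation (16): mean absolute percentage error, in percent.

    Raises ZeroDivisionError if any actual value is zero.
    """
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

y, y_hat = [2.0, 4.0, 5.0], [2.0, 5.0, 4.0]
print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat))
```

RMSE weights large errors more heavily than MAE, while MAPE is scale-free, which is what allows percentages to be compared across gases with very different magnitudes.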

C. EXPERIMENT RESULTS
The performance outcomes of the selected models benchmarked on VAD are summarized in Table 3. Notably, the proposed VER-XGB outperforms all other models on all three evaluation metrics. Particularly noteworthy is the substantial performance improvement observed for all models with VER association. This observation underscores the significant impact of capturing visual information that directly influences driver responses and thereby alters vehicle operation. Combined with ambient data, this information contributes to a more refined understanding of the intricate relationships between emission characteristics and location information when predicting emissions for routes.
As indicated in Table 3, the average improvement in MAPE across all three models for the various emission gases is 19.43% when VER is incorporated. Specifically, the improvements in prediction accuracy for individual emission gases with VER association across all models are 25.25% for CO2, 23.98% for CO, 10.91% for HC, and 17.60% for NOx. This underscores a significant reduction in prediction errors achieved by incorporating the pragmatic visual features in conjunction with PEMS and ambient data.
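As a quick arithmetic check, the overall figure corresponds to the mean of the four per-gas improvements:

```latex
\bar{\Delta}_{\mathrm{MAPE}} = \frac{25.25 + 23.98 + 10.91 + 17.60}{4} = \frac{77.74}{4} = 19.435 \approx 19.4\,\%
```

This agrees with the reported 19.43% up to rounding of the per-gas values.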
To better understand the correlation between emissions and the different groups of visual features outlined in Section III-B, an aggregate measure called the emission factor is devised. This factor quantifies emission values in connection with the various groups of visual features. It is defined as in equation (17), where E_vf represents the average emission value associated with each category of visual features (vf) in VAD, and d_vf denotes the number of occurrences of the same group of vf. The emission factors estimated for the various models, for each emission gas and visual feature class, are illustrated in Figure 4. Notably, the highest emission factor is observed for the infrastructure group, consisting of object detection classes such as traffic signals, stop signs, and school zones. The second-highest correlation with emission values is associated with the intersection group, which involves vehicle maneuvers such as the abrupt acceleration required at roundabouts and when entering a freeway. Features such as speed limits and observed flow exhibit a relatively lower correlation with emission factors. This may be attributed to consistent flow patterns or linear changes in these patterns, coupled with the skill of a professional driver to anticipate such variations and respond gradually.
These results align with conventional knowledge about carbon emissions from internal combustion engine vehicles: carbon emissions correlate directly with irregular traffic flow patterns and with situations requiring sudden speed changes to comply with road design regulations. However, the characteristics of HC and NOx emissions are harder to interpret because the values estimated by different models vary substantially. Among these models, the proposed VER-XGB provides more stable and reliable predictions relative to the true values in VAD.

VI. CONCLUSION
This paper introduces an innovative approach for selecting eco-friendly routes based on vehicle emission data and various object detection classes extracted from road scenes. First, a novel dataset called VAD is developed using an industrial PEMS unit and an RGB camera installed on a common diesel vehicle. An innovative method termed VER is then devised from the VAD to identify the diverse factors influencing vehicle emissions. For this investigation, a VER model based on XGBoost is constructed and experimentally validated.
The VAD was built from RDE cycles conducted in Rouen, reflecting traffic patterns typical of a medium-sized EU city. Using VAD, the proposed VER-XGB model was benchmarked against other state-of-the-art models, including variants without VER association. The results showed that including VER significantly improved emission predictions for all models. Notably, VER-XGB yielded the best results for all four selected emission gases.
Furthermore, VER-XGB was employed to select an eco-route during the planning of the RDE cycles. Figure 5 depicts the route planning undertaken before our final RDE cycle. The green route highlighted in the OpenStreetMap snippet is the eco-friendly route predicted by VER-XGB, while the red route represents a 'non-eco-route', or alternative route, as illustrated in Figure 1. Based on the data collected during this RDE cycle, the MAPE of the total emission values predicted by VER-XGB was 6.21%. The net emissions for the green route were 2.35 g/s lower than those for the red route. Compared with the red route, the green route reduced travel time by 7 min while increasing total distance by 4.5 km. This confirms that eco-routes do not necessarily coincide with shortest-distance routes.
9678 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
To generate candidate routes between an origin and destination pair, the Google Maps API was used. The route coordinates returned by the API were associated with data points in VAD using the experimentally defined EZ, as discussed in Section III-C. The qualitative and quantitative results of this study underscore the practical applicability of the proposed approach to eco-routing.
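The association and selection step can be sketched as follows. The sketch assumes that each EZ is a fixed-radius circle around a route coordinate (the actual EZ definition is given in Section III-C and is not reproduced here), that every VAD point already carries a predicted emission value, and that each candidate route is a plain list of (lat, lon) pairs such as those decoded from a Google Maps response. `route_score`, `select_eco_route` and `ez_radius_m` are hypothetical names, distances use the haversine formula, and altitude is ignored.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def route_score(route, vad_points, ez_radius_m):
    """Sum the predicted emission of the nearest VAD point inside each
    route coordinate's zone (hypothetical fixed-radius EZ).

    route: list of (lat, lon); vad_points: list of (lat, lon, predicted_emission).
    Coordinates with no VAD point inside the zone contribute nothing.
    """
    total = 0.0
    for lat, lon in route:
        nearest = None  # (distance_m, emission)
        for p_lat, p_lon, emission in vad_points:
            d = haversine_m(lat, lon, p_lat, p_lon)
            if d <= ez_radius_m and (nearest is None or d < nearest[0]):
                nearest = (d, emission)
        if nearest is not None:
            total += nearest[1]
    return total


def select_eco_route(routes, vad_points, ez_radius_m=50.0):
    """Return the name of the candidate route with the lowest score."""
    return min(routes, key=lambda name: route_score(routes[name], vad_points, ez_radius_m))


# Toy example with made-up values (not VAD data).
routes = {
    "green": [(49.4400, 1.0900), (49.4410, 1.0910)],
    "red": [(49.4500, 1.1000), (49.4510, 1.1010)],
}
vad_points = [
    (49.4401, 1.0901, 1.0), (49.4411, 1.0911, 1.2),
    (49.4501, 1.1001, 2.0), (49.4511, 1.1011, 2.5),
]
print(select_eco_route(routes, vad_points))  # green
```

Because uncovered coordinates add nothing, this naive score favours routes that VAD covers sparsely; a practical version would need to penalise or impute such gaps.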
While prior studies have demonstrated that vehicle emissions can be predicted from features such as weather, typical commute routes, driver behavior, and traffic density, none have explored the patterns embedded in road scenes and their correlation with vehicle emissions. Our findings can provide theoretical guidance for developing more effective eco-routing techniques by capturing the real operational conditions that drivers respond to, which in turn alter vehicle behavior.

FIGURE 1. Proposed approach for eco-routing using the VER model.

FIGURE 4. Analyzing VAD data and model estimations to find the correlation between various categories of visual features and the emission factor.

3) SVR: Support Vector Regression (SVR) is a regression technique that leverages kernel functions to capture complex, often non-linear relationships. An SVR model is constructed without considering visual features, to examine the influence of PEMS and GPS data on emission prediction.
4) VER-XGB/GBDT/SVR: The aforementioned models are separately reconstructed, this time considering the visual features (VER association).

FIGURE 5. Prediction of eco-routes for an RDE trip using the VER-XGB model. The green pin marks the origin and the red pin marks the destination. The green plotted route is the eco-route predicted by VER-XGB.

TABLE 1. Descriptive statistics of emission and ambient recordings in VAD used for VER model development.
lat1, lat2: Latitude of two coordinates (in decimal degrees)
long1, long2: Longitude of two coordinates (in decimal degrees)
alt1, alt2: Altitude of two coordinates (in meters)

VER MODEL
A. VER-XGB
For this study, XGBoost is the proposed VER model for predicting route emissions from features extracted from live road scenes and from VAD. XGBoost (eXtreme Gradient Boosting) is a scalable tree-boosting system first proposed by Chen and Guestrin [50] in 2016.
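To illustrate what XGBoost optimizes, the sketch below boosts depth-1 trees (stumps) on a single feature with squared-error loss, using the regularized split gain 0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma and the leaf weight -G/(H+lam) from Chen and Guestrin [50]. For squared error the gradient is (prediction - target) and the Hessian is 1. This is a teaching sketch, not the xgboost package and not the VER-XGB configuration (whose tuned parameters are listed in Table 2); `eta`, `lam` and `gamma` merely mirror XGBoost's learning rate, L2 leaf penalty and minimum split loss, and starting from the target mean is a simplifying choice.

```python
def fit_stump(xs, grads, hess, lam, gamma):
    """Best single split on 1-D feature xs by XGBoost's regularized gain.

    Returns (threshold, w_left, w_right); x <= threshold goes left.
    """
    def score(g, h):
        return g * g / (h + lam)

    G, H = sum(grads), sum(hess)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None  # (gain, threshold, w_left, w_right)
    g_left = h_left = 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if xs[i] == xs[nxt]:
            continue  # cannot split between equal feature values
        g_right, h_right = G - g_left, H - h_left
        gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right) - score(G, H)) - gamma
        if best is None or gain > best[0]:
            threshold = (xs[i] + xs[nxt]) / 2
            best = (gain, threshold, -g_left / (h_left + lam), -g_right / (h_right + lam))
    if best is None or best[0] <= 0:
        w = -G / (H + lam)  # no worthwhile split: a single leaf
        return (float("inf"), w, w)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, eta=0.3, lam=1.0, gamma=0.0):
    """Additively fit stumps to the gradients of 0.5 * (y - pred)^2."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_rounds):
        grads = [p - y for p, y in zip(preds, ys)]  # first derivative w.r.t. pred
        hess = [1.0] * len(ys)                      # second derivative is constant
        thr, wl, wr = fit_stump(xs, grads, hess, lam, gamma)
        trees.append((thr, wl, wr))
        preds = [p + eta * (wl if x <= thr else wr) for p, x in zip(preds, xs)]
    return base, eta, trees


def predict(model, x):
    base, eta, trees = model
    return base + eta * sum(wl if x <= thr else wr for thr, wl, wr in trees)


model = fit_boosted_stumps([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(predict(model, 2), predict(model, 5))
```

A larger `lam` shrinks each leaf weight toward zero, and a positive `gamma` rejects splits whose gain does not exceed it, which is how the regularized objective limits overfitting.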

TABLE 2. Parameters for VER-XGB with the corresponding values for each cross-validation split k. Bold values are the calculated optimal values for VER-XGB.

TABLE 3. Performance results of the selected models on VAD. Bold values indicate the best results.