Segment-based CO2 Emission Evaluations from Passenger Cars based on Deep Learning Techniques

The overall level of emissions from Swiss passenger cars is strongly dependent on the fleet composition. Despite technology improvements, the Swiss passenger car fleet remains emissions intensive. To analyze the root of this problem and evaluate potential solutions, this paper applies deep learning techniques to evaluate the inter-class (namely micro, small, middle, upper middle, large and luxury class) and intra-class (namely sport utility vehicle and non-sport utility vehicle) differences in carbon dioxide (CO2) emissions. This paper makes full use of novel semi-supervised fuzzy C-means (SSFCM), random forest and AdaBoost models as well as model fusion to successfully classify passenger vehicles and enable segment-based CO2 emission evaluations.


I. INTRODUCTION
More than 5 years after the adoption of the Paris agreement, which aims to limit global warming to below 2 °C (preferably 1.5 °C), global greenhouse gas emissions continue growing steadily [1]. According to the 2016 EU Reference Scenario, without an ambitious commitment towards decarbonization, transport related carbon dioxide (CO2) emissions are expected to decrease only by 8% between 2010 and 2050 and will reach their largest share by the end of the projection period (2050) [2][3]. Underlying this limited decrease are a significant increase in the number of passenger cars, the slow market penetration of electric cars and a limited shift towards alternative fuels.
Switzerland is responsible for less than 0.2% of global manmade fossil CO2 emissions [4]. However, the transport sector represents the largest consumer of fossil fuels in Switzerland and caused about 32% of Switzerland's CO2 emissions in 2019 (i.e., around 15 million tonnes CO2 eq., excluding international aviation and shipping). Road transport was responsible for 98% of these emissions, with only small contributions from national shipping (0.8%), aviation (0.8%) and rail transport (0.2%). Among the different forms of road transport, passenger cars accounted for almost three quarters of the total emissions (73%), followed by freight transport (21%), buses (3%) and motorcycles (2%) [5]. Therefore, in order to meet the CO2 reduction targets of Switzerland [6], fossil energy carriers in the mobility sector have to be substituted by ones based on renewable energies and the overall energy consumption has to be reduced. Although substituting fossil energy with renewable energy is essential to meet the CO2 reduction targets, decisions about investments and new policies are not moving fast enough to decarbonize the economy in compliance with the Paris agreement.
On the other hand, during the last decades there have been large technical and dimensional changes in new passenger vehicles, mostly related to technology improvements and intra-class variations. Particularly relevant are the changes in the dimensions of the vehicle segments (i.e., increased size of the vehicles in most segments), within single vehicle segments (i.e., increased share of SUVs), and other design parameters like increased efficiency of the engine and engine displacement down-sizing. Understanding the impact of these changes on the fuel consumption and CO2 emissions is crucial to develop successful strategies to decarbonize the road transport.
Since the division of vehicles into segments by experts is not standardized and therefore not always uniform, and some vehicle models have recently positioned themselves as "crossovers" between established vehicle categories [7][8], it has become increasingly difficult and inaccurate to segment the vehicle population using conventional classification methods. Using mathematical approaches, vehicles can be uniformly divided into segments based on similarity features. The development of a mathematical approach to accurately segment passenger vehicles is essential for determining the real CO2 emissions from road traffic in the future. While road traffic has so far had its own energy system, which was comparatively easy to assess in terms of CO2 emissions, the increasing electrification of road traffic will make it difficult to distinguish the energy consumption of road traffic from other stationary energy uses. Moreover, the estimated overall impact of the introduction of the Worldwide Harmonized Light Vehicles Test Procedure (WLTP) on average CO2 emissions is in the order of 15-25%, which would lead to on average 18-30 g/km higher CO2 emissions for new passenger cars [9][10][11][12] (Fig. 1). Finally, due to the limited informative value of the CO2 type approval values on the real CO2 emissions, the wide margin of uncertainty regarding vehicle classification and the type approval extension based on the new definitions in the test protocol [11], this segmentation is an important step on the way to a new CO2 assessment of road traffic.
In this study, by segmenting the passenger vehicles based on technical and dimensional characteristics, we aim to better understand the impact of inter-class (between the classes of a multi-class problem) and intra-class (within each class) variations on the CO2 footprint of the passenger vehicle fleet [14]. In our approach, several semi-supervised clustering algorithms are compared and used to predict labels from unsupervised clustering algorithms based on a feature learning technique, which is a highly useful method for representation learning with high-dimensional datasets containing high-level uncertainties [15][16][17][18][19][20][21][22][23][24]. This paper is an extension of a previous work originally focused on developing a machine learning based methodology for the mathematical inter-class and intra-class segmentation of passenger vehicles. Here we improve the classification performance of this method by adding emission and technical features as input. Based on this novel approach, we can then predict accurate segment-based CO2 emissions, which allows for detailed analyses of the main factors influencing the average fleet CO2 emissions. Our results show that the proposed method is a viable and effective way to categorize vehicles based on their technical, emission and dimensional features.
Section II briefly introduces the Swiss transportation system. Section III presents the related research. Section IV describes the methods. Section V provides concise details on the datasets used, the algorithms, the experiments performed and the discussion of the results. Finally, Section VI presents the major findings of our work and recommendations for further research.

II. SWISS TRANSPORT SECTOR AND CO2 EMISSIONS
In terms of mobility, Switzerland can be divided into three main regions, namely urban, suburban, and rural areas. There are major differences in the sustainability challenges posed within these regions due to urbanization. Fig. 2 illustrates that the growth of the number of cars has placed additional pressures on traffic congestion and parking spaces, particularly in higher density areas. This creates opportunities for offering alternatives to cover the existing transportation needs, including public transit networks and shared mobility services. In contrast, in rural areas, which represent about one third of the total Swiss population, the private automobile remains the most common form of transportation due to the lack of attractive and feasible transport alternatives. In addition, despite the high rate of the population accepting public transport modes in Switzerland (59%), two thirds of the total passenger kilometers are still completed by car [25].
Over 6.5 million motor vehicles were registered in 2018 in Switzerland, more than 4.6 million of which were passenger cars. Around one third of all passenger cars were more than ten years old and over 1.6 million cars were completely outdated. Only 0.4% of all passenger cars had purely electric propulsion systems. Among the new registrations, petrol was the most common fuel (68%), while almost 30% of the sold vehicles had a diesel engine [28]. The mean type approval CO2 emissions of newly registered cars continuously dropped from around 190 g CO2/km in 2003 to around 134 g CO2/km in 2016. After this steady decline, the mean CO2 emissions of the new registrations rose again to 137.8 g CO2/km in 2018. As a result, the specified target value of 130 g CO2/km that came into force in 2012 was not entirely met. Fig. 3 shows that new cars in the Southeast of the country are generally less fuel efficient and produce more CO2 emissions compared to the new cars in the Northwest.
It is forecasted that the share of electric vehicles in the Swiss passenger car fleet will increase from 1.5% in 2021 to 38-74% in 2050, depending on the considered scenario [29]. In addition to the considerable savings in terms of fossil fuel consumption, the increasing share of electric vehicles will drastically reduce the CO2 emissions compared to vehicles powered by fossil fuels [30].

III. RELATED WORK
Over the last decades, as a result of new European CO2 regulations, car manufacturers have used a whole range of technical and dimensional solutions to meet specific annual CO2 emission targets. The review of related literature shows that the large changes in the passenger car models over time pose an additional challenge to accurate vehicle classification [31][32][33][34].
In spite of the partial achievement of the targets based on the type approval CO2 emissions (laboratory tests), real-world CO2 emissions have decreased by only about 10% [35]. Subsequently, the gap between the calculated and real-world CO2 emissions has widened from 9% to 42%, resulting in 31 g CO2/km of fake emissions savings [36][37]. This gap varies considerably across countries due to the significant variation within vehicle classes. However, the lack of a standard classification method hinders the comparison of these results between different countries [38][39].
In order to close the gap between the CO2 emissions results estimated by two major techniques (top-down approaches focusing on fuel market interactions and bottom-up approaches focusing on technological details), researchers have developed multiple simulation programs, such as greenhouse gas emission models and vehicle energy calculation tools, for the compilation of emission inventories [40][41][42][43][44][45]. From this point of view, simulation is useful to compensate for the limitations of the laboratory test methods. For example, Seo et al. [46] developed a vehicle type classification simulation method using a bottom-up approach to calculate national CO2 emissions. This study concluded that CO2 emissions of medium and heavy-duty vehicles (MHDV) represented 25.5% of the total on-road emissions, although only 4.2% of all vehicles were MHDV. Jimenez et al. [47] reviewed the influence of vehicle classification, vehicle characteristics, vehicle brand and registration year on the real-world CO2 emissions. They employed a database of 650 passenger cars. This study explained the impact of these factors on the gap between real-world and type-approval emission values. Ntziachristos et al. [48] reported that the deviations in fuel consumption are directly reflected in CO2 emissions. Using a database of 924 passenger cars from Europe and controlling for engine capacity, vehicle mass and power, this study computed a gap of 11% for in-use petrol cars and 16% for in-use diesel cars relative to the type-approval procedure. The results indicated that the large vehicle class has the highest deviation in test score.
All these studies show that simulation techniques are capable of overcoming some of the limitations faced by the fuel-based approach in terms of estimating the CO2 emissions of each vehicle class. However, the simulation techniques cannot consider intra-class variations in CO2 emissions, they are difficult to use when conducting a detailed analysis and they require expert knowledge.
Lately, feature learning techniques have shown an outstanding performance for addressing uncertainty problems in clustering and classification [19][49][50][51][52][53][54][55]. The classification performance highly depends on the quality of the features generated from the data as input to the classification process. However, only a limited number of studies have combined feature learning techniques to improve the classification performance on a high-dimensional dataset and predict the vehicle CO2 emissions. For more details about the feature learning techniques, refer to the article by He et al. [56], in which the authors implemented feature learning classification to analyze vehicular emissions. In particular, they applied decision tree, random forest, AdaBoost, and XGBoost models based on the fuel type and registration date. This study achieved a prediction accuracy of 70% by artificially controlling the registration date for different users. Saleh et al. [57] used deep learning with a support vector machine (SVM) model to predict CO2 emissions by monitoring energy consumption. The low root mean square error of the model indicates the high accuracy of the prediction. Ghahramani et al. [58] proposed an unsupervised learning approach to estimate CO2 emissions from road transport with a focus on taxi trips. This study identified the most polluting trips and the vehicles associated with these trips in order to replace them with alternative powertrains, such as electric vehicles.
The classification method proposed in this paper is a new semi-supervised clustering scheme (SSFCM) that incorporates semi-supervised information in the fuzzy C-means (FCM) algorithm to considerably improve its effectiveness [59][60][61][62][63]. In this field, Jiang et al. [64] combined several feature extraction methods with a support vector machine classifier to group the vehicles into six categories: "large bus", "passenger car", "motorcycle", "minibus", "truck" and "van". This study achieved a classification accuracy of 97.4%. Balid et al. [65] implemented deep learning-based classification using the vehicle length as a key feature. Their method classifies vehicles into passenger vehicles, single unit trucks, combination trucks, and multi-trailer trucks and achieved a classification accuracy of 97%. Maungmai and Nuthong [66] used a convolutional neural network method to classify the vehicle type as "small", "medium", "large", and "unknown", and the vehicle color as "black", "blue", "white", "green", "yellow", "red", and "unknown". The comparison of the results shows that, using decision trees, random forest, and a dense deep neural network classifier, the classification accuracy of vehicle type and vehicle color increased by 1.8% and 0.8%, respectively. Dong et al. [67] proposed a vehicle type classification method based on a semi-supervised convolutional neural network and high-resolution vehicle frontal view images. The algorithm achieved 88.1% accuracy.

A. SEMI-SUPERVISED CLUSTERING
Semi-supervised clustering aims to boost the accuracy of the defined clusters by identifying better clusters than the ones obtained from the unsupervised learning algorithm [19][68][69][70][71][72][73][74][75]. Typically, semi-supervised clustering methods result in a worse representation of the results in the original feature space. To make the semi-supervised clustering more efficient, it is reasonable to combine semi-supervised clustering with deep feature learning [76][77][78][79]. The framework of the proposed clustering approach is depicted in Fig. 4.
Unlike the most widely used approaches in semi-supervised clustering based on the feature extraction technique, we consider three types of information (diffusion labels, extracted core data, and extracted feature vectors) in order to improve the classification accuracy and mitigate the class imbalance and multi-class overlapping problems. This framework includes three main layers. First, labeled data is divided into a train set and a test set in order to build a classifier and evaluate its output, respectively. Then, records from the train set along with the unlabeled data are used as input to the feature learning process. The output of the feature learning step is the set of cluster centroids, which are used to project data from the train and test sets into a new learnt space and extract feature vectors in the feature extraction step. In the classification step, AdaBoost [80], Random Forest [81], and SSFCM models are built on the vectors of the train set and then used to predict the labels for the feature vectors of the test set. Finally, the performance parameters of the three single models are compared to those of the model fusion to evaluate their performance in terms of data classification and prediction.

B. SEMI-SUPERVISED FUZZY C-MEANS CLUSTERING
Fuzzy C-means (FCM), as an overlapping clustering algorithm, is one of the most popular fuzzy clustering methods [82]. FCM is a soft clustering algorithm, meaning that each data point belongs to every cluster with a partial membership value ranging from 0 to 1. However, due to the non-convexity of its objective function, it may fall into a local optimal solution during optimization. To address this issue, we propose a semi-supervised fuzzy C-means clustering (SSFCM) that incorporates deep feature learning in FCM to further improve its effectiveness and eliminate redundant information [83][84][85]. This method aims to minimize the objective function (J) as follows:
$$J_m(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{C} (u_{ki})^{m} \, \lVert X_k - v_i \rVert_A^{2}$$
where $N$ is the number of data elements; $C$ is the number of clusters; $X_k$ represents the $k$-th data element of $X = \{X_1, X_2, X_3, \ldots, X_N\}$; $u_{ki}$ is the membership function, which weights the squared error of $X_k$ with respect to the $i$-th cluster; $m$ is a weighting exponent that determines the degree of fuzziness and that was set to 2 in order to ensure high membership values for each data point to its closest cluster; $A$ is a positive and symmetric ($n \times n$) weight matrix; $U$ is the fuzzy partition matrix of the dataset $X$ into $C$ clusters; $v_i$ is the center vector of the $i$-th cluster; $K$ denotes the features; and $\lVert X_k - v_i \rVert_A^{2}$ denotes the squared Euclidean distance, computed in the $A$ norm, between the $k$-th data element and the $i$-th cluster center.
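To make the role of the memberships and centroids in J concrete, the following is a minimal sketch of the standard (unsupervised) FCM alternating updates, assuming A is the identity matrix (plain Euclidean distance) and m = 2. It is not the SSFCM variant proposed here; the function name `fcm` and its parameters are illustrative assumptions.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, eps=1e-6, seed=0):
    """Standard fuzzy C-means with Euclidean distance (assumes A = identity)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    prev_j = float("inf")
    for _ in range(max_iter):
        # Centroid update: v_i = sum_k u_ki^m x_k / sum_k u_ki^m
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[k] * points[k][d] for k in range(n)) / w_sum for d in range(dim)]
            )
        # Squared distances, clamped away from zero to avoid division by zero.
        d2 = [
            [max(sum((x - v) ** 2 for x, v in zip(points[k], centers[i])), 1e-12)
             for i in range(c)]
            for k in range(n)
        ]
        # Membership update: u_ki = 1 / sum_j (d2_ki / d2_kj)^(1 / (m - 1))
        for k in range(n):
            for i in range(c):
                u[k][i] = 1.0 / sum(
                    (d2[k][i] / d2[k][j]) ** (1.0 / (m - 1)) for j in range(c)
                )
        # Objective J = sum_k sum_i u_ki^m * d2_ki; stop when it no longer changes.
        j_val = sum(u[k][i] ** m * d2[k][i] for k in range(n) for i in range(c))
        if abs(prev_j - j_val) < eps:
            break
        prev_j = j_val
    return u, centers
```

Each row of the returned membership matrix sums to 1, and the hard label of a data point can be read off as the cluster with the largest membership. Because J is non-convex, different seeds may lead to different local optima.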
The SSFCM methodology is composed of the following four steps. First, with algorithm 1 we find the FCM memberships and centroids:
Next, algorithm 2 is used to calculate the deep FCM memberships and centroids:
2. Update t = t + 1.
3.-4. Compute the updated memberships and centroids for the labeled (L) and unlabeled (UNL) data.
5. If a stopping criterion, $t > T$ or $\lVert J_t - J_{t-1} \rVert < \varepsilon$, is fulfilled for all labeled and unlabeled objective functions separately, then stop; otherwise
6. Repeat from step 3.
Then, using algorithm 3, we select the features ($s \subset K$) with the random oversampling (ROS) technique. The purpose of the ROS approach is to maintain a balance between the feature subsets of labeled classes and unlabeled data elements [86][87].
5. for all L and UNL features do
6. Return Q
Next, we apply the Euclidean distance, which is the most widely applied (dis)similarity metric, to measure the similarity between the labeled and unlabeled feature vectors. The outcome is the maximum average of the maximum relevant and minimum redundant features between each selected feature of unlabeled data and labeled classes [88]:
$$\max \mathrm{Sim}(X_j, v_i) = \min d_{ji} = \min \lVert X_j - v_i \rVert, \quad 1 \le i \le c, \; X_j \in X_{UNL}$$
Last, in algorithm 4 the maximum average of the maximum similarity between the selected features is estimated and used in the classifiers.
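The two ideas used in this step, balancing by random oversampling and matching by minimum Euclidean distance, can be sketched as follows. This is a simplified illustration under stated assumptions, not a reproduction of algorithms 3 and 4: the helper names `random_oversample` and `nearest_class` are hypothetical, and class centers are assumed to be available as plain coordinate lists.

```python
import math
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def nearest_class(x, centers):
    """Return the key of the center with the smallest Euclidean distance
    to x, i.e. the class with maximum similarity."""
    return min(centers, key=lambda c: math.dist(x, centers[c]))


# Example: class "b" is oversampled from one to two samples, and an
# unlabeled vector is matched to its closest labeled class center.
x_bal, y_bal = random_oversample([[0, 0], [1, 1], [5, 5]], ["a", "a", "b"])
match = nearest_class([4.5, 4.8], {"a": [0.5, 0.5], "b": [5.0, 5.0]})
```

Oversampling only duplicates existing samples, so it changes the class balance without adding new information.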

C. STATE-OF-THE-ART METHODS
Two ensemble learning methods, Random Forest and AdaBoost, are used to enhance the accuracy and performance of the classification [89][90]. The Random Forest model is a parallel learning process that uses a bagging technique for the data training [91]. This data sampling technique aims to reduce the variance and bias in the model by generating multi-sets (multiple decision trees) for training from the original data. In the parallel process none of these decision trees is dependent on other trees.
On the other hand, the AdaBoost model is a sequential learning process that uses the training data to build successive decision stumps [92]. In the sequential process, each decision stump depends on the previous one. In fact, the error made by the first decision stump through the misclassification of a few data points influences the next decision stump by assigning higher weights to those training data.
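The reweighting mechanism described above can be illustrated with one round of the classical discrete AdaBoost update. The sketch assumes binary labels coded as +1 and −1 for simplicity, whereas the classification task in this paper is multi-class; the function name `adaboost_reweight` is a hypothetical helper.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost round for labels in {+1, -1}: return the stump's vote
    weight alpha and the renormalised sample weights."""
    # Weighted error of the current stump.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p = -1) are scaled up, correct ones down.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]


# Four equally weighted samples, the second one is misclassified.
alpha, new_weights = adaboost_reweight(
    [0.25, 0.25, 0.25, 0.25], [1, 1, -1, -1], [1, -1, -1, -1]
)
```

After renormalisation, the misclassified sample carries half of the total weight, so the next stump is pushed to classify it correctly.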

D. PERFORMANCE MEASURE
To assess the performance of the different algorithms, we compute the confusion matrix and use it to determine the performance measures, including the accuracy and the F-measure (F1). Here, TPi (True Positives) is the proportion of data points of class i that are correctly assigned to class i; FNi (False Negatives) is the proportion of data points that are not assigned to class i but actually belong to class i; TNi (True Negatives) is the proportion of data points that are correctly not assigned to class i; FPi (False Positives) is the proportion of data points that are incorrectly assigned to class i.
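As an illustration of how such measures follow from a confusion matrix, the sketch below computes the overall accuracy together with the per-class precision, recall and F1 using their standard textbook definitions. The exact set of measures used in this study is not restated here, so the selection is an assumption, as is the helper name `per_class_metrics`. Counts are used instead of proportions, which leaves the ratios unchanged.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.
    Returns the overall accuracy and a list of per-class metric dicts."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # class i, predicted elsewhere
        fp = sum(cm[r][i] for r in range(n)) - tp   # other classes, predicted i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    accuracy = sum(cm[i][i] for i in range(n)) / total
    return accuracy, metrics


# Two-class example: 8 of 10 samples of class 0 and 9 of 10 of class 1 are correct.
accuracy, metrics = per_class_metrics([[8, 2], [1, 9]])
```

With imbalanced classes, the per-class F1 values are usually more informative than the overall accuracy, because a classifier can reach a high accuracy while performing poorly on small classes.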

E. MODEL FUSION
The model fusion method is a deep learning process in which different classification algorithms, associated with individual weights, are trained and combined in order to enhance the final estimation. This method yields a stronger meta-classifier, as it combines different classification models using a voting classifier, partially overcoming the weaknesses of single classifiers and achieving higher classification accuracy. The commonly used voting classifiers include the hard voting classifier and the soft voting classifier. The hard voting classifier takes the majority vote applying equal weights to each classifier (the mode of all the predicted labels is taken), while the soft voting classifier takes the vote based on applying different weights to each classifier (the probability of all the predicted labels is taken) [56][93]. The voting classifier predictions can be defined as:
$$H_{vote}(x) = \arg\max_{c \in \{1, \ldots, k\}} \sum_{j=1}^{n_T} \mathrm{lab}(x, j, c)$$
$$S_{vote}(x) = \arg\max_{c \in \{1, \ldots, k\}} \frac{1}{n_T} \sum_{j=1}^{n_T} p(x, j, c)$$
where $H_{vote}(x)$ denotes the vote result of hard voting, $\mathrm{lab}(x, j, c)$ is an indicator function that shows whether x is assigned to label c by the j-th classifier, $S_{vote}(x)$ is the vote result of soft voting, $p(x, j, c)$ is the probability estimated by the j-th classifier that x belongs to label c, $n_T$ refers to the total number of classifiers and k is the number of labels.
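The difference between the two voting schemes can be sketched in a few lines. The functions `hard_vote` and `soft_vote` and the optional `weights` argument are illustrative assumptions; the sketch is not the fusion implementation used in the experiments.

```python
def hard_vote(predictions):
    """predictions: one predicted label per classifier.
    Returns the most frequent label (first seen wins a tie)."""
    counts = {}
    for label in predictions:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)


def soft_vote(probabilities, weights=None):
    """probabilities: one dict {label: probability} per classifier.
    Returns the label with the highest (optionally weighted) mean probability."""
    weights = weights or [1.0] * len(probabilities)
    total_w = sum(weights)
    scores = {}
    for w, probs in zip(weights, probabilities):
        for label, p in probs.items():
            scores[label] = scores.get(label, 0.0) + w * p / total_w
    return max(scores, key=scores.get)


# Two classifiers weakly prefer "small", one strongly prefers "middle".
probs = [
    {"small": 0.55, "middle": 0.45},
    {"small": 0.55, "middle": 0.45},
    {"small": 0.05, "middle": 0.95},
]
hard = hard_vote([max(p, key=p.get) for p in probs])
soft = soft_vote(probs)
```

In this example the two schemes disagree: hard voting follows the two weak votes, while soft voting lets the confident classifier dominate. This is one reason why soft voting can help when the classifiers produce well-calibrated probabilities.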

A. DATA PREPARATION
The core dataset of this work is the Swiss Motor Vehicle Information System (MOFIS) [94], which contains over 6.5 million passenger vehicles along with their type approval numbers, geometric and weight properties, ownership details, technical information and date of registration. In addition, we also use vehicle technical specifications and vehicle expert segmentation data from the Technical Type Approval Information from the Federal Roads Office (ASTRA) [28] and a Vehicles Expert Partner [95], respectively.
The data-mining framework consists of three main components: filtering of raw data, extraction of the vehicle sample by registration year, and identification of suitable clustering attributes. In a first step we filter the dataset by removing the vehicles that do not meet the definitions of typical passenger cars, such as small pickup trucks, standard pickup trucks, vans, special purpose vehicles (SPVs), sports cars and multi-purpose vehicles (MPVs) [18]. In line with the goal of this paper, the dataset is then separated into two parts, a training part and a testing part. The training dataset contains 308,824 passenger cars newly registered in 2018 along with 30 features including emissions (carbon dioxide (CO2), carbon monoxide (CO), nitrogen oxides (NOx), particulate matter (PM2.5), etc.), weight properties, dimensional features (length, height, width, axle, etc.) and vehicle technical specifications (power, engine capacity, drive, torque, etc.) for each car. It is important to note that we used two different values of CO2 emissions, namely the average type approval values provided in the ASTRA database (measured CO2) and the vehicle specific type approval values that also consider the vehicle weight and the gearbox and are reported in the MOFIS database (calculated CO2).

B. EXPERIMENTAL SETUP AND RESULTS
In the first step of the learning process the training dataset is considered to contain two types of patterns: unlabeled and labeled data. The labeled dataset results from applying the unsupervised FCM clustering algorithm to the 366 unique newly registered passenger cars (based on make, model and manufacturer code) using the dimensional features: the micro class contains 18 samples, the small class 50 samples, the middle class 110 samples, the upper middle class 84 samples, and the large and luxury class 104 samples. The average accuracy rate and adjusted Rand index of the FCM clustering algorithm in comparison to the Swiss expert classification were approximately 79% and 75%, respectively [18]. Due to some limitations of the unsupervised FCM clustering algorithm, only the labeled data with true labels (in vehicle class, measured CO2 and calculated CO2) and a membership degree greater than 0.95 were used as the core dataset to extract the accurate classification of misclassified samples and provide the base for the later training step. Following this, we selected 10% of the data from each class as labeled training samples.
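The selection of the core dataset and of the labeled training samples can be sketched as follows, assuming the FCM memberships are available as one row of class memberships per vehicle. The helper names are hypothetical, and the rule of keeping at least one sample per class is an added assumption for very small classes, not something stated above.

```python
import random


def select_core(memberships, threshold=0.95):
    """Return (index, class) pairs for samples whose highest membership
    degree exceeds the threshold."""
    core = []
    for idx, row in enumerate(memberships):
        best = max(range(len(row)), key=lambda i: row[i])
        if row[best] > threshold:
            core.append((idx, best))
    return core


def sample_per_class(core, fraction=0.10, seed=0):
    """Draw the given fraction of core samples from each class.
    Assumption: at least one sample is kept per class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, cls in core:
        by_class.setdefault(cls, []).append(idx)
    return {
        cls: rng.sample(ids, max(1, round(fraction * len(ids))))
        for cls, ids in by_class.items()
    }


# Toy example with two classes: the second sample is too ambiguous to be core data.
memberships = [[0.97, 0.03], [0.60, 0.40], [0.02, 0.98], [0.96, 0.04]]
core = select_core(memberships)
labeled = sample_per_class(core)
```

Sampling per class rather than over the whole core dataset keeps small classes, such as the micro class, represented among the labeled training samples.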
The preliminary statistical analysis on the correlation between the emissions, vehicle segments, sub-segments and preselected influencing factors demonstrated a high correlation between the features. In the feature learning process, the unlabeled data and the previous labeled data along with the labels of the core dataset are used as input. Each group of labeled and unlabeled data has a set of features in common. In order to eliminate multicollinearity, principal component analysis (PCA) was performed on the data. Prior to model development, new features are extracted to reduce the number of features. In the feature extraction step, the cluster centroids are defined using algorithm 2 and each patch is transformed to a feature vector.
In the feature selection step (algorithm 3), the resampling (ROS) technique is used in order to increase the number of extracted features from minority groups until it equals the number of features in the majority group. Then, algorithm 4 (based on the Euclidean distance) is used to select the best features and remove redundancy from the feature vector. After all parts are initialized, pseudo labels of the labeled data are assigned to the unlabeled data in the training data. Subsequently, this unlabeled data with pseudo labels is used to pre-train the SSFCM, random forest (algorithm 5) and AdaBoost (algorithm 6) classification algorithms by extracting discriminative features. Finally, model fusion is applied using only the labeled data with true labels.
The experimental results show that the single models using the SSFCM, random forest and AdaBoost algorithms and the fusion model all enhance the classification accuracy in comparison to the traditional FCM algorithm (overall accuracy of 79%). Among them, the soft voting fusion model and the SSFCM provide the most accurate results, 94.2% and 95.4% respectively. The F-measure value (F1), which represents the model performance, is 91.6% for the SSFCM clustering algorithm and 91.5% for the fusion model with soft voting (Table 1).
From the results of the model fusion, we extract the final features reported in Fig. 5 which we use to re-run the single algorithms and select the final classification model.
The underlying assumption of feature extraction is that it leads to improved classification results in comparison to the initial classifier's predictions with the original features. To verify that this assumption holds for our task, we use the prediction accuracy and other verification measures to check the classification performance of traditional FCM with the original features and the SSFCM, random forest and AdaBoost algorithms with feature extraction. It can be seen in Table 2 that, compared to the FCM-based classifier, the use of feature extraction techniques increases the classification performance. Among all tested approaches, SSFCM provides the best results in terms of both prediction accuracy (95.2%) and verification measures (90.4%) and is therefore selected as our final classification model.
The experimental results demonstrate that the SSFCM algorithm can extract richer information from the vehicle dataset and obtain more discriminative recognition rates than other classifiers do. Therefore, the proposed approach can not only effectively address the problem of multi-class imbalanced data but also improve the prediction performance.

C. DISCUSSIONS
Using the SSFCM model, we estimate the average CO2 emissions of all new passenger vehicles registered in Switzerland in 2018 to be 138.9 g CO2/km, which deviates by only 1.1 g CO2/km (about 0.8%) from the official estimate of the Swiss Federal Office of Energy (SFOE) of 137.8 g CO2/km [96]. Moreover, for all 26 Swiss cantons, we find that the correlation between our estimates and those from the SFOE is very high (R2 > 0.95). Thus, although slightly different approaches were used to estimate the CO2 emissions in both cases, the results are highly correlated.
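The comparison with the SFOE values can be reproduced with two short helpers. The sketch assumes that R2 denotes the squared Pearson correlation between the cantonal estimates, which is not stated explicitly above; the cantonal values themselves are not given here, so only the fleet averages are used as input.

```python
def relative_deviation(estimate, reference):
    """Signed deviation of the estimate from the reference, as a fraction."""
    return (estimate - reference) / reference


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Fleet averages quoted in the text, in g CO2/km.
# The difference of 1.1 g CO2/km corresponds to roughly 0.8% of the SFOE value.
deviation = relative_deviation(138.9, 137.8)
```

Passing the 26 cantonal estimates and the corresponding SFOE values to `r_squared` would give the kind of agreement measure reported above, under the stated assumption about its definition.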
The overall level of emissions from Swiss passenger cars is strongly affected by the fleet composition, which is shifting over time between classes (from the upper-middle class to the large and luxury classes) and within each class (from non-SUV to SUV). Fig. 6 shows the distribution of the CO2 emissions among the different vehicle classes. Both the median CO2 emissions and the spread around the median show a clear upward trend with the size of the vehicle class. Fig. 7 shows the average CO2 emissions (in g CO2/km) calculated for each vehicle segment resulting from our inter- and intra-class classification. For each vehicle class we report the results based on the interquartile range distributions of the engine power. Overall, we see a significant variation of the CO2 emissions between vehicle classes, sub-classes and power ranges. Comparing the different vehicle segments, we see that the CO2 emissions increase with the vehicle size. Moreover, SUVs tend to have significantly higher emissions than non-SUVs. In terms of engine power, within each class an increase in the engine power generally leads to significantly higher CO2 emissions. Fig. 8 shows the spatial distribution of the share of SUVs within the different vehicle segments. The share of SUVs is the lowest for the micro and small classes (between 0 and 20%), followed by the middle class (between 20 and 35%) and the large and luxury class (between 20 and 45%), and is the highest for the upper-middle class (between 30 and 50%). Moreover, we observe that in general the share of SUVs within each class is higher in the southern and central cantons. Fig. 9 reports the spatial distribution of the average CO2 emissions for each vehicle class and sub-class. As previously observed, the results demonstrate a high variability in the CO2 emissions between the different inter- and intra-class segments.
Among the different segments, the CO2 emissions increase with the size of the vehicle, from around 100 g CO2/km for the micro class to around 200 g CO2/km for the large and luxury class. Moreover, SUVs generally exhibit higher CO2 emissions than non-SUVs. This difference is particularly pronounced in the case of micro class vehicles, although, as shown in Fig. 8, the share of SUVs is very small in this case. However, this indicates that shifting from a middle class non-SUV vehicle to a micro class SUV could lead to an increase in the CO2 emissions. In general, the spatial distribution of the CO2 emissions for each vehicle class and sub-class is quite homogeneous and we do not see any significant trends between different regions of the country.

VI. CONCLUSION
This paper develops a novel approach to mathematically segment newly registered passenger cars and assess the segment-based spatial distributions of the CO2 emissions. A variety of semi-supervised clustering algorithms are adopted to classify a dataset of newly registered passenger cars based on multiple technical, dimensional and emission features. Among all tested classifiers, the SSFCM technique was the most accurate, providing a classification accuracy of about 90.4% for verification measures.
The proposed approach enables accurate automated vehicle classification of large databases, which in turn facilitates the analysis of fleet changes. Another important advantage of the clustering-based mathematical segmentation is that it removes the subjectivity factors affecting expert-based segmentations, reducing classification errors and making databases from across the world comparable. Finally, the automated clustering approach also reduces classification costs and training time.
Despite technology improvements, the Swiss passenger car fleet remains emission intensive. Our results indicate large variabilities in the average CO2 emissions of different vehicle classes. While a shift of the fleet towards smaller vehicles is likely to diminish CO2 emissions, the emissions intensity could be more effectively reduced by shifting the vehicle proportions within each class (e.g., switching from SUV to non-SUV or to lower power vehicles in the same vehicle class). Therefore, the combination of the inter-class and intra-class classification provides crucial insights for developing fleet transformation strategies to decarbonize the passenger vehicle fleet. A further area of potentially fruitful research would be to use CO2 estimates from real-world measurements instead of type approval values for a more precise evaluation of the fleet CO2 emissions.