
One Versus All Binary Tree Method to Classify Misbehaviors in Imbalanced VeReMi Dataset



Abstract:

Nowadays, transportation networks depend heavily on the technology known as vehicular ad hoc networks (VANETs). VANETs enhance traffic control and road safety while also enabling vehicle-to-vehicle communication using basic safety messages (BSM), which are susceptible to different kinds of attacks. This study focuses on techniques for detecting and classifying misbehavior in VANETs while dealing with unbalanced data. In order to ensure equal treatment of minority and majority categories, we provide a novel method called One vs. All Binary Tree (OVA-BT). This approach separates binary classifiers for each kind of misbehavior and provides specific assessment metrics for each kind of misbehavior. We evaluate our experiment using five-fold cross-validation with six individual models of ML and an ensemble classifier. The findings demonstrated that the use of OVA-BT enhances the classification accuracy when compared to a traditional single multi-class model and that the classifier ensemble’s classification performance is greater than the best individual model on the testing set.
OVA Binary Tree Method to Classify Misbehavior in VANETs.
Published in: IEEE Access ( Volume: 11)
Page(s): 135944 - 135958
Date of Publication: 28 November 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Intelligent transportation systems currently rely on VANETs, which offer effective interaction and cooperation between infrastructure and vehicles. The advantages of VANETs include increased traffic control efficiency, enhanced road security, and an improved driver experience [1].

Various infrastructures form the basis of a VANET, such as On-Board Units (OBUs), Roadside Units (RSUs), and Certificate Authorities (CAs) [2], [3]. A vehicle is equipped with an OBU, which periodically broadcasts its state to nearby vehicles. An RSU [4], a crucial component of the infrastructure, is installed on the side of the road and communicates with neighboring OBUs by sharing traffic and weather information while assisting OBUs in establishing internet access.

There is a distinguished infrastructure known as the Central Authority (CA), which offers services including the administration and registration of network components as well as the revocation of certificates in the case of misbehavior [5]. VANETs support a variety of interaction methods, including vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I). A Basic Safety Message (BSM) [6], including the sender’s actual location, speed, acceleration, and direction, as well as a temporary pseudonym to identify the sender, is broadcast every 100 milliseconds to ensure communication between vehicles.

However, because of the V2X connection, VANETs are an open-access environment that is much more vulnerable to misbehavior, causing significant safety problems.

Misbehavior in VANETs is described as any intentional or unintentional activity by a vehicle that diverges from the established norms, protocols, or guidelines of the VANET system. Misbehavior may manifest itself in a number of ways, such as careless driving, moving violations, illegal access, forgery of data, or even dangerous assaults. These illegitimate or dishonest behaviors have the potential to affect VANET’s regular functioning, affect the security of vehicles and passengers, and damage the network’s reputation for efficiency [7].

Identifying and classifying misbehavior in vehicular communications is a challenging categorization problem, particularly when instances of misbehavior are unevenly distributed across categories.

Unbalanced databases occur when specific kinds of misbehavior are considerably underrepresented in comparison to others.

Complex issues with categorization cannot always be handled with single multi-class categorization methods; they struggle with this imbalance because they tend to perform better for the majority category while performing less well for the minority misbehaving category. As a result, it is difficult to accurately identify and categorize occurrences of misbehavior, causing possible weaknesses in security.

In contrast to single multi-class categorization, this research suggests the One-Versus-All Binary Tree (OVA-BT) strategy as an efficient solution to these problems.

The OVA technique divides the multi-class categorization problem into a group of binary sub-issues [8], [9], considering every category as an independent binary classification work. This allows binary classification algorithms to handle multi-class classification problems.

Numerous OVA-based multi-class classification techniques have been suggested using this concept [10], [11].

We can successfully tackle the issue of imbalanced data to identify and classify the misbehavior in VANETs by utilizing the OVA-BT approach.

In the context of unbalanced data issues in VANETs, the OVA binary tree technique has a number of benefits. First of all, it enables the representation and categorization of every misbehavior category separately. The classifier can concentrate on extracting the distinctive patterns and features of every kind of misbehavior by treating every category as a distinct binary categorization problem. This specialized knowledge improves the precision of categorization, especially for minority misbehavior categories that might not be well represented in the database.

Secondly, the OVA-BT method offers flexibility in choosing the best binary classification algorithm for each category. For accurate detection, specific algorithms must be designed for the many kinds of misbehavior that may exhibit different features. The OVA-BT methodology makes use of specific binary classifiers, allowing the use of methods created for each type of misbehavior and improving the effectiveness of classification.

Additionally, by organizing the multi-class imbalanced dataset into a hierarchical structure, the OVA-BT technique successfully overcomes this issue. The classification process is streamlined by this hierarchical decomposition, which also lessens the complexity involved in handling unbalanced data directly. The OVA-BT approach’s interpretability also makes it possible to track down misclassification choices, which helps in comprehending and improving the categorization algorithms.

In this study, we will provide a thorough examination of the security risks provided by improper behavior in VANETs, focusing on the situation of unbalanced datasets. We will look at the drawbacks of traditional classification methods and emphasize the advantages of using the OVA-BT technique. Different machine-learning approaches are used and compared for learning and detecting malicious behavior. This paper employs an artificial intelligence (AI) methodology. The key contributions of this study are, in particular:

We suggest an efficient OVA-BT method to solve the problems caused by unbalanced datasets while categorizing multi-class misbehavior in vehicular communication. By adopting a binary tree structure, this method expands on the traditional OVA technique by providing for the distinct analysis and classification of each misbehavior category.

We provide thorough experimental assessments between the OVA Binary Tree approach and traditional classification methods in terms of area under the curve (AUC), recall, F1-score, and precision. We present in-depth simulation findings describing the performance of six distinct machine learning algorithms (Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Random Forest (RF) and Multi-Layer Perceptron (MLP)) and ensemble learning classifiers.

The remainder of the paper is structured in the following manner: A literature review is presented in Section II. Section III describes the generated dataset, the proposed OVA-BT technique, the classification methods, and the performance evaluation metrics. Section IV presents the findings and discussions, while Section V provides the conclusion.

SECTION II.

Literature Review

In order to address the multi-class classification issue when dealing with unbalanced data, numerous experiments have been carried out. This Section begins with an outline of the most recent advances in artificial intelligence for multi-class categorization, followed by an overview of imbalanced data in VANETs environment.

A. Handling Multi-Class Classification

Many works have dealt with the multi-class classification issue. The decomposition-based methods such as the OVA strategy have drawn a lot of interest. The OVA division approach splits a k-category multi-class categorization issue into k distinct binary algorithms, which are trained to differentiate instances of one category against instances of all other categories.

In [10], Rifkin and Klautau show the effectiveness of the OVA technique, highlighting its competitive precision and ease of use, especially when using well-tuned regularized models such as SVM. Practical circumstances frequently involve complex databases that may not conform to the assumptions of the One vs. All technique, like unbalanced category distributions and extensive connections. Although the work offers an insightful viewpoint, it failed to address the range of difficulties observed in real-world multi-class categorization jobs, in which different approaches would be more appropriate.

To solve the shortcomings of the earlier suggested Weighted Linear Loss Twin Support Vector Machine (WLTSVM) for binary categorization, the Weighted Linear Loss Multiple Birth Support Vector Machine (WLMSVM) was introduced in [12]. The benefits of WLMSVM include improved multiclass categorization efficiency, which is attained by granulating data and using the “all-versus-one” technique. Comparing this method to the OVA approach employed in multiple WLTSVM, the computational complexity is greatly reduced. The solution of linear equations is also made simpler by the addition of weighted linear loss. WLMSVM’s success is shown in the research through findings from experiments on a variety of databases; however, it’s vital to take into account any potential limits WLMSVM may have when addressing situations that are more complicated and challenging than benchmark databases.

A revolutionary One Versus All multi-class classification technique centered on the cooperative evolution of SVM was created in [13] and is known as MC2 ESVM. The method divides an N-categories issue into N sub-issues, which are then cooperatively improved. It has been demonstrated that the strategy works well for multiclass categorization issues while maintaining a manageable number of support vectors. The method can, however, be computationally costly, difficult to converge, and hyperparameter selection-sensitive.

The authors of [14] discuss the difficulty of carrying out diagnostics while concurrent problems exist. The suggested OVA class binarization technique has a number of benefits. It improves defect detection precision while lessening the requirement for a comprehensive set of data instances. It optimizes the data collection procedure by forecasting concurrent faults using training examples of typical and singular faults. The prospective use of the One versus All evaluation technique is shown through an investigation utilizing a support vector machine and C4.5 decision tree. Nevertheless, as laboratory-gathered data might not completely reflect the nuances of actual bearing situations, it is insufficient to determine if this method is beneficial for more complicated and varied circumstances in the real world.

Khairudin et al. [15] developed accurate detection and classification methods for classifying 4 kinds of human intestinal parasites using the OVA technique and tested their solution utilizing k-NN, SVM, and ensemble learning. The database’s constrained nature, especially its particular collection of helminth species and ova pictures, should be taken into account. Real-world scenarios can contain a wider variety of parasite types and changes in the quality of images, demanding additional validation and database development to guarantee the algorithms’ efficacy in different healthcare environments.

In the framework of Random Forest, Adnan and Islem [16] study the use of the OVA binarization strategy. Binarization approaches can make multiclass categorization issues simpler and enhance the performance of the Random Forest algorithm, which has the opportunity to increase the precision of predictions. The paper uses ten databases from the UCI Machine Learning Repository for a thorough experimental evaluation to show the efficacy of this method. The main shortcoming of this study’s assessment is that it is concentrated on a small number of categories within each dataset instead of taking into account a wider variety of categories.

In [17], Allwein et al. presented an approach for dealing with multi-class classification issues by breaking them down into numerous binary issues. Margin-based binary learning models are streamlined by this unification, which also offers a universal technique for mixing models. The flexible architecture can be used with several categorization techniques, like SVM, AdaBoost, regression, logistic regression, and decision-tree models. The experimental findings of AdaBoost and SVM demonstrate weakness in validation, underlining the necessity for broader methods of categorization to increase the legitimacy and application of the methodology.

For the purpose of designing neural network ensembles, McDonnell et al. [19] suggested a novel cooperative ensemble learning system (CELS). The concept behind CELS is to motivate various single networks in an ensemble to train various components or elements of a training database so that the ensemble can more effectively learn the entire training database. Both the Australian Credit Card Assessment Challenge and the Mackey-Glass time series prediction issues have been used to test CELS.

For the challenge of data categorization, Chen and Alahakoon [18] suggested a hybrid learning strategy called Learning-Neuro-Evolution of Augmenting Topologies (L-NEAT). L-NEAT uses the NEAT method to streamline evolution by breaking the entire problem domain into smaller tasks, which are then learned separately. The efficacy of NEAT is nevertheless affected by the selection of the evolutionary parameters and the particular database, which limits its applicability.

In [19], the multi-class categorization degradation of NEAT is addressed by applying the class binarization approaches of One-vs-All and One-vs-One, where binary models are the various NEAT-evolved neural networks. OvA-NEAT and OvO-NEAT, two ensemble techniques, are created to exceed the regular NEAT in terms of precision and effectiveness.

This paper presents a novel method for detecting multi-class misbehavior in vehicular communication using the OVA-BT technology and the VeReMi extension database. Although there are other versions of the database, our research focuses on using OVA-BT for the first time on the original VeReMi extension database, a less-used technique at the time. The lack of benchmark data from other OVA-BT approaches on this database prevented direct comparisons, despite the fact that we understand the value of benchmarking against current ideas.

B. Tackling Unbalanced Data

The unbalanced class distribution is a significant issue with supervised Machine Learning techniques. It typically happens when the dataset traces data from a real environment. In such a setting, the data is frequently unbalanced, and the models developed from the data might be accurate for the majority category but very inaccurate for the rest of the categories. Similar problems are seen in the VeReMi Extension database, which has been widely utilized to train ML classifiers to classify misbehaving nodes in VANETs. There are several approaches to solve this issue: changing the machine learning approach, adding a misclassification cost, and data sampling [20].

In the context of VANETs, the researchers in [21] oversample the minority category by employing the Synthetic Minority Oversampling Technique (SMOTE), a data augmentation approach for the minority category [22], [23]. It synthesizes new instances of the minority categories in order to avoid the issue of unbalanced datasets. With this approach, DDoS detection in VANETs can be realized efficiently.

The researchers in [24] used the ToN-IoT network database and proposed a detection system for vehicular communication.

Two distinct problems concerning the network data in this database are class imbalance and missing values. The Chi-squared and SMOTE techniques were used to tackle these problems. The ToN-IoT database covers more kinds of misbehavior than earlier datasets like NSL-KDD, KDD-CUP99, and UNSW-NB15 [25], [26]. SMOTE is also used by the authors in [27] to address the issue of an imbalanced database. In SMOTE, new instances of the minority categories are synthesized to bring their total number to a level that is comparable to or equal to the total instances of the dominant category.
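For reference, the following is a minimal sketch of how SMOTE-style oversampling is typically applied with the imbalanced-learn library; the synthetic data and parameters are illustrative and do not reproduce the pipelines of the cited works.

```python
# Minimal sketch of SMOTE oversampling with the imbalanced-learn library.
# Synthetic data only; the cited works use their own datasets and pipelines.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic imbalanced data standing in for minority misbehavior classes.
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.85, 0.10, 0.05], random_state=42)
print("before:", Counter(y))

# SMOTE synthesizes new minority samples by interpolating between neighbors.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))
```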

In [28], the authors utilized SMOTE and DSSTE to prevent class unbalance. The DSSTE approach is used to raise the minority occurrences and reduce the majority instances within a challenging set.

They employed the unbalanced VeReMi initial database to demonstrate the effects of an unbalanced dataset on the classification of misbehavior in Vehicular communications. This study’s primary weakness is that it only considers a small subset of misbehavior types when discussing the identification of DDoS for vehicular communication. The work is helpful in tackling this particular subset of security issues, but it falls short of offering a comprehensive answer to the complex and always-changing threat environment that VANETs must contend with. To guarantee the security system’s durability in a variety of real-world settings, a more successful strategy would require taking a wider range of misbehavior types and variants into account.

When using the unbalanced VeReMi Extension dataset, containing a large number of misbehavior types, in our earlier research [29], we discovered insufficient accuracy in the single multiclass classification of misbehavior in vehicular communication. In this study, we attempt to employ the OVA Binary Tree technique to avoid these issues in order to enhance the outcomes, and we track changes in the model’s output. The goal of this work is to clearly demonstrate how an unbalanced dataset affects the development of intelligent algorithms for classifying cyber-attacks.

In addition, the utilization of a real-world database with a wider variety of classes is one of the key advantages of our study. Our work makes use of a more varied and complicated database with 20 classes, whereas the researcher’s study may have used synthetic or smaller databases with fewer classes. One of our research findings is that we have handled a wide range of real-world classification scenarios by using various categories, which helps us better grasp the problems and solutions for multiclass classification.

SECTION III.

Materials and Methods

This Section describes the generated dataset, including its attributes and statistical data, the applied methods for classification, the data preparation, the suggested approach based on an One Versus All binary tree for classification, and the performance evaluation metrics used to assess the system’s effectiveness across all options and methods.

A. Generated Dataset

The Vehicular Reference Misbehavior Dataset Extension (VeReMi) is used in this study; it was developed using OMNET++ [30] and VEINS [31] on the Luxembourg SUMO traffic scenario (LuST) [32]. The simulations in this dataset range in density from low to high traffic times. Each simulation is made up of log files and a ground truth file. The single ground truth file describes how each vehicle really behaves in the interconnected system during the simulation. An attacker type is included in the ground truth file as well, separating genuine vehicles from misbehaving vehicles. In a simulation, on the other hand, the number of log files equals the number of nodes in the network. Each node generates a log file containing all the Basic Safety Messages received from other devices.

Since there are as many log files as receivers, the first phase is to merge all of the different log files into one file. Then, log files and ground truth files must be joined by mapping the ground truth file to the log files for each simulation.

The unique message ID identifier is present in both ground truth files and log files. The attacker type in the ground truth file should be transferred to the information in the merged log file to build a merged dataset for a single scenario, as shown in Figure 1.
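A minimal sketch of this merging step is shown below; the file names and column names (e.g., messageID, attackerType) are assumptions for illustration rather than the exact VeReMi schema.

```python
# Hypothetical sketch of the merge step; file and column names such as
# "messageID" and "attackerType" are assumptions, not the exact VeReMi schema.
import glob

import pandas as pd

# 1. Concatenate all per-receiver log files of one simulation into one frame.
logs = pd.concat((pd.read_json(f, lines=True)
                  for f in glob.glob("logs/JSONlog-*.json")), ignore_index=True)

# 2. Load the single ground truth file and keep the label column.
truth = pd.read_json("GroundTruthJSONlog.json", lines=True)[["messageID", "attackerType"]]

# 3. Map the attacker type onto every received BSM via the shared message ID.
merged = logs.merge(truth, on="messageID", how="left")
merged.to_csv("merged_scenario.csv", index=False)
```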

FIGURE 1. Log file and Ground truth file data extraction to generate VeReMi Extension.

In order to produce a labeled database, we introduced an identifiable characteristic named “class” for our target class into the merged database.

We assigned the class number 0 to the genuine vehicles while for misbehaving vehicles, the class number is between 1 and 19.

The database includes a number of errors caused by faulty On Board Units or vehicle sensors that result in inaccurate results for position, acceleration, velocity, and heading, as well as a variety of cyber-attacks when the vehicle sends an incorrect message [7]. For easier reading, Table 1 (a, b) summarizes additional details and statistics regarding the created dataset.

TABLE 1. (a) Detailed Information on the Data and (b) Statistics in the Produced Database.

However, this dataset has an unbalanced class structure; 59.488% of the dataset is made up of records in the normal category, while classes like DoS Random Sybil and DoS account for 1.322% and 1.312% of the total dataset, respectively.

You can access the generated database of this paper at https://dx.doi.org/10.17632/k62n4z9gdz.1 which is an open-source online data repository maintained by Mendeley Data [33].

B. Classification Approaches

This Section describes the various machine learning classifiers that are used in this investigation.

1) Random Forest (RF)

It is a supervised ML technique [34], [35] that blends various decision trees to produce predictions. By choosing portions of the training data and features at random, it generates a series of decision trees. For classification or regression, the results from each tree are combined by voting or averaging to produce the final prediction. Random Forest increases accuracy, handles high-dimensional data, and is robust against overfitting. It also provides measurements of feature relevance. For classification and regression applications, Random Forests are an effective and widely utilized method.

2) Logistic Regression (LR)

It is a supervised ML method [36] employed to address classification issues. It models the association between the input variables and the likelihood of a favorable outcome using a logistic function. In the training stage, the model learns the most suitable weights, and a decision boundary is created to distinguish the categories. Interpretability is a benefit of logistic regression since the coefficients show how each input variable affects the likelihood of the outcome. To avoid overfitting, it can use regularization techniques. Although Logistic Regression is straightforward, understandable, and effective, it may struggle with unbalanced classes or complex correlations.

3) Support Vector Machine (SVM)

In this classifier [37], suppose that $T=\{(x_{1}, y_{1}), (x_{2}, y_{2}),\ldots,(x_{n}, y_{n})\}$ is a training set for binary labeled categorization, in which $y_{i}$ is either +1 or -1. A support vector machine looks for the hyper-plane with the maximum margin separating it from the nearest points. The hyper-plane is defined as\begin{equation*} w^{T}x+b=0, \tag{1}\end{equation*} where b is the bias value and w is a normal vector. The data points can be separated if every $x_{i}$ meets the criteria listed below:\begin{equation*} \begin{cases} w^{T}x_{i}+b\ge +1, & y_{i}=+1;\\ w^{T}x_{i}+b\le -1, & y_{i}=-1.\end{cases} \tag{2}\end{equation*}

The points nearest to the hyper-plane are referred to as support vectors. The distance between the two parallel margin hyper-planes is\begin{equation*} \gamma =\frac {2}{\left \| w \right \|}. \tag{3}\end{equation*}

The best hyper-plane is the one that maximizes $\gamma$. This corresponds to the optimization problem\begin{align*} &\max_{w,b} \frac {2}{\left \| w \right \|} \tag{4}\\ &\text{s.t.}\quad y_{i}\left (w^{T}x_{i}+b \right)\ge +1,\quad i=1,2,\ldots,n. \tag{5}\end{align*}

4) K-Nearest Neighbours (KNN)

For classification and regression problems, the K-Nearest Neighbors (KNN) [38] supervised machine learning method is utilized. It assigns a new observation to the majority category of its K nearest neighbors in the training set. The approach entails picking a value for K, computing the distances between the new observation and the training points, determining the K nearest neighbors, determining the majority category, and making a prediction for each data point in the test database.

5) Naïve Bayes (NB)

Naïve Bayes (NB) [39] is a supervised ML technique for classification that is founded on the Bayes theorem and presumes feature independence. To predict new occurrences, it calculates probabilities using the training data. The Naive Bayes algorithm is frequently utilized for categorization tasks since it is easy to use and effective. When the independence assumption is broken, however, it does not perform as well.

6) Multi-Layer Perceptron (MLP)

A Multi-Layer Perceptron (MLP) is a type of artificial neural network. It is made up of layers of connected nodes, with an input layer for taking in data, hidden layers for performing complex transformations, and an output layer for producing the results. MLPs learn from data by adjusting the biases and weights of the nodes during training to produce precise predictions. They are frequently employed in tasks such as regression, categorization, and pattern recognition in different fields and are well known for their capacity to model complex patterns in data.

7) Building Ensemble Learning

To provide an effective model, the ensemble model is built by carefully merging base models. In order to address a classification issue that cannot be easily handled by any single classifier, ensemble learning applies a variety of learning classifiers, which is more efficient than single models [40]. Because it is effective and simple to use, the majority voting technique was employed [41]. Suppose class $c^{*}$ is selected by the ensemble classifier. The mathematical formula for majority voting, commonly referred to as plurality voting, is given in (6):\begin{equation*} \sum \nolimits _{t=1}^{T} d_{t,c^{*}} =\max _{c=1,\ldots,C}\sum \nolimits _{t=1}^{T} d_{t,c}, \tag{6}\end{equation*} where $d_{t,c}\in \{0,1\}$ is the t-th model's decision for category c, $t = 1,\ldots, T$ with T the number of models, and $c = 1,\ldots, C$ with C the number of categories.

We used the training data to train the base classifiers NB, SVM, MLP, LR, RF, and KNN. Following training, we used the testing data to assess the accuracy of our algorithms, with each classifier providing its own prediction. The predictions made by these algorithms serve as extra input for the classifier ensemble, which functions as a merged classifier trained to generate the ultimate prediction. The suggested ensemble classifier is illustrated in Figure 2.

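The sketch below illustrates plurality voting in the spirit of (6) using scikit-learn's VotingClassifier; the base models and hyper-parameters are placeholders, not the tuned configurations of Table 2.

```python
# Plurality (hard) voting in the spirit of (6); the base models and their
# parameters are illustrative placeholders, not the tuned values of Table 2.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC()),
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
]

# voting="hard" sums the votes d_{t,c} of the T base models and returns the
# class with the maximum vote count, i.e. plurality voting.
ensemble = VotingClassifier(estimators=base_models, voting="hard")
# ensemble.fit(X_train, y_train); y_pred = ensemble.predict(X_test)
```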

FIGURE 2. Suggested ensemble learning classifier.

C. Data Preparation

Before applying the previously mentioned algorithms, we started by pre-processing the raw data, which contains noise, inconsistencies, and unnecessary information, so that it can be utilized for evaluation or modeling, as explained below. We then picked the relevant features of the dataset, as detailed below.

In addition, 70% of the database was used for training and 30% for testing.

Using a computer with 8 GB of RAM and an Intel Core i5-10300H CPU, the experimental findings were accomplished.

The Jupyter Notebook was then used to create the different ML models for misbehavior detection that were discussed in the Section above.

To select the most suitable hyperparameters for every method of classification and to prevent overfitting issues, as described below, we looked at hyperparameter tuning techniques. Our proposed methodology’s steps are shown in Figure 3.

FIGURE 3. Diagram of proposed system 5-fold cross validation (CV).

1) Pre-Processing

Preprocessing refers to turning unstructured raw data into a clean, organized state that can be used for modeling or analysis. To assure the accuracy and suitability of the data, we identified and dealt with missing data and eliminated duplicate entries from the VeReMi extension dataset as part of the essential processing. Additionally, normalizing data entails scaling numerical data to a uniform range. It makes sure that each characteristic has a comparable scale and prevents some features from outweighing others because of their higher values. A min-max scaler was used to rescale our data to a particular range (between 0 and 1).
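A minimal pre-processing sketch is given below; the file name and column handling are assumptions about the merged file rather than its exact schema.

```python
# Illustrative pre-processing sketch; the file name and the "class" label
# column are assumptions about the merged file, not its exact schema.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("merged_scenario.csv")

# Remove duplicate records and drop rows with missing values.
df = df.drop_duplicates().dropna()

# Rescale every numeric feature (except the label) to the [0, 1] range.
feature_cols = [c for c in df.select_dtypes(include="number").columns if c != "class"]
df[feature_cols] = MinMaxScaler().fit_transform(df[feature_cols])
```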

2) Feature Selection Models

The selection of the best features has a big impact on efficiency, because it greatly reduces training time and increases accuracy. On the other hand, retaining features that are only marginally relevant can have a detrimental impact on performance.

In previous work [29], we tested three methods to select the relevant features, such as the Recursive Feature Elimination (RFE) [42], the f-test in one-way Analysis of Variance (ANOVA), and the impact of each attribute on the labeled class feature. When comparing the three methods, the RFE gives the best precision in terms of classification of misbehavior. So, we used it in this study.

Recursive Feature Elimination is a feature selection method that repeatedly removes less significant characteristics to find the ones that are most pertinent to a machine learning assignment. A model is chosen, the attributes are ordered by relevance, the least significant attributes are eliminated, and the procedure continues until the halting requirements are satisfied. RFE takes attribute interactions and dependencies into account, which enhances the efficiency of models while reducing learning time and improving interpretability. This method results in the selection of 12 characteristics from the VeReMi extension dataset: spdx, spdy, spdx_n, spdy_n, posx, posy, hedx, hedy, hedx_n, hedy_n, aclx, acly.
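A minimal RFE sketch is given below; the choice of Random Forest as the wrapped estimator is an assumption made for illustration.

```python
# Sketch of Recursive Feature Elimination down to 12 features; the wrapped
# estimator (Random Forest) is an assumption made for illustration.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# X: scaled feature DataFrame, y: the "class" labels from the prepared dataset.
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=12)
rfe.fit(X, y)

selected = X.columns[rfe.support_]   # e.g. spdx, spdy, posx, posy, ...
X_selected = X[selected]
```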

3) Classification of Misbehavior

The OVA technique is combined with artificial intelligence algorithms in the suggested method for classifying abnormal behavior, as described in Figure 3, to overcome the difficulties provided by unbalanced data and enhance the precision of the outcomes of classification.

Misbehavior here alludes to an attack or a fault in vehicular communication. The goal is to develop a framework for classification that can correctly identify and categorize various forms of misbehavior in the unbalanced VeReMi extension dataset.

D. Tuning Hyper-Parameters with K-Fold Cross-Validation

The models whose hyper-parameters must be tuned in this investigation are MLP, SVM, RF, and KNN. Past research suggests that five-fold CV can be performed without losing power, roughly halving computational time compared to ten-fold CV. Thus, we tune the hyper-parameters using 5-fold CV [43], as shown in Figure 4.

FIGURE 4. OVA-binary tree classification.

The 5-fold cross-validation procedure is as follows (a minimal tuning sketch is given after the list):

  1. First, divide the input data into five groups (folds).

  2. For every group:

  3. Select that group to serve as the testing set.

  4. Use the remaining groups as the training set.

  5. Fit the classifier to the training set, then use the testing set to assess the model’s effectiveness.

  6. The results averaged over the 5 folds are then used to select the final hyper-parameters.
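One common way to implement such a search is scikit-learn's GridSearchCV with cv=5, sketched below; the parameter grid is a small illustrative example, not the full search space behind Table 2.

```python
# Minimal 5-fold CV tuning sketch with GridSearchCV; the grid shown is a
# small illustrative example, not the full search space behind Table 2.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}

search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      cv=5, scoring="f1_macro", n_jobs=-1)
search.fit(X_train, y_train)          # X_train, y_train: the 70% training split
print(search.best_params_, search.best_score_)
```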

The optimal hyper-parameter outcomes are outlined in Table 2.

TABLE 2. Optimal Hyper-Parameters.

E. Proposed Approach Based on OVA Binary Tree for Classification

In this part, we introduce the suggested approach, the OVA Binary Tree (OVA-BT), which entails the building of a binary tree, the partitioning of non-leaf nodes in the tree using the OVA-BT structure, and the investigation of the binary tree for categorization.

1) Elements of the One Vs. All Binary Tree and Node Production Process

In this part, we provide a binary tree-based method of hierarchical multi-class categorization [44]. In hierarchical multiclass categorization, the tree structure and its construction phase have a considerable impact on classification results; hence, it is crucial to create an effective tree design [8]. It is crucial to properly split the training examples into two groups (one class against the rest) at every node of the tree in order to build an ideal binary tree.

In this study, we solve multi-class categorization issues using the OVA binary tree strategy and optimize the tree structure using a top-down method.

The OVA-BT is built by beginning with a node containing all K categories and repeatedly selecting one distinguishable category from the remaining (K-1) categories to split a node with $K\ge 2$ categories into two sub-nodes.

As shown in (7), when the database is unbalanced, the minority classes are prioritized and chosen as the "One" class at each level of the OVA binary tree. This may decrease the possibility of false negatives and increase the binary classifier's accuracy for the minority classes.

Every node of the binary tree is made up of classes and the patterns that describe them. Based on the data present, nodes can be classified as root, leaf, or internal. The binary tree's root node is found at the top level, contains all classes, and serves as the starting point for training. Leaf nodes contain only one category. The remaining classes are found in internal (non-leaf) nodes, denoted nleaf. Each level n (n = 0, 1, 2, ..., K-1) represents a layer. Only the root of the tree is present at the topmost level (n = 0), and there are 2 leaf nodes at the lowest level (n = K-1). A leaf node and an internal node are both present at the intermediate levels.
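The sketch below outlines this hierarchical construction under the simplifying assumption that the "One" class at each level is chosen minority-first; the paper's actual criterion (7) is pattern-based, so class frequency is used here only for illustration.

```python
# Sketch of OVA binary tree training under the simplifying assumption that the
# "One" class at each level is chosen minority-first; the paper's criterion (7)
# is pattern-based, so class frequency is used here only as an illustration.
from collections import Counter

import numpy as np
from sklearn.base import clone
from sklearn.neural_network import MLPClassifier


def train_ova_bt(X, y, base_estimator=MLPClassifier(max_iter=500)):
    """Return a list of (class_label, binary_classifier) pairs, one per level."""
    X, y = np.asarray(X), np.asarray(y)
    order = [c for c, _ in sorted(Counter(y).items(), key=lambda kv: kv[1])]
    levels, remaining = [], np.ones(len(y), dtype=bool)
    for cls in order[:-1]:                       # the last class needs no model
        X_node, y_node = X[remaining], (y[remaining] == cls).astype(int)
        levels.append((cls, clone(base_estimator).fit(X_node, y_node)))
        remaining &= (y != cls)                  # push the rest down the tree
    levels.append((order[-1], None))             # right-most leaf
    return levels
```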

2) Partitioning Non-Leaf Nodes

The technique used for splitting a non-leaf (nleaf) node into two child nodes in a binary tree is important for the classification results [8]. The One vs. All binary tree is used in this experiment to set the conditions for dividing nleaf nodes and building the binary tree while taking the efficacy and efficiency of tree building into account. A class for producing the leaf node (left child) is established as defined in (7) and shown in Figure 5:\begin{align*} &\hspace {-1pc}{Partition}_{n} \\ &=\arg \min _{c}\left ({w_{1}\cdot \vert P_{n}^{c}\vert -w_{2}\cdot AR\left ({P_{n}^{c} }\right)+w_{m}\cdot AG\left ({P_{n}^{c} }\right) }\right), \tag{7}\end{align*} where $P_{n}^{c}$ ($\forall n=0,1,\ldots,K-2$) is the collection of OVA-type patterns for category c at the nleaf node of step n.

FIGURE 5. Performance of Different Models on VeReMi Extension Dataset in Classic Approach (C-LAMC) and New OVA-BT Approach.

Here, $\vert P_{n}^{c}\vert$ denotes the number of distinct patterns observed in the database for category c; $AR\left ({P_{n}^{c} }\right)$ represents the average coverage rate of the patterns, indicating how well the discovered patterns cover the occurrences in the database (it is determined by taking the proportion of occurrences covered by each pattern and averaging those coverage rates across all patterns); and $AG\left ({P_{n}^{c} }\right)$ represents the average degree of the patterns, measuring how extensively the discovered patterns are used within the framework of a pattern-based organization (it is the average number of nodes, or structural elements, per pattern).

The weight $w_{j}$ ($\forall j =1, 2, \ldots, m$) indicates the significance or impact of each term of (7) in the split decision.

A level-n nleaf node is decomposed into a leaf node (left child), using the category with the lowest value of $Partition_{n}$, and an internal node (right child) containing the remaining categories. The form of the criterion in (7), combining the number of patterns, the average coverage, and the average degree of the pattern set, was obtained from an analysis of the experimental findings of earlier research [25]. Intuitively, a smaller number of patterns, a lower average degree of patterns, and a higher average coverage rate in the pattern set under consideration indicate that the pattern set's category differs clearly from the other categories.
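A heavily hedged sketch of the split decision in (7) is given below; the pattern statistics are assumed to be precomputed per class, and the weights are illustrative placeholders.

```python
# Hedged sketch of the split choice in (7); the per-class pattern statistics
# |P_n^c|, AR(P_n^c) and AG(P_n^c) are assumed to be precomputed, and the
# weights w1, w2, w3 are illustrative placeholders.
def choose_one_class(pattern_stats, w1=1.0, w2=1.0, w3=1.0):
    """pattern_stats: {class_label: (num_patterns, avg_coverage, avg_degree)}.

    Returns the class with the smallest partition score, which becomes the
    leaf (left child) at this level of the OVA binary tree.
    """
    def score(stats):
        num_patterns, avg_coverage, avg_degree = stats
        return w1 * num_patterns - w2 * avg_coverage + w3 * avg_degree

    return min(pattern_stats, key=lambda c: score(pattern_stats[c]))
```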

3) Multi-Class Categorization Using One Vs. All Binary Tree Explorations

At the root of the tree, an ML model is trained using examples from the first group as positive samples and examples from the second group as negative samples. The categories of the first group are assigned to the left sub-tree (leaf node) and the categories of the second group to the right sub-tree (nleaf node). Once each group has been split into two sub-groups using the aforementioned approach, the procedure is repeated until there is just one category per group, which represents a leaf of the binary tree.

Each new data instance is classified starting at the tree's root. At each node of the binary tree, it is decided whether to allocate the input vector to one of the two groups represented by that node, moving the pattern to the left or right sub-tree. There may be many categories in each of these groups. This is repeated until the instance reaches the leaf node of the category it has been assigned to, at the bottom of the tree.
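A companion sketch of this traversal is given below, assuming the list of (class, binary classifier) pairs produced by the construction sketch above.

```python
# Companion sketch: classifying a sample by walking the OVA binary tree built
# by the sketch above; "levels" is its list of (class_label, classifier) pairs.
import numpy as np


def predict_ova_bt(levels, x):
    """Walk the tree from the root and stop at the first level whose binary
    classifier claims the sample for its "One" class."""
    x = np.asarray(x).reshape(1, -1)
    for cls, clf in levels[:-1]:
        if clf.predict(x)[0] == 1:       # left child: leaf for this class
            return cls
    return levels[-1][0]                 # right-most leaf: the remaining class


# Example: y_pred = [predict_ova_bt(levels, row) for row in np.asarray(X_test)]
```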

F. Performance Evaluation

Because of the imbalanced dataset, traditional evaluation metrics like accuracy can be misleading. A high accuracy score may be achieved by correctly classifying the majority classes while performing poorly on the minority classes. Consequently, evaluation metrics like recall, precision, F1-score, and area under the precision-recall curve (AUC-PRC) are more suitable for assessing classification performance in imbalanced multi-class datasets.

In our investigation, we chose Python because its libraries are helpful for ML. NumPy, Scikit-Learn, Matplotlib, and Pandas are the libraries that we utilized. Scikit-Learn is designed to work with NumPy and other scientific and mathematical libraries for Python; it does not concentrate on loading or summarizing the data but on modeling. The NumPy library provides operations for manipulating arrays and matrices, the Fourier transform, and linear algebra. The Pandas library is used to work with tabular data; it has tools to clean, examine, analyze, and transform data. Matplotlib is a Python graphics library used to visualize the results.

As noted above, recall, precision, the confusion matrix, F1-score, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC) are therefore used to evaluate classification on this unbalanced multi-class database.

A confusion matrix aids in the visualization of a ML classifier’s classification. A row represents a predicted category while a column represents an actual category, or the other way around.

As shown in (8), low precision indicates that the algorithm outputs many false positives, whereas high precision shows that the system is capable of differentiating between genuine and malicious nodes.\begin{equation*} Precision=\frac {TP}{TP+FP}. \tag{8}\end{equation*}


As shown in (9), recall measures the algorithm's capability to detect malicious nodes; a low recall signifies that misbehavior is harder to detect.\begin{equation*} Recall=\frac {TP}{TP+FN}. \tag{9}\end{equation*}


The harmonic mean of recall and precision is known as the F1-score. It expresses a trade-off between recall and precision, so a higher F1-score implies higher recall and precision values, meaning that the algorithm is more effective, as stated in (10).\begin{equation*} F1\text{-}score=\frac {2\times Precision\times Recall}{Precision+Recall}. \tag{10}\end{equation*}


In a True Positive (TP), the system predicted misbehavior accurately, whereas in a False Negative (FN), a malicious vehicle was misclassified as normal by the algorithm. In a False Positive (FP), the model wrongly classified a legitimate instance as malicious, while in a True Negative (TN), the model predicts the negative instances properly.
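The sketch below shows how these metrics can be computed per class and macro-averaged with scikit-learn; y_test and y_pred stand for the true and predicted labels of the 30% testing split.

```python
# Per-class and macro-averaged metrics with scikit-learn; y_test and y_pred
# stand for the true and predicted labels of the 30% testing split.
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))

# Macro averaging weights every misbehavior class equally, which matters on
# an imbalanced dataset.
precision = precision_score(y_test, y_pred, average="macro", zero_division=0)
recall = recall_score(y_test, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_test, y_pred, average="macro", zero_division=0)
```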

We used the ROC curve and the AUC to visualize and evaluate the classifiers; ROC plots are a very helpful tool. By displaying the true positive rate (vertical axis) versus the false positive rate (horizontal axis), the ROC curve assesses the model's capacity to differentiate between positive and negative categories. The AUC score ranges from 0 to 1, with 1 being the best performance and 0 the worst. The ROC AUC measurement makes it simple to compare various models and is well suited to assessing unbalanced databases. Equation (11) below gives the AUC calculation:\begin{equation*} AUC=\frac {\sum _{i\in \text {positive class}}{Rank}_{i} -\frac {M\left ({1+M }\right)}{2}}{M\times N}, \tag{11}\end{equation*}

where $Rank_{i}$ indicates the rank value of instance i, M is the number of positive instances, and N the number of negative instances.
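A small sketch of the rank-based AUC in (11) for the binary case is given below, cross-checked against scikit-learn's roc_auc_score.

```python
# Rank-based AUC of (11) for a binary problem, cross-checked against
# scikit-learn's roc_auc_score (ties get average ranks via scipy's rankdata).
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score


def rank_auc(y_true, scores):
    ranks = rankdata(scores)              # rank every instance by its score
    pos = np.asarray(y_true) == 1
    m, n = pos.sum(), (~pos).sum()        # M positives, N negatives
    return (ranks[pos].sum() - m * (m + 1) / 2) / (m * n)


y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
assert np.isclose(rank_auc(y_true, scores), roc_auc_score(y_true, scores))
```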

SECTION IV.

Results and Discussion

This Section addresses multi-class unbalanced classification issues through a complete empirical investigation. Using the novel One-Versus-All Binary Tree (OVA-BT) approach for multi-class unbalanced classification, we assess the efficacy of the aforementioned ML classifiers and ensemble learning in comparison to the classic method.

In a prior study [29], we used the VeReMi extension database to build a single multi-class classification called “Classic Learning Approach for Multi-Class Classification” (C-LAMC), but the results were poor since we learned from an unbalanced dataset. The OVA-BT technique, which separates the multi-class categorization issue into a group of binary sub-issues, is developed in this study in an effort to improve these results.

In fact, the performance of our suggested OVA binary tree technique and the Classic were compared in our study using the unbalanced VeReMi Extension database. Our main goal was to evaluate how well our new approach handled multi-class misbehavior identification in this specific database.

In this paper, we desire to demonstrate how OVA-BT can effectively support learning from multi-class unbalanced datasets.

The models used were created with Scikit-learn, and the hyper-parameters given in the previous Section were maintained. We mention that the preceding Section discussed the evaluation metrics. The testing set was utilized to evaluate the suggested method.

As demonstrated in Table 3 and Figure 6, which report the precision scores (the rate of correctly identified attacks), we compare the results achieved by C-LAMC with our newly proposed OVA-BT scheme.

TABLE 3. Performance of Different Models on VeReMi Extension Dataset in Classic Approach (C-LAMC) and New OVA-BT Approach.
FIGURE 6. Confusion matrix and classification report for the best model.

The highest precision with OVA-BT is achieved by the ensemble classifier (Ensemble_OVA-BT), at 76%; this precision was 39% in C-LAMC. The use of the OVA-BT strategy is therefore particularly helpful for the detection of misbehavior in VANETs. In this instance, precision increases by 37%.

The ensemble classifier is followed by MLP_OVA-BT at 71%, which is an increase of 25% over the classic MLP.

The precision of the KNN_OVA-BT, SVM_OVA-BT, and RF_OVA-BT classifiers increased by 10%, 20%, and 4%, respectively, whereas the precision of the Naïve Bayes and Logistic Regression models remains insufficient with both OVA-BT and the classic approach C-LAMC.

These outcomes demonstrate the benefits of the OVA-BT approach in addressing the difficulties of multi-class misbehavior detection in vehicular communication. We acknowledge that, viewed in isolation, the best precision obtained with OVA-BT may appear modest. It is important to stress, however, that on the same VeReMi dataset this precision greatly exceeds that of the classic (C-LAMC) approach.

The precision of 0.39 attained by the traditional approach serves as a baseline that was previously employed in VANET misbehavior identification. With a precision of 0.76, the ensemble learning classifier under the new OVA-BT technique significantly outperforms this baseline. This increase is significant since it shows how the OVA-BT method can improve the precision of misbehavior identification.

Our findings show a considerable improvement in the field of Vehicular Ad hoc Network misbehavior identification when compared with classic methods. The precision and robustness of misbehavior detection systems in actual production contexts can still be improved, but our work offers a promising starting point for further investigation and enhancement.

The confusion matrices for the best and worst models, MLP and Logistic Regression with the proposed OVA-BT scheme, are shown in Figure 7.

FIGURE 7. Confusion matrices of the best (MLP_OVA-BT) and worst (LR_OVA-BT) classifiers.

The outcomes shown in Table 4 report the evaluation performance on the training and testing sets using the proposed OVA-BT approach for the Naïve Bayes, Logistic Regression, Random Forest, KNN, MLP, SVM, and ensemble learning classifiers, denoted NB_OVA-BT, LR_OVA-BT, RF_OVA-BT, KNN_OVA-BT, MLP_OVA-BT, SVM_OVA-BT, and Ensemble_OVA-BT, respectively.

TABLE 4. Evaluation of Various Classifiers’ Performance on Training and Testing Sets Using OVA-BT.

The MLP_OVA-BT gave the best result with 71.37%, 71.31%, and 71.28% in precision, recall, and F1-score, respectively. Random Forest (RF_OVA-BT) comes in second place with 71.2% precision, 70.1% recall, and 71.15% F1-score. SVM_OVA-BT comes in third place, whereas the NB_OVA-BT and LR_OVA-BT algorithms generate the worst scores; LR_OVA-BT reaches only 24.71% precision, 26.58% recall, and 24.7% F1-score.

Table 4 shows that for these 3 metrics on the testing set, ensemble learning has the highest predictive accuracy (recall = 0.783, precision = 0.7568, and F1-score = 0.768). It is important to note that recall, precision, and the F1-score are more helpful if the category distribution is unequal.

The confusion matrix of the ensemble learning classifier on the testing set, shown in Figure 8, indicates that misclassification is the primary factor influencing these three measures. This is most likely due to overlapping classification rules and the low number of instances in some categories (see Table 1 (b) above for the distribution of classes).

FIGURE 8. Confusion matrix and Classification Report of the classifier ensemble on the testing set.

The ROC curves and AUC values of the individual models and the ensemble learning classifier are shown in Figure 9. It is clear that, when compared to the other individual models, MLP_OVA-BT performed better, followed by SVM_OVA-BT and then KNN_OVA-BT, while Naive Bayes (NB_OVA-BT) and Logistic Regression (LR_OVA-BT) produced the worst results. Compared with MLP_OVA-BT, LR_OVA-BT, NB_OVA-BT, KNN_OVA-BT, SVM_OVA-BT, and RF_OVA-BT, the ensemble learning classifier Ensemble_OVA-BT achieves the greatest AUC value of 96%, an increase of 5%, 16%, 15%, 6%, 1%, and 3%, respectively.

FIGURE 9. ROC curves produced by various individual models and ensemble learning on the testing set.

Although the MLP_OVA-BT classifier’s ROC curve has a lower AUC of 91% compared with the ensemble learning Ensemble_OVA-BT at 96%, it performs best locally (in the top-left corner). The ensemble learning classifier would therefore be preferred in this scenario with regard to the AUC measure (the higher the AUC, the better the model differentiates between normal and misbehaving vehicles), but the MLP_OVA-BT model would outperform it in certain specific scenarios of misclassification costs.

We evaluated the performance of the Random Forest, KNN, SVM, Naive Bayes, Logistic Regression, and Multi-Layer Perceptron (MLP) classification algorithms in our research on misbehavior detection in vehicular communication with an unbalanced database, using the One vs. All Binary Tree (OVA-BT) method and the assessment metrics described above.

We found that, because of their bias towards the majority category, the Naive Bayes and Logistic Regression classifiers performed poorly in identifying misbehavior. They had high precision for instances of normal behavior but poor recall for instances of misbehavior.

On the other hand, MLP performed better in identifying misbehavior. It was able to overcome the difficulties presented by class imbalance thanks to its capacity to recognize complicated patterns and learn non-linear decision boundaries. By concentrating on the unique nuances of each class, the OVA-BT method, which assesses each misbehavior class separately, greatly enhanced MLP’s performance.

Our results highlight the significance of picking appropriate algorithms for misbehavior identification in unbalanced databases. Due to their limitations in handling class imbalance and detecting complicated misbehavior patterns, Naive Bayes and Logistic Regression are not always the best options. A more effective method for identifying occurrences of misbehavior is MLP combined with the OVA-BT approach.

SECTION V.

Conclusion and Future Research

In this paper, we addressed the issue of misbehavior detection in VANETs employing the One vs. All binary tree (OVA-BT) approach for unbalanced data.

We employed a fivefold CV to tune the hyper-parameter of the individual classifiers in order to improve the classification accuracy of the VeReMi extension dataset. Evaluation metrics like the confusion matrix, precision, recall, F1-Score, ROC curve, and AUC values are used to assess the performance of the ensemble learning and six individual machine learning models: Random Forest, MLP, k-NN, Logistic Regression, SVM, and Naïve Bayes.

Additionally, we examined the impact of dividing the multi-class issue into binary models and of the voting process across various ML algorithms. The data is easier to understand when the problem is divided into multiple binary issues. When compared to a classic single multi-class classification, the testing results demonstrate that our newly developed method using the OVA binary tree outperforms it.

As future work, we propose to offer a trade-off between balancing the database and the objective that all categories must be represented in an equitable split, in order to handle the difficulties involved with an unbalanced database when utilizing the OVA-BT strategy. Moreover, we shall explore One Versus One classifiers to identify which categories are challenging to distinguish and which ones are differentiated efficiently from each other. We will also test the SMOTE and DSSTE techniques to balance the data.
