Comparative Performance Evaluation of Intrusion Detection Based on Machine Learning in In-Vehicle Controller Area Network Bus

Communication between the nodes in a vehicle is performed using many protocols. The most common of these is known as the Controller Area Network (CAN). The CAN protocol works by broadcasting messages from one node to all other nodes over a shared bus. Messages carry neither source nor destination addresses. Consequently, it is simple for an attacker to inject malicious messages. This may lead to some nodes malfunctioning or total system failure, which can affect the safety of the driver as well as the vehicle. Detecting intrusions is a challenging problem in the context of using CAN bus for in-vehicle communication. Most existing work focuses on the physical aspects without taking into consideration the data itself. Machine Learning (ML) tools, especially classification techniques, have been widely used to address similar problems. In this paper, we use and compare several ML techniques to deal with the problem of detecting intrusions in in-vehicle communication. An experimental study is performed using a real dataset extracted from a KIA Soul car. Compared to previous work, which focuses on detecting intrusions based on physical aspects, this paper concentrates on the application of data analysis and statistical learning techniques. Furthermore, the paper provides a comparative study of the most common ML techniques. The results show that the techniques under consideration in this paper outperform other techniques that have been used previously.


I. INTRODUCTION
Recently, a considerable amount of research has focused on vehicle communication technology, such as smart vehicles, Vehicular ad hoc Networks (VANET) [1], [2], and Intelligent Transportation Systems (ITS). Vehicles are necessary for daily life, and they are becoming more electronically equipped and are no longer simple mechanical machines. Electronic Control Units (ECUs) are used in vehicles to monitor and control different components. ECUs are connected through buses managed by several protocols [3], [4]. A vehicle bus is an intravehicular communication network that does not have a host computer. A bus is used to link a set of ECUs to simplify message exchange as well as diagnostics. Intravehicular networks have many advantages [5], including (1) reducing cabling, which is the third most costly system after the engine and the chassis; (2) minimizing the packaging space by using fewer connections for more electrical and electronic features, allowing a reduction in vehicle size; (3) meeting the higher bandwidth demands of a large number of ECUs, with some vehicles containing up to 70 ECUs with 2500 internal signals [5]; and (4) making communication more reliable, because bus-based communication is more robust than the traditional point-to-point communication used in older vehicles.
The associate editor coordinating the review of this manuscript and approving it for publication was Gustavo Callico.
Currently, the most widely used protocols for in-vehicle communication are [3]:
• Local Interconnection Network (LIN),
• Controller Area Network (CAN),
• FlexRay,
• Ethernet,
• Media-Oriented Systems Transport (MOST).
All these protocols are based on bus communication, and each of them has certain advantages and weaknesses compared to the others. Among these protocols, we have chosen the CAN bus protocol, developed by Bosch in 1985 [6]. This protocol is used in the majority of vehicles today, and approximately 500 million CAN chips are used in vehicles [5]. In addition, a recent study predicted that the CAN bus will remain dominant for the next decade [5]. The CAN bus is the leading technology due to its low cost compared to other protocols, its bit rate (up to 1 Mbit/s for high-speed CAN by specification), and its acceptable fault tolerance relative to the other intravehicular communication protocols mentioned earlier.
Despite its advantages, the CAN bus suffers from many vulnerabilities. The main problem is that CAN lacks any kind of security mechanism, because security was not considered in its design [7]. Attacks on a CAN bus can come from outside, particularly through the On-Board Diagnostics (OBD) port [8], or through wireless interfaces such as cellular links, Wi-Fi, and Bluetooth [5], [9]. Figure 1 illustrates a combination of attack types, attack surfaces, and vulnerable assets. Modern cars are thus exposed to various types of attacks on the CAN bus from external devices connected to the car, particularly through the OBD port.
The first type of attack includes frame falsifying, sniffing, and relay attacks, which can be addressed by encryption and improved authentication. The second type includes impersonation, Denial of Service (DoS), and fuzzy attacks, which must be addressed by an Intrusion Detection System (IDS) that distinguishes between normal behavior and attacks.
Most of the previous research on security problems in the CAN protocol has concentrated on physical aspects, such as limiting physical access or using cryptography to protect CAN transmission [10]. However, better IDSs are still needed. Indeed, limiting physical access affects the effectiveness of transmission on the CAN bus, and cryptography is not always suitable for such a lightweight system. This is discussed in detail in the related work section.
Over the last decade, Artificial Intelligence (AI) tools have produced interesting and effective results in solving complex problems similar to ours, such as automatic system diagnostics and identification [11], fault detection in wireless sensor networks [12]-[16], and certain security problems in other fields. ML techniques, among the most promising approaches in AI, can therefore be very effective for detecting intrusions. There are three types of ML models for prediction: (1) regression models, (2) classification models, and (3) clustering models. For real-time or predictive intrusion detection, classification-based or clustering-based models are applied: the former for supervised problems and the latter for unsupervised problems.
The objective of this paper is to comparatively study intrusion detection systems based on different ML models. To this end, Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and MultiLayer Perceptron (MLP) models are applied to improve on other models recently applied to the same dataset. Unlike previous studies, we use intrusion detection per attack type on the KIA Soul dataset as one of the comparison criteria, considering three types of attacks: DoS, impersonation, and fuzzy attacks.
Security was not considered when bus-based CANs were designed in the 1980s [9]. However, most modern vehicles use bus-based CANs, which are insecure networks that can be hacked by injecting faulty messages. Consequently, attacks can cause accidents, possibly resulting in injury or death. This makes protecting CAN bus-based networks a high priority for the safety of drivers and passengers. While existing research has used ML models to address this challenging problem, these models appear insufficient and can be improved upon by using other ML models. This motivates us to explore the capabilities of other advanced ML techniques, such as SVM, DT, RF, and MLP, to address the existing security concerns with in-vehicle CAN buses.
The main objectives of the paper are as follows: • Develop an ML-based IDS for the in-vehicle CAN bus by applying various ML techniques.
• Conduct a comparative performance evaluation of ML-based intrusion detection in an in-vehicle CAN bus using a set of classifiers on a real dataset of CAN bus messages extracted from a KIA Soul car [6].
• Detect both the intrusion and the attack type: DoS, impersonation or fuzzy attack.
To the best of our knowledge, this is the first time that RF, DT, SVM, and MLP have been applied to the KIA Soul dataset. The results of our experimental study show that RF outperforms not only SVM and DT but also the classifiers previously used in the same context (Hierarchical Temporal Memory (HTM), Recurrent Neural Networks (RNN), and Hidden Markov Models (HMM)).
The rest of this paper is organized as follows: Section 2 outlines the related work. In Section 3, a review of the classifiers used for intrusion detection is given. The experimental study and a discussion of the results are presented in Section 4. Finally, Section 5 concludes the paper.

II. RELATED WORK
Protecting communication inside vehicles is very important since it affects the safety of vehicles as well as that of their drivers and passengers. Achieving this task in the CAN protocol is challenging due to the shortcomings of CANs, which are vulnerable to many types of attack, including DoS, impersonation, and fuzzy attacks. This makes developing an IDS for this type of network an attractive problem for the research community. Indeed, much research has been undertaken to deal with this problem. In the following, we discuss the most relevant research investigating IDS for intravehicular communication.
In [6], the authors proposed an IDS based on analyzing the offset ratio and the time interval between request and response messages, i.e., working on remote frames and data frames. Analyzing the response behavior of ECUs helps decide whether a behavior is an attack (i.e., an intrusion) or normal. The authors treated three types of attacks in CAN-based networks: DoS, fuzzy, and impersonation attacks. The reported results are encouraging. However, no metric such as detection accuracy is given to determine whether the proposed approach achieves the best detection performance.
Groza and Murvay [8] proposed a Bloom filter-based IDS. A Bloom filter is a probabilistic data structure for testing whether an item belongs to a set. This filter produces no false negatives, giving a 100% recall rate. The authors applied this filter to frame identifiers and part of the data fields to test frame periodicity, which facilitates the detection of frame modification attacks and possible replays. The authors tested their contribution on a CAN bus; however, the approach can also be used with other types of in-vehicle communication. A disadvantage of this work is that the authors did not compare their method with other methods. Furthermore, their approach imposes a significant overhead on the ECUs, which could affect their response time.
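To make the no-false-negative property concrete, the following minimal Python sketch implements a generic Bloom filter over keys built from a CAN identifier and the first payload bytes. It is an illustration under stated assumptions, not Groza and Murvay's scheme: the key layout (the hypothetical `frame_key` helper with two payload bytes), the filter size, the number of hashes, the SHA-256-based hashing, and the example frames are all illustrative choices, and the periodicity test of [8] is not modeled.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-slot bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Derive k positions by prefixing the item with the hash index before SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not in the set"; True means "possibly in the set".
        return all(self.bits[pos] for pos in self._positions(item))


def frame_key(can_id: int, data: bytes, n_bytes: int = 2) -> bytes:
    """Hypothetical key: 11-bit identifier (2 bytes) plus the first n_bytes of the payload."""
    return can_id.to_bytes(2, "big") + data[:n_bytes]


# Learn the keys seen in legitimate traffic (made-up example frames).
bf = BloomFilter()
legit = [(0x316, bytes([0x05, 0x21, 0x68])), (0x2B0, bytes([0x00, 0x10]))]
for can_id, data in legit:
    bf.add(frame_key(can_id, data))

print(bf.might_contain(frame_key(0x316, bytes([0x05, 0x21, 0xFF]))))  # True: same ID and first two bytes
```

Any key added during training is always reported as present, which is why recall is 100%; an unseen key is usually reported absent but can occasionally be a false positive, at a rate governed by the filter size and the number of hash functions.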
Tariq et al. [17] used RNNs and heuristics to detect attacks, employing the same dataset as [6]. The detection dealt with three types of attacks: DoS, replay, and fuzzy attacks. The authors used both neural networks and network traffic signatures. The accuracy of intrusion detection was high; however, the authors did not propose a technique for unseen attacks.
Neural networks are also used for intrusion detection in CANs in [18]. This study reported good results despite some weaknesses. For example, the detection of replay attacks was not adequate because of the high similarity between genuine and injected frames; in such cases, the timestamp is a very useful feature. Overall, the use of neural networks as IDSs in CANs is promising and provides satisfactory results while maintaining CAN bus communication safety.
A Deep Neural Network (DNN) was used in a novel technique for intrusion detection in CANs [19]. The authors used deep learning techniques to distinguish between normal behavior and attacks. The comparison between DNN-based IDS and standard neural networks shows that a DNN is better in terms of improving detection accuracy with a real-time response.
Wu et al. [20] proposed a novel intrusion detection method based on information entropy. This approach uses sliding windows containing a fixed number of messages. The authors show that optimizing the decision conditions and enhancing the sliding windows help improve intrusion detection accuracy while decreasing the false positive rate. Furthermore, an experimental study demonstrated that the method responds to intrusions in real time with high detection precision. Despite these promising results, the authors did not consider the impact of the vehicle operating state on information entropy.
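The following Python sketch illustrates the general idea of computing entropy over fixed-size sliding windows of CAN identifiers. It is not the method of Wu et al.: the window size, the baseline, the fixed tolerance used as a decision condition, and the example identifier sequences are hypothetical.

```python
import math
from collections import Counter, deque


def shannon_entropy(ids) -> float:
    """Shannon entropy (in bits) of the CAN identifier distribution in one window."""
    counts = Counter(ids)
    total = len(ids)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_entropies(can_ids, window_size: int):
    """Entropy of every window of window_size consecutive messages."""
    window = deque(maxlen=window_size)  # oldest message drops out automatically
    out = []
    for can_id in can_ids:
        window.append(can_id)
        if len(window) == window_size:
            out.append(shannon_entropy(window))
    return out


def flag_anomalies(entropies, baseline: float, tolerance: float):
    """Hypothetical decision condition: flag windows far from the baseline entropy."""
    return [abs(h - baseline) > tolerance for h in entropies]


# Toy traffic: four periodic IDs, then a flood of 0x000 frames (DoS-like).
normal = [0x316, 0x2B0, 0x43F, 0x164] * 4
attack = [0x316, 0x000, 0x000, 0x000, 0x000, 0x000]
baseline = window_entropies(normal, 4)[0]  # each ID appears once per window: 2 bits
flags = flag_anomalies(window_entropies(attack, 4), baseline, tolerance=0.5)
```

A DoS flood of a single identifier collapses the window entropy toward zero, while a fuzzy attack with random identifiers tends to raise it; both appear as departures from the baseline.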
Wang et al. [21] used the benefits of hierarchical temporal memory (HTM) to define a distributed anomaly IDS in a CAN-based in-vehicle network. The proposed technique predicts data flow depending on previous state learning in real time. Through an experimental study, the authors showed that HTM outperforms other detection models based on neural networks and HMMs in terms of detection accuracy.
A practical security architecture for CAN-FD-based networks (CAN-FD being designed to overcome CAN bandwidth limitations) is defined in [22]. The effectiveness of the proposed architecture was tested on three kinds of microcontrollers. This technique could be considered for use in future vehicles.
Although a considerable amount of research has focused on developing IDSs for CAN-based networks, better systems are still needed. Most of the previous work has examined the behavior of exchanged frames or used the data in the frames only superficially. In addition, traditional classification techniques have not been used. The aim of this paper is to mine the data within the exchanged frames in depth and take advantage of different classifiers to define a smart IDS for CANs that can detect attacks in real time in order to protect vehicles as well as their drivers and passengers.
VOLUME 9, 2021

III. CLASSIFICATION MODELS FOR INTRUSION DETECTION SYSTEMS
We have applied four ML techniques for intrusion detection. Intrusion detection is treated here as a supervised classification problem, since a labeled dataset is available. The four approaches tested to solve this problem are SVM, DT, RF, and MLP.
In this section, the problem statement is outlined. Next, the four classification techniques used and the evaluation criteria are defined. Finally, the experimental results are given.

A. PROBLEM STATEMENT
Many research studies have dealt with the problem of intrusion detection using experimental approaches and published datasets [6]. In this study, a set of classification techniques is applied for intrusion detection on the same dataset. The dataset contains three types of attacks: DoS, fuzzy, and impersonation attacks. It was created by injecting messages through the OBD-II port into real CAN traffic from a KIA Soul car.
The data are prepared as shown in Table 1, which lists the features. The results of applying RF, SVM, DT, and MLP are compared to those of the latest research study [21] investigating the same dataset.
Three types of attacks are treated: • DoS attack: This attack occurs when high-priority messages are injected into the CAN bus. The aim is to occupy the bus with packets carrying high-priority identifiers. It is performed by injecting packets with CAN ID 0x000 in a short cycle into the traffic.
• Impersonation attack: This attack occurs when an attacker creates an impersonating node that answers remote frames. The impersonating node periodically broadcasts data frames, responding to remote frames as if it were the target node. This attack is performed by inserting packets from the impersonating node with arbitration ID 0x164.
SVMs can efficiently perform non-linear as well as linear classification (Figure 2). For the non-linear case, this technique uses kernel functions.
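The role of the kernel function can be sketched in Python as follows. This is an illustrative assumption, since the paper does not state which kernel was used: the code evaluates a Gaussian (RBF) kernel and the standard kernel SVM decision value, and the support vectors, dual coefficients, bias, and `gamma` below are made-up values rather than trained ones.

```python
import math


def rbf_kernel(x, z, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel) -> float:
    """Kernel SVM decision value: sum_i (alpha_i * y_i) * K(s_i, x) + b.

    dual_coefs holds the products alpha_i * y_i that training would produce.
    A positive value predicts one class and a negative value the other.
    """
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical, untrained model with one support vector per class.
SUPPORT_VECTORS = [(0.0, 0.0), (1.0, 1.0)]
DUAL_COEFS = [1.0, -1.0]
BIAS = 0.0
```

Because the kernel replaces the dot product, the decision boundary is non-linear in the original feature space even though the model is linear in the kernel-induced space.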

C. DECISION TREES (DT) CLASSIFIER
A DT is a decision support tool that represents choices graphically as a tree, with the different classification decisions placed in the leaves [26]. This technique uses a hierarchical representation of the data in the form of decision sequences (tests) to predict the result class. Each observation to be assigned to a class is described by a set of variables that are tested in the tree nodes. Tests are performed in internal nodes, and decisions are made in leaf nodes.
To explain the principle of this tool, we consider the classification problem. Each element x of the database is represented by a multidimensional vector $(x_1, x_2, \ldots, x_n)$ corresponding to the set of descriptive variables of the point. Each internal node of the tree corresponds to a test on one of the variables $x_i$. Once the tree has been built, a new candidate is classified by descending the tree from the root to one of the leaves (which encodes the decision or class). At each level of the descent, an intermediate node tests a variable $x_i$ to decide which path (or subtree) to follow. To build the tree, all points of the training set are placed in the root node. One of the variables describing the points is the class of the point (the ''ground truth''); this variable is called the ''target variable''. The target variable can be categorical (classification problem) or real-valued (regression problem). Each node is split, giving rise to several child nodes, and each training element in a node goes to exactly one of its children.
• The tree is built by recursively partitioning each node (see Figure 5) according to the value of the attribute tested at each iteration (top-down induction). The optimized criterion is the homogeneity of the child nodes with respect to the target variable; the variable tested in a node is the one that maximizes this homogeneity. • The process stops when all elements of a node have the same value of the target variable (homogeneity).
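One step of the recursive partitioning described above can be sketched in Python. The paper does not name its homogeneity criterion; this sketch assumes Gini impurity (the criterion used by CART) and an exhaustive search over thresholds of the form x[f] <= t. The two features (a CAN identifier and a DLC) and the labels are hypothetical.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity: 0.0 for a perfectly homogeneous node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the most homogeneous split.

    Every observed value of every feature is tried as a threshold x[f] <= t;
    (None, None, gini(labels)) is returned if no split improves homogeneity.
    """
    n = len(rows)
    best = (None, None, gini(labels))
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical rows: (CAN identifier, DLC).
rows = [(0x316, 8), (0x2B0, 8), (0x000, 8), (0x000, 8)]
labels = ["normal", "normal", "dos", "dos"]
```

A full tree builder would apply `best_split` recursively to each child until a node is homogeneous (Gini impurity 0) or no split improves it.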

Figure 4 shows how RF is used in the context of intrusion detection. RF is based on building multiple decision trees and obtaining the class predicted by each DT [27]; the final class is defined by majority voting. RF applies bootstrap aggregating (bagging) to decision trees. Consider a training set $X = x_1, x_2, \ldots, x_n$ with responses $Y = y_1, y_2, \ldots, y_n$. RF loops B times. In iteration b, it draws a sample of n training examples with replacement, $(X_b, Y_b)$, from $(X, Y)$, and trains a classification tree $f_b$ on $(X_b, Y_b)$. After the loop, a majority vote over the B trees determines the final class.
If $C_b(x)$ is the class prediction of the $b$-th RF tree, the final class is:
$$\hat{C}_{rf}^{B}(x) = \text{majority vote}\,\{C_b(x)\}_{b=1}^{B} \quad (1)$$
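The bagging loop and the majority vote described above can be sketched in Python. To stay short and self-contained, the sketch uses one-split trees (decision stumps) as a hypothetical base learner and omits the random feature subsampling that a full random forest also performs at each split; the toy rows, labels, number of trees, and seed are illustrative assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Hypothetical base learner: a one-split tree chosen to minimize training errors."""
    overall = Counter(labels).most_common(1)[0][0]
    best_err, best_model = len(labels) + 1, None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            sides = {True: Counter(), False: Counter()}  # True side: x[f] <= t
            for row, y in zip(rows, labels):
                sides[row[f] <= t][y] += 1
            # Each side predicts its majority class (overall majority if empty).
            cls = {s: (c.most_common(1)[0][0] if c else overall) for s, c in sides.items()}
            err = sum(n for s, c in sides.items() for y, n in c.items() if y != cls[s])
            if err < best_err:
                best_err, best_model = err, (f, t, cls)
    f, t, cls = best_model
    return lambda x: cls[x[f] <= t]


def random_forest(rows, labels, n_trees: int = 25, seed: int = 0):
    """Bagging: B bootstrap samples, one tree per sample, majority vote at prediction."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):  # loop B times
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement: (X_b, Y_b)
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))

    def predict(x):
        votes = Counter(tree(x) for tree in trees)  # C_b(x) for b = 1..B
        return votes.most_common(1)[0][0]  # majority vote

    return predict


# Hypothetical rows: (CAN identifier, DLC).
rows = [(0x000, 8), (0x000, 8), (0x000, 8), (0x316, 8), (0x2B0, 8), (0x43F, 8)]
labels = ["dos"] * 3 + ["normal"] * 3
forest = random_forest(rows, labels)
```

Individual trees trained on unlucky bootstrap samples (for example, samples containing only one class) can be wrong, but the majority vote over B trees smooths out such errors.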

E. MULTI-LAYER PERCEPTRON
The Multi-Layer Perceptron (MLP) is a neural network learning approach. It is a feedforward network with several layers of nodes: an input layer, an output layer, and one or more hidden layers. This supervised learning technique uses a nonlinear activation function in each neuron. Trained with backpropagation, an MLP can solve many multidimensional classification problems, including those with non-linearly separable data. With a large number of layers, it can be considered a deep learning technique.
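Why a hidden layer with a nonlinear activation can separate non-linearly separable data is shown by the following Python sketch of an MLP forward pass on the XOR problem. The weights are hand-set for illustration (an assumption, not values from the paper); backpropagation would be the usual way to learn such weights from data.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation applied in every neuron."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_layer, output_layer):
    """Forward pass of a one-hidden-layer MLP; each layer is a list of (weights, bias)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in hidden_layer]
    return [sigmoid(sum(w * hi for w, hi in zip(ws, h)) + b) for ws, b in output_layer]


# Hand-set (not trained) weights: hidden unit 1 approximates OR, hidden unit 2
# approximates AND, and the output computes "OR and not AND", i.e. XOR, which
# no single linear boundary in the input space can separate.
HIDDEN = [([20.0, 20.0], -10.0), ([20.0, 20.0], -30.0)]
OUTPUT = [([20.0, -20.0], -10.0)]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(mlp_forward(x, HIDDEN, OUTPUT)[0]))
```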

IV. PROPOSED MODEL AND EXPERIMENTAL STUDY
This section describes the evaluation criteria, followed by the results of using ML as an IDS.

A. APPLIED MODEL
The overall architecture of the model, including the details of its workflow, is described in Figure 6. The KIA Soul CAN bus dataset was obtained from a shared repository. The data were then labeled through preprocessing according to the dataset description given in [6]. Next, a set of ML tools was applied using Python. Finally, the results are presented by attack type, and an overall comparison is made with other ML models applied to the same dataset in other works.
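The preprocessing step can be sketched in Python under loud assumptions: the whitespace-separated log format, the hypothetical `parse_frame` and `add_interarrival` helpers, and the resulting feature layout (identifier, DLC, eight data bytes, and the time since the previous frame with the same identifier) are illustrative only; they are neither the feature list of Table 1 nor the exact format of the KIA Soul files.

```python
def parse_frame(line: str):
    """Parse a hypothetical log line: '<timestamp> <hex ID> <DLC> <hex data bytes...>'."""
    fields = line.split()
    timestamp = float(fields[0])
    can_id = int(fields[1], 16)
    dlc = int(fields[2])
    data = [int(b, 16) for b in fields[3:3 + dlc]]
    data += [0] * (8 - len(data))  # zero-pad short or empty payloads to 8 bytes
    return timestamp, [can_id, dlc] + data


def add_interarrival(frames):
    """Append the time since the previous frame with the same ID (0.0 for the first one)."""
    last_seen = {}
    rows = []
    for ts, feats in frames:
        can_id = feats[0]
        rows.append(feats + [ts - last_seen.get(can_id, ts)])
        last_seen[can_id] = ts
    return rows


# Made-up log lines in the assumed format.
LOG = [
    "0.000 0316 8 05 21 68 09 21 21 00 6f",
    "0.010 0316 8 05 21 68 09 21 21 00 6f",
    "0.012 0000 8 00 00 00 00 00 00 00 00",
]
features = add_interarrival([parse_frame(line) for line in LOG])
```

Timing features of this kind matter because, as noted for [6] and [18], injected frames can carry plausible identifiers and payloads while arriving at abnormal intervals.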

B. EVALUATION CRITERIA
In this paragraph, we define the criteria used to evaluate the classifiers' results. Precision is defined by equation (2):
$$\text{Precision} = \frac{TP}{TP + FP} \quad (2)$$
Recall is defined by equation (3):
$$\text{Recall} = \frac{TP}{TP + FN} \quad (3)$$
The $F_1$-score combines precision and recall, as given by equation (4):
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$
Finally, accuracy is the most significant parameter representing the success of a classification method (5):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (5)$$
where:
• TP (true positive): an intrusion that is correctly detected,
• TN (true negative): normal behavior that is correctly considered normal,
• FP (false positive): normal behavior that is considered an attack,
• FN (false negative): an intrusion that is not detected.
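Equations (2) to (5) can be computed directly from confusion-matrix counts, as in the following Python sketch. The counts in the usage example are made up, and returning 0.0 when a denominator is zero (for instance, for a classifier that never predicts an attack) is a convention assumed here.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention for 0/0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 110 intrusions (90 detected), 890 normal frames (10 false alarms).
m = binary_metrics(tp=90, tn=880, fp=10, fn=20)
```

Accuracy counts both classes, so on an imbalanced dataset it can stay high even when the minority (attack) class is poorly detected; precision and recall expose that case.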

C. DATASET
We have used a dataset that includes DoS, fuzzy, and impersonation attacks. The dataset was constructed by logging CAN traffic via the OBD-II port of a real vehicle while message injection attacks were being performed. The in-vehicle data were extracted from a KIA Soul.
• DoS Attack: Injecting messages with CAN ID '0x000' in a short cycle.
• Fuzzy Attack: Injecting messages with spoofed random CAN ID and DATA values.
Table 1 provides the feature list describing the prepared dataset [6], which includes three types of attacks: DoS, impersonation, and fuzzy attacks. A Python program was executed on a machine with 8 GB of RAM and an i7 processor. In the following, two kinds of comparison are given: the first is based on attack type, and the second is an overall comparison with well-known methods.
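The DoS attack relies on CAN arbitration: when several nodes start transmitting at the same time, the identifier with the lowest numerical value wins, so a flood of frames with ID 0x000 wins every arbitration round and starves legitimate traffic. The following Python sketch simulates bitwise arbitration for standard 11-bit identifiers; it is a simplification that ignores the RTR/IDE bits, extended identifiers, bit stuffing, and error handling.

```python
def arbitrate(ids, id_bits: int = 11) -> int:
    """Simulate bitwise CAN arbitration among nodes that start a frame together.

    The bus behaves as a wired-AND: a dominant 0 overrides a recessive 1.
    A node that sends 1 but reads 0 on the bus loses and stops transmitting.
    """
    contenders = set(ids)
    for bit in reversed(range(id_bits)):  # identifiers are sent MSB first
        bus_level = min((can_id >> bit) & 1 for can_id in contenders)  # wired-AND
        contenders = {c for c in contenders if (c >> bit) & 1 == bus_level}
    (winner,) = contenders  # distinct identifiers leave exactly one winner
    return winner


print(hex(arbitrate([0x316, 0x164, 0x000])))  # 0x0: the injected DoS frame wins
```

Since the surviving node is always the one with the numerically smallest identifier, the result equals `min(ids)`; the bit-by-bit loop only makes the physical mechanism explicit.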

1) COMPARISON BASED ON ATTACK TYPE
As mentioned previously, we consider three types of attacks: DoS, impersonation, and fuzzy attacks. Table 2 gives a comparison based on attack type. Figures 7, 9, and 8 show the classifier results in terms of precision, recall, and F1-score for impersonation, DoS, and fuzzy attacks, respectively. We found that all four classifiers achieve their best results when detecting impersonation attacks. In contrast, the detection of fuzzy attacks is very poor, and SVM shows the worst performance on these attacks.
As we can see, the results are poor for fuzzy and DoS attacks. This can be explained by the insufficient number of examples of these attacks in the dataset. RF outperforms DT, MLP, and SVM on impersonation and fuzzy attacks. However, on DoS attacks, DT performs slightly better than RF and far better than SVM and MLP. The worst performance on fuzzy attacks is given by SVM and MLP. The best performance is obtained with impersonation attacks, owing to their high support in the dataset (11,046 samples). Meanwhile, the classifiers perform worst on fuzzy attacks, which is explained by their low support.
DT performs better than the other methods when DoS attacks occur. SVM performs worse than DT and RF, and worst of all on fuzzy attacks, where it detects no attacks at all. In addition, DT- and RF-based detection of this attack is relatively weak, which can be explained by the low support of this attack in the dataset.
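The notion of support used above (the number of true samples of a class) and the effect of a class that is never predicted can be illustrated with a per-class, one-vs-rest report in Python. The labels and predictions below are a made-up toy example, not the paper's results.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """One-vs-rest precision, recall, F1-score and support for every class label."""
    support = Counter(y_true)  # number of true samples per class
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 if class never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1,
                       "support": support[cls]}
    return report


# Toy case: a low-support attack class that the classifier never predicts.
y_true = ["normal"] * 6 + ["fuzzy"] * 2
y_pred = ["normal"] * 8
report = per_class_report(y_true, y_pred)
```

With only two fuzzy samples, predicting "normal" everywhere still yields 75% accuracy, yet recall for the fuzzy class is zero, which is why per-class metrics and support are reported alongside accuracy.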

2) OVERALL COMPARISON
In this subsection, the RF, DT, MLP, and SVM results are compared to those of three other techniques: HTM, RNN, and HMM. The results of these three methods are taken directly from [21], where they were obtained on the same dataset. Table 3 gives the accuracy, precision, recall, training time, and testing time of the four classifiers (SVM, RF, MLP, and DT) used to detect intrusions. Figure 10 compares the precision of the best-known ML techniques (SVM, RF, DT, MLP, RNN, HTM, and HMM). The precision of RF, SVM, MLP, and DT is better than that of RNN and HMM but slightly worse than that of HTM. Additionally, in this section we have prepared for each attack a specific database that contains [...] has decreased. So, with only two classes, recognition improves. Figure 11 shows the recall of the seven methods: the RF, SVM, MLP, and DT classifiers outperform the other techniques (RNN, HMM, and HTM). The most important comparison is that of accuracy. Figure 12 shows that the four classifiers used in this study, RF, SVM, MLP, and DT, outperform the other techniques. RF exceeds HTM by 1.3% and RNN by 12.2%, and almost doubles the performance of HMM. DT outperforms the other techniques by the same margins, while SVM exceeds HTM by 1.2% and RNN by 12.1%, and also almost doubles HMM. Next, we present the confusion matrices of all techniques in Figs. 13, 14, 15, and 16.
The per-attack results show that the number of attack examples can influence the learning results and may even become decisive beyond a certain number. This is logical, as any learning model can only generalize from a sufficient number of examples. It also relates to the overfitting and underfitting problems.
Another way to compare the performance of different techniques is the percentage difference (PD), whose general form is given by equation (6). In our case, it is applied to two values x and y as follows:
$$PD(x, y) = 100 \times \frac{|x - y|}{(x + y)/2} \quad (7)$$
[...] outperformance of RF by more than 10%. Ordinary values indicate outperformance of RF by less than 10%. In terms of accuracy, the RF classifier outperforms HTM slightly, and it outperforms RNN by 10.68%. Notably, RF performs better than HMM by more than 50%. This table clearly shows that the RF classifier is more suitable for intrusion detection in CAN-based in-vehicle networks.
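Equation (7) translates directly into Python, as the following sketch shows; the example values are arbitrary and are not taken from the paper's tables.

```python
def percentage_difference(x: float, y: float) -> float:
    """Percentage difference: 100 * |x - y| / ((x + y) / 2)."""
    return 100.0 * abs(x - y) / ((x + y) / 2.0)


print(percentage_difference(90.0, 110.0))  # 20.0
```

Unlike a relative change measured against one reference value, PD divides by the mean of the two values, so it is symmetric in x and y; for the same reason, a PD value is not the same as a difference in accuracy points.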
SVM, DT, MLP, and RF achieve better results than RNN because statistical learning techniques are often more efficient in multidimensional problems; in our intrusion detection problem, the input data dimension is 16. The most difficult phase for statistical learning techniques is parameterization, and optimal parameters are crucial to the success of this approach. We thoroughly explored the parameter search space before concluding the training phase, which yielded results comparable to those of the neural network techniques.
We noticed a few disadvantages of the SVM technique, including long training and testing times: it takes almost 100 times longer than the other techniques (MLP, DT, and RF) to train and test. Parameterization is also difficult for statistical learning techniques, especially for nonlinear learning; for example, it is difficult to find optimal parameters for the kernel function. We also applied cross-validation: Figure 16 shows the accuracy of each run (cv = 5) for each learning approach.
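The five-fold cross-validation (cv = 5) can be sketched in Python as follows. The shuffling seed, the unstratified folds, and the hypothetical `fit_predict` callback interface are assumptions; with low-support classes such as the fuzzy attacks, stratified folds that preserve class proportions would be a sensible alternative.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation after shuffling."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(fit_predict, X, y, k: int = 5, seed: int = 0):
    """Return per-fold accuracy; fit_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores
```

Each sample is used for testing exactly once, so the k per-fold accuracies give both an average performance estimate and an indication of its variability.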

V. CONCLUSION AND FUTURE WORK
This paper deals with an important problem: malicious intrusions into in-vehicle communications using the CAN bus protocol. From a review of previous research in this area, we found that most existing studies have examined the behavior of exchanged frames or used the data contained in the frames only superficially, without deeply considering the data itself. In addition, these studies do not use traditional classification techniques. For these reasons, we have proposed the use of the RF, SVM, MLP, and DT classifiers to distinguish between normal and malicious communications. According to the results of the experimental study performed on the KIA Soul dataset, these four machine learning tools outperform the other techniques (HTM, RNN, HMM) in terms of accuracy.
In future work, we will apply unsupervised classification techniques to assess detection performance on unknown or new intrusions. It will also be important to apply deep learning techniques to large intrusion datasets.