Enhanced Intrusion Detection in In-Vehicle Networks Using Advanced Feature Fusion and Stacking-Enriched Learning

Modern vehicles rely heavily on interconnected electronic control units (ECUs) through in-vehicle networks to perform crucial functions such as braking and monitoring engine RPMs. However, the increased number of ECUs and their connectivity to the in-vehicle network poses a security risk because protocols such as the controller area network (CAN) lack encryption and authentication. To address this problem, machine learning (ML) based intrusion detection systems (IDSs) have been proposed. However, existing IDSs suffer from low detection accuracy, limited real-time response, and high resource requirements. This study proposes an accurate and low-complexity IDS for in-vehicle networks based on feature fusion and ensemble learning, called the Feature Fusion and Stacking-based IDS (FFS-IDS). FFS-IDS fuses multiple features extracted from raw network traffic and then classifies traffic instances into intrusive and non-intrusive categories using a stacking ensemble of basic machine learning classifiers. Specifically, a decision tree is employed as a base classifier, and random forest is used as a meta-learner. This work implements and validates FFS-IDS using real-time car hacking datasets and achieves better performance than individual decision tree classifiers and popular ensemble learning methods such as Random Forest, LightGBM, AdaBoost, and ExtraTree. The results demonstrate that FFS-IDS can detect Denial of Service (DoS), Gear spoofing, and RPM spoofing attacks with up to 99% accuracy and Fuzzy attacks with up to 97.5% accuracy using benchmark datasets. Overall, this study shows the effectiveness and practicality of FFS-IDS in detecting intrusions in in-vehicle networks, which is essential for ensuring the cybersecurity and safety of modern vehicles. Future work in this area could involve exploring additional feature extraction techniques and fine-tuning hyperparameters to further improve the performance of IDSs.


I. INTRODUCTION
Modern vehicles are equipped with electronic control units (ECUs) and robust computing systems, which have made them communication and computing-enabled terminals for intra-vehicle and inter-vehicle network communication [1], [2], [3]. This increased communication has led to more functionality and comfort, but it has also increased security threats [4], [5], [6]. The susceptibility of the controller area network (CAN) to different types of cyber attacks, including fuzzy attacks, DoS attacks, and spoofing attacks, has been a significant concern due to the lack of encryption and authentication policies in the de facto standard of in-vehicle networks [7]. Intrusion detection systems (IDSs) are vital tools for protecting in-vehicle networks by identifying unauthorized events in the networks [8], [9]. Machine learning (ML) and deep learning (DL) methods have been successfully implemented in developing effective IDSs for various applications [10], [11], [12], [13], [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Shadi Alawneh.
However, most existing ML-based IDSs suffer from low detection accuracy, limited real-time response, and high computing resource requirements caused by the large number of network traffic features [1], [15], [16], [17]. This work proposes an effective IDS (called FFS-IDS) for the CAN bus of in-vehicle networks to address these issues. FFS-IDS involves feature fusion and stacking-based ensemble learning to detect intrusions in in-vehicle networks. It fuses multiple features derived from primary features extracted from raw network traffic, capturing more information about network activity and improving the IDS's accuracy. It then classifies traffic instances into intrusive and non-intrusive categories based on stacking ensemble learning of basic ML classifiers, where a traditional decision tree is used as a base classifier and random forest is used as a meta-learner.
This work contributes to the field of in-vehicle network security by proposing a feature fusion method that combines basic features of in-vehicle network traffic to construct more comprehensive data subsets. This approach captures a broader range of information about network activity, improving the accuracy of intrusion detection. Additionally, I propose a stacking-based ensemble learning approach that further combines the outputs of multiple classifiers to improve detection performance. Specifically, we use a Bayes classifier, decision tree, and random forest classifier in a hierarchical structure to learn from the comprehensive data subsets. Finally, I validate the proposed methods using a real car hacking benchmark intrusion detection dataset for in-vehicle networks. The experimental results demonstrate that this approach significantly outperforms existing state-of-the-art methods regarding detection accuracy and false positive rate.
This paper is organized as follows. Section II provides an overview of existing research on intrusion detection in in-vehicle networks. Section III presents the intrusion detection problem in in-vehicle networks. Section IV introduces the proposed FFS-IDS system based on feature fusion and stacking-based ensemble learning for detecting intrusions in in-vehicle network traffic. The experimental setup, including the benchmark dataset and performance metrics used to evaluate the proposed system, is detailed in Section V. The results of the experiments are presented and compared with existing state-of-the-art methods in Section VI. Section VII highlights threats to the validity of this work. Finally, Section VIII summarizes the contributions and discusses future directions for further research.

II. RELATED WORK
In recent years, there has been a growing interest in developing effective network traffic classification methods in academia and industry. Various techniques and features have been proposed for this purpose, including deep packet inspection, port number-based classification, and statistical classification methods [11], [18], [19], [20], [21].
Deep packet inspection methods effectively identify known patterns and classify network traffic based on payload content. However, they require specific hardware and have limitations in identifying multimedia-based and encrypted traffic. Port number-based classification methods use the port numbers in transport layer headers to classify network traffic accurately. However, they fail to classify traffic from modern applications that do not use popular port numbers. Statistical information-based methods extract high-level features from basic packet header information, and ML and DL-based methods often use this extracted information to classify network traffic accurately. However, these methods may require significant computing resources and may not always provide high accuracy in network traffic classification [22]. The diverse strengths and limitations of these approaches are compared in Table 1, allowing for a comprehensive understanding of their suitability for different scenarios.
Several approaches have been proposed for detecting intrusions in in-vehicle network traffic using different techniques and features. For instance, Alshammari et al. [23] employed K nearest neighbor and support vector machine-based classifiers to detect intrusions in CAN bus traffic. Based on network traffic specifications, Olufowobi et al. [24] developed a real-time IDS for in-vehicle network traffic attacks and evaluated their system's performance using a synthetic and CAN intrusion dataset. In addition, Olufowobi et al. [25] proposed an adaptive cumulative sum method that utilizes statistical change-based information to detect attacks in CAN traffic quickly. Barletta et al. [26] used distance-based information to develop IDSs for in-vehicle networks. They suggested using the k-means clustering algorithm with an X-Y fused Kohonen network, which demonstrated high performance in detecting intrusions from the CAN dataset. However, their system has high computational complexity. Lee et al. [27] developed an IDS for detecting CAN attacks in in-vehicle network traffic using offset ratio and time interval-based information. They demonstrated the performance of their model by simulating different types of attacks, such as Fuzzy attacks, DoS attacks, and impersonation attacks.
DL methods have also been explored for detecting attacks in in-vehicle network traffic [28]. Song et al. [10] using DL methods and different features, such as spatial and temporal features, to detect intrusions in the car hacking dataset. They used a CNN for extracting spatial features and a long short-term memory (LSTM) network for extracting temporal features, and the extracted features can correctly classify network traffic of the car hacking dataset.
Leveraging the taxonomy from Table 1 and highlighting the potential of ML/DL and statistical methods, Table 2 dissects the strengths and limitations of diverse approaches, features, datasets, and results. However, direct comparisons remain challenging due to individual study goals, data sources, and evaluation criteria.
Overall, the studies discussed in this section have shown promise in detecting intrusions in in-vehicle network traffic using various methods [29]. However, each approach has its strengths and limitations. For example, DL-based methods such as CNN and LSTM networks have shown high accuracy in detecting intrusions, but they are computationally expensive due to their high complexity. On the other hand, statistical methods such as the adaptive cumulative sum method and distance-based IDS have shown promise in detecting attacks quickly with less computational complexity. Still, they may not perform as well as DL-based methods. When choosing an intrusion detection method for in-vehicle network traffic, it is essential to consider the trade-off between accuracy and computational complexity. Furthermore, more research is needed to evaluate the generalizability of these methods to different datasets and their robustness to different types of attacks. Based on the literature review presented and compared in Table 2, some research gaps can be identified:
• Limited research on statistical features: While some studies have explored statistical features, there is still a lack of research on effectively utilizing them to develop accurate and efficient IDSs for in-vehicle network traffic.
• Lack of comparative studies: Although various techniques have been proposed for detecting intrusions in in-vehicle network traffic, there is a lack of comparative studies that evaluate and compare the performance of these techniques. Comparative studies can help identify the strengths and weaknesses of different methods and provide insights into which methods are most effective in detecting intrusions in in-vehicle network traffic.
• Limited research on resource-constrained environments: Many existing studies have focused on developing IDSs for in-vehicle network traffic in resource-rich environments without computing power or memory constraints. However, there is a lack of research on developing effective IDSs for in-vehicle network traffic in resource-constrained environments, such as those found in many embedded systems.
• Lack of focus on new types of attacks: While the existing studies have proposed different approaches for detecting various types of attacks, there is a need for more research on identifying and detecting new types of attacks that may be specific to in-vehicle network traffic. As the automotive industry continues to evolve, attackers may develop new attack techniques specific to in-vehicle network traffic, and it is crucial to have IDSs that can effectively detect such attacks.
Despite their high performance, DL models can be computationally expensive due to their high complexity. Therefore, it is crucial to develop accurate IDSs for in-vehicle network traffic that utilize less computationally expensive ML models and statistical features.

III. FORMULATING THE PROBLEM OF INTRUSION DETECTION IN IN-VEHICLE NETWORKS
Intrusion detection in in-vehicle networks identifies abnormal events or attacks in the network traffic dataset. To frame this problem, we define the following notation. Let $DT = \{i_1, i_2, \ldots, i_N\}$ be the set of $N$ instances in the in-vehicle network traffic dataset, where each instance is represented in the $m$-dimensional feature space $I$. Thus, for an instance $i_j$, the features are denoted as $i_j = (f_{i1}, f_{i2}, f_{i3}, \ldots, f_{im})$, and $i_j \in I$.
To perform intrusion detection, we need to define a mapping $ID$ that maps the input space $I$ to an output space $O$, the set of classes used for network traffic classification. In binary classification, the output space consists of two classes, which can be denoted as $O = \{\text{intrusive}, \text{non-intrusive}\}$, $\{\text{normal}, \text{anomaly}\}$, $\{0, 1\}$, or $\{\text{positive}, \text{negative}\}$. In multi-class classification, the output space $O$ has more than two classes. This work aims to find a suitable mapping $ID: I \to O$ that classifies in-vehicle network traffic into attack classes based on a given network dataset. This study proposes a decision tree-based approach, along with feature fusion and stacking methods. The proposed approach is discussed in detail in the following section.
Strengths of this problem formulation include the precise definition of notations and the focus on identifying abnormal events in in-vehicle network traffic. Limitations of this formulation include the lack of discussion on the types of attacks that can occur in in-vehicle networks and the assumption that the network dataset is already given.

IV. DESIGN OF THE PROPOSED FEATURE FUSION AND STACKING-BASED IDS (FFS-IDS)
This work proposes the Feature Fusion and Stacking-based IDS (FFS-IDS) for in-vehicle networks, as shown in Figure 1. The FFS-IDS leverages multiple features extracted from raw network traffic to classify traffic instances into intrusive and non-intrusive categories using ensemble learning of basic ML classifiers in a stacking approach. The proposed system operates in three phases, which are described below.

A. PHASE 1 -CONSTRUCTION OF THE BASIC DATA SET
The first phase involves extracting basic features from raw network traffic and constructing a benchmark dataset. These features capture relevant information about network traffic, including spatial, temporal, and content features. Each record in the dataset represents a network traffic instance in terms of $m$ features, as described in Section III. The raw data may contain noise, missing values, and values on non-uniform scales.
Data preprocessing is applied to prepare the captured data for further processing by ML models.This involves handling null values, removing noise, removing redundant and irrelevant information, and converting data attributes to a uniform scale.
To demonstrate the performance of the proposed FFS-IDS, we use the car-hacking dataset, which is available in CSV format and contains fields such as Timestamp, ID, DLC, D0-D7, and Tag. Table 3 describes the attributes of the car-hacking dataset.
The car-hacking dataset contains data fields of various types, including time, categorical, and numeric fields. However, this raw data cannot be directly used for ML purposes. A CAN traffic preprocessor module is used to make it compatible with ML algorithms that require numeric input, as shown in Figure 2. The preprocessing step involves encoding the ID field, which represents the unique code for each message sent on the CAN bus, into a numeric value ranging from 0 to the maximum number of IDs (Max IDs). In the car-hacking dataset, there are 29 unique CAN message IDs. The DLC field, indicating the number of bytes sent over the network, is normalized to a range of 0 to 1 using Eqs. 1 and 2 [32]. Here, $val_i$ is the initial value of feature $i$, while $Min_i$ and $Max_i$ are the minimum and maximum values of feature $i$, respectively [33].
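Eqs. 1 and 2 are not reproduced in this text. Assuming they express standard min-max scaling, which is the form consistent with the symbols $val_i$, $Min_i$, and $Max_i$ and with the stated range of 0 to 1, the normalized value (written here as $val_i'$, a symbol introduced only for this illustration) would be:

$$val_i' = \frac{val_i - Min_i}{Max_i - Min_i}$$

Under this assumption, a DLC value of 2 in a column whose values range from 2 to 8 maps to 0, and a DLC value of 8 maps to 1.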
The data fields (D0 to D7) contain data bytes in HEX format, and the DLC field value indicates the length of the data fields. The preprocessor module shifts the Tag field to the last column and fills non-available data bytes with an arbitrary symbol, 'M'. All values of D0 to D7 are converted from HEX format to DEC format, and 'M' is replaced with 256. Finally, all values of D0 to D7 are normalized to a range of 0 to 1 using Eqs. 1 and 2.
To encode the Tag field into numeric values, Normal (R) is converted to 0, and Attack (T) is converted to 1 for further processing with neural networks. The sklearn Python library was used to perform the data cleaning methods mentioned above, specifically, the sklearn.preprocessing.LabelEncoder function for encoding categorical to numeric values and the sklearn.preprocessing.normalize function for normalizing values to a uniform scale.
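The preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the paper's actual code: it uses `pandas.factorize` and a hand-written min-max helper as stand-ins for the sklearn functions named in the text, the helper names `min_max` and `preprocess` are hypothetical, and the three sample rows are invented for demonstration.

```python
import pandas as pd

DATA_COLS = [f"D{i}" for i in range(8)]
MISSING = 256  # sentinel for absent data bytes, as described in the text


def min_max(series: pd.Series) -> pd.Series:
    """Scale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = series.min(), series.max()
    if hi == lo:
        return series * 0.0
    return (series - lo) / (hi - lo)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Encode each distinct CAN ID as an integer code starting at 0.
    out["ID"] = pd.factorize(out["ID"], sort=True)[0]
    out["DLC"] = min_max(out["DLC"].astype(float))
    for col in DATA_COLS:
        # Absent bytes (NaN or the placeholder 'M') become 256; the rest are parsed as hex.
        dec = out[col].map(
            lambda v: MISSING if pd.isna(v) or v == "M" else int(str(v), 16)
        )
        out[col] = min_max(dec.astype(float))
    # Re-append Tag so it becomes the last column: R (normal) -> 0, T (attack) -> 1.
    out["Tag"] = out.pop("Tag").map({"R": 0, "T": 1})
    return out


# Invented sample rows in the car-hacking CSV layout.
raw = pd.DataFrame(
    {
        "Timestamp": [0.001, 0.002, 0.003],
        "ID": ["0316", "018f", "0316"],
        "DLC": [8, 2, 8],
        "D0": ["05", "fe", "00"],
        "D1": ["21", "5b", "ff"],
        **{f"D{i}": ["00", "M", "10"] for i in range(2, 8)},
        "Tag": ["R", "R", "T"],
    }
)
clean = preprocess(raw)
```

The sketch leaves the Timestamp column untouched, since the text does not say how it is transformed.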

B. PHASE 2 -FEATURE COMBINATION-BASED DATA SUBSET CONSTRUCTION
In Phase 2, we aim to construct subsets of the dataset by combining multiple features generated in Phase 1. As different features have different capabilities to detect anomalous behaviour, using a relevant feature set is crucial in the ML pipeline. I propose constructing comprehensive data subsets based on feature fusion concepts to address this issue.
Network traffic data contain different types of information analyzed in different dimensions, namely spatial and temporal aspects of network data and data content regarding network data behaviour. Efficient network traffic classification requires temporal, spatial, and content features derived from basic network traffic features, supporting and complementing each other in detecting anomalies. Hence, combining different features can result in accurate network traffic classification by characterizing different anomalies using comprehensive data features. Using a fixed or single-feature dataset may not suffice in detecting anomalies in a complex network such as the in-vehicle network.
I propose constructing a comprehensive data subset by taking all permutations among different features using a feature fusion method to address this issue. This approach ensures both accuracy and diversity in network traffic by combining different feature sets.
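One way to picture the subset construction is to enumerate combinations of feature groups. The sketch below makes two assumptions that are not stated in the paper: the grouping of columns into `id`, `length`, and `payload` is hypothetical, and unordered combinations are used in place of permutations because column order does not affect a decision tree.

```python
from itertools import combinations


def feature_subsets(groups, min_size=1):
    """Return (group_names, column_list) for every combination of feature groups."""
    names = sorted(groups)
    subsets = []
    for k in range(min_size, len(names) + 1):
        for combo in combinations(names, k):
            # Flatten the chosen groups into one list of column names.
            cols = [c for g in combo for c in groups[g]]
            subsets.append((combo, cols))
    return subsets


# Hypothetical grouping of the car-hacking columns.
groups = {
    "id": ["ID"],
    "length": ["DLC"],
    "payload": [f"D{i}" for i in range(8)],
}
subsets = feature_subsets(groups, min_size=2)
```

Each returned column list would define one data subset on which a base classifier is trained in Phase 3.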

C. PHASE 3 -STACKING-BASED ENSEMBLE LEARNING
In Phase 3, we utilize a stacking-based ensemble learning approach that combines the outputs of base classifiers trained on each subset of the dataset generated in Phase 2. While each basic algorithm is trained using a comprehensive feature data subset for partial network traffic learning, it only predicts the probability of a specific network traffic class. The goal of the meta-learner is to take the output of the basic algorithms and produce an overall detection of the network traffic class that is more generalized and comprehensive.
To achieve this goal, we first train a set of basic learning algorithms $(BA_1, BA_2, \ldots, BA_n)$ to create basic models $(BM_1, BM_2, \ldots, BM_n)$ using comprehensive feature data subsets $(DT_1, DT_2, \ldots, DT_n)$. The predicted probabilities of these basic models, $BA_1(p_1, p_2, \ldots, p_n), BA_2(p_1, p_2, \ldots, p_n), \ldots, BA_n(p_1, p_2, \ldots, p_n)$, are then fed to the meta-learner. Based on a two-level stacking approach, the final model ($FM$) is trained using ensemble learning of the basic models $(BM_1, BM_2, \ldots, BM_n)$. The proposed feature fusion and stacking-based IDS algorithm is presented in Algorithm 1. The computational cost of this algorithm depends on updating $FM$ and is $O(N \cdot \beta)$, where $\beta$ is a constant value less than $N$. The feature combination-based data subset construction phase requires $O(1)$. Therefore, the overall computational complexity of the proposed algorithm in terms of space and time is $O(N)$.
Algorithm 1 (only the final steps are legible): Set $j \leftarrow 1$; end while; 14: end while; 15: $FM \leftarrow y(DT')$; 16: Return $FM$.
The proposed system's overall effectiveness and accuracy depend significantly on the accuracy of the base models. To ensure high performance and computational efficiency, I carefully selected the decision tree algorithm as the base learning algorithm and the random forest algorithm as the meta-learning algorithm. The decision tree algorithm is well-suited for classification tasks, providing highly accurate results with minimal computations [34], [35]. On the other hand, the random forest algorithm is known to achieve high classification accuracy through multiple decision trees, even in the presence of noise and overfitting issues [35], [36].
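The two-level stacking described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 1: the data are synthetic, the three column subsets are hypothetical, the function names `fit_stack` and `predict_stack` are invented, and the use of out-of-fold probabilities (via `cross_val_predict`) to train the meta-learner is a common stacking practice that the paper does not confirm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_stack(X, y, subsets, seed=0):
    """Train one decision tree per column subset, then a random forest on their probabilities."""
    base_models, meta_inputs = [], []
    for cols in subsets:
        tree = DecisionTreeClassifier(random_state=seed)
        # Out-of-fold probabilities: each row is scored by a tree that never saw its label.
        proba = cross_val_predict(tree, X[:, cols], y, cv=5, method="predict_proba")
        meta_inputs.append(proba)
        # Refit on the full training subset for use at prediction time.
        base_models.append(tree.fit(X[:, cols], y))
    meta = RandomForestClassifier(random_state=seed).fit(np.hstack(meta_inputs), y)
    return base_models, meta


def predict_stack(base_models, meta, X, subsets):
    """Feed the base models' class probabilities to the meta-learner."""
    probs = [m.predict_proba(X[:, cols]) for m, cols in zip(base_models, subsets)]
    return meta.predict(np.hstack(probs))


# Synthetic stand-in for preprocessed CAN traffic (10 numeric features, binary label).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
subsets = [[0, 1, 2, 3], [4, 5, 6], [0, 4, 7, 8, 9]]  # hypothetical fused feature subsets
base_models, meta = fit_stack(X_tr, y_tr, subsets)
pred = predict_stack(base_models, meta, X_te, subsets)
accuracy = float((pred == y_te).mean())
```

Training the meta-learner on in-sample probabilities instead would be simpler, but fully grown decision trees reproduce their training labels almost perfectly, which would leave the random forest little to learn from.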

V. EXPERIMENTAL SETUP AND IMPLEMENTATION
This section presents a comprehensive overview of the experimental setup, implementation, dataset, performance metrics, and results of the proposed approach for detecting intrusions in in-vehicle networks. I also provide a detailed analysis of the results and highlight significant observations from the comparative analysis with the outcomes of existing approaches.

A. EXPERIMENTAL SETUP
To implement the proposed FFS-IDS and the state-of-the-art baselines (DT, RF, LightGBM, and ExtraTree), I utilized the Anaconda distribution of Python and libraries such as Pandas, NumPy, and Scikit-learn. These were used for loading the dataset, performing pre-processing operations, constructing the comprehensive feature combination-based data subsets, and running the decision tree and random forest algorithms as base learning and meta-learning algorithms, respectively. The experiments were conducted on a machine with an Intel Core i3-2330M CPU @ 2.20 GHz, 4 GB RAM, and a 1 TB HDD running the Windows operating system. The results are recorded for FFS-IDS and the identified algorithms (DT, RF, LightGBM, and ExtraTree).
To ensure fair comparisons, we used the default hyper-parameters of the identified algorithms as defined in the Scikit-learn library, as presented in Tables 4-8. I also utilized commonly used performance metrics to evaluate the effectiveness of the proposed FFS-IDS approach. I compared the results with existing approaches for detecting intrusions in in-vehicle networks. Furthermore, I analyzed the results and highlighted significant observations from the comparative analysis.

B. BENCHMARK DATASET
To assess the performance of the proposed FFS-IDS system in detecting intrusions in in-vehicle networks, we utilized the car hacking dataset introduced in [37]. This dataset comprises in-vehicle network traffic data recorded by ECUs over the CAN bus and includes normal and attack traffic. The dataset comprises messages transmitted by ECUs using specific identifiers, which are then received by all connected ECUs [10], [38].
The car hacking dataset includes attacks that disrupt normal vehicle operations, such as braking and RPM gauges. It comprises five classes of traffic: DoS attacks, fuzzy attacks, spoofing attacks on the gear system, and spoofing attacks on the RPM gauge, in addition to normal data instances over the CAN bus.
The features of the car hacking dataset are presented in Table 3. The dataset was created by logging in-vehicle network traffic through a real vehicle's on-board diagnostics (OBD-II) port. Attack traffic was injected by adding fabricated messages to the in-vehicle network. The dataset comprises 300 intrusions for each respective attack class, with each intrusion lasting for 3 to 5 seconds. The collected data is presented in CSV format, with separate files for normal traffic and for DoS, gear spoofing, RPM spoofing, and fuzzy attack traffic. Table 9 details the data instances used to validate the proposed FFS-IDS system. I divided the car hacking dataset into training and testing datasets in the ratio shown in Table 10. The experimental setup utilized the Python programming language, the Anaconda distribution, and libraries such as Pandas, NumPy, and sklearn for dataset loading, preprocessing operations, and constructing a comprehensive feature combination-based data subset.

C. PERFORMANCE METRICS
To analyze and compare the performance of the proposed FFS-IDS system and existing ML approaches for detecting intrusions in in-vehicle networks, I computed commonly used performance metrics, including classification accuracy, false positive rate, and true positive rate. These metrics are typically computed from the confusion matrix, which represents the classification results of the IDS. The elements of the confusion matrix are defined in Table 11. The possible outcomes for classifying events are shown in Table 12.
While the confusion matrix is a powerful tool for representing the classification results of IDSs, it may not be convenient for comparing different IDSs. Various performance metrics have been defined in terms of the confusion matrix variables to address this issue. These metrics produce numerical values that can be easily compared, providing insight into the overall performance of the IDS. Some commonly used performance metrics include classification accuracy, false-positive rate, and true-positive rate [39], [40], [41], [42]. By evaluating these metrics, I can analyze and compare the effectiveness of the proposed FFS-IDS system with existing ML approaches for detecting intrusions in in-vehicle networks.
5) F-measure (FM): For a given threshold, the FM is the harmonic mean of the precision and recall at that threshold.
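The metrics named in this section follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `ids_metrics` and the example counts are hypothetical and are not results from the paper.

```python
def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute common IDS metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. the data contain both
    attack and normal instances and at least one positive prediction.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # detection rate / true positive rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "fpr": fp / (fp + tn),  # normal instances flagged as attacks
        "fnr": fn / (fn + tp),  # attacks that were missed
        "precision": precision,
        # Harmonic mean of precision and recall.
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only.
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
```

Because FPR and FNR are error rates, lower values indicate a better detector, whereas higher values are better for the other four metrics.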

VI. RESULTS AND DISCUSSION
This study compares the performance of the proposed FFS-IDS system with other commonly used classifiers, namely the individual decision tree [43] and the ensemble learning methods random forest [44], LightGBM [45], AdaBoost [46], and ExtraTree [47]. The evaluation of these methods is based on the car hacking dataset. I conducted ten independent experiments using FFS-IDS and the other classifiers with their default hyperparameters. The performance of these classifiers was evaluated using commonly used performance metrics. To compare the results of these experiments, I visually represented the experimental results in Figures 3-6.
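A protocol of ten independent experiments can be sketched as a loop over random seeds. The paper does not state how the runs differ, so this sketch assumes each run uses a different train/test split and classifier seed; the data are synthetic, a single decision tree stands in for the compared classifiers, and the function name `repeated_accuracy` is hypothetical.

```python
import statistics

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def repeated_accuracy(X, y, n_runs=10):
    """Return the test accuracy of one classifier over n_runs seeded splits."""
    scores = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed, stratify=y
        )
        model = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
        scores.append(float(model.score(X_te, y_te)))
    return scores


# Synthetic stand-in for the benchmark data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
scores = repeated_accuracy(X, y)
summary = {
    "mean": statistics.mean(scores),
    "stdev": statistics.stdev(scores),
    "min": min(scores),
    "max": max(scores),
}
```

Reporting the spread across runs, rather than a single score, is what makes a stability comparison such as a box plot possible.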
Tables 13-16 summarize the comparative analysis of the proposed FFS-IDS system with the existing approaches for detecting intrusions in in-vehicle networks on the car hacking dataset in terms of accuracy, precision, recall, F-measure, false-positive rate, and false-negative rate.
Figures 3-6 and Tables 13-16 demonstrate that FFS-IDS outperforms the baseline methods in detecting intrusions from the car hacking dataset, achieving higher accuracy, precision, recall, and F-measure, and lower FPR and FNR. FFS-IDS performs better in detecting DoS and spoofing attacks than fuzzy attacks, which exhibit more complex behaviour.
Specifically, FFS-IDS achieved detection rates of up to 99% for DoS, gear spoofing, and RPM spoofing attacks, and up to 97.5% for fuzzy attacks, with a significantly reduced FPR of 0.95% for DoS attacks compared to the other individual and ensemble learning methods. The precision, recall, F-measure, and FNR metrics also show similar superior performance for the DoS attack class, as reported in Table 13.
The comparative results presented in Figures 3-6 further validate the effectiveness of FFS-IDS in detecting various attack classes on different datasets, indicating that the feature fusion-based subset of the car hacking dataset, integrated with a stacking-based ensemble learning method, can improve the performance of IDS significantly over the individual decision tree classifier and most popular ensemble learning methods. This dataset's feature construction, followed by the stacking-based ensemble learning method, extracts helpful information for classifying normal and attack network traffic.
Traditional individual classifiers and popular ensemble learning methods reported less accurate results, with higher FPR and FNR values than FFS-IDS, mainly due to their inability to extract relevant information for normal and attack traffic classification. Moreover, these methods reported poor performance in detecting the fuzzy attack class due to the complex behaviour of fuzzy attacks based on injected messages. Fuzzy attacks are difficult to detect compared to other attack classes such as spoofing and DoS attacks, which require regular injection of attack messages into the in-vehicle network. Regular injection of attack messages can be easily detected for spoofing and DoS attacks. However, fuzzy attack messages are injected into the network less frequently.
Figure 7 shows box plots of the accuracy, FPR, and FNR metrics. It can be observed that the proposed FFS-IDS has reported stable results in detecting different attack classes, except for the fuzzy attack class, due to its complex behaviour.

VII. ADDRESSING POTENTIAL THREATS TO VALIDITY
Transparency and robustness are paramount in this research, and I comprehensively address potential threats to the validity of this study across four dimensions: internal, external, construct and conclusion validity.
• Threats to External Validity: The generalization of these findings may be limited due to using a specific car-hacking dataset. While this dataset captures various in-vehicle network scenarios, the diversity in real-world conditions may not be fully represented. Additionally, the specific characteristics of the attacks in the dataset may not cover the entire spectrum of potential intrusions in in-vehicle networks.
• Threats to Internal Validity: The experimental design involves using default hyperparameters for machine learning classifiers, which could influence the internal validity as the chosen parameters may not be optimal for the specific characteristics of the dataset. Moreover, the proposed FFS-IDS algorithm's performance is evaluated based on a specific configuration, and changes in the dataset or algorithmic parameters might impact the results.
• Threats to Construct Validity: The feature extraction techniques employed in this study focus on specific aspects of network traffic. Variations in network architectures or the introduction of new attack methodologies might threaten the construct validity, as the chosen features may not comprehensively cover all potential intrusions.
• Threats to Conclusion Validity: The conclusions drawn from the results are based on the specific dataset, experimental setup, and evaluation metrics chosen. Changes in any of these elements or introducing new metrics could potentially alter the conclusions drawn from this study.

VIII. CONCLUSION AND FUTURE WORK
The increasing number of ECUs in modern vehicles has led to an increasingly connected internal network, the CAN, which has made them vulnerable to malicious attacks. This work proposed an effective IDS for in-vehicle networks called FFS-IDS, which uses feature fusion and stacking-based ensemble learning. FFS-IDS fuses multiple features extracted from raw network traffic and classifies traffic instances into intrusive and non-intrusive categories based on stacking ensemble learning of basic ML classifiers.
The experimental results demonstrated that FFS-IDS outperformed state-of-the-art IDSs in terms of detection performance, achieving detection accuracies of up to 99% for DoS, Gear spoofing, and RPM spoofing attacks, and up to 97.5% for Fuzzy attacks on the car hacking benchmark dataset. This research demonstrates the effectiveness and practicality of FFS-IDS for detecting intrusions in in-vehicle networks.
The future work outlined in the paper encompasses addressing identified limitations and enhancing the proposed FFS-IDS for in-vehicle networks. The paper acknowledges the constraints of using a single dataset for evaluation and default hyperparameters for machine learning classifiers. To overcome these limitations, additional feature extraction techniques can be explored to enhance the detection performance of IDSs. Furthermore, the intention is to fine-tune the hyperparameters of base algorithms, ensuring a more robust and accurate IDS.
The future research directions involve empirical validation and optimization of the FFS-IDS algorithm. The authors propose conducting thorough experiments to measure the execution time on diverse hardware configurations, analyzing each algorithmic stage's time consumption. Scalability concerning dataset size and complexity will be rigorously assessed. Additionally, a detailed analysis of resource consumption will be conducted, including memory usage, CPU and GPU utilization, and network bandwidth requirements. The authors aim to compare the resource consumption of their approach with existing IDS solutions, evaluating its feasibility for real-world deployment.
The optimization focus includes exploring various techniques to enhance the efficiency of the FFS-IDS algorithm. This involves investigating alternative feature fusion methods, optimizing data subset construction algorithms, and exploring lightweight stacking classifier architectures. The overarching goal is to reduce execution time and resource consumption while maintaining or improving the detection accuracy of the system.
Moreover, the paper proposes investigating hardware-specific adaptations, tailoring the FFS-IDS algorithm for platforms like embedded devices or edge computing environments. This involves developing specialized implementations that leverage the strengths of available hardware resources while minimizing constraints.

FIGURE 1. Design of the proposed feature fusion and stacking-based IDS (FFS-IDS).

FIGURE 2. CAN traffic preprocessor for in-vehicle network intrusion detection.

VOLUME 12, 2024 2049
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

1) Classification accuracy (CR): It is defined as the ratio of correctly classified instances to the total number of instances.
$$CR = \frac{\text{Correctly classified instances}}{\text{Total number of instances}} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
2) Detection rate (DR) or Recall: It is computed as the ratio between the number of correctly detected attacks and the total number of attacks.
$$DR = \frac{\text{Correctly detected attacks}}{\text{Total number of attacks}} = \frac{TP}{TP + FN} \tag{4}$$
3) False positive rate (FPR): It is defined as the ratio between the number of normal instances detected as attack and the total number of normal instances.
$$FPR = \frac{FP}{FP + TN}$$
4) Precision (PR): It is the fraction of data instances predicted as positive that are actually positive.
$$PR = \frac{TP}{TP + FP}$$

FIGURE 3. Comparison of the detection performance of the FFS-IDS system and other state-of-the-art methods on the DoS attack dataset.

FIGURE 4. Comparison of the detection performance of the FFS-IDS system and other state-of-the-art methods on the fuzzy attack dataset.

TABLE 13.

FIGURE 5. Comparison of the detection performance of the FFS-IDS system and other state-of-the-art methods on the gear spoofing attack dataset.

FIGURE 6. Comparison of the detection performance of the FFS-IDS system and other state-of-the-art methods on the RPM spoofing attack dataset.

TABLE 1. Comparison of network traffic classification methods.

TABLE 2. Comparison of intrusion detection approaches in in-vehicle network traffic classification.

TABLE 3. Attributes of the car-hacking dataset.

TABLE 4. Hyper-parameters of decision tree classifier.

TABLE 5. Hyper-parameters of random forest classifier.

TABLE 10. Training and test datasets.

TABLE 14. Comparison of intrusion detection methods for Fuzzy attacks on in-vehicle networks.

TABLE 15. Comparison of intrusion detection methods for gear spoofing attacks on in-vehicle networks.

TABLE 16. Comparison of intrusion detection methods for RPM spoofing attacks on in-vehicle networks.