Efficient ECU Analysis Technology Through Structure-Aware CAN Fuzzing

Modern vehicles are equipped with a number of electronic control units (ECUs), which control vehicles efficiently by communicating with each other through the controller area network (CAN). However, the CAN is known to be vulnerable to cyber attacks because it does not have any security mechanisms. To find vulnerable CAN messages that can control safety-critical functions in ECUs, researchers have studied CAN fuzzing methods. In existing CAN fuzzing methods, fuzzing input values are generally generated at random without considering the structure of CAN messages, resulting in non-negligible CAN fuzzing time. In addition, existing fuzzing solutions have limited monitoring capabilities of the fuzzing results. In this paper, we propose a Structure-aware CAN Fuzzing protocol, in which the structure of CAN messages is considered and fuzzing input values are systematically generated to locate vulnerable functions in ECUs. Our proposed Structure-aware CAN Fuzzing system takes less time to run than existing solutions, meaning that problematic CAN messages that may have originated from SW implementation errors or CAN DBC (database CAN) design errors can be found quickly and, subsequently, appropriate action can be taken. Finally, we evaluated the performance of our Structure-aware CAN Fuzzing system on two real vehicles. We proved that our proposed method can find CAN messages that control safety-critical functions in ECUs faster than existing fuzzing solutions.


I. INTRODUCTION
Modern vehicles are equipped with a number of electronic control units (ECUs) that control electronic systems such as the engine, airbags, brake, and so on. ECUs communicate with each other through several in-vehicle networks (IVNs), such as the controller area network (CAN), FlexRay, local interconnect network (LIN), CAN with flexible data rate (CAN-FD), and automotive ethernet. In 1993, the International Organization for Standardization (ISO) adopted CAN as a de-facto standard (i.e., ISO 11898) for communication between ECUs [1] for its prime suitability with vehicle features.
A vehicle can be efficiently controlled by CAN communications, but the following security issues remain problematic. First, attackers can eavesdrop on all messages on The associate editor coordinating the review of this manuscript and approving it for publication was Diana Gratiela Berbecaru . the CAN-bus because the messages are broadcast without encryption. Second, attackers can gain control of a vehicle by injecting malicious messages because there are no access controls or authentication procedures on the network [2]- [8]. This can be illustrated by one key vehicle hacking result, in which Miller and Valasek were able to remotely control critical operations such as engine and brake function of a Jeep Cherokee [6]. In addition to remote attacks using wireless interfaces (e.g., Bluetooth or Wi-Fi), as in the aforementioned Jeep Cherokee attack, attacks based on wired interfaces (e.g., On-Board Diagnostics (OBD)-II port) have also been conducted [3].
To deal with cyber attacks, attack surfaces on a vehicle should be analyzed in advance before they can be exploited by a potential attacker. Analysis of attack surfaces can be divided into two parts: 1) vulnerability research on external networks (e.g., vehicle-to-vehicle (V2V), vehicle-toinfrastructure (V2I), etc.) and 2) vulnerability research on VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ in-vehicle networks. Vulnerabilities of external networks can be analyzed with several open-source tools, such as NMAP and ubertooth, and vulnerabilities of in-vehicle networks can be analyzed with reverse engineering methods [3], [4] that analyze ECU firmware and CAN fuzzing methods 1,2,3,4,5 that analyze CAN messages related to safety-critical ECU control.
While there is much active research that analyzes the vulnerabilities of external networks, research on the vulnerabilities of in-vehicle networks has several limitations. First, the reverse engineering of ECU firmware requires physical access to a target ECU to locate a debugging port and obtain the corresponding firmware. Second, CAN fuzzing requires non-negligible analysis time to randomly transmit fuzzing input values to the CAN-bus. In addition, it is difficult to monitor the responses corresponding to brute force fuzzing input values because CAN DBC (database CAN) is securely managed and undisclosed, so there is a limit to how much the meaning of CAN messages can be analyzed.
To overcome these limitations in analysis methods for in-vehicle networks, we herein propose a Structure-aware CAN Fuzzing system as a protocol that minimizes fuzzing time by considering the structure of a CAN message. In other words, our goal is not to identify all CAN messages that can cause misbehavior of the target vehicle but rather it is to identify some CAN messages that can cause misbehavior of the target vehicle in a short period of time. Additionally, the proposed method involves network-based fuzzing, which does not need to reverse engineer ECU firmware through a debugging port.
The proposed method consists of three phases: 1) CAN message analysis, 2) fuzzing input value generator, and 3) monitoring. In the CAN message analysis phase, the structure of the CAN message is analyzed. In the fuzzing input value generator phase, the fuzzing rules based on analysis results are defined, and structure-aware CAN messages are generated and injected into the vehicle. Finally, in the monitoring phase, the corresponding responses are monitored using side information.
The contributions of this paper are as follows.  2) By monitoring side information (e.g., IMU sensor values) of the fuzzing target vehicles, this method automatically finds misbehavior caused by injected fuzzing input values. 3) To validate the efficiency and practicality of the proposed method, we conducted experiments with two real vehicles and several vulnerable CAN messages (e.g., controlling steering and engine acceleration) that could be exploited as attack messages. This paper is organized as follows. We introduce background knowledge and related works in Section II, and Section III describes the system design of our Structure-aware CAN Fuzzing system in detail. In Section IV, we evaluate our method through experiments on real vehicles and discuss the limitations of our proposal in Section V. Finally, we present our conclusion and offer direction for future research in Section VI.

II. BACKGROUND AND RELATED WORKS
A. BACKGROUND 1) CONTROLLER AREA NETWORK Robert Bosch developed the CAN based on a bus topology (CAN-bus) and released the technology in 1986. In the CAN physical layer, wires are a twisted pair of cables to provide protection against electrical noise. The maximum speed of the CAN-bus is 1Mbit/s, but the actual baud-rate on a vehicle is 500Kbit/s. The maximum length of data is 8 bytes [9]. Fig. 1 shows the structure of a CAN Data Frame. Start of Frame (SOF) denotes the start of the frame, while Identifier (ID) is used to arbitrate the frame priority and Remote Transmission Request (RTR) identifies whether the frame is a data frame or a remote frame. The Identifier Extension Bit (IDE) distinguishes whether a frame is standard or extended, and Reserved (R) area is a pre-designation for an extended frame. Data Length Code (DLC) indicates the length of the data field, and the data field contains the data transmitted by ECUs. Cyclic Redundancy Check (CRC) field checks frame errors, and the Acknowledgement (ACK) field recognizes reception of a valid frame. Finally, End of Frame (EOF) denotes the end of a frame. In this paper, we focus on the ID, DLC, and data field of the CAN data frame.

2) OBD-II PID
OBD-II is a vehicle diagnostic service used to diagnose vehicle status. Parameter identifiers (PIDs) are codes that check the vehicle status, and all modern vehicles follow the OBD-II PID standard defined by SAE J1962. 6 However, not all vehicles support all standard PIDs, and vehicle manufacturers can create customize their own identifiers. Therefore, it is important to find the supported PIDs for each vehicle, which can be confirmed via a request and response process included in PID services. When PIDs make a request query using CAN ID 0 × 7DF with PID, a code is sent to the CAN-bus of the vehicle connected to the OBD-II port, and the vehicle then sends a response query with the CAN ID in the range 0 × 7E8 to 0 × 7EF. The response query includes a return value that represents the vehicle status or supported PIDs.

B. RELATED WORKS 1) CAN MESSAGE ANALYSIS METHODS
Vehicle manufacturers keep Database CAN, called CAN DBC files, secret because attackers may cause a vehicle to misbehave using knowledge obtained from the DBC files. However, researchers have already discovered that DBC files can be extrapolated through a CAN message analysis method (e.g., CAN message reverse engineering [10]- [17].
References [10] and [11] proposed the ACA algorithm, which effectively analyzes CAN messages to solve the problem of CAN message reverse engineering taking nonnegligible time. The ACA algorithm finds supported PIDs using the OBD-II PID standard, and then requests a PID diagnostic query from a test vehicle with the PID code that it wants to analyze. By analyzing the relationship between the response data of the PID query and the CAN messages, the ACA algorithm finds both the CAN IDs and the corresponding data fields related to the requested PID code.
Alternatively, an automatic reverse engineering method uses the READ algorithm [13] to extract the signal boundaries of the CAN data frame. READ logs CAN messages received from the vehicle and creates sub-files of CAN messages with the same ID. READ calculates the bit-flip rate, which indicates how often the bit value changes at each bit position in the data field (64-bit) of each sub-file. Finally, READ extracts signal boundaries and labels the signals (e.g., Counter, CRC, etc.) encoded in the data field using the bit-flip rate. In this research, a comparison of the formal CAN message specifications for the test vehicle confirmed the results of READ to be highly accurate.
LibreCAN [15] subsequently proposed a framework that analyzes CAN message signals of a vehicle's powertrain and body using the OBD-II PID protocol and the built-in Inertial Measurement Unit (IMU) sensor on a smartphone. LibreCAN adopts the methodology of READ [13] (i.e., bitflip rate-based signal extraction) for signal extraction from CAN messages. After extracting signals from CAN messages, it identifies the signal boundaries more accurately than READ [13] via a heuristic method. Finally, LibreCAN translates the meaning of CAN messages by using the identified signals obtained from the OBD-II PID and the IMU sensor.

2) CAN FUZZING METHODS
Some studies on ECU vulnerability analysis methods based on CAN fuzzing do not need to reverse engineer the ECU firmware through a debugging port. In general, existing CAN fuzzing methods 1,2,3,4,5 run their own fuzzing algorithms using random input values without considering the meaning of the CAN IDs and the data fields [18]- [23].
Therefore, existing fuzzing methods tend to take nonnegligible time because these methods are required to inject all possible values of the CAN data field (i.e., if DLC is 8, the number of input values that will be injected to CAN is 2 64 ). In addition, there is no published work on monitoring methods for the responses to fuzzing input values.

III. THE PROPOSED SYSTEM: STRUCTURE-AWARE CAN FUZZING
In this section, we introduce our proposed system that can systematically generate fuzzing input values and automatically monitor responses triggered by fuzzing input values.
A. OVERVIEW Fig. 2 shows an overview of our Structure-aware CAN Fuzzing system, which consists of three phases. We outline these phases next.

1) PHASE 1: CAN MESSAGE ANALYSIS
Phase 1 is composed of three parts: bit-flip rate analysis, OBD-II PID analysis, and correlation analysis.
• Bit-flip rate analysis: Signal boundaries are identified by calculating the bit-flip rate of the CAN data field. Then, the identified signal boundaries are mapped to seven signal types-Unused, Constant, Multi-value, Sensor, Counter, Checksum, and Undefined field-via a heuristic method.
• OBD-II PID analysis: By checking the supported PIDs provided by the OBD-II PID standard, CAN IDs related to a specific function (e.g., engine RPM, throttle position, etc.) can be identified.
• Correlation analysis: The relationships between CAN IDs are analyzed by calculating the correlation of signals identified as being in the Sensor field, which is one of the defined seven signals.

2) PHASE 2: FUZZING INPUT VALUE GENERATOR
Phase 2 is composed of three parts: CAN fuzzing rule definition, CAN fuzzing input value generation, and CAN fuzzing input value injection.
• CAN fuzzing rule definition: By using the analysis results in Phase 1, the system can define the fuzzing rules that can cause vehicles to misbehave.
• CAN fuzzing input value generation: The fuzzing input values are generated by using fuzzing rules. In our proposed process, if a fuzzing input value is inferred to have the Checksum field, the checksum value corresponding to the fuzzing input should be generated in two ways. If the fuzzing input value to be injected exists in the CAN log recorded in a fuzzing target vehicle, the checksum value obtained from the CAN log can be used as is.
Otherwise, a deep learning algorithm infers a new checksum value. VOLUME 10, 2022 FIGURE 2. System design of our structure-aware CAN fuzzing system.
• CAN fuzzing input value injection: The generated input values are injected into CAN by considering the error states of the CAN-bus. Note that bus-off [9] issues occur due to message collisions caused by fuzzing input value injection.

3) PHASE 3: MONITORING
Phase 3 is composed of two parts: CAN message monitoring and side information monitoring.
• CAN message monitoring: After injection of the fuzzing input values, the system monitors violations of signal properties as defined in Phase 2.
• Side information monitoring: After injection of the fuzzing input values, the system monitors incidents of misbehavior in a fuzzing target vehicle caused by the injected input values using the side information obtained from the monitoring sensors installed on the target vehicle (e.g., IMU sensors). Next, we provide a detailed explanation of each phase.

B. PHASE 1: CAN MESSAGE ANALYSIS 1) BIT-FLIP RATE ANALYSIS
Two existing studies, [13] and [15], analyze the signal of each CAN ID data field based on the bit-flip rate. Our proposed Structure-aware CAN Fuzzing method also adopts the bitflip rate-based analysis approach. The bit-flip rate is the frequency of changes in each bit position (from 0 to 1, and vice versa) and can be calculated with the following Equation (1): • BFR c,k : The bit-flip rate for the k th position in the data field of an arbitrary CAN ID (c).
• n c : The number of CAN messages with CAN ID (c) in a set of CAN messages that has been logged over a predetermined time (e.g., 10 minutes). Otherwise, b c,k,i is 0. The boundaries of the signals are identified by calculating the bit-flip rate of the CAN data field. Then, the identified signal boundaries are mapped to seven signal types by using the bit-flip rate-based heuristic rules as follows.
• Unused field: A signal with bit values of 0 and a corresponding bit-flip rate of 0, and has a size of 4 or 8 bits.
• Constant field: A signal with fixed bit values of a specific value that is not 0 and a corresponding bit-flip rate of 0, and has a size of 4 or 8 bits.
• Multi-value field: A signal with bit values of only a few constant values and a corresponding bit-flip rate that converges to 0, and has a size of 2 or 4 bits.
• Sensor field: A signal whose bit-flip rates decrease from the Least Significant Bit (LSB) to the Most Significant Bit (MSB), and has a size of 4, 8, or 16 bits. In general, the sensor field representing a physical value has a high bit-flip rate on the LSB, as the bit value changes frequently, while the bit-flip rate on the MSB is smaller than on the LSB. Therefore, we can infer that the field where the bit-flip rate decreases from the MSB to the LSB is related to the physical value generated by the sensor in the vehicle. • Counter field: A signal with a 2, 4, or 8-bit value, and bit-flip rates decrease by 1/2 from the LSB to the MSB.
• Checksum field: A signal whose bit-flip rates of 2, 4, or 8-bit values are distributed between 0.3 and 0.6.
• Undefined field: Any signal that is not classified into the above six fields.

2) OBD-II PID ANALYSIS a: FINDING SUPPORTED PIDs
In this step, we adopt the existing ACA-proposed method [10], [11], to find supported PIDs provided by a target vehicle. The OBD-II PID diagnostic query is requested through the OBD-II port, and supported PIDs are checked using response query data. When PID request queries with a CAN ID of 0 × 7DF are transmitted to the target vehicle after setting the PID codes (e.g.,

b: CAN ID-PID MATCHING
In order to analyze the relationship between the CAN IDs and the PIDs of a test vehicle, we adopted the existing methods proposed in [10] and [11] for the following matching step process. First, the diagnostic query with a PID is periodically requested from a test vehicle, and the corresponding diagnostic response packets along with normal CAN packets are recorded during a pre-defined time called a window time. Then, by comparing the diagnostic response packets and data field of the normal CAN packet recorded during the window time, the frequencies of CAN IDs that have the same value as the diagnostic response results are counted. By repeatedly performing the above described processes, the CAN ID-PID matching step can identify the CAN ID with the highest number of counts in all time windows as being highly correlated with the PID used in the diagnostic queries.
For details on the CAN ID-PID matching algorithm, please refer to [10] and [11].

3) CORRELATION ANALYSIS
In order to identify correlating CAN IDs, we analyze the data field of each CAN ID. injections, a value in the form of 2 k is injected into the target field where the U.C.M. violation will be located. k is an integer that increases by 1 from 0 to l. (l is the bit length of the target field.) -Sensor violation: Sensor violations generate fuzzing input values that include an abnormal Sensor field that is greater than the maximum value or less than the minimum value of the Sensor field to observe the misbehavior causing abnormal Sensor signals; the maximum value and the minimum value of the Sensor field are obtained from the CAN log.
• Field violation: Field violations generate fuzzing input values that are set to the same value at each byte position of the data field to observe misbehavior caused by the abnormal structure of a CAN message that has not been inferred by bit-flip rates. For efficient field violation input value injection, a value in the form 2 k is injected into each byte of the CAN data field. k is an integer that increases by 1 from 0 to 7.

2) CAN FUZZING INPUT VALUE GENERATION a: SIGNAL VIOLATION INPUT
Before input values corresponding to a U.C.M. violation or a sensor violation are generated, the target CAN message VOLUME 10, 2022 should first be checked to see whether it has a Checksum field in the data field. If a Checksum field exists in the target CAN message, the checksum of the injected input value should be set. According to [4], if a CAN message is injected into the vehicle without consideration to the checksum, the corresponding CAN message may be ignored because of the checksum verification process. In addition, some vehicle manufacturers use private keys stored in an ECU when generating the checksum.
In the proposed method, there are two ways that the checksum value can be obtained. These processes are known as defined checksum and inferred checksum. If the input value already exists in the CAN log, the checksum does not need to be inferred. In this case, the checksum is set to a defined value in the fuzzing input value that is equal to the existing CAN log. Otherwise, to set the checksum for new input values without reverse engineering the ECU, the checksum value is inferred using a deep learning method. The proposed checksum inferring process is described below.
• Pre-processing the CAN log for checksum inference: After extracting all CAN IDs that have values inferred by the Checksum field in the data field, values of the Checksum field are set as the output of the checksum inferring process, and the remaining values are set as input. Then, the input is converted from hexadecimal to binary, and the output is encoded by a one-hot vector. For example, if the data field is m-bit and the Checksum field is n-bit, the input data, as binary data, consists of an array of size (m − n), and the output data consists of an array of size 2 n . The reason why the number of neurons in the output layers is 2 n is that the checksum value can be any value in a range of numbers that can be represented using the corresponding bit. Fig. 4 shows the process of encoding the CAN data field with a one-hot vector. If the Checksum field is 4 bits and the corresponding value is 8, a one-hot-vector with an array of size 2 4 sets 1 in the 8 th position of the array and the other positions of the array to 0.
• Designing a model for checksum inference: A checksum inference model is designed to adopt a deep neural network (DNN) [24]. For example, if the Checksum field is n-bit, the input layer consists of (m − n) nodes and accepts the input data that has been converted to binary data. The output layer consists of 2 n nodes and takes the output data encoded with a one-hot vector. The hidden layer consists of a fully connected layer, and the design of the DNN model is shown in Fig. 5. We use precision, recall, and F1-score to verify the performance of the model that infers checksum values. Experiments and evaluations of the model are described in Section IV in detail.

b: FIELD VIOLATION INPUT
In a field violation, input values are generated regardless of the Checksum field because the field violation ignores the identified signal boundaries.

3) CAN FUZZING INPUT VALUE INJECTION
In this section, CAN messages are generated according to the Structure-aware CAN Fuzzing rules, and they are injected into a fuzzing target vehicle. When CAN messages are injected, the transmitter node may enter bus-off mode, which means that the node is unable to participate in the CAN-bus within a pre-defined time (i.e., bus-off recovery time) due to message collisions resulting from injected CAN messages and CAN messages generated from the test vehicle. Owing to this, our fuzzing input injection process is designed to reset the CAN controller of the transmitter when it detects the busoff state and then performs the fuzzing process without any delay originating from bus-off recovery. In addition, if the fuzzing input injection process finds any fuzzing input that can trigger misbehavior in the target vehicle, this input is injected several times. We can then confirm through experiments that the target vehicle is responded to in the same way according to the fuzzing input, not the bus-off event.

D. PHASE 3: MONITORING 1) CAN MESSAGE MONITORING
In order to monitor misbehavior in a vehicle caused by CAN fuzzing input, we can monitor changes in the following values.
• CAN ID monitoring: Check for existence of invalid CAN IDs. • Correlation monitoring: Check for changes in the correlation between the Sensor field values over a certain threshold.

2) SIDE INFORMATION MONITORING
An IMU sensor, an inertial measurement device that integrates accelerometers and gyroscopes, is used to monitor misbehavior caused by CAN fuzzing. The accelerometer measures velocity changes, and the gyroscope measures angular velocity for three axes: pitch (x-axis), roll (y-axis), and yaw (z-axis).

IV. EXPERIMENT AND EVALUATION
In this section, we evaluate the proposed Structure-aware CAN Fuzzing system on two different vehicles to consider the environment of the functions provided in a range of vehicles; According to the year of production and the price of the vehicle, the functions provided in each vehicle are different. VehicleA is a 2014 Kia Soul, a subcompact crossover SUV, 7 and VehicleB is a 2018 Genesis EQ900, a premium luxury sedan. 8 To send and receive CAN messages, we used a CAN monitoring tool called RAD GALAXY as shown in Fig. 6. We also used the Raspberry normal driving, and driving with various vehicle functions operating). Fig. 7 shows the bit-flip rate analysis results and representative CAN ID signals in the two vehicles.

2) OBD-II PID ANALYSIS a: FINDING SUPPORTED PIDs
In this experiment, we can find supported PIDs using the OBD-II PID query. As shown in Table 1, VehicleA supports 37 PIDs and VehicleB supports 38 PIDs.

b: CAN ID-PID MATCHING
In this experiment, we identified supported PIDs related to the engine by checking the list of supported PIDs. With this information, we could identify which CAN IDs have a high correlation with engine-related PIDs. Table 2 shows the matching results.
We verified our results using the open resource file 9 and confirmed that the identified results are correct except for offset 7 of CAN ID 0 × 440 in VehicleA and offsets 5, 6 of CAN ID 0 × 368 and offsets 0, 1 of CAN ID 0 × 374 in VehicleB. We organized these findings in Table 2.

3) CORRELATION ANALYSIS
First, we calculated correlation using the Sensor fields of each CAN ID and used the Pearson correlation coefficient to measure the degree and strength of the positive or negative correlation. In this step, we set correlation thresholds  to 0.7 and −0.7 to indicate high correlation fields with the target Sensor field. If a correlation score of a Sensor field with the target Sensor field is greater than 0.7 or less than −0.7, we can infer that the Sensor field is highly correlated with the target Sensor field. Table 3 shows the results of our correlation analysis, in which the Sensor fields related to the engine were set to the target Sensor fields.

B. FUZZING INPUT VALUE GENERATOR 1) INPUT VALUE GENERATION AND CHECKSUM INFERENCE
In this section, we generated fuzzing input values for each CAN ID related to the engine using signal violation (i.e., U.C.M. violation and sensor violation) and field violation rules as defined in Section III.
While generating the fuzzing input values based on a signal violation, we must check whether or not a Checksum field exists. In our engine-related fuzzing experiments, we found that the Checksum field exists in the data field of CAN ID 0 × 80 of VehicleA and CAN ID 0 × 112 of VehicleB. Thus, checksum values of these fuzzing input values, which are related to CAN ID 0 × 80 of VehicleA and CAN ID 0 × 112 of VehicleB, were inferred using our checksum inference model. We confirmed that the identified checksum fields of the two CAN IDs are correct by comparing them with the open resource file 9 . The hyperparameters of our checksum inference model are listed in Table 4, and the input and output values for the checksum inference model become processed as follows. For example, since CAN ID 0 × 80 is expected to have a Checksum field of 4 bits, the input layer for the checksum inference model consists of 60 nodes and the output layer consists of 2 4 nodes. Likewise, since CAN ID 0 × 112 is expected to have a Checksum field of 2 bits, the input layer for the checksum inference model consists of 62 nodes and the output layer consists of 2 2 nodes. Then, in order to train and  evaluate the checksum inference model, we dedicated 70% of the CAN log for training, 20% for validation, and 10% for testing.
As shown in Fig. 8, our model achieves an accuracy rate over 98% and converges the loss to 0.01 in the training process for both CAN IDs 0 × 80 and 0 × 112. From Table 5, we can observe that the precision, recall, and F1-score for CAN IDs 0 × 80 and 0 × 112 are about 98% and 99%, respectively.

C. MONITORING RESULTS
After injecting CAN fuzzing input values, the corresponding results are monitored via CAN logs and side information.

1) CAN MESSAGE MONITORING
Since vehicle misbehavior can be reflected in CAN messages, these are monitored using the following rules: 1) CAN ID monitoring, 2) CAN DLC monitoring, 3) U.C.M. monitoring,  4) sensor monitoring, and 5) correlation monitoring. When monitoring CAN messages according to these five rules, we observed no noticeable changes.

2) SIDE INFORMATION MONITORING
In this step, we compared sensor values obtained from each vehicle's normal state without CAN fuzzing, which was done using sensor values obtained from each vehicle's fuzzing state with CAN fuzzing. By monitoring side information obtained from engine-related CAN fuzzing experiments, we were able to detect misbehavior caused by CAN fuzzing in the steering control in both vehicles, engine acceleration in VehicleA, and brake control in VehicleB. Fig. 10 shows CAN IDs that have abnormal side information. In the figure, blue, orange, and green indicate x, y, and z axes of the IMU sensor value, respectively. When comparing graph (a) Normal state of the Soul with (d) Fuzzing state of the Soul using a U.C.M. violation with CAN ID 0 × 370, the amplitude increases for IMU sensor values found in (d). This vibration checked by the IMU sensor means that the test vehicle's engine accelerated. Also, we confirmed that the RPM of the dashboard changed as shown in Fig. 9 (a) RPM instrument panel. In addition, when the CAN fuzzing input value for CAN ID 0 × 316 was injected, the RPM of the dashboard changed. When comparing graph (b) Normal state of the Soul with (e) Fuzzing state of the Soul using the field violation role for CAN ID 0 × 164, a periodic change of IMU sensor values was found in (e). This periodical change is caused by the vehicle's steering control that actuates steering by 5∼10 • to the right as shown in Fig. 9 (a) Movement of steering in the Soul. Likewise, the difference between graph (c) Normal state of the EQ900 and (f) Fuzzing state of the EQ900 using a U.C.M. violation with CAN ID 0 × 340 shows the steering control for VehicleB as shown in Fig. 9 (b) Movement of steering in the EQ900.
Furthermore, brake control was monitored as shown in Fig. 9 (b) AEB, in which the autonomous emergency braking (AEB) is actuated by CAN ID 0 × 48A. Although the brake control was not monitored via the IMU sensor in the vehicle's stop status, it can be monitored during the vehicle's driving status.

D. NUMBER OF CAN FUZZING INPUT VALUES
As shown in Table 6, we compared the number of fuzzing input values between existing brute force CAN fuzzing methods and our proposed Structure-aware CAN Fuzzing method. For a fair comparison, we used CAN IDs that we identified as being related to the engine in the test vehicles (e.g., CAN ID 0 × 80, 0xA0, 0 × 316, and 0 × 440 in VehicleA, and CAN ID 0 × 112, 0 × 316, 0 × 366, 0 × 368, 0 × 374, and 0 × 556 in VehicleB). The process of calculating the number of fuzzing input values for each CAN ID is as follows. • Number of field violations: After dividing the data field (64 bits) by 8 bits, each field is injected with units of 2 k , such that k increases by 1 from 0 to 7. In short, all eight fields have the same value according to k. Therefore, the number of fuzzing input values for engine-related CAN IDs in both VehicleA and VehicleB is 8. As shown in Table 6, the total number of fuzzing input values of our method for VehicleA is 140 and 226 for VehicleB, which are the summations of the number of fuzzing input values for engine-related CAN IDs. We confirmed that the total number of fuzzing input values in our method is less than 2 8 , whereas the number in existing brute force CAN fuzzing methods is greater than 2 66 . As a result, we confirmed that our method takes less fuzzing time than existing brute force CAN fuzzing methods.

V. LIMITATIONS AND DISCUSSION
The efficiency of our proposed Structure-aware CAN Fuzzing system was validated on a 2014 Kia Soul and a 2018 Genesis EQ900. However, this study still faces some limitations.
First, the proposed Structure-aware CAN Fuzzing system was validated on vehicles produced by the same manufacturer. Therefore, in order to extensively validate our method, we must evaluate our method using vehicles from other makers in future research. Additionally, if the manufacturer offers vehicles equipped with an intrusion detection system (IDS), new fuzzing methods should be studied because it might be difficult to inject fuzzing input values into the vehicle. Second, the proposed Structure-aware CAN Fuzzing system does not contain feedback-directed fuzzing that is generally included in software fuzzing. When generating fuzzing input values, there are cases where the vehicles do not respond even if the checksum values of the fuzzing input values are correctly inferred. We think that the root of these issues may VOLUME 10, 2022 lie in the contexts of the CAN-bus. For example, a correct counter value may need to be set if a CAN message has a counter field. Third, we use only the IMU sensor to obtain side information. However, it is possible to obtain additional side information from the vehicle's black box video data, sound sensor, and so on. By adding such additional sensors, we can monitor vehicular misbehavior more specifically. Last, we did not consider unusual situations (e.g., a collision event or a debug mode). However, our method generates unusual fuzzing input values by modifying the unused field and constant field values, which are known to contain a fixed value or some multi-values during normal driving conditions. Even if modifying the values included in these fields could solve unusual situations to some extent, more tests must be performed to cover unusual situations.
In future work, we will improve our Structure-aware CAN Fuzzing method by adopting feedback-directed fuzzing input generation, which enables us to control the vehicle more critically by adjusting the field, such as the counter field of the control message. In addition, a variety of sensors other than the IMU sensor will be adopted to monitor vehicular misbehavior, and more fuzzing tests will be conducted on vehicles made by various manufacturers.

VI. CONCLUSION
With the increase in the number of connected and autonomous vehicles (CAVs), cyber attacks on such vehicles are increasing. To deal with vehicular cyber attacks, ECU reverse engineering and CAN fuzzing have been studied to analyze ECU vulnerabilities. However, ECU reverse engineering needs physical access to a target ECU, and existing CAN fuzzing techniques require non-negligible analysis time to randomly transmit fuzzing input values to the CAN-bus. Additionally, monitoring responses corresponding to random fuzzing input values is notoriously difficult. To overcome the limitations of existing analysis methods, we proposed our Structure-aware CAN Fuzzing system, which considers the structure of CAN messages. Additionally, to our knowledge, this is the first method of its kind developed for the Structureaware CAN Fuzzing method. In comparison with brute force CAN fuzzing, the proposed Structure-aware CAN Fuzzing system can minimize fuzzing time and monitor the response results of CAN fuzzing. Our method was validated on real vehicles, a 2014 Kia Soul and a 2018 Genesis EQ900. In future work, we will apply our method to various real vehicles and validate efficiency.