Securing Autonomous Vehicles Against GPS Spoofing Attacks: A Deep Learning Approach

With the rapid advancement of technology and multimedia systems, ensuring security has become a critical concern. Connected and Autonomous Vehicles (CAVs) are vulnerable to various hacking techniques, including jamming and spoofing. Global Positioning System (GPS) location spoofing poses a significant threat to CAVs, compromising their security and potentially endangering pedestrians and drivers. To address this issue, this research proposes a novel methodology that uses deep learning (DL) algorithms, such as Convolutional Neural Networks (CNN), and machine learning (ML) algorithms, such as Support Vector Machine (SVM), to protect CAVs from GPS location spoofing attacks. The proposed solution is validated using real-time simulations in the CARLA simulator, and an extensive analysis of different learning algorithms is conducted across three distinct trajectories to identify the most suitable approach. The training and testing data include GPS coordinates, spoofed coordinates and localization algorithm values. The proposed machine learning algorithm achieved 99% and 96% accuracy in the best- and worst-case scenarios, respectively, while the deep learning algorithm achieved up to 99% in the best case and 82% in the worst case.


I. INTRODUCTION
Autopilot systems such as autonomous vehicles and drones are frequently used for surveillance, secure communication and packet delivery. Relying on GPS measurements aided by precise high-definition maps, autonomous vehicles choose the shortest and most optimized path from the starting point to the destination [1]. This is necessary for such vehicles to operate autonomously and correctly without any human intervention [2]. Thus, the reliability and secure operation of the GPS sensor are crucial factors for the wider acceptance of such vehicles. Under unforeseen conditions, the communication signals exchanged between autonomous vehicles and ground stations can be lost or corrupted by cyber-attacks such as spoofing or jamming [3]. A jamming attack refers to the complete blockage of GPS operation by transmitting a disruptive signal on the same frequency as the GPS signals [4]. In contrast, a spoofing attack deceives the user by transmitting signals with the same characteristics as legitimate GPS satellite signals [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Cheng Chin.
To resist such cyber-attacks, the autonomous vehicle architecture must be robust. Autonomous vehicles can be attacked in two forms: Denial of Service (DoS) attacks and integrity attacks [6]. False data injection and spoofing fall under integrity attacks, while black-hole and gray-hole attacks and jamming fall under DoS attacks. Global Navigation Satellite System (GNSS) spoofing involves manipulating signals to mislead receivers, potentially with dangerous consequences. Despite increased interest in GNSS spoofing, there is a lack of Commercial Off-The-Shelf (COTS) receivers capable of countering advanced attacks, and addressing this gap is crucial to ensure the security and reliability of GNSS systems [7]. GPS spoofers can be grouped into two major classes: refined receiver-based spoofers, and GPS signal simulator and receiver-based spoofers [6]. In the first category, the position and velocity of the victim receiver are assumed to be known precisely, and such spoofing is nearly impossible to detect using traditional anti-spoofing techniques. In the second category, a simulator generates GPS signals that are combined with radio signals to produce a counterfeit GPS signal [8].
Machine and deep learning are being used in GNSS for interference detection, signal classification, multipath detection and data quality assurance [8]. Various machine and deep learning algorithms have also been developed to detect GPS spoofing attacks, the most widely used being decision trees, support vector machines and neural networks [8]. Monitoring the cross-correlation of multiple GNSS measurements and observables can be used to detect potentially spoofed signals [9]. The stability and accuracy of GNSS absolute solutions for autonomous vehicles can be significantly improved using multi-layer recurrent neural networks combined with long short-term memory (LSTM) [10]. Deep learning methods can also predict vehicle position from multi-sensor data, including GNSS, without redesigning the analytical model of each individual sensor on the autonomous vehicle [11].
The key contributions of this paper are outlined as follows:
• CARLA¹ is used in this research to acquire real-time sensor values, specifically the yaw rate ($\dot{\varphi}$), steering angle ($\alpha$), wheel speed ($v$) and GPS receiver data. These sensor values serve as inputs for training and evaluating the GPS spoofing attack detection model in a realistic simulation environment.
• A novel sensor fusion method is developed to integrate data from diverse sensors, such as the yaw rate ($\dot{\varphi}$), steering angle ($\alpha$) and GPS. By incorporating multiple sensor modalities, this fusion approach improves the accuracy and reliability of GPS spoofing detection.
• A GPS location spoofing attack detection solution is proposed based on machine and deep learning algorithms. Leveraging the CARLA dataset, the detection system employs state-of-the-art techniques, including anomaly detection and pattern recognition, to differentiate between genuine and spoofed signals. The proposed solution is evaluated in terms of precision, recall, F1 score and accuracy across multiple scenarios using realistic data.
The rest of the paper is organized as follows. Section II reviews related work on GPS location spoofing attacks and detection techniques. Section III presents the proposed methodology, explaining the algorithm and framework for GPS spoofing detection. Section IV describes the experimental setup, including the use of the CARLA dataset, and presents the results and analysis of the machine learning and deep learning algorithms. Section V concludes the paper and suggests future research directions.
¹ https://carla.org/

II. RELATED WORK
The most common approaches to detecting GPS location spoofing attacks are signal processing and data-driven techniques. However, signal processing solutions require prior knowledge of the expected signal properties, which makes them vulnerable to attacks that exploit such assumptions, and they also require specialized equipment. Data-driven approaches employ machine or deep learning algorithms to detect patterns and anomalies in large datasets and require no specialized equipment.
The vulnerability of CAVs to GPS location spoofing attacks is explored in [1]. The authors propose a data-driven approach based on machine learning that detects these attacks using only normal location data for training. The solution is tested and evaluated using realistic data and achieves over 98% accuracy in detecting attacks.
The vulnerability of Unmanned Aerial Vehicles (UAVs) to GPS signal spoofing attacks is discussed in [3].The article proposed a machine learning-based solution using SVMs to detect counterfeit GPS signals.Experimental analyses demonstrated the effectiveness of the model in accurately identifying spoofed signals, surpassing existing techniques.The proposed solution achieved 96% accuracy in detecting GPS spoofing attacks.
The use of machine learning in GNSS applications is explored in [8].A systematic review of literature is presented, encompassing various applications of machine learning in GNSS, including signal acquisition, classification, prediction, and anomaly detection.The article also addresses challenges and potential future applications of machine learning in GNSS.Highlighted applications include earthquake warning systems, hurricane tracking, ice detection and thickness estimation, as well as soil moisture estimation.The conclusion drawn from the review is that machine learning has the potential to enhance the accuracy and reliability of GNSS applications, while also paving the way for further research and exploration of new possibilities in the field [16].
The work in [9] introduces a machine learning-based method for detecting potentially spoofed GNSS signals by monitoring the cross-correlation of multiple GNSS observables and measurements. The approach was validated on both synthetic and real-world spoofing datasets, and the results demonstrated that monitoring the cross-correlation among significant GNSS observables and measurements is effective for detecting spoofing signals. SVM classification was employed for spoofing detection, achieving an accuracy of 97.8%.
The open service (OS) signals of any GNSS core constellation are vulnerable to manipulation, presenting a significant risk for Safety-of-Life (SoL) applications. Two categories of data manipulation, namely spoofing and meaconing, are identified in [4]. Spoofing involves generating and transmitting manipulated false GNSS signals, while meaconing consists of recording and rebroadcasting authentic signals with a controlled delay. The threat of GNSS signal spoofing has escalated with advances in digital signal processing and in hardware implementations of Software Defined Radio (SDR) GNSS-spoofing transceivers. In response, the authors of [17] proposed a GNSS signal post-correlation method combined with machine learning algorithms to detect the presence of spoofing signals. Previous researchers had successfully employed SVM-based approaches, achieving success rates of 94 to 95%.
In [12], the vulnerability of UAVs to GPS spoofing attacks, in which attackers transmit counterfeit signals that imitate genuine GPS signals to manipulate UAV navigation and positioning, was discussed. The article proposed a novel GPS spoofing attack detection algorithm based on LSTM, which predicts the flight paths of UAVs and flags deviations from these paths as potential GPS spoofing attacks. The authors asserted that this algorithm outperformed existing detection methods in terms of efficiency and adaptability. When evaluated in a simulation environment, the algorithm detected GPS spoofing attacks with a detection ratio of 78% and required a computation time of 3 to 5 seconds.
A machine learning-based methodology for the automatic and accurate detection of amplitude ionospheric scintillation events, which induce fluctuations in satellite broadcast signals, is explored in [18]. The approach uses observables from common stand-alone GNSS receivers and achieved a detection accuracy of 98% without prefiltering or excluding low-elevation-angle data. It outperformed traditional scintillation detection techniques by reducing false alarms and missed detections. The authors also provided an overview of scintillation effects on GNSS signals and analyzed machine learning algorithms, models and metrics for performance evaluation. Decision trees were highlighted as robust, nonlinear learners that can avoid overfitting through pruning or ensembling, although individual decision trees can overfit if they memorize the training data through excessive branching.
The use of deep learning models to improve the modeling of multipath propagation effects on GNSS correlation outputs is discussed in [19]. A Deep Neural Network (DNN) structure was proposed as a substitute for standard correlation schemes to model multipath channels effectively. The proposed solution could be integrated seamlessly into the acquisition and tracking blocks of a receiver and showed promising time-delay tracking performance. A comparison of our proposed model with previous research is shown in Table 1.

III. PROPOSED METHODOLOGY
In this paper, we employ various machine learning and deep learning algorithms to build a mechanism that distinguishes authentic from spoofed GPS locations. Among the machine learning algorithms, SVM proved valuable for this task, although tuning the algorithm and selecting an appropriate kernel are critical. Deep learning algorithms, on the other hand, require large amounts of data, significant computational resources and extensive hyperparameter tuning, and they may not perform well on small datasets. The performance of both machine learning and deep learning algorithms can vary depending on the characteristics of the data.

A. DATA ACQUISITION AND SYSTEM MODEL
The proposed system model is shown in Figure 1. The proposed methodology involves acquiring data from a CAV equipped with a GPS receiver and a specialized device with Software Defined Radio (SDR) hardware and software. The CAV moves on a road network, and its true location $p_k$ and velocity $u_k$ are represented as

$$p_k = \begin{bmatrix} x_k \\ y_k \end{bmatrix}, \qquad u_k = \begin{bmatrix} x'_k \\ y'_k \end{bmatrix},$$

where $x_k$ and $y_k$ represent the x and y coordinates of the CAV's location at time $k$, respectively. Similarly, $x'_k$ and $y'_k$ represent the horizontal and vertical components of the CAV's velocity at time $k$, respectively. The GPS receiver processes satellite positioning signals to output the GPS location of the CAV as

$$p^G_k = \begin{bmatrix} x^G_k \\ y^G_k \end{bmatrix},$$

where $p^G_k$ is a two-dimensional vector that represents the CAV's GPS location at time $k$, with $x^G_k$ representing the CAV's latitude and $y^G_k$ its longitude. The underlying assumption in this work is that the attacker introduces a user-defined constant bias into the GPS location values. The GPS location of the CAV under attack is modeled as a Gaussian random variable:

$$p^G_k \sim \mathcal{N}\left(p_k + B_A,\ \Sigma^G_k\right), \qquad B_A = \begin{bmatrix} b_x \\ b_y \end{bmatrix}, \qquad \Sigma^G_k = \left(\sigma^G_k\right)^2 I_2,$$

where $B_A$ represents the attack vector, with $b_x$ and $b_y$ as the attack biases indicating the magnitude of the attack in meters.
In the attack-free scenario, $B_A = 0$. Here, $p^G_k$ is the GPS location of the CAV at time $k$, $p_k$ is the true location of the CAV at time $k$, $B_A$ is the attack vector, $\Sigma^G_k$ is the covariance matrix and $\sigma^G_k$ is the standard deviation of the GPS location. A specialized device equipped with SDR software and hardware is integrated into the CAV to monitor signals from the surrounding connected vehicles and the wireless network infrastructure. This device operates autonomously, without relying on GPS measurements. At the heart of its functionality lies the Localization Algorithm (LA), which is designed to estimate the precise location of the CAV from these signals; see [20] for more details on localization. Using characteristics of these signals such as signal strength, time of arrival or propagation patterns, the algorithm outputs the estimated location of the CAV,

$$p^L_k = \begin{bmatrix} x^L_k \\ y^L_k \end{bmatrix},$$

where $p^L_k$ is a two-dimensional vector representing the CAV's location estimated by the LA at time $k$, with $x^L_k$ and $y^L_k$ the CAV's localized x- and y-coordinates. The localization algorithm also uses information from on-board sensors, specifically the yaw rate ($\dot{\varphi}$), steering angle ($\alpha$) and wheel speed ($v$) measurements, to estimate the CAV's location. The LA measurements are modelled as a Gaussian random variable:

$$p^L_k \sim \mathcal{N}\left(p_k,\ \Sigma^L_k\right), \qquad \Sigma^L_k = \left(\sigma^L_k\right)^2 I_2,$$

where $\sigma^L_k$ is the standard deviation of the LA measurement and $\Sigma^L_k$ is its covariance matrix. Our proposed solution involves three major steps; see [2] for a detailed overview of such an algorithm. The first step is prediction, in which the on-board sensor readings, specifically the yaw rate ($\dot{\varphi}$), steering angle ($\alpha$) and wheel speed ($v$), are used to predict the CAV's location.

Algorithm 1:
Set the initial state of the vehicle: $x_{real}$, $y_{real}$
for each movement step $i$ from 1 to movementSteps do
  Generate random values for $v_k$, $\delta_k$, $\omega_k$
  Update the predicted state ($x_{pred}$, $y_{pred}$) using the vehicle dynamics equations
  Update the process covariance matrix based on the predicted state and noise covariance
  Obtain the current sensor measurements: $x_{measured}$, $y_{measured}$
  Generate random values for measurement noise: $n_x$, $n_y$
  Calculate the innovation (measurement residual): $\delta x = x_{measured} - x_{pred} + n_x$, $\delta y = y_{measured} - y_{pred} + n_y$
  Calculate the innovation covariance matrix
  Update the state estimate using the gain-weighted residuals $\delta x$ and $\delta y$
  Update the error covariance matrix
  Update the real state variables
end for

In the second step, the LA measurements are fused with the values obtained from the first step by means of Bayesian filtering, which outputs the refined location estimate $\hat{p}^L_{k+1}$. In the last step, the authentic and spoofed GPS location measurements from the CAV's GPS receiver, $p^G_{k+1}$ and $\dot{p}^G_{k+1}$ respectively, together with the refined estimate $\hat{p}^L_{k+1}$ from the second step, are used with their corresponding labels (0 for spoofed and 1 for authentic) to train and test our proposed machine and deep learning models for location spoofing detection. The entire process is represented in Algorithm 1.
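The three-step pipeline above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trajectory is a hypothetical dead-reckoning model driven by wheel speed and yaw rate (standing in for the unspecified vehicle dynamics equations), the Bayesian filter is a per-axis Kalman filter, all noise levels are invented, and the helper names (`sample_position`, `kalman_step`, `build_dataset`) are hypothetical. Labels follow the convention stated above (1 for authentic, 0 for spoofed), and the `bias` argument plays the role of $B_A$.

```python
import math
import random

# Hypothetical simulation constants (not taken from the paper).
DT = 0.1          # time step [s]
SIGMA_GPS = 2.0   # GPS noise standard deviation sigma^G [m]
SIGMA_LA = 1.0    # LA measurement noise standard deviation sigma^L [m]
Q_VAR = 0.05      # process-noise variance per axis [m^2]


def sample_position(p_true, bias, sigma, rng):
    """Draw a sample from N(p_true + bias, sigma^2 I); bias = (0, 0) means no attack."""
    return (rng.gauss(p_true[0] + bias[0], sigma),
            rng.gauss(p_true[1] + bias[1], sigma))


def kalman_step(x_est, p_var, v, heading, z):
    """Predict with speed/heading, then correct with the LA measurement z (per axis)."""
    velocity = (v * math.cos(heading), v * math.sin(heading))
    new_est, new_var = [], []
    for axis in range(2):
        x_pred = x_est[axis] + velocity[axis] * DT  # step 1: prediction
        p_pred = p_var[axis] + Q_VAR                 # predicted error variance
        residual = z[axis] - x_pred                  # innovation
        s = p_pred + SIGMA_LA ** 2                   # innovation variance
        gain = p_pred / s                            # Kalman gain
        new_est.append(x_pred + gain * residual)     # step 2: refined estimate
        new_var.append((1.0 - gain) * p_pred)        # updated error variance
    return tuple(new_est), tuple(new_var)


def build_dataset(steps, bias, seed=0):
    """Step 3: rows (gps_x, gps_y, la_x, la_y, label); label 1 = authentic, 0 = spoofed."""
    rng = random.Random(seed)
    p_true, heading = (0.0, 0.0), 0.0
    x_est, p_var = p_true, (1.0, 1.0)
    rows = []
    for _ in range(steps):
        v = 10.0 + rng.gauss(0.0, 0.5)        # wheel speed [m/s]
        heading += rng.gauss(0.0, 0.05) * DT  # yaw rate [rad/s] integrated over DT
        p_true = (p_true[0] + v * math.cos(heading) * DT,
                  p_true[1] + v * math.sin(heading) * DT)
        z_la = sample_position(p_true, (0.0, 0.0), SIGMA_LA, rng)
        x_est, p_var = kalman_step(x_est, p_var, v, heading, z_la)
        gps = sample_position(p_true, (0.0, 0.0), SIGMA_GPS, rng)
        spoofed = sample_position(p_true, bias, SIGMA_GPS, rng)
        rows.append((gps[0], gps[1], x_est[0], x_est[1], 1))
        rows.append((spoofed[0], spoofed[1], x_est[0], x_est[1], 0))
    return rows
```

For example, `build_dataset(1246, (9.0, 9.0))` would produce a labeled set of the same length as the first CARLA trajectory (counted in time steps), with each step contributing one authentic and one spoofed row that share the same refined LA estimate.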
We analyzed our machine and deep learning models using three distinct datasets obtained from the CARLA simulator, as shown in Figure 2. The first dataset consists of 1246 samples, the second of 2397 samples and the third of 5777 samples. Each dataset contains values of $p^G_{k+1}$, $\dot{p}^G_{k+1}$ and $\hat{p}^L_{k+1}$ and the corresponding class labels.

1) SUPPORT VECTOR MACHINE
SVM aims to find an optimal hyperplane that separates the features into different classes with maximum margin. In a 2D feature space this hyperplane is a line, and the training samples closest to it (the support vectors) determine the margin. The hyperplane can be expressed as:

$$w^T \phi(x) + b = 0.$$

To find the optimal hyperplane, we minimize

$$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$$

over the $N$ training samples, subject to the constraints

$$y_i\left(w^T \phi(x_i) + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0,$$

where $\|w\|$ is the Euclidean norm of the weight vector $w$, $C$ is the regularization parameter, $y_i \in \{-1, +1\}$ is the class label of sample $x_i$, $\phi(x_i)$ is the feature vector transformed by the kernel function, $w^T$ denotes the transpose of $w$, $b$ is the bias term and $\xi_i$ is the slack variable. Given a test sample $x$, we compute the transformed feature vector $\phi(x)$ and use the trained SVM classifier to predict

$$\hat{y} = \operatorname{sign}\left(w^T \phi(x) + b\right).$$

The sign function returns −1 for negative inputs and +1 for positive inputs, and $x_i$ denotes the feature vector of the $i$-th sample. SVM can also employ different kernels for classification, such as linear, polynomial, radial basis function (RBF) or sigmoid. In this work, all of these kernels are used in the SVM implementation, as depicted in Algorithm 2, and the results are reported in Section IV. The hyperparameters of our proposed SVM algorithm are listed in Table 2.
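For the linear kernel ($\phi(x) = x$), the soft-margin problem above is equivalent to minimizing $\frac{1}{2}\|w\|^2 + C\sum_i \max\left(0, 1 - y_i(w^T x_i + b)\right)$, because the optimal slack is exactly the hinge term. The following sketch solves that objective by stochastic subgradient descent in plain Python. It is an illustrative stand-in, not the solver used in this work; the learning rate, epoch count and toy features are hypothetical, and labels are mapped to $\{-1, +1\}$ as the formulation requires.

```python
import random


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM trained by stochastic subgradient descent.

    Minimises 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (w.x_i + b)),
    with labels y_i in {-1, +1}. The regulariser is split evenly over the
    n samples so that each update uses the subgradient of one term.
    """
    rng = random.Random(seed)
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:  # margin violated: hinge term is active
                w = [wj - lr * (wj / n - C * y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * C * y[i]
            else:                 # only the regulariser contributes
                w = [wj - lr * wj / n for wj in w]
    return w, b


def predict(w, b, x):
    """Decision rule y_hat = sign(w.x + b), mapping a zero score to +1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

A usage example on hypothetical two-dimensional features (for instance, GPS minus LA residuals), with authentic samples near the origin and spoofed samples shifted by about 9 m per axis:

```python
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(9, 1), rng.gauss(9, 1)] for _ in range(50)]
y = [-1] * 50 + [1] * 50          # -1 = authentic, +1 = spoofed (hypothetical mapping)
w, b = train_linear_svm(X, y)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X, y)) / len(X)
```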

2) CONVOLUTIONAL NEURAL NETWORK
By leveraging convolutional, pooling and fully connected layers, CNNs can extract intricate spatial features from $p^G_k$, enabling accurate discrimination between genuine and spoofed GPS signals. Given a labeled dataset $D$ comprising $x^G_k$, $y^G_k$, $\dot{x}^G_k$, $\dot{y}^G_k$, $\hat{x}^L_k$, $\hat{y}^L_k$ as input features and $y_i$ as the corresponding class label (0 for no spoofing and 1 for spoofing), CNNs aim to learn a discriminative mapping between the inputs and the class label. The output feature map of a convolutional layer is computed as:

$$F_{out} = F_{in} * W + b,$$

where $F_{in}$ is the input feature map, $W$ denotes the filter weights, $*$ is the convolution operation and $b$ is the bias term. The max pooling operation, which downsamples the feature maps, is represented as:

$$P_{out} = \max_{\text{pooling window}} P_{in},$$

where $P_{in}$ is the input feature map and $P_{out}$ is the output feature map after pooling. The softmax activation function, applied in the final layer, converts the logits into a probability distribution over the classes:

$$\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},$$

where $z_i$ is the logit (unnormalized log-probability) for class $i$ and $K$ is the total number of classes. The acquired data, consisting of $p^G_k$ and $\hat{p}^L_k$, are used to train machine and deep learning algorithms to detect GPS spoofing. The model aims to distinguish legitimate from spoofed GPS measurements by learning patterns and characteristics from the collected data, as depicted in Algorithm 3. The hyperparameters of our proposed CNN algorithm are listed in Table 2.
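The convolution, pooling and softmax operations described above can be illustrated with a minimal pure-Python sketch: a 1D "valid" convolution (implemented, as in most deep learning frameworks, as cross-correlation), non-overlapping max pooling and a numerically stable softmax, chained into a toy forward pass. The layer sizes, weights and the helper `tiny_cnn_forward` are hypothetical; the actual architecture and hyperparameters are those given in Algorithm 3 and Table 2.

```python
import math


def conv1d(f_in, kernel, bias):
    """Valid 1D convolution (cross-correlation): F_out[i] = sum_j F_in[i+j] * W[j] + b."""
    width = len(kernel)
    return [sum(f_in[i + j] * kernel[j] for j in range(width)) + bias
            for i in range(len(f_in) - width + 1)]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def max_pool1d(p_in, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(p_in[i:i + size]) for i in range(0, len(p_in) - size + 1, size)]


def softmax(logits):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tiny_cnn_forward(features, kernel, conv_bias, dense_weights, dense_bias):
    """conv -> ReLU -> max pool -> dense -> softmax over K classes (hypothetical sizes)."""
    pooled = max_pool1d(relu(conv1d(features, kernel, conv_bias)), 2)
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_weights, dense_bias)]
    return softmax(logits)
```

With six input features (for instance $x^G_k$, $y^G_k$, $\dot{x}^G_k$, $\dot{y}^G_k$, $\hat{x}^L_k$, $\hat{y}^L_k$) and a kernel of width 2, the convolution yields five values, pooling reduces them to two, and a dense layer with $K = 2$ rows produces the class probabilities.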
The training process involves feeding the $x^G_k$, $y^G_k$, $\dot{x}^G_k$, $\dot{y}^G_k$, $\hat{x}^L_k$, $\hat{y}^L_k$ values with their corresponding labels (i.e., 0 or 1) into the model. We evaluate the performance of our proposed model by analyzing the detection results using a confusion matrix, which categorizes the results into four categories: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).
To assess the effectiveness of our attack detection solution, we rely on several metrics derived from the confusion matrix, including Precision (P), Recall (R) and F1 Score. P is the proportion of correctly identified attacks among all detected attacks, i.e., TP/(TP + FP), so it penalizes false positives. R is the proportion of correctly identified attacks among all true attacks, i.e., TP/(TP + FN), so it accounts for missed detections (false negatives). The F1 Score is the harmonic mean of P and R, 2PR/(P + R), and therefore balances the two. It summarizes the ability of our attack detection solution to correctly identify attacks while minimizing both FP and FN.
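These metrics can be computed directly from the four confusion-matrix counts. The sketch below treats spoofed signals as the positive class, consistent with the definitions of P and R above; the function name `detection_metrics` is hypothetical. As a usage example it takes the CNN counts reported for Table 8 (836 spoofed and 879 authentic signals correctly classified, 14 spoofed and 4 authentic signals misclassified).

```python
def detection_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"A": accuracy, "P": precision, "R": recall, "F1": f1}


# CNN counts from Table 8, with spoofed signals as the positive class.
cnn = detection_metrics(tp=836, tn=879, fp=4, fn=14)
for name, value in cnn.items():
    print(f"{name}: {value:.4f}")
```

For these counts the sketch gives A ≈ 0.9896, P ≈ 0.9952, R ≈ 0.9835 and F1 ≈ 0.9893; the high precision reflects that only 4 authentic signals were flagged as spoofed.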

IV. EXPERIMENTAL RESULTS
This section presents a comprehensive evaluation of our machine and deep learning models for each combination of bias value and dataset. The CARLA simulator environment used for dataset generation is presented in Figure 2. The dataset obtained from the CARLA simulator comprises time, compass, accelerometer readings (x, y, z), gyroscope readings (x, y, z), geolocation coordinates (x, y, z), GNSS latitude, GNSS longitude, GNSS altitude, control gear, control brake, yaw rate ($\dot{\varphi}$), steering angle ($\alpha$) and wheel speed ($v$). Figure 3 … $p^G_k$ of datasets 1, 2 and 3, respectively. This figure highlights the impact of GPS spoofing, where attackers manipulate GPS signals to deceive the localization algorithm and create a false trajectory. In Figure 6, the box plots show the distribution and statistical summary of the three datasets. Each box represents the interquartile range (IQR), encompassing the middle fifty percent of the data, with the median line marking its center. These box plots allow a visual comparison of the data distributions and reveal similarities or differences among the three datasets. The box plot analysis of dataset 1 reveals that the spoofing attack is more pronounced between $x^G_k$ and $\dot{x}^G_k$ than between $y^G_k$ and $\dot{y}^G_k$, indicating that the spoofing attack has a stronger impact on the x-axis GPS coordinate. In contrast, the difference between $x^G_k$ and $\hat{x}^L_k$ is significantly smaller, suggesting that the localization algorithm is accurate, since the estimated location $\hat{x}^L_k$ is close to the actual GPS location $x^G_k$.
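The box-plot quantities described above (median, quartiles, IQR) can be reproduced with Python's standard `statistics` module. The sketch assumes the common 1.5 × IQR whisker rule used by many plotting libraries; whether Figure 6 uses this rule is not stated, and the function name `box_summary` is hypothetical.

```python
import statistics


def box_summary(values):
    """Median, quartiles, IQR and 1.5*IQR outliers, as drawn in a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}
```

Applying `box_summary` separately to the $x^G_k$ and $\dot{x}^G_k$ columns of a dataset, and comparing their medians and IQRs, quantifies the shift that the box plots show visually; the same comparison between $x^G_k$ and $\hat{x}^L_k$ checks how closely the LA estimate tracks the GPS location.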
The experimental results demonstrate the effectiveness of machine and deep learning algorithms in accurately distinguishing between genuine and spoofed GPS locations.The evaluation metrics, such as Accuracy (A), P, R and F1-score, provide quantitative insights into the performance of the models.

A. MACHINE LEARNING MODEL
The proposed methodology uses K-fold cross-validation to evaluate the performance of the model reliably. In each iteration $i$, the model is trained on the training set $D_i^{train}$ and evaluated on the test set $D_i^{test}$, and the performance metrics A, P, R and F1 score are computed from the model's predictions on $D_i^{test}$. The K-fold cross-validation process is repeated multiple times, varying the subsets used for training and testing, to mitigate the impact of random variations in the dataset splits. Applying K-fold cross-validation and computing these metrics thus yields a robust evaluation of the proposed model's ability to identify the nature of the signal (spoofed or authentic). Figure 7 shows the results of the K-fold experiments when 10% of the dataset is chosen as the training fold in each iteration, and Figure 8 shows the corresponding results for a 5% training fold. The variations in A observed across folds are attributed to potential over-fitting or under-fitting of the model. To mitigate this issue, K was set to 20 for our GPS location spoofing attack detection. This choice ensures that the data is sufficiently diversified and reduces the risk of over-fitting or under-fitting, thereby enhancing the reliability of the model's performance evaluation.
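The fold bookkeeping behind K-fold cross-validation can be sketched as follows. With $K$ folds, each fold holds about $1/K$ of the samples, so $K = 10$ and $K = 20$ correspond to the 10% and 5% folds discussed above. The generator name `k_fold_indices` and the shuffling seed are hypothetical; a library splitter would serve the same purpose.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once.

    Fold sizes differ by at most one, so each fold holds about 1/k of the data
    (k = 10 -> ~10 %, k = 20 -> ~5 %).
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

For the first dataset (1246 samples) with $K = 20$, this produces 20 folds of 62 or 63 samples each; averaging the per-fold A, P, R and F1 values gives the cross-validated estimates.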
Figure 9 depicts the learning curve of the SVM model, which relates the training set size to the model's training and validation accuracy or loss. The learning curve shows a consistent trend: starting from an initial accuracy of 0.980, it increases steadily and converges towards a near-perfect accuracy of 1. This indicates that, as the model is exposed to additional training examples, it learns from the data and refines its predictions. The upward trajectory shows that the model captures the underlying patterns in the training, testing and cross-validation datasets and generalizes well, demonstrating its potential for accurate GPS location spoofing attack detection. The training and testing curves both converge at approximately 950 training examples.
Among the different SVM kernels, the linear kernel has the lowest computational time and the sigmoid kernel the highest, making the sigmoid kernel the least efficient option; the polynomial kernel falls in between. For optimal computational efficiency, the linear kernel is recommended for GPS location spoofing attack detection.
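For reference, the four kernels compared here are commonly defined as below, following the parameterization used by scikit-learn's `SVC` (hyperparameters `gamma`, `coef0` and `degree`; the values used in this work are those in Table 2, and the defaults below are arbitrary). The sketch also shows one general reason a linear SVM is cheap at prediction time: a kernel SVM evaluates $K(x_i, x)$ once per support vector, whereas for the linear kernel the support vectors can be folded into a single weight vector $w$, so prediction costs one dot product. The helper names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def sigmoid_kernel(a, b, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(a, b) + coef0)


def kernel_decision(support_vectors, dual_coefs, bias, x, kernel):
    """f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b: one kernel call per support vector."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias
```

For the linear kernel, the same decision value is obtained by first forming $w = \sum_i c_i x_i$ from the support vectors and dual coefficients and then evaluating $w^T x + b$ once, independent of the number of support vectors.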
The analysis of the accuracy achieved by the SVM kernel when applied to three distinct trajectories is shown in Table 3. Linear kernel achieves accuracy values ranging from 0.96 to 1 across the different cases.The highest accuracy is observed in case 6 and case 9 which is 0.99.The accuracy of RBF kernel ranges from 0.85 to 0.98.The highest accuracy is observed in case 9 i.e. 0.93, while the lowest is in case 7 i.e. 0.85.For polynomial kernel, the accuracy ranges from 0.78 to 0.95.Case 9 has the highest accuracy of 0.95, and case 4 has the lowest accuracy of 0.78.In case of sigmoid kernel, the accuracy values vary from 0.46 to 0.68.The highest accuracy is observed in case 9 which is 0.68, while the lowest is in case 4 i.e. 0.46.
The results of all SVM kernels are presented in Table 4. The comparative analysis reveals that the linear kernel outperforms the other kernel functions in terms of P, R, F1 score and generalization, making it the most suitable choice for GPS location spoofing attack detection. The RBF kernel demonstrates competitive performance but falls slightly behind the linear kernel in accuracy and computational efficiency. The polynomial kernel is effective at handling nonlinear relationships and intricate patterns, particularly in datasets with polynomial characteristics, but it requires careful hyperparameter tuning and is computationally demanding for large datasets. The sigmoid kernel shows moderate performance: it can handle certain non-linearities but struggles with complex, high-dimensional datasets, and it is sensitive to its parameters, so careful tuning is necessary for optimal results. Table 5 shows the confusion matrix of the SVM algorithm with different kernels. The confusion matrix provides a detailed breakdown of the model's predictions, showing the TP (correctly classified spoofed signals), TN (correctly classified authentic signals), FP (misclassified authentic signals) and FN (misclassified spoofed signals) values. The SVM linear kernel achieves near-perfect accuracy, correctly classifying almost all authentic and spoofed signals in the dataset, and has the highest number of TP and TN. The SVM RBF and polynomial kernels also show high accuracy, with a majority of authentic and spoofed signals correctly classified; however, they have slightly more FN and FP than the linear kernel. The SVM sigmoid kernel shows lower accuracy than the other kernels, with more misclassifications of both authentic and spoofed signals. Based on these observations, the SVM linear kernel performs best among the evaluated kernels, achieving the highest accuracy and the lowest misclassification rate for authentic and spoofed GPS locations.

B. DEEP LEARNING ALGORITHMS
The experimental results when 10% and 5% of the dataset are chosen as the training fold in each iteration are shown in Figures 11 and 12, respectively. They show that using a 5% training fold in each iteration gives the GPS location spoofing attack detection model higher accuracy than using a 10% training fold. This finding suggests reduced over-fitting, indicating that the model generalizes better to unseen data. Additionally, a smaller training fold enables a better balance between bias and variance, resulting in improved accuracy. These results highlight the importance of choosing an appropriate training fold size to mitigate over-fitting and achieve optimal performance in GPS location spoofing attack detection.
The impact of increasing epochs on the accuracy of CNN is shown in Figure 13.By systematically increasing the number of epochs during training, the analysis aims to uncover any patterns or trends in the model's performance metrics.The results shed light on the relationship between epoch count and metrics such as accuracy, loss, and convergence rate, providing insights into the optimal number of epochs for achieving optimal model performance.The findings contribute to the understanding of the training dynamics and help in fine-tuning the training process to maximize the deep learning model's predictive capabilities.
As the number of epochs increases, evaluation metrics such as A, P, R and F1 score exhibit the trends shown in Figure 14. Initially, we observed improvements in the model's evaluation metrics as the number of epochs increased, indicating that the model learns and refines its predictions by iteratively adjusting its weights and biases during training. However, computational time also increased with the number of epochs, which is undesirable for our GPS location spoofing attack detection problem. Figure 15 focuses on the impact of increasing epochs on both the computational time and the accuracy of the CNN algorithm. As the number of epochs increases, the computational time required to train the model also increases because of the additional forward and backward passes through the neural network. Therefore, a trade-off between computational time and accuracy must be considered when determining the number of epochs for a deep learning model, so that the model achieves satisfactory accuracy without a significant increase in computational time. Model evaluation and monitoring techniques such as early stopping can help identify the point of optimal performance, mitigating the risk of over-fitting and unnecessary computational burden. For our GPS location spoofing attack detection model, we trained for 20 epochs to achieve a reasonable level of accuracy while managing computational time effectively. The reported values indicate each algorithm's processing time in seconds. As observed, the CNN model consistently exhibits the shortest computational time, followed by the LSTM and DNN models.
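Early stopping, mentioned above as a way to avoid unnecessary epochs, can be sketched as a patience rule on the validation loss: training stops once the loss has failed to improve for `patience` consecutive epochs, and the weights from the best epoch are kept. Frameworks such as Keras provide this behaviour through an `EarlyStopping` callback; the function below is a hypothetical stand-alone version, and the loss values in the usage example are invented for illustration.

```python
def early_stopping(val_losses, patience, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a patience-based early-stopping rule.

    Training stops at the first epoch where the validation loss has not improved
    by more than min_delta for `patience` consecutive epochs; epochs are 0-based.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # new best: reset patience
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch                  # keep weights from best_epoch
    return best_epoch, len(val_losses) - 1                # epoch budget exhausted


# Hypothetical validation losses for a 20-epoch budget.
losses = [0.62, 0.41, 0.30, 0.24, 0.21, 0.20, 0.205, 0.203, 0.21, 0.22] + [0.22] * 10
best, stop = early_stopping(losses, patience=3)
print(best, stop)  # -> 5 8
```

Here training would halt after epoch 8 instead of running all 20 epochs, while restoring the weights from epoch 5, which illustrates how the accuracy/computational-time trade-off can be managed automatically.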
Table 6 presents the accuracy obtained by the deep learning algorithms on the three trajectories, providing a comparative analysis of their performance. As Table 6 shows, increasing the bias value leads to noticeable improvements in A, P, R, and F1 for each trajectory. This improvement is attributed to the clear distinguishability of authentic and spoofed GPS locations at a bias of 9 m. As the bias decreases, authentic and spoofed GPS locations lie closer together, making them harder to differentiate. Table 7 evaluates the proposed work with three deep learning models, CNN, LSTM, and DNN, reporting A, P, R, and F1 for each model across multiple cases. The CNN model consistently outperforms the others, achieving high P, R, and F1; in Case 2, for example, it reaches a P, R, and F1 of 0.94, 0.97, and 0.96, respectively. The LSTM and DNN models are nevertheless competitive in specific cases. In Case 4, the LSTM model achieves a P, R, and F1 of 0.78, 0.81, and 0.78, respectively, while in Case 9 the DNN model achieves 0.96 for all three metrics, indicating its effectiveness in spoofing detection. The performance of all models varies across cases because the data characteristics and patterns differ, which affects detection accuracy. Overall, the results underscore the CNN model's efficacy in achieving higher accuracy and robustness in detecting GPS location spoofing attacks, while the LSTM and DNN models show promising performance in specific scenarios.
Table 8 presents the confusion matrices of the deep learning models CNN, LSTM, and DNN. Each confusion matrix summarizes a model's performance through its true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), revealing both the accuracy and the misclassification patterns across the two classes. For the CNN algorithm, 879 test samples were correctly classified as authentic signals, while only 4 authentic signals were mistakenly classified as spoofed. Similarly, 836 spoofed signals were correctly identified, with 14 misclassified as authentic.
For the LSTM algorithm, 864 authentic signals were correctly predicted and 19 were misclassified as spoofed, while 806 spoofed signals were correctly identified and 44 were misclassified as authentic. For the DNN algorithm, 846 authentic signals were correctly predicted and 37 were misclassified as spoofed, while 812 spoofed signals were correctly identified and 38 were misclassified as authentic. Based on these observations, the CNN performs best of the three models, achieving the highest accuracy and the lowest misclassification rate for both authentic and spoofed GPS locations.
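The metrics discussed above follow directly from the confusion-matrix counts. The sketch below recomputes A, P, R, and F1 from the counts quoted in the text for Table 8, assuming the spoofed class (label 1) is the positive class and that P, R, and F1 are computed for that class; the paper may instead average over both classes, so the values are illustrative. `ConfusionCounts` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts, treating 'spoofed' (label 1) as positive."""
    tp: int  # spoofed samples predicted as spoofed
    fn: int  # spoofed samples predicted as authentic
    tn: int  # authentic samples predicted as authentic
    fp: int  # authentic samples predicted as spoofed

    def metrics(self):
        total = self.tp + self.fn + self.tn + self.fp
        accuracy = (self.tp + self.tn) / total
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        f1 = 2 * precision * recall / (precision + recall)
        return {"A": accuracy, "P": precision, "R": recall, "F1": f1,
                "misclassified": self.fn + self.fp}


# Counts as quoted in the text for Table 8.
MODELS = {
    "CNN": ConfusionCounts(tp=836, fn=14, tn=879, fp=4),
    "LSTM": ConfusionCounts(tp=806, fn=44, tn=864, fp=19),
    "DNN": ConfusionCounts(tp=812, fn=38, tn=846, fp=37),
}

for name, counts in MODELS.items():
    m = counts.metrics()
    print(f"{name}: A={m['A']:.3f} P={m['P']:.3f} R={m['R']:.3f} "
          f"F1={m['F1']:.3f} misclassified={m['misclassified']}")
```

Under these assumptions the counts give accuracies of about 0.990 (CNN), 0.964 (LSTM), and 0.957 (DNN), with 18, 63, and 75 misclassified samples out of 1733, which agrees with the ranking stated above.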

C. COMPARATIVE ANALYSIS
Table 9 compares our proposed work with existing studies. In [2], the authors used bias measurements of 5, 9, and 12 m, whereas we focused on bias measurements of 3, 5, and 9 m. Despite these lower biases, which make spoofed locations harder to distinguish from authentic ones, our approach achieved higher accuracy: the algorithm proposed in [2] attained 97.57% accuracy in the best case, while our approach achieved 99% with both the machine learning and the deep learning algorithms. A similar comparison with the methodologies proposed in [3], [9], [12], [13], [14], and [15] is also included in Table 9.

V. CONCLUSION
In conclusion, this research addresses critical security concerns associated with CAVs by proposing a novel methodology that uses DL and ML algorithms, specifically CNN and SVM, to protect CAVs from GPS location spoofing attacks. The proposed solution was validated through extensive experimentation and analysis using real-time simulations in the CARLA simulator. The performance evaluation covered different learning algorithms applied to three distinct trajectories and considered A, P, R, F1, and computational cost. The results strongly indicate that the proposed approach is effective in mitigating the risks of GPS location spoofing attacks on CAVs, and that it has great potential to strengthen the security of CAVs and reduce hazards to pedestrians and drivers. This work contributes a comprehensive evaluation and comparison of various learning algorithms for GPS location spoofing detection. The findings show that the proposed methodology outperforms the existing solutions it was compared against, emphasizing the importance of incorporating advanced technologies to safeguard the integrity and security of CAVs. Future research could explore additional ML and DL techniques, as well as real-time implementation and testing on physical CAVs. Continued efforts in this field will play a crucial role in bolstering the security of CAVs and ensuring the safe and reliable deployment of autonomous transportation systems.

FIGURE 1. GPS location spoofing attack scenario on CAVs.
illustrates the trajectories along with the GPS noise measurements, providing a visual representation of the vehicle's movement patterns. The figure shows the values of $p^G_k$, highlighting the variability and noise inherent in the measurements. Figure 4 shows the values of $p^L_k$ along with the values of $p^G_k$ for datasets 1, 2, and 3, respectively. Figure 5 plots the values of $p^G_k$ along with
The datasets $D_A$, $D_B$, and $D_C$ comprise the values of $x^G_k$, $y^G_k$, $\dot{x}^G_k$, $\dot{y}^G_k$, $\hat{x}^L_k$, and $\hat{y}^L_k$ together with their labels (0 or 1), and contain 1246, 2397, and 5777 samples, respectively. We split each of $D_A$, $D_B$, and $D_C$ into $K = 20$ non-overlapping subsets $D_1, D_2, \ldots, D_{20}$. In each iteration of K-fold cross-validation, one subset is selected as the test set and the remaining $K - 1$ subsets form the training set. The index of the current iteration is denoted $i$, where $1 \le i \le K$; the training set for iteration $i$ is $D_i^{\text{train}}$ and the corresponding test set is $D_i^{\text{test}}$.
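The K-fold procedure described above can be sketched with the standard library alone. The function below is a hypothetical helper, not the authors' code. It assumes contiguous folds (optionally after a seeded shuffle, since the text does not say whether samples were shuffled) and gives the first `n_samples % k` folds one extra sample, which is the same sizing rule documented for scikit-learn's `KFold`.

```python
import random


def k_fold_indices(n_samples, k=20, shuffle=False, seed=0):
    """Yield (i, train_idx, test_idx) for K-fold cross-validation, i = 1..k.

    The indices are split into k non-overlapping folds whose sizes differ by
    at most one. Fold i is the test set D_i^test; the union of the other
    k - 1 folds is the training set D_i^train.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # reproducible shuffle
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)  # first `extra` folds get one more sample
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield i + 1, train_idx, folds[i]


# Example with the size of D_A (1246 samples) and K = 20.
sizes = [len(test) for _, _, test in k_fold_indices(1246, k=20)]
print(sizes[:7])  # [63, 63, 63, 63, 63, 63, 62]
```

For $D_A$ this yields six test folds of 63 samples and fourteen of 62, so each $D_i^{\text{test}}$ holds roughly 5% of the data. Leaving `shuffle=False` keeps consecutive trajectory samples in the same fold; whether that matches the original setup is not stated in the text.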

FIGURE 7. K-fold experimentation when 10% of the dataset is chosen in each iteration as the training fold.

FIGURE 8. K-fold experimentation when 5% of the dataset is chosen in each iteration as the training fold.

Figure 10 compares the computational time of the SVM kernels. The linear kernel is the most efficient and fastest, while the RBF (Radial Basis Function) kernel requires more computational time. The polynomial kernel falls in between, and the sigmoid kernel has the highest computational time, making it the least efficient option. For computational efficiency, the linear kernel is therefore recommended for GPS location spoofing attack detection. Table 3 shows the accuracy achieved by each SVM kernel on the three distinct trajectories. The linear kernel achieves accuracy values from 0.96 to 0.99 across the different cases, with the highest accuracy, 0.99, observed in Cases 6 and 9. The accuracy of the RBF kernel ranges from 0.85 to 0.98; the highest accuracy is observed in Case 9, i.e., 0.93, and the lowest in Case 7, i.e., 0.85. For the polynomial kernel, the accuracy ranges from 0.78 to 0.95, with the highest value of 0.95 in Case 9 and the lowest of 0.78 in Case 4. For the sigmoid kernel, the accuracy varies from 0.46 to 0.68, with the highest value of 0.68 in Case 9 and the lowest of 0.46 in Case 4. The results for all SVM kernels are reported in Table 4. The comparative analysis of the SVM models reveals that the linear kernel outperforms the other kernel functions in terms of P, R, F1 score, and generalization, making it the most suitable choice for GPS location spoofing attack detection.

FIGURE 11.K-fold experimentation when 10% of the dataset is chosen in each iteration as the training fold.

FIGURE 12. K-fold experimentation when 5% of the dataset is chosen in each iteration as the training fold.

FIGURE 13.The impact of increasing epochs on CNN.

FIGURE 14.The impact of increasing epochs on evaluation metrics.

FIGURE 15.The impact of increasing epochs on computational time and accuracy.

Figure 16 illustrates the learning curve of the CNN model. It demonstrates the relationship between the training set size

FIGURE 17. Computational time of deep learning models.

Figure 17 represents the computational time of the CNN, LSTM, and DNN algorithms, displaying the time measurement for each model.

TABLE 1. Comparison of the related work.
Algorithm 2 GPS Location Spoofing Attack Detection Using SVM
Input: $x^G_k$, $y^G_k$, $\dot{x}^G_k$, $\dot{y}^G_k$, $\hat{x}^L_k$, $\hat{y}^L_k$
Output: accuracy, precision, recall, F1 score, training time, prediction time, learning curve data
Split data into training and testing sets: $X_{\text{train}}$, $X_{\text{test}}$, $Y_{\text{train}}$, $Y_{\text{test}}$;
Define hyperparameters for tuning: $C$, kernel, $\gamma$;
Perform grid search to find the best hyperparameters using $X_{\text{train}}$ and $Y_{\text{train}}$;
Obtain best hyperparameters: $C_{\text{best}}$, $\text{kernel}_{\text{best}}$, …
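The grid-search step of Algorithm 2 amounts to an exhaustive loop over every combination of $C$, kernel, and $\gamma$. The sketch below is a hypothetical, library-free version: `evaluate` stands in for whatever validation score is used (for example, mean cross-validated accuracy on $X_{\text{train}}$ and $Y_{\text{train}}$), and the candidate values in `SVM_GRID` are placeholders rather than the values in Table 2. In practice, scikit-learn's `GridSearchCV` performs the same exhaustive search with built-in cross-validation.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Exhaustive grid search; returns (best_params, best_score).

    `grid` maps each hyperparameter name to its candidate values, and
    `evaluate(params)` returns a validation score where higher is better.
    Ties keep the first combination encountered.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space over the hyperparameters named in Algorithm 2.
SVM_GRID = {
    "C": [0.1, 1.0, 10.0],
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": [0.01, 0.1, 1.0],
}


def toy_score(params):
    # Stand-in scorer for illustration only; a real run would train an SVM
    # with `params` and return its cross-validated accuracy.
    return (params["kernel"] == "linear") + (params["C"] == 1.0) - params["gamma"]


best_params, best_score = grid_search(toy_score, SVM_GRID)
print(best_params)
```

The grid above contains 3 × 4 × 3 = 36 combinations; note that $\gamma$ has no effect on the linear kernel, so a real search could skip those redundant evaluations.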

TABLE 2. Hyperparameters of ML and DL algorithms.
and $y_i$ is the label, which is 0 for an authentic signal and 1 for a spoofed signal. It can also

Algorithm 3 GPS Location Spoofing Attack Detection Using CNN
Input: $x^G_k$, $y^G_k$, $\dot{x}^G_k$, $\dot{y}^G_k$, $\hat{x}^L_k$, $\hat{y}^L_k$
Output: Accuracy, precision, recall, F1 score, training time, prediction time, learning curve data
Split data into training and testing sets: $X_{\text{train}}$, $X_{\text{test}}$, $Y_{\text{train}}$, $Y_{\text{test}}$;

TABLE 3. Accuracy pertaining to SVM kernels for three different trajectories.

TABLE 4. Analysis of proposed work corresponding to SVM with different kernels.

TABLE 5. Confusion matrix of SVM algorithm.

TABLE 6. Accuracy pertaining to deep learning algorithms for three different trajectories.

TABLE 7. Analysis of proposed work corresponding to different deep learning models.

TABLE 8. Confusion matrix of DL algorithms.

TABLE 9. Comparison of proposed work with existing ones.