SCER Spoofing Attacks on the Galileo Open Service and Machine Learning Techniques for End-User Protection

Spoofing attacks pose a clear cybersecurity risk for all systems relying on Global Navigation Satellite Systems (GNSS) for time synchronization or positioning. Secure Code Estimation and Replay (SCER) attacks are the most challenging type of spoofing attack, as they may be problematic even for future GNSS protection systems such as Navigation Message Authentication (NMA) or Spreading Code Authentication (SCA). This is one of the reasons why complementary protection techniques, like the one proposed in this work, are necessary. In the first part of the paper, SCER spoofing attacks are analyzed in detail for GPS and, particularly, for Galileo. The role of the Galileo Pseudorandom Noise (PRN) intra-satellite non-orthogonality distortion term in hindering the attacks is discussed, and a detailed comparison between the expected GPS and Galileo quality curves for the SCER attack is provided. A complementary detection method against SCER attacks is then proposed for end-user receivers (assuming NMA is used). It applies machine learning to a proposed set of features extracted from the receiver search space, and assumes that the attacker was not able to null the satellite signal.


I. INTRODUCTION
A cryptographic protection system for the Galileo Open Service Navigation Message of the E1B signal is currently under development, based on the TESLA (Timed Efficient Stream Loss-tolerant Authentication) protocol. It is expected to be available by 2020 [1]. The TESLA protocol is a symmetric cryptographic system that provides some level of asymmetry by means of a delayed provision of keys [2].
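The delayed provision of keys can be illustrated with a minimal sketch of a generic TESLA-style one-way key chain. This is not the OS-NMA specification: the use of SHA-256, untruncated HMAC tags and the function names `build_key_chain`, `tag` and `verify` are assumptions made only for illustration.

```python
import hashlib
import hmac
from typing import List


def build_key_chain(seed: bytes, length: int) -> List[bytes]:
    """Return [K_0, K_1, ..., K_length], where K_i = SHA-256(K_{i+1})."""
    chain = [seed]                      # start from the last key, K_length
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()                     # chain[0] is the public root K_0
    return chain


def tag(key: bytes, message: bytes) -> bytes:
    """MAC broadcast with the message while the key is still secret."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(root: bytes, index: int, disclosed_key: bytes,
           message: bytes, received_tag: bytes) -> bool:
    """Check a late-disclosed key against the root, then check the MAC."""
    k = disclosed_key
    for _ in range(index):              # hash K_index back down to K_0
        k = hashlib.sha256(k).digest()
    if not hmac.compare_digest(k, root):
        return False
    return hmac.compare_digest(tag(disclosed_key, message), received_tag)
```

The receiver only needs the root key to be trusted in advance. Because each key is disclosed after the data it protects, an attacker cannot forge a valid tag at broadcast time, which is what forces the attacker to estimate the unpredictable symbols instead.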
The Galileo Open Service signature solution for E1B, known as the Open Service Navigation Message Authentication (OS-NMA), is intended to protect GNSS users against attacks based on generating false GNSS signals, a technique known as spoofing. There are two main groups of spoofing attacks, as detailed in [3].
Based on the source of the GNSS signal:
1) Simplistic Attack: A GNSS simulator is used to generate the false GNSS signal used in the attack.
2) Meaconing: A GNSS signal is recorded and rebroadcast with an added time delay, with the intention of diverting the victim from its real position.
And based on the resources used:
1) Intermediate Attack: This attack implies knowing the position and velocity of the victim's receiver antenna. This is required to properly place the counterfeit signals with respect to the real signals in the victim's search space. In order to do so, the attacker will be receiving the real signals from the actual satellites.
2) Sophisticated Attack: This attack is conceived to overcome defenses based on the Angle Of Arrival (AOA) of the received signals. It implies the use of several spoofers with a common oscillator. All of these spoofers will use the real satellite signals, as in the case of the Intermediate attack.
Other types of attacks are described in [4], like the Selective Delay attack, which consists of isolating the signal components of each spacecraft (e.g. by using directive antennas tracking each satellite) and adding extra delays to each signal component. For other definitions of spoofing attacks, please refer to [4].
Regardless of the source used, the simplistic attack, which relies on a signal generator to create the fake signal, does not imply any knowledge of the original Navigation Message, so it should be prevented by any authentication technique like NMA. Meaconing, on the other hand, rebroadcasts a real signal, which in some cases makes such protection unsuccessful. Nonetheless, this type of attack can be detected by a victim with a trustworthy time source if the time delay introduced by the spoofer is large enough [3]. This imposes an upper limit on the delay that the spoofer can add.
The associate editor coordinating the review of this manuscript and approving it for publication was Ana Lucila Sandoval Orozco.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In order to control the victim's Position, Velocity and Time (PVT), the spoofer needs to add a different delay to each satellite signal; this forces the attacker to estimate the symbols transmitted by the satellite (particularly the unpredictable symbols). Here, we assume that the spoofer does not have enough resources to use isolated channels, including antennas and RF equipment, per satellite. The case in which such resources are available is evaluated in Section II-E.
The spoofer approach of estimating the unpredictable symbols transmitted by the satellite and synthesizing a fake signal based on the estimated symbols is known as Secure Code Estimation and Replay (SCER).
Therefore, if we assume that breaking the cryptographic security is impossible for an attacker, the only available option is to estimate the real unpredictable symbol while it is being transmitted by the satellite and to add this information to the fake signal. Each GNSS can support the protection of its users by including unpredictable symbols in the Navigation Message, like those of the Galileo OS-NMA. There is currently an active discussion in the space industry on the role of NMA in protecting users against SCER.
The paper is structured as follows:
1) In Section II, the SCER attack for both Galileo and GPS is reviewed in detail, providing comparative results. The intra-satellite PRN non-orthogonality distortion term is also defined.
2) In Section III, the impact of the intra-satellite PRN non-orthogonality distortion term on SCER attacks against Galileo NMA is analyzed, through a case study centered on simulations of SCER attacks on Galileo OS-NMA.
3) In Section IV, a complementary spoofing detection technique is proposed. It is applicable to NMA and SCA, but relies on the fact that NMA is used.
4) In Section V, a number of different machine learning algorithms are analyzed and their expected accuracies are presented based on simulations.
5) In Section VI, the conclusions are discussed.
Note that the detection method suggested in Section IV relies on the use of NMA and on the fact that the spoofer was not able to null the original signal. If the conditions and types of attack defined in Section II are not met, then it is impossible to ensure that the Navigation Message was not modified, which makes the use of the detection method of Section IV risky.

II. SCER ATTACK
Two different types of SCER attacks are considered, from the point of view of the delay [3]:
1) Zero-latency SCER attack: The delay of the spoofed signal is 0 at the beginning of the attack and is then gradually increased, avoiding effects that would be easily noticeable in the victim's tracking loops.
2) Non-zero-latency SCER attack: A significant delay is present in the spoofer-generated signal. To avoid being detected through the tracking jumps in the victim's receiver at the beginning of the attack, the spoofer may try to generate jamming signals that temporarily "blind" the victim's receiver.
It will also be impossible to perform a zero-latency SCER attack when the signal arrives at the victim first. This point is also analyzed in [4], which suggests transmitting an arbitrary symbol value until the necessary number of samples has been processed by the matched filter, at which point a good estimate of the unpredictable symbol is available. This issue depends greatly on the geometry of the satellite constellation and on the relative arrangement of the victim and the spoofer.

A. BAYESIAN ESTIMATORS
Following a similar approach to the one described in [5], it seems reasonable to assume that the spoofer will use some sort of Bayesian estimator to determine the value of the unpredictable symbol transmitted by the satellites. These Bayesian estimators are based on the output of a matched filter which can be modeled, during a single symbol of the unpredictable pattern, as:
$$Z_l(n) = \sum_{k=k_l}^{k_l+n-1} Y_k\, s_k \qquad (1)$$
where $Y_k$ is the sampled sequence of the signal received by the spoofer and $s_k$ is the sampled sequence of the local replica of the signal, generated by the spoofer. Note that $n$ indicates the number of samples of the unpredictable symbol used for the estimation, while $k_l$ represents the first sample to be used for the integration. $Z_l(n)$ is the output of the matched filter after processing $n$ samples. It is up to the spoofer to determine what sampling frequency should be used, as long as it meets the sampling frequency recommended for the different GNSS. For the results derived in Section V, a sampling frequency of 50 MHz was used. At this stage, it is assumed that the spoofer has obtained a good estimate of the signal delay $\tau$ and the Doppler frequency $f_{dop}$ by means of acquisition and tracking blocks. The random variable found at the output of the matched filter for Galileo and GPS is analyzed later.
As described in [5], the proposed Bayesian estimators (MAP, ML, MMSE) are used to provide a real value to replace the unpredictable binary symbol, instead of picking one of the two binary values of the unpredictable symbol. As in [4], we will follow the approach of using the MAP estimator, taking the sign of the output of the matched filter, as described in [5].
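The matched-filter accumulation and the sign decision described above can be sketched with a small Monte Carlo experiment. This is only an illustration under stated assumptions: a unit-amplitude baseband signal, a random NRZ sequence standing in for the real spreading code, perfect alignment, and a per-sample noise standard deviation of sqrt(fs / (2 C/N0)), which is an assumed normalization and not taken from the paper. The function names are hypothetical.

```python
import math
import random


def matched_filter(received, replica):
    """Z_l(n): accumulate the product of received samples and local replica."""
    return sum(y * s for y, s in zip(received, replica))


def map_symbol(z):
    """Hard decision on the NRZ symbol from the sign of the filter output."""
    return 1 if z >= 0 else -1


def simulate_hit_rate(cn0_dbhz, t_int_s, fs_hz=50e6, trials=2000, seed=1):
    """Fraction of trials in which the sign decision matches the true symbol."""
    rng = random.Random(seed)
    n = max(1, round(t_int_s * fs_hz))          # samples integrated
    cn0 = 10.0 ** (cn0_dbhz / 10.0)
    sigma = math.sqrt(fs_hz / (2.0 * cn0))      # assumed noise std per sample
    hits = 0
    for _ in range(trials):
        w = rng.choice((-1, 1))                 # true unpredictable symbol
        # Stand-in spreading code: random +/-1 values, not a real PRN.
        code = [rng.choice((-1, 1)) for _ in range(n)]
        rx = [w * c + rng.gauss(0.0, sigma) for c in code]
        if map_symbol(matched_filter(rx, code)) == w:
            hits += 1
    return hits / trials
```

Calling `simulate_hit_rate` for increasing integration times at a fixed C/N0 shows the trade-off the spoofer faces: a longer integration gives a more reliable symbol estimate, but delays the moment at which the estimate can be used.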
As considered in [5], the received GPS L1 C/A signal can be modeled as follows in Intermediate Frequency (IF): where $c_k$ is the NRZ (Non-Return to Zero) spreading code, $w_k$ is the estimate of the NRZ unpredictable symbol, $\theta_k$ is the carrier phase and $f_{IF}$ is the Intermediate Frequency. $N_k$ is the AWGN (Additive White Gaussian Noise) at the input of the receiver.
In order to allow later comparison with Galileo, we will consider unitary power: Therefore, defining the matched filter as: where $s_k$ is the sampled sequence of the GPS local copy signal in the spoofer receiver. This leads to: where $W_L$ is the true value of the NMA unpredictable symbol.

2) GPS SIGNAL MODEL WITH UNITARY POWER IN BASE BAND (BB)
We will also consider the GPS L1 C/A signal in Base Band (BB). Then, the received GPS L1 C/A signal can be modeled as follows: Therefore, defining the matched filter as: Which leads to: Note that both results are the same in terms of expectation and variance, regardless of whether we consider the GPS L1 C/A signal in IF or BB.
As stated in [5], equation (1) (or equations (4) or (8)) can be used to estimate the value of the unpredictable symbol. Depending mainly on the received $C/N_0$, a good estimate of the unpredictable code symbol under analysis can be obtained after 6 µs of integration. The chip period in GPS L1 C/A is approximately 0.978 µs, so this corresponds to evaluating the signal over roughly 6 chips [6].
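The relation between integration time, chips and samples can be checked with a few lines of arithmetic. The sketch below uses the GPS L1 C/A chipping rate of 1.023 MHz and the 50 MHz sampling frequency mentioned earlier; the function names are hypothetical.

```python
GPS_CA_CHIP_RATE_HZ = 1.023e6   # GPS L1 C/A chipping rate


def chips_spanned(integration_time_s, chip_rate_hz=GPS_CA_CHIP_RATE_HZ):
    """Number of spreading-code chips covered by an integration window."""
    return integration_time_s * chip_rate_hz


def samples_spanned(integration_time_s, sampling_rate_hz=50e6):
    """Number of samples integrated at a given sampling frequency."""
    return round(integration_time_s * sampling_rate_hz)
```

For a 6 µs window, `chips_spanned(6e-6)` gives the number of chips covered and `samples_spanned(6e-6)` the number of terms accumulated in the matched filter, which is on the order of six chips and a few hundred samples.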
If a spoofer is trying to perform a SCER attack using a single antenna to receive all the signals, a linear combination of the signals from different satellites will be present at the input. Those signals are modulated with different spreading codes, which are mutually orthogonal. This means that, in order to estimate the symbol, the signal must be evaluated over an interval long enough to start taking advantage of the orthogonality of the sequences. One way to overcome this delay (imposed by the fact that the sequences are not orthogonal in the very short term) is to use directional antennas to provide extra gain to the signal coming from the satellite under evaluation; note that several antennas are needed in order to track different satellites. The differential delay needed to properly control the victim's position can then be introduced by means of isolated channels, one per satellite, with a different delay applied to each channel.
In this attack, we assume that the attacker is close to the victim, and a mobile environment is considered because of the remarks in [7] regarding the detection of spoofing attacks based on channel behavior. This implies that the antenna gain cannot be extremely large. If this assumption does not hold, the spoofer could consider not regenerating the signal at all, but simply using one channel per satellite and applying a differential delay as needed. See Section II-E for further considerations on this type of attack.

C. GALILEO SIGNAL
As per [8] and [2], the Galileo E1B signal will include NMA based on TESLA, providing unpredictable symbols and forcing an attacker to follow the SCER scheme. We can define the Galileo E1 signal (excluding the Public Regulated Service, PRS) as (based on [9], using the syntax from [5]): where $w_k$ is the estimate of the NRZ unpredictable symbol, $e^{1B}_k$ is the NRZ PRN sequence for E1B, $e^{1C}_k$ is the NRZ PRN sequence for E1C, $\alpha = \sqrt{10/11}$ and $\beta = \sqrt{1/11}$. The subcarrier $SC^{E1A|B,\,a|b}_k$ is defined as follows: where $R_a = 1.023$ MHz and $R_b = 6.138$ MHz. We can consider the generation of the full signal (both E1B and E1C) for the local copy in the receiver, or just the E1B part [10].
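As an illustration of the composite subcarrier, the following sketch samples the E1B combination of the two square-wave subcarriers. It assumes the definitions of the Galileo OS SIS ICD, namely a subcarrier equal to the sign of a sine wave at the given rate, weights α = √(10/11) and β = √(1/11), and a sum of the two terms for E1B. Sampling at mid-sample instants and the function names are choices made here for the example only.

```python
import math

ALPHA = math.sqrt(10.0 / 11.0)
BETA = math.sqrt(1.0 / 11.0)
R_A_HZ = 1.023e6    # BOC(1,1) subcarrier rate
R_B_HZ = 6.138e6    # BOC(6,1) subcarrier rate


def square_subcarrier(rate_hz, t_s):
    """Sign of a sine wave at the subcarrier rate (+1 or -1)."""
    return 1.0 if math.sin(2.0 * math.pi * rate_hz * t_s) >= 0.0 else -1.0


def e1b_subcarrier(num_samples, fs_hz=50e6):
    """Samples of alpha*sc_a + beta*sc_b, taken at mid-sample instants."""
    out = []
    for k in range(num_samples):
        t = (k + 0.5) / fs_hz
        out.append(ALPHA * square_subcarrier(R_A_HZ, t)
                   + BETA * square_subcarrier(R_B_HZ, t))
    return out
```

Since α² + β² = 1, the composite subcarrier keeps unit average power, and each sample takes one of four levels, ±(α + β) or ±(α − β).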
The received signal will also contain Additive White Gaussian Noise (AWGN); therefore, the spoofer will receive:
As for the GPS case, we will also make some general considerations regarding the output of the matched filter in IF and BB. For now, we will consider that only the E1B PRN is present in the local copy of the spoofer receiver and that a large number of samples is taken for integration ($n$ is large); these points are considered in detail for BB in Section II-D1 and Section II-D2.

1) GALILEO SIGNAL MODEL WITH UNITARY POWER IN INTERMEDIATE FREQUENCY
The received Galileo signal can be modeled as follows, in Intermediate Frequency (IF): Therefore, defining the matched filter as: Leading to: Note that equations (17), (18), (21) and (22) are considering that the Spoofer is only using E1B PRN in the local copy in the receiver used to estimate the unpredictable symbol. For more details on this, please refer to Sections II-D1 or II-D2.

2) GALILEO SIGNAL MODEL WITH UNITARY POWER IN BASE BAND (BB)
The received Galileo signal can be modeled as follows in BB: Therefore, defining the matched filter as: leads to:
$$\mathrm{Var}\left[Z_l(n)\right]_{GAL-BB} = 2\sigma^2 n$$
Note that, here, we are considering that the spoofer is only using the E1B PRN to generate the local copy in the receiver used to estimate the unpredictable symbol. For more details on this, please refer to Sections II-D1 or II-D2. Therefore, Galileo provides the same results for both BB and IF, and it yields a higher variance than GPS, leading to a predicted reduction of 3 dB in the effective $C/N_0$.
Based on these results (equations (9), (10), (21) and (22)), and assuming that both symbols transmitted by the satellite have the same probability, then: This holds regardless of whether we consider the signals in BB or IF. Therefore, for the sake of simplicity, we will further evaluate the output of the matched filter for Galileo in BB.
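The effect of a doubled noise variance on a sign decision can be illustrated numerically. The sketch below does not reproduce the paper's own expressions; it assumes the textbook result for equiprobable antipodal symbols in Gaussian noise, P = 1 − ½ erfc(√(T · C/N0 / v)), where the `variance_factor` v is a hypothetical parameter (v = 2 models a doubled variance, i.e. a 3 dB loss). The normalization may differ from the one used in the paper.

```python
import math


def symbol_hit_probability(cn0_dbhz, t_int_s, variance_factor=1.0):
    """P(correct sign decision) for an antipodal symbol in Gaussian noise.

    variance_factor scales the noise variance relative to the reference case;
    2.0 corresponds to the 3 dB loss discussed in the text.
    """
    cn0 = 10.0 ** (cn0_dbhz / 10.0)
    snr_term = t_int_s * cn0 / variance_factor
    return 1.0 - 0.5 * math.erfc(math.sqrt(snr_term))
```

Under this assumed model, doubling the variance is equivalent to shifting the curve by 10·log10(2) ≈ 3 dB along the C/N0 axis: the same probability is reached only with about 3 dB more C/N0, or with twice the integration time.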

D. AUTHENTICATION TECHNIQUES AND SCER FOR GALILEO
1) GALILEO SIGNAL WITH FULL LOCAL COPY OF E1 WITH NMA
Let's first consider the full local copy, assuming that the spoofer already knows the delay and Doppler of the incoming signal (by means of previous acquisition and tracking stages). In such case, the spoofer is going to generate (25) as s k in (1). Where: Which leads to (26).
where (with the current Galileo SIS (Signal In Space) ICD (Interface Control Document) [9]): Note that, as already mentioned, it is assumed that the Spoofer is able to perfectly align the local replica of the signal and the satellite signal. In this context, it has to be considered that the average of the products of both Galileo subcarriers with themselves is one, in order to get to the expression in (26).
In this case, we can identify the term $\iota_k$, which will be present in our matched filter output. This term, the intra-satellite PRN non-orthogonality distortion term, will affect the spoofer's estimation, as discussed later in Section III-A.
This term could be eliminated by extending the length of signal evaluated in the matched filter (the spoofer will have to wait until the inner product of both spreading sequences is close to 0). The expectation of the output of the matched filter will be: where: And the variance will be: where $W_L$ is the true value of the unpredictable symbol. This means that the output of the matched filter will follow a non-stationary Gaussian distribution (note that the noise at the receiver input is AWGN). As the number of samples ($n$) increases, the Gaussian expectation will tend to be stationary, and the matched filter output for the full local copy will follow:
We will now consider a local copy with only the E1B spreading code. This option is interesting for the spoofer, compared to the full local copy, as it is less computationally expensive. We assume, again, that the spoofer already knows the delay and Doppler of the incoming signal and that the estimates are perfect. We will denote by $s_k$ the local copy of the signal, properly aligned to the incoming signal and without the E1C PRN. Then: where (with the current Galileo SIS ICD [9]): Note that, as already mentioned, it is assumed that the spoofer was able to perfectly align the local replica of the signal and the satellite signal. In this context, it has to be considered that the average of the products of both Galileo subcarriers with themselves is one, in order to get to the expression in (32).
In this case, we can again identify the term $\iota_k$, which will be present in our matched filter output. By evaluating the output of the matched filter over a long enough period, this term, the intra-satellite PRN non-orthogonality distortion term, can be eliminated.
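The way a residual cross-correlation between two spreading sequences shrinks as the integration window grows can be illustrated as follows. This is a sketch under stated assumptions: random NRZ sequences of 4092 chips stand in for the E1B and E1C codes (they are not the real Galileo PRNs), the correlation is computed per chip rather than per sample, and the names are hypothetical.

```python
import random


def partial_inner_product(code_a, code_b, n):
    """Normalised inner product of the first n chips of two NRZ codes."""
    return sum(a * b for a, b in zip(code_a[:n], code_b[:n])) / n


def residual_profile(code_a, code_b, lengths):
    """Absolute residual correlation for each integration length."""
    return [abs(partial_inner_product(code_a, code_b, n)) for n in lengths]


rng = random.Random(7)
# Random +/-1 sequences stand in for the E1B and E1C codes (not the real PRNs).
e1b = [rng.choice((-1, 1)) for _ in range(4092)]
e1c = [rng.choice((-1, 1)) for _ in range(4092)]
profile = residual_profile(e1b, e1c, [1, 4, 16, 64, 256, 1024, 4092])
```

Over a single chip the two sequences are fully correlated or anti-correlated, so the residual is as large as the wanted term; only over many chips does it average towards zero. This is why a spoofer that integrates for a very short time cannot ignore the distortion term.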
The expectation of the output of the matched filter will be: where: And the variance will be: where $W_L$ is the true value of the unpredictable symbol. As the number of samples ($n$) increases, the output of the matched filter will follow a Gaussian distribution with stationary expectation.
Spreading Code Authentication (SCA) is a protection method currently under discussion for future implementation in Global Navigation Satellite Systems (GNSS). An implementation for GPS known as "Chips-Message Robust Authentication" (CHIMERA) is currently under evaluation [11]. These techniques, based on including the unpredictable signature in the spreading codes, can be considered an evolution of the techniques based solely on the use of unpredictable symbols in the Navigation Message (like NMA).
As detailed in [12], the SCER attack on the SCA protection approach relies on a method similar to the one presented in Section II, namely applying Bayesian estimators to the output of a matched filter and then comparing against a threshold to obtain the polarity of the unpredictable chip. The main difference between the two cases is that with NMA the spoofer can use longer integration times, while with SCA the integration has to be limited to the chip length. Note that, in the NMA case, the result was the polarity of the unpredictable symbol, while in the SCA case the chip polarity is obtained. This implies that the symbol must be known beforehand by the attacker, either because the symbol is unpredictable (e.g. NMA) and is estimated first (implying a larger delay before the false signal can be transmitted) or because the symbol is predictable and known in advance. Note that if the spoofer is not able to include the proper polarity values in the unpredictable chips, then, when the end user (the victim) learns the actual chip authentication sequence via the Navigation Message, correlation losses will be present, making the SCA test fail. Moreover, the use of NMA in combination with SCA can make GNSS even more secure, since it ensures that the Navigation Message has not been tampered with. This point is of high importance, because the Navigation Message is what allows the receiver to determine the real values of the unpredictable chips.
Every time the attacker needs to estimate a chip, (1) will be used. Assuming the positions of the unpredictable chips are known by the attacker and assuming the unpredictable chips are included in the E1B spreading code, then the following local copy would be used: Note that the design decisions on the distribution of the unpredictable chips (and whether this information is made available to the users in advance or not) will also have an impact on the SCER attack, as described in [12].
Although the operations performed by the spoofer in the SCA case are very similar to those presented in Section II, the expectation will differ slightly: unlike in (34), the distortion term will not depend on $e^{1B}_k$ but on the transmitted symbol and the pilot spreading code. where: The variance will be the same as in the NMA case: This implies that the spoofer will have to take the term $\xi_{SCA}(n)$ into account. For Galileo E1B, the chip length is approximately 1 µs; therefore, the maximum integration time available to the spoofer will be 1 µs. Taking the obtained expectation and variance into account, the theoretical expression in (24) and the Galileo curves presented in Fig. 1 are still applicable. With integration times of 1 µs, these curves give low detection probabilities ($P_d < 0.6$) for $C/N_0$ between 35 dBHz and 50 dBHz, which are good $C/N_0$ values for normal GNSS equipment. This implies that SCA, combined with NMA, will be a good protection method against SCER, as the low symbol detection probabilities available to the spoofer make the current attack approach very complicated: it would require working with very high $C/N_0$. On top of that, even if the attacker is able to work under such circumstances, the Galileo modulation imposes the distortion term present in (38). Hence, if the spoofer is not able to estimate $\xi_{SCA}(n)$, this will have a negative impact on top of the already poor detection probability.
On the other hand, backward compatibility problems are yet to be fully understood for all types of GNSS users. Note that Galileo OS-NMA only uses spare bits in the current navigation message (therefore, it does not impact the expected performance or how users are currently using the Galileo signal). SCA, in contrast, implies an initial reduction in the Auto Correlation Function (ACF) peaks that users will obtain in the search space. Suggestions on using the pilot channel to track the signal while SCA unpredictable chips are being transmitted are provided in [13]. GPS will require a significant change in its modulation to provide an open service signal with a pilot channel, which implies that receiver manufacturers will in any case need to upgrade their receivers to track such new signals. Still, when applied to Galileo, SCA may imply changes in how receivers track the Galileo signals, which already provide pilot channels. Hence, a two-way discussion should be established between the Galileo programme and receiver manufacturers to fully understand the impact of SCA on end users. Moreover, SCA will impose some computing requirements on end-user receivers, as it is necessary to store pre-correlation raw signal samples for later use, when the authentication message with the details of the puncturing applied to the spreading code is received.

E. ONE SATELLITE - ONE ANTENNA - ONE CHANNEL CASE
We will also consider the scenario where the spoofer, instead of trying to estimate the symbol transmitted by the satellite, has enough resources to use a directive antenna per satellite (we will assume that the spoofer is able to track each satellite in the sky and the antenna directivity is such that the signals of the rest of the satellites are completely eliminated in the output of the antenna). The spoofer is assumed, then, to provide an independent channel per satellite. Instead of estimating the symbol, as described in the first part of the paper (Section I), the spoofer could use an independent antenna and channel to track each real satellite, leading to the application of a different delay per channel (hence, per satellite too). This will not require estimating each symbol.
If we assume a spoofer noise figure of NF = 8 dB (e.g. National Instruments USRP UBX daughterboard, configured gain of 19.50 dB, see [14]), then: This implies that the spoofer may introduce a penalty of at least 8 dB in the SNR of the generated signal, with respect to the case where the spoofer is estimating the symbol transmitted by the satellite. Note that the spoofer will need to track each satellite and use a group of directive antennas and isolated channels. Such a setup is not simple for a spoofer following the victim, although it represents a serious threat if the attacker has enough resources to deploy it.
Even if we assume that the spoofer has enough resources to use this setup, techniques like those proposed in Section IV can still protect critical infrastructure standoff victims.

III. CASE STUDY: THE INTRA-SATELLITE PRNs NON-ORTHOGONALITY DISTORTION TERM AND ITS ROLE IN HINDERING THE SCER ATTACKS TO THE GALILEO NMA
Different simulations were performed in order to study the influence the intra-satellite PRNs non-orthogonality distortion term could have in the SCER attacks on Galileo NMA. The NMA case was further evaluated, in order to better characterize the impact of SCER on the Galileo NMA which will be available by 2020, as per [1].
In terms of the PRN distortion in SCER for Galileo, the key differentiator with respect to the GPS L1 C/A case is that these PRN non-orthogonality distortion terms are caused by the very same satellite the spoofer is trying to falsify, while in the GPS case the distortion is caused only by other satellites. The spoofer cannot get rid of the $\iota_k$ distortion term in Galileo even if only a single satellite is present at the receiver input.
We can conclude here that the modulation of the Galileo system makes the spoofer's estimation of the symbol harder, which is good for Galileo users. We will discuss the intra-satellite non-orthogonality distortion term $\iota_k$ and its impact on SCER attacks, and compare it with the GPS case. Future work will also compare this intra-satellite effect with the effect of the non-orthogonality between the PRNs of different satellites, for short integration times. The present work focuses only on the intra-satellite effect, as it is considered more daunting for spoofers. Note that, regardless of whether a very directive antenna is used (if possible) by the spoofer, the intra-satellite non-orthogonality distortion term will still be present, as it is introduced by the same satellite the spoofer is trying to use for the SCER attack. On the other hand, attenuating the signals coming from other satellites will reduce the inter-satellite effect.
The attacker has two main options to overcome this problem, which appears in the matched filter output due to the Galileo CBOC modulation in E1:
1) Extending the integration of the matched filter long enough for the $\iota_k$ term to tend to 0.
2) Estimating the PRN non-orthogonality distortion term ($\iota_k$). This case will not be further detailed but, taking into account different channel models and studies [15], [16], the channel will only be challenging for low elevation satellites in urban areas, particularly for unpredictable symbols that may be transmitted together [9]. Note that this may invalidate the spoofer's estimate of the intra-satellite PRN non-orthogonality distortion term after some symbols have been transmitted.

A. EXTENDING THE INTEGRATION OF THE MATCHED FILTER WITH NMA
In order to get rid of the distortion terms, the obvious way forward is to extend the integration time of the matched filter. Clearly, if we extend the integration time to the symbol period, we will maximize the $C/N_0$ and completely eliminate the distortion terms. Nonetheless, the spoofer cannot wait until the end of the symbol period is reached. Instead, and depending on the received $C/N_0$, the spoofer will extend the integration time until a valid symbol estimate is available, based on a matched filter output that is as clean as possible. It can be seen in Fig. 1 that the theoretical result for Galileo is worse than the theoretical expression for GPS (complementary of (23)), mainly due to the Galileo modulation.
In Fig. 2, the results of the Galileo simulation are compared to the Galileo theoretical curve (43). The results in Fig. 2 were obtained by analyzing a simulated Galileo E1 signal (1 second of data per $C/N_0$ value), generated with the workbench described in Section III-B. The simulation used no quantization (the signal generator was configured to work directly with floating-point numbers), a sampling frequency of 50 MHz, a spoofer working with a local copy containing only E1B (as described in Section II-D2), no acquisition or tracking errors, and MAP as the Bayesian estimator. A single satellite was analyzed, and a known pattern of alternating ones and zeros was used for error estimation.
In (43), $P_d$ is the spoofer's probability of detection of an unpredictable symbol. As can be observed in Fig. 2, the simulation confirms the 3 dB reduction. The solid line shows the theoretical $P_d$ result for Galileo, derived from (43). The dotted line shows the Galileo simulation results.
On the other hand, the effect of the non-orthogonality distortion term $\iota_k$ is evident for high $C/N_0$ and short integration times (see Fig. 2: under these conditions the results differ from the theoretical response for Galileo given by (43)). Nonetheless, the effect is small compared to the impact of the 3 dB reduction, with respect to GPS, due to the Galileo modulation. For more realistic values of $C/N_0$, like 53 dBHz, and integration times of 1 µs or 10 µs, the difference between the simulated and the theoretical probability of detection is about 1%. For higher values of $C/N_0$, like 67 dBHz, and integration times of 1 µs, the difference is close to 10%. Therefore, this effect may not play a major role in SCER protection, although it could make the SCER spoofer's work slightly harder. Fig. 2 compares the corrected theoretical Galileo $P_d$, (43), with the simulation results. It can be seen that, for short integration times and high $C/N_0$, the simulation results deviate more from the expected theoretical values (rightmost part of the figure, integration times of one and two microseconds).
Note that the effects of the intra-satellite PRN non-orthogonality distortion terms are not modeled in the provided theoretical quality curves, as these only account for the complementary error function (erfc).

B. WORKBENCH FOR GALILEO SIGNAL SPOOFING
A complete workbench for testing the overall Galileo SCER approach was generated in Python, following the Galileo SIS ICD [9].
A signal generator was developed, capable of reading RINEX 3.0 files with Galileo Navigation Messages or including a known pattern of symbols that are considered unpredictable, as if TESLA completely unpredictable chains were used. This module is able to read the text files with the Galileo PRNs, annexed to [8] and receives an input from the user with the desired signal C/N 0 , the simulated satellite name (as of now, only one satellite is simulated at once), the sampling frequency, the Navigation message content to be modulated (in hexadecimal format), and the length of the simulation.
The signal simulator generates a binary file with I/Q samples with the Galileo E1 signal, as requested by the user (sampling rate, number of bits for quantization, delay, Doppler, length of the resulting file, Galileo satellite PRN and SNR are configurable). The spoofer module performs the estimation of the symbol and allows the use of the three Bayesian Estimators, described in Section II-A.
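As an illustration of what such a binary I/Q file could look like, the sketch below quantizes complex samples and stores them as interleaved signed 8-bit I/Q pairs. The paper does not specify the file layout used by the workbench, so the interleaved int8 format, the clipping behavior and the function names `quantise`, `write_iq` and `read_iq` are assumptions made for this example only.

```python
import struct


def quantise(value, bits, full_scale=1.0):
    """Map a float in [-full_scale, full_scale] to a signed integer level."""
    max_level = 2 ** (bits - 1) - 1
    level = round(value / full_scale * max_level)
    return max(-max_level, min(max_level, level))   # clip out-of-range input


def write_iq(path, samples, bits=8):
    """Write complex samples as interleaved signed 8-bit I/Q pairs (bits <= 8)."""
    with open(path, "wb") as fh:
        for s in samples:
            fh.write(struct.pack("bb", quantise(s.real, bits),
                                 quantise(s.imag, bits)))


def read_iq(path):
    """Read back interleaved signed 8-bit I/Q pairs as integer tuples."""
    with open(path, "rb") as fh:
        raw = fh.read()
    return [struct.unpack_from("bb", raw, i) for i in range(0, len(raw) - 1, 2)]
```

With a layout of this kind, a configurable number of quantization bits only changes the set of levels written, not the container size, which keeps the reader side simple.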
The receiver module will allow the benchmarking of different spoofer detection methods, by means of adding modules in the victim's receiver.
At the time of submission of this paper, only the Galileo signal generator, the spoofer and the acquisition step of the receiver are fully developed. Nonetheless, the workbench capabilities are sufficient for the analysis performed in this article, as it is not necessary to actually generate the unpredictable symbol in order to analyze the performance of the different SCER spoofer estimators with the Galileo signal. Indeed, a controlled combination of 1 and 0 symbols was used, in order to have a reliable source to quickly determine whether the spoofer was wrongly estimating the symbol.

IV. COMPLEMENTARY MACHINE LEARNING TECHNIQUES FOR SCER PROTECTION
As described in Section II, the SCER attack is a risk for GNSS users, even those relying on techniques like NMA. Nonetheless, Galileo OS-NMA forces the attackers to use the SCER scheme and prevents them from following other approaches, like modifying the navigation message. This implies that, if the attacker wants to divert the victim's PVT, it is mandatory to generate a fake signal (including the unpredictable symbols estimated from the real signal) with a different Doppler and/or delay. Therefore, if the spoofer is present and was not able to null the original signal at the receiver input, two correlation peaks will be found in the victim's Search Space (a very detailed analysis of the search space can be found in [15]): one due to the spoofer signal and one from the original satellite signal.
If the spoofer signal is superposed with the real signal and the navigation message was not modified by the attacker, then the effect on the victim will be negligible. If it is not superposed, then, assuming enough resolution in the search space is available, two separated peaks shall be present.
It is straightforward to conclude that a full branch of protection methods could rely on identifying abnormal search space distributions. We will evaluate the use of machine learning techniques to protect users against SCER attacks on Galileo OS-NMA, based on the analysis of features extracted from the search space.
Note that, if NMA techniques, analyzed in detail in Section II, are not used by the victim, the technique discussed in this Section will not, by any means, guarantee the navigation message was not modified. The detection method proposed in this Section is a complementary method to NMA, particularly designed against SCER on GNSS with NMA and only applicable if the original signal was not nulled.
The Search Space implemented in the workbench, defined in Section III-B and visible in Fig. 5, was calculated using the Parallel Acquisition in Time Domain method. The current method, as defined, is very heavy in terms of computing load. Future work will be focused on implementing a demonstrator and reducing the computing load. Note that the current resolution implies the use of powerful FPGAs implementing parallel correlators in order to generate the Search Space. The Spoofer signal was generated using the SCER method with MAP as Bayesian estimator, particularly using (1) with a local copy of E1B only, therefore the output of the Spoofer matched filter was following a random variable with expectation defined in (34) and variance defined in (35).

A. THE SEARCH SPACE WITH SPOOFER PRESENCE
Each cell of the Search Space is calculated by performing the following operation (based on syntax from [15]): where c [n − τ D ] is the local copy used by the victim and r [n] is the signal received by the victim's receiver (where τ D is the delay used by the receiver in each search space cell, F D is the Doppler frequency used by the receiver in each search space cell and N is the number of samples to be integrated to calculate each cell of the search space). If any spoofer is present and the original signal was not nulled by the spoofer, then the received signal will follow: where Y SAT GAL−BB is the real Galileo signal, in baseband, with AWGN noise and Y Spof GAL−BB is the Spoofed generated signal, in baseband, with AWGN noise. For the sake of simplicity, we will consider that the victim is using a local copy with only the E1C PRN (the Galileo pilot signal) for generating the search space.
Therefore, each search space cell in the victim's receiver will follow: where we have the terms coming from the real satellite signal: And those terms coming from the spoofed signal: And the variance of each cell will be: where N is the number of samples used in the matched filter of (44). Then, it is quite straightforward to conclude: 3) If F sat = F D , τ sat = τ D and F spof = F D , τ spof = τ D , then: 4) And if F sat = F D = F spof and τ sat = τ D = τ spof then: Note that the case in (57) will not pose a risk to the user at all, if the OS-NMA is used and the cryptographic protection is not broken (e.g. SCER attack). As the OS-NMA cryptographic protection is not broken and the Doppler and delay are the same as the ones of the authentic satellite, the victim's computed solution shall not differ with respect to the real one.
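The two-peak behaviour described above can be illustrated with a minimal Python sketch. It assumes a commonly used form of the search space cell (the magnitude of the normalized correlation between the Doppler-wiped received signal and a circularly delayed local copy), which may differ in detail from (44). The random ±1 code is a hypothetical stand-in for the E1C PRN, and the function names, sampling rate, delays, Dopplers and noise level are illustrative assumptions, not the workbench configuration.

```python
import numpy as np

def delayed_replica(code, delay, doppler_hz, fs_hz):
    """Code delayed by `delay` samples (circular) and shifted in Doppler."""
    n = np.arange(code.size)
    return np.roll(code, delay) * np.exp(2j * np.pi * doppler_hz * n / fs_hz)

def search_space(received, code, dopplers_hz, fs_hz):
    """Cell (d, k) = |sum_n r[n] c[n - d] exp(-j 2 pi F_k n / fs)| / N."""
    n = np.arange(code.size)
    # Row d holds the local copy delayed by d samples: c[n - d].
    delayed = np.stack([np.roll(code, d) for d in range(code.size)])
    space = np.empty((code.size, len(dopplers_hz)))
    for k, f_d in enumerate(dopplers_hz):
        wiped = received * np.exp(-2j * np.pi * f_d * n / fs_hz)
        space[:, k] = np.abs(delayed @ wiped) / code.size
    return space

rng = np.random.default_rng(1)
fs_hz = 1.023e6                              # assumed: one sample per chip
code = rng.choice([-1.0, 1.0], size=1023)    # hypothetical stand-in for a PRN
noise = (rng.standard_normal(1023) + 1j * rng.standard_normal(1023)) / np.sqrt(2)

# Authentic signal plus a spoofed replica with a different delay and Doppler.
received = (delayed_replica(code, 200, 2000.0, fs_hz)
            + delayed_replica(code, 300, -3000.0, fs_hz)
            + noise)

dopplers_hz = np.arange(-5000.0, 5001.0, 500.0)
space = search_space(received, code, dopplers_hz, fs_hz)

sat_cell = space[200, 14]    # Doppler bin 14 corresponds to +2000 Hz
spof_cell = space[300, 4]    # Doppler bin 4 corresponds to -3000 Hz
print(sat_cell, spof_cell, np.median(space))
```

In this toy setup both the satellite cell and the spoofer cell stand well above the background level, which is the signature the proposed features are meant to capture.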

B. FEATURES EXTRACTION
As in any other machine learning problem, the first step is feature extraction: deciding what information will be fed into the classification algorithms. The proposed feature extraction is based on fitting the correlation peaks in the search space with 2D Gaussians and on detecting RFIs during an analysis time window preceding the signal sequence used for computing the search space (refer to Fig. 4 for details). It is known [17] that the ACF of the Galileo signal does not follow a Gaussian waveform. Nevertheless, the Gaussian approximation increases the computing efficiency while capturing the relevant features of the autocorrelation peaks, and it is deemed sufficient for the purpose of detecting Spoofing signals. Further work will evaluate other waveforms that can further improve the computing efficiency, while retaining the necessary information for the classification algorithms.

1) GAUSSIAN EXTRACTION
In order to properly characterize the location and shape of the peaks in the search space, the algorithm will: 1) Fit 2D Gaussians around the maximum peaks in the search space. 2) After successfully fitting a Gaussian in the search space, subtract the fitted Gaussian. 3) Repeat the process N times. 4) Estimate the residual noise.
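The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the workbench implementation: it assumes an axis-aligned 2D Gaussian model, a fixed fitting window around each maximum, a least-squares fit with `scipy.optimize.curve_fit`, and the standard deviation of the residual as the noise estimate. The names `gaussian_2d` and `extract_gaussians` and the default parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_2d(coords, amp, r0, c0, sr, sc):
    """Axis-aligned 2D Gaussian evaluated at (row, column) coordinates."""
    r, c = coords
    return amp * np.exp(-((r - r0) ** 2 / (2 * sr ** 2)
                          + (c - c0) ** 2 / (2 * sc ** 2)))

def extract_gaussians(space, n_peaks=2, half_window=6):
    """Fit and subtract up to n_peaks Gaussians; return fits, residual, noise."""
    residual = space.astype(float).copy()
    rows, cols = np.indices(residual.shape)
    fits = []
    for _ in range(n_peaks):
        # Step 1: locate the current maximum and fit a Gaussian around it.
        r0, c0 = np.unravel_index(np.argmax(residual), residual.shape)
        rs = slice(max(r0 - half_window, 0), r0 + half_window + 1)
        cs = slice(max(c0 - half_window, 0), c0 + half_window + 1)
        coords = (rows[rs, cs].ravel(), cols[rs, cs].ravel())
        p0 = (residual[r0, c0], r0, c0, 2.0, 2.0)   # assumed initial widths
        try:
            params, _ = curve_fit(gaussian_2d, coords,
                                  residual[rs, cs].ravel(), p0=p0)
        except RuntimeError:
            break   # the fit did not converge: stop extracting peaks
        fits.append(params)
        # Step 2: subtract the fitted Gaussian from the whole search space.
        residual -= gaussian_2d((rows, cols), *params)
    # Step 4: estimate the residual noise once the peaks are removed.
    return fits, residual, float(np.std(residual))
```

Each fitted parameter set (amplitude, position and widths), together with the residual noise estimate, could then serve as an input feature vector for the classifiers.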
In the upper part of Fig. 5, the search space with the real satellite signal (Galileo E1) and the spoofer signal, generated using symbols estimated by using SCER with MAP Bayesian estimator, can be seen. No channel attenuation was introduced.
In the lower part of Fig. 5, the search space after the Gaussian extraction process can be seen. This is the resulting Search Space after applying the algorithm shown in Fig. 4. As can be appreciated, only the residual noise after the Gaussian subtraction remains in the search space. The value of this residual noise is also estimated and fed into the classification algorithms, so that the Machine Learning techniques have information on the relationship between the peak amplitude values and the noise in the Search Space. The workbench described in Section III-B was used. Note that Fig. 5 does not show one of the cases used for training the algorithms, nor the exact workbench configuration used for testing the machine learning algorithms; it is only an example to illustrate the concept.

2) RFIs DETECTION
As described in Section II, attackers may try to blind the victim's receiver before starting the attack. For this reason, we will also look for RFIs during the time window preceding the reception of the data used to generate the search space in the victim's receiver.
The initial analysis proposed in this paper is based on simply finding outliers, assuming that an Automatic Gain Controller (AGC) is present, although future work will be performed in order to use a more sophisticated RFI detection schema. The RFI presence will be a feature to be considered for the machine learning algorithms.
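As an illustration of such an outlier-based check, the following Python sketch flags RFI when too many samples in the analysis window have a power far above the median power. The rule itself (a multiple of the median power and a minimum fraction of outliers) and the names `rfi_present`, `power_factor` and `min_fraction` are assumptions made for this example; the paper does not specify the outlier criterion. A check of this kind mainly reacts to pulsed or short-lived interference, since a continuous interferer would also raise the median.

```python
import numpy as np

def rfi_present(samples, power_factor=20.0, min_fraction=1e-3):
    """Return True if the window contains an abnormal share of power outliers.

    samples:      complex baseband samples of the analysis window.
    power_factor: a sample is an outlier if its power exceeds
                  power_factor times the median power (assumed rule).
    min_fraction: minimum share of outliers needed to declare RFI.
    """
    power = np.abs(np.asarray(samples)) ** 2
    threshold = power_factor * np.median(power)
    outlier_fraction = np.mean(power > threshold)
    return bool(outlier_fraction > min_fraction)

rng = np.random.default_rng(0)
# Noise-only window (unit-power complex Gaussian noise).
clean = (rng.standard_normal(20000) + 1j * rng.standard_normal(20000)) / np.sqrt(2)
# Same window with a strong burst added to 500 consecutive samples.
jammed = clean.copy()
jammed[5000:5500] += 10.0

print(rfi_present(clean), rfi_present(jammed))
```

The resulting boolean can be appended to the Gaussian parameters as one more feature.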

V. CASE STUDY. SIMULATION WITH GALILEO E1 SIGNALS
In order to benchmark several machine learning algorithms, the data extraction method described in Section IV-B was implemented in the workbench from Section III-B. Different machine learning algorithms were analyzed using the workbench, based on Python Scikit-learn [18] library. Particularly: RBF SVM, Ada Boost, Decision Trees, Nearest Neighbors and Random Forests.

A. DATASET GENERATION
The datasets were generated using several combinations of Doppler shifts and time delays: 1) In the Spoofer case, for all the attacks, a delay of 300 µs and Dopplers of -5 kHz and -2 kHz were used. 2) For the real satellite, delays of 200 µs and 240 µs and Dopplers of 5 kHz and 10 kHz were used. These configurations were deemed sufficient for the search space resolution used, as the detection results were correct. The resolution in the victim's search space should always be high enough to allow the 2D Gaussian functions to be fitted properly.
FIGURE 6. Performed K-folds and reported accuracy. Based on [23].
Due to the reduced number of peak positions in the dataset, overfitting may occur. In order to rule out this possibility, the accuracy results were calculated both with and without feeding the peak positions to the algorithms. This means that, in Fig. 7, the results were obtained when the algorithms did not know where the peaks were located in the Search Space.
The dataset was generated by applying the proposed feature extraction algorithm to Galileo signals with C/N 0 ranging from 0 dBHz to 50 dBHz.
The integration time in the victim's receiver was 16 ms and the local signal was only the E1C PRN. The dataset was composed of 381 cases with Spoofer and 1074 without Spoofer. Note that the dataset was not balanced. This had a clear impact on the false alarm probability and the missed detection probability, and it can also be seen in the F1 [19] scores in Fig. 10. Depending on the final application in which the complementary Machine Learning protection algorithm will be deployed, it will be necessary to tailor the dataset to reduce either the false alarm probability or the missed detection probability. In order to reduce the probability of missed detection of a particular class (e.g. Spoofer present in the received signal), firstly, that class should be over-represented in the input dataset and, secondly, the algorithms should be evaluated to find the fitting parameters that maximize the accuracy and F1 scores for the class of interest. Confusion Matrices [19] should also be considered when evaluating the results. Other state-of-the-art techniques that rely on the analysis of the Search Space [20] already propose the use of adaptive thresholds that depend on the location of the peak. Nonetheless, the innovative and beneficial point of using Machine Learning techniques for the Search Space analysis is that they allow the thresholds to be redefined by just modifying the training dataset. This approach allows an operationally deployed system to be reconfigured by feeding it with a known dataset that may include new Spoofing techniques that were not conceived at the time of deployment (as long as these new spoofing techniques leave a detectable signature in the Search Space). This also implies that, in order to prevent the Machine Learning techniques from fitting non-relevant features, the training dataset must be carefully designed and curated.
If, for instance, the training dataset does not contain the sufficient    [20] regarding the relative position of the Spoofing signal with respect to the time delay are relevant, that can also be modeled into the system by accounting for that situation in the training dataset. In other words, the detection capabilities of the deployed system can be further improved by means of a simple reconfiguration, without major modifications of the system. The relevance of the training dataset is not limited to this future evolution of the system and its detection capabilities: the PFA (Probability of False Alarm) of the system is also shaped by the over- or under-representation of the spoofing cases in the training dataset. The same applies to the PMD (Probability of Missed Detection).
The PFA should not be analyzed using the accuracy, which considers both classes (Spoofer present and Spoofer not present); instead, it can be derived from the reported confusion matrices. Both Recall and Precision relate the number of correctly predicted elements of a class to the number of wrongly labeled elements: elements that, in reality, belong to the class and were predicted as the other class (Recall), or elements that do not belong to the class and were predicted as part of it (Precision). The F1 score is the harmonic mean of precision and recall, so it is a value between the two, per class, equal to one for a perfect case and to 0 for a model performing terribly. The F1 scores can be found in Fig. 10 and Fig. 11.
Note that modifying the number of spoofing signal cases in the training dataset will modify the values for F1 for both types of classes, too.
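The relationship between the confusion matrix, the F1 score, the PFA and the PMD can be made concrete with a short Python sketch. It treats "Spoofer present" as the positive class; the function name `binary_report` and the toy labels are illustrative and are not data from the paper.

```python
import numpy as np

def binary_report(y_true, y_pred):
    """Confusion matrix and derived rates, with 'Spoofer present' = 1."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))      # spoofer present, detected
    fn = int(np.sum(y_true & ~y_pred))     # spoofer present, missed
    fp = int(np.sum(~y_true & y_pred))     # no spoofer, alarm raised
    tn = int(np.sum(~y_true & ~y_pred))    # no spoofer, no alarm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": [[tn, fp], [fn, tp]],
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "pfa": fp / (fp + tn) if fp + tn else 0.0,   # false alarm rate
        "pmd": fn / (fn + tp) if fn + tp else 0.0,   # missed detection rate
    }

# Toy example: 4 spoofed cases and 6 clean cases.
report = binary_report([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                       [1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(report)
```

In this toy case one attack is missed (PMD = 0.25) and one clean case raises an alarm (PFA = 1/6), while the accuracy alone (0.8) would hide the difference between the two error types.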
Other authors, as in [21], suggest the application of Neural Networks for the analysis of features extracted from the output of early and late correlators tracking the signal. In the present work, we analyze the entire search space for the generation of the features. This is an important detail, as otherwise, if the victim tracks the false signal and that signal is far from the real peak in the Search Space, the attack may go undetected. In [22], the use of Support Vector Machines (SVM) on sensor fusion data is suggested for UAVs. Such an approach makes sense in a dynamic environment like that of a moving vehicle but, as stated by the authors, if the Spoofer has absolute knowledge of the victim's trajectory, the attack will go undetected when this protection approach is followed. Note that this implies that critical infrastructure standoff victims (e.g. standoff timing users) are particularly at risk, so for such receivers the analysis of the Search Space is advisable.
As can be seen in Figs. 7 and 8, the results were generated with two different setups in the victim receiver: 1) No Gaussian positions in the search space were fed into the algorithms, only using a local copy of E1C PRN and with a coherent integration time of 16 ms. 2) Gaussian positions in the search space were fed into the algorithms, only using a local copy of E1C PRN and with a coherent integration time of 16 ms. Multipath was not simulated in the dataset; hence, clear sky conditions are assumed at the victim's receiver antenna.
The calculated Search Space has a resolution of 392.16Hz in the Doppler axis and 20 ns in the delay axis, which was sufficient for the cases under analysis. Parallel Acquisition in Time Domain was used in the victim's receiver in order to compute the search space.
The workbench was configured to work with floating-point numbers. This allows effects related to fixed-point precision to be disregarded.
The results can be found in Figs. 7 and 8. The algorithm configurations were as follows: 1) Nearest Neighbors: number of neighbors (K) = 5, uniform weights, ball tree algorithm, leaf size = 20. From the overall number of samples, 30% were used for validation, deriving the accuracy results reported in this paper. The other 70% were used to train the models using the K-folds technique, dividing the dataset into five groups (K = 5). See Fig. 6 for details on the performed cross-validation.
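A minimal Scikit-learn sketch of this evaluation procedure is given below. Only the Nearest Neighbors parameters, the 70%/30% split, the five folds and the class sizes (1074 cases without Spoofer, 381 with Spoofer) come from the text. Everything else is an assumption made for illustration: the feature matrix is synthetic (random numbers standing in for the extracted Gaussian, noise and RFI features), the other classifiers use library defaults, and the split is stratified with a fixed random seed.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical stand-in features: 1074 cases without Spoofer, 381 with Spoofer.
X = np.vstack([rng.normal(0.0, 1.0, (1074, 4)),
               rng.normal(2.5, 1.0, (381, 4))])
y = np.concatenate([np.zeros(1074, dtype=int), np.ones(381, dtype=int)])

# 30% of the samples are held out for validation (stratification is assumed).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "Nearest Neighbors": KNeighborsClassifier(
        n_neighbors=5, weights="uniform", algorithm="ball_tree", leaf_size=20),
    "RBF SVM": SVC(kernel="rbf"),                       # library defaults
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Ada Boost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    # 5-fold cross-validation on the training portion only.
    cv_accuracy = cross_val_score(clf, X_train, y_train, cv=5).mean()
    # Final fit on the full training portion, scored on the held-out 30%.
    clf.fit(X_train, y_train)
    results[name] = (cv_accuracy, clf.score(X_val, y_val))
    print(f"{name}: CV accuracy {cv_accuracy:.3f}, "
          f"validation accuracy {results[name][1]:.3f}")
```

The accuracies printed by this sketch only reflect the synthetic data and are not comparable with the results reported in Figs. 7 and 8.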
The C/N 0 lower limit for the datasets used can be seen on the horizontal axis of Figs. 7 and 8. For instance, a point at 20 dBHz means that all input data samples used for training and validation are extracted from signal records with a C/N 0 of 20 dBHz or higher. The resulting reduction in the number of data samples is shown on the right vertical axis of these figures, as black points.
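The construction of this horizontal axis can be sketched in a few lines of Python: for each lower limit, only the samples whose C/N 0 is at or above the limit are kept, so the number of available samples shrinks as the limit grows. The function name `samples_above` and the example values are hypothetical.

```python
import numpy as np

def samples_above(cn0_dbhz, lower_limits_dbhz):
    """Number of samples with C/N0 at or above each lower limit."""
    cn0 = np.asarray(cn0_dbhz)
    return {float(limit): int(np.count_nonzero(cn0 >= limit))
            for limit in lower_limits_dbhz}

# Hypothetical per-sample C/N0 values (dBHz) and three lower limits.
counts = samples_above([0, 10, 20, 30, 40, 50], [0, 20, 40])
print(counts)
```

The same mask (`cn0 >= limit`) would be applied to the feature matrix and labels before training and validating at each point of the curve.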
The best results were obtained with algorithms based on Decision Trees, namely: Decision Trees, Ada Boost and Random Forest. These algorithms perform remarkably well when the signal C/N 0 is above 30 dBHz. As can be seen in Fig. 7, the RBF SVM algorithm does not achieve such performance in that C/N 0 range, providing a lower accuracy (compared to the Decision Trees based algorithms) of around 75%. Multipath is expected to affect the accuracy results negatively, although the solution could already be applicable to critical applications where a standoff receiver operates in full open sky conditions with no multipath. Further work will evaluate the impact of multipath on the accuracy results. Regarding the impact of the algorithms not knowing the location of the peaks in the search space, a small accuracy reduction can be seen for the best performers: Decision Trees, Ada Boost and Random Forest. The reduction is very small, but it is still present. In the RBF SVM case, the algorithm seems to be very stable with respect to the inclusion of the position of the peaks, as no difference is seen when the algorithm is allowed to know their location. In the Nearest Neighbors case, the accuracy is reduced when the location of the peaks is introduced, particularly for high C/N 0 . This can be explained by the fact that, for the Nearest Neighbors case, an intense optimization was performed with the dataset that did not include the location of the peaks. As can be seen in the Confusion Matrices (Tables 1, 2, 3 and 4), the impact is lower, but it is still present. This seems to imply that the results of these algorithms are more reliable and less prone to over-fitting, considering the proposed features and the simulated dataset.
The RBF SVM results do not improve as C/N 0 increases, and they are poor with respect to the other algorithms. These results are not improved by the inclusion of the positions of the detected Gaussians. The reason is that the Radial Basis Function kernel is not able to properly separate the data in the proposed feature space. Indeed, results with other SVM kernels, e.g. Linear kernels, are better and improve with C/N 0 , confirming that RBF is not properly separating the data. The results of the Linear SVM are not reported in the main results graphs in order not to clutter them. Just as a reference, the SVM with linear kernel and with RBF kernel are compared in Fig. 9. In the linear case, the results improve as the C/N 0 improves, eventually providing values similar to those of Decision Trees when C/N 0 is greater than 30 dBHz. This demonstrates that, while the RBF kernel is not able to properly separate the provided dataset, the Linear kernel is. The results in Fig. 9 were generated with E1C only, 16 ms of integration and without feeding the position into the algorithm. In order to obtain more relevant accuracy figures, the best performing algorithms are evaluated by showing the F 1 score against the lower C/N 0 limit for both classes (Spoofer present/Spoofer not present). Confusion Matrices are shown (Tables 1, 2, 3 and 4) for the best performer (as can be derived from Figs. 10 and 11), the Random Forest algorithm. In general, for Random Forest, as reported in the Confusion Matrices in Table 2 and Table 4, some classification errors are reduced to zero for C/N 0 >= 39 dBHz. This cannot be understood as a perfect result, but as a very low error rate that, due to the size of the dataset, is not shown. The proposed technique should only be used to detect attacks (mainly for standoff critical application receivers). Once the attack is detected, other auxiliary navigation systems shall be used (e.g. inertial navigation systems, alternative stable clocks, etc.).

VI. CONCLUSION
Spoofing attacks represent a very serious threat for GNSS systems. To fight against that risk, some authentication techniques, like NMA in Galileo, will be available. However, this may not be enough to counteract the spoofing attacks based on SCER. For this particular case, a complementary approach based on the application of machine learning methods to the receiver search space has been introduced in this paper.
Two main aspects have been studied: the Intra-satellite PRN non-orthogonality distortion term due to Galileo modulation and the use of machine learning techniques for end-user protection.
This Intra-satellite PRN non-orthogonality distortion term was identified, and a quality curve that allows a direct comparison between GPS and Galileo for SCER Spoofer symbol estimation was provided (refer to Section II). A detailed examination of the operations performed by a spoofer to produce a SCER attack was provided, too.
Simulations in Section III confirmed the presence of the Intra-satellite PRN non-orthogonality distortion term for Galileo attacks on NMA, with low integration times and high C/N 0 . Appreciable impact was only found for integration times of 1 µs and C/N 0 greater than 67 dBHz. The impact was around 10%, or more, in terms of Spoofer P d . This effect may be of particular relevance for attacks on systems using SCA while using high-gain antennas. The Galileo quality curves proposed in Section II were also confirmed by the simulations.
Theoretical calculations suggest that a different distortion term, as seen in (38), appears in the SCER attack on Galileo with SCA, hindering the estimation of the unpredictable chip. Moreover, the Galileo theoretical quality curve, calculated in Section II, is very challenging for the attack on SCA, as the integration time will need to be below 1 µs. In the second part of the paper, a new Machine Learning technique for SCER spoofer detection was proposed. Based on the simulation results, with classifiers based on Decision Trees and the proposed feature extraction method, the models obtained performance ratios (correct classifications) greater than 98.48% for C/N 0 between 40 dBHz and 50 dBHz. The false alarm rates improve significantly for decision tree-based algorithms for C/N 0 >= 39 dBHz, as seen in Tables 2 and 4. Therefore, this seems to be a promising complementary solution for detecting spoofing attacks which, otherwise, may not be detected (assuming NMA is used, as the Spoofer may modify the navigation message without being detected by the proposed method if NMA is not used). Note that it was assumed that the attacker was not able to null the original satellite signal. Hence, the proposed Machine Learning technique could be applied to standoff receivers in critical applications in two cases: when using GNSS signals that do not support NMA, or when using NMA signals to protect against SCER, and always after a successful check of the Navigation Message authenticity.
Further work will evaluate other signal features, trying to reduce the computational load of the extraction step. Multipath simulation will be considered and more sophisticated RFI detection methods will be evaluated, too. The workbench will be updated to simulate SCA, and steps will be performed to start implementing a demonstrator on FPGAs.