Stealthy Adversarial Attacks on Machine Learning-Based Classifiers of Wireless Signals

Machine learning (ML) has been successfully applied to classification tasks in many domains, including computer vision, cybersecurity, and communications. Although highly accurate classifiers have been developed, research shows that these classifiers are, in general, vulnerable to adversarial machine learning (AML) attacks. In one type of AML attack, the adversary trains a surrogate classifier (called the attacker’s classifier) to produce intelligently crafted low-power “perturbations” that degrade the accuracy of the targeted (defender’s) classifier. In this paper, we focus on radio frequency (RF) signal classifiers, and study their vulnerabilities to AML attacks. Specifically, we consider several exemplary protocol and modulation classifiers, designed using convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We first show the high accuracy of such classifiers under random noise (AWGN). We then study their performance under three types of low-power AML perturbations (FGSM, PGD, and DeepFool), considering different amounts of information at the attacker. On one extreme (so-called “white-box” attack), the attacker has complete knowledge of the defender’s classifier and its training data. As expected, our results reveal that in this case, the AML attack significantly degrades the defender’s classification accuracy. We gradually reduce the attacker’s knowledge and study five attack scenarios that represent different amounts of information at the attacker. Surprisingly, even when the attacker has limited or no knowledge of the defender’s classifier and its power is relatively low, the attack is still significant. We also study various practical issues related to the wireless environment, including channel impairments and misalignment between attacker and transmitter signals. Furthermore, we study the effectiveness of intermittent AML attacks. 
Even under such imperfections, a low-power AML attack can still significantly reduce the defender’s classification accuracy for both protocol and modulation classifiers. Lastly, we propose a two-step adversarial training mechanism to defend against AML attacks and contrast its performance against other state-of-the-art defense strategies. The proposed defense approach increases the classification accuracy by up to 50%, even in scenarios where the attacker has perfect knowledge of the defender and exhibits a relatively large power budget.


I. Introduction
Machine learning (ML)-based signal classification plays an important role in next-generation wireless systems. It can be used, for example, to identify the underlying protocol or modulation scheme of the received signal in a spectrum-sharing scenario, e.g., coexisting Wi-Fi and cellular transmissions over the unlicensed 5/6 GHz bands [1]-[3], and LTE/radar transmissions over the CBRS band [4], [5]. It can also be used to identify anomalies, rogue signals, and selective-jamming attacks [6]-[8]. Signal classification may also be used for RF fingerprinting [9]-[11] to provide awareness of nearby emitters and avoid radio interference.
Deep neural networks (DNNs) were used in [12], [13] to identify signal types not included in the training phase, i.e., unknown signals. In [14], [15], the authors proposed more advanced DNNs, including fusion convolutional neural networks (CNNs) and self-supervised DNNs, to improve the accuracy of modulation classification. Recent DNN-based RF signal classifiers also use recurrent neural networks (RNNs) [16], [17] (see also [18], [19] and the references therein for related work on ML-based signal classification). However, ML classifiers are vulnerable to adversarial machine learning (AML) attacks. These attacks can infer membership [20], leave a backdoor in the data [21], poison the data [22], or mislead the classifier into assigning wrong labels during normal operation [23]-[27]. In this paper, we focus on the last type of AML attacks. Specifically, we investigate the impact of AML perturbations on signal classifiers, considering realistic aspects of the wireless scenario. AML perturbations have mainly been studied in the context of object classification/recognition, but more recently in the context of RF signal classification (e.g., [28]-[34]). In such attacks, an adversary trains a surrogate DNN, henceforth called the attacker's classifier, to produce cleverly crafted perturbations that are difficult to detect. When combined with the original (a.k.a. "benign") samples, these perturbations can mislead the defender's classifier into wrongly classifying the signal type (see Figure 1).
Several factors contribute to the effectiveness of an AML attack, including how much knowledge the attacker has about the defender and what imperfections the AML perturbations may encounter before reaching the defender's classifier. In [28]-[31], the authors studied AML attacks in two extreme scenarios: the attacker has full knowledge of the defender's classifier (white-box attack) or it has zero knowledge (black-box attack). Specifically, in [28] the authors adapted the original Fast Gradient Sign Method (FGSM) for generating perturbations [23] to attack modulation classifiers, assuming the attacker has perfect knowledge of the defender's classifier. The authors in [29]-[31] showed that DNN-based signal classifiers are vulnerable to both white-box and black-box attacks. These attacks only represent two extremes. In many practical scenarios, the attacker has partial knowledge of the defender's classifier. The authors in [32]-[34] analyzed AML attacks that require prior knowledge (exact or probabilistic) of the channel state between the attacker and defender, assuming that the attacker knows the DNN architecture (including the trainable weights and loss function) used by the defender's classifier. Note that due to differences in the dynamics of the transmitter-attacker and transmitter-defender channels, the benign signal seen by the attacker will differ from the one seen by the defender. This results in different trained weights even for the same DNN and ultimately impacts the effectiveness of the attack, as shown later in our simulations.
Prior works on RF signal classification have not extensively examined differences in the hyperparameters of the DNN structures, even when such structures are trained on the same data. Our study examines both aspects (differences in the input as well as differences in the hyperparameters). In particular, we observe that knowledge of the defender's classifier plays an important role in the strength of the attack. Intuitively, the attack is stronger when the attacker and defender use the same DNN than when they use different DNNs. Even under the same DNN architecture, differences in the hyperparameters can affect the attack's effectiveness (even when the attacker and defender use the same training and testing datasets). For example, if two CNNs differ in the filter sizes of their Conv2D layer(s) or in the number of layers, the attack can be less effective. We further observe that the attack's effectiveness is reduced even when the defender and attacker apply the same DNN but train it with different random seeds. This implies that knowledge of the defender's DNN structure is critical for AML attacks. In our work, we first examine the impact of AML perturbations under a white-box model. We use the results as a reference point to evaluate other attack scenarios where the attacker has partial knowledge of the defender.
Previous works (e.g., [28]-[34]) primarily focused on modulation classification attacks. Such works used CNN-based classifiers as examples but did not consider sequence-to-sequence models such as RNNs [16], [17]. Our paper evaluates both protocol and modulation classifiers, considering CNN- and RNN-based designs. We start with FGSM, as a simple technique to generate AML perturbations [23]. We then extend the treatment to multi-step attacks by considering Projected Gradient Descent (PGD) [24] and DeepFool [25]. We evaluate these attacks under different knowledge levels for both modulation and protocol classifiers.
The authors in [32]-[34] considered the problem of synchronization between the attacker and defender. The synchronization problem in our paper differs from theirs in two main aspects. First, we focus on the synchronization issue for input-dependent AML attacks (e.g., FGSM, PGD, and DeepFool). In contrast, the shift-invariance property demonstrated in [32]-[34] pertains to input-independent attacks, e.g., Universal Adversarial Perturbation (UAP). In the UAP attack, the same perturbation matrix is generated for all benign inputs, and this matrix fools a randomly drawn input with high probability [35]. Consequently, the defender receives the same perturbation for all inputs, in notable contrast to the perturbations we examine in this paper. Second, because UAP is input-independent, an attack generated for one window remains effective on other windows, as demonstrated in [32]-[34]. In our paper, we refer to this coarse-scale misalignment as inter-window shift. However, the misalignment can also be a fraction of a window, a scenario we refer to as intra-window shift. In this case, the shift-invariance property of the UAP attack is no longer valid. For input-dependent AML attacks, we study the impact of both inter- and intra-window shifts.
Finally, we propose a two-step defense mechanism to improve the robustness of the defender's classifier to AML attacks. Our defense approach relies on training multiple classifiers with various adversarial examples [23], each at a given level of perturbations. During normal operation (testing phase), a separate DNN-based estimator is used to predict the level of perturbations of the AML attack (including the possibility of no attack). Subsequently, the retrained classifier corresponding to that level is selected for robust signal classification.
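The selection logic of the two-step defense can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the estimator and classifiers are hypothetical placeholders (simple lambdas) standing in for the trained DNNs, and snapping a continuous estimate to the nearest adversarial-training level is our own assumption about how the estimator output is mapped to a classifier.

```python
from typing import Callable, Dict, Sequence

import numpy as np

# Hypothetical types: the estimator maps an input window to a perturbation level,
# and each classifier maps an input window to a class index.
Estimator = Callable[[np.ndarray], float]
Classifier = Callable[[np.ndarray], int]


def nearest_level(estimate: float, levels: Sequence[float]) -> float:
    # Snap a continuous level estimate to the closest level used during adversarial training.
    return min(levels, key=lambda lv: abs(lv - estimate))


def two_step_classify(x: np.ndarray, estimator: Estimator,
                      classifiers: Dict[float, Classifier]) -> int:
    # Step 1: estimate the perturbation level of the incoming window (0.0 means "no attack").
    level = nearest_level(estimator(x), list(classifiers))
    # Step 2: classify with the model that was adversarially trained at that level.
    return classifiers[level](x)


# Toy usage: placeholder "classifiers" keyed by the epsilon they were trained with.
classifiers = {0.0: lambda w: 0, 0.1: lambda w: 1, 0.2: lambda w: 2}
x = np.zeros((2, 128))
print(two_step_classify(x, lambda w: 0.13, classifiers))  # 1
```

Keeping the estimator separate from the classifiers means each classifier only has to be robust at one perturbation level, at the cost of storing several models.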
Our contributions are summarized as follows:
• In addition to modulation classifiers, we extend the study of AML attacks to protocol classifiers used in spectrum-sharing scenarios (prior work focused only on modulation classification). In contrast to [28]-[34], where only a CNN classifier was studied, in our work we consider two CNNs and three RNNs (built from LSTM and bidirectional LSTM layers).

II. System Model
We consider a wireless communication system that consists of a legitimate transmitter-receiver pair and an adversarial device (see Figure 1). The transmitter generates RF signals according to one of several possible protocols (for protocol classification) or modulation schemes (for modulation classification) in an interleaved manner, i.e., one protocol or modulation scheme is active at a time. Without loss of generality, we assume that the defender's classifier resides within the legitimate receiver¹. This classifier is trained to identify the protocol (or modulation scheme) based on the received baseband I/Q samples, which we refer to as benign data or benign input. The attacker generates its perturbations based on overheard benign data. These perturbations interfere with the defender's classifier, causing it to misclassify the received samples. We refer to the combined benign data plus perturbations as adversarial data.
The output of the defender's classifier is represented by the mapping $z = g(x; \theta)$, where $x$ is a window of I/Q samples and $\theta$ is the set of learnable DNN parameters, i.e., weights and biases. The input $x$ is in $\mathbb{R}^{2\times N}$, where $N$ is the window size (in consecutive samples) and the first (second) row represents the sequence of I (Q) values. The input matrix $x$ is passed through the DNN and is represented by a feature vector resulting from a projection and a nonlinear (activation) function, $\sigma(\cdot)$. The classifier assigns the label $f(x; \theta) = \arg\max_k \sigma(z)_k$ to the received input, where $k \in \{1, \dots, K\}$, $K$ is the number of classes, and $\sigma$ is the softmax function. In this formulation, $\sigma(z)_k$ is the numerical output of classifier $f$ corresponding to the $k$th protocol (or modulation) type.
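The labeling rule $f(x;\theta) = \arg\max_k \sigma(z)_k$ can be illustrated with a small NumPy sketch. The single linear layer used for $g(x;\theta)$ is a hypothetical stand-in for a trained DNN; it only serves to show the $2\times N$ input layout and the softmax/argmax step.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: subtracting the max logit does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def classify(z: np.ndarray) -> int:
    # f(x; theta) = argmax_k sigma(z)_k, with z = g(x; theta) the K raw DNN outputs.
    return int(np.argmax(softmax(z)))


# Hypothetical stand-in for g(x; theta): one linear layer on the flattened 2 x N window.
rng = np.random.default_rng(0)
N, K = 8, 3
W = rng.normal(size=(K, 2 * N))
x = rng.normal(size=(2, N))  # row 0: I samples, row 1: Q samples
z = W @ x.reshape(-1)
print(classify(z), softmax(z).round(3))
print(classify(np.array([1.2, 3.4, 0.5])))  # 1
```

Because softmax is monotonic, the predicted label equals the argmax of the raw outputs $z$; the softmax matters for the loss and for the probability-like outputs, not for the decision itself.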
At any given time, let $H_{td}$ be the channel matrix from the legitimate transmitter to the defender, $H_{ta}$ the channel matrix from the legitimate transmitter to the attacker, and $H_{ad}$ the channel matrix from the attacker to the defender. We assume AWGN $\{n_d\}$ and $\{n_a\}$ at the receive chains of the legitimate receiver (defender) and attacker, respectively. In the absence of AML perturbations, the defender receives $x_d = H_{td} x_{t'} + n_d$, where $x_{t'}$ is the transmitted waveform. The attacker receives $x_a = H_{ta} x_{t'} + n_a$. The adversary uses its signal $x_a$ to generate and transmit AML perturbations $\eta$.
In the presence of AML perturbations, the defender receives $x^*_d = H_{td} x_{t'} + H_{ad}\eta + n_d$. We introduce a variable $\tau$ to indicate the time lag between the arrival of the benign signal at the defender and the arrival of the corresponding AML perturbations. Accordingly, the signal received by the defender becomes $x^*_d(\tau) = H_{td} x_{t'} + H_{ad}\eta(\tau) + n_d$. Several approaches can be used to generate $\eta$. Such approaches were originally studied in the context of computer vision and natural language processing; in this paper, we apply them in the context of RF signal classification. Specifically, the attacker seeks AML perturbations that lie within an $\ell_\infty$ ball of radius $\epsilon$ (so that the perturbed signal stays close to the original one) and that maximize the classification error. More formally, the adversary would ideally solve:
$$\max_{\{\eta_i\}:\ \|\eta_i\|_\infty \le \epsilon}\ \sum_{i} \mathbb{I}\big[f(x_i + \eta_i; \theta) \neq y_i\big], \qquad (1)$$
where $\mathbb{I}[\cdot]$ is an indicator function, so the sum counts the misclassified labels over the inputs $(x_i, y_i)$ of a given training set. Because the attacker also seeks small perturbations, the search is restricted by the constraint on $\eta$ in (1), where $\epsilon > 0$ is a user-defined parameter that limits the power of the perturbation and makes the attack difficult for the defender to identify. Instead of constraining $\eta$, one can also attempt to find the minimal $\eta$ that is sufficient to change the estimated label. This is done by solving the following minimization problem:
$$\min_{\eta}\ \|\eta\|_2 \quad \text{subject to} \quad f(x + \eta; \theta) \neq f(x; \theta). \qquad (2)$$
This type of perturbation, proposed by Moosavi-Dezfooli et al. [25], is called the DeepFool attack.
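The received-signal model $x^*_d(\tau) = H_{td} x_{t'} + H_{ad}\eta(\tau) + n_d$ can be sketched in complex baseband as follows. This is a simplified sketch under our own assumptions: scalar complex gains (flat fading) replace the channel matrices, $\tau$ is an integer number of samples, and the function names are hypothetical.

```python
import numpy as np


def delay(sig: np.ndarray, tau: int) -> np.ndarray:
    # Delay a complex baseband sequence by tau >= 0 samples; the first tau samples are zero.
    out = np.zeros_like(sig)
    if tau < len(sig):
        out[tau:] = sig[: len(sig) - tau]
    return out


def received_at_defender(x_t, eta, h_td, h_ad, tau, noise_std, rng):
    # x*_d(tau) = H_td x_t' + H_ad eta(tau) + n_d, with scalar gains as a simplification.
    n = len(x_t)
    n_d = noise_std * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return h_td * x_t + h_ad * delay(eta, tau) + n_d


def to_iq(sig: np.ndarray) -> np.ndarray:
    # Stack real and imaginary parts into the 2 x N input format used by the classifier.
    return np.vstack([sig.real, sig.imag])


rng = np.random.default_rng(1)
x_t = np.exp(1j * 2 * np.pi * 0.05 * np.arange(16))  # toy transmitted waveform
eta = 0.01 * (rng.choice([-1, 1], 16) + 1j * rng.choice([-1, 1], 16))
y = received_at_defender(x_t, eta, h_td=0.8 * np.exp(1j * 0.3), h_ad=0.5,
                         tau=3, noise_std=0.0, rng=rng)
print(to_iq(y).shape)  # (2, 16)
```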

III. DNN Structures
This section discusses the DNNs we consider for protocol and modulation classification, as well as the datasets used to train and test them.

A. DNNs for Protocol Classification
We consider four DNN structures for protocol classification, as shown in Figure 2. Three of these structures are stacked RNNs, each made of dense layers as well as Long Short-Term Memory (LSTM) and/or bidirectional LSTM (BiLSTM) layers. The fourth DNN is a CNN, modified from LeNet [36] by replacing the Conv2D layers of LeNet with Conv1D layers to efficiently transform and extract features from the time-domain sequence. In addition, we remove the padding layer from LeNet to improve the accuracy. The kernel size for the Conv1D layer is set to two, and its stride is set to one. The activation functions for the Conv1D and the fully connected layers are scaled exponential linear units (SELUs). The output layer in each classifier is a softmax. To train and test the protocol classifiers $S_1$ to $S_4$, we generate a dataset of 15,000 inputs (see Section VI), each containing 512 pairs of I/Q samples. AWGN is added to the samples to achieve a given signal-to-noise ratio (SNR)². Approximately 60% of the dataset is used for training, 20% for validation (i.e., early stopping, hyperparameter tuning, etc.), and 20% for testing. We monitor the validation cross-entropy loss and use early stopping with a patience of three epochs.
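The 60/20/20 split and the early-stopping rule can be sketched as follows. The helper names are hypothetical, and the stopping logic is our own minimal re-implementation of the usual rule (stop once the monitored validation loss has failed to improve for `patience` consecutive epochs), similar in spirit to the Keras `EarlyStopping` callback with `patience=3`.

```python
import numpy as np


def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    # Shuffle indices once, then cut them into train / validation / test parts.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def early_stopping_epoch(val_losses, patience=3):
    # Returns (epoch at which training stops, epoch with the best validation loss).
    best, best_epoch, wait = np.inf, 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


train, val, test = split_indices(15_000)
print(len(train), len(val), len(test))  # 9000 3000 3000
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.5]))  # (5, 2)
```

In the toy loss curve, the later improvement at epoch 6 is never seen: three non-improving epochs (3, 4, 5) trigger the stop, and the weights from epoch 2 would be kept.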

B. DNNs for Modulation Classification
For modulation classification, we consider a four-layer CNN that uses two convolutional layers and two fully connected layers. The hidden-layer activations are rectified linear units (ReLUs), and the output layer is a softmax. The RML 2016.10a dataset comprises 220,000 data segments (i.e., windows), representing 11 modulation schemes. There are 20 SNR values that range from −18 dB to 20 dB in steps of 2 dB. This results in 1,000 windows of samples per modulation scheme per SNR. We use 50% of the data for training, 5% for validation and early stopping, and 45% for testing. The RML 2016.10a dataset is available in windows of 128 samples (I/Q pairs) each, with a stride of 64 samples, i.e., two successive windows overlap by 64 samples.
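The dataset sizes above can be checked with a few lines of arithmetic. The sketch only recomputes the numbers stated in the text (11 schemes, 20 SNR values, 1,000 windows per cell, a 50/5/45 split, 128-sample windows with a stride of 64).

```python
import numpy as np

n_mods, per_cell = 11, 1000
snrs = np.arange(-18, 21, 2)             # -18, -16, ..., 20 dB
total = n_mods * len(snrs) * per_cell    # windows in RML 2016.10a
n_train = total * 50 // 100
n_val = total * 5 // 100
n_test = total - n_train - n_val
print(len(snrs), total, n_train, n_val, n_test)  # 20 220000 110000 11000 99000

# Window k covers samples [64k, 64k + 128); successive windows share 64 samples.
w0, w1 = set(range(0, 128)), set(range(64, 192))
print(len(w0 & w1))  # 64
```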

IV. Adversarial ML Attacks
We consider three different approaches for generating adversarial data: FGSM, PGD, and DeepFool.Although other approaches have been proposed in the literature, these three are often applied to wireless communication systems.

A. FGSM Attack
FGSM uses the gradients of a DNN to generate a perturbation $\eta$ and, subsequently, the adversarial data $x_{adv} = x + \eta$ [23]. Ideally, the defender would predict the same class for $x$ and $x_{adv}$ if every element of $\eta$ is smaller than the precision of the input features. However, the adversary can craft $\eta$ to cause the defender's classifier to change its decision on the perturbed data. We denote the DNN's mapping function as $f: \mathbb{R}^{2\times N} \to [0, 1]^K$ with parameters $\theta$. Even though $x_{adv}$ differs from $x$ only by the small perturbation $\eta$, the difference $f(x + \eta; \theta) - f(x; \theta)$ is not linear in $\eta$. FGSM exploits this by using backpropagated gradients to choose $\eta$ such that its impact is amplified enough to change the predicted label. The adversarial perturbation is formally given by $\eta = \epsilon\,\mathrm{sign}(\nabla_x L(x, y; \theta))$, where $L(x, y; \theta)$ is the loss function of the classifier (typically, cross-entropy) with parameters $\theta$ [23]. The adversarial data are thus generated by increasing the loss with respect to the classifier's input $x$ and true label $y$, based on the gradients $\nabla_x L(x, y; \theta)$. The authors in [29] proposed a new parameter $\epsilon_{acc}$ for adapting $\epsilon$ during the generation of the FGSM perturbations. In Section VII.F, we compare the results in [29] with the unmodified FGSM approach and show that both versions of FGSM reduce the defender's accuracy.
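A minimal FGSM sketch is shown below. To keep it self-contained, the defender's DNN is replaced by a hypothetical linear softmax classifier whose cross-entropy gradient with respect to the input has the closed form $W^\top(p - \mathrm{onehot}(y))$; with a real DNN, that gradient would come from backpropagation instead.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def loss_and_grad(x, y, W, b):
    # Cross-entropy L(x, y; theta) of a toy linear softmax model z = W vec(x) + b
    # (a stand-in for the DNN) and its gradient with respect to the input x.
    p = softmax(W @ x.reshape(-1) + b)
    dz = p.copy()
    dz[y] -= 1.0                          # dL/dz = p - onehot(y)
    grad_x = (W.T @ dz).reshape(x.shape)  # chain rule through z = W vec(x) + b
    return -np.log(p[y]), grad_x


def fgsm(x, y, W, b, eps):
    # eta = eps * sign(grad_x L(x, y; theta)); x_adv = x + eta
    _, grad = loss_and_grad(x, y, W, b)
    eta = eps * np.sign(grad)
    return x + eta, eta


rng = np.random.default_rng(0)
K, N = 3, 64
W, b = rng.normal(size=(K, 2 * N)) / np.sqrt(2 * N), np.zeros(K)
x = rng.normal(size=(2, N))
y = int(np.argmax(W @ x.reshape(-1) + b))  # treat the clean prediction as the true label
x_adv, eta = fgsm(x, y, W, b, eps=0.1)
loss_clean, _ = loss_and_grad(x, y, W, b)
loss_adv, _ = loss_and_grad(x_adv, y, W, b)
print(f"loss {loss_clean:.3f} -> {loss_adv:.3f}, max |eta| = {np.abs(eta).max():.2f}")
```

For this toy model the loss is convex in $x$, so a single signed-gradient step is guaranteed not to decrease it; for a real DNN, the same step is only a first-order heuristic.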

B. PGD Attack
FGSM can be interpreted as a one-step approach to maximize the impact of the perturbations. PGD is a more powerful, iterative variant of FGSM: it takes multiple gradient-ascent steps on the loss function and, after each step, projects the result back onto the set of allowed perturbations [24]. We consider a constraint set $Q = \{x' : \|x' - x\|_\infty \le \epsilon\}$ determined by the perturbation budget $\epsilon$. Starting from the initial point $x_0$, PGD iterates
$$x_{t+1} = P_Q\big(x_t + \alpha\,\mathrm{sign}(\nabla_x L(x_t, y; \theta))\big)$$
until a stopping condition is met, where $P_Q$ is a projection operator that ensures the output satisfies the constraint and $t = 0, 1, \dots, T-1$ is the iteration index. In other words, PGD generates the perturbation in $T$ iterations using a step size $\alpha$. Clearly, the choice of $\alpha$ and $T$ significantly impacts the performance of the PGD attack. Section VII studies the classification performance of PGD-based perturbations under different $\alpha$ and $T$.
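The PGD iteration can be sketched on the same hypothetical linear softmax model used for FGSM. The projection $P_Q$ onto the $\ell_\infty$ ball is an element-wise clip of $x_t - x$ to $[-\epsilon, \epsilon]$; the sketch starts from $x_0 = x$ (no random start), which is one common choice rather than necessarily the one used in our experiments.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def loss_and_grad(x, y, W, b):
    # Cross-entropy of a toy linear softmax model z = W vec(x) + b and its gradient w.r.t. x.
    p = softmax(W @ x.reshape(-1) + b)
    dz = p.copy()
    dz[y] -= 1.0
    return -np.log(p[y]), (W.T @ dz).reshape(x.shape)


def pgd(x, y, W, b, eps, alpha, T):
    # x_{t+1} = P_Q(x_t + alpha * sign(grad_x L(x_t, y; theta))), Q = l_inf ball of radius eps.
    x_t = x.copy()                                # x_0 = x
    for _ in range(T):
        _, grad = loss_and_grad(x_t, y, W, b)
        x_t = x_t + alpha * np.sign(grad)         # signed gradient-ascent step on the loss
        x_t = x + np.clip(x_t - x, -eps, eps)     # projection P_Q back onto the ball around x
    return x_t


rng = np.random.default_rng(0)
K, N = 3, 64
W, b = rng.normal(size=(K, 2 * N)) / np.sqrt(2 * N), np.zeros(K)
x = rng.normal(size=(2, N))
y = int(np.argmax(W @ x.reshape(-1) + b))
x_adv = pgd(x, y, W, b, eps=0.1, alpha=0.025, T=10)
print(loss_and_grad(x, y, W, b)[0], loss_and_grad(x_adv, y, W, b)[0])
```

With $\alpha < \epsilon$, PGD can explore the interior of the ball over several steps instead of jumping straight to a corner as FGSM does; on a nearly linear toy model the two end up similar, and the benefit of PGD shows mainly on nonlinear DNNs.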

C. DeepFool Attack
In DeepFool [25], $\epsilon$ is not set a priori; instead, the adversarial perturbation is the smallest $\eta$ needed to change the label $f(x; \theta)$, i.e., the solution of Equation (2). We use the same notation $f(x; \theta) = \arg\max_k \sigma(z)_k$ as in Section II. To track how $\sigma(z)$ changes with the iteration index $t$, let $\sigma(g(x; \theta))$ be the output activation function that generates $K$ outputs corresponding to the number of classes. DeepFool iterates until the accumulated perturbation $\eta$ changes the input's label. For multi-class problems, at each iteration DeepFool linearizes the classifier, computes the change required to move from the currently predicted label to each of the other labels, and chooses the label with the smallest required change as the direction in which to accumulate the perturbation. To find the closest possible perturbation that would mislead the classifier, we need to calculate the gradient of $\sigma(g(x; \theta))$. Therefore, this work considers the perturbation vector directed toward the decision boundary between the originally predicted label $y = f(x;\theta)$ and a fake label $\hat{y}$. The perturbation at step $t$ can be written as:
$$\eta_t = \frac{\left|\sigma(g(x_t;\theta))_{\hat y} - \sigma(g(x_t;\theta))_{y}\right|}{\left\|\nabla\sigma(g(x_t;\theta))_{\hat y} - \nabla\sigma(g(x_t;\theta))_{y}\right\|_2^2}\,\Big(\nabla\sigma(g(x_t;\theta))_{\hat y} - \nabla\sigma(g(x_t;\theta))_{y}\Big),$$
where $x_t = x + \sum_{i<t}\eta_i$ and $\hat{y}$ is the label whose linearized decision boundary is closest to $x_t$. DeepFool returns $\eta = \sum_t \eta_t$, the sum of the perturbations at each step. The DeepFool algorithm is summarized in Algorithm 1.
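The multi-class DeepFool loop can be sketched as follows. Assumptions: the classifier is a hypothetical two-layer tanh network with an analytic Jacobian, and, as in the original DeepFool paper [25], the update is computed on the raw outputs (logits) $g$ rather than on $\sigma(g)$; because the softmax preserves the argmax, the decision boundaries are the same. The `overshoot` factor and the small `1e-4` term follow the reference formulation and help the final iterate actually cross the boundary.

```python
import numpy as np


def forward(x, params):
    # Toy two-layer network on the flattened window: logits = W2 tanh(W1 x + b1) + b2.
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2, h


def jacobian(x, params):
    # d logits / d x = W2 diag(1 - h^2) W1, shape (K, D).
    W1, _, W2, _ = params
    _, h = forward(x, params)
    return (W2 * (1.0 - h ** 2)) @ W1


def deepfool(x, params, max_iter=100, overshoot=0.02):
    x0 = x.reshape(-1)
    k0 = int(np.argmax(forward(x0, params)[0]))       # originally predicted label
    eta = np.zeros_like(x0)                            # accumulated perturbation sum_t eta_t
    for _ in range(max_iter):
        x_t = x0 + (1 + overshoot) * eta
        logits, _ = forward(x_t, params)
        if int(np.argmax(logits)) != k0:
            break                                      # label changed: stop
        J = jacobian(x_t, params)
        best_dist, best_step = np.inf, None
        for k in range(len(logits)):
            if k == k0:
                continue
            w_k = J[k] - J[k0]                         # gradient difference
            f_k = logits[k] - logits[k0]               # output difference
            dist = abs(f_k) / (np.linalg.norm(w_k) + 1e-12)
            if dist < best_dist:                       # closest linearized boundary
                best_dist = dist
                best_step = (abs(f_k) + 1e-4) / (np.linalg.norm(w_k) ** 2 + 1e-12) * w_k
        eta = eta + best_step
    eta_final = ((1 + overshoot) * eta).reshape(x.shape)
    return x + eta_final, eta_final, k0


rng = np.random.default_rng(0)
D, H, K = 2 * 16, 16, 3
params = (rng.normal(size=(H, D)) / np.sqrt(D), np.zeros(H),
          rng.normal(size=(K, H)) / np.sqrt(H), np.zeros(K))
x = rng.normal(size=(2, 16))
x_adv, eta, k0 = deepfool(x, params)
k_adv = int(np.argmax(forward(x_adv.reshape(-1), params)[0]))
print(k0, k_adv, np.linalg.norm(eta))
```

Unlike FGSM and PGD, no $\epsilon$ is passed in: the size of $\eta$ is whatever the loop needs to reach the nearest boundary, which is why the SPR of DeepFool perturbations must be measured after the fact.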

D. Energy of Perturbations
In all the previously discussed perturbation methods, the parameter $\epsilon$ controls the power (or energy) of the perturbations. This $\epsilon$ is sometimes called the adversarial budget [38]. A larger $\epsilon$ allows the perturbations to have a larger impact on the input, which results in a lower classification accuracy on the adversarial dataset, but it also means the adversary requires more energy. To reflect the energy level of the perturbation, we define the Signal-to-Perturbation Ratio (SPR) as the energy ratio between the received signal and the perturbation, $E(x)/E(\eta)$, where $E(x)$ is the average energy of the signal received by the defender before the perturbation is added:
$$E(x) = \frac{1}{N}\sum_{n=1}^{N}\left(x_{I,n}^2 + x_{Q,n}^2\right),$$
where $x_{I,n}$ and $x_{Q,n}$ correspond to the I/Q values contained in the $n$th input sample. $E(\eta)$ is the energy of the perturbation generated by the attacker, excluding the channel impact between the attacker and defender. The relationship between SPR and $\epsilon$ is not available in closed form because the energy of each window of samples varies from one window to another, and the perturbation vector differs for each class of data. As a result, to obtain the SPR as a function of $\epsilon$, we must first generate the perturbations and then compare the average energy of the benign signals and the perturbations. We show such relationships in Figure 3(a)-(b). It can be observed that the SPR drops quickly with $\epsilon$ at first, and this trend slows down when $\epsilon$ is large.

Algorithm 1 DeepFool Attack (Multi-Class Classification)
...
end while
return $\eta = \sum_t \eta_t$
Recent research proposed ML approaches for detecting low-power interference [39], [40]. According to the method in [39], an adversarial signal can be detected when its power is as low as 10 dB below the benign signal. Therefore, the adversarial perturbations remain hidden only if the SPR exceeds 10 dB. According to Figures 3(a)-(b), an SPR > 10 dB corresponds to $\epsilon < 0.25$ and $\epsilon < 0.002$ for the protocol and modulation datasets, respectively.
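The SPR and the 10 dB hiding threshold can be computed empirically as in the sketch below; the function names and the toy window are hypothetical. For the idealized case of a sign-type perturbation in which every I and Q element has magnitude $\epsilon$, $E(\eta) = 2\epsilon^2$, so the SPR in dB falls as $-20\log_{10}\epsilon$; real FGSM, PGD, or DeepFool perturbations on real windows do not follow such a formula exactly, which is why the SPR is measured after generating the perturbations.

```python
import numpy as np


def avg_energy(v: np.ndarray) -> float:
    # Average energy per I/Q pair: mean over windows and samples of I^2 + Q^2.
    # v has shape (2, N) for one window or (M, 2, N) for M windows (axis -2 holds I and Q).
    return float(np.mean(np.sum(v ** 2, axis=-2)))


def spr_db(x: np.ndarray, eta: np.ndarray) -> float:
    # SPR = E(x) / E(eta), reported in dB.
    return 10.0 * np.log10(avg_energy(x) / avg_energy(eta))


rng = np.random.default_rng(0)
N = 512
phase = rng.uniform(0, 2 * np.pi, size=N)
x = np.vstack([np.cos(phase), np.sin(phase)])      # toy benign window with E(x) = 1
eps = 0.1
eta = eps * rng.choice([-1.0, 1.0], size=(2, N))  # sign-type perturbation: E(eta) = 2 eps^2
s = spr_db(x, eta)
print(f"SPR = {s:.2f} dB, hidden (> 10 dB): {s > 10}")  # SPR = 16.99 dB, hidden (> 10 dB): True
```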
Indeed, the range of values for ϵ depends on the specific dataset used.In our experiments, the samples in the two datasets exhibit significantly different amplitudes, as shown in the examples in Figure 4. Thus, for the same ϵ, the impact of the perturbations will be greater on the modulation classifier than on the protocol classifier.This is why we

V. Adversarial Attacks with Limited Knowledge
In this section, we assess the impact of the attacker's knowledge of the defender's classifier on the effectiveness of an AML attack, considering the three aforementioned techniques for generating AML perturbations. A white-box attack (full knowledge) is expected to cause the most degradation in the defender's classification accuracy. What is less clear is how much the attack's effectiveness is reduced, if at all, when the information available to the attacker is limited. Accordingly, we consider scenarios where the attacker possesses only partial information about the defender. We divide such knowledge into classifier and data domains, and consider different levels of knowledge for the attacker in both domains. Under partial knowledge, the attacker's DNN ends up differing from the defender's DNN in structure and/or trainable weights. Even with such differences, our results show that the attack is still effective, but its effectiveness depends on the similarity between the surrogate and defender classifiers. This observation confirms the concept of attack transferability, defined as the ability of an attack generated using one DNN classifier to impact the performance of another DNN classifier [41], [42]. However, the level of transferability is a function of the dissimilarity between the two DNNs. Recent studies [43], [44] corroborate our findings; their authors suggest applying transformations and input diversity during the training of the attacker's DNN to improve attack transferability.

A. Limited Knowledge of Defender's Classifier
We consider realistic scenarios in which the attacker trains a classifier $f_a(x; \theta_a)$ that is not identical to the defender's classifier $f_d(x; \theta_d)$. In this case, the loss can be represented by $L(x^*_d, y_a; \theta_a)$ because the label for the corresponding input needs to be estimated by the attacker's classifier. The difference between $f_a(x; \theta_a)$ and $f_d(x; \theta_d)$ has a direct impact on the loss function. We study the following four levels of the attacker's knowledge and test their impacts on the perturbations.
Attack $A_1$: In this scenario, the attacker knows the hyperparameters of the defender (i.e., the network type, number of hidden neurons, activation functions, etc.), but does not know the exact values of the defender's trained weights. This may result from using different random initializations or different learning rates during training. As a result, the adversary's and defender's classifiers will have different weights and biases even if they have the same classification performance. For our simulations, we use two different sets of random seeds to initialize the two classifiers before training them and keep all other settings the same.
Attack $A_2$: In this attack, the adversary knows the overall structure of the defender's DNN but not its other hyperparameters. For example, the attacker may know that the defender's classifier uses a seven-layer CNN model with Conv1D as the first two layers, but not the number of filters in these layers. For our simulations, we assume that the attacker knows the number of layers, their types, and their order, but not the number of filters in each layer (or the number of units for RNNs).
Attack $A_3$: In this case, the attacker knows the type of classifier that the defender uses (e.g., CNN or RNN), but not its structure. To study this attack, the attacker uses a classifier with a different structure to generate the adversarial perturbations. In some cases, we consider the same type of DNN but with a different number of layers (e.g., we use the three-layer RNN structure $S_1$ for the defender but the two-layer structure $S_3$ for the attacker).
Attack $A_4$: In this attack, the attacker knows nothing about the defender's classifier. The mapping function $f_a$ can differ significantly across classifier types, especially since a CNN represents features differently than an RNN. In this scenario, the attacker uses the RNN structure $S_1$ to generate the adversarial perturbations while the defender uses the CNN structure $S_4$ as its classifier, and vice versa.

B. Limited Knowledge of Defender's Training Data
In a practical wireless setting, the benign samples received by the attacker, $x_a = H_{ta} x_{t'} + n_a$, and those received by the defender, $x_d = H_{td} x_{t'} + n_d$, differ due to the channel impact. The attacker trains its classifier $f_a$ on the dataset $x_a$. Because the training datasets at the attacker and defender differ, the parameters $\theta_a$ and $\theta_d$ will differ. As a result, the adversarial perturbations must be generated with the attacker's own input, label estimate, and parameters, e.g., $\eta = \epsilon\,\mathrm{sign}(\nabla_{x_a} L(x_a, y_a; \theta_a))$ for FGSM. We denote this type of attack as $A_{tr}$.

C. Imperfect Synchronization between Perturbations and Benign Data at Defender
Due to differences in propagation delays, as well as processing delays of benign data at the attacker, the adversary cannot guarantee that its perturbations will be perfectly synchronized with the benign data received by the defender [37]. We study such imperfect synchronization and analyze its impact on the defender's classification performance. In our setup, the defender's waveform is sampled by a fixed-length moving window before being sent to the classifier. Therefore, we consider two situations: intra-window shifts, where the perturbations are misaligned by a fraction of a window, and inter-window shifts, where they are misaligned by one or more whole windows, as shown in Figures 5(a) and 5(b), respectively. Note that our study of the impact of imperfect synchronization does not mean that the defender has any way to control or even estimate the degree of mis-synchronization.
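The difference between intra- and inter-window shifts can be illustrated with a toy perturbation stream in which window $k$ carries the value $k+1$, mimicking an input-dependent attack crafted separately per window. The helper names are hypothetical, and the defender windows are taken as consecutive and non-overlapping for simplicity.

```python
import numpy as np


def shift(seq: np.ndarray, tau: int) -> np.ndarray:
    # Delay a 2 x L perturbation stream by tau >= 0 samples; zeros fill the first tau samples.
    out = np.zeros_like(seq)
    if tau < seq.shape[1]:
        out[:, tau:] = seq[:, : seq.shape[1] - tau]
    return out


def to_windows(seq: np.ndarray, N: int) -> np.ndarray:
    # Cut a 2 x L stream into consecutive, non-overlapping 2 x N windows -> shape (L // N, 2, N).
    M = seq.shape[1] // N
    return seq[:, : M * N].reshape(2, M, N).transpose(1, 0, 2)


def shift_type(tau: int, N: int) -> str:
    if tau == 0:
        return "aligned"
    return "inter-window" if tau % N == 0 else "intra-window"


N = 4
# Toy stream: the attacker crafted value k + 1 for window k; shape (2, 12).
eta = np.repeat(np.arange(1.0, 4.0), N)[None, :].repeat(2, axis=0)
for tau in (0, 2, 4):
    w = to_windows(shift(eta, tau), N)
    print(tau, shift_type(tau, N), w[1, 0].tolist())
# 0 aligned [2.0, 2.0, 2.0, 2.0]
# 2 intra-window [1.0, 1.0, 2.0, 2.0]
# 4 inter-window [1.0, 1.0, 1.0, 1.0]
```

With an intra-window shift, each defender window receives a mixture of two perturbations, each crafted for a different input; with an inter-window shift, it receives one intact perturbation that was crafted for the wrong window.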

D. Incomplete Sequences of Perturbations
The adversary may act intermittently to avoid detection, generating its perturbations for only a fraction of the time, as shown in Figure 5(c). In this setting, the attacker listens to the channel at the beginning of the transmission and then sends only part of the perturbation to be superposed with the benign signal at the defender. For simplicity, we assume that during its active periods, the attacker's perturbations are synchronized with the benign signal.
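An intermittent attack can be modeled by masking the perturbation of each window. In this sketch we assume, as a simplification, that the attacker listens during the first part of the window and transmits only during the trailing `active_fraction` of its samples; the function name is hypothetical.

```python
import numpy as np


def intermittent(eta: np.ndarray, active_fraction: float) -> np.ndarray:
    # Keep only the last `active_fraction` of the window's perturbation; the attacker is
    # assumed to be listening (transmitting nothing) during the beginning of the window.
    N = eta.shape[-1]
    n_active = int(round(active_fraction * N))
    mask = np.zeros(N)
    if n_active > 0:
        mask[N - n_active:] = 1.0
    return eta * mask


eta = 0.1 * np.ones((2, 8))
partial = intermittent(eta, 0.25)
print(partial[0].tolist())  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1]
print(np.sum(partial ** 2) / np.sum(eta ** 2))
```

For a perturbation with constant per-sample energy, transmitting 25% of the time cuts the perturbation energy to one quarter, i.e., it raises the SPR by about 6 dB for the same $\epsilon$.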

E. Limited Energy Ratio between Perturbations and Channel Impact
Previously, we depicted the relationship between the SPR and $\epsilon$ before accounting for channel effects (see Figure 3). We also study the channel impact between the attacker and the defender, assuming AWGN channels. In this case, the total interference received by the defender is the sum of the adversary's perturbations and the channel noise. The Perturbation-to-Noise Ratio (PNR) is introduced to measure the relationship between the transmitted power of the adversarial perturbations and the noise/fading of the channel between the attacker and defender. In Section II, we expressed the perturbations received by the defender as $H_{ad}\eta + n_d$. The PNR, defined as $E(H_{ad}\eta)/E(n_d)$, is averaged over all received baseband I/Q pairs. To evaluate the channel impact between the attacker and defender, we treat the signal received at the defender without attack as benign. Note that the benign signals already include the AWGN between the transmitter and defender. To set the channel noise between the attacker and defender, we use the energy of the benign signals as the reference and vary this noise over several levels. After fixing the channel noise between the attacker and defender, we further vary the perturbations to evaluate their impact under different PNRs. The SNR in the attacker-defender channel is related to the SPR and PNR as SNR = SPR × PNR or, equivalently, SNR [dB] = SPR [dB] + PNR [dB]. Therefore, to keep the perturbations undetectable (SPR > 10 dB), the attacker should operate at a PNR below SNR − 10 dB.
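The identity SNR = SPR × PNR and the resulting PNR limit can be checked numerically. The energies below are hypothetical toy values, and the attacker-defender channel is an AWGN channel with unit gain, so $E(H_{ad}\eta) = |H_{ad}|^2 E(\eta)$.

```python
import numpy as np


def db(r: float) -> float:
    return 10.0 * np.log10(r)


# Hypothetical energies per I/Q pair; AWGN attacker-defender channel with |H_ad| = 1.
E_x, E_eta, E_n = 1.0, 0.05, 0.01
h_ad = 1.0
spr = E_x / E_eta                    # 20  -> 13.01 dB
pnr = abs(h_ad) ** 2 * E_eta / E_n   # 5   ->  6.99 dB
snr = E_x / E_n                      # 100 -> 20.00 dB
print(f"SPR {db(spr):.2f} dB + PNR {db(pnr):.2f} dB = SNR {db(snr):.2f} dB")


def max_pnr_db(snr_db: float, min_spr_db: float = 10.0) -> float:
    # SNR[dB] = SPR[dB] + PNR[dB]; keeping SPR above min_spr_db requires PNR < SNR - min_spr_db.
    return snr_db - min_spr_db


print(max_pnr_db(20.0))  # 10.0
```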

VI. Datasets
For the protocol datasets, the MATLAB WLAN, LTE, and 5G Toolboxes were used to generate signals. Of the various possible features, we use the baseband I/Q samples at the defender (with AWGN) as input to the classifier. I/Q samples are obtained before decoding the signal, providing a rich representation of the actual waveform. The simulated waveforms are divided into multiple sequences of 512 I/Q pairs each by applying a sliding window with a step size of one. Simulated transmissions are sent at the same center frequency over a 20 MHz channel. We consider LTE, Wi-Fi, and 5G NR as the classes of signals, transmitted over an AWGN channel with SNR = 15 dB. The Wi-Fi waveforms are generated as baseband samples of 802.11ac (VHT) with BPSK modulation and a 1/2 coding rate. The LTE waveforms are generated using the downlink reference channel R.9, which uses 64-QAM modulation. We also generate 5G waveforms using the 5G DL FRC with QPSK modulation, a coding rate of 1/3, and a subcarrier spacing of 15 kHz. These sequences form the datasets used to train and test the four protocol classifiers. We generate a dataset of 15,000 inputs, with approximately 5,000 windows for each label (Wi-Fi, LTE, and 5G).
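The AWGN and windowing steps can be sketched as below. The waveform here is a stand-in (random unit-power QPSK symbols), not an actual Wi-Fi, LTE, or 5G NR signal from the MATLAB toolboxes, and the helper names are hypothetical; the sketch only shows how a received waveform becomes $2\times 512$ classifier inputs with a step size of one.

```python
import numpy as np


def add_awgn(sig: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    # Complex AWGN whose power is set relative to the measured signal power.
    p_sig = np.mean(np.abs(sig) ** 2)
    p_noise = p_sig / 10 ** (snr_db / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(sig.shape)
                                    + 1j * rng.standard_normal(sig.shape))
    return sig + noise


def sliding_windows(iq: np.ndarray, N: int = 512, step: int = 1) -> np.ndarray:
    # Cut a complex waveform of length L into windows of N I/Q pairs -> shape (M, 2, N),
    # with M = (L - N) // step + 1; row 0 holds I and row 1 holds Q.
    starts = np.arange(0, len(iq) - N + 1, step)
    idx = starts[:, None] + np.arange(N)[None, :]
    w = iq[idx]
    return np.stack([w.real, w.imag], axis=1)


rng = np.random.default_rng(0)
# Stand-in waveform: random unit-power QPSK symbols.
L = 600
sym = (rng.choice([-1.0, 1.0], L) + 1j * rng.choice([-1.0, 1.0], L)) / np.sqrt(2)
rx = add_awgn(sym, snr_db=15.0, rng=rng)
X = sliding_windows(rx)
print(X.shape)  # (89, 2, 512)
```

With a step size of one, successive windows share 511 of their 512 samples, so a long simulated waveform yields many highly correlated inputs.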
In addition to the 15,000 windows of samples, we also consider a much larger set of 220,000 windows. Specifically, to better illustrate the impact of adversarial perturbations on classification accuracy, we consider the publicly available RML 2016.10a dataset for modulation classification [37]. This dataset comprises noisy I/Q samples for 11 modulation schemes: 8PSK, BPSK, QPSK, QAM16, QAM64, CPFSK, GFSK, PAM4, WBFM, AM-DSB, and AM-SSB. Each modulation scheme is represented by 1,000 windows of samples for each SNR, with the SNR varying from −18 dB to 20 dB in steps of 2 dB. Thus, the RML 2016.10a dataset includes 220,000 windows of samples (20,000 windows per modulation scheme), each consisting of 128 I/Q pairs.

VII. Performance Evaluation
In this section, we evaluate the impact of FGSM, PGD, and DeepFool attacks when the attacker possesses different knowledge levels about the defender. We then test the impact of mis-synchronization, attack persistence, and channel noise, considering FGSM as a representative example. We apply our evaluation to both protocol and modulation datasets.

A. FGSM Attacks
Figure 6 depicts the classification performance at the defender vs. ϵ, considering FGSM attacks on the protocol dataset. As shown in Figure 6(a), the RNN structures S_1-S_3 achieve approximately 91% accuracy under benign AWGN perturbations, whereas the CNN structure S_4 achieves 97% accuracy (dashed lines indicate benign performance). The three RNN structures S_1-S_3 perform comparably but differ in their bidirectional LSTM designs. The accuracy of all four classifiers drops as we increase the budget of the adversarial FGSM perturbations via ϵ. Note that these are white-box attacks, in which the adversary can cause the most damage. We also observe that structure S_1 has the highest average accuracy over all ϵ settings among the three proposed RNN models. Therefore, we use structure S_1 as the representative RNN in the following evaluations.
After evaluating the white-box attacks, we consider attack scenarios where the attacker has incomplete knowledge (as described in Section V) of the defender's classifier and/or the training dataset used by the defender. The accuracy for the RNN (i.e., structure S_1) is shown in Figure 6(b). The impact of attack A_1 is close to that of the white-box attack. This result is expected because the attacker uses the same hyperparameters as the defender: although the two classifiers are trained with different random seeds, the attacker's model still inherits most of the properties of the defender's. Attack A_2 exchanges the numbers of filters in the first two layers, and attack A_3 uses one fewer layer (e.g., removing the third layer of structure S_1) at the attacker. Both attacks lead to similar defender accuracy, which suggests that these hyperparameters are relatively important for generating adversarial perturbations. Attack A_4 has the weakest effect because the attacker applies the CNN structure S_4 to generate adversarial signals for the RNN model (i.e., the adversary does not know the structure of the defender). Even though both classifier types can classify the waveforms accurately, the trained models differ significantly from each other. Therefore, a perturbation crafted for the CNN may not achieve the expected effect on the RNN. Attack A_tr uses a different training dataset to generate the perturbations and thus shows more variance than the other attacks. It follows a trend similar to that of attacks A_2 and A_3, but its decline slows once ϵ exceeds 0.15.
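To make the white-box versus limited-knowledge distinction concrete, the following minimal Python sketch implements FGSM, x_adv = x + ϵ·sign(∇_x L(x, y)), against a toy linear softmax classifier that stands in for the RNN and CNN models. This substitution is an assumption made so that the input gradient has the closed form W^T(p - e_y) and no autodiff library is needed; all names and weights are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return z.index(max(z))


def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input: W^T (p - e_y)."""
    p = softmax(logits(W, b, x))
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]


def fgsm(W, b, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x L(x, y))."""
    g = input_gradient(W, b, x, y)
    return [xi + eps * ((gj > 0) - (gj < 0)) for xi, gj in zip(x, g)]


# Toy 3-class "defender" on 4-dimensional inputs (e.g., two flattened I/Q pairs).
W_def = [[1.0, 0.5, -0.2, 0.1], [-0.3, 1.0, 0.4, -0.5], [0.2, -0.6, 1.0, 0.3]]
b_def = [0.0, 0.0, 0.0]
x = [0.3, 0.1, 0.0, 0.05]
y = predict(W_def, b_def, x)

# White-box: the gradient is taken through the defender's own weights.
x_adv = fgsm(W_def, b_def, x, y, eps=0.2)
print(y, predict(W_def, b_def, x_adv))  # 0 2
```

In this sketch, a limited-knowledge attacker (A_1 to A_4 or A_tr) would call `fgsm` with its own surrogate weights in place of `W_def` and `b_def`, while the resulting `x_adv` is still evaluated by the defender's `predict(W_def, b_def, ...)`; the attack transfers only to the extent that the surrogate's gradient signs match the defender's.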
The accuracy of the CNN (i.e., structure S_4) is shown in Figure 6(c). Similar to the RNN observations, the attack's impact depends heavily on the adversary's level of knowledge about the defender. In the simulation, attack A_2 exchanges the numbers of filters in the two Conv1D layers, and attack A_3 removes the second Conv1D layer at the attacker side. Compared to the RNN, the layer and filter number
We then show the impact of FGSM under white-box attacks using the RML 2016.10a dataset. We use VT-CNN2 as the benchmark classifier for the defender. The adversarial budget ϵ varies from 0.00025 to 0.005. As ϵ increases, the perturbations have higher power (i.e., lower SPR) and further reduce the defender's accuracy. We evaluate the defender under different SNRs and summarize the results in Figure 7(a). In addition to the white-box attack, we study FGSM perturbations under four limited-knowledge attacks. To keep the energy of the perturbation low, we take FGSM attacks with ϵ = 0.001 as an example. As shown in Figure 7(b), limited-knowledge attacks A_1 and A_2 yield accuracy close to that of the white-box attack. This result suggests that a small structural change may not substantially affect FGSM adversarial signals for VT-CNN2 on the RML 2016.10a dataset. However, when the attacker's knowledge is further reduced, the impact of FGSM becomes weaker (attacks A_3 and A_4). This indicates that the attack can be significantly weakened if the attacker's knowledge falls below a certain level. Nevertheless, these imperfect-knowledge attacks are still stronger than AWGN with equivalent power.
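Because every FGSM component has magnitude ϵ (wherever the gradient is nonzero), the perturbation power is ϵ² per real component, so the SPR drops by 20·log10(2) ≈ 6 dB each time ϵ doubles. The helper below illustrates this, assuming SPR is the ratio of average signal power to average perturbation power computed over the real I and Q components; the signal and the alternating-sign perturbation are synthetic stand-ins.

```python
import math


def avg_power(v):
    """Average power of a real-valued vector (e.g., flattened I and Q)."""
    return sum(c * c for c in v) / len(v)


def spr_db(signal, perturbation):
    """Signal-to-perturbation ratio in dB."""
    return 10.0 * math.log10(avg_power(signal) / avg_power(perturbation))


def sign_perturbation(n, eps):
    """Stand-in for an FGSM perturbation: every component is +eps or -eps."""
    return [eps if i % 2 == 0 else -eps for i in range(n)]


# Toy flattened window of 128 I/Q pairs (256 real values).
signal = [0.01 * math.cos(0.1 * n) for n in range(256)]
for eps in (0.00025, 0.0005, 0.001):
    print(eps, round(spr_db(signal, sign_perturbation(len(signal), eps)), 2))
```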

B. PGD Attacks
Because the CNNs and RNNs have similar accuracy trends under FGSM attacks, as shown in Figure 6(b) and 6(c), we use the RNN (structure S_1) as the classifier for the protocol dataset for the remaining attack schemes. We first study the impact of the step size and the maximum number of iterations on PGD-based perturbations. Under the white-box attack, we test the classification accuracy of the defender's classifier while fixing ϵ = 0.15. Recall that the PGD attack is computed over multiple projected gradient steps (iterations) and is parameterized by ϵ and α. The parameter ϵ regulates the power budget (as in FGSM), whereas α controls the step size. The step size α can be chosen from a wide range because the projection in PGD always maps the perturbed signal back into the ϵ-constraint set, as described in Section IV.B. While a larger ϵ can strengthen the attack, a larger α does not guarantee a stronger attack, as was also observed in [45], [46]. Using the CIFAR-10 dataset, Croce and Hein [45] showed that when α is twice the value of ϵ, the PGD attack becomes weaker than when using smaller values of α. Figure 8(a) depicts the classification accuracy under PGD perturbations versus the number of iterations for three values of α with ϵ = 0.15 (protocol dataset). The defender's accuracy does not decrease when α goes from 0.2 to 0.3 because α is already larger than ϵ. We observe that PGD with α = 0.1 achieves the lowest accuracy after ten iterations. Accordingly, we chose α = 0.1 for PGD and evaluated this attack for different values of ϵ. Figure 8(b) shows the defender's classification accuracy under FGSM and PGD attacks. PGD attacks are stronger than FGSM attacks when ϵ ranges from 0.05 to 0.3. These results suggest that PGD may be more effective at generating adversarial perturbations.
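The interplay between ϵ and α can be illustrated with a minimal L∞ PGD sketch. It assumes the common sign-gradient variant (a step of size α along sign(∇_x L), followed by clipping to the ϵ-ball around the clean input), which may differ in detail from the formulation in Section IV.B, and it uses a toy linear softmax classifier as a stand-in for the defender. All names and weights are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def input_gradient(W, b, x, y):
    """Cross-entropy gradient w.r.t. x for logits z = W x + b: W^T (p - e_y)."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    p = softmax(z)
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]


def pgd(W, b, x, y, eps, alpha, steps):
    """L-infinity PGD: `steps` sign-gradient ascent steps of size alpha, each
    followed by projection (clipping) onto the eps-ball around the clean x."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(W, b, x_adv, y)
        stepped = [xa + alpha * ((gj > 0) - (gj < 0)) for xa, gj in zip(x_adv, g)]
        # Projection: keep each component within [x_j - eps, x_j + eps].
        x_adv = [min(max(s, xj - eps), xj + eps) for s, xj in zip(stepped, x)]
    return x_adv


# A toy 3-class linear classifier stands in for the RNN/CNN defender.
W = [[1.0, 0.5, -0.2, 0.1], [-0.3, 1.0, 0.4, -0.5], [0.2, -0.6, 1.0, 0.3]]
b = [0.0, 0.0, 0.0]
x, y = [0.3, 0.1, 0.0, 0.05], 0

x_pgd = pgd(W, b, x, y, eps=0.15, alpha=0.1, steps=10)
print(max(abs(a - c) for a, c in zip(x_pgd, x)))  # at most eps = 0.15 (up to float rounding)
```

With this sign-step variant, any α ≥ 2ϵ moves each coordinate (with a nonzero gradient component) from anywhere inside [-ϵ, ϵ] to the boundary or beyond, so every clipped iterate lands on a corner of the ϵ-ball and the iterations can only hop between corners rather than refine the perturbation. This is one way to see why a larger α does not guarantee a stronger attack.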
Comparable trends are observed for VT-CNN2 on the RML 2016.10a dataset. To ensure that PGD attacks produce perturbations with limited energy, we fix ϵ = 0.0025 and vary α from 0.001 to 0.005. Figure 8(c) shows the classification accuracy for different values of α. When α is close to ϵ, the number of iterations T has a visible impact on the effectiveness of the PGD attack, particularly when T increases from 1 to 2. After a few iterations, the impact becomes less significant. Similar trends are observed for the other two small values of α. Moreover, we evaluate the accuracy of the defender's classifier under attack as a function of ϵ when the testing SNR is 16 dB, as shown in Figure 8(d). Both FGSM and PGD are quite effective in degrading the defender's classification accuracy. As expected, the accuracy goes down with larger ϵ. Because PGD is iterative, it can generally degrade the classification accuracy more than the single-step FGSM attack. We compare the impact of FGSM and PGD attacks in Figure 7(a), where we allow PGD a sufficient number of iterations. When ϵ is very small, PGD and FGSM have a similar impact on classification performance. As ϵ increases, the difference initially becomes more pronounced: in our case, the accuracy gap between PGD and FGSM grows as ϵ increases from 0.00025 to 0.001, but shrinks beyond that point.
In addition to PGD under white-box attacks, we evaluate limited-knowledge adversaries for the protocol dataset in Figure 9 and for the RML 2016.10a dataset in Figure 10. Figure 9 compares PGD attacks (α = 0.1, T = 20) at different knowledge levels with the AWGN attack. The PGD-based attacks significantly impact the defender's classifier when we allow larger ϵ values. As with FGSM, the limited-knowledge PGD attacks have a weaker impact. Attacks A_1 and A_tr come closer to the white-box attack than the other attacks because their knowledge is closest to that of the defender. Attacks A_2 and A_3 lead to similar defender accuracy, which is consistent with the FGSM results in Figure 6(b).
In Figure 10, we explore PGD attacks with ϵ = 0.001, α = 0.01, and T = 20 under different SNRs for the RML 2016.10a dataset. We observe that the attacks become weaker as the attacker's knowledge of the defender decreases, similar to FGSM in Figure 7(b).

C. DeepFool Attacks
We first compare FGSM and DeepFool in terms of the defender's accuracy and SPR, assuming a white-box attack.
A range of ϵ values is considered for FGSM. DeepFool is not parameterized by ϵ, so it has only one entry in Table 1. From this table, we observe that FGSM with a larger ϵ further reduces the defender's accuracy but requires more energy (lower SPR). This observation is in line with those in [30], [33]. FGSM with ϵ = 0.2 has the SPR closest to DeepFool's. Therefore, in
From the point of view of interference power, DeepFool-based perturbations exhibit larger fluctuations in their SPR. Specifically, in attacks A_2, A_3, and A_4, DeepFool perturbations exhibit lower SPR than their FGSM counterparts but are still less effective than FGSM at degrading the accuracy. One explanation for this observation is that DeepFool computes the gradients for all possible labels and, at each step, updates the perturbation in the direction of the closest decision boundary among these labels. However, the estimated boundaries between labels depend heavily on the outputs of the defender's classifier, which the attacker knows only partially. As a result, the attacker's imperfect knowledge can weaken DeepFool more than FGSM.
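The closest-boundary step that DeepFool takes can be written in closed form for an affine classifier. The sketch below is a minimal multiclass DeepFool for a toy linear model f(x) = Wx + b, used as a stand-in for the defender (the names and the overshoot value of 0.02 are assumptions): at each iteration it computes the L2 distance |f_k - f_k0| / ||w_k - w_k0|| to every other class boundary and steps onto the nearest one.

```python
import math


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def argmax(v):
    return max(range(len(v)), key=v.__getitem__)


def deepfool_linear(W, b, x, overshoot=0.02, max_iter=50):
    """DeepFool for an affine multiclass classifier f(x) = W x + b.

    Each iteration measures the L2 distance to every other class boundary,
    picks the closest one, and accumulates the minimal step that reaches it.
    Returns (x_adv, r_total). Assumes no two rows of W are identical.
    """
    k0 = argmax(logits(W, b, x))
    r_total = [0.0] * len(x)
    x_adv = list(x)
    for _ in range(max_iter):
        z = logits(W, b, x_adv)
        if argmax(z) != k0:
            break  # the label has flipped
        best = None
        for k in range(len(z)):
            if k == k0:
                continue
            w_diff = [wk - w0 for wk, w0 in zip(W[k], W[k0])]
            f_diff = z[k] - z[k0]
            norm_sq = sum(v * v for v in w_diff)
            dist = abs(f_diff) / math.sqrt(norm_sq)
            if best is None or dist < best[0]:
                best = (dist, f_diff, w_diff, norm_sq)
        _, f_diff, w_diff, norm_sq = best
        # Minimal L2 step onto the chosen boundary (tiny constant avoids stalling).
        step = [(abs(f_diff) + 1e-8) / norm_sq * v for v in w_diff]
        r_total = [r + s for r, s in zip(r_total, step)]
        x_adv = [xi + (1 + overshoot) * r for xi, r in zip(x, r_total)]
    return x_adv, r_total


W = [[1.0, 0.5, -0.2, 0.1], [-0.3, 1.0, 0.4, -0.5], [0.2, -0.6, 1.0, 0.3]]
b = [0.0, 0.0, 0.0]
x = [0.3, 0.1, 0.0, 0.05]
x_adv, r = deepfool_linear(W, b, x)
print(argmax(logits(W, b, x)), argmax(logits(W, b, x_adv)),
      round(math.sqrt(sum(v * v for v in r)), 4))
```

Because the boundary geometry comes entirely from W, an attacker running this routine on a surrogate model computes the nearest boundary of the wrong classifier, which is consistent with the explanation above for why imperfect knowledge weakens DeepFool more than FGSM.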
We further consider DeepFool attacks on VT-CNN2 with the RML 2016.10a dataset and show the limited-knowledge attacks over all SNRs. As shown in Figure 11

D. Impact of Synchronization
We evaluate the accuracy of the defender under intra- and inter-window shifted perturbations to simulate imperfect synchronization. The results on the protocol dataset are shown in Figure 12(a) and (b). Both intra- and inter-window shifts weaken the FGSM attack; however, the shifted FGSM attacks still degrade performance more than AWGN. In the equivalent AWGN attack, the attacker transmits AWGN instead of the FGSM perturbation, where the AWGN has the same energy as the FGSM perturbation for the given ϵ. The intra-window shifted attack is weakened considerably even by a shift of a single sample, as shown in Figure 12(a). Larger shifts between the signals and perturbations further reduce the attack's effectiveness until the shift reaches around 100 samples. Similarly, the first few steps of the inter-window shift have the most significant impact on the attack, as shown in Figure 12(b).
When the shift reaches around 50 windows, the effect of the shifted attack starts to converge. In an actual attack, the attacker cannot control such synchronization; however, our results can serve as a reference point for understanding the impact of asynchronization and for estimating the defender's accuracy from the attacker's perspective. We further evaluate the impact of perturbation shifts on the RML 2016.10a dataset. We train classifiers with data over all SNRs and analyze the performance on testing data at different SNRs. We consider testing data at the highest SNR (18 dB) as an example scenario to show the impact of synchronization and, later, of completeness and channel effects. Figure 12(d) and (e) show the impact of synchronization for the RML dataset. We consider a smaller range of sample shifts than for the protocol classification dataset because the window length of the RML dataset is 128 rather than 512. Similar to the protocol classification results, the first few shift steps considerably reduce the attack strength for the intra-window shift, as shown in Figure 12(d). The shifted perturbations perform comparably once the shift exceeds ten samples. The FGSM attack with low ϵ (e.g., ϵ = 0.001) has an effect similar to that of the equivalent AWGN attack when the intra-window shift is greater than ten samples. For the inter-window shift shown in Figure 12(e), the effectiveness of the FGSM attack is reduced even with a one-window shift. However, further shifts do not weaken the attack more. This is because the testing data in the RML 2016.10a dataset are shuffled by default, so an inter-window shift larger than one step has the same effect as a random-step shift: since the original order of the windows is unknown, inter-window shifted perturbations are effectively shuffled. Overall, FGSM attacks with larger ϵ suffer less from both intra-window and inter-window shifts.
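The two kinds of misalignment can be modeled as simple index shifts of the perturbation stream. The following sketch is one plausible model, not necessarily the exact procedure used in our simulations: an intra-window shift delays the concatenated perturbation stream by a number of samples, and an inter-window shift applies the perturbation crafted for window i to window i + shift. Function names are hypothetical.

```python
def intra_window_shift(perturbations, shift):
    """Delay the concatenated perturbation stream by `shift` samples.

    `perturbations` is a list of equal-length windows of complex samples.
    Hypothetical model: each window receives the tail of the previous window's
    perturbation followed by the head of its own; the first window is zero-padded.
    """
    n = len(perturbations[0])
    stream = [s for window in perturbations for s in window]
    shift = min(shift, len(stream))
    delayed = [0j] * shift + stream[:len(stream) - shift]
    return [delayed[i:i + n] for i in range(0, len(delayed), n)]


def inter_window_shift(perturbations, shift):
    """Apply the perturbation crafted for window i to window i + shift;
    the first `shift` windows receive no perturbation."""
    n = len(perturbations[0])
    k = min(shift, len(perturbations))
    return [[0j] * n for _ in range(k)] + perturbations[:len(perturbations) - k]


def superpose(signal_windows, perturbation_windows):
    """Received windows = signal + (possibly misaligned) perturbation."""
    return [[s + p for s, p in zip(sw, pw)]
            for sw, pw in zip(signal_windows, perturbation_windows)]


pert = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(intra_window_shift(pert, 1))
print(inter_window_shift(pert, 1))
```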

E. Impact of Completeness
In an ideal attack, the attacker can continuously transmit a stream of perturbations that is superposed on the defender's signal. However, the attack can be stealthier if the attacker sends the perturbation intermittently. The impact of perturbation completeness for the protocol dataset is summarized in Figure 12(c). The attack can still be effective even after losing some perturbation samples, especially when fewer than 50 samples are missing. As more perturbation samples are missing, the attack becomes weaker. Nevertheless, the incomplete attack with 300 missing samples is still stronger than the equivalent AWGN attack (dashed lines). Since the full window length is 512 samples for the protocol classification problem, this indicates that the AML attack remains effective even when more than half of the perturbation is missing. The impact of completeness for the RML 2016.10a dataset is summarized in Figure 12(f). Both the AWGN and FGSM attacks are impaired by truncation. The impairment grows nearly linearly with the number of missing samples once more than ten samples are missing. Even though the FGSM attack degrades as samples are lost, it can still be more powerful than the AWGN attack with equivalent energy.
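An intermittent attack and its equivalent-energy AWGN baseline can be sketched as follows. This is a minimal illustration under stated assumptions: the missing samples are taken to be the tail of each 512-sample window and are zeroed, and the AWGN baseline is complex Gaussian noise rescaled to the perturbation's energy. Names are hypothetical.

```python
import math
import random


def energy(x):
    return sum(abs(v) ** 2 for v in x)


def truncate(perturbation, missing):
    """Zero out the last `missing` samples of a perturbation window.

    Hypothetical model of an intermittent attack; which samples are lost
    (tail, head, or random positions) is an assumption here.
    """
    missing = min(missing, len(perturbation))
    return perturbation[:len(perturbation) - missing] + [0j] * missing


def equivalent_awgn(perturbation, rng):
    """Complex Gaussian noise rescaled to the same total energy."""
    noise = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in perturbation]
    scale = math.sqrt(energy(perturbation) / energy(noise))
    return [scale * v for v in noise]


rng = random.Random(0)
eps = 0.001
pert = [complex(eps, -eps)] * 512          # stand-in constant-modulus perturbation
noise = equivalent_awgn(pert, rng)
kept = energy(truncate(pert, 300)) / energy(pert)
print(round(kept, 4))                      # 212 / 512 of the energy remains
```

Since both attacks are impaired by truncation, a fair comparison applies `truncate` to the rescaled noise as well.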

F. Channel Impact
The efficacy of an AML attack depends on both the channel type (e.g., Rayleigh fading vs. AWGN) and the channel conditions. We assume that all three channels (Tx-attacker, Tx-defender, and attacker-defender) are AWGN, and we evaluate the effect of the channel conditions between the attacker and the defender. To do so, we first obtain the power of the received (benign) signal at the input to the attacker, based on the power of the transmitted benign signal and the given SNR of the Tx-attacker channel (SNR_T-A). For the protocol dataset, we set SNR_T-A to 15 dB during training and testing. For the RML 2016.10a dataset, AWGN is already embedded in the signal at different SNR_T-A values, so during training we average over all samples in the dataset (across all SNR_T-A values) to determine the average power of the received benign signal. Testing of the modulation classifier is done at SNR_T-A = 18 dB (the highest SNR in the RML dataset). For both protocol and modulation classification, let β denote the ratio between the (average) power of the incoming signal at the attacker and the noise power of the attacker-defender channel. For a fixed β (hence, a fixed noise power E_n of the attacker-defender channel), we vary the power of the perturbations by varying the PNR. Recall that the 'N' in PNR refers to the AWGN of the attacker-defender channel. Figure 13 of the attack. This can be observed for all values of β.
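The power bookkeeping behind β and PNR can be sketched as below. Under the definitions above, E_n = P_s / β and P_pert = PNR · E_n in linear units, so the resulting SPR is β - PNR in dB, assuming the reference signal power P_s is the same at the attacker and the defender. The function names and the normalized signal power are illustrative assumptions.

```python
import math


def avg_power(x):
    return sum(abs(v) ** 2 for v in x) / len(x)


def db_to_lin(db):
    return 10 ** (db / 10)


def noise_power_from_beta(signal_power, beta_db):
    """beta = (avg. power of the signal received at the attacker) / E_n."""
    return signal_power / db_to_lin(beta_db)


def scale_to_pnr(perturbation, noise_power, pnr_db):
    """Rescale a perturbation so its average power is PNR (dB) relative to E_n."""
    target = noise_power * db_to_lin(pnr_db)
    s = math.sqrt(target / avg_power(perturbation))
    return [s * v for v in perturbation]


signal_power = 1.0                              # normalized received signal power
E_n = noise_power_from_beta(signal_power, beta_db=0.0)
pert = scale_to_pnr([complex(1, -1)] * 128, E_n, pnr_db=-5.0)
pnr = 10 * math.log10(avg_power(pert) / E_n)
spr = 10 * math.log10(signal_power / avg_power(pert))
print(round(pnr, 6), round(spr, 6))             # -5.0 5.0
```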
Another key observation is that for small to medium values of β, the attack is still significant even at small PNR values. For example, when β = 0 dB (a very noisy attacker-defender channel relative to the power of the received benign signal) and PNR = −5 dB (perturbation power 5 dB below the attacker-defender noise power), the classification accuracy is about 20% for both the protocol and modulation classifiers. Even at lower PNR values (e.g., −10 and −15 dB for the protocol classifier), the attack is still significant. For both the protocol and RML 2016.10a datasets, we consider β from 0 dB to 15 dB in steps of 5 dB. The defender's accuracy decreases as PNR increases for all values of β. When β is low, the channel noise between the attacker and defender can degrade the classification accuracy even with slight perturbations; this channel noise can be regarded as a traditional jamming attack. In Figure 13(a), such noise causes the defender's accuracy to drop to around 70%. As β increases, the channel noise decreases relative to the signal, and the defender's accuracy rises. For example, when the PNR is around −10 dB, the defender's accuracy is higher for larger β. As shown in Figure 13(b), when β = 15 dB, the defender's accuracy is 80%, which aligns with its accuracy on benign data.
As previously mentioned, the authors in [29] modified the FGSM attack and evaluated its performance under different SNR values. Their study was conducted using the RML 2016.10a dataset and the VT-CNN2 classifier, assuming an AWGN channel and a white-box attack. In their results (Figure 2 in [29]), the defender's accuracy dropped to 0% when SNR = 10 dB and PNR = 0 dB. Our results in Figure 13(b) show that the unmodified FGSM attack reduces the defender's accuracy to around 40% when β = 10 dB and PNR = 0 dB. This implies that even the unmodified FGSM algorithm can significantly reduce the defender's accuracy, although not to the level achieved by the ϵ-adaptation approach in [29]. Our findings on FGSM are aligned with other works, e.g., [31], which also showed the efficacy of the original FGSM attack. Note that channel information may be leveraged to design very effective (channel-dependent) AML attacks, as done in [32], [33]. However, even when the technique used to generate the perturbations is channel-agnostic (e.g., the classical FGSM), our results show that the attack is still impactful over a wide range of SNR and SPR values.
It is important to note that different studies in the literature were conducted under different simulation settings: some are rooted in hardware experiments, while others consider specific channel models and types of attacks (e.g., UAP attacks). For instance, the study of channel effects in [32]-[34] is based on a Rayleigh fading model, whereas our study considers an AWGN channel. Intuitively, the success of an attack depends on both the channel model (e.g., AWGN vs. Rayleigh fading) and the channel conditions. These disparities can lead to variations in AML attack efficacy. For instance, unmodified AML attacks might become less effective in a fading channel; however, their potency increases if the attacker and defender are in close proximity. Consequently, a meaningful comparison requires a similar setup.

VIII. Defense Against Adversarial Attacks
In this section, we investigate several defense mechanisms against AML attacks. First, we provide a summary of related work on this topic.

A. Related Work on Defense Mechanisms
Recently, several defenses have been proposed against AML attacks on DNN models [47]-[53]. Olowononi et al. [47] presented an encryption mechanism to hide the DNN's internal weights, parameters, and training data from an adversary. They also presented three techniques to improve the defender's robustness: input pre-processing, adversarial training, and post-processing. He et al. [48] evaluated adversarial training, randomization, defensive distillation, and gradient masking as defenses against adversarial attacks. Adesina et al. [49] presented statistical approaches that monitor metrics such as the peak-to-average power ratio (PAPR), the distribution of the DNN classifier's softmax outputs, and the median absolute deviation (MAD) of the data to detect adversarial signals. They also evaluated the efficacy of adversarial training and randomization in mitigating AML attacks. Of the various defense mechanisms proposed in the literature, adversarial training remains one of the most robust [54], [55]. Moreover, some methods in [47] and [48] may not be effective for broadcast RF signals due to their vulnerability to eavesdropping. Accordingly, we present a novel adversarial training approach to improve the robustness of protocol and modulation classifiers.
Several new defense mechanisms have recently been proposed in the literature (e.g., [26], [45], [56]), but they have often been countered by more potent attacks capable of bypassing them. In principle, certified defenses (CDs) ensure that a given classifier is robust to adversarial perturbations as long as these perturbations are constrained by a given bound. The authors in [57]-[59] proposed CD mechanisms that offer robustness guarantees against norm-bounded attacks. Recent research employs techniques such as convex outer approximation [57], semi-definite relaxation [58], and differential privacy [59] to efficiently determine upper bounds on the worst-case loss. Randomized smoothing, a prevalent CD method [60], [61], adds noise to the input data and uses statistical approaches to measure the model's resilience to perturbations, providing probabilistic guarantees on its resistance to bounded perturbations. The widespread adoption of CD stems from the simplicity and effectiveness of this approach across diverse models and input variations. Lipschitz-based methods [62], [63] are variants of CD that have also gained attention. These methods focus on regulating the network's Lipschitz constant, a measure of a function's sensitivity to input changes. By ensuring small output variations in response to slight input perturbations, these methods train networks to be inherently stable and robust. Despite the versatility of CD techniques against various attacks, their practical use in the wireless domain is limited because it is difficult to establish a meaningful bound on the attacker's perturbations, which undermines the efficacy of these techniques.
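Randomized smoothing, as summarized above, can be sketched as a Monte Carlo majority vote of a base classifier under Gaussian input noise. The sketch below is illustrative only: the base classifier is a hypothetical linear rule, and the reported radius σ·Φ⁻¹(p̂) uses the empirical top-class frequency p̂, whereas a sound certificate would use a lower confidence bound on the top-class probability.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(classify, x, sigma, n_samples=1000, seed=0):
    """Randomized smoothing by Monte Carlo: majority vote of the base
    classifier `classify` over inputs x + N(0, sigma^2 I).

    Returns (label, radius) with radius = sigma * Phi^{-1}(p_hat), where p_hat
    is the empirical top-class frequency. Illustrative only: a sound
    certificate replaces p_hat with a lower confidence bound.
    """
    rng = random.Random(seed)
    votes = Counter(
        classify([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    p_hat = min(count / n_samples, 1.0 - 0.5 / n_samples)  # keep the quantile finite
    radius = sigma * NormalDist().inv_cdf(p_hat) if p_hat > 0.5 else 0.0
    return label, radius


def base(v):
    """Hypothetical base classifier: class 1 if the components sum to > 0."""
    return int(sum(v) > 0)


label, radius = smoothed_predict(base, [0.5, 0.5], sigma=0.25)
print(label, round(radius, 3))
```

For this linear base classifier, the true L2 distance from [0.5, 0.5] to the decision boundary is 1/√2 ≈ 0.707, and the Monte Carlo radius should land in the same neighborhood, up to sampling error.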

B. Adversarial Training
Adversarial training [23], in which a network is trained on adversarial examples, is one of the few defenses that withstand strong adversarial attacks. Instead of computing the loss only on a benign input x, the loss of the retrained defender classifier is computed on both benign and adversarial inputs, as follows:
$$\tilde{L}(x, y; \theta) = \gamma L(x, y; \theta) + (1 - \gamma) L(x_{adv}, y; \theta), \qquad (3)$$
where x_adv denotes the adversarially perturbed version of x and γ ∈ [0, 1] weights the two terms. The key idea behind this strategy is to increase the model's robustness by ensuring that the model predicts the same class for legitimate and perturbed examples. We consider the same attack generation method described previously: the defender first trains a DNN, denoted DNN_naive, using benign data; the attacker then steals the DNN's structure, including all the weights and biases. In our defense mechanism, the defender uses DNN_naive to generate its own AML perturbations and combines them with benign data to retrain its DNN. The retraining dataset consists of the original and the self-perturbed data, which constitutes a data augmentation relative to the DNN_naive training. The retrained DNN is denoted DNN_defense. To balance the impact of benign and adversarial data (i.e., the losses for both types of data), we use the same number of samples for both parts. As a result, the weighting parameter γ is 0.5, and the retrained DNN can achieve relatively good accuracy on both benign and perturbed data.
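The three-step procedure (train DNN_naive on benign data, perturb the training set with DNN_naive, retrain on the union) and the loss in Eq. (3) can be sketched with a logistic-regression stand-in for the DNN. This is a minimal sketch under assumptions: FGSM is the perturbation generator, ϵ = 0.3 and the two-cluster toy data are arbitrary, and the DNN_defense analogue is obtained by fine-tuning from the naive weights. With equal numbers of benign and adversarial samples, the mean loss over their union equals Eq. (3) with γ = 0.5.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bce(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one example."""
    p = min(max(sigmoid(score(w, b, x)), 1e-12), 1 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def mean_loss(w, b, data):
    return sum(bce(w, b, x, y) for x, y in data) / len(data)


def mixed_loss(w, b, benign, adversarial, gamma=0.5):
    """Eq. (3): gamma * L(x, y) + (1 - gamma) * L(x_adv, y), using batch means."""
    return gamma * mean_loss(w, b, benign) + (1 - gamma) * mean_loss(w, b, adversarial)


def fgsm(w, b, x, y, eps):
    """FGSM for logistic regression; grad_x of the BCE is (p - y) * w."""
    err = sigmoid(score(w, b, x)) - y
    return [xi + eps * ((err * wi > 0) - (err * wi < 0)) for xi, wi in zip(x, w)]


def train(data, w, b, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean BCE over data = [(x, y), ...]."""
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b


rng = random.Random(1)
benign = [([rng.gauss(2 * y - 1, 0.7), rng.gauss(2 * y - 1, 0.7)], y)
          for y in (0, 1) for _ in range(100)]

# Step 1: DNN_naive analogue, trained on benign data only.
w_naive, b_naive = train(benign, [0.0, 0.0], 0.0)
# Step 2: the defender perturbs its own training data with its naive model.
adversarial = [(fgsm(w_naive, b_naive, x, y, eps=0.3), y) for x, y in benign]
# Step 3: DNN_defense analogue; equal set sizes make the union mean equal Eq. (3), gamma = 0.5.
w_def, b_def = train(benign + adversarial, w_naive, b_naive)

print(round(mixed_loss(w_naive, b_naive, benign, adversarial), 4),
      round(mixed_loss(w_def, b_def, benign, adversarial), 4))
```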
One important aspect of adversarial training is setting the parameters of the AML generator; for FGSM, this is the value of ϵ. First, we consider a scenario where the defender uses a fixed value of ϵ, irrespective of the ϵ used during the attack (testing) phase. Using the RML dataset and the VT-CNN2 network as a basis, we study the classification accuracy of the defender's DNN (DNN_defense) in three scenarios: (1) DNN_defense is trained using benign data only (i.e., DNN_defense and DNN_naive are identical); (2) DNN_defense is retrained using a combination of benign and AML data, where the FGSM perturbations used for retraining are produced with ϵ = 0.005; and (3) DNN_defense is retrained using a combination of benign and AML data, where the FGSM perturbations used for retraining are produced with the same ϵ used by the adversary during the test phase. In the second scenario, the choice of ϵ = 0.005 is motivated by our interest in a reasonably small ϵ that yields high SPR values, i.e., stealthy attacks. The third scenario reflects the best-case performance of the defender, as it requires the defender to know the specific ϵ used by the attacker, which is hard to obtain in a real attack.
Figure 14 shows the defender's classification performance in the three scenarios for different values of ϵ used in the attacker's AML perturbations, i.e., the ϵ of the test dataset. In scenario one (blue bars), the higher the ϵ of the attacker's AML data, the stronger the attack and, hence, the worse the performance of the defender's classifier. Scenario two is shown by the orange bars. Interestingly, including AML perturbations in the defender's training dataset improves the defender's classification performance only when the ϵ used by the attacker is close enough to the ϵ = 0.005 used in the defender's AML training dataset (the accuracy increases from the blue to the orange bars). To improve on the performance obtained with benign-only training data, the defender need not pinpoint the attacker's ϵ exactly; a coarse estimate of ϵ is sufficient. For example, scenario two outperforms scenario one when ϵ = 0.0025 and ϵ = 0.001. This is because ϵ in the FGSM attack only affects the energy of the perturbation: the perturbation vectors generated with ϵ = 0.0025 and ϵ = 0.005 point in the same direction but have different scales. Adversarial samples with high ϵ help the DNN learn the direction of the perturbations, which also improves the accuracy for lower ϵ. When the ϵ of the defender's training set differs significantly from the ϵ of the attacker's test set, scenario two can perform worse than scenario one, i.e., the AML training data poisons the original (benign) dataset. In scenario three (green bars), we assume that the defender and attacker use the same value of ϵ. This is a strong assumption, since it is hard for the defender to know the attacker's ϵ in advance. However, the results indicate that the classification performance can be improved if the defender can estimate ϵ.
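The claim that FGSM perturbations for different ϵ share a direction follows from the form ϵ·sign(∇_x L): for a fixed input and model, ϵ only scales the sign vector. The snippet below checks this for a hypothetical gradient vector (all components nonzero, since a zero component would make the ratio undefined).

```python
import math


def fgsm_perturbation(grad, eps):
    """FGSM perturbation eps * sign(grad) for a fixed input gradient."""
    return [eps * ((g > 0) - (g < 0)) for g in grad]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


grad = [0.7, -0.2, 0.05, -1.3, 0.4, -0.01]   # hypothetical input gradient
r_low = fgsm_perturbation(grad, 0.0025)
r_high = fgsm_perturbation(grad, 0.005)
print(round(cosine(r_low, r_high), 12), [h / l for l, h in zip(r_low, r_high)])
```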
We build a two-step structure for robust classification even under adversarial data. As shown in Figure 15, an adversarial-signal detector first estimates the ϵ value of the received signal and then assigns the signal to the corresponding modulation classifier. Each of these classifiers is adversarially trained with a specific ϵ so that it performs well on adversarial signals with that ϵ. We start with the design of the detector. Different neural networks, including CNNs and RNNs, are considered. We train the detector to predict the ϵ of the received signal from one of four possible values, ϵ ∈ {0, 0.001, 0.003, 0.005}. The LSTM network shown in Figure 15 achieves the best performance, with an accuracy of 72%. The confusion matrix of the LSTM-based detector is shown in Figure 16. Although there are misclassifications, misclassified signals are mostly assigned to an adjacent value of ϵ. If we count both correct and adjacent labels as correct, the average accuracy reaches 96.75%. This indicates that the detector can reasonably estimate the ϵ of the received signals.
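The routing logic of the two-step structure, together with the correct-or-adjacent accuracy measure, can be sketched as follows. The detector and classifiers are hypothetical callables, and the confusion matrix is made-up illustrative data (not the matrix in Figure 16); the adjacent accuracy is assumed to be the mean over true ϵ levels of the per-level rate.

```python
EPS_LEVELS = [0.0, 0.001, 0.003, 0.005]


def two_step_classify(x, detector, classifiers):
    """Step 1: `detector` maps x to an index into EPS_LEVELS.
    Step 2: x is classified by the model adversarially trained at that eps."""
    return classifiers[detector(x)](x)


def adjacent_accuracy(confusion):
    """Mean over true eps levels (rows) of the fraction of predictions that hit
    the true level or an immediately adjacent one (columns = predicted level)."""
    rates = []
    for i, row in enumerate(confusion):
        hits = sum(c for j, c in enumerate(row) if abs(i - j) <= 1)
        rates.append(hits / sum(row))
    return sum(rates) / len(rates)


def exact_accuracy(confusion):
    return sum(confusion[i][i] for i in range(len(confusion))) / sum(map(sum, confusion))


# Hypothetical stand-ins for the trained detector and per-eps classifiers.
classifiers = [lambda x, k=k: f"classifier trained at eps={EPS_LEVELS[k]}" for k in range(4)]
print(two_step_classify([0.1, -0.2], lambda x: 2, classifiers))

# Hypothetical confusion matrix (rows: true eps level, columns: predicted level).
cm = [[85, 10, 5, 0], [5, 80, 10, 5], [2, 18, 70, 10], [0, 3, 7, 90]]
print(exact_accuracy(cm), adjacent_accuracy(cm))
```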
Figure 17 compares the classification accuracy of VT-CNN2 and our approach, where the solid lines represent the testing accuracy of VT-CNN2 and the dashed lines that of the proposed two-step defense mechanism. At high SNR, the proposed approach achieves approximately 76% accuracy on benign data. This is lower than VT-CNN2, especially when the SNR is between -10 and 10 dB. However, this slight decrease in benign accuracy is small compared to the increase in accuracy on adversarial data. In other words, although adversarial training sacrifices some accuracy on benign data, the retrained model outperforms the original VT-CNN2 across all adversarial perturbations. For example, VT-CNN2 achieves only 10% accuracy when ϵ is high (e.g., ϵ = 0.005). In contrast, the adversarially trained model achieves approximately 60% accuracy on adversarial data with different values of ϵ. Overall, our structure combines the benefits of all the classifiers in the second step and is robust across all four ϵ levels we considered.
We next evaluate the defense under DeepFool attacks, training the defender's classifier with either FGSM- or DeepFool-based perturbations. The results are summarized in Figure 18. The black and blue plots show the accuracy of the original VT-CNN2 modulation classifier, while the grey, red, and orange plots correspond to the retrained VT-CNN2 classifier and the proposed defense mechanism. As anticipated, training with FGSM perturbations but testing under DeepFool attacks yields a smaller accuracy improvement than testing under FGSM attacks. This is attributed to the dissimilar nature of the perturbations generated by these two attacks. For SNR greater than 0 dB, the proposed defense with FGSM-based adversarial training provides an 8% improvement in accuracy relative to the original VT-CNN2 classifier when the attacker uses DeepFool perturbations. When we retrain the two-step defense mechanism with DeepFool perturbations and test under DeepFool attacks, the defender's accuracy increases significantly, to 57% at high SNRs, as shown by the red plot. Similar to the orange plot (trained and tested under FGSM), the defense mechanism's accuracy greatly improves when training and testing use the same attack type.

C. Autoencoder-based Defense
The authors in [50], [51] place an autoencoder before the RF classifier to mitigate the impact of additive perturbations. We use the autoencoder-based defense mechanism described in [50], [51]. Specifically, the denoising autoencoder (DA) is a fully connected DNN with 256-128-64-128-256 neurons in its layers, the same structure as in Sahay et al. [50]. The DA was trained to minimize the mean squared error over 100 epochs. At evaluation time, the adversarial and benign signals are passed through the DA and then through the modulation classifier. Ideally, the DA would remove the adversarial perturbations without degrading the classifier's performance on benign input. Figure 19(a) shows the amplitudes of the original and DA-reconstructed waveforms for the RML 2016.10a dataset. Visually, the denoised signal (blue) is similar to the original signal (grey), suggesting that the DA successfully reconstructs benign input. Next, FGSM signals generated with different values of ϵ are passed through the DA. As shown in Figure 19(b)-(d), the denoised signal (blue) is similar to the grey line when ϵ is small (0.001). As ϵ increases, the reconstructed signal deviates further from the original signal, because larger ϵ values produce perturbations with larger amplitudes, which push the adversarial signal further from the benign one. Unfortunately, the DA fails to denoise such data perfectly, and its reconstruction error shows that the approach is ineffective under large perturbations. Figure 20 compares the DA defense and our proposed defense under FGSM attacks. The DA defense improves the defender's accuracy to 50% at high SNRs when ϵ = 0.001; however, the defender's accuracy degrades to near 10% as ϵ increases. In contrast, our proposed approach outperforms the DA method and improves the accuracy to more than 65% under attack (i.e., for all ϵ > 0). Although the results in [50], [51] show the DA's effectiveness, their perturbations use small values of ϵ. Our results show that the DA may not be a suitable defense mechanism for larger ϵ.
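For reference, the DA structure described above can be sketched as a plain forward pass. The sketch assumes that 256-128-64-128-256 lists the layer widths from input to output (a 2 × 128 I/Q window flattened to 256 values), ReLU hidden activations, and a linear output; the weights are random and untrained, and the 100-epoch MSE training by backpropagation is omitted. Names are hypothetical.

```python
import random

LAYER_SIZES = [256, 128, 64, 128, 256]  # assumed: 2 x 128 I/Q flattened in and out


def init_dense(n_in, n_out, rng):
    """Random weights (He-style scaling, an assumption) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_da(rng):
    return [init_dense(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def da_forward(layers, x):
    """Fully connected encoder-decoder: ReLU hidden layers, linear output."""
    h = x
    for idx, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bk for row, bk in zip(W, b)]
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def mse(a, b):
    """Reconstruction loss the DA is trained to minimize."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def defended_predict(classifier, layers, x):
    """Denoise with the DA first, then classify the reconstruction."""
    return classifier(da_forward(layers, x))


rng = random.Random(0)
layers = build_da(rng)                    # untrained: training by backprop omitted
x = [rng.gauss(0.0, 0.01) for _ in range(256)]
recon = da_forward(layers, x)
print([len(W) for W, _ in layers], len(recon), mse(recon, x) >= 0.0)
```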

D. Ensemble-based Defense
We extend our evaluation to include an ensemble-based defense approach. Inspired by [52], we train three DNN models: a fully connected neural network (FCNN), a CNN, and an RNN. Additionally, we consider both the original time-domain I/Q data and a frequency-domain version obtained using the discrete Fourier transform (DFT). Thus, we end up with six trained classifiers: three DNNs trained in the time domain and three trained in the frequency domain. The outputs of the six classifiers are averaged to form an ensemble prediction, following the strategy outlined in [52].
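The ensemble rule can be sketched as follows: time-domain models receive the raw I/Q sequence, frequency-domain models receive its DFT, and the six probability vectors are averaged before the argmax. The models here are hypothetical stand-ins that return fixed probabilities, and the naive O(N²) DFT is for illustration only.

```python
import cmath


def dft(iq):
    """Naive O(N^2) DFT of a complex I/Q sequence (an FFT would be used in practice)."""
    n = len(iq)
    return [sum(iq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def ensemble_predict(time_models, freq_models, iq):
    """Average the class-probability vectors of time-domain models (fed raw I/Q)
    and frequency-domain models (fed the DFT), then take the argmax."""
    spectrum = dft(iq)
    probs = [m(iq) for m in time_models] + [m(spectrum) for m in freq_models]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Hypothetical stand-ins for the trained FCNN/CNN/RNN models (2 classes here).
time_models = [lambda x: [0.6, 0.4]] * 3
freq_models = [lambda s: [0.2, 0.8]] * 3
label, avg = ensemble_predict(time_models, freq_models, [1 + 0j, 0j, 0j, 0j])
print(label, avg)
```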
While the authors of [52] demonstrated impressive accuracy for their classifier leveraging both time and frequency representations, we observe that DNN classifiers trained on frequency-domain (DFT) data do not attain the same accuracy as the time-domain models. Figure 21 shows the accuracy of a CNN trained on I/Q data and of a CNN trained on DFT data. The two classifiers have similar accuracy when the SNR is below −8 dB; however, as the SNR increases, the CNN trained on I/Q data outperforms its DFT counterpart.
A potential explanation for the disparity between our findings and those of [52] lies in the datasets. Specifically, the datasets employed in [52] include only four modulation types. In contrast, our modulation classification study uses the full RML 2016.10a dataset, which contains 11 modulation types, including two amplitude modulation schemes (AM-DSB and AM-SSB) that were not part of the dataset used in [52]. Applying the DFT to amplitude-modulated data can potentially lead to the loss of crucial temporal features, resulting in lower accuracy for a DFT-trained classifier than for a classifier trained on raw (time-domain) I/Q data. Furthermore, regardless of whether the data are processed in the time or frequency domain, adding classes to the dataset complicates the decision boundary, which can lead to class overlap and reduced accuracy. Figure 22 compares our defense with the ensemble-based approach. The ensemble strategy outperforms our defense (solid lines) for SNRs from −10 to 5 dB, while the trend reverses for SNRs above 5 dB. Under FGSM attacks with ϵ = 0.001, the ensemble defense's accuracy is nearly 50% at high SNRs, whereas our defense exceeds 60%, making it the more effective defense in these experiments.

IX. Conclusions
Machine learning, particularly deep learning, plays an increasingly important role in wireless communications and can achieve state-of-the-art performance without handcrafted features. While DNN-based classifiers achieve satisfactory performance, they are also vulnerable to adversarial perturbations, which limits their robustness. Most of these perturbations are undetectable at the input to the deep learning classifier, yet they change the classifier's output significantly. Thus, an attack is strong if it degrades classification performance while keeping the SPR high, which also makes the perturbation hard to detect.
This work studied the vulnerability of DNN-based classifiers to AML-based jamming attacks on signal classification datasets. We considered two signal classification tasks, namely, protocol and modulation classification, and several adversarial approaches, including the FGSM, PGD, and DeepFool attacks. Adding these AML-based perturbations while maintaining a relatively high SPR significantly reduced the classification accuracy of all DNNs. This drop in performance at high SPR further shows that highly successful attacks can be challenging to detect [64].
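To make the FGSM and PGD update rules concrete, here is a sketch on a toy linear softmax classifier, where the input gradient of the cross-entropy loss has the closed form ∇ₓL = Wᵀ(p − y). FGSM takes one step x + ϵ·sign(∇ₓL); PGD repeats steps of size α and projects back onto the L∞ ball of radius ϵ around the clean input. The linear model, the L∞ norm, and starting PGD from the clean input (no random start) are illustrative assumptions; the attacks in this work target DNNs, whose gradients are obtained by backpropagation.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    """Class scores of a toy linear model: W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def input_gradient(W, b, x, label):
    """Gradient of cross-entropy w.r.t. x for a linear softmax model: W^T (p - y)."""
    p = softmax(logits(W, b, x))
    err = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(W, b, x, label, eps):
    """Single step: x + eps * sign(grad)."""
    g = input_gradient(W, b, x, label)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(W, b, x, label, eps, alpha, iters):
    """Iterated signed-gradient steps of size alpha, projected onto the L-inf eps-ball."""
    x_adv = list(x)
    for _ in range(iters):
        g = input_gradient(W, b, x_adv, label)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip each component to [x - eps, x + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Usage: a 2-class toy model that initially predicts class 0.
W, b, x = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.6, 0.4]
x_fgsm = fgsm(W, b, x, label=0, eps=0.15)
x_pgd = pgd(W, b, x, label=0, eps=0.15, alpha=0.05, iters=10)
```

The step size α and the number of iterations trade attack strength against computation, which is the dependence explored in the PGD parameter study.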
The results show that these attacks substantially reduce the defender's accuracy, with similar trends for the DNN-based classifiers on both the protocol and modulation datasets. The effectiveness of the AML perturbations depends on how much information the adversary has about the structure and training dataset of the defender's classifier. Accordingly, we studied attack scenarios with varying levels of knowledge. At one extreme, an attacker with full knowledge of the defender (white-box attack) significantly degrades the defender's accuracy. Compared to traditional jamming, where the attacker transmits only AWGN, the proposed AML-based attack requires much less transmit power to mislead the classifiers.
We also observed that the DNNs for both protocol and modulation classification remain vulnerable to these attacks even when the attacker has imperfect synchronization, transmits an incomplete perturbation sequence, or operates over a noisy channel. We generated attacks under these more practical conditions and evaluated their impact for different synchronization offsets, sequence lengths, and channel noise levels. These imperfect attacks can still substantially reduce the defender's accuracy within a certain range of imperfections.
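The imperfect-attack conditions can be illustrated with a simple received-signal model: the perturbation is delayed by an integer number of samples relative to the transmitted frame (misalignment), optionally truncated after its first samples (incomplete sequence), and complex AWGN is added at the receiver. This is a sketch under those assumptions only; the function `received_with_misaligned_attack`, its parameters, and the zero-filled gaps are hypothetical and may not match the channel model used in the experiments.

```python
import random


def received_with_misaligned_attack(signal, perturbation, delay, keep_len=None,
                                    noise_std=0.0, seed=0):
    """Superimpose a delayed, possibly truncated perturbation on the signal.

    signal, perturbation: equal-length lists of complex baseband samples.
    delay: integer offset in samples (positive = perturbation arrives late).
    keep_len: if given, only the first keep_len perturbation samples are sent.
    noise_std: per-component standard deviation of complex AWGN at the receiver.
    """
    n = len(signal)
    p = list(perturbation)
    if keep_len is not None:
        p = p[:keep_len] + [0j] * (n - keep_len)  # untransmitted tail is zero
    shifted = [0j] * n
    for t in range(n):
        src = t - delay
        if 0 <= src < n:
            shifted[t] = p[src]  # samples shifted outside the frame are dropped
    rng = random.Random(seed)
    return [
        s + q + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        for s, q in zip(signal, shifted)
    ]


# Usage: a one-sample misalignment on a 4-sample frame, no channel noise.
rx = received_with_misaligned_attack([1 + 0j] * 4, [0.1, 0.2, 0.3, 0.4], delay=1)
```

Because a misaligned perturbation no longer lines up with the samples it was optimized for, its effect on the classifier generally weakens as the offset grows.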
Finally, we proposed countermeasures against AML attacks that address one limitation of adversarial training. The proposed mechanism splits the defense into two steps: ϵ estimation and classifier retraining. In the first step, the ϵ estimator accurately estimates ϵ, so that the adversarially trained classifier in the second step can counter a more specific attack. The proposed structure combines the benefits of all the classifiers in the second step. As a result, the two-step defense shows better robustness and effectively improves the defender's accuracy under different attack budgets compared to single-classifier retraining.
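The inference path of the two-step defense can be sketched as a dispatcher: an ϵ estimator (step 1) produces an estimate, which selects one of several classifiers, each adversarially trained for a particular ϵ, to produce the label (step 2). This is a simplified sketch, assuming the estimate is snapped to the nearest ϵ for which a classifier was trained; `two_step_predict`, `estimate_eps`, and the classifier dictionary are hypothetical, and the actual structure (Figure 15) may combine the second-step classifiers in a richer way than hard selection.

```python
from typing import Callable, Dict, Sequence

Classifier = Callable[[Sequence[float]], int]


def two_step_predict(
    x: Sequence[float],
    estimate_eps: Callable[[Sequence[float]], float],
    classifiers: Dict[float, Classifier],
) -> int:
    """Step 1: estimate eps; step 2: classify with the model trained at the nearest eps."""
    eps_hat = estimate_eps(x)
    nearest = min(classifiers, key=lambda eps: abs(eps - eps_hat))
    return classifiers[nearest](x)


# Usage with dummy models keyed by the eps used in adversarial training.
classifiers = {0.0: lambda x: 0, 0.001: lambda x: 1, 0.01: lambda x: 2}
label = two_step_predict([0.0], lambda x: 0.0012, classifiers)  # routed to eps = 0.001
```

Routing lets each second-step model specialize to one perturbation budget instead of forcing a single retrained classifier to cover all budgets at once.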

FIGURE 1: AML perturbations attack on a signal classifier in wireless systems.

FIGURE 6: Accuracy of proposed DNN classifiers under benign and FGSM-based perturbations: (a) All four DNNs under white-box attacks, (b) RNN under limited-knowledge attacks, (c) CNN under limited-knowledge attacks.

FIGURE 9: Accuracy of an RNN-based classifier (structure S_1) under different limited-knowledge PGD attacks.

FIGURE 13: Accuracy of the defender's classifier under different PNRs when the channel noise power is fixed: (a) protocol dataset with embedded AWGN (test SNR_{T−A} = 15 dB), (b) RML 2016.10a dataset with embedded AWGN (test SNR_{T−A} = 18 dB).

FIGURE 15: Two-step structure for robust classification of the received adversarial signals.

FIGURE 17: Comparison between VT-CNN2 and the proposed ϵ prediction mechanism on classification accuracy vs. testing SNRs with adversarial data using different ϵ. γ = 0 is used for adversarial training.

FIGURE 18: Evaluation of the proposed defense mechanism under FGSM and DeepFool attacks for different SNRs, when adversarial training is done using FGSM or DeepFool perturbations.

FIGURE 20: Comparison between the proposed and autoencoder-based defenses against FGSM attacks for various testing SNRs and ϵ.

FIGURE 21: Comparison between the CNNs trained under benign raw I/Q data and DFT transformed data on classification accuracy vs. testing SNRs.

FIGURE 22: Comparison between the proposed defense mechanism and the ensemble-based defense on classification accuracy vs. testing SNRs with adversarial data using ϵ = 0.001.
Classification accuracy vs. SNR for VT-CNN2 using the RML 2016.10a dataset: (a) FGSM, PGD, and DeepFool under white-box attacks (several values of ϵ are considered), (b) FGSM and AWGN under limited-knowledge attacks (ϵ = 0.001).

[...] setting play a more critical role in CNNs. As a result, attacks A_3 and A_4 show different trends with varying ϵ. In contrast, attack A_tr closely resembles attack A_1, which implies that the CNN model can suffer a more severe attack than the RNN, even when the attacker has limited knowledge of the data.
Impact of α, the number of iterations, and ϵ in the PGD attack: (a) classification accuracy vs. number of iterations for different α, (b) classification accuracy vs. ϵ for different attacks (α = 0.1), using DNN structure S_1 and the protocol dataset; (c) classification accuracy vs. number of iterations for various α, averaged over all SNRs, (d) classification accuracy vs. ϵ for different attacks (α = 0.01, SNR = 16 dB), for VT-CNN2 using the RML 2016.10a dataset.

Table 2
We observe that the FGSM attack becomes more effective with more knowledge, as the defender's accuracy drops from 48.35% under attack A_4 to 19.69% under attack A_1. The SPR under limited-knowledge FGSM attacks remains the same because ϵ is fixed when generating the FGSM perturbations. For DeepFool, although an attack with more knowledge is expected to cause more harm, this is not always the case. For example, DeepFool attack A_3 is more impactful than DeepFool attack A_2, even though it has less knowledge of the defender. Moreover, the SPR of DeepFool varies with the knowledge level because the attack has no ϵ parameter that can be directly controlled.
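Why the FGSM SPR does not change across A_1-A_4: every FGSM perturbation component is ±ϵ (ϵ times the sign of a gradient entry), so its average power is ϵ² regardless of which surrogate model produced the gradient, and SPR = 10 log10(P_s / ϵ²) depends only on the signal power P_s and ϵ. The sketch below checks this numerically, assuming SPR is the ratio of average signal power to average perturbation power per real component in dB (the paper's exact normalization may differ); the signal and both gradients are synthetic, and the helper names are hypothetical.

```python
import math
import random


def avg_power(v):
    """Average power per real-valued component (I and Q samples interleaved)."""
    return sum(c * c for c in v) / len(v)


def spr_db(signal, perturbation):
    """Signal-to-perturbation ratio in dB (assumed definition)."""
    return 10.0 * math.log10(avg_power(signal) / avg_power(perturbation))


def fgsm_perturbation(grad, eps):
    """eps * sign(grad): every entry is +eps or -eps (0 only if grad is exactly 0)."""
    return [eps * ((g > 0) - (g < 0)) for g in grad]


rng = random.Random(1)
signal = [rng.gauss(0.0, 0.01) for _ in range(256)]  # synthetic I/Q frame
eps = 0.001
# Gradients from two hypothetical surrogate models with different knowledge levels.
grad_a = [rng.gauss(0.0, 1.0) for _ in range(256)]
grad_b = [rng.gauss(0.0, 5.0) for _ in range(256)]
spr_a = spr_db(signal, fgsm_perturbation(grad_a, eps))
spr_b = spr_db(signal, fgsm_perturbation(grad_b, eps))
print(spr_a == spr_b)  # True: the perturbation power is eps**2 in both cases
```

Knowledge therefore changes only the direction of the FGSM perturbation, not its power, which is why accuracy varies across A_1-A_4 while the SPR stays fixed.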
[...] ϵ value, for instance ϵ = 0.25, which gives rise to a lower SPR, DeepFool is still a stronger attack. Table 2 summarizes the SPR and accuracy under limited-knowledge attacks A_1-A_4 (previously defined in Section [...]). While DeepFool's perturbations force classification errors at the defender, the attack is not guaranteed to be more effective than FGSM, especially in limited-knowledge scenarios. Under limited knowledge, the difference between the estimated and actual classifiers may be amplified over the iterations of the DeepFool algorithm. In attack A_1, even though the attacker and defender use the same classifier structure, different random seeds for training initialization can still make the attacker's learned mapping slightly different from the defender's. As a result, a perturbation generated using the attacker's classifier may not perform as expected on the defender's classifier. Attacks A_2, A_3, and A_4 are less effective than attack A_1, which is expected because the attacker has less information about the defender in these scenarios.
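To see why DeepFool's perturbation size (and hence its SPR) is not set by a budget, consider the affine case f(x) = Wx + b, where the original DeepFool step has a closed form: with k̂ the current label, let w'_k = w_k − w_k̂ and f'_k = f_k(x) − f_k̂(x); choose l = argmin over k ≠ k̂ of |f'_k| / ||w'_k||₂, and take r = (|f'_l| / ||w'_l||₂²) w'_l, scaled by (1 + overshoot). The sketch covers only this affine case; for DNNs the step is repeated on local linearizations of the attacker's model, which is where mismatch with the defender's classifier can accumulate. The overshoot value 0.02 is a commonly used choice (assumption), and the function names are hypothetical.

```python
import math


def affine_logits(W, b, x):
    """Class scores f(x) = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]


def deepfool_affine(W, b, x, overshoot=0.02):
    """Closed-form DeepFool step for an affine multiclass classifier.

    Returns (x_adv, r). The size of r is set by the distance from x to the
    nearest decision boundary, not by an eps budget.
    """
    f = affine_logits(W, b, x)
    k_hat = max(range(len(f)), key=lambda k: f[k])  # current predicted class
    best_dist, best_r = math.inf, None
    for k in range(len(f)):
        if k == k_hat:
            continue
        w_diff = [wk - wh for wk, wh in zip(W[k], W[k_hat])]
        f_diff = f[k] - f[k_hat]
        norm_sq = sum(v * v for v in w_diff)
        if norm_sq == 0.0:
            continue  # no boundary with class k exists in input space
        dist = abs(f_diff) / math.sqrt(norm_sq)
        if dist < best_dist:
            best_dist = dist
            # Minimal L2 step onto the boundary between k_hat and k.
            best_r = [abs(f_diff) / norm_sq * v for v in w_diff]
    if best_r is None:
        return list(x), [0.0] * len(x)
    r = [(1.0 + overshoot) * v for v in best_r]  # push slightly past the boundary
    return [xi + ri for xi, ri in zip(x, r)], r


# Usage: the same model needs a larger perturbation for a more confident input.
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x_adv, r_near = deepfool_affine(W, b, [0.6, 0.4])
_, r_far = deepfool_affine(W, b, [0.8, 0.2])
```

Since ||r|| follows the input's margin rather than a fixed ϵ, DeepFool's SPR varies from frame to frame and across attacker knowledge levels.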

TABLE 2: Comparison between DeepFool and FGSM with ϵ = 0.2 (limited-knowledge attacks) using the protocol dataset.

The attacker has significant knowledge of the defender in attack A_1. Nevertheless, the reduction in performance is smaller than under the white-box attack, since limited knowledge weakens the impact of DeepFool. Even so, DeepFool still outperforms the AWGN attack at a similar power level by 15% in the worst case (attack A_4). We then show that the perturbations generated by DeepFool have very low energy. The results of the different attack schemes tested at an SNR of 16 dB are summarized in Table 3. All these attacks reduce the accuracy of the defender's classifier while maintaining a high SPR.

TABLE 3: Comparison between DeepFool attacks under different knowledge levels using the modulation dataset.