Countering Physical Eavesdropper Evasion with Adversarial Training

Signal classification is a universal problem in adversarial wireless scenarios, especially when an eavesdropping radio receiver attempts to glean information about a target transmitter’s patterns, attributes, and contents over a wireless channel. In recent years, research on Machine Learning (ML)-based signal classification has focused on modulation classification, with the downstream objective of demodulation. However, while the computer vision data domain has made significant progress in ensuring robust classification of images despite crafted perturbations, this success has not been translated to secure modulation classification. In this work, we perform the first-ever physical test of an eavesdropping ML-based modulation classifier radio, which we trained offline using an ensemble of i.i.d. models. Each model is trained with a weighted mixture of non-attacked data and data perturbed by iterative, “least likely” white box attacks. We then tested the ensemble online using coaxial-connected Software Defined Radios (SDRs). We conducted a case study comparing our results to state-of-the-art computer vision approaches to investigate the presence of “label leaking” and model capacity sensitivity, to understand the viability of parallel and sequential variations on perturbation training, and to assess the effectiveness of iterative attack training. Our results show that perturbations can reduce eavesdropper classification performance to guessing level, and that varying levels of robustness can be achieved against all presented attacks. These findings confirm that any receiver that uses ML techniques for classification presents a new attack vector and can be vulnerable to evasion attacks at little-to-no cost to transmitters. Consequently, we argue for the use of our training scheme in all ML-based classifying radios where security is a concern.


I. INTRODUCTION
IN 2014, Goodfellow et al. [1] presented a picture of a panda that a state-of-the-art Machine Learning (ML) (Acronym Appendix) classifier confidently labeled a gibbon. Using the classifier's gradient, they computed a noise image bounded by the resolution of 8-bit integer pixel values and added it to the original panda image. Since then, an arms race has been ongoing between crafting adversarial perturbations and developing countermeasures [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. High-risk wireless communications systems employed in applications such as autonomous vehicles [16], agricultural Internet-of-Things (IoT) [17], and military networks [18] must be secure from perturbation attacks, because incorrect classifications can result in catastrophic financial and/or human costs. Thus, ML algorithms trained for these wireless networks must be designed so that they do not introduce additional attack surfaces for potential adversaries, which requires an understanding of effective perturbation designs in realistic, physical scenarios.
As a relatively new field of research, wireless adversarial perturbation research presents many open challenges that have yet to be resolved. Papers that synchronously add physical perturbations to other physical transmissions to fool physical-layer classification systems [19] do not model realistic synchronization errors between the two, with the exception of the time-shifted Universal Adversarial Perturbation (UAP) [6] attacks simulated in [20]. Works that simulate transmitters adding optimized perturbations to their own signals [21], [22] generously assume a white box eavesdropper and have not attempted to design deterministic perturbations that improve various loss metrics. Additionally, several works [21], [22], [23], [24] have noted that the dataset frequently used for these studies was generated with several energy ratio and labeling errors. The effect of the wireless channel on the ML-based detection and isolation of these perturbations [7], commonly performed by statistical distribution-estimating algorithms such as Variational Auto Encoders (VAEs) [25] or Generative Adversarial Networks (GANs) [26], has also yet to be investigated. There also remains a need for experiments that validate the predominantly simulated results published so far under realistic conditions involving clock drift, Radio Frequency Front End (RFFE) noise, and other real-world phenomena. Finally, many simulated countermeasures [27], [28], [29] do not consider state-of-the-art adversarial attacks published in the computer vision domain.

A. RELATED WORK
The study of adversarial perturbations in the wireless data domain is relatively new and lags behind that of leading data domains such as computer vision. Sadeghi and Larsson [20] synchronously added Fast Gradient Sign Method (FGSM) [1] and UAP [6] attacks to received signals over an Additive White Gaussian Noise (AWGN) channel model, showing that considerably less power is needed to fool a Convolutional Neural Network (CNN) with perturbations than with random jamming signals, and that the UAP [6] attack is robust to time shifts between the perturbation and the received signal, which simulate synchronization errors. Kim et al. [30] analyzed the same threat model and explored how the adversary can use the channel state information matrix to synchronously deliver power- and error-optimized white and black box perturbations. Flowers et al. [24] used a different threat model, in which a transmitter added FGSM [1] perturbations to its own signal to fool a CNN-based eavesdropper, and investigated the trade-off between BER and adversarial accuracy. The authors found that perturbations strong enough to lower eavesdropper classification accuracy caused a significant number of communication errors (especially for higher-order modulations), that frequency and timing errors had a small effect on perturbation effectiveness, and that relatively large, single-step perturbations did not always increase loss because of unstable gradient ascent. These lessons motivated Flowers et al. [21] to design a feedback loop from the adversary to the transmitter that optimizes a multi-objective loss function, designing perturbations that minimize power consumption, BER, and eavesdropper accuracy. DelVecchio et al. [22] similarly optimized the frequency-domain power and bandwidth of perturbations to maximize communication effectiveness without increasing eavesdropper accuracy. Lin et al. [31], [32], [33] analyzed many state-of-the-art attacks, such as Projected Gradient Descent (PGD) [34], against simulated modulation classification datasets. Bao et al. [35] diverged from the modulation classification use case to analyze the effectiveness of state-of-the-art perturbations used to disrupt IoT networks performing device identification. Maroto et al. [27] implemented adversarial training robust to iterative attacks, but experienced label leaking and weak models because they crafted ground-truth-based perturbations that are overly correlated with the trained models, the ground truth class, and the non-adversarial data. Zhang et al. [28] performed defensive distillation to protect the network from single-step adversarial perturbations, but the process of fooling distilled networks is well understood [36], [37]. Finally, Sahay et al. [29] performed a 4-class modulation classification adversarial training simulation using both time- and frequency-domain features, showing a clear improvement over time-domain features alone. However, they did not show that their novel feature extraction improved upon the moment- and cumulant-based features used in state-of-the-art works [38].
In this work, we explore defense approaches against adversarial perturbations in a white box attack regime, as is common in the state-of-the-art. Traditionally, the term "white box" describes adversarial perturbation scenarios in which the weights and architecture of the target ML classifier are fully observable by the agent crafting perturbations. In our white box scenario, the adversary can observe not only the trained model and its weights, but all aspects of the eavesdropper's radio and ML systems, such as perturbation detection networks or ensemble classification schemes. We adopt this assumption because an attack or defense that holds an informational advantage over its opponent is trivial to study, as it will usually succeed. Additionally, if white box classifier knowledge has been obtained via malware or reverse engineering, as assumed in the state-of-the-art, there is no clear reason why the perturbation defense sub-system would remain hidden from the adversary.
Consequently, we do not investigate semi-supervised perturbation detection algorithms [7], because they have been shown to increase the attack surface of the classifier when the adversary is aware of them [4]. We do not investigate gradient masking [5] or defensive distillation [9], because the process of fooling these networks is well understood [36], [37], [39]. Finally, we do not investigate network verification [40], [41], [42], [43], as these computational methods remain prohibitively expensive for all but the smallest datasets and models. While our classification architectures are relatively small (see Section II-A and Section II-B), we show in Section III-E that making them small enough for network verification to be feasible would make them more vulnerable to adversarial perturbations. Adversarial training has been described as a powerful regularization method [34] that performs a function similar to L1 regularization on the activations of linear classifiers [1]. When a model is overfitting, adversarial training, defensive distillation, and gradient masking schemes have all doubled as defenses against adversarial perturbations and as regularizers that increase the classification PPV on non-adversarial test data.
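The link to L1 regularization noted in [1] can be made concrete for logistic regression. The sketch below is a minimal pure-Python check with arbitrary illustrative weights, assuming labels y ∈ {−1, +1} and the loss ζ(−y(wᵀx + b)), where ζ(z) = log(1 + e^z). Because the sign of the input gradient is −y·sign(w), an FGSM step of size ε turns the loss into ζ(ε‖w‖₁ − y(wᵀx + b)): the L1 norm of the weights is subtracted inside the activation rather than added to the training cost, which is the distinction [1] draws.

```python
import math


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sign(v):
    return (v > 0) - (v < 0)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def logistic_loss(w, b, x, y):
    """Logistic loss softplus(-y (w.x + b)) for a label y in {-1, +1}."""
    return softplus(-y * (dot(w, x) + b))


def fgsm_logistic(w, x, y, eps):
    """FGSM step: the input gradient is -y * sigmoid(.) * w, so its sign is -y * sign(w)."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def l1_closed_form(w, b, x, y, eps):
    """Same adversarial loss with the L1 term inside the softplus."""
    l1 = sum(abs(wi) for wi in w)
    return softplus(eps * l1 - y * (dot(w, x) + b))
```

For any ε > 0 the adversarial loss exceeds the clean loss by construction, since softplus is increasing and ε‖w‖₁ > 0.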

B. RESEARCH CONTRIBUTIONS AND ORGANIZATION
In this work, we present a number of defensive contributions to the state-of-the-art. These contributions confirm or refute best practices from the adversarial computer vision domain and establish new ones for wireless communication scenarios, through both simulated and experimental demonstrations, to make defenses more robust to knowledgeable attackers. The contributions are as follows:
• Evaluation of the effectiveness of perturbation training in mitigating attacks on a modulation classification model, as measured by PPV, while checking for defense pitfalls discovered in other contexts such as "label leaking";
• Improved network architectures for Radio Frequency Machine Learning (RFML) signal classification in more realistic settings, trained to be robust to attacks and to avoid common defense pitfalls;
• Validation of state-of-the-art and proposed techniques in a physical setting with unsynchronized radios, such that real-world data demonstrates their impact and implementability.
This study has significant implications for wireless communication privacy, providing an example of real-world perturbations disrupting an eavesdropper's demodulation efforts in the presence of channel and hardware noise between two unsynchronized radios. We also study the eavesdropper's trade-space in pursuing demodulation despite those disruptions. This paper is organized as follows:
• In Section II, we introduce the real-world physical scenario that motivates adversarial attacks on a signal classifier and define metrics of success for such an adversary.
• In Section III, we present several novel studies of state-of-the-art computer vision adversarial training schemes applied to the problem of wireless adversarial perturbation defense.
• In Section IV, we review our contributions and suggest several open challenges to the wireless security community.

II. SYSTEM MODEL
The state-of-the-art perturbation approaches assume one of two different three-player scenarios. In the first scenario, a transmitter and receiver communicate while a reactive adversary eavesdrops, computes the proper perturbation, and then synchronously transmits it to fool the receiving radio, which classifies the observed sum of signals [20], [30]. In the second scenario, a transmitter adds pre-channel perturbations to its own signal to fool an adversarial eavesdropper that classifies the transmissions, while maximizing communication capability between the transmitter and the intended receiver [21], [22], [24]. In this paper, we investigate the latter scenario (Fig. 1) and leave the former for future work.

A. METHODOLOGY
State-of-the-art methodology trends are as follows.
Adversarial perturbations in the wireless communications domain are typically generated to fool classifiers trained on the simulated RML2016.10A [44] dataset, its successor RML2018.01A [38], or datasets modeled after the RML datasets that tune or fix various channel model parameters, metadata, or SNR [23]. Previous papers that apply adversarial attacks to these datasets compute perturbations using the same or similar supervised learning models, namely VT-CNN or VT-CNN2 [45]. Finally, previous papers typically compute white box attacks such as FGSM.
In this work, we create variations inspired by the RML2018.01A [38] dataset, which we use in Section III. As reported in a related work [23], the RML2016.10A [44] dataset exhibits significant multicollinearity (correlated input data indices), uses a pulse shaping filter without zero crossings, and does not properly compute energy per symbol ratios. These issues, respectively, limit classifier performance, prevent bit estimation, and produce SNR-shifted classifier performance results. Consequently, we created a new dataset similar to RML2016.10A in which we increase the number of samples per signal capture from 128 to 4096, reducing multicollinearity by providing more symbols and symbol transitions per example. Finally, we implement a slightly different Finite Impulse Response (FIR) Root Raised Cosine (RRC) [46] filter with rolloff α = 0.35 and 12 taps, since the RRC filter of RML2016.10A does not possess zero crossings, as that dataset is not designed for bit estimation.
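The zero-crossing property can be illustrated with a short sketch: cascading a transmit RRC filter with a matched RRC filter yields a raised cosine pulse that is (nearly) zero at every nonzero multiple of the symbol period, which is what allows symbol-spaced sampling for bit estimation. The rolloff α = 0.35 comes from the text; the 8 samples per symbol and 16-symbol span are illustrative assumptions chosen so the crossings are easy to inspect, and they differ from the 12-tap filter described above.

```python
import math


def rrc_taps(sps, span, alpha):
    """Root raised cosine impulse response sampled at t = n / sps symbol periods.

    Returns span * sps + 1 taps centred on t = 0 (unit symbol period, unnormalized).
    """
    half = span * sps // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / sps
        if n == 0:
            h = 1.0 - alpha + 4.0 * alpha / math.pi
        elif math.isclose(abs(4.0 * alpha * t), 1.0):
            # removable singularity at t = +-1 / (4 alpha)
            h = (alpha / math.sqrt(2.0)) * (
                (1 + 2 / math.pi) * math.sin(math.pi / (4 * alpha))
                + (1 - 2 / math.pi) * math.cos(math.pi / (4 * alpha))
            )
        else:
            num = math.sin(math.pi * t * (1 - alpha)) + 4 * alpha * t * math.cos(math.pi * t * (1 + alpha))
            den = math.pi * t * (1 - (4 * alpha * t) ** 2)
            h = num / den
        taps.append(h)
    return taps


def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out
```

Convolving the taps with themselves (transmit filter followed by matched filter) gives the raised cosine; sampling it at offsets of `sps` samples from the peak shows values close to zero, limited only by truncation of the filter tails.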
Our version of the dataset (Fig. 2) implements a cumulative random walk of truncated Gaussian samples for the Symbol Rate Offset (SRO) and another for the Carrier Frequency Offset (CFO), with sample index i = 1, . . . , N, vector length N = 4096, and sample rate f_s = 200 MHz, which are all equal to the values used in [44]. Finally, we implemented an 11th order FIR Butterworth [47] filter with a normalized frequency cutoff of 0.65 to isolate the signals from Out-Of-Band (OOB) noise.
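A cumulative random walk of truncated Gaussian samples can be sketched as follows for the CFO: each sample adds a truncated Gaussian step to the running frequency offset, and the offset is integrated into a phase rotation. The step standard deviation and truncation bound (`sigma_hz`, `bound_hz`) are hypothetical, since the dataset's values are not given here; an SRO walk would follow the same pattern but drive a fractional resampler instead of a phase rotator. Truncation is done by simple rejection sampling.

```python
import cmath
import math
import random


def truncated_gauss(rng, sigma, bound):
    """Rejection-sample N(0, sigma^2) restricted to [-bound, bound]."""
    while True:
        v = rng.gauss(0.0, sigma)
        if -bound <= v <= bound:
            return v


def cfo_random_walk(x, fs, sigma_hz, bound_hz, rng):
    """Apply a carrier frequency offset whose value performs a cumulative random walk.

    sigma_hz and bound_hz are hypothetical per-sample step parameters.
    """
    freq_hz = 0.0
    phase = 0.0
    out = []
    for sample in x:
        freq_hz += truncated_gauss(rng, sigma_hz, bound_hz)  # cumulative sum of truncated Gaussian steps
        phase += 2.0 * math.pi * freq_hz / fs                 # integrate frequency into phase
        out.append(sample * cmath.exp(1j * phase))
    return out
```

Because the offset only rotates each sample, it leaves sample magnitudes unchanged; passing a seeded `random.Random` makes a capture reproducible.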

B. ADVERSARIAL TRAINING WAVEFORMS AND CLASSIFIER
To address multicollinearity, we used time slices of 4096 complex IQ samples instead of 128. By trial and error, we found that using more samples per signal reduced the number of signals needed to achieve the same test PPV, so our training dataset contains 1.4 million signals instead of 2 million [38]. We additionally implemented the dataset with the following differences from the GNU Radio (GR) channel model presented in [38]: 20,30] dB. For a summary of parameters, coefficients, and other mathematical terms, see Table 1.
Since training models to be robust to perturbations is a task adjacent to training models that generalize well to test data that differs statistically from training data [34], we generated several physical test sets to see how well our adversarial training schemes perform as regularizers. Each test set consists of 1408 signals, also called examples, where each of the 88 GR channel models generates 16 signals; in contrast, in the training sets, 2728 GR channel models generate 512 signals each. Each GR channel model has an independent, fixed modulation class, SNR, and Samples Per Symbol (SPS), I. To generate the data, two USRP N210 SDRs (Fig. 3) are connected via coaxial cable. No digital gain or attenuation is used; the radios capture samples with a 1 MHz bandwidth at a 20 MHz carrier frequency. We add perturbations to these data streams via a synchronous Out-Of-Tree (OOT) GR block placed before the USRP sink block, which loads the trained PyTorch model, predicts and computes the gradient using the time slice of data, and adds the perturbation to the output. For consistency, we enforce a unit energy constraint on all datasets before classification and perturbation crafting as: where N is the length of x. Because the received test signals have no observable DC offset, we do not apply mean subtraction to the simulated data. For the IQ data modulation classification adversarial training presented in Section III, we implemented a model deeper than VT-CNN2 in PyTorch, which has the higher learning capacity this work requires (see Section III-E) for adversarial training and for a dataset with more classes. Based on trial and error, we used a deep model inspired by the Visual Geometry Group (VGG) 10 [48] CNN model, comprising 9 convolutional layers with ReLU activations [49] and the following number of filters per layer: [64,64,128,128,256,256,256,256,256].
The model terminates in two dense layers: the first has 512 neurons and the second has 22 outputs. Max pooling with size 2 and stride 2 is applied after every two layers. All convolutional layers have stride 1 and kernel size 3, for a total of 18.2 × 10^6 parameters. All weights are initialized using Kaiming initialization [50]. We did not find that dropout [51] or weight regularization improved classification performance.
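The unit energy constraint applied before classification and perturbation crafting can be sketched as below. This assumes the constraint means a total energy of one per capture, Σ|x_i|² = 1; a unit average-power convention would instead make Σ|x_i|²/N = 1, which differs by a factor of √N in amplitude. The function name is illustrative.

```python
import math


def unit_energy(x):
    """Scale complex IQ samples so that sum(|x_i|^2) equals 1 (assumed form of the constraint)."""
    energy = sum(abs(v) ** 2 for v in x)
    if energy == 0.0:
        raise ValueError("cannot normalize an all-zero capture")
    return [v / math.sqrt(energy) for v in x]
```

Normalizing every capture this way keeps the energy-based perturbation scaling consistent across signals of different received power.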
The model is optimized in PyTorch by minimizing the categorical cross entropy of the log softmax outputs with the Adam [52] optimizer over 20 epochs in mini-batches of size 256. The following Adam parameters are used: α_adam = 0.0442, β = (0.9, 0.98), ε_adam = 1 × 10^−9, and γ = 0.1, with 4,000 warm-up steps. No early stopping is implemented, and batch normalization [53] is used. The literature has observed that the strongest perturbations are those crafted by exploiting skip connections in Neural Networks (NNs), as used in ResNets [54], which act as a new attack surface [55]. Consequently, we opt to forgo skip connections, despite their advantage in mitigating the vanishing gradient problem when training deep models.
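The listed Adam settings (β = (0.9, 0.98), ε = 10^−9, 4,000 warm-up steps) coincide with those used alongside the inverse-square-root warm-up schedule of the original Transformer, and 0.0442 ≈ 512^(−1/2). Assuming that schedule shape, with α_adam acting as the scale factor, the sketch below rises linearly for 4,000 steps and then decays as step^(−1/2). How γ = 0.1 enters is not specified, so it is omitted; this is an illustration, not the authors' confirmed schedule.

```python
import math


def warmup_inv_sqrt_lr(step, scale=0.0442, warmup=4000):
    """Learning rate at a 1-based optimizer step: linear warm-up, then inverse-square-root decay.

    Assumed schedule shape; `scale` plays the role of alpha_adam.
    """
    if step < 1:
        raise ValueError("step counts from 1")
    return scale * min(1.0 / math.sqrt(step), step / warmup ** 1.5)
```

In PyTorch, a function like this could be passed to `torch.optim.lr_scheduler.LambdaLR` with a base learning rate of 1.0; since that scheduler starts counting at 0, the lambda would call `warmup_inv_sqrt_lr(s + 1)`.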

C. ADVERSARY GOALS AND DESCRIPTION
An adversarial perturbation is a signal that is added to another signal given to an ML model during either training or testing, with the intent of causing incorrect estimation or classification during inference. This interaction can be generally described as

$$x^{*} = x + \epsilon\eta$$

where the perturbation η is scaled by ε and added to the original signal x to form an adversarial example x*. If the trained ML model's predictions are described as f(x) = ŷ, then the perturbations are crafted using the observed or estimated prediction loss of the model given a signal, with the expectation that increasing the loss will decrease the performance metrics of the deployed model (e.g., F1 score, precision, recall, AUC, ROC, IoU, mAP). The scalar ε was originally introduced to make computer vision perturbations imperceptible to the human eye, but it is more generally used to minimize the perturbation according to the metric and scheme of choice. Surveys of adversarial attacks and countermeasures are available in [2], [3], [4], [5].
In this work, we implemented several different attacks to confirm, refute, or establish best practices presented in leading ML data domains. Because perturbations and non-adversarial signals vary, we define a generalized, adaptive scaling factor for all attacks based on the perturbation energy E_p:

$$\epsilon = \sqrt{\frac{\sum_{i=1}^{N} |x_i|^2}{\left(E_s/E_p\right)\sum_{i=1}^{N} |\eta_i|^2}}$$

where x is the information signal and η is the perturbation signal. This factor achieves the desired signal (E_s) to perturbation (E_p) energy ratio:

$$\frac{E_s}{E_p} = \frac{\sum_{i=1}^{N} |x_i|^2}{\sum_{i=1}^{N} |\epsilon\eta_i|^2}$$

The choice of E_s/E_p reflects the relative importance the transmitter places on its two objectives: receiver BER and eavesdropper classification PPV.
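The adaptive scaling factor can be sketched directly from the energy ratio it targets. The sketch below takes E_s/E_p in dB (the unit used in Section III), converts it to a linear ratio, and solves for ε so that the scaled perturbation has exactly the requested energy relative to the signal; function names are illustrative.

```python
import math


def energy(v):
    return sum(abs(s) ** 2 for s in v)


def scale_for_ratio(x, eta, es_ep_db):
    """Scaling factor eps such that energy(x) / energy(eps * eta) equals E_s/E_p (given in dB)."""
    ratio = 10.0 ** (es_ep_db / 10.0)
    return math.sqrt(energy(x) / (ratio * energy(eta)))


def add_perturbation(x, eta, es_ep_db):
    """Form x* = x + eps * eta at the requested signal-to-perturbation energy ratio."""
    eps = scale_for_ratio(x, eta, es_ep_db)
    return [xi + eps * ei for xi, ei in zip(x, eta)]
```

Because ε depends only on energies, the same routine applies unchanged to FGSM, stepLL, or iterLL directions η.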
Given a finite power constraint, it is intuitive that amplifying all samples equally results in the lowest BER. Yielding some of that power to strategically amplify some samples more than others gives the transmission a measure of obfuscation from fragile ML-based classifiers, at a cost to BER proportional to the power given up. If the transmitter has a target eavesdropper PPV, the optimal E_s/E_p ratio cannot be determined without a PPV feedback loop from the eavesdropper to the transmitter (see [21], [22]), even in a white box scenario where all ML weights and classification rules are known. However, if the wireless channel is well known, as it is in many full duplex links, a BER objective could be used to choose the necessary signal energy, with the remaining power budget allocated to the perturbation.
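The BER-driven split can be illustrated with a toy link. The sketch assumes coherent BPSK over AWGN with a known noise density N0 (so BER = ½·erfc(√(E_s/N0)) with one bit per symbol) and a hypothetical per-symbol energy budget. It gives the signal the smallest energy that meets the BER target, found by bisection, and assigns the remainder to the perturbation, reporting the resulting E_s/E_p in dB. A real link would substitute its own modulation and channel model.

```python
import math


def bpsk_ber(es_n0):
    """BER of coherent BPSK over AWGN: Q(sqrt(2 Es/N0)) = 0.5 erfc(sqrt(Es/N0))."""
    return 0.5 * math.erfc(math.sqrt(es_n0))


def min_es_n0_for_ber(target_ber, lo=0.0, hi=100.0, iters=100):
    """Bisection for the smallest linear Es/N0 whose BPSK BER does not exceed target_ber."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bpsk_ber(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return hi


def split_budget(total_energy, n0, target_ber):
    """Give the signal just enough energy for target_ber; the rest goes to the perturbation."""
    es = min_es_n0_for_ber(target_ber) * n0
    if es > total_energy:
        raise ValueError("energy budget too small for the BER target")
    ep = total_energy - es
    es_ep_db = 10 * math.log10(es / ep) if ep > 0 else math.inf
    return es, ep, es_ep_db
```

With a budget of 10 (in units of N0) and a target BER of 10^−5, roughly 9.1 goes to the signal and the rest to the perturbation, an E_s/E_p near 10 dB.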
The attacks used in this work include FGSM [1]:

$$x^{*} = x + \epsilon\,\mathrm{sign}\left(\nabla_{x} L\left(f(x), y_{\mathrm{true}}\right)\right)$$

where y_true is the ground truth label of x. FGSM is a relatively simple and efficient attack compared to the others in this work, and it minimizes p(y_true|x*). Additionally, we use the One-Step Least Likely (stepLL) attack [34]:

$$x^{*} = x - \epsilon\,\mathrm{sign}\left(\nabla_{x} L\left(f(x), y_{\mathrm{LL}}\right)\right), \qquad y_{\mathrm{LL}} = \arg\min_{y}\, p(y \mid x)$$

where y_LL is the least likely class of x according to the class scores of the classifier; the attack maximizes p(y_LL|x*). This attack is used for adversarial training [34] because FGSM [1] perturbations are substantially deterministic and correlated with the true label. Consequently, models adversarially trained with FGSM attacks classified adversarial data more accurately than non-adversarial test data, while those trained with stepLL attacks did not. We visualize some stepLL perturbations in Fig. 4. Finally, we use the Iterative Least Likely (iterLL) attack [34], which achieves more powerful perturbations than its one-step equivalent by recomputing the direction of the gradient multiple times. In our work, we sample the number of iterations N_iter ∼ U(2, 10) and compute the iteration step size as a ratio, α_iter = 2/N_iter. We leave the investigation of PGD to future work.
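To make the three attacks concrete, the sketch below implements them against a toy linear-softmax classifier in pure Python, where the input gradient of the cross entropy has the closed form Wᵀ(p − onehot(y)). The classifier is a hypothetical stand-in for the CNN; in the actual system the gradient would come from automatic differentiation. For iterLL, the sketch applies α_iter = 2/N_iter as a fraction of ε, fixes y_LL from the clean input, and clips each iterate to an ε box around x, all of which are assumptions rather than details stated above.

```python
import math
import random


def toy_classifier(seed, n_classes=4, dim=8):
    """Random weights for a hypothetical linear-softmax classifier."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    return W, [0.0] * n_classes


def predict(W, b, x):
    """Class probabilities softmax(W x + b)."""
    z = [sum(wj * xj for wj, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]


def input_grad(W, b, x, label):
    """Gradient of -log p(label | x) w.r.t. x, i.e. W^T (p - onehot(label))."""
    p = predict(W, b, x)
    p[label] -= 1.0
    return [sum(W[k][j] * p[k] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def least_likely(W, b, x):
    p = predict(W, b, x)
    return min(range(len(p)), key=p.__getitem__)


def fgsm(W, b, x, y_true, eps):
    """One step up the loss of the true label."""
    g = input_grad(W, b, x, y_true)
    return [xj + eps * sign(gj) for xj, gj in zip(x, g)]


def step_ll(W, b, x, eps):
    """One step down the loss of the least likely label."""
    g = input_grad(W, b, x, least_likely(W, b, x))
    return [xj - eps * sign(gj) for xj, gj in zip(x, g)]


def iter_ll(W, b, x, eps, n_iter):
    """n_iter smaller stepLL moves of size (2 / n_iter) * eps, clipped to an eps box around x."""
    y_ll = least_likely(W, b, x)  # assumed fixed from the clean input
    step = (2.0 / n_iter) * eps
    x_adv = list(x)
    for _ in range(n_iter):
        g = input_grad(W, b, x_adv, y_ll)
        x_adv = [xa - step * sign(gj) for xa, gj in zip(x_adv, g)]
        x_adv = [min(max(xa, xj - eps), xj + eps) for xa, xj in zip(x_adv, x)]
    return x_adv
```

Because cross entropy of a linear-softmax model is convex in x, the FGSM step is guaranteed to raise the true-class loss here; for a deep network this holds only approximately, which is one reason multi-step attacks can be stronger.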

III. ADVERSARIAL TRAINING
In this section, we study perturbation countermeasures by implementing the adversarial training scheme outlined in [34] with our RML2018.01A [38] inspired dataset, the VGG10 [48] inspired modulation classifier from Section II-B, and all attacks presented in Section II-C. As in [56], we craft the training perturbations using a non-adversarial model that has already been trained, so that they transfer knowledge of the end result of training. The idea behind adversarial training is to train the model using mini-batches containing both perturbed and non-adversarial signals:

$$\text{Loss} = \frac{1}{(m - k) + \lambda k}\left(\sum_{i \in \mathrm{CLEAN}} L\left(x_i \mid y_i\right) + \lambda \sum_{i \in \mathrm{ADV}} L\left(x_i^{*} \mid y_i\right)\right) \tag{10}$$

where m is the mini-batch size, k is the number of adversarial examples per mini-batch, L(·) is the categorical cross entropy loss, and λ weights the learning step size for adversarial versus non-adversarial training examples. In this work, we use m = 256, k = 38, and λ = 1, which is effectively equivalent to the training scheme of [34], which uses m = 32, k = 16, and λ = 0.3; we quantify this similarity by noting that kλ/m ≈ 0.15 in both cases. As in [34], we randomly vary perturbation strength so that the adversarially trained model generalizes well to test-stage perturbations of different strengths. We accomplish this variation by drawing E_s/E_p from a truncated Gaussian distribution, and refer to this scheme's value of E_s/E_p as "sweeping". We perform the training schemes in this section, which are costly relative to computer vision, using an Intel Xeon Gold 6248 CPU node with 20 cores and 192 GB of RAM and one NVIDIA Volta V100 GPU node with 32 GB of memory.
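The weighted mini-batch loss of [34] described above can be sketched from its definition: clean and adversarial per-example losses are summed, the adversarial ones weighted by λ, and the total is normalized by (m − k) + λk. The helper `adversarial_share` computes the kλ/m quantity used to compare the two mini-batch configurations; both names are illustrative.

```python
def adversarial_training_loss(clean_losses, adv_losses, lam):
    """Mixed mini-batch loss: (sum(clean) + lam * sum(adv)) / ((m - k) + lam * k)."""
    m_minus_k, k = len(clean_losses), len(adv_losses)
    return (sum(clean_losses) + lam * sum(adv_losses)) / (m_minus_k + lam * k)


def adversarial_share(m, k, lam):
    """The k * lam / m quantity used to compare mini-batch configurations."""
    return k * lam / m
```

When every example has the same loss, the normalization returns that loss unchanged, so λ only shifts the balance between clean and adversarial gradients rather than the overall loss scale.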

A. EVALUATION OF NON-ADVERSARIAL MODEL
We first evaluate the non-adversarial model as a baseline, unprotected classifier, which led to a number of observations. Frequency Shift Keying (FSK) modulation classes are the most difficult to fool, with only three false positives across all FSK modulation orders under an FGSM attack. This is due to the large frequency shifts between symbols; FSK modulations with smaller shifts were easier to fool. The crafting of frequency-domain perturbations is the subject of ongoing research and will be the focus of a subsequent publication. When stronger attacks, deeper models, or larger perturbation energies are used, more FSK signals are fooled. FGSM attacks perform better than stepLL attacks, because they lower the class score of the true class rather than increase the score of the least likely class. The iterLL attack is the most effective attack because, by taking multiple smaller steps, it follows the gradient most accurately. Additionally, in most test sets, the attacks pushed misclassifications toward a few classes. For instance, 57% of false positives caused by iterLL attacks on the non-adversarially trained model belonged to the 256FSK class, 23% to the 8 Amplitude Shift Keying (ASK) class, and 20% to all other classes. Finally, as expected, increasing perturbation strength decreases modulation classification PPV. Specifically, stepLL attacks at E_s/E_p = 0 dB are required to approach the PPV of a zero rule classifier, and stepLL attacks at E_s/E_p > 35 dB had no effect on physical test PPV. On average, perturbations sent over a physical channel are slightly less effective, relative to non-adversarial PPV, than perturbations transmitted over a simulated wireless channel (Fig. 5). This is due to a covariate shift between the simulated channels used during training and the physical channels used during testing; the size of the PPV ratio gap is proportional to that covariate shift.

B. EVALUATION OF CASCADE AND PARALLEL MODELS
Here we evaluate the performance of the protected classifier, as well as the parallel and series extensions of that protection scheme. Parallel [2] (cascade [56]) adversarial training is a parallel (sequential) method of decoupling the generation of adversarial training examples from the model being trained. The theory behind parallel decoupling is that perturbations are transferable between models, and that parallel adversarial training schemes better approximate the underlying distribution of perturbations than adversarial training using perturbations crafted from a single pre-trained model, providing greater protection against black box attacks or new white box attacks generated from the fully trained model. The knowledge transferred by a parallel set of perturbations is statistically diverse and high variance, competing with non-adversarial training data for learning capacity in small models [2], such that underfitting occurs if the model size is not increased appropriately. The theory behind cascade adversarial schemes is that each iteration of training transfers additional information about how perturbations are crafted from already trained models to the ultimate model. We hypothesize that there is an optimal number of cascade training iterations and parallel set size for a given scenario, and we seek to identify the performance trends of these schemes via physical experimentation on models trained offline.
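The two decoupling schemes differ mainly in where the training perturbations come from. The sketch below captures only that control flow: `train` and `craft` are hypothetical callables supplied by the caller (a training routine and an attack such as stepLL), and the parallel scheme here draws the source model for each batch uniformly at random from a fixed pool of pre-trained models, which is one simple choice among several.

```python
import random


def parallel_sources(pretrained_models, n_batches, rng=None):
    """Parallel scheme: each batch's perturbations come from a randomly chosen pre-trained model."""
    rng = rng or random.Random()
    return [rng.choice(pretrained_models) for _ in range(n_batches)]


def cascade(train, craft, data, n_stages):
    """Cascade scheme: each stage is trained on perturbations crafted from the previous stage."""
    model = train(data, None)         # stage 0: non-adversarial model
    for _ in range(n_stages):
        adv = craft(model, data)      # perturbations from the most recent model
        model = train(data, adv)      # next model in the sequence
    return model
```

Passing string-building stand-ins for `train` and `craft` makes the lineage visible: after two cascade stages the returned model has been trained on perturbations crafted from a model that was itself trained on perturbations from the non-adversarial model.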
The number of training samples and training epochs for the ultimate model were held constant across all of these schemes (Fig. 6), so that differences in the resulting PPV reflect the knowledge transferred by training perturbations rather than the duration of training or the quantity of data.
Table 2 shows that adversarial training retains about 26% of its protection against current step attacks compared to attacks used in training. Additionally, the ultimate models trained using the parallel scheme perform worse in all scenarios except against attacks crafted using a model other than the one used in adversarial training, indicating that their robustness is transferable at the cost of regularization. Finally, models trained using the cascade scheme follow the same trends, but to a greater degree than those trained with parallel schemes.

C. LABEL LEAKING
Here we ensure that our protection scheme does not overfit the classifier to depend on perturbations for good performance. Label leaking is described in [34] as occurring when adversarial training with ground-truth-based attacks such as FGSM [1] produces a model that classifies an individual signal more accurately with its added perturbation than without it. Specifically, a label has leaked for a test signal if x* is classified correctly but x is not. Label leaking, so defined, is not possible in our experiments because we use disjoint crafting, discarding x when we craft x*, as in [2], which is one of the reasons we use that technique. However, we can still interpret the modulation classification PPV obtained on i.i.d. populations of adversarial and non-adversarial test signals to determine whether models have been over-trained with perturbations. This is because the intuition behind label leaking is that ground-truth-based attacks perform a deterministic transform on data that is highly correlated with the ground truth. Consequently, if we define the PPV ratio of a model as the PPV on adversarial data divided by the PPV on non-adversarial data, test sets with leaked labels will achieve a PPV ratio > 1.
TABLE 3. An investigation of "label leaking" [34] occurring when using FGSM adversarial training schemes, justifying the use of the stepLL attack in training over the use of the FGSM attack. While we do not see evidence of label leaking for this dataset, we find that stepLL training yielded higher protection than FGSM training against iterative and FGSM attacks, which are the most dangerous attacks.
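The PPV ratio test, and the per-signal definition of label leaking it stands in for, can be sketched as below. PPV is computed here as precision macro-averaged over the classes that were predicted at least once; this averaging choice is an assumption, and the function names are illustrative. `ppv_ratio` accepts separate label vectors for the adversarial and non-adversarial populations, matching the i.i.d. test populations described above, while `leaked_indices` needs each signal both with and without its perturbation.

```python
from collections import defaultdict


def macro_ppv(y_true, y_pred):
    """Precision macro-averaged over classes predicted at least once (assumed definition of PPV)."""
    predicted = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        correct[p] += int(t == p)
    return sum(correct[c] / predicted[c] for c in predicted) / len(predicted)


def ppv_ratio(y_true_adv, y_pred_adv, y_true_clean, y_pred_clean):
    """PPV on adversarial data divided by PPV on non-adversarial data; > 1 suggests leaked labels."""
    return macro_ppv(y_true_adv, y_pred_adv) / macro_ppv(y_true_clean, y_pred_clean)


def leaked_indices(y_true, y_pred_clean, y_pred_adv):
    """Per-signal definition: x* is classified correctly but x is not."""
    return [i for i, (t, c, a) in enumerate(zip(y_true, y_pred_clean, y_pred_adv))
            if a == t and c != t]
```

A ratio well below 1, as in typical attacks, indicates the perturbations hurt the model; only a ratio above 1 would point to the deterministic, label-correlated shortcut that label leaking describes.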
To validate the presence and severity of label leaking in wireless experiments and contrast those findings with those of relatively high-dimensional, noise-free computer vision work [34], we implement the adversarial training methodology presented in Fig. 6 with FGSM attacks. In Table 3, we do not observe any evidence of label leaking, but we do see evidence that stepLL training produced models more robust to iterative and FGSM attacks than FGSM training did.

D. EVALUATION OF MODELS TRAINED WITH ITERATIVE ATTACKS
Here we investigate the trade-space between computational cost and attack effectiveness against our protected model. In [34], the authors found that adversarial training with iterative attacks did not produce models robust to iterative attacks. They hypothesized that they lacked the computational resources to train their Inception v3 [57] model on ImageNet [58] data with a learning capacity large enough to learn the complex distribution of iterative attacks. In [59], the authors reduced the computational cost of iterative Projected Gradient Descent (PGD) [34] attack training by generating Canadian Institute for Advanced Research (CIFAR)-10 and CIFAR-100 [60] adversarial perturbations during training, reusing the gradient computed for SGD rather than re-computing it. They achieve a moderate level of protection at a very low computational cost.
TABLE 4. IterLL attacks are significantly more effective than stepLL attacks. StepLL training offers almost no defense against iterLL attacks. We are able to achieve iterLL-trained models with a small level of defense against iterLL attacks, and higher defense against stepLL and FGSM attacks with no significant loss to non-adversarial performance.
In this work, we performed iterLL adversarial training using an RML2018.01A inspired dataset to see what degree of protection we may obtain against iterLL and other attacks. We do so without the dual use of the gradient as in [59], because crafting perturbations during training rather than after does not result in disjoint crafting as in [2]. Additionally, we hypothesized that our relatively low-dimensional data (8192 features/example for the RML2018.01A inspired dataset versus 544,509 average features/example for ImageNet [58]), relatively small model (18.2 × 10^6 parameters in our VGG10 inspired model versus 24 × 10^6 parameters in Inception v3), and several years of computational resource advancements (Volta V100 versus Tesla K80 Graphics Processing Units (GPUs)) would render the dual use unnecessary.
As shown in Table 4, iterLL attacks were 206% more effective than stepLL attacks for our dataset, model, and attack parameters. Additionally, stepLL training offered no significant defense against iterLL attacks, which motivated an iterLL training scheme. The results of our iterLL training are encouraging: defense increased against all attacks without a loss in non-adversarial performance. Most notably, iterLL training is the only scheme that achieved any level of protection against iterative attacks.
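The iterLL training scheme mixes clean and perturbed examples in each minibatch. One common way to weight such a mixture, assumed here rather than taken from our implementation, normalizes a λ-weighted sum of per-example losses over m − k clean and k adversarial examples. The sketch below uses a hypothetical linear softmax classifier; the adversarial examples (e.g., produced by an iterLL attack) are passed in precomputed and keep their true labels, not the least-likely targets used to craft them.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def per_example_ce(W, x, y):
    # Cross-entropy of each example under a hypothetical linear softmax model.
    p = softmax(x @ W.T)
    return -np.log(p[np.arange(len(y)), y] + 1e-12)


def mixed_adv_loss(W, x_clean, y_clean, x_adv, y_adv, lam):
    """(sum of clean losses + lam * sum of adversarial losses) / ((m - k) + lam * k),
    with m - k clean and k adversarial examples in the same minibatch.
    y_adv holds the TRUE labels of the perturbed examples."""
    clean_sum = per_example_ce(W, x_clean, y_clean).sum()
    adv_sum = per_example_ce(W, x_adv, y_adv).sum()
    n_clean, k = len(x_clean), len(x_adv)
    return (clean_sum + lam * adv_sum) / (n_clean + lam * k)
```

With λ = 1 this reduces to the plain mean over the combined batch, and λ = 0 ignores the adversarial examples entirely; intermediate values trade robustness against non-adversarial performance.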

E. MODEL CAPACITY
Here we ensure that our protection scheme does not underfit for lack of enough trainable parameters to learn both the perturbed and non-perturbed data distributions. In [34], the authors were unable to find a model deep enough to overfit in the presence of stepLL adversarial training. We scale model width by multiplying the number of filters in every convolutional layer by a factor ρ. While our model benefits somewhat from batch normalization, we do not find that dropout improves test-stage PPV.
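To make the effect of ρ on capacity concrete, the following sketch counts convolutional parameters as every layer's filter count is scaled. It is a minimal sketch under stated assumptions: a hypothetical stack of seven 1-D convolutional layers of 64 filters with kernel size 3 on 2-channel (I/Q) input, counting only conv weights and biases; this is not the actual VGG10-inspired configuration. Because both the input and output channel counts of every layer after the first scale with ρ, the parameter count grows roughly as ρ², and each filter's fan-in grows linearly with ρ.

```python
def scaled_filters(base_filters, rho):
    """Multiply each conv layer's filter count by rho (rounded, at least 1)."""
    return [max(1, int(round(f * rho))) for f in base_filters]


def conv1d_param_count(filters, in_channels=2, kernel=3):
    """Weights + biases of a stack of 1-D conv layers; BN and dense layers are ignored.
    in_channels=2 assumes I/Q input."""
    total, c_in = 0, in_channels
    for c_out in filters:
        total += kernel * c_in * c_out + c_out   # kernel x c_in x c_out weights + biases
        c_in = c_out
    return total


# Hypothetical base widths; not the actual VGG10-inspired configuration.
BASE = [64] * 7
for rho in (0.5, 1, 2, 4):
    print(rho, conv1d_param_count(scaled_filters(BASE, rho)))
```

Under these assumptions, going from ρ = 1 to ρ = 2 multiplies the conv parameter count by just under four.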
In this work, we investigated the effectiveness of stepLL adversarial training as a regularizer in wireless experiments. We hypothesized that relatively low-dimensional data, relatively small models, and several years of computational resource improvements would make it feasible to scale to extreme ρ values.
TABLE 5. Effect of model capacity on adversarial training, evaluated using physical test data. We find that adversarial training prevents overfitting when training our VGG10 model scaled by ρ = 4. We additionally find that stepLL perturbations crafted after adversarial training are more effective against wider models, indicating a model capacity trade-off between non-adversarial and adversarial test classification PPV. Models that are too narrow also make lower-confidence classifications than wider models, which makes them easier to fool. "Clean" is shorthand for non-adversarial data.
As shown in Table 5, we were able to scale ρ over [0.5, 4] before running out of memory. At ρ = 4, the non-adversarially trained VGG10 began to overfit the training data, as evidenced by its lower physical test classification PPV relative to the ρ = 2 non-adversarially trained model. With adversarial training, however, the model is regularized and physical test classification PPV continues to increase with ρ. Additionally, wider models were more vulnerable to adversarial perturbations. This can be explained by [1], which showed that an FGSM perturbation with max-norm $\epsilon$ increases the activation of a linear unit by $\epsilon L M$, where $M$ is the average magnitude of the weights and $L$ is the number of weights feeding the unit. We hypothesized that increasing ρ increases $L$, such that perturbations, all else being equal, have a greater impact on classification PPV. We tested this hypothesis by computing the ratio of mean class score magnitudes between clean and stepLL physical test data for adversarially trained models with ρ = 1 and ρ = 4. The resulting ratios were 0.39 and 0.33, respectively, failing to reject our hypothesis that perturbations increase the magnitude of class scores, on average, in proportion to the number of weights in each layer of a CNN.
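The linearity argument of [1] can be checked numerically. The following is a minimal NumPy sketch, assuming a single linear unit with randomly drawn (hypothetical) weights rather than a trained CNN layer: perturbing the input by ε·sign(w) shifts the pre-activation by exactly εLM, since w · sign(w) is the sum of the weight magnitudes. With the weight distribution held fixed, the shift therefore grows linearly with L.

```python
import numpy as np


def sign_perturbation_shift(w, eps):
    """Change in a linear unit's pre-activation w . x when x is perturbed by
    eta = eps * sign(w), the worst case under an L-inf budget eps."""
    eta = eps * np.sign(w)
    return float(w @ eta)   # equals eps * L * mean(|w|), with L = len(w)


rng = np.random.default_rng(0)
eps = 0.01
for L in (64, 256, 1024):
    w = rng.normal(scale=0.05, size=L)       # hypothetical weights
    predicted = eps * L * np.mean(np.abs(w))
    print(L, sign_perturbation_shift(w, eps), predicted)
```

Each printed pair agrees up to floating-point rounding, and the shift for L = 1024 is roughly sixteen times that for L = 64.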
We observed that the narrow ρ = 0.5 model is also more vulnerable to attacks. One potential explanation is that it makes lower-confidence classifications, which are easier to fool. To test this, we computed on the physical test sets the average difference between the largest and second-largest class scores for the ρ = 0.5 and ρ = 1 adversarially trained models. The average top-two class score differences were 71.97 and 91.11, respectively, failing to reject our hypothesis that the narrow model makes less confident classifications.
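The confidence measure used above, the average gap between the largest and second-largest class scores, can be sketched in a few lines. This is a minimal NumPy sketch, assuming the class scores are available as an array of shape (examples, classes), e.g. pre-softmax outputs; the function name is hypothetical, and the exact score definition behind the reported values is not restated here.

```python
import numpy as np


def mean_top2_margin(scores):
    """Average gap between the largest and second-largest class score
    per example; scores has shape (num_examples, num_classes)."""
    top2 = np.sort(scores, axis=1)[:, -2:]   # ascending order: [second, first]
    return float(np.mean(top2[:, 1] - top2[:, 0]))


scores = np.array([[1.0, 5.0, 3.0],
                   [10.0, 2.0, 9.0]])
print(mean_top2_margin(scores))   # per-example gaps are 2 and 1, so the mean is 1.5
```

A smaller average margin means that a smaller perturbation-induced change in class scores suffices to flip the top prediction.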
Consequently, we determined that model width must be carefully managed in adversarial training schemes so that the model is wide enough to learn both the non-adversarial and adversarial data distributions, wide enough to make high-confidence classifications that require large changes to class scores to cause false positives, and narrow enough not to become vulnerable to the compounding effect of attacks. Additionally, we conclude that this trade-off is more favorable for adversarial training in wireless spectrum sensing, signal classification, and modulation classification than in computer vision tasks, which tend to require much deeper models to learn relatively high-dimensional data distributions with large state spaces.
These simulations and experiments yielded a number of findings and confirmations of the state of the art, including:
1) Training a CNN offline using channel models can achieve high-accuracy modulation classification on physical signals.
2) Physical adversarial perturbations from a transmitter can reduce the classification accuracy of an eavesdropping receiver's trained ML classifier to guessing level, despite phase, frequency, and amplitude noise sources from both the RFFE and the channel.
3) Adversarial training of the eavesdropping receiver using simulated channel models can achieve some level of defense against adversarial perturbations; the best results are achieved when adversarial training uses perturbations crafted from a fully trained, i.i.d. non-adversarial model.
4) Label leaking does not appear to occur in low-dimensional data domains.
5) Parallel and cascade adversarial training schemes over-emphasize adversarial examples during training, reducing test accuracy on non-adversarial data. This defeats the primary objective of adversarial training, which is to increase robustness without sacrificing non-adversarial performance.
6) A measure of protection against iterative attacks is possible with iterLL training.
7) The model width of the eavesdropping receiver must be carefully managed to reach an "elbow" point in the trade-off between non-adversarial and adversarial test performance. Specifically, we found that the CNN must be wide enough to make correct, high-confidence classifications, wide enough to have the learning capacity for both the adversarial and non-adversarial PDFs, and narrow enough not to compound the increase in the loss function caused by perturbations.