Physical Layer-Based IoT Security: An Investigation Into Improving Preamble-Based SEI Performance When Using Multiple Waveform Collections

The Internet of Things (IoT) is a collection of inexpensive, semi-autonomous, Internet-connected devices that sense and interact within the physical world. IoT security is of paramount concern because most IoT devices use weak or no encryption at all. This concern is exacerbated by the fact that the number of IoT deployments continues to grow, IoT devices are being integrated into key infrastructures, and their weak or lack of encryption is being exploited. Specific Emitter Identification (SEI) is being investigated as an effective, cost-saving IoT security approach because it is a passive technique that uses inherent, distinct features that are unintentionally imparted to the waveform during its formation and transmission by the IoT device’s Radio Frequency (RF) front-end. Despite the amount of research conducted, SEI still faces roadblocks that hinder its integration within operational networks. Our work focuses on the lack of feature permanence across time and environments, which is designated herein as the “multi-day” problem. We present results and analysis for six distinct experiments focused on improving multi-day SEI performance through multiple waveform representations, deeper Convolutional Neural Networks (CNNs), increasing numbers of waveforms, channel model impacts, and two-channel mitigation techniques. Our work shows improved multi-day SEI performance using the waveform’s frequency-domain representation and a CNN comprised of four convolutional layers. However, the traditional channel model and both channel mitigation techniques fail to sufficiently mitigate or remove real-world channel impacts, which suggests that the channel may not be the dominant effect hindering multi-day SEI performance.


I. INTRODUCTION
The Internet of Things (IoT) is a collection of inexpensive, semi-autonomous, Internet-connected devices that sense and interact within the physical world [1]. IoT security is of paramount concern because most IoT devices implement weak or no encryption at all [2]. The use of weak or no encryption is attributed to (i) limited on-board resources such as memory and power; (ii) high manufacturing costs that prohibit the integration of effective encryption techniques on low-cost devices; and (iii) difficulties associated with the implementation and management of encryption at the scales often associated with large IoT deployments [3], [4]. The concern surrounding IoT security continues to grow as more and more IoT devices are integrated into key infrastructures such as the electric power grid, and abused by nefarious actors [5], [6], [7], [8], [9], [10], [11], [12], [13]. Recently, a physical layer technique known as Specific Emitter Identification (SEI) has received a lot of attention as a possible means through which IoT devices and their associated infrastructure can be secured. SEI was introduced almost thirty years ago to grant electronic warfare systems the capability of detecting, characterizing, and identifying radar systems using unintentional features present within their emitted signals. These features have been attributed to slight manufacturing and assembly process defects that exist within the components, subsystems, and systems that comprise a radar system's Radio Frequency (RF) front end.
SEI features have been shown to be sufficiently distinctive to achieve serial number discrimination (i.e., the emitters are all of the same manufacturer and model) without their presence negatively impacting normal system operations such as signal detection and demodulation. The success of radar-focused SEI has resulted in it being recommended as a physical layer mechanism for enhancing existing, higher-layer network security approaches such as username & password and MAC address filtering. SEI proves advantageous because: (i) it is a passive technique, so the emitters are unaware they are being identified, (ii) it quantitatively measures the exploited features, (iii) it can ensure information is coming from known, authorized devices instead of unknown, unauthorized devices that can negatively impact the dependability and timeliness of communications within an IoT infrastructure [14], [15], and (iv) the exploited features are generated during normal emitter function, so there is no need for modification, stimulation, or augmentation of the emitter to enable identification. The last advantage is ideally suited for IoT applications because it means that SEI-based security can be applied to even the simplest, lowest-cost devices and deployments.
Despite the amount of research conducted, SEI still faces roadblocks that hinder its adoption and integration within operational networks. Our work focuses on addressing one of these roadblocks as it relates to the lack of SEI exploited feature permanence across time and environments [16], [17]. We designate this lack of permanence herein as the ''multi-day'' problem. Currently, SEI processes are trained/developed using a single set of signals that are collected from each device during a single collection event. However, these SEI processes have been shown to perform poorly when tasked with identifying the same set of emitters using signals collected during a separate event conducted a day or days after the initial set of signals was collected and used to develop the SEI process. In other words, SEI does not currently support emitter discrimination using signals collected over multiple days when the process is trained using a single day's worth of signals. Although the time between collections can vary from hours to weeks, we designate all such cases as part of the multi-day problem. The contributions of this work are as follows.
• An analysis of time and frequency representations of preambles for improving multi-day SEI performance.
• An analysis of CNN network depth to maximize multi-day SEI performance.
• An analysis of using additive, white Gaussian noise to model wireless interference to improve multi-day SEI performance.
• An analysis of using traditional channel mitigation techniques to improve multi-day SEI performance.
The remainder of this paper is outlined as follows. Sect. II provides a brief description of related SEI works and how our work differs from them. Sect. III describes the signal of interest, signal collection, and post-processing procedures while Sect. IV describes the methodology followed to generate the results presented in Sect. V. Finally, the paper is concluded in Sect. VI.

II. RELATED WORKS
Performing SEI using waveforms collected over multiple days is relatively recent; thus, there are not many published investigations into the multi-day SEI problem. However, this section summarizes those investigations that considered multi-day SEI. The authors of [17] show that SEI performance degrades when using waveforms collected over multiple days and environments. The authors use two USRP B210 Software-Defined Radios (SDRs) to capture LoRa packets from sixty Pycom emitters transmitting at a carrier frequency of 915 MHz. When classifying the Fast Fourier Transform (FFT) coefficients of the emitters' waveforms, an accuracy of 82% is achieved when the training and testing waveforms are collected on the same day. However, accuracy drops to just 7% and 5% when classifying waveforms from the second and third collections. When using the In-phase and Quadrature (IQ) samples, an accuracy of 72% is achieved when classifying waveforms collected on the same day as those used for training, and it drops to 48% and 46% when classifying waveforms from the two subsequent collections. The authors propose the ''Tweak'' [18] and ''WideScan'' [19], [20] methods to improve classification performance across collections. However, the authors do not provide results supporting either approach's effectiveness.
The authors of [21] improve multi-day SEI by reducing the effect of the wireless channel on each emitter's set of SEI features. The channel is mitigated by implementing IQ balancing and equalization on the payload portion of a BPSK waveform. This increases SEI accuracy from 23% to 34% when training a network on fifty Wireless-Fidelity (Wi-Fi) emitters collected on a single day and tested using waveforms collected on a later date. These emitters are recorded across multiple Wi-Fi-compliant carrier bands and are sampled at 200 MHz. IoT devices may not be capable of achieving such a high sampling frequency. In this work, we address this by sampling the waveforms at a frequency of 20 MHz.

III. SIGNAL COLLECTION AND POST-PROCESSING
A. SIGNAL OF INTEREST
The signal of interest is the IEEE 802.11a Wi-Fi frame, which is an Orthogonal Frequency Division Multiplexing (OFDM) waveform that is capable of data rates as high as 54 Mbps and operates within the 5 GHz Industrial, Scientific, and Medical (ISM) band. The first 16 µs of every IEEE 802.11a Wi-Fi frame is dedicated to the transmission of a preamble that is used for time synchronization, channel compensation, and frequency and phase offset correction. Fig. 1 shows the structure of the 802.11a preamble, which is comprised of ten Short Training Symbols (STS), designated t1 through t10, a Guard Interval (GI), and two Long Training Symbols (LTS), designated T1 and T2.
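At the 20 MHz sampling frequency used in this work, the 16 µs preamble spans 320 complex samples. A quick sanity check of the symbol budget, assuming the standard 802.11a durations (0.8 µs per STS, 1.6 µs GI, 3.2 µs per LTS), can be sketched as follows (illustrative Python, not the authors' MATLAB processing):

```python
fs = 20e6  # sampling frequency in Hz (20 MHz)

sts = round(0.8e-6 * fs)  # samples per short training symbol -> 16
gi = round(1.6e-6 * fs)   # samples in the guard interval     -> 32
lts = round(3.2e-6 * fs)  # samples per long training symbol  -> 64

# Ten STS, one GI, and two LTS make up the full 16 us preamble.
total = 10 * sts + gi + 2 * lts
print(total)  # 320 complex samples
```

This 320-sample length is the one referenced later for the average-STS and residual-preamble experiments.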
The use of IEEE 802.11a Wi-Fi is motivated by the facts that: (i) Wi-Fi, which includes 802.11a, is an IoT-designated communications standard [22], [23], (ii) it is a preamble-based communications standard that facilitates SEI using a steady-state portion of the signal, (iii) it allows the extension to other preamble-based IoT standards such as 802.11ax (a.k.a., Wi-Fi 6), ZigBee, Z-Wave, and Bluetooth, (iv) the IEEE 802.11a preamble and the first 16 µs of the Ultra-Reliable Low Latency Communications (URLLC) compliant IEEE 802.11ax preamble are constructed using the same symbols [24], (v) it has been used extensively in our and other published SEI works [25], [26], [27], [28], [29], and [30], and (vi) we have an on-hand set of commercial-off-the-shelf, IEEE 802.11a Wi-Fi compliant emitters.

FIGURE 1. Block diagram of the IEEE 802.11a OFDM Wi-Fi preamble that includes ten short training symbols t1 through t10, a guard interval GI, and two long training symbols T1 and T2, all occurring within 16 µs.

B. SIGNAL COLLECTION
The signal collection is performed in an open-air laboratory over a period of eight consecutive weeks. Signals are collected from thirty-two TP-Link Archer T3U USB Wi-Fi emitters that each transmit the same 2 Gb binary file over an IEEE 802.11a Wi-Fi connection. Each week, an Ettus Research USRP B210 collects four seconds' worth of transmissions from each TP-Link emitter at a sampling rate of 40 MHz. Once collected, each recording is gated to remove the channel-only segments that exist between transmissions. Initially, the preamble of every transmission is coarsely detected by finding every point in the recording where the magnitude changes from zero to greater than zero (i.e., the start of each transmission). Once coarsely detected, the preamble, along with thirty-two samples before and after it, is extracted and stored for subsequent fine preamble detection.
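The coarse-detection step above can be sketched as follows. This is a minimal Python illustration (the authors' processing pipeline is not published here); `coarse_detect` is a hypothetical helper, and a real recording would typically require a small noise threshold rather than a comparison against exactly zero:

```python
import numpy as np

def coarse_detect(rec, pre_len=320, pad=32):
    """Coarsely locate transmission starts in a gated recording.

    A start is any sample whose magnitude is nonzero while the
    previous sample's magnitude is zero, mirroring the zero-to-
    greater-than-zero test described in the text.
    """
    active = np.abs(rec) > 0
    # Indices where 'active' flips from False to True.
    starts = np.flatnonzero(active[1:] & ~active[:-1]) + 1
    segments = []
    for s in starts:
        lo, hi = max(s - pad, 0), s + pre_len + pad
        if hi <= len(rec):  # keep only fully contained segments
            segments.append(rec[lo:hi])
    return segments
```

Each retained segment contains the preamble plus thirty-two samples of margin on either side, ready for fine detection.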

C. FINE PREAMBLE DETECTION
After coarse detection, each preamble is finely detected using the Mean Squared Error (MSE). In this case, the magnitude representation of an ideal 802.11a preamble is slid across each coarsely detected preamble's magnitude representation and the MSE is calculated at each offset. Fine detection using the magnitude representation is advantageous because it is not negatively impacted by the presence of Carrier Frequency Offset (CFO), which can adversely impact the average percent correct classification performance of amplitude- and phase-based detection approaches. A preamble is detected when the MSE is at its minimum. For each emitter, this process is repeated for all transmissions within its corresponding collection records and ten thousand preambles are retained. The retained preambles correspond to the smallest MSE values for that emitter.
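The sliding-MSE fine detector can be sketched as follows; `fine_detect` is a hypothetical helper name, and a brute-force loop is used for clarity rather than speed:

```python
import numpy as np

def fine_detect(segment, ideal_mag):
    """Slide the ideal preamble's magnitude template across a coarsely
    detected segment and return (best_offset, min_mse).

    Working on magnitudes makes the search insensitive to CFO, since a
    frequency offset only rotates each sample's phase.
    """
    seg_mag = np.abs(segment)
    n = len(ideal_mag)
    best_off, best_mse = None, np.inf
    for off in range(len(seg_mag) - n + 1):
        mse = np.mean((seg_mag[off:off + n] - ideal_mag) ** 2)
        if mse < best_mse:
            best_off, best_mse = off, mse
    return best_off, best_mse
```

Repeating this over every transmission and keeping the ten thousand smallest-MSE preambles per emitter would reproduce the retention rule described above.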

D. CARRIER FREQUENCY OFFSET CORRECTION
CFO is estimated using the process stated in [31], in which the offset is calculated by

\hat{\varepsilon} = \frac{1}{2\pi N} \angle\left( \sum_{k=0}^{N-1} R_1^*(k)\, R_2(k) \right), (1)

where R_1 and R_2 represent the Fourier coefficients of the first and second training symbols consecutively selected from the received waveform r[n], respectively, N is the length of the training symbol, \angle(\cdot) returns the angle of its argument, and * represents the complex conjugate. Once calculated, the CFO present in the j-th received waveform of emitter e is corrected by

\hat{r}_{e,j}(n) = r_{e,j}(n)\, e^{-j 2\pi \hat{\varepsilon} n}. (2)

All waveforms undergo a two-stage CFO compensation process. The first stage performs coarse CFO correction using equation (1) and the Fourier coefficients of the eighth, t8, and ninth, t9, STS. The second stage performs fine CFO compensation using both LTS sequences, T1 and T2, in lieu of t8 and t9. Following the coarse and fine CFO compensation, all waveforms are filtered using a fourth-order elliptic filter that has a passband ripple of 0.5 dB, stopband attenuation of 20 dB, and corner frequency of 8.865 MHz. Filtering after CFO correction is advantageous because CFO manifests as a shift in the center frequency of the waveform's Power Spectral Density (PSD). This shift causes the PSD to be misaligned with the filter's passband, which can result in unnecessary coloration being applied that can negatively impact SEI performance [32]. All waveforms are then normalized to unit energy [28] by

\bar{r}_{e,j}(n) = \frac{r_{e,j}(n)}{\sqrt{\sum_{n=1}^{N_p} \left| r_{e,j}(n) \right|^2}}, (3)

where N_p is the number of samples in the CFO-corrected preamble r_{e,j}(n), and the result is stored for experimentation.
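The estimate-correct-normalize chain can be sketched in Python as follows. The function names are hypothetical, and the estimator is a Moose-style correlator consistent with the description of [31] in the text (angle of the correlation between the Fourier coefficients of two repeated training symbols, scaled by 2πN):

```python
import numpy as np

def cfo_estimate(r, start1, start2, n):
    """Estimate the normalized CFO (cycles per sample) from two
    repeated training symbols of length n located at start1/start2."""
    R1 = np.fft.fft(r[start1:start1 + n])
    R2 = np.fft.fft(r[start2:start2 + n])
    # Angle of the correlation of the two symbols' Fourier coefficients.
    return np.angle(np.sum(np.conj(R1) * R2)) / (2 * np.pi * n)

def cfo_correct(r, eps):
    """Remove the estimated offset by counter-rotating every sample."""
    n = np.arange(len(r))
    return r * np.exp(-2j * np.pi * eps * n)

def unit_energy(r):
    """Normalize a preamble to unit energy."""
    return r / np.sqrt(np.sum(np.abs(r) ** 2))
```

In the two-stage scheme above, the coarse pass would use two consecutive 16-sample STS (t8 and t9, so n = 16) and the fine pass the two 64-sample LTS (n = 64).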

IV. METHODOLOGY
A. NOISE SCALING AND WAVEFORM FILTERING
Wireless channel interference is modeled using complex-valued Additive White Gaussian Noise (AWGN) that is scaled to achieve the desired Signal-to-Noise Ratio (SNR). The noise is scaled by multiplying it by

c = \sqrt{\frac{P_{\bar{r}}}{P_w\, 10^{\delta_z / 10}}}, (4)

where \delta_z is the desired SNR in decibels, and P_{\bar{r}} and P_w are the powers of the unit-energy waveform and the unscaled noise, respectively. The waveform at the desired SNR is expressed as

r^{\chi}_{e,j}(n) = \bar{r}_{e,j}(n) + c\, w(n), (5)

where n = 1, 2, \ldots, N_p and w(n) is AWGN. The resulting waveform, r^{\chi}_{e,j}(n), is filtered using the same elliptic filter used in Sect. III-D and normalized to unit energy. This process is repeated ten times for all waveforms to ensure all presented numerical results are generated using Monte Carlo methods.
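A minimal sketch of the noise scaling is given below. The scale factor sqrt(P_r / (P_w · 10^(SNR/10))) is one common form and is an assumption here, since the paper's exact expression is not reproduced in this extract; `add_awgn` is a hypothetical helper:

```python
import numpy as np

def add_awgn(r, snr_db, rng=None):
    """Add complex AWGN scaled so the result has the desired SNR.

    The noise is drawn as unit-variance circular complex Gaussian and
    scaled relative to the measured powers of waveform and noise.
    """
    rng = rng or np.random.default_rng()
    w = (rng.standard_normal(len(r)) + 1j * rng.standard_normal(len(r))) / np.sqrt(2)
    p_r = np.mean(np.abs(r) ** 2)  # waveform power
    p_w = np.mean(np.abs(w) ** 2)  # unscaled noise power
    c = np.sqrt(p_r / (p_w * 10 ** (snr_db / 10)))
    return r + c * w
```

Calling this ten times per preamble with independent draws would yield the ten Monte Carlo noise realizations described above.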

B. PREAMBLE REPRESENTATIONS
SEI is performed using either a time- or frequency-domain-based 802.11a preamble representation adopted from our published work in [26] and [28]. The time domain representation is

P_t = \left[\, i;\ q;\ \lambda;\ \theta \,\right], (6)

where i is the in-phase component of the complex-valued preamble, q is the complex-valued preamble's quadrature-phase component, \lambda is the real part of the complex-valued preamble's natural logarithm, which captures magnitude, and \theta is the imaginary part of that natural logarithm, which captures phase. The frequency domain representation is

P_f = \left[\, I;\ Q;\ \Lambda;\ \Theta \,\right], (7)

where I is the Fourier coefficients' real component, Q is the Fourier coefficients' imaginary component, and \Lambda and \Theta are the Fourier coefficients' magnitude and phase, respectively.
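Both representations can be sketched directly from these definitions. The helper names are hypothetical; note that the complex logarithm's real part is log-magnitude and its imaginary part is phase, so samples with zero magnitude would need guarding in practice:

```python
import numpy as np

def time_rep(r):
    """Time-domain representation: in-phase, quadrature, log-magnitude,
    and phase of the complex preamble, stacked row-wise."""
    logr = np.log(r)  # real part = log|r|, imaginary part = angle(r)
    return np.stack([r.real, r.imag, logr.real, logr.imag])

def freq_rep(r):
    """Frequency-domain representation: real, imaginary, magnitude,
    and phase of the preamble's Fourier coefficients."""
    R = np.fft.fft(r)
    return np.stack([R.real, R.imag, np.abs(R), np.angle(R)])
```

Each representation is thus a four-row array over the 320 preamble samples, which is consistent with the four-component stacks used by the CNN inputs later.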

C. EXPERIMENTS CONDUCTED
A total of six multi-day SEI experiments are conducted using Convolutional Neural Network (CNN) architectures. Our use of a CNN is motivated by it being the most common Deep Learning (DL) architecture used in SEI experimentation [18], [19], [20], [21], [25], [30] as well as its use in our prior works [26], [27], [28], and [29]. The remainder of this section describes each experiment and provides the details needed to replicate them.

1) EXPERIMENT #1: PREAMBLE REPRESENTATION SELECTION
This experiment focuses on determining the preamble representation combination that results in the highest multi-day emitter identification performance. Every preamble representation described in Sect. IV-B is investigated. Based on equations (6) and (7), there are eight individual representations: i, q, \lambda, \theta, I, Q, \Lambda, and \Theta. The investigation is performed by adopting an n-choose-k preamble representation selection approach in which n = 8 and k ranges from one to eight. This generates a total of two hundred fifty-five preamble representation combinations. These combinations, ranked from highest to lowest multi-day average percent correct classification performance, are shown in Table 2. The CNNs used to conduct this experiment are configured such that their convolutional filter (a.k.a., kernel) size matches the selected combination value k. Other than this experiment-specific configuration, the CNNs are as described in Sect. IV-C2.
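The combination count follows directly from the n-choose-k enumeration: summing C(8, k) for k = 1 through 8 gives 2^8 − 1 = 255. A quick sketch (the representation labels are illustrative stand-ins for the eight components above):

```python
from itertools import combinations

# The eight preamble representation components from equations (6)/(7).
reps = ["i", "q", "lam", "theta", "I", "Q", "Lam", "Theta"]

# Enumerate every nonempty subset: n-choose-k for k = 1..8.
combos = [c for k in range(1, len(reps) + 1)
          for c in combinations(reps, k)]
print(len(combos))  # 255 candidate representation combinations
```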

2) EXPERIMENT #2: INCREASING CNN DEPTH
This experiment investigates whether increasing CNN depth improves multi-day SEI performance. This experiment uses CNNs based on the architecture shown in Table 1 with depths of one, two, four, eight, sixteen, or thirty-two stacked convolutional layers. The first, second, and third convolutional layers of all CNNs are constructed using a total of sixty-four, thirty-two, and sixteen [4 × 4] filters, respectively. All remaining layers are constructed using eight filters. CNNs constructed using one, two, four, and eight stacked convolutional layer(s) follow each convolutional layer with a maximum pooling (a.k.a., MaxPooling) layer with a window size of [2 × 2] and a stride of two, and a Rectified Linear Unit (ReLU) layer. For the sixteen and thirty-two convolutional layer CNNs, only the first eight convolutional layers are followed by these activation and pooling layers. Every CNN, regardless of depth, is trained using a single noise realization of the preambles collected during Week #1 at an SNR of 30 dB. For this experiment, the preambles are represented using equation (7), Sect. IV-B.
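One plausible reason only eight pooled stages are possible: each [2 × 2] MaxPooling layer with stride two halves a spatial dimension, so the dimension eventually collapses to a single element. A quick sketch, under the assumption (not stated explicitly in the text) that the 320-sample dimension of the preamble representation is the one being pooled:

```python
def max_pool_stages(width, pool=2):
    """Count how many stride-2 poolings a dimension of the given size
    can support before it collapses to a single element."""
    stages = 0
    while width >= pool:
        width //= pool  # integer halving, as floor-mode pooling would do
        stages += 1
    return stages

# 320 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5 -> 2 -> 1: eight stages.
print(max_pool_stages(320))
```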

3) EXPERIMENT #3: INCREASING PREAMBLES AND NOISE REALIZATIONS
This experiment assesses multi-day SEI performance when CNNs are trained using multiple noise realizations and an increasing number of real-world collected preambles. One-, two-, five-, and ten-thousand preambles are randomly chosen from each emitter. For each set of preambles, a total of one, two, four, eight, or sixteen noise realizations are generated. All results are generated at an SNR of 30 dB using a four-layer CNN, Sect. IV-C2, and the frequency representation P_f, Sect. IV-B.

4) EXPERIMENT #4: AVERAGING STS SEQUENCES
This experiment attempts to improve multi-day SEI performance by reducing cross-week channel effects through the use of one of four average STS methods. The average STS is calculated over the third, t3, through ninth, t9, STS of the j-th received preamble, r^{\chi}_{e,j}, of emitter e. Average STS-based multi-day SEI is performed using one of the following four cases: (i) the average STS replaces the third through ninth STS in the corresponding preamble, (ii) the average STS is replicated N_S (a.k.a., the number of STS used to compute the average) times and the result replaces the third through ninth STS sequences in the corresponding preamble, (iii) only the N_S replicated, average STS is used for SEI, and (iv) no averaging is performed (i.e., the original preamble is used). For Method #3, the first and tenth STS along with the GI and both LTS sequences are discarded and not used for multi-day SEI. In other words, Method #3-based multi-day SEI makes use of only the highlighted portion, enclosed in a dashed-line box, of Method #2. Fig. 2 provides a representative illustration of the STS sequences used to calculate the average STS and implement Method #1, Method #2, and Method #3. Due to the 20 MHz sampling frequency, the preamble r^{\chi} is comprised of three hundred twenty complex-valued samples. Method #2 and Method #4 use preambles comprised of three hundred twenty complex-valued samples. However, Method #1 and Method #3 use preambles or preamble regions comprised of two hundred forty and ninety-six complex-valued samples, respectively. For these two methods, zeros are appended to the end of the result to reach a length of three hundred twenty complex values. For this experiment, multi-day SEI is performed using frequency representation P_f, Sect. IV-B, and the four-layer CNN in Sect. IV-C2.
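The averaging and padding steps can be sketched as follows. The helper names, the zero-based symbol indexing, and the 16-sample STS layout are assumptions for illustration; Method #2 (replicated average, full length) and Method #3 (replicated average only, zero-padded) are shown, with Method #1 being the same idea but inserting a single average STS:

```python
import numpy as np

STS_LEN = 16  # samples per short training symbol at 20 MHz

def avg_sts(pre, first=2, n_s=7):
    """Average n_s consecutive STS blocks (t3..t9 by default)."""
    region = pre[first * STS_LEN:(first + n_s) * STS_LEN]
    return region.reshape(n_s, STS_LEN).mean(axis=0)

def pad_to(x, n):
    """Append zeros so every variant reaches a common length."""
    return np.concatenate([x, np.zeros(n - len(x), dtype=x.dtype)])

def method_2(pre, first=2, n_s=7):
    """Replace t3..t9 with the average STS replicated n_s times,
    preserving the original preamble length."""
    out = pre.copy()
    tiled = np.tile(avg_sts(pre, first, n_s), n_s)
    out[first * STS_LEN:(first + n_s) * STS_LEN] = tiled
    return out

def method_3(pre, first=2, n_s=7):
    """Keep only the replicated average STS, zero-padded to full length."""
    return pad_to(np.tile(avg_sts(pre, first, n_s), n_s), len(pre))
```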

5) EXPERIMENT #5: STS AND LTS SYMBOL RESIDUAL
This experiment attempts to improve multi-day SEI performance by reducing cross-week channel effects through the use of ''residual'' preambles. A residual preamble is generated by dividing the complex coefficients of STS t 1 through t 5 by STS t 6 through t 10 and LTS T 1 by LTS T 2 . The results of both divisions are then concatenated to form a residual preamble. The residual has a sample length of one hundred forty-four complex coefficients. The result is padded to have an overall sample length of three-hundred twenty before the frequency representation P f is calculated. This is done to ensure fairness when comparing multi-day SEI performance to the original three-hundred twenty sample length preamble. The multi-day SEI results associated with this experiment are generated using the four-layer CNN in Sect. IV-C2.
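The residual construction can be sketched directly from this description. The index layout (16-sample STS at samples 0-159, GI at 160-191, 64-sample LTS at 192-319) is assumed from the 20 MHz sampling rate; `residual_preamble` is a hypothetical helper, and real preambles are assumed nonzero so the element-wise division is safe:

```python
import numpy as np

def residual_preamble(pre):
    """Divide STS t1..t5 by t6..t10 and LTS T1 by T2, concatenate,
    and zero-pad back to the original 320-sample length."""
    sts_res = pre[0:80] / pre[80:160]      # t1..t5 over t6..t10 (80 values)
    lts_res = pre[192:256] / pre[256:320]  # T1 over T2 (64 values)
    res = np.concatenate([sts_res, lts_res])  # 144 complex values
    return np.concatenate([res, np.zeros(320 - len(res), dtype=pre.dtype)])
```

If the two STS halves and the two LTS were identical (a channel-free idealization), the residual would be all ones, which is the intuition behind dividing out the common channel.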

6) EXPERIMENT #6: TRAIN USING MULTIPLE DAYS OF COLLECTED SIGNALS
For the sixth and final experiment, multi-day SEI performance is assessed using CNN training sets comprised of preambles selected from more than one of the weekly collections. The intent is for the CNN to learn a set of SEI features that facilitate emitter discrimination regardless of when the set of to-be-classified preambles is collected. In other words, this experiment investigates whether a CNN can be trained using preambles collected Week #1 through Week #N_w, where N_w is greater than one and not equal to eight, and classify Week #8 collected preambles with the same AEC as it classifies the test set of preambles associated with the training weeks' collections. This experiment uses the preambles' frequency representation P_f, Sect. IV-B, and a four-layer CNN, Sect. IV-C2.

D. DEEP LEARNING CONFIGURATION
1) CONVOLUTIONAL NEURAL NETWORK PARAMETERS
All CNNs are trained using a learning rate of 0.01, a gradient threshold of 1, ADAM optimization, five hundred training epochs, and a mini-batch size of fifty thousand predictors. Additionally, all CNNs are randomly initialized using the same random seed value(s) to ensure fairness across all experiments. Batch normalization is implemented directly after each convolutional layer in every CNN to enable distributed learning across multiple Graphics Processing Units (GPUs). Please note that batch normalization is not required when training on a single node.

2) COMPUTER ARCHITECTURE
All results are generated using MATLAB R2022b. Training and testing are conducted across four Dell PowerEdge server nodes, each with dual EPYC 7662 64C/64T 64-bit CPUs, 512 GB of DDR4 ECC RAM, and an NVIDIA A100 80 GB PCI-E GPU accelerator running CUDA 11.7. The operating system is RedHat Linux 3.10.0-1160.21.1.el7.x86_64.

V. RESULTS
This section provides the results generated for each of the six experiments detailed in Sect. IV-C. For Experiment #1 through Experiment #5, training is performed using the preambles collected during Week #1 (a.k.a., Day Zero). The only SNR level used for training and testing is 30 dB. When testing each case, ten noise realizations are generated for each week's preambles and multi-day performance is presented using average percent correct classification performance. Results are presented using either Average Emitter Classification (AEC) or Average Week Classification (AWC) performance. AEC performance is the average percent correct classification performance calculated across all emitters, using the corresponding label predictions, of a selected week. AWC performance is the average AEC calculated across all eight weeks. For the results presented herein, all tables show AWC and all figures include the AEC and AWC.
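These two metrics can be paraphrased as a short sketch. The helper names `aec` and `awc` are hypothetical, and labels are assumed to be integer emitter indices:

```python
import numpy as np

def aec(true_labels, pred_labels):
    """Average Emitter Classification: mean of the per-emitter
    percent-correct values for one week's predictions."""
    emitters = np.unique(true_labels)
    per_emitter = [100.0 * np.mean(pred_labels[true_labels == e] == e)
                   for e in emitters]
    return float(np.mean(per_emitter))

def awc(weekly_results):
    """Average Week Classification: mean AEC across all weeks, where
    weekly_results is a list of (true_labels, pred_labels) pairs."""
    return float(np.mean([aec(t, p) for t, p in weekly_results]))
```

Averaging per emitter first (rather than pooling all predictions) keeps AEC insensitive to per-emitter sample-count imbalance, which appears consistent with the definition above.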
A. RESULTS FOR EXPERIMENT #1
Table 2 presents AWC performance, calculated across all emitters and each week's set of preambles, for each of the two hundred fifty-five preamble representation combinations generated using the n-choose-k selection approach in Sect. IV-C1. The presence of a specific preamble representation (a.k.a., i, q, \lambda, \theta, I, Q, \Lambda, or \Theta) within each combination is denoted using a check mark under that representation's symbol, Sect. IV-B. Combinations are ranked from highest (a.k.a., Combination #1) to lowest (a.k.a., Combination #255) AWC performance. The highest ranked preamble representation combination consists of i, q, \lambda, I, Q, and \Lambda (a.k.a., all but the time- and frequency-representation phase components are present) and achieves an AWC performance of 19.33%. Use of only the four frequency components (a.k.a., I, Q, \Lambda, and \Theta) results in an AWC of 18.56%, which corresponds to the nineteenth highest overall AWC, an AWC that is only 0.77% lower than Combination #1, and a 33% data reduction. In fact, all of the highest AWC combinations (a.k.a., Combination #88 and higher) are associated with frequency representation components and without a high reliance (i.e., use of three or more components) upon time representation components. This is shown by the I and Q representations being present in nine of the top ten ranked combinations, and \Lambda being in eight of the top ten ranked combinations. However, \lambda is present in only seven of the top ten ranked combinations, and q is in five of the top ten ranked combinations. Additionally, \theta is not included in any of the top ten ranked combinations. This suggests that the preambles' time representations do not provide discriminating features that are beneficial for multi-day SEI. Use of only the time representation (a.k.a., i, q, \lambda, and \theta) results in an AWC performance of 16.41%, which corresponds to a rank of one hundred ninety-eight out of two hundred fifty-five.
Based upon the Experiment #1 results and analysis, all remaining experiments are conducted using only the preambles' frequency representations comprised of I, Q, \Lambda, and \Theta.

FIGURE 4. Experiment #3: AEC performance when a four convolutional layer CNN, trained using the Week #1 preambles' frequency representations with 1,000, 2,000, 5,000, or 10,000 preambles per emitter (''e3'' denotes thousand in the figure) and one, two, four, eight, or sixteen noise realizations per preamble, classifies 10,000 preambles per emitter with ten noise realizations per preamble collected on Week #1.

B. RESULTS FOR EXPERIMENT #2
When considering CNNs of increasing depth, the AWC performance results, presented in Table 3, show that an eight convolutional layer CNN achieves an AWC performance of 22.81%, the highest amongst all considered CNN depths. When sixteen and thirty-two convolutional layers are used, the AWC performance decreases to 16.81% and 22.59%, respectively. Four convolutional layers achieve an AWC performance of 22.25%, which is 0.56% lower than that of the eight convolutional layer CNN. However, only 972.12 seconds are needed to train the four convolutional layer CNN versus the 1,261.19 seconds needed to train the eight convolutional layer CNN, reducing the training time by 23%. Fig. 3 shows a decrease in AEC performance as the number of convolutional layers increases from eight to sixteen and on to thirty-two. For the one and two convolutional layer CNNs, the results suggest that the trainable parameters are insufficient to learn the discriminating features necessary for multi-day SEI. Conversely, there is not enough training data to sufficiently tune all of the sixteen and thirty-two convolutional layer CNNs' trainable parameters to achieve effective multi-day SEI. Based upon the Experiment #2 results and observations, all subsequent experiments are performed using CNNs comprised of four convolutional layers.

C. RESULTS FOR EXPERIMENT #3
Fig. 4 shows AEC performance when a four convolutional layer CNN is trained using the frequency representation, equation (7), of Week #1's preambles as their numbers increase from 1,000 to 10,000 and one, two, four, eight, or sixteen noise realizations are generated per preamble. Regardless of the number of preambles per emitter or the number of noise realizations per preamble used, all Experiment #3 results are generated by classifying 10,000 preambles per emitter with ten noise realizations per preamble. Fig. 4 shows that a CNN trained using 1,000 preambles with one noise realization per preamble achieves an AEC performance of 55%. However, the AEC performance reaches a maximum of 61.43% when sixteen noise realizations are generated for each of the 1,000 preambles per emitter collected Week #1. Fig. 4 shows that AEC performance increases when the number of preambles per emitter is increased. For instance, the AEC performance is 64.77% when using 2,000 preambles per emitter with one noise realization per preamble. This is a 3.34% increase in AEC performance using only one-eighth of the data when compared to the 1,000 preambles per emitter with sixteen noise realizations per preamble case. When using 2,000 preambles per emitter and two, four, eight, and sixteen noise realizations per preamble, AEC performance increases to 68.46%, 69.73%, 71.13%, and 72.59%, respectively. Increasing the number of noise realizations per preamble to sixteen results in a maximum AEC performance improvement of 7.82% when using 2,000 preambles per emitter; however, it comes at the expense of a sixteen-fold increase in the amount of data being stored and processed. It is clear from Fig. 4 that increasing the number of preambles per emitter provides the greatest improvement in AEC performance. When using one noise realization per preamble, increasing the number of preambles per emitter from 1,000 to 10,000 increases AEC performance to a maximum of 93.34%, which is a 38.34% increase (55% versus 93.34%). This maximum is 10.91% higher than the 82.43% AEC performance achieved using 5,000 preambles per emitter with sixteen noise realizations per preamble. For the 10,000 preambles per emitter case, AEC performance increases to 95%, 96.02%, 96.99%, and 97.1% when combined with two, four, eight, and sixteen noise realizations per preamble, respectively. The difference between the eight and sixteen noise realizations per preamble results is marginal and within the statistical error of the experiment; thus, the use of eight noise realizations per preamble is beneficial because it cuts the data set size in half without severely degrading performance.
Since 10,000 preambles per emitter results in the highest AEC performance, all AWC performance results are generated using the same number of preambles per emitter. Table 4 provides AWC performance when 10,000 preambles per emitter are combined with one, two, four, eight, or sixteen noise realizations per preamble. AWC performance is 19.75% with a training time of 964.61 seconds when using one noise realization per preamble. When the number of noise realizations per preamble is increased to two, four, eight, and sixteen, the AWC performance increases to 20.31% (a 0.56% increase), 22.25% (a 2.5% increase), 22.81% (a 3.06% increase), and 22.59% (a 2.84% increase), respectively. However, this increase in AWC performance comes at the expense of increased training times. Doubling the number of realizations in the training set roughly doubles the training time of the network. The longest training time, 18,097.8 seconds (a.k.a., 301.63 minutes or just over 5 hours), corresponds to the sixteen noise realizations per preamble case, which achieves only a negligible improvement in AWC. Overall, the results in Table 4 show that increasing the number of noise realizations does not appreciably increase multi-day SEI performance; thus, the remaining three experiments are conducted using 10,000 preambles per emitter with one noise realization per preamble.

D. RESULTS FOR EXPERIMENT #4
AEC performance results are presented in Fig. 5 for each of the eight weeks of collected preambles, classified by a four convolutional layer CNN trained using Week #1 preambles. Recall that each emitter is represented using the frequency-domain representation of its 10,000 preambles with one noise realization per preamble. These results show that Week #1 AEC performance is 96% for Method #1 (red circle), Method #2 (green triangle), and Method #4 (black diamond), while Method #3 (blue square) results in an AEC performance of 91%. When classifying Week #2 preambles, Method #4 results in the highest AEC performance of 6.74%, which is only 2% higher than the three average STS methods' AEC performances and roughly 3.6% above the "guess threshold" of 3.13%.
Week #3 is the only week for which Method #4's AEC performance (16.63%) is distinctly higher than those of the three average STS methods. Classification of the Week #4, Week #5, and Week #7 collected preambles results in Method #4 achieving the highest AEC performance of 8.52%, 17.85%, and 10.31%, respectively, which is only 1% higher than the AEC performances of the three average STS methods. When classifying the Week #6 and Week #8 collected preambles, Method #3 results in an AEC performance of 14.38% and 10.13%, respectively, which is only marginally higher than the performances of the other three methods.
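The frequency-domain preamble representation referenced above can be sketched with a simple FFT-based transform. This is one plausible construction and an assumption on our part: whether the magnitude, log-magnitude, or stacked real/imaginary FFT components are used as CNN input channels is not specified by this sketch's source, and the batch shape is illustrative only.

```python
import numpy as np

def to_frequency_representation(preambles):
    """Convert complex time-domain preambles to a frequency-domain form.

    Applies a centered FFT along the sample axis and stacks the real and
    imaginary parts as two CNN input channels.
    """
    spectra = np.fft.fftshift(np.fft.fft(preambles, axis=-1), axes=-1)
    return np.stack([spectra.real, spectra.imag], axis=1)

# Example: a batch of four 320-sample complex preambles (random placeholders).
rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 320)) + 1j * rng.standard_normal((4, 320))
features = to_frequency_representation(batch)
print(features.shape)  # (4, 2, 320)
```

Stacking the real and imaginary parts keeps the phase information that a magnitude-only spectrum would discard, which may matter for emitter-specific features.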
AWC performance is presented in Table 5, which shows that Method #4 (a.k.a., the original preambles without any averaging) achieves the highest performance with an AWC of 21.53%. Method #1, Method #2, and Method #3 achieve AWC performances of 20.8%, 20.48%, and 20.32%, respectively. These results all fall within each other's margin of error.
Overall, the Experiment #4 results show that averaging the STS sequences may do more harm than good in terms of improving multi-day SEI performance. This is attributed to the possibility that averaging across STS sequences removes STS-specific coloration that is essential to discriminating one emitter from another, rather than mitigating the channel effects that hinder multi-day SEI performance. Based upon these observations, and because it eliminates a preprocessing step, none of the average STS methods are used in Experiment #5 or Experiment #6.
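The STS-averaging preprocessing discussed above can be sketched as follows. This is a minimal illustration under assumed parameters: the ten repeated 16-sample Short Training Symbols correspond to a generic IEEE 802.11a/g-style preamble and are not taken from the paper's collection settings.

```python
import numpy as np

def average_sts(preamble, n_sts=10, sts_len=16):
    """Replace the repeated STS segments with their element-wise average.

    Averaging across the repeats suppresses additive noise and, ideally,
    channel variation, but risks also averaging away repeat-specific
    coloration that helps discriminate one emitter from another.
    """
    sts_block = preamble[: n_sts * sts_len].reshape(n_sts, sts_len)
    mean_sts = sts_block.mean(axis=0)
    # Tile the averaged STS back out to the original block length.
    averaged = np.tile(mean_sts, n_sts)
    return np.concatenate([averaged, preamble[n_sts * sts_len:]])
```

If every repeat were identical, the function would return the preamble unchanged; any emitter-specific variation from repeat to repeat is exactly what gets smoothed away, which is consistent with the degradation observed in Experiment #4.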

E. RESULTS FOR EXPERIMENT #5
Experiment #5's AEC performance results are shown in Fig. 6. These results show, with the exception of Week #2, that the use of the residual (red circle) preamble does not improve multi-day SEI performance. This is most evident in the Week #1 results, which show AEC performances of 35% and 96.5% when using the residual and "no residual" (a.k.a., collected, black diamond) preambles, respectively. Recall that the four convolutional layer CNN is trained using only Week #1 collected preambles; thus, these results correspond to CNN classification of preambles drawn from the same collection used to train it and should result in the highest AEC performance when compared with the remaining seven weeks' AEC performances (i.e., these results represent the "best" case scenario within multi-day SEI). AWC performance results are presented in Table 6 and show that performance decreases from 20.71% to 11.62% when the residual preamble is used in lieu of the no residual (a.k.a., collected) preamble. The results presented in Fig. 6 and Table 6 confirm that the use of the residual preamble hinders same-day SEI while providing no benefit in addressing the multi-day SEI problem. It is for these reasons that Experiment #6 is performed without the use of the residual preambles.

Fig. 7 shows both AEC and AWC performance for a four convolutional layer CNN trained using one, two, three, four, five, six, or seven weeks of collected preambles and tested using the remaining weeks' collected preambles. For example, when the CNN is trained using the Week #1, Week #2, and Week #3 collected preambles, it classifies the preambles collected during Week #4, Week #5, Week #6, Week #7, and Week #8.
It is important to note that this experiment focuses on whether or not a CNN can be trained using preambles collected during Week #1 through Week #Nw (where Nw is greater than one and not equal to eight) and then classify Week #8 collected preambles with the same AEC with which it classifies the test set of preambles associated with the training weeks' collections. Thus, the remainder of this section focuses on the analysis of Week #8 classification performance. When trained using only Week #1 collected preambles, classification of Week #8 collected preambles results in an AEC performance of 8%. Adding Week #2 collected preambles to the CNN's training set results in an AEC performance of 15%. Similarly, when the Week #3, Week #4, and Week #5 collected preambles are added to the training set, the AEC performance increases to 19%, 24%, and 31%, respectively. However, adding preambles collected during Week #6 and Week #7 does not increase AEC performance, which might be the maximum generalization achievable by the selected CNN architecture. In fact, Week #8's AEC performance decreases to 28% when the training set is comprised of seven weeks' worth of collected preambles. This suggests that the CNN could be overfitting and losing its general representation of each emitter's discriminating SEI features.
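The cumulative train/test protocol described above can be sketched as an iterator over training splits. The function name and week indexing are ours, introduced only for illustration; the paper's actual data-handling code is not shown here.

```python
def cumulative_week_splits(holdout_week=8):
    """Yield (training_weeks, test_week) pairs for the cumulative protocol.

    Training uses Week #1 through Week #Nw for Nw = 1 .. holdout_week - 1;
    testing always uses the held-out final week's collection.
    """
    for n_w in range(1, holdout_week):
        yield list(range(1, n_w + 1)), holdout_week

# Example: seven splits, each tested on the held-out Week #8 collection.
for train_weeks, test_week in cumulative_week_splits():
    print(train_weeks, "->", test_week)
```

The first split trains on Week #1 alone and the last on Weeks #1 through #7, matching the progression whose Week #8 AEC results are analyzed above.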

F. RESULTS FOR EXPERIMENT #6
As shown in Table 7, AWC performance increases linearly with each week represented within the CNN's training set. However, this increase in AWC performance comes at the cost of increased training time, which grows by roughly an additional 1,000 seconds for every week added to the training set.

VI. CONCLUSION
The work presented in this paper shows that multi-day SEI performance improves when using the preambles' frequency-domain representation, in lieu of the time-domain representation, and a CNN comprised of four convolutional layers. The latter provides a sufficient number of trainable parameters to maximize multi-day SEI performance without providing so many that there is insufficient data or time to support the learning of emitter-specific features. This work also shows that an AWGN channel model is an insufficient representation of real-world channels when SEI is performed over multiple collections or days. If this were not the case, then one would expect multi-day SEI performance to improve significantly as more and more noise realizations are added to the training set. However, this is not the case; in fact, SEI performance improves more when the number of preambles per emitter is increased than when the number of noise realizations is increased. The work also shows that the average STS and residual preamble channel mitigation approaches do not mitigate or remove the real-world channel impacts present in the collected preambles. Our results suggest that the channel may not be the dominant effect hindering multi-day SEI performance. Therefore, future research will continue to investigate possible methods for isolating channel-agnostic, emitter-specific features, as well as determining whether or not the channel is indeed the cause of degraded SEI performance across multiple collections or days. Additionally, since SEI is being put forward as an IoT security solution, future research will investigate more computationally efficient and less memory-intensive machine learning approaches.