Channel-Agnostic Radio Frequency Fingerprint Identification Using Spectral Quotient Constellation Errors

Radio frequency fingerprint identification (RFFI) is a physical layer security methodology that recognizes individual devices by leveraging hardware imperfections inevitably induced in the manufacturing process. However, the performance degradation caused by time-varying channel impacts and interference has severely restricted the development of RFFI. To this end, we present a channel-agnostic RFFI system, which consists of three modules, i.e., a signal preprocessing module, a feature extraction module, and a classification module. In the signal preprocessing module, we first propose a novel approach, referred to as limiter-based spectral circular shift bidirectional division (LB-SCSBD), to generate two parallel spectral quotient (SQ) sequences. Then, we define the spectral quotient constellation (SQC) symbols according to different modulation formats, and thereby transform the SQ sequences into four magnitude-based sequences in terms of two channel-robust signal representations, i.e., the SQ magnitude (SQM) and SQC error vector magnitude (SQC-EVM). In the feature extraction module, we present a moment-based statistical feature extractor (MB-SFE) to extract the device-specific information from the above four sequences. In the classification module, the extracted statistics are fed into a multi-class support vector machine (SVM) for training and testing. We take WiFi as a case study and evaluate the performance of the proposed RFFI system by classifying eight simulated device models and six universal software radio peripheral (USRP) transmitter radios. Experimental results show that (i) the proposed method achieves accuracies of 99.84% and 98.26% with eight devices in the QPSK and 16QAM cases, as well as an accuracy of 92.42% with six USRP devices; and (ii) the proposed method exhibits superior classification performance in comparison to some existing RFFI methods, leading to a significant accuracy improvement of at least 38.33%.


I. INTRODUCTION
IN RECENT years, the Internet of Things (IoT) has gained great popularity and achieved unprecedented growth in both the number and variety of applications, such as connected healthcare, smart home and industrial control [1], [2]. With the explosive growth in the number of IoT devices, safeguarding the wireless connectivity of IoT systems faces increasing challenges. Conventional authentication techniques, including cryptographic schemes based on software addresses and pre-shared keys, are effective strategies for physical layer security authentication [3]. However, cryptography-based authentication techniques usually consume massive computing resources, which makes them difficult to deploy on platforms with limited power and computation resources, such as IoT devices, and their effectiveness hinges on robustly detecting and revoking compromised keys [4].
Radio frequency fingerprint identification (RFFI) has emerged as an effective physical layer security methodology, which employs the distinctive transmitter imperfections extracted from the received signals to recognize individual devices. Since the hardware imperfections are unintentionally introduced in the manufacturing process, the radio frequency fingerprints (RFF) resulting from them are nearly impossible to mimic. For this reason, RFFI has attracted great interest and has been widely investigated in WiFi [5], ZigBee [6], LoRa [7], and Bluetooth [8].
Generally speaking, RFF can be extracted from both the transient and steady-state portions of a signal. The transient-based method recognizes the distinctive RFF present in the transient turn-on waveforms. The challenging issue is how to properly capture the transient signal portion in a short time [9], [10], [11]. In contrast, the steady-state signals are comparatively simple to capture and detect. Therefore, RFFI based on the steady-state signals has been investigated in many works [12], [13], [14], [15], [16], [17]. Since a vast majority of existing wireless communication systems send preambles for synchronization, feature extraction initially focused on the preamble of the steady-state signals. In [12] and [13], the mean, variance, skewness, and other statistics extracted from the time-frequency analysis of preambles are utilized as the discriminative features for identification. Subsequently, RFF research on the payload instead of the preamble has become a hotspot. According to the literature [4], [14], [15], [16], and [17], the synchronization correlation, mixer offset, constellation error, and some statistics are employed as distinct RFF and achieve significant classification performance. Moreover, other RFFI methods based on the deep neural network (DNN) and convolutional neural network (CNN) have also been studied in many works [18], [19], [20], [21], [22], [23], [24], as this end-to-end approach can directly process the raw signal and make predictions without feature engineering. However, these approaches incur high computational complexity and have poor generalizability. Considering the limited computation resources of low-cost IoT devices, our goal is to extract handcrafted features from the payload as the distinct RFF for device recognition.
Fig. 1. The flow chart of the WiFi datasets generation and identification.
A major challenge for RFFI is that the time-varying channel effects can result in unreliable classification performance. At present, most RFFI works only consider the noise effects without the channel or simply assume static channels in a controlled environment [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], and only a few works have considered the time-varying impacts of the multipath fading channel [26], [27], [28], [29], [30]. For instance, Tugnait added Gaussian artificial noise to the received signal in order to compensate for the channel changes [26]. Besides, Zhou et al. also proposed an artificial noise adding algorithm to improve the classification accuracy by regularization and channel adaptation [27]. However, the level of artificial noise that needs to be added is still uncertain. In [28], Sankhe et al. proposed the ORACLE framework to mitigate the channel effects through the undercomplete demodulation approach. However, this method requires channel estimation and equalization, which can induce extra errors and additional computational complexity. Shen et al. in [29] first employed the short-time Fourier transform (STFT) to construct a channel-independent spectrogram, and then fed it into a CNN for device recognition. Their RFFI framework achieved excellent classification performance and effective channel mitigation. However, this method only focuses on the preambles and neglects the phase information of the spectrogram. In our prior work [30], we attempted to use the signal preprocessing method named spectral circular shift division (SCSD) to generate channel-robust spectral quotient (SQ) signals. However, the SQ signals generated by the SCSD method fluctuate heavily, which decreases the stability of the extracted RFF and degrades the classification performance.
In this paper, a channel-agnostic RFFI system is designed, which consists of three modules, i.e., a signal preprocessing module, a feature extraction module, and a classification module. To combat the time-varying channel effects, our approach first converts the received signals to other channel-robust representations in the signal preprocessing module and then uses the moment-based statistical feature extractor (MB-SFE) to extract the device-specific RFF in the feature extraction module. After that, the extracted feature samples are fed into the multi-class support vector machine (SVM) for training and testing in the classification module. During the training stage, we train the SVM using the feature samples without any channel effects. During the testing stage, we evaluate the classification performance of the trained SVM using the feature samples extracted under different channel conditions. In our experimental evaluation, we take WiFi as a case study and employ eight simulated device models and six universal software radio peripheral (USRP) transmitters (in an open dataset) for classification. The main contributions of this work are summarized as follows:
• We propose a novel approach, referred to as limiter-based spectral circular shift bidirectional division (LB-SCSBD), to generate two parallel SQ sequences in a submodule of the signal preprocessing module. Moreover, we show that the proposed RFFI system using the LB-SCSBD method improves the classification accuracy by 14% to 47% in comparison to that using the SCSD method at a signal-to-noise ratio (SNR) of 27 dB.
• In comparison to the RFFI methods given in [17] and [30], our method exhibits the best classification performance, with accuracy improvements of at least 38.33% when the SNR is equal to 30 dB. Moreover, the proposed RFFI system can achieve accuracies of 99.84% and 98.26% with eight devices in the QPSK and 16QAM cases at SNR = 32 dB, as well as an accuracy of 92.42% with six USRP devices in the open dataset.
The rest of the paper is organized as follows. Section II details the generation of the WiFi datasets used in our experiments. The identification process of the proposed channel-agnostic RFFI system is briefly given in Section III. In Section IV, we first introduce the experimental setup and then analyze the experimental results of the proposed RFFI system. Finally, we conclude this paper in Section V.

II. DATASET GENERATION AND SIGNAL MODEL
As shown in Fig. 1, the overall work can be divided into two steps: the WiFi datasets generation and the identification process. In this section, we first introduce the generation of the simulated datasets, where standard-compliant IEEE 802.11a WiFi frames are generated as the transmitted signals. Then, we model the transmitter impairments with a special focus on the in-phase (I) and quadrature (Q) imbalance, power amplifier (PA) nonlinearity, and frequency and phase mismatch, which are typically seen in actual hardware implementations. Finally, the received signal model is given, where the carrier frequency offset (CFO) estimation and correction are performed to decrease the impacts of oscillator imperfection. The detailed operations are provided in the following.

A. WiFi Frame Structure
Fig. 2 shows the IEEE 802.11a WiFi OFDM frame structure [28], which consists of a legacy short training field (L-STF, 8 microseconds, i.e., µs), legacy long training field (L-LTF, 8 µs), legacy signal field (L-SIG, 4 µs), and data field. The data field contains K random OFDM symbols and each OFDM symbol lasts for 4 µs. The L-STF is primarily used for coarse CFO estimation, while the L-LTF is mainly used for fine CFO estimation.
For simplicity, the WiFi signals are represented in the complex form as follows:

$$x(t) = x_I(t) + j\,x_Q(t), \quad 0 \le t \le T,$$

where $x_I(t)$ and $x_Q(t)$ denote the WiFi signals on the I and Q branches, respectively; $T$ (in µs) is the duration of each full WiFi frame.

B. IQ Imbalance
Quadrature mixers are used for upconversion and are often impaired by IQ imbalances, which are one of the main aspects of the transmitter's impairments. Considering the distortion caused by the IQ imbalances, the corrupted baseband signal can be modeled in terms of the phase mismatch $\theta_{tx}$ (in rad) and the gains $g^{tx}_I$ and $g^{tx}_Q$ on the I and Q branches, respectively. As referred from [24], the IQ gains in the linear scale are determined by the gain imbalance $G_{tx}$ (in dB).
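The IQ-imbalance stage described above can be illustrated with a small sketch. The exact expressions are not reproduced in this section, so the model below is an assumption: it uses one commonly used convention in which the gain imbalance is split symmetrically between the two branches and each branch is rotated by half of the phase mismatch. The function name `apply_iq_imbalance` and the example values are hypothetical.

```python
import cmath
import math


def apply_iq_imbalance(x, gain_db, phase_rad):
    """Apply transmitter IQ imbalance to one complex baseband sample.

    ASSUMED model (a common convention, not necessarily the paper's):
      g_I = 10**(+0.5 * G / 20),  g_Q = 10**(-0.5 * G / 20)
      out = g_I * x_I * exp(-j*theta/2) + j * g_Q * x_Q * exp(+j*theta/2)
    """
    g_i = 10 ** (0.5 * gain_db / 20)
    g_q = 10 ** (-0.5 * gain_db / 20)
    i_branch = g_i * x.real * cmath.exp(-1j * phase_rad / 2)
    q_branch = 1j * g_q * x.imag * cmath.exp(1j * phase_rad / 2)
    return i_branch + q_branch


# With zero gain and phase imbalance the sample passes through unchanged.
ideal = apply_iq_imbalance(1 + 1j, gain_db=0.0, phase_rad=0.0)
# Illustrative imbalance values (0.5 dB, 5 degrees), chosen arbitrarily.
impaired = apply_iq_imbalance(1 + 1j, gain_db=0.5, phase_rad=math.radians(5))
```

Under this assumed convention the device-specific fingerprint is carried entirely by the pair (gain imbalance, phase mismatch); a different convention would change the numerical output but not the role these two parameters play.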

C. Power Amplifier Nonlinear Distortion
Generally speaking, a power amplifier in a communication system is used to boost a signal to a power level suitable for transmission. Due to the demand for PA efficiency, nonlinearity arises in the process of power amplification and plays a key role in the generation of transmitter imperfection. Considering the memoryless nonlinearity caused by the PA, the distorted baseband signal $y(t)$, i.e., the PA output at time $t$, is obtained by applying the power amplifier transfer function $F(\cdot)$ to the PA input signal. Several memoryless PA models, including the Saleh model, Rapp model, and polynomial model, are employed, and the detailed descriptions are given in Appendix A.
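To make the notion of a memoryless transfer function $F(\cdot)$ concrete, the sketch below implements the textbook AM/AM form of the Rapp model, one of the three model families named above. The parameter names (`a_sat`, `p`, `gain`) and their default values are illustrative assumptions; the parameterization actually used in Appendix A may differ.

```python
def rapp_pa(x, a_sat=1.0, p=2.0, gain=1.0):
    """Memoryless Rapp power amplifier model (AM/AM only, no AM/PM).

    Textbook form: |y| = g*|x| / (1 + (g*|x| / A_sat)**(2p))**(1/(2p)).
    The phase of the input sample is preserved.
    Default parameter values are placeholders, not taken from the paper.
    """
    r = abs(x)
    if r == 0.0:
        return 0j
    out_mag = gain * r / (1 + (gain * r / a_sat) ** (2 * p)) ** (1 / (2 * p))
    return out_mag * (x / r)


small = rapp_pa(0.01 + 0j)   # far below saturation: almost linear
large = rapp_pa(100 + 0j)    # far above saturation: output approaches a_sat
rotated = rapp_pa(1j)        # purely imaginary input stays purely imaginary
```

The smoothness parameter `p` controls how sharply the amplifier transitions from the linear region to saturation, which is one reason different devices can exhibit distinguishable nonlinear signatures.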

D. Received Signal Model
The transmitted signal carrying the distinct RFF will be captured at the receiver after passing through the wireless channel. Due to the frequency mismatch between the transmitter and receiver, CFO and phase offset (PO) occur in the process of downconversion. Hence, the continuous-time received baseband signal, which carries the transmitter RFF and these time-varying distortions, can be represented as Eq. (6), where $B$ (in MHz) is the transmission bandwidth and $\varepsilon$ is the normalized CFO with respect to $B$; $\Phi$ is the PO within $[-\pi, \pi]$; $I$ is the maximum number of channel delay taps and $\tau_i$ denotes the channel delay of the $i$th tap; $h_{\tau_i}(t)$ is the channel coefficient of the $i$th delay tap at time $t$; $w(t)$ is the additive white Gaussian noise (AWGN) and $w(t) \sim \mathcal{CN}(0, \sigma_n^2)$. Synchronization is often employed to detect the accurate start of the received packet so that we can extract the signal of interest easily with the prior information of the signal configuration. The well-known Schmidl-Cox algorithm [31] can be implemented when there is a need for synchronization. Since synchronization is unnecessary for the simulated WiFi frames (because the start of the WiFi signal is already known in the simulated cases), we directly divide the received WiFi frame into two parts: preamble and OFDM data. As referred from the literature [32], we can use the L-STF ($t \in [0, 8)$ µs) and L-LTF ($t \in [8, 16)$ µs) signals in the preamble for the coarse and fine CFO estimation according to the conventional two-step CFO estimator. Assuming the overall estimated CFO is $\Delta f$, the corrected baseband signal without oversampling in the data field is obtained after performing the CFO correction. In this signal, $n$ is the discrete-time index of the sampled signal and $n = (t - 20)B$ (i.e., counted from the start of the data field); $N$ is the length of the sampled signal in the WiFi data field; $\Phi'$ denotes the residual PO; $\varepsilon'$ is the normalized residual CFO and $\varepsilon' = \varepsilon - \Delta f / B$; $\tau'_i$ is the discrete-time channel delay of the $i$th path and $h_{\tau'_i}(n)$ is the corresponding $n$th channel coefficient; $\hat{w}(n)$ is the AWGN after performing the CFO correction.
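The CFO estimation and correction step can be sketched with a simplified single-stage estimator. This is not the two-step estimator of [32]; it is a minimal illustration, under the assumption of a noiseless signal that repeats every `period` samples (as the L-STF does every 0.8 µs), of how the phase of the lag-`period` autocorrelation reveals the frequency offset. All function names and numerical values are hypothetical.

```python
import cmath
import math


def estimate_cfo(rx, period, sample_rate):
    """Estimate the CFO (Hz) from a signal that repeats every `period` samples."""
    acc = sum(rx[n + period] * rx[n].conjugate()
              for n in range(len(rx) - period))
    return cmath.phase(acc) * sample_rate / (2 * math.pi * period)


def correct_cfo(rx, cfo_hz, sample_rate):
    """Remove a frequency offset by counter-rotating each sample."""
    return [r * cmath.exp(-2j * math.pi * cfo_hz * n / sample_rate)
            for n, r in enumerate(rx)]


FS = 80e6         # sampling rate in Hz (illustrative value)
PERIOD = 64       # 0.8 us repetition period at 80 MHz
TRUE_CFO = 50e3   # arbitrary offset inside the unambiguous range FS/(2*PERIOD)

# Toy periodic training sequence (a placeholder, not the real L-STF).
base = [cmath.exp(2j * math.pi * (k * k % PERIOD) / PERIOD)
        for k in range(PERIOD)]
rx = [base[n % PERIOD] * cmath.exp(2j * math.pi * TRUE_CFO * n / FS)
      for n in range(10 * PERIOD)]

cfo_hat = estimate_cfo(rx, PERIOD, FS)
corrected = correct_cfo(rx, cfo_hat, FS)
```

With noise present the estimate is imperfect, which is why the signal model above retains a residual CFO term after correction.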
After performing the above operations, the simulated WiFi frames will be kept in several datasets according to the channel conditions and modulation formats.Meanwhile, this completes the generation of the simulated WiFi datasets.

A. System Overview
As shown in Fig. 1, the identification process comprises two essential stages, namely the training and testing stages. In the training stage, we only use the noise-affected dataset to train the proposed RFFI system. In the testing stage, the well-trained RFFI system will predict the device label according to the received samples of the multipath fading channel datasets. Fig. 3 shows the detailed architecture of the proposed channel-agnostic RFFI system, where three modules, i.e., the signal preprocessing module, feature extraction module, and classification module, are illustrated. In the signal preprocessing module, we first generate two parallel SQ sequences through the LB-SCSBD submodule. Then, we convert the SQ sequences to two channel-robust signal representations, i.e., the SQM and the SQC-EVM, so that we can obtain four magnitude-based sequences in the next submodule. In the feature extraction module, we present the MB-SFE to explore the hardware-introduced information from these sequences, where the first, second, third, and fourth moments are extracted from each sequence and then applied as the RFF features. At last, the multi-class SVM classifiers are trained and tested with the extracted feature samples in the classification module.

B. Limiter-Based Spectral Circular Shift Bidirectional Division
Since the division-based algorithm is sensitive to the denominator value, the SQ signal value generated by the SCSD method is unstable and fluctuates heavily. This characteristic decreases the statistical stability when only a small amount of data is available and thus degrades the identification accuracy of an RFFI system based on statistical features. On this basis, we propose a novel approach named LB-SCSBD to generate two parallel SQ signal sequences within a limited range. Additionally, we take into account the null and pilot subcarriers of the WiFi OFDM data in the proposed method.
Let $s_k = [s_k(0), s_k(1), \ldots, s_k(I_1 - 1)]$ denote the $k$th corrected OFDM signal in the data field of a WiFi frame and $ID = [id(0), id(1), \ldots, id(I_2 - 1)]$ denote the data subcarrier indices, where $I_1$ is the length of an OFDM symbol after removing the cyclic prefix (CP) and $I_2$ is the total number of data subcarriers. Then, we can derive the OFDM symbol $S_k = [S_k(0), S_k(1), \ldots, S_k(I_1 - 1)]$ by performing the fast Fourier transform (FFT) on $s_k$. Because the duration of an OFDM symbol is only 4 µs, the slow fading channel behaves in a correlated manner during such a short period. Thus, we can expect the channel coefficients to remain unchanged during the transmission of each OFDM symbol. In this case, according to [30], $S_k(n_1)$ can be approximated as

$$S_k(n_1) \approx \lambda H(n_1) Y_k(n_1) + \hat{W}(n_1),$$

where $\lambda$ is a constant factor related to the PO and residual CFO; $H(n_1)$ is the $n_1$th channel frequency response; $Y_k(n_1)$ is the $n_1$th element of the $k$th OFDM symbol distorted with the transmitter imperfections; $\hat{W}(n_1)$ is the $n_1$th frequency-domain noise. Moreover, $\lambda$, $H(n_1)$ and $Y_k(n_1)$ are expressed in terms of $W_{I_1} = e^{-j2\pi/I_1}$ and $y_k(i_1)$, the $i_1$th element of the $k$th transmitted OFDM signal.

Algorithm 1 Limiter-Based Spectral Circular Shift Bidirectional Division (LB-SCSBD)
Input: A complete OFDM signal without CP, $s_k$; the data subcarrier indices, $ID$; the maximum limiter output, $A_{max}$; the initial index of output, $i_3 = 0$;
Output: The spectral quotient signal sequences: $v^r_k$; $v^l_k$;
1: Performing the FFT operation on $s_k$ to derive $S_k$;
2: Generating the shifted data subcarrier indices vector $ID_{rcs}$ by Eq. (11);
3: Calculating the SQ vector $\Upsilon^r_k$ with Eq. (12);
4: for $i_2 = 0$; $i_2 < I_2$; $i_2{+}{+}$ do
      Extracting the qualified SQ signal;
      Deriving the SQ signal of left circular shift;
      Generating $v^r_k(i_3)$ and $v^l_k(i_3)$ by Eq. (14);
   end if
14: end for
15: return $v^r_k$, $v^l_k$.
It is clear that the hybrid impacts caused by the multipath fading channel, PO and residual CFO can be roughly deemed as the multiplicative interferences in the frequency domain.Hence, by leveraging the strong correlations of the channel frequency responses at the neighboring subcarriers, the multiplicative interferences can be significantly suppressed in the SQ domain [30].
Hence, we first perform a right circular shift by one step on the $ID$ vector, which yields a new vector of data subcarrier indices, $ID_{rcs}$ (Eq. (11)). Thereafter, we can generate the index pairs of the data subcarriers as $\{id(i_2), id_{rcs}(i_2)\}$, $0 \le i_2 \le I_2 - 1$. Let $\Upsilon^r_k$ denote the right circular shift SQ signal vector, whose $i_2$th element is calculated by Eq. (12). To effectively mitigate the channel effects, we extract only the SQ signals whose index pairs satisfy the qualification condition on $id(i_2) - id_{rcs}(i_2)$. After passing through a limiter with the maximum output amplitude of $A_{max}$, we can derive the parallel SQ sequences, i.e., $v^r_k$ and $v^l_k$, whose elements are calculated by Eq. (14), where the superscript $\wp$ denotes $r$ or $l$.
The detailed steps of the LB-SCSBD are summarized in Algorithm 1. After repeating these operations $K$ times on the different OFDM symbols, we can derive the parallel SQ signal vectors of a complete WiFi frame, where the length of each vector is $KI_3$.
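The steps of Algorithm 1 can be sketched for a single OFDM symbol as follows. Several details are not recoverable from the text above and are therefore labeled as assumptions in the code: the qualification condition is taken to be adjacency of the two subcarriers, the left-shift quotient is taken to be the reverse division, and the limiter is taken to clip the magnitude while preserving the phase. The function names and the toy subcarrier layout are hypothetical, and the FFT is assumed to have been applied already.

```python
import cmath


def limiter(z, a_max):
    """Clip the magnitude of z to a_max while keeping its phase (ASSUMED form)."""
    mag = abs(z)
    return z if mag <= a_max else z * (a_max / mag)


def lb_scsbd(S, data_idx, a_max):
    """Sketch of LB-SCSBD on one frequency-domain OFDM symbol S."""
    rcs_idx = data_idx[-1:] + data_idx[:-1]   # right circular shift by one
    v_r, v_l = [], []
    for cur, prev in zip(data_idx, rcs_idx):
        if cur - prev != 1:                   # ASSUMED: keep adjacent pairs only
            continue
        if S[cur] == 0 or S[prev] == 0:       # guard against division by zero
            continue
        v_r.append(limiter(S[cur] / S[prev], a_max))
        v_l.append(limiter(S[prev] / S[cur], a_max))  # ASSUMED reverse division
    return v_r, v_l


# Toy symbol: 8 subcarriers, index 0 unused and index 4 acting as a pilot.
Y = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]
data_idx = [1, 2, 3, 5, 6, 7]
h = 0.3 * cmath.exp(1.1j)       # flat (frequency-non-selective) channel gain
S = [h * y for y in Y]

v_r_clean, v_l_clean = lb_scsbd(Y, data_idx, a_max=1.2)
v_r_faded, v_l_faded = lb_scsbd(S, data_idx, a_max=1.2)
```

In this toy case the channel gain is identical on neighboring subcarriers, so it cancels in every quotient and the faded and clean sequences coincide. In a real multipath channel the cancellation is only approximate, and it improves as the frequency correlation between neighboring subcarriers increases.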

C. Channel-Robust Signal Representations
Considering the need for signal analysis, we first define the SQC symbols as follows.
Definition 1: Given that $\mathcal{Q}$ is the complex-valued set comprised of $M$-QAM symbols, it can be used to generate a two-dimensional space $\mathcal{D} = \{(A, B) \mid A \in \mathcal{Q}, B \in \mathcal{Q}\}$. Let $f: \mathcal{D} \to \mathcal{P}$ denote a function of two variables that maps each pair $(A, B)$ to a symbol $P$, where $\mathcal{P}$ is the set of spectral quotient constellation symbols based on $M$-QAM and $P \in \mathcal{P}$.
Fig. 4 provides the spectral quotient constellation diagrams in terms of QPSK and 16QAM. It should be noted that the SQC symbols are a transformation of the QAM symbols and do not contain any imperfections. Hence, the variations between the SQ signal and SQC symbols can be attributed to the hybrid effects of the transmitter impairments and interferences (noise, residual channel effects, etc.).
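Definition 1 can be illustrated by enumerating the SQC set for QPSK. The explicit form of $f$ is not reproduced above, so the sketch assumes the natural reading of a spectral quotient, $f(A, B) = A / B$. The function name `sqc_symbols` and the rounding tolerance are hypothetical.

```python
from itertools import product


def sqc_symbols(qam_symbols, decimals=9):
    """Enumerate distinct quotients A / B over all ordered pairs (A, B).

    ASSUMPTION: f(A, B) = A / B; the original definition is not shown above.
    """
    points = set()
    for a, b in product(qam_symbols, repeat=2):
        q = a / b
        # Round so that numerically equal quotients collapse to one point;
        # adding 0.0 turns -0.0 into 0.0.
        points.add(complex(round(q.real, decimals) + 0.0,
                           round(q.imag, decimals) + 0.0))
    return sorted(points, key=lambda z: (z.real, z.imag))


# Unit-energy QPSK constellation.
QPSK = [complex(i, q) / 2 ** 0.5 for i in (-1, 1) for q in (-1, 1)]
sqc_qpsk = sqc_symbols(QPSK)
```

Under this assumption the QPSK quotients lie on the unit circle at multiples of 90 degrees. Passing a 16QAM symbol list to the same function enumerates the corresponding, larger SQC set.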
The SQC error vector measures how accurately the generated SQ signal falls within its constellation. It is obtained from the SQ signals and the vectors of decided symbols $p^r$ and $p^l$, whose $n$th elements are derived by a minimum-Euclidean-distance decision between each SQ signal and the SQC symbols in $\mathcal{P}$. In the following, we investigate two channel-robust and magnitude-based signal representations that can be fed into the subsequent feature extractor.
1) Spectral Quotient Magnitude: The SQ signal, especially its magnitude, contains abundant device-specific information, which can be used for device identification. The SQM sequences are the magnitudes of the two parallel SQ sequences.
2) Spectral Quotient Constellation Error Vector Magnitude: EVM is a popular system-level performance metric that helps gauge the impacts of all impairments simultaneously from a single value. Therefore, the signal representation of the SQC-EVM vectors is derived from the magnitudes of the SQC error vectors.
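The two representations can be sketched in a few lines. The code assumes that the SQM is the per-sample magnitude of the SQ signal and that the SQC-EVM is the per-sample, unnormalized distance to the nearest SQC symbol; any normalization used in the original expressions is not shown above. The QPSK SQC set used here also assumes the quotient definition $A / B$, and all names and sample values are hypothetical.

```python
def nearest_symbol(v, sqc_set):
    """Minimum-Euclidean-distance decision of an SQ sample onto the SQC set."""
    return min(sqc_set, key=lambda p: abs(v - p))


def sqm(seq):
    """Spectral quotient magnitude: per-sample magnitude of the SQ sequence."""
    return [abs(v) for v in seq]


def sqc_evm(seq, sqc_set):
    """Per-sample error vector magnitude w.r.t. the decided SQC symbol.

    ASSUMPTION: no normalization is applied.
    """
    return [abs(v - nearest_symbol(v, sqc_set)) for v in seq]


SQC_QPSK = [1 + 0j, 1j, -1 + 0j, -1j]        # assumed QPSK SQC set
v_r = [1.05 + 0.02j, -0.03 + 0.9j, -1.1 - 0.1j]  # made-up SQ samples

sqm_r = sqm(v_r)
evm_r = sqc_evm(v_r, SQC_QPSK)
```

Applying both functions to the right-shift and left-shift SQ sequences yields the four magnitude-based sequences that are passed to the feature extractor.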

D. Moment-Based Statistical Feature Extractor
In the feature extraction module, we propose a novel feature extractor named MB-SFE to exploit the discriminant information induced by transmitter imperfections. Specifically, a total of sixteen moment-based statistics (i.e., the first, second, third, and fourth moments) are extracted from the four magnitude-based sequences, and they then serve as the discriminative features in the proposed RFFI system. Let $\Psi = [\Psi_1, \Psi_2, \Psi_3, \Psi_4]$ denote the extracted statistical feature vector, where $\Psi_\lambda$ is the $\lambda$-order moment vector. The elements of $\Psi_\lambda$ are then calculated from the four sequences.
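A sixteen-dimensional feature vector of this kind can be sketched as below. The exact moment definitions are not reproduced above, so the code assumes the mean as the first moment and unnormalized central moments for orders two to four; standardized variants (skewness, kurtosis) would be an equally plausible reading. The function names are hypothetical.

```python
def moment_features(seq):
    """Return [mean, 2nd, 3rd, 4th central moment] of a real-valued sequence.

    ASSUMPTION: central, unnormalized moments for orders 2 to 4.
    """
    n = len(seq)
    mean = sum(seq) / n
    central = [sum((x - mean) ** k for x in seq) / n for k in (2, 3, 4)]
    return [mean] + central


def mb_sfe(sequences):
    """Stack the moments of all sequences, grouped by moment order."""
    per_seq = [moment_features(s) for s in sequences]
    return [per_seq[i][order]
            for order in range(4)
            for i in range(len(sequences))]


# Four made-up magnitude-based sequences standing in for the
# right/left SQM and right/left SQC-EVM sequences.
toy_sequences = [
    [1.0, 2.0, 3.0, 4.0],
    [0.9, 1.1, 1.0, 1.0],
    [0.05, 0.10, 0.02, 0.07],
    [0.04, 0.12, 0.03, 0.06],
]
features = mb_sfe(toy_sequences)
```

Grouping by order mirrors the structure $\Psi = [\Psi_1, \Psi_2, \Psi_3, \Psi_4]$: the first four entries are the first moments of the four sequences, the next four the second moments, and so on.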

E. SVM Classifier
SVM was originally designed for binary classification, whereas RFFI is generally a multi-class classification task. The conventional way to extend a binary-classification SVM to multi-class scenarios is to decompose the multi-class problem into several two-class classification problems, and then we can implement the one-against-one strategy for the multi-class SVM classifier training [33].
Consider a $\gamma$-class classification scenario in an RFFI system with $L$ training samples, where $\Lambda_i$ is the device code and $i \in \{1, 2, \ldots, \gamma\}$. According to the one-against-one strategy, we should construct $J = \gamma(\gamma - 1)/2$ binary-classification SVM classifiers. During the SVM training stage, the polynomial kernel is chosen and the hyperparameters of each SVM are independently updated in terms of the training samples. Suppose that we have trained $J$ binary-classification classifiers SVM$_j$ ($j \in \{1, 2, \ldots, J\}$) and a testing sample is fed into the well-trained classifiers; then each SVM classifier makes a prediction $\Lambda_j$ (the prediction of SVM$_j$) on the label of the testing sample. Consequently, $J$ prediction results are obtained for the testing sample. To make the final prediction, we adopt a voting approach named the max-wins strategy [17] to decide the predicted device code $\Lambda_\aleph$ ($\aleph \in \{1, 2, \ldots, \gamma\}$).
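The one-against-one decomposition and the max-wins vote can be sketched without any SVM library. In the code below, `toy_pairwise` is a hypothetical stand-in for a trained binary SVM, and the tie-breaking rule (smallest device code wins) is an assumption, since the text does not specify one.

```python
from collections import Counter
from itertools import combinations


def max_wins(pairwise_predict, num_classes, sample):
    """Combine all pairwise decisions by majority vote (max-wins)."""
    votes = Counter()
    for a, b in combinations(range(1, num_classes + 1), 2):
        votes[pairwise_predict(a, b, sample)] += 1
    # ASSUMPTION: ties are broken in favor of the smallest device code.
    return min(votes, key=lambda c: (-votes[c], c))


def toy_pairwise(a, b, sample):
    """Stand-in for a trained binary SVM: pick the code closer to `sample`."""
    return a if abs(sample - a) <= abs(sample - b) else b


# gamma = 8 devices gives J = 8 * 7 / 2 pairwise classifiers.
num_pairs = len(list(combinations(range(1, 9), 2)))
predicted = max_wins(toy_pairwise, num_classes=8, sample=5.2)
```

In practice the pairwise predictor would wrap a trained polynomial-kernel SVM for the device pair (a, b); only the voting logic is shown here.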

IV. EXPERIMENTAL RESULTS
In this section, we first introduce the experimental setup for the WiFi datasets generation and multi-class SVM training as well as the evaluation metrics.Then, we will validate the effectiveness of the LB-SCSBD method.Meanwhile, the classification performance of the proposed channel-agnostic RFFI system is investigated by experimental evaluations.Moreover, we compare the performance of our methods with some other existing RFFI methods on the simulated datasets.Finally, we use the data originating from an open dataset [28] to evaluate the proposed RFFI system in the face of the real-world collected signals.The detailed experimental designs and results are given in the following.

A. Experimental Setup
TABLE III. THE GENERATED CONDITIONS OF TWELVE SIMULATED DATASETS

This subsection introduces the configuration parameters used for the generation of the datasets in terms of the device impairments and WiFi signal settings as well as the multipath fading channel models. Meanwhile, the evaluation metrics are also provided in this part.
1) Device Impairments: To generate the simulated datasets, eight device models with different impairments are configured in this subsection. As reported in [24], the phase imbalance usually ranges from 2 to 11.42 degrees and the absolute gain imbalance generally runs from 0.02 to 1 dB, so we use a set of gain and phase imbalances within these ranges. Moreover, the utilized power amplifier models are taken from the literature [34], [35], [36], [37], [38], [39]. Since the OFDM technique causes a large peak-to-average power ratio (PAPR) in waveforms, the input back-off (IBO) technique¹ prior to the PA is adopted to keep the simulated signals away from severe nonlinear distortions (especially the saturated distortions). Furthermore, the CFO values of different devices are set within limited ranges² and follow the uniform random distribution [42], while the PO values follow the same distribution within $[-\pi, \pi]$. The detailed parameters of the device impairments used in our simulations are summarized in Table I.
2) WiFi Settings: The carrier frequency and transmission bandwidth are set to 5 GHz and 80 MHz, respectively. The preamble duration is 20 µs, including 8 µs of L-STF, 8 µs of L-LTF, and 4 µs of L-SIG. We adopt both the QPSK and 16QAM modulation techniques to generate the OFDM symbols in the WiFi data field. On the one hand, the WiFi frame modulated with QPSK lasts for 0.48 milliseconds (ms) and contains 115 OFDM symbols (3.2 µs) with CP (0.8 µs). On the other hand, the WiFi frame modulated with 16QAM lasts for 0.364 ms and contains 86 OFDM symbols with CP. After removing the CP, the length of each OFDM symbol in the data field is 256.
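The frame durations and the symbol length quoted above follow directly from the stated settings, as the short calculation below shows. The constant and function names are illustrative only.

```python
PREAMBLE_US = 8 + 8 + 4    # L-STF + L-LTF + L-SIG, in microseconds
SYMBOL_US = 3.2 + 0.8      # useful OFDM part + cyclic prefix, in microseconds
SAMPLE_RATE_MHZ = 80       # transmission bandwidth B, in MHz


def frame_duration_us(num_symbols):
    """Total frame duration: preamble plus the data-field OFDM symbols."""
    return PREAMBLE_US + num_symbols * SYMBOL_US


qpsk_us = frame_duration_us(115)    # QPSK frame
qam16_us = frame_duration_us(86)    # 16QAM frame
samples_per_symbol = round(3.2 * SAMPLE_RATE_MHZ)   # CP removed
```

The results are 480 µs (0.48 ms) for QPSK, 364 µs (0.364 ms) for 16QAM, and 256 samples per OFDM symbol after CP removal, in agreement with the values in the text.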
3) Multipath Fading Channel Models: As shown in [43], the varying channel will interfere with the transmitter impairments and degrade the classification performance of the RFFI system. In order to focus on the channel-agnostic RFFI system, we take the Rician multipath fading channel into consideration, where five different channel conditions are detailed in Table II. In the first delay tap, there is a line-of-sight (LOS) component and a complex Gaussian variable, while the envelope follows the Rayleigh distribution in the other delay taps. Since the channel's coherence time is almost always larger than the duration of 8 µs in wireless local area network (WLAN) settings [44], it is reasonable to assume that the channel is time-invariant in each 8 µs duration. Therefore, for each WiFi frame, the channel fading coefficients are randomly and periodically regenerated every 8 µs.
¹ According to [40], the IBO level is defined as $10 \lg (P_{sat} / P_{in})$, where $P_{sat}$ is the input saturation power and $P_{in}$ is the average input power. Since the average PAPR is about 7.8 dB, we set the IBO level to 12 dB for all simulated models in this paper.
² According to the IEEE 802.11a specification [41], the CFO tolerance with respect to $f_c$ is equal to ±20 parts per million (ppm, $10^{-6}$), hence the maximum tolerable value of $\varepsilon$ in Eq. (6) is $\pm 40 \cdot f_c / B$ ppm.
4) Datasets Description: To test the proposed RFFI method with different channel conditions and modulation types, twelve simulated datasets are generated according to Table III, where the WiFi datasets under the AWGN channel are also included because they are needed for training the RFFI system. Each dataset contains 800 samples of the WiFi frames, where the preamble (after performing the CFO correction) and CP (in the data field) are deliberately discarded. In other words, 100 samples of each device are kept in each dataset.

5) Multi-Class SVM Training: To investigate the performance of the proposed RFFI method, we run LIBSVM [45] to train the multi-class SVM classifiers in the following experiments. It should be noted that the moment-based features are extracted from the complete data field (i.e., 115/86 OFDM symbols of QPSK/16QAM) of a single WiFi frame.
6) Evaluation Metrics: The confusion matrix and the overall classification accuracy are used as evaluation metrics, which allow visualization of the classification performance. Generally, the probability of correct classification $P_{cc}$ can be measured as

$$P_{cc} = \sum_{i=1}^{\gamma} P(\Lambda_\aleph = \Lambda_i \mid \Lambda_i)\, P(\Lambda_i),$$

where $P(\Lambda_i)$ is the prior probability of the device $\Lambda_i$ and $P(\Lambda_i) = 1/\gamma$; $P(\Lambda_\aleph = \Lambda_i \mid \Lambda_i)$ is the conditional probability of the event that the predicted device code of the testing sample ($\Lambda_\aleph$) is $\Lambda_i$ given that the device code of the testing sample is $\Lambda_i$.
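The probability of correct classification can be computed from a confusion matrix as sketched below, assuming equal priors of $1/\gamma$ as stated above. The sketch further assumes that rows index the true device and columns the predicted device; the matrix values are made up for illustration.

```python
def overall_accuracy(confusion):
    """P_cc with equal priors: the average of the per-device recall values.

    ASSUMPTION: confusion[i][j] counts samples of true device i
    that were predicted as device j.
    """
    gamma = len(confusion)
    prior = 1.0 / gamma
    return sum(prior * confusion[i][i] / sum(confusion[i])
               for i in range(gamma))


# Made-up three-device confusion matrix with 100 test samples per device.
confusion = [
    [98, 2, 0],
    [5, 90, 5],
    [0, 10, 90],
]
p_cc = overall_accuracy(confusion)
```

When every device contributes the same number of test samples, as in the datasets described above, this value coincides with the plain fraction of correctly classified samples.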

B. Effectiveness of the LB-SCSBD Method
First of all, we explore the impacts of the maximum output amplitude ($A_{max}$) employed in the LB-SCSBD submodule on the classification performance of the proposed RFFI system. As shown in Fig. 5, the classification results with different $A_{max}$ are provided at SNR = 27 dB. It is clear that we can obtain the best $P_{cc}$ performance when $A_{max} = 1.2$ in the QPSK cases. Meanwhile, all of the $P_{cc}$ values are close to the best performance when $A_{max} = 3.8$ in the 16QAM cases. Hence, the $A_{max}$ values are set to 1.2 (QPSK) and 3.8 (16QAM), respectively, in the following simulations.
As mentioned in Section III, the LB-SCSBD is a novel method to generate parallel SQ signals within a limited range. To validate its effectiveness, we compare the classification performance of the proposed system under different SQ-generation methods, i.e., the SCSD and LB-SCSBD. The comparison results are provided in Fig. 6, where the SCSD and LB-SCSBD methods are each tested in the proposed RFFI system. We can find that the proposed RFFI system using the LB-SCSBD method leads to the best classification performance in these cases, with accuracy improvements of 14% to 47% in comparison to that using the SCSD method. Therefore, we can conclude that the LB-SCSBD method is effective in the experimental scenarios.

C. Evaluation of the Proposed RFFI System
In this subsection, we evaluate the proposed channel-agnostic RFFI system.To investigate the noise effect on classification accuracy, we add artificial AWGN of different SNR levels to the simulated datasets.Since the channel-mitigation effect of the SQ signal degrades severely at high noise levels [30], we simulate the SNR range within 19 dB to 32 dB and the classification results are given in Fig. 7.
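Adding artificial AWGN at a prescribed SNR can be sketched as follows. The noise power is derived from the measured average signal power, and the variance is split equally between the real and imaginary parts. The function name, the seed, and the toy signal are hypothetical; the actual simulation tooling is not described above.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add complex white Gaussian noise so that the average SNR equals snr_db."""
    power = sum(abs(s) ** 2 for s in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)   # per real/imaginary component
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for s in signal]


rng = random.Random(0)   # fixed seed for repeatability
clean = [complex(1, 1), complex(-1, 1), complex(-1, -1), complex(1, -1)] * 2500
noisy = add_awgn(clean, snr_db=27, rng=rng)

# Empirical check: every clean sample has power |(+/-1) + (+/-1)j|^2 = 2.
noise_power = sum(abs(n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
measured_snr_db = 10 * math.log10(2.0 / noise_power)
```

Sweeping `snr_db` over the range 19 dB to 32 dB and re-running the feature extraction and classification for each value would produce accuracy curves of the kind discussed in this subsection.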
It can be observed from Fig. 7(a) and Fig. 7(b) that the overall identification accuracies depend on the channel conditions when the noise level is fixed, especially in the medium-level SNR regions (i.e., 23 dB to 29 dB). For instance, there are gaps of 1.6% to 29.1% among the accuracy results tested under different channels at SNR = 25 dB. Moreover, when the SNR is equal to 32 dB, the recognition accuracy of our RFFI system reaches 99.84% and 98.26% in the QPSK and 16QAM cases, respectively. Meanwhile, focusing on the SNR regions within 21 dB to 29 dB, we can clearly observe two interesting phenomena from these curves. On the one hand, in terms of the curves marked with the blue pentagram, red triangle, and yellow square, it is apparent that the recognition accuracies rise as the normalized power in the main fading path increases when the path delay taps are fixed. On the other hand, according to the curves marked with the yellow square, violet circle, and green snowflake, the recognition accuracies degrade as the number of path delay taps increases when the normalized power in the main fading path is fixed. These phenomena can be explained by the fact that concentrating the channel power distribution increases the channel frequency correlation between adjacent subcarriers; the channel effects are then suppressed more significantly in the generated SQ signals, which improves the overall identification accuracies.

D. Performance Comparison With Existing Methods
In this subsection, we further compare the classification performance of our method with two other existing RFFI methods based on statistical features: 1) In [17], the skewness and kurtosis extracted from the first two decomposed signals of the empirical mode decomposition (EMD) serve as the RFF under fading channels. For simplicity, this method is referred to as EMD-SK. 2) In [30], the root mean square, variance, skewness, and kurtosis are extracted from the I and Q branches of the spectral quotient sequences generated by the SCSD method and are then employed for device classification. This approach is called SQ-RVSK. Note that, in our experiment, the EMD-based RFFI method operates on the baseband signal of the WiFi data field. In detail, we first use the EMD algorithm to decompose the I and Q branches of the baseband signals separately. Then, we extract the skewness and kurtosis from the first two decomposed signals in each branch. Finally, the extracted features are fed into the SVM for training and testing.
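As a rough illustration of the statistics used by the SQ-RVSK baseline, the sketch below computes the root mean square, variance, skewness, and kurtosis of the I and Q branches of a complex sequence. It assumes population (biased) moments and non-excess kurtosis; the text does not state which conventions [30] uses, and the function names are hypothetical.

```python
import math

def rvsk(x):
    """Root mean square, variance, skewness and kurtosis of a real sequence."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n            # population variance
    std = math.sqrt(var)
    skew = sum(((v - mean) / std) ** 3 for v in x) / n
    kurt = sum(((v - mean) / std) ** 4 for v in x) / n   # non-excess kurtosis
    return [rms, var, skew, kurt]

def iq_feature_vector(seq):
    """Concatenate the four statistics of the I branch and of the Q branch."""
    return rvsk([s.real for s in seq]) + rvsk([s.imag for s in seq])

# Toy values only; a real input would be a spectral quotient sequence.
demo = [complex(1, 2), complex(2, 2), complex(3, 4), complex(4, 4)]
features = iq_feature_vector(demo)
```

The resulting eight-element vector is the kind of input that would then be passed to an SVM.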
Table IV shows the classification results of these methods on different datasets at SNR = 30 dB, where the same training conditions are considered. The proposed method achieves the best performance in all experiments. In contrast, the accuracy of EMD-SK only reaches 12.71% to 13.81% (close to the 12.5% chance level for eight devices), which means this method is ineffective in the simulated scenarios. This is because EMD cannot alleviate the multipath fading channel effects, so the decomposed signals are heavily affected by the channel. Although the SQ-RVSK method is effective, its accuracies fall significantly short (by 38.33% to 60.44%) of those of the proposed method. Hence, we can draw the conclusion that the proposed method exhibits robustness and superiority over the SQ-RVSK and EMD-SK methods for channel-agnostic RFFI tasks.

E. Performance on the Open Dataset
In [28], the authors first use a B210 radio receiver to collect raw IQ samples from over-the-air transmissions of different USRP X310 transmitter radios and then release this dataset online. As reported in their work, it is hard to classify raw samples collected from the same devices at different times (due to the dynamic channel), and the classification result is unpredictable even for four devices.
In this part, we use the data of six devices collected at different times to verify the effectiveness of our RFFI system. Specifically, we first implement the Schmidl-Cox algorithm to detect the start of the WiFi frame in the received data streams. Then, CFO estimation and correction are performed by the two-step CFO estimator. Afterward, we use $L_f$ WiFi frames to generate the parallel SQ sequences according to the proposed LB-SCSBD method, where $A_{\max}$ is 1.08. After extracting the moment-based features, we feed them into the multi-class SVM classifier for training. Finally, we test the classification performance of the trained SVM by predicting the feature vectors extracted from the samples collected at different times. Fig. 8 shows the confusion matrices of the classification results using the proposed RFFI system with different $L_f$, where the overall classification accuracies are 80.17% at $L_f = 1$ and 92.42% at $L_f = 5$. The overall accuracy improves significantly as $L_f$ grows, since the statistical stability is enhanced and the extracted statistical features become more separable in the multi-class SVM. Finally, we can conclude that our RFFI system remains effective on the real-world dataset.
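The frame-detection step can be sketched with the classical Schmidl-Cox timing metric, computed on a preamble made of two identical halves. The sketch also derives the textbook coarse CFO estimate from the phase of the correlation; this is not the two-step CFO estimator mentioned above, whose details are not given here. The half length, frame offset, CFO, and SNR are hypothetical test values rather than WiFi parameters.

```python
import cmath
import math
import random

def schmidl_cox(r, half_len):
    """Timing metric M(d) = |P(d)|^2 / R(d)^2 and correlation P(d) for every offset d."""
    metric, corr = [], []
    for d in range(len(r) - 2 * half_len + 1):
        # Correlate the first window with the window half_len samples later.
        p = sum(r[d + m].conjugate() * r[d + m + half_len] for m in range(half_len))
        # Energy of the second window, used for normalization.
        energy = sum(abs(r[d + m + half_len]) ** 2 for m in range(half_len))
        metric.append(abs(p) ** 2 / energy ** 2)
        corr.append(p)
    return metric, corr

rng = random.Random(1)

def qpsk(n):
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(n)]

def noise(n, sigma):
    return [complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n)]

# Hypothetical test setup: CFO is expressed in cycles per sample.
half_len, true_start, true_cfo = 32, 50, 0.002
half = qpsk(half_len)
tx = [0j] * true_start + half + half + qpsk(100)   # silence, preamble, payload
sigma = math.sqrt(10 ** (-25 / 10) / 2)            # 25 dB SNR for unit-power symbols
rx = [s * cmath.exp(2j * math.pi * true_cfo * n) + w
      for n, (s, w) in enumerate(zip(tx, noise(len(tx), sigma)))]

metric, corr = schmidl_cox(rx, half_len)
# Peak search over the timing metric gives the frame-start candidate.
detected = max(range(len(metric)), key=metric.__getitem__)
# The phase of P at the peak equals 2*pi*CFO*half_len for |CFO| < 1/(2*half_len).
cfo_estimate = cmath.phase(corr[detected]) / (2 * math.pi * half_len)
```

The metric approaches 1 where the two windows cover the repeated halves, so its peak marks the frame start. A practical receiver would typically refine both the timing and the CFO estimate afterwards.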

V. CONCLUSION
In this paper, we proposed a channel-agnostic RFFI method and employed the legacy WiFi frame as a case study for experimental evaluation. We first configured eight transmitter device models with different IQ imbalances and PA nonlinearities. Then, we generated the simulated WiFi datasets from these models under different channel conditions, where two types of modulation formats were considered. The AWGN datasets were used for training the multi-class SVM, while the others were used for testing. In our experimental evaluation, we showed that the proposed RFFI system using the LB-SCSBD method outperformed that using the SCSD method, resulting in accuracy improvements of 14% to 47% at SNR = 27 dB. Moreover, at SNR = 32 dB, our RFFI system reached accuracies of up to 99.84% and 98.26% in the QPSK and 16QAM cases, respectively. In comparison to two existing RFFI methods based on statistical features, our method provided superior and more robust classification performance on channel-agnostic RFFI tasks. Finally, we tested the proposed method on the open dataset collected at different times; the experimental results showed that our method remained effective and achieved an accuracy of 92.42% with six USRP devices.

B. Rapp Model
The Rapp model is a memoryless semi-physical behavioral model, which only considers the AM-AM effects. Hence, the Rapp model can be expressed as [37] y(t) = F(z(t)) = az(t)
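For orientation, the sketch below implements the Rapp AM-AM characteristic in its commonly cited form, $y = az / (1 + (|az|/A_{sat})^{2p})^{1/(2p)}$. The parameter names `gain`, `a_sat`, and `p` and their default values are assumptions for illustration and may differ from the parameterization in [37].

```python
def rapp(z, gain=1.0, a_sat=1.0, p=2.0):
    """AM-AM-only Rapp nonlinearity applied to one complex baseband sample.

    gain  : small-signal gain (assumed name)
    a_sat : output saturation amplitude (assumed name)
    p     : smoothness factor; larger values give a sharper knee
    """
    linear = gain * z
    # The amplitude is compressed while the phase of the sample is left untouched.
    return linear / (1 + (abs(linear) / a_sat) ** (2 * p)) ** (1 / (2 * p))

# Output amplitudes for a small, a medium and a heavily overdriven input.
outputs = [abs(rapp(complex(a, 0))) for a in (0.01, 1.0, 100.0)]
```

With these defaults the response is nearly linear for small inputs and flattens towards `a_sat` for large ones, which is the behavior the AM-AM-only description implies.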

Fig. 4. The scatterplot of the QAM constellation diagrams and the corresponding spectral quotient constellation diagrams.

Fig. 6. The performance comparison of $P_{cc}$ considering different SQ-generation methods.

Fig. 7. The overall classification accuracy curves of the proposed RFFI system on different datasets (i.e., collected under different channel conditions).

Fig. 8. Classification results on the open dataset with different $L_f$.
1 from Υ r k, and then the extracted SQ vector is denoted as Υr k = [Υr k(0), ..., Υr k(i_3), ..., Υr k(I_3 − 1)], where I_3 is the total number of qualified SQ signals. Meanwhile, we also generate the left circular shift SQ signal vector Υl k = [Υl k(0), ..., Υl k(i_3), ..., Υl k(I_3 − 1)] via the above steps. Note that the elements in Υl k are exactly the reciprocals of those in Υr k, which can be expressed as

TABLE I
IMPAIRMENTS OF EIGHT DEVICES USED IN SIMULATIONS