Radio Frequency Fingerprint Identification With Hybrid Time-Varying Distortions

Radio frequency fingerprint identification (RFFI) is a promising physical layer security technique that employs the hardware-introduced features extracted from the received signals for device identification. In this paper, we consider an RFFI problem in the presence of hybrid time-varying distortions (HTVDs) induced by the multipath fading channel, carrier frequency offset (CFO), and phase offset. To solve this problem, an HTVDs-robust RFFI framework is proposed. First, we derive that the residual HTVDs after CFO correction can be approximated as multiplicative interference in the frequency domain. Second, we define a novel signal analysis dimension named spectral quotient (SQ) representation and then present the spectral circular shift division (SCSD) method to generate HTVDs-robust SQ signals, in which the multiplicative interference is suppressed. Thereafter, statistics including the root mean square (RMS), variance (VAR), skewness (SKE), and kurtosis (KUR) are extracted from the real and imaginary components of the SQ signals, respectively. Finally, the statistical features are used to train and test support vector machine (SVM) classifiers. To further enhance the performance of the proposed RFFI scheme, we also present the spectral circular multi-shift division (SCMSD) method, which increases the flexibility in generating HTVDs-robust SQ signals. To the best of our knowledge, this is the first attempt to mitigate the HTVDs by leveraging the strong frequency correlation at neighboring subcarriers in multivariate hypothesis tasks. Compared with several handcrafted feature-based RFFI methods, the proposed method exhibits superior identification accuracy and strong robustness. Experimental results show that the proposed RFFI scheme achieves accuracies of 91.3% with five devices and 86.4% with sixteen devices when the classifiers are trained under additive white Gaussian noise but tested under the Rayleigh channel.


Jiashuo He, Sai Huang, Senior Member, IEEE, Shuo Chang, Member, IEEE, Fanggang Wang, Senior Member, IEEE, Ba-Zhong Shen, Member, IEEE, and Zhiyong Feng, Senior Member, IEEE

I. INTRODUCTION
The Internet of Things (IoT) has permeated every aspect of our daily life and is blooming with numerous applications such as connected healthcare, smart homes, and intelligent industries [1], [2], [3]. The number of IoT devices is expected to reach 75.44 billion by 2025 and 300 billion by 2030 [4]. Physical layer authentication is a fundamental security measure for safeguarding IoT systems, allowing legitimate users to access the network while blocking malicious users. Given the billions of IoT devices, this task is becoming increasingly challenging. Conventional authentication techniques rely on cryptographic schemes based on software addresses and pre-shared keys. However, pre-shared keys are difficult to manage, and software addresses are vulnerable to spoofing attacks [5]. Thus, as IoT devices continue to proliferate, there is a critical need for more straightforward and advanced security measures.
Radio frequency fingerprint identification (RFFI) is a promising noncryptographic authentication technology that employs the hardware-introduced features extracted from the received signals for device identification. The hardware-introduced features, also named radio frequency fingerprints (RFF), usually arise during the preparation of the components' base materials and during the manufacturing process, which suggests that they are unintentional. Hence, these features are unique and hard for malicious users to tamper with. Since inexpensive components exhibit pronounced hardware imperfections, RFFI is particularly suitable for low-cost IoT devices. At present, RFFI has been widely investigated for mobile phones [6], unmanned aerial vehicles (UAVs) [7], self-organized networks [8], and wireless local area network (WLAN) cards [9] for physical layer authentication and identification.
In general, RFFI can be regarded as a classification problem in which RFF feature extraction plays a pivotal role. Existing RFF feature extraction methods can be mainly divided into two categories: transient-based and modulation-based methods [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]. Transient signals are usually generated during on-off switching or mode transitions, and they can provide unique and unintentional characteristics that are suitable for device identification. In many works, signal power [9], transient phase and amplitude [10], and power spectral density [11] have been employed for feature extraction. However, transient signals are comparatively short, which makes them difficult to capture properly. In comparison, modulated signals are relatively simple to detect and receive. In preliminary works, many handcrafted features of modulated signals have been investigated and have achieved significant classification performance in RFFI problems, such as magnitude error [12], sampling frequency offset [13], entropy [14], and statistics [15], [16], [17], [18], [19]. Moreover, with the development of artificial intelligence, deep learning (DL) and convolutional neural networks (CNNs) have also been adopted in many works [19], [20], [21], [22], [23], [24], [25]; these approaches can process raw signals and classify them directly without feature engineering. Considering the limited computing capacity and the complexity constraints of resource-constrained RFFI systems, we focus on modulation-based feature extraction with handcrafted features.
Despite the recent advancement of RFFI, many challenges remain unsolved, such as the unreliable performance induced by hybrid time-varying distortions (HTVDs). The major time-varying distortion is caused by the multipath fading channel. As reported in previous studies [4], [21], the carrier frequency offset (CFO) is time-varying and unsuitable for device identification; hence, we consider CFO as a source of time-varying distortion. Besides, the phase offset caused by carrier frequency and phase variations can also be regarded as a time-varying distortion. Several works have recently been conducted to combat the impact of these distortions on RFFI. For instance, Fadul et al. [26] attempt to mitigate the impact of multipath by using a Nelder-Mead simplex-based channel estimator. Sankhe et al. [27] propose an undercomplete demodulation method in which channel estimation and equalization are employed to mitigate the channel effects. However, equalization-based methods require additional operations for channel estimation and equalization, and their classification performance is sensitive to the accuracy of channel estimation. In [28], Zhou et al. first compensate the frequency offset and phase offset, and then present an artificial noise adding (ANA) algorithm to enhance recognition robustness through regularization and channel adaptation. However, this method requires similar channel conditions in the training and testing sets; otherwise, the recognition performance can degrade. In [29], Shen et al. first use preprocessing methods to compensate the CFO and phase offset. Then, they leverage the time correlation of the channel frequency response and perform division between short-time Fourier transform (STFT) amplitude blocks to construct a channel-independent spectrogram. However, this scheme requires preamble normalization and only employs the amplitude information of the channel-independent spectrogram.
In this work, we propose a robust RFFI framework to overcome the effects of these unknown HTVDs in orthogonal frequency division multiplexing (OFDM) systems. Specifically, we first use the WiFi preamble field to estimate the CFO and then compensate the OFDM symbols in the data field. Thereafter, we give the frequency-domain representation of the OFDM data and derive that the residual HTVDs can be roughly regarded as multiplicative interference. Then, we define a novel signal analysis dimension named spectral quotient (SQ) representation. By leveraging the strong frequency correlations of the multiplicative interference at neighboring subcarriers, we propose the spectral circular shift division (SCSD) method to generate HTVDs-robust SQ signals. Moreover, we present the spectral circular multi-shift division (SCMSD) method to increase the flexibility in generating HTVDs-robust SQ signals. Besides, we propose an RFF extractor named DB-RVSK, which extracts the statistics including root mean square (RMS), variance (VAR), skewness (SKE), and kurtosis (KUR) from the double branches (real and imaginary) of the SQ signals. Lastly, the extracted statistical features are fed to support vector machine (SVM) classifiers for training and testing. To the best of our knowledge, this is the first attempt to solve the RFFI problem in the presence of HTVDs by leveraging the strong frequency correlations at neighboring subcarriers.
In summary, the major contributions of this paper are listed as follows:

The rest of the paper is organized as follows. Section II describes the transmitter impairments and the received signal model associated with the time-varying distortions in the OFDM system. The details of the proposed HTVDs-robust signal preprocessing methods are introduced in Section III. Section IV presents the RFFI framework and the evaluation metrics. In Section V, we first introduce the experimental setup and then evaluate the performance of the proposed RFFI scheme. Finally, Section VI concludes this paper.

II. TRANSMITTER IMPAIRMENTS AND RECEIVED OFDM SIGNAL MODEL
Fig. 1 illustrates the simplified structure of a legacy WiFi signal frame, which contains the preamble and data fields [27]. The preamble is used for CFO estimation and correction, while the OFDM user data are used for RFF extraction and device classification. In this section, the transmitter impairments and the received baseband OFDM signal model of the user data are briefly introduced; they are used to generate the simulated datasets for training and testing the proposed RFFI methods. The simplified block diagrams of the transmitter and the receiver are given in Fig. 2. On the one hand, hardware imperfections including in-phase (I) and quadrature (Q) imbalance and power amplifier (PA) nonlinearity are considered in our system, as they are the primary sources of the RFF. On the other hand, HTVDs caused by the multipath fading channel, carrier frequency offset, and phase offset are also considered because they can lead to unreliable classification performance.

A. OFDM Modulation
Let $X(k)$, $k = 0, \ldots, N-1$, denote the $k$th data symbol modulated from a bitstream, where $N$ is the symbol length. After performing the inverse fast Fourier transform (IFFT), the complex baseband discrete representation of an OFDM signal with a cyclic prefix (CP) can be given as

$$x(n) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X(k)\, e^{j2\pi kn/N}, \quad n = -N_g, \ldots, N-1,$$

where $N_g$ is the length of the guard interval.
For simplicity, we can rewrite the OFDM signal in complex form as $x(n) = x_I(n) + j\,x_Q(n)$, where $x_I(n)$ and $x_Q(n)$ represent the modulated signals on the I and Q branches, respectively.
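The OFDM modulation step can be sketched in pure Python as follows. This is a minimal illustration, assuming a unitary $1/\sqrt{N}$ IFFT normalization (consistent with the unitary FFT matrix used later) and a CP formed by copying the last $N_g$ samples to the front; the data symbols are called `X` here, and the function names and toy sizes ($N = 16$, $N_g = 4$) are hypothetical.

```python
import cmath
import math
import random


def idft_unitary(X):
    """Unitary inverse DFT: x(n) = (1/sqrt(N)) * sum_k X(k) * exp(j*2*pi*k*n/N)."""
    N = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / math.sqrt(N)
        for n in range(N)
    ]


def ofdm_symbol_with_cp(X, Ng):
    """One OFDM symbol with a cyclic prefix: the last Ng samples are copied to the front."""
    x = idft_unitary(X)
    N = len(x)
    return x[N - Ng:] + x


# Hypothetical toy example: random 16QAM symbols on N = 16 subcarriers, Ng = 4
rng = random.Random(0)
levels = [-3, -1, 1, 3]
X = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(16)]
s = ofdm_symbol_with_cp(X, Ng=4)

# Complex form x(n) = x_I(n) + j x_Q(n): split into the I and Q branches
x_I = [v.real for v in s]
x_Q = [v.imag for v in s]
```

Because the IFFT is unitary, the energy of the symbol after CP removal equals the energy of the data symbols (Parseval), which is a quick sanity check on the normalization.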

B. IQ Imbalance
Since upconversion is the key step that transforms the baseband signal to the radio-frequency (RF) band for transmission, imbalance between the I and Q branches often occurs and becomes one of the most significant transmitter imperfections. Following [4], the discrete equivalent baseband signal with IQ imbalance, $z(n)$, can be expressed in terms of the phase mismatch $\theta_{tx}$ (in rad) and the I and Q gains $g^{tx}_I$ and $g^{tx}_Q$, respectively. Moreover, the I and Q gains in linear scale at the transmitter can be calculated from the gain imbalance $G_{tx}$ (in dB).
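Because the IQ-imbalance equation itself is not reproduced here, the sketch below uses one common discrete model purely as an assumption (not necessarily the exact form in [4]): the gain imbalance $G_{tx}$ is split symmetrically so that $g^{tx}_I / g^{tx}_Q$ equals $G_{tx}$ dB, and the Q-branch oscillator deviates from quadrature by $\theta_{tx}$. The function names are hypothetical.

```python
import math


def iq_gains(G_tx_db):
    """Split a gain imbalance of G_tx dB symmetrically between I and Q (assumed convention)."""
    g_I = 10 ** (0.5 * G_tx_db / 20)
    g_Q = 10 ** (-0.5 * G_tx_db / 20)
    return g_I, g_Q


def apply_iq_imbalance(x, G_tx_db, theta_tx):
    """Apply gain and phase mismatch to complex baseband samples x (assumed model)."""
    g_I, g_Q = iq_gains(G_tx_db)
    out = []
    for v in x:
        i = g_I * v.real
        # The Q-branch oscillator is off from quadrature by theta_tx, leaking I into Q
        q = g_Q * (v.imag * math.cos(theta_tx) - v.real * math.sin(theta_tx))
        out.append(complex(i, q))
    return out


# Hypothetical impairment: 0.5 dB gain imbalance and 5 degrees of phase mismatch
g_I, g_Q = iq_gains(0.5)
z = apply_iq_imbalance([1 + 1j, -0.5 + 2j], 0.5, math.radians(5))
```

With $G_{tx} = 0$ dB and $\theta_{tx} = 0$ the model reduces to the identity, which is a convenient check when plugging in device parameters.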

C. Power Amplifier Nonlinearity
The power amplifier (PA) is a crucial component of the transmitter that boosts the power of the transmitted signals. However, a PA usually amplifies signals nonlinearly, and the resulting nonlinear distortion becomes an important part of the RFF. The memoryless Saleh model [30], [31] is generally adopted to describe the transmitted signals under this nonlinearity. Therefore, the baseband transmitted signal considering both IQ imbalance and PA nonlinearity can be expressed as

$$y(n) = A\big(|z(n)|\big)\, e^{j\left[\phi(z(n)) + \varphi(|z(n)|)\right]},$$

where $\phi(z(n))$ is the phase of $z(n)$; $|\cdot|$ is the amplitude operator; $A(\cdot)$ describes the AM-AM effect and $\varphi(\cdot)$ characterizes the AM-PM effect. According to [30], $A(\cdot)$ and $\varphi(\cdot)$ can be written as

$$A(r) = \frac{\alpha_1 r}{1 + \beta_1 r^2}, \qquad \varphi(r) = \frac{\alpha_2 r^2}{1 + \beta_2 r^2},$$

where $\alpha_1$, $\beta_1$, $\alpha_2$, and $\beta_2$ are the parameters of the Saleh model.
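A minimal sketch of the memoryless Saleh PA applied sample by sample to the IQ-imbalanced signal, using the standard AM-AM curve $A(r) = \alpha_1 r/(1+\beta_1 r^2)$ and AM-PM curve $\varphi(r) = \alpha_2 r^2/(1+\beta_2 r^2)$. The parameter values below ($\alpha_1 = 2$, $\beta_1 = 1$, $\alpha_2 = 1$, $\beta_2 = 1$) are hypothetical and are not the values used for the simulated devices.

```python
import cmath


def saleh_pa(z, a1, b1, a2, b2):
    """Memoryless Saleh PA applied to complex samples z.

    AM-AM: A(r)   = a1 * r / (1 + b1 * r**2)
    AM-PM: phi(r) = a2 * r**2 / (1 + b2 * r**2)   (added to the input phase)
    """
    out = []
    for v in z:
        r = abs(v)
        amp = a1 * r / (1 + b1 * r * r)
        dphi = a2 * r * r / (1 + b2 * r * r)
        out.append(amp * cmath.exp(1j * (cmath.phase(v) + dphi)))
    return out


# Hypothetical parameters; with b1 = 1 the AM-AM curve peaks at r = 1 / sqrt(b1) = 1
y = saleh_pa([0.01 + 0j, 1 + 0j, 3 + 0j], a1=2.0, b1=1.0, a2=1.0, b2=1.0)
```

For small inputs the gain approaches $\alpha_1$ with almost no phase rotation, while beyond $r = 1/\sqrt{\beta_1}$ the output amplitude falls, which is the compression that makes the PA a source of device-specific distortion.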
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

D. Received Signal Model
In this subsection, the multipath fading channel is considered. Since an OFDM symbol period is comparatively short, it is reasonable to assume that the channel coefficients are constant within each OFDM symbol period. After passing through the channel, the transmitted signal is captured by the receiver, where the RF signal with center frequency $f_c$ is down-converted to baseband using an oscillator of frequency $f'_c$. The frequency and phase mismatch between the transmitter and the receiver causes the CFO and the phase offset. In summary, the received baseband OFDM signal incorporating the transmitter fingerprints and the HTVDs can be formulated following [32] and [33], where $I$ is the total number of channel delay taps and $\tau_i$ is the $i$th channel delay; $h_{\tau_i}$ is the channel coefficient of the $i$th delay tap, whose envelope follows a Rayleigh distribution; $w(n) \sim \mathcal{CN}(0, \sigma_n^2)$ is the AWGN; $\Phi \in [-\pi, \pi]$ is the phase offset of the received signal; and $\varepsilon = (f_c - f'_c)/f_s$ is the CFO normalized by the sampling frequency $f_s$.
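The received-signal model described above can be sketched as follows, assuming the common form $\tilde r(n) = e^{j(2\pi\varepsilon n + \Phi)} \sum_{i=1}^{I} h_{\tau_i}\, y(n - \tau_i) + w(n)$. This exact form, the equal-power tap profile, and the function names are assumptions made for illustration, not taken from [32], [33].

```python
import cmath
import math
import random


def rayleigh_taps(num_taps, rng):
    """Complex Gaussian taps (Rayleigh envelopes), equal power, unit total power (assumed)."""
    s = math.sqrt(1 / (2 * num_taps))
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(num_taps)]


def received_signal(y, taps, delays, eps, Phi, sigma2, rng):
    """Apply multipath, normalized CFO eps, phase offset Phi, and AWGN of variance sigma2."""
    s_w = math.sqrt(sigma2 / 2)
    out = []
    for n in range(len(y)):
        # Tapped-delay-line channel; samples before n = 0 are treated as zero
        s = sum(h * y[n - d] for h, d in zip(taps, delays) if n - d >= 0)
        rot = cmath.exp(1j * (2 * math.pi * eps * n + Phi))
        w = complex(rng.gauss(0, s_w), rng.gauss(0, s_w))
        out.append(rot * s + w)
    return out


# Hypothetical example: five-path Rayleigh channel, eps = 1e-3, Phi = 0.7 rad
rng = random.Random(1)
taps = rayleigh_taps(5, rng)
r = received_signal([1 + 0j] * 32, taps, [0, 1, 2, 3, 4], eps=1e-3, Phi=0.7,
                    sigma2=0.01, rng=rng)
```

The channel taps are drawn once per call, mirroring the assumption that the channel is constant within an OFDM symbol period, while the CFO term rotates the samples progressively with $n$.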

III. HTVDS-ROBUST SIGNAL PREPROCESSING METHODS
Since the HTVDs can greatly distort the received signal, the fingerprints of the received signal become unstable, which significantly degrades the performance of RFFI. To extract robust RFF, the received WiFi signal needs to be preprocessed to alleviate the effects of HTVDs. In this section, we first employ the preamble to estimate the CFO, which is then used to compensate the CFO of the received OFDM signal in the data field. Then, we derive the frequency-domain representation of the corrected OFDM signal with residual CFO. Meanwhile, we define the SQ representation for the purpose of signal analysis. To further suppress the effects of HTVDs, we propose the SCSD and SCMSD methods to generate HTVDs-robust SQ signals. The details of these operations are given in the following.

A. CFO Estimation and Correction
After receiving the WiFi signal, we divide it into three parts: the L-STF signal, the L-LTF signal, and the OFDM signal of the user data. Then, we use the L-STF and L-LTF signals to estimate the CFO with the conventional two-step CFO estimator, which can be found in many prior works [34], [35], [36]. Assuming the estimated CFO is $\widehat{\Delta f}$, the received OFDM signal is de-rotated by $\widehat{\Delta f}$ to obtain the corrected signal $r(n)$, in which $\hat{w}(n)$ is the AWGN after CFO correction and $\varepsilon'$ denotes the normalized residual CFO, i.e., $\varepsilon' = \varepsilon - \widehat{\Delta f}/f_s$.
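A minimal sketch of a conventional two-step (coarse, then fine) delay-and-correlate CFO estimator. The repetition periods are assumptions based on the legacy 802.11 preamble resampled to $f_s = 80$ MHz (0.8 µs L-STF period, i.e., 64 samples; 3.2 µs L-LTF period, i.e., 256 samples); the estimator in [34], [35], [36] may differ in detail, and all function names are hypothetical.

```python
import cmath
import math
import random


def cfo_estimate(seg, period, fs):
    """Delay-and-correlate CFO estimate (Hz) for a segment repeating every `period` samples.

    Unambiguous only for |CFO| < fs / (2 * period).
    """
    acc = sum(seg[n].conjugate() * seg[n + period] for n in range(len(seg) - period))
    return cmath.phase(acc) * fs / (2 * math.pi * period)


def derotate(seg, cfo, fs, n0=0):
    """Remove a CFO of `cfo` Hz from samples whose first index is n0."""
    return [v * cmath.exp(-2j * math.pi * cfo * (n0 + n) / fs) for n, v in enumerate(seg)]


def two_step_cfo(stf, ltf, fs, stf_period=64, ltf_period=256):
    """Coarse estimate from the short field (wide range), fine estimate from the long field."""
    coarse = cfo_estimate(stf, stf_period, fs)
    fine = cfo_estimate(derotate(ltf, coarse, fs), ltf_period, fs)
    return coarse + fine


# Synthetic noiseless check with hypothetical random periodic fields
fs, true_cfo = 80e6, 150e3
rng = random.Random(0)
stf = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(64)] * 10
ltf = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(256)] * 2
rx = derotate(stf + ltf, -true_cfo, fs)        # apply the CFO
est = two_step_cfo(rx[:640], rx[640:], fs)
corrected = derotate(rx, est, fs)              # CFO-corrected samples
```

The short period gives a wide unambiguous range (±625 kHz at 80 MHz), and the long period refines the residual with better noise averaging; in the noiseless toy case the estimate matches the true CFO essentially exactly.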

B. Fourier Transform of the Corrected OFDM Signal
Assuming the CP is longer than the maximum channel delay, the corrected OFDM signal after CP removal can be rewritten in matrix form as

$$\mathbf{r} = \hat{\mathbf{h}}\mathbf{y} + \hat{\mathbf{w}},$$

where $\mathbf{r} = [r(0), \ldots, r(N-1)]^T$ is the received signal vector after CFO correction; $\mathbf{y} = [y(0), \ldots, y(N-1)]^T$ is the transmitted signal vector that contains only the transmitter impairments; $\hat{\mathbf{w}} = [\hat{w}(0), \ldots, \hat{w}(N-1)]^T$ denotes the noise vector; and $\hat{\mathbf{h}}$ is an $N \times N$ matrix associated with the residual HTVDs, whose $(m,n)$th element is denoted by $\hat{h}_{m,n}$. Let $\mathbf{Q}$ denote the FFT matrix, whose $(m,n)$th element is $W_N^{mn}/\sqrt{N}$ for $m, n = 0, \ldots, N-1$, where $W_N = e^{-j2\pi/N}$. $\mathbf{Q}^H$ is the corresponding IFFT matrix and $\mathbf{Q}\mathbf{Q}^H = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.
After applying the FFT to Eq. (11), the frequency-domain received signal can be derived as

$$\mathbf{R} = \mathbf{Q}\hat{\mathbf{h}}\mathbf{Q}^H\mathbf{Y} + \hat{\mathbf{W}},$$
where $\mathbf{R} = [R(0), \ldots, R(N-1)]^T$ is the frequency-domain corrected signal vector; $\mathbf{Y} = [Y(0), \ldots, Y(N-1)]^T$ is the transmitted signal vector in the frequency domain; and $\hat{\mathbf{W}}$ is the frequency-domain noise vector. Let $\bar{h}_n = h_n e^{-j(\pi\varepsilon' N + \Phi)}$ for $n \in \{\tau_1, \ldots, \tau_I\}$; then we can obtain the maximum normalized mean square error (NMSE) $h_{\mathrm{NMSE}}$ between $\bar{h}_n$ and $\hat{h}_{m,n}$.

The maximum tolerable value of $\varepsilon$ is $\pm 2500$ ppm when $f_c = 5$ GHz and $f_s = 80$ MHz. After CFO correction at SNR $\geq 15$ dB, the normalized residual CFO $\varepsilon'$ lies within $[-20, 20]$ ppm with a probability of more than 99.8%. Since $h_{\mathrm{NMSE}}$ does not exceed $-23.832$ dB when $N = 1024$ and $\varepsilon' \in [-20, 20]$ ppm, it is reasonable to assume that $\bar{h}_n \approx \hat{h}_{m,n}$ for $n \in \{\tau_1, \ldots, \tau_I\}$. Hence, Eq. (15) can be approximately rewritten as $\mathbf{R} \approx \mathbf{Q}\bar{\mathbf{h}}\mathbf{Q}^H\mathbf{Y} + \hat{\mathbf{W}}$, where $\bar{\mathbf{h}}$ is the approximation of the HTVDs matrix, whose elements are determined by $\bar{h}_n$.

As can be seen from Eq. (18), $\bar{\mathbf{h}}$ is a circulant matrix. Since the DFT diagonalizes any circulant matrix, we can further derive from Eq. (17) that $\mathbf{R} \approx \mathbf{H}\mathbf{Y} + \hat{\mathbf{W}}$, where $\mathbf{H} = \mathbf{Q}\bar{\mathbf{h}}\mathbf{Q}^H$ is the approximation of the frequency-domain residual HTVDs matrix and is an $N \times N$ diagonal matrix. Consequently, the $n$th element of $\mathbf{R}$ can be approximately represented as

$$R(n) \approx \hbar_n Y(n) + \hat{W}(n),$$

where $\hbar_n$ denotes the $n$th diagonal element of $\mathbf{H}$. As a result, the frequency-domain HTVDs can be roughly regarded as multiplicative interference.
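Two steps of the derivation above can be checked numerically with a small pure-Python sketch (function names and the 3-tap channel are hypothetical). First, a circulant matrix is diagonalized by the unitary DFT matrix, so $\mathbf{Q}\bar{\mathbf{h}}\mathbf{Q}^H$ is diagonal and its diagonal equals the DFT of the first column. Second, assuming the worst-case mismatch between the approximated taps (written $\bar h_n$ here) and $\hat h_{m,n}$ is a unit-modulus phase error of $\pi\varepsilon' N$ (our reading of the bound, not stated explicitly in the text), the maximum NMSE is $|e^{j\pi\varepsilon' N} - 1|^2$, which for $\varepsilon' = 20$ ppm and $N = 1024$ reproduces the quoted figure of about $-23.83$ dB.

```python
import cmath
import math


def dft_matrix(N):
    """Unitary DFT matrix: Q[m][n] = W_N**(m*n) / sqrt(N), with W_N = exp(-j*2*pi/N)."""
    return [[cmath.exp(-2j * math.pi * m * n / N) / math.sqrt(N) for n in range(N)]
            for m in range(N)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def hermitian(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def circulant(c):
    """Circulant matrix with first column c: C[m][n] = c[(m - n) mod N]."""
    N = len(c)
    return [[c[(m - n) % N] for n in range(N)] for m in range(N)]


# Hypothetical 3-tap channel (common phase term folded in) on N = 8 samples
N = 8
c = [0.8 * cmath.exp(0.3j), 0.4 - 0.2j, 0.1j] + [0j] * (N - 3)
Q = dft_matrix(N)
H = matmul(matmul(Q, circulant(c)), hermitian(Q))     # Q h Q^H
off_diag = max(abs(H[i][j]) for i in range(N) for j in range(N) if i != j)
hbar = [H[n][n] for n in range(N)]                     # multiplicative factors
dft_c = [sum(c[k] * cmath.exp(-2j * math.pi * n * k / N) for k in range(N))
         for n in range(N)]


def worst_case_nmse_db(eps_res, N):
    """NMSE (dB) of a unit-modulus phase error of pi * eps_res * N."""
    return 10 * math.log10(abs(cmath.exp(1j * math.pi * eps_res * N) - 1) ** 2)


nmse_db = worst_case_nmse_db(20e-6, 1024)   # about -23.83 dB
```

The vanishing off-diagonal entries are exactly what turns the matrix model into the per-subcarrier product $R(n) \approx \hbar_n Y(n) + \hat W(n)$.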

C. Spectral Circular Shift Division
In this subsection, we first define the signal analysis dimension of SQ as follows.
Definition 1: Given two spectral vectors $\mathbf{X} = [X(0), \ldots, X(w), \ldots, X(N-1)]^T$ and $\mathbf{Y} = [Y(0), \ldots, Y(w), \ldots, Y(N-1)]^T$, the spectral quotient vector $\boldsymbol{\Upsilon} = [\Upsilon(0), \ldots, \Upsilon(w), \ldots, \Upsilon(N-1)]^T$ is defined as the vector obtained by element-wise division between these two vectors, where the $w$th spectral quotient signal is

$$\Upsilon(w) = \frac{X(w)}{Y(w)}.$$

In the previous subsection, the spectral vector $\mathbf{R}$ was obtained by the FFT operation. Then, the vector $\mathbf{R}^r_d$ can be obtained by circularly shifting $\mathbf{R}$ by $d$ positions. Thereafter, we perform element-wise division between $\mathbf{R}$ and $\mathbf{R}^r_d$ to generate the SQ signal vector $\boldsymbol{\Upsilon}^r_d$, whose $n$th element is

$$\Upsilon^r_d(n) = \frac{R(n)}{R(n_d)},$$

where $n_d$ denotes the index of the subcarrier $d$ positions away from $n$ (modulo $N$). For simplicity, this signal preprocessing method, which transforms the received signals into SQ signals, is named spectral circular shift division (SCSD). We give the following remark on this method.

Remark 1: By leveraging the strong frequency correlations of the multiplicative interference at neighboring subcarriers, the SQ signals generated by the proposed SCSD method are robust to the HTVDs to some extent.
The detailed explanation of Remark 1 is as follows. First, substituting Eq. (21) into Eq. (24), we obtain

$$\Upsilon^r_d(n) \approx \frac{\hbar_n Y(n)}{\hbar_{n_d} Y(n_d)} + E_d(n),$$

where $E_d(n)$ is the noise-induced interference, given by

$$E_d(n) = \frac{\hbar_n Y(n) + \hat{W}(n)}{\hbar_{n_d} Y(n_d) + \hat{W}(n_d)} - \frac{\hbar_n Y(n)}{\hbar_{n_d} Y(n_d)}.$$

In general, the frequency correlation of channel frequency responses is exploited by interpolation techniques for channel estimation, such as piecewise constant interpolation [38]. Since the channel frequency response does not change dramatically across neighboring subcarriers, it is reasonable to assume that $\hbar_n \approx \hbar_{n_d}$. Accordingly, Eq. (25) can be further approximated as

$$\Upsilon^r_d(n) \approx \frac{Y(n)}{Y(n_d)} + E_d(n).$$

As observed from Eq. (27), the multiplicative terms $\hbar_n$ and $\hbar_{n_d}$ cancel, so the HTVDs can be effectively suppressed in the SQ signals.
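A minimal noise-free sketch of SCSD, assuming the shift maps subcarrier $n$ to $n_d = (n + d) \bmod N$ (the direction is our assumption). The hypothetical short channel and phase offset produce a multiplicative factor $\hbar_n$; the raw spectrum deviates strongly from $Y(n)$, whereas the SQ signal stays close to the ideal quotient $Y(n)/Y(n_d)$ because $\hbar_n / \hbar_{n_d} \approx 1$.

```python
import cmath
import math


def scsd(R, d):
    """Spectral circular shift division: Upsilon(n) = R(n) / R((n + d) mod N).

    The shift direction (n + d) is an assumption made for this sketch.
    """
    N = len(R)
    return [R[n] / R[(n + d) % N] for n in range(N)]


def nmse_db(est, ref):
    """10*log10 of the error energy normalized by the reference energy."""
    num = sum(abs(a - b) ** 2 for a, b in zip(est, ref))
    den = sum(abs(b) ** 2 for b in ref)
    return 10 * math.log10(num / den)


# Hypothetical noise-free example: short channel plus a phase offset Phi
N, Phi = 64, 2.0
taps = [1.0, 0.3, 0.1j]
hbar = [cmath.exp(1j * Phi) * sum(h * cmath.exp(-2j * math.pi * n * k / N)
                                  for k, h in enumerate(taps))
        for n in range(N)]
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
Y = [qpsk[(7 * n + 3) % 4] for n in range(N)]    # deterministic stand-in data
R = [hb * y for hb, y in zip(hbar, Y)]           # R(n) = hbar_n * Y(n)

d = 1
nmse_raw = nmse_db(R, Y)                         # multiplicative HTVDs untouched
nmse_sq = nmse_db(scsd(R, d), scsd(Y, d))        # SQ vs ideal Y(n) / Y(n_d)
```

Any factor that is common to both subcarriers, such as the phase offset $e^{j\Phi}$, cancels exactly; only the variation of the channel across $d$ subcarriers survives, which is why small shifts are preferable.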
Fig. 3. The flow chart of the proposed RFFI framework, where $\aleph = m(m-1)/2$ and $\Lambda_q$ denotes the prediction of the $q$th SVM.

D. Spectral Circular Multi-Shift Division
Applying the proposed SCSD method with different shifts yields a group of SQ signal vectors. Since these vectors are not fully correlated with each other, each vector may carry unique device information. To preserve as much information as possible in the generated SQ signals, we concatenate these vectors into the following sequence:

$$\boldsymbol{\Upsilon}_{\mathrm{sum}} = \left[(\boldsymbol{\Upsilon}^r_1)^T, (\boldsymbol{\Upsilon}^r_2)^T, \ldots, (\boldsymbol{\Upsilon}^r_\mu)^T\right]^T,$$

where $\mu$ is the maximum shift number and $\boldsymbol{\Upsilon}_{\mathrm{sum}}$ is a $\mu N \times 1$ vector. Compared with the vector derived by SCSD, the vector $\boldsymbol{\Upsilon}_{\mathrm{sum}}$ obtained by SCMSD contains more SQ signals, and hence the features extracted from it may be more conducive to RFFI.
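A minimal sketch of SCMSD as a concatenation of SCSD outputs for shifts $d = 1, \ldots, \mu$, so that $\mu = 1$ reduces to SCSD with $d = 1$. The shift direction and the toy spectrum are assumptions for illustration.

```python
def scsd(R, d):
    """SQ vector for shift d: R(n) / R((n + d) mod N) (shift direction assumed)."""
    N = len(R)
    return [R[n] / R[(n + d) % N] for n in range(N)]


def scmsd(R, mu):
    """Concatenate the SCSD outputs for d = 1, ..., mu into one mu*N-long sequence."""
    out = []
    for d in range(1, mu + 1):
        out.extend(scsd(R, d))
    return out


R = [1 + 0j, 2 + 0j, 4 + 0j, 8 + 0j]   # hypothetical toy spectrum
ups_sum = scmsd(R, 2)
# d = 1 -> [0.5, 0.5, 0.5, 8.0]; d = 2 -> [0.25, 0.25, 4.0, 4.0]
```

The output length grows linearly with $\mu$, so the statistics extracted later are averaged over more SQ samples at the cost of including quotients over wider subcarrier spacings.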

IV. THE PROPOSED RFFI FRAMEWORK AND EVALUATION METRICS
In this section, the flow chart of our RFFI framework is shown in Fig. 3, where we can use the switch to select the mode of signal preprocessing.When the switch only turns on a single path, the flow chart denotes the procedures of the RFFI scheme based on SCSD, otherwise, the flow chart represents the procedures of the RFFI scheme based on SCMSD.As can be seen from this figure, we also propose an RFF extractor named DB-RVSK, which can extract the statistics from the real and imaginary branches of the SQ signals, respectively.Finally, the SVM-based classifiers are trained with the statistical features and are used to distinguish the unknown devices.The details of the DB-RVSK feature extractor and SVM classifiers are introduced in the following.

A. DB-RVSK Feature Extractor
After performing the proposed signal preprocessing method, the SQ vector $\boldsymbol{\Upsilon}$ (either $\boldsymbol{\Upsilon}^r_d$ or $\boldsymbol{\Upsilon}_{\mathrm{sum}}$) is obtained. Then, we divide the complex-valued sequence into two real-valued sequences according to its real and imaginary parts. As in several prior works [15], [16], [17], [18], [19], the RMS, VAR, SKE, and KUR are employed as handcrafted features in RFFI. To exploit the device-specific information, these four statistics are extracted from each of the two real-valued sequences and serve as the discriminative features in our scheme.
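Let $\boldsymbol{\Upsilon}_I$ and $\boldsymbol{\Upsilon}_Q$ denote the real and imaginary sequences of $\boldsymbol{\Upsilon}$, respectively. Then, the RMS, VAR, SKE, and KUR values of $\boldsymbol{\Upsilon}_I$ can be given as

$$e^R_I = \sqrt{\frac{1}{L}\sum_{l=1}^{L}\Upsilon_I^2(l)}, \qquad e^V_I = \frac{1}{L}\sum_{l=1}^{L}\big(\Upsilon_I(l) - \bar{\Upsilon}_I\big)^2,$$

$$e^S_I = \frac{\frac{1}{L}\sum_{l=1}^{L}\big(\Upsilon_I(l) - \bar{\Upsilon}_I\big)^3}{(e^V_I)^{3/2}}, \qquad e^K_I = \frac{\frac{1}{L}\sum_{l=1}^{L}\big(\Upsilon_I(l) - \bar{\Upsilon}_I\big)^4}{(e^V_I)^2},$$

where $\bar{\Upsilon}_I = \frac{1}{L}\sum_{l=1}^{L}\Upsilon_I(l)$ is the mean value of $\boldsymbol{\Upsilon}_I$ and $L$ is the length of $\boldsymbol{\Upsilon}$. Let $\mathbf{e}_I = [e^R_I, e^V_I, e^S_I, e^K_I]^T$ denote the statistical features extracted from $\boldsymbol{\Upsilon}_I$. After performing the same operations on $\boldsymbol{\Upsilon}_Q$, we obtain the feature vector

$$\mathbf{e} = [\mathbf{e}_I^T, \mathbf{e}_Q^T]^T.$$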
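A minimal sketch of the DB-RVSK extractor, assuming population (1/L) moments, skewness normalized by $\sigma^3$, and non-excess kurtosis normalized by $\sigma^4$; the paper's exact normalization may differ. It returns the eight-dimensional vector $[\mathbf{e}_I^T, \mathbf{e}_Q^T]^T$. In the experiments, $\boldsymbol{\Upsilon}$ would come from an observed sequence of $\kappa$ OFDM symbols; the toy input below is hypothetical.

```python
import math


def rvsk(seq):
    """RMS, variance, skewness, kurtosis of a real sequence (1/L moments).

    Assumes the sequence is not constant (nonzero variance).
    """
    L = len(seq)
    mean = sum(seq) / L
    rms = math.sqrt(sum(v * v for v in seq) / L)
    var = sum((v - mean) ** 2 for v in seq) / L
    m3 = sum((v - mean) ** 3 for v in seq) / L
    m4 = sum((v - mean) ** 4 for v in seq) / L
    return [rms, var, m3 / var ** 1.5, m4 / var ** 2]


def db_rvsk(upsilon):
    """Double-branch features: [e_I; e_Q] from the real and imaginary parts."""
    e_I = rvsk([v.real for v in upsilon])
    e_Q = rvsk([v.imag for v in upsilon])
    return e_I + e_Q


feat = db_rvsk([1 + 2j, -1 + 0j, 1 - 2j, -1 + 0j])
# real part [1, -1, 1, -1] -> [1, 1, 0, 1]; imag part [2, 0, -2, 0] -> [sqrt(2), 2, 0, 2]
```

Keeping the real and imaginary branches separate preserves phase-related information in the SQ signal that magnitude-only statistics would discard.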

B. SVM Classifier
Generally, RFFI can be regarded as a multi-class classification problem. However, it can be divided into several two-class classification problems using strategies such as one-against-one [39], one-against-all [40], and binary tree architecture techniques [41]. Since the one-against-one technique achieves better classification performance [42], we adopt it to train the multi-class SVM classifier.
Let $\Lambda = [\Lambda_1, \ldots, \Lambda_m]$ be the set of devices to be classified, where $m$ is the number of classes. Then, we construct $m(m-1)/2$ SVM classifiers, each trained on the data from two classes.

TABLE I IMPERFECTIONS OF FIVE DEVICES USED IN SIMULATIONS

In our training scheme, each SVM uses the Gaussian radial basis function (RBF) as the kernel function, which maps low-dimensional, linearly inseparable features into a high-dimensional space where they become linearly separable. The hyperparameters of each SVM classifier are tuned using the training samples. Given an input vector $\mathbf{e}_i$ extracted from $\Lambda_i$, the $q$th SVM classifier makes a prediction $\Lambda_q$ (with $\Lambda_q \in \Lambda$) on the class of the input vector, yielding $m(m-1)/2$ prediction results in total. To make the final prediction, the "Max Wins" strategy is employed as a voting approach to decide the predicted device code $\Lambda_j$ ($j \in \{1, 2, \ldots, m\}$).
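The one-against-one training and "Max Wins" voting logic can be sketched as follows. To keep the example self-contained, the pairwise classifier is a nearest-centroid stand-in, not an RBF-kernel SVM; in practice each pairwise model would be an RBF SVM (LIBSVM, which the experiments use, applies one-against-one with voting internally). All names and the toy data are hypothetical.

```python
from itertools import combinations


def train_ovo(X, y, fit_pair):
    """Train one binary model per class pair: m(m-1)/2 models in total."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        Xp = [x for x, t in zip(X, y) if t in (a, b)]
        yp = [t for t in y if t in (a, b)]
        models[(a, b)] = fit_pair(Xp, yp, a, b)
    return classes, models


def predict_max_wins(x, classes, models):
    """Each pairwise model votes for one class; the class with most votes wins."""
    votes = {c: 0 for c in classes}
    for model in models.values():
        votes[model(x)] += 1
    return max(classes, key=lambda c: votes[c])  # ties go to the first class in order


def nearest_centroid_pair(Xp, yp, a, b):
    """Stand-in binary classifier (NOT an RBF SVM): assign to the nearer class centroid."""
    def centroid(c):
        pts = [x for x, t in zip(Xp, yp) if t == c]
        return [sum(col) / len(pts) for col in zip(*pts)]

    def dist2(p, q):
        return sum((u - v) ** 2 for u, v in zip(p, q))

    ca, cb = centroid(a), centroid(b)
    return lambda x: a if dist2(x, ca) <= dist2(x, cb) else b


# Hypothetical 2-D features for m = 3 devices
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.1, 0.2], [0.0, 5.0], [0.1, 5.2]]
y = [0, 0, 1, 1, 2, 2]
classes, models = train_ovo(X, y, nearest_centroid_pair)
pred = predict_max_wins([4.8, 0.3], classes, models)
```

Swapping `nearest_centroid_pair` for a function that fits an RBF SVM on the two-class subset leaves the voting logic unchanged.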

C. Evaluation Metrics
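1) The Mitigation of HTVDs: The performance of the SCSD method in mitigating the HTVDs induced by the multipath fading channel, residual CFO, and phase offset can be evaluated using the NMSE (in dB) between the SQ signal vector $\boldsymbol{\Upsilon}^r_d$ and its ideal counterpart $\boldsymbol{\Upsilon}^{\mathrm{ideal}}_d$:

$$\mathrm{NMSE} = 10\log_{10}\frac{\sum_{n}\big|\Upsilon^r_d(n) - \Upsilon^{\mathrm{ideal}}_d(n)\big|^2}{\sum_{n}\big|\Upsilon^{\mathrm{ideal}}_d(n)\big|^2}.$$

2) Device Classification: The correct classification accuracy curve and the confusion matrix are employed as indicators of device classification performance in this paper. In general, Monte Carlo trials are used to calculate the probability of correct classification $P_{cc}$, which is given as

$$P_{cc} = \sum_{i=1}^{m} P(\Lambda_i)\, P(\Lambda_j = \Lambda_i \mid \Lambda_i),$$

where $P(\Lambda_i)$ denotes the prior probability of device $\Lambda_i$, which is usually equal to $1/m$, and $P(\Lambda_j = \Lambda_i \mid \Lambda_i)$ is the probability that the predicted device $\Lambda_j$ equals $\Lambda_i$ given that the input vector is extracted from $\Lambda_i$.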
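Both metrics can be computed with a few lines of Python. This is a minimal sketch, assuming the NMSE is summed over all SQ samples and that $P_{cc}$ is estimated from a count confusion matrix gathered over Monte Carlo trials; the confusion-matrix values are hypothetical.

```python
import math


def nmse_db(est, ideal):
    """10*log10( sum |est - ideal|^2 / sum |ideal|^2 )."""
    num = sum(abs(a - b) ** 2 for a, b in zip(est, ideal))
    den = sum(abs(b) ** 2 for b in ideal)
    return 10 * math.log10(num / den)


def pcc_from_confusion(C, priors=None):
    """P_cc = sum_i P(Lambda_i) * P(correct | Lambda_i).

    C[i][j] counts trials from device i predicted as device j; priors default to 1/m.
    """
    m = len(C)
    priors = priors or [1 / m] * m
    return sum(p * C[i][i] / sum(C[i]) for i, p in enumerate(priors))


C = [[90, 10, 0],
     [5, 80, 15],
     [0, 20, 80]]
pcc = pcc_from_confusion(C)   # (0.9 + 0.8 + 0.8) / 3
```

With equal priors, $P_{cc}$ is simply the mean of the confusion-matrix diagonal after row normalization.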

V. EXPERIMENT SETUP AND RESULTS
In this section, we first introduce the experiment setup in terms of the device impairments and the channel conditions, and then generate the simulated datasets using these parameters. The HTVDs-suppression performance of the proposed SCSD method is evaluated on the simulated datasets. Meanwhile, we investigate the identification performance of the proposed RFFI scheme with different signal preprocessing methods (SCSD and SCMSD). Furthermore, we compare the proposed SCMSD-based method with state-of-the-art approaches on the simulated dataset. Finally, we use the open dataset KRI-16IQImbalances-DemodulatedData⁴ [27] to further verify the proposed SCMSD-based RFFI scheme.

A. Configuration Parameters of Experiment Setup
We configure five device models as transmitters with different IQ imbalances and PA nonlinearities. Besides, we set corresponding ranges of CFO and phase offset for each device. According to [43], the absolute gain imbalance varies from 0.02 to 1 dB and the phase imbalance ranges from 2 to 11.42 degrees; hence, we use a set of gain and phase imbalances within these ranges. The parameters of the Saleh models follow [30] and [31]. It should be noted that combining many signals with different phases and frequencies in OFDM waveforms can lead to a large peak-to-average power ratio (PAPR). Hence, the input back-off (IBO) technique⁵ is adopted to keep the signal away from the saturation region of the PA [44]. The IBO from the input saturation threshold is set to 12 dB for all transmitter models. The CFOs of the different devices are uniformly distributed within different ranges [45], while the phase offsets of all devices are uniformly distributed within $[-\pi, \pi]$. The detailed parameters of the device impairments used in our simulations are summarized in Table I.
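A minimal sketch of input back-off scaling, using IBO $= 10\log_{10}(P_{\mathrm{sat}}/P_{\mathrm{in}})$ with $P_{\mathrm{in}}$ the average input power. Choosing $P_{\mathrm{sat}} = 1/\beta_1$ is our assumption: it is the input power at which the Saleh AM-AM curve $A(r) = \alpha_1 r/(1+\beta_1 r^2)$ peaks. The function name and sample values are hypothetical.

```python
import math


def apply_ibo(x, p_sat, ibo_db):
    """Scale samples so that 10*log10(p_sat / P_in) equals ibo_db, with P_in = mean |x|^2."""
    p_in = sum(abs(v) ** 2 for v in x) / len(x)
    target = p_sat / 10 ** (ibo_db / 10)
    gain = math.sqrt(target / p_in)
    return [gain * v for v in x]


beta1 = 1.0                      # hypothetical Saleh AM-AM parameter
p_sat = 1 / beta1                # input power where A(r) peaks (r = 1/sqrt(beta1))
x = [1 + 1j, -2 + 0.5j, 0.3 - 1j, 1.5 + 2j]
xs = apply_ibo(x, p_sat, ibo_db=12.0)
p_in = sum(abs(v) ** 2 for v in xs) / len(xs)
achieved_ibo = 10 * math.log10(p_sat / p_in)
```

A 12 dB back-off leaves headroom for OFDM peaks, so most samples stay in the near-linear part of the PA curve while the strongest peaks still reveal device-specific compression.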
The carrier and sampling frequencies are 5 GHz and 80 MHz, respectively. Each full WiFi frame lasts 36 µs, of which the legacy preamble occupies 20 µs [27]. For simplicity, the 16 µs data field is occupied by one random 16QAM-OFDM symbol (12.8 µs) with a CP (3.2 µs). Note that both the CFO and the phase offset are set as random constants within the corresponding ranges for each WiFi frame. The Rayleigh fading channel is considered, and four types of channel conditions are given in Table II [26], [46].

⁴ Since the dataset was not collected in a multipath scenario, we first add multipath fading effects to it and then test the proposed RFFI method on it.

⁵ The IBO is defined as $10\log_{10}(P_{\mathrm{sat}}/P_{\mathrm{in}})$, where $P_{\mathrm{sat}}$ is the input saturation power and $P_{\mathrm{in}}$ is the average input power.
In each WiFi frame duration, the channel coefficients are also regarded as constants whose magnitudes follow the Rayleigh distribution. Five simulated datasets are generated according to Table III. Each dataset contains 12000 OFDM symbol samples extracted from the data field after CFO correction, with 2400 samples per device. After CP removal, each OFDM symbol has a length of 1024. Different signal-to-noise ratio (SNR, γ) levels are simulated to evaluate the effect of noise. Moreover, each observed sequence used for statistical feature extraction contains κ OFDM symbols, where κ = 6 in our experiments. To evaluate the performance of the proposed RFFI methods, we use LIBSVM to train the SVM classifiers.
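The frame timing translates into sample counts at $f_s = 80$ MHz; the short sketch below just does this arithmetic, which is consistent with the 1024-sample OFDM symbol length after CP removal.

```python
fs = 80e6                          # sampling frequency (Hz)
n_frame = round(36e-6 * fs)        # full WiFi frame
n_preamble = round(20e-6 * fs)     # legacy preamble
n_cp = round(3.2e-6 * fs)          # cyclic prefix
n_fft = round(12.8e-6 * fs)        # OFDM symbol after CP removal
subcarrier_spacing = fs / n_fft    # Hz

print(n_frame, n_preamble, n_cp, n_fft, subcarrier_spacing)
# 2880 1600 256 1024 78125.0
```

The 1280-sample data field is therefore exactly one 1024-sample symbol plus its 256-sample CP, and the resulting subcarrier spacing is 78.125 kHz.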

B. Evaluation of HTVDs-Suppression Performance
Generally, an RFFI scheme should be robust to the HTVDs induced by the multipath fading channel, CFO, and phase offset. As derived in Section III, the frequency-domain effects of the HTVDs can be roughly regarded as multiplicative interference, which can be suppressed in the SQ signals generated by the proposed SCSD method. Therefore, we evaluate the HTVDs-suppression performance of the proposed method in this subsection.
Fig. 4 shows the NMSE results of the SCSD method with different shifts ($d \in \{1, 6, 11, 16, 21\}$), where a five-path Rayleigh fading channel and the parameters of $\Lambda_1$ are used. As expected, the NMSE decreases smoothly as the SNR increases, which means that $\boldsymbol{\Upsilon}^r_d$ becomes increasingly close to $\boldsymbol{\Upsilon}^{\mathrm{ideal}}_d$. Note that the NMSE is greater than −7 dB when γ < 15 dB; this can be explained by the noise-induced interference $E_d(n)$ introduced by the spectral division, which degrades the NMSE performance at low and medium SNR levels. In addition, the NMSE increases with $d$.
Fig. 5 provides the simulation results of the proposed SCSD method ($d = 1$) under different channel conditions (given in Table II) with the parameters of $\Lambda_1$. Compared with the NMSE achieved under AWGN, the NMSE under the multipath fading channels degrades only slightly. Hence, we draw the following conclusions from these two figures:
• The SCSD method can effectively suppress the HTVDs caused by the multipath fading channel, residual CFO, and phase offset, especially in the high-SNR regime.
• The performance of SCSD degrades slightly as $d$ increases, which indicates that the approximation $\hbar_n \approx \hbar_{n_d}$ holds to some extent.
• Even under different channel conditions, the SCSD method remains effective and achieves similar NMSE performance in our simulations.

C. Investigation of the Proposed RFFI Scheme
Hereinafter, the training samples are extracted from $\mathrm{DAT}_1$, while the testing samples are extracted from the other four datasets.
First, we evaluate the identification performance of the proposed RFFI system based on the SCSD with respect to different shifts d ∈ {1, 6, 11, 16, 21}. The classification results, obtained from Monte Carlo trials, are given in Fig. 6. It can be seen from these figures that $P_{cc}$ decreases as d increases, especially in the high-SNR region. Moreover, the impact of d on $P_{cc}$ varies with the channel conditions. For $\mathrm{DAT}_2$ and $\mathrm{DAT}_4$, the gaps among the curves in each figure are minor. For $\mathrm{DAT}_3$ and $\mathrm{DAT}_5$, the gaps between different values of d are very large: at γ = 30 dB, the classification accuracy for d = 21 ($P_{cc} \approx 60\%$) is severely degraded compared with that for d = 1 ($P_{cc} > 80\%$). It is worth mentioning that the degradation is slight and negligible when d ≤ 6.
Then, we evaluate the identification performance of the proposed RFFI system based on the SCMSD with respect to different µ ∈ {1, 3, 5, 7, 9}. The results are provided in Fig. 7, where the curves for µ = 1 are identical to those for d = 1 in Fig. 6. With increasing µ, the classification performance first degrades (in the low- and medium-SNR regions) and then improves (in the high-SNR region). This can be explained as follows: the high NMSE in the low- and medium-SNR regions makes the statistical features unstable, whereas the low NMSE in the high-SNR region is beneficial to their stability. Furthermore, the accuracy difference between µ = 1 and µ = 9 is at least 5% in each subfigure at γ = 30 dB.
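The SCMSD construction is defined earlier in the paper and is not reproduced in this section, so the following is only one hypothetical way to exploit multiple circular shifts: compute the assumed SQ signal for every shift d = 1, ..., µ and concatenate the eight statistics of each. It is not claimed to be the paper's SCMSD; it merely shares the stated property that µ = 1 coincides with the single-shift case d = 1. The names `scsd`, `stats4`, and `multi_shift_features` are hypothetical.

```python
import numpy as np

def scsd(R, d):
    # Assumed SQ form: R[n] / R[(n + d) mod N]
    return R / np.roll(R, -d)

def stats4(x):
    # RMS, variance, skewness and non-excess kurtosis of a real 1-D array
    mu, sigma = x.mean(), x.std()
    return [np.sqrt(np.mean(x ** 2)), sigma ** 2,
            np.mean((x - mu) ** 3) / sigma ** 3, np.mean((x - mu) ** 4) / sigma ** 4]

def multi_shift_features(R_obs, mu):
    """Hypothetical multi-shift feature vector: SQ signals for d = 1..mu, 8 statistics each.

    R_obs: complex array of shape (kappa, N) holding the FFT outputs of one observed sequence.
    Returns a vector of length 8 * mu; mu = 1 reduces to the single-shift (d = 1) features.
    """
    feats = []
    for d in range(1, mu + 1):
        # Concatenate the SQ signals of all kappa symbols for this shift
        sq = np.concatenate([scsd(R, d) for R in R_obs])
        feats += stats4(sq.real) + stats4(sq.imag)
    return np.asarray(feats)
```

Under this hypothetical design the feature dimension grows linearly with µ, which is one way additional shifts could add information at high SNR while also adding noisier features at low SNR.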
Finally, we draw several conclusions from these figures:
• The proposed RFFI scheme is valid and HTVDs-robust in these simulations.
• The classification performance of the proposed RFFI scheme based on the SCSD method degrades slightly as d increases, and the degradation is negligible when d ≤ 6 in our experiments.
• The proposed RFFI scheme based on the SCMSD achieves clear performance improvements over that based on the SCSD method at high SNR.

D. Comparison With Existing RFFI Schemes
The performance of the proposed RFFI scheme based on the SCMSD is compared with three existing RFFI schemes that use expert features:
1) The features extracted based on the ratio of high-order [16], referred to as RJ; this scheme is also used as a comparative algorithm in [14].
2) The RMS and KUR features extracted from the frequency-domain error vector [17], referred to here as FDRK; the error vector is computed with respect to the 16QAM constellation.
3) The SKE and KUR features extracted from the decomposed signals based on empirical mode decomposition (EMD) [19], referred to as EMD-SK; the first two decomposed signals are used to extract the two non-Gaussian features, respectively.
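As an illustration of the FDRK baseline, the sketch below computes a frequency-domain error vector against the nearest 16QAM point and derives two features from it. The exact definitions in [17] are not given here, so the unit-average-power 16QAM scaling, the use of the complex error magnitude, and the names `nearest_16qam` and `fdrk_features` are assumptions.

```python
import numpy as np

# Unit-average-power 16QAM amplitude levels per axis (assumed scaling)
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)

def nearest_16qam(y):
    """Hard decision: for square QAM the nearest point is found independently on I and Q."""
    i = LEVELS[np.argmin(np.abs(y.real[:, None] - LEVELS[None, :]), axis=1)]
    q = LEVELS[np.argmin(np.abs(y.imag[:, None] - LEVELS[None, :]), axis=1)]
    return i + 1j * q

def fdrk_features(y):
    """RMS and kurtosis of the frequency-domain error vector (assumed definitions)."""
    y = y / np.sqrt(np.mean(np.abs(y) ** 2))  # power normalization
    e = y - nearest_16qam(y)                  # error vector w.r.t. the 16QAM constellation
    p2 = np.mean(np.abs(e) ** 2)
    return np.sqrt(p2), np.mean(np.abs(e) ** 4) / p2 ** 2
```

Because the error vector is measured against fixed constellation points, any uncompensated channel or phase distortion enters it directly, which is consistent with FDRK being the most sensitive of the three baselines to the residual HTVDs.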
It should be noted that power normalization of the received signal is required when evaluating these handcrafted-feature-based RFFI schemes.⁸ After the SVM classifiers are trained on $\mathrm{DAT}_1$, we test them on all the simulated datasets.⁹ Table IV shows the identification performance of the above schemes when γ = 30 dB and µ = 9. In the AWGN case, RJ and EMD-SK are clearly effective, although their identification accuracy is not high; this is because these two schemes are not perfectly suited to the classification task in our simulated scenarios. Meanwhile, all three existing schemes show significant performance degradation in the presence of the residual HTVDs (FDRK becomes almost ineffective), whereas the proposed scheme is robust and degrades only slightly.
⁸ To evaluate the classification performance of the RJ and EMD-SK schemes, we first extract the corresponding handcrafted features from the I and Q branches, respectively. Then, the joint I and Q feature vectors are fed to the multi-class SVM for training and testing.
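The preprocessing shared by the handcrafted baselines, power normalization followed by separate I- and Q-branch feature extraction and concatenation, can be sketched as below. Skewness and kurtosis are used as example features and the EMD step of EMD-SK is omitted; `normalize_power`, `skew_kurt`, and `joint_iq_features` are hypothetical names.

```python
import numpy as np

def normalize_power(x):
    """Scale a complex baseband signal to unit average power."""
    return x / np.sqrt(np.mean(np.abs(x) ** 2))

def skew_kurt(v):
    """Example handcrafted features: skewness and non-excess kurtosis of a real array."""
    mu, sigma = v.mean(), v.std()
    return [np.mean((v - mu) ** 3) / sigma ** 3, np.mean((v - mu) ** 4) / sigma ** 4]

def joint_iq_features(x, feature_fn=skew_kurt):
    """Extract the same features from the I and Q branches and concatenate them."""
    x = normalize_power(x)
    return np.asarray(feature_fn(x.real) + feature_fn(x.imag))
```

Normalizing first makes amplitude-dependent features such as RMS comparable across captures with different received power; scale-invariant features such as skewness and kurtosis are unaffected by it.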

E. Verification of the Proposed Scheme Using the Open Dataset
In this part, we use the open dataset KRI-16IQImbalances-DemodulatedData to further verify the proposed RFFI scheme based on the SCMSD. The authors of [27] use an X310 universal software radio peripheral (USRP) software-defined radio (SDR) as the transmitter and a B210 radio as the receiver for data collection. The receiver SDR samples the incoming WiFi signals at 5 MS/s at a center frequency of 2.45 GHz, and sixteen types of IQ imbalance are intentionally introduced. After equalization, the dataset is generated, consisting of the demodulated IQ symbols.
Following [26], the open dataset is post-processed to add the effects of the Rayleigh fading channel, using the channel condition with I = 5. Note that CFO correction is omitted in this experiment; the detailed steps of our experiment on the open dataset are given in Table V. Fig. 8 provides the confusion matrix of the classification results obtained with these steps. The overall classification accuracy for the sixteen devices is about 86.4%, which suggests that the proposed RFFI method is also effective and robust on this open dataset.
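The confusion matrix in Fig. 8 and the overall accuracy quoted above (equivalently $P_{cc}$) follow from the true and predicted device labels. A minimal sketch, with hypothetical helper names and rows indexing the true device:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows: true device, columns: predicted device."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Unbuffered add so repeated (true, pred) pairs are all counted
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def overall_accuracy(cm):
    """Fraction of correctly classified samples: trace over total."""
    return np.trace(cm) / cm.sum()

def per_device_accuracy(cm):
    """Row-normalized diagonal: accuracy for each true device."""
    return cm.diagonal() / cm.sum(axis=1)
```

For the open dataset, `num_classes` would be 16; averaging `overall_accuracy` over Monte Carlo trials gives the $P_{cc}$ values reported for the simulated datasets.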

VI. CONCLUSION
In this paper, we proposed an HTVDs-robust RFFI framework that achieves high classification accuracy. On the simulated datasets, we showed that the RFFI scheme based on the SCSD achieves an accuracy of 80%-86% at d = 1, and the RFFI scheme based on the SCMSD achieves an accuracy of 86%-91% at µ = 3. Compared with three existing algorithms based on expert features, the proposed method provides the highest and most robust identification accuracy when the classifiers are trained with additive white Gaussian noise but tested with the Rayleigh channel. Furthermore, we presented the identification results of the proposed SCMSD-based RFFI scheme on an open dataset, reaching 86.4% accuracy when µ = 3. To the best of our knowledge, this is the first attempt to solve the RFFI problem in the presence of HTVDs by leveraging the strong frequency correlation between neighboring subcarriers. In future work, we will attempt to improve the performance of the proposed method in the low- and medium-SNR regions.

Fig. 2. Simplified block diagrams of the transmitter and receiver.

Fig. 4. The NMSE curve of the proposed SCSD method with different circular shifts.

Fig. 5. The NMSE curve of the proposed SCSD method considering different channel conditions.

Fig. 6. Classification results with the SCSD method. Different datasets are used to test the identification performance.

Fig. 7. Classification results with the SCMSD method. Different datasets are used to test the identification performance.

Fig. 8. Classification results on the open dataset. The overall accuracy is about 86.4%.

TABLE II. THE DELAY TAPS AND NORMALIZED VARIANCE VALUES USED TO GENERATE RAYLEIGH FADING CHANNEL MODELS OF I PATHS

TABLE V. THE EXPERIMENTAL STEPS OF OUR PROPOSED METHOD USING THE OPEN DATASET