Dyadic Aggregated Autoregressive Model (DASAR) for Automatic Modulation Classification

In this article, we present a novel spectral estimation method, the dyadic aggregated autoregressive model (DASAR), that characterizes the spectrum dynamics of a modulated signal. DASAR enhances automatic modulation classification (AMC) in environments where new or unknown modulation techniques are introduced and only size-restricted data is accessible to train classification algorithms. A key component for obtaining efficient machine learning-based classification is the development of informative, descriptive features. DASAR constructs a multi-level spectral representation by subdividing a signal into successive dyadic segments, where each segment is modeled as an aggregation of single-frequency autoregressive processes. Thus, the model ensures a robust representation at the segment level, while the multi-level decomposition can capture time-varying spectra. As a feature extraction model, DASAR can provide useful learning features related to signals with complex spectra. The effectiveness of our model was tested on a dataset comprising 11 different modulation techniques and realistic transmission medium characteristics. Using only 200 128-point samples per modulation scheme (1% of the available signal samples) and a proper selection of a classification algorithm, DASAR reaches an accuracy of up to 70.96%, compared with a maximum accuracy of 43.62% for the state-of-the-art methods tested under the same conditions.


I. INTRODUCTION
In a reliable communication system, a transmitted message can only be understood if the transmitter and receiver share complete information about the communication parameters, including the type and characteristics of the modulation (i.e., the technique used to adapt the message to the conditions of the transmission medium). However, it is feasible to infer some properties of the communication channel's configuration if we intercept the modulated message passing through the channel. This inference is the central goal of automatic modulation classification (AMC) [1]. AMC was developed for military applications such as electronic [...] Likelihood-based AMC (LB-AMC) estimates the likelihood of each scheme in a finite set of modulation schemes given the observed signals. Multiple likelihood ratios are commonly applied as the main discrimination method [3]. LB-AMC methods can theoretically achieve optimal solutions (if prior information about the transmitter is available). However, due to the complexity of the likelihood functions, the calculations depend on Markov chain Monte Carlo techniques [4] that are intrinsically computationally expensive, which inhibits the application of LB-AMC in embedded systems.
In contrast, feature-based AMC (FB-AMC) techniques search for suboptimal alternatives and provide reasonable and flexible solutions that have lower computational requirements and do not depend on prior knowledge. FB-AMC was conventionally based on expert features derived from the nature of the modulation process: 5th, and 8th-order cyclic cumulants [4], the Fourier transform of the received signal [5], or the phase signal [6]. However, the detection accuracy of those methods is limited. Recent research has shown interest in advanced machine learning methods to increase that accuracy. Following this trend, several models based on deep learning were proposed: deep belief networks [7], extreme learning machines [8], stacked autoencoders [1], and convolutional neural networks [9]-[13].
The state-of-the-art methods mentioned above have reached high levels of accuracy in classifying the modulation type. In some cases, accuracies higher than 90% were reported [8], [12], [13]. However, some challenges remain, mainly the need for large training datasets that cover all desired and possible modulation techniques in order to train the neural networks properly.
In this article, we propose a preventive solution for those cases in which a new modulation scheme, or channel condition, is introduced and only a limited number of modulated signal samples is available. For this purpose, we propose to extract spectral features and use them with a classical machine learning classification model. The features are generated from a novel stochastic time-spectral decomposition of the signal magnitude and phase. Our decomposition model, the dyadic aggregated autoregressive model (DASAR), is based on a dyadic (binary) partition of the signals, where the spectrum of each individual segment is modeled by a set of autoregressive models. DASAR offers noise-robust spectral autoregressive representations while still being able to represent complex, dynamic, and time-varying spectra. Therefore, our model can capture the spectral dynamics of the signal while retaining only the most significant spectral components (those likely to contain the modulated and carrier signals). Furthermore, in order to ensure the computational feasibility of this method, we also incorporate a dynamic estimation method.
The rest of this article is organized as follows: Section II describes the generalized signal model, the aggregated autoregressive model, the dyadic decomposition technique, and the estimation algorithm; Section III describes the dataset and the experimental conditions; Section IV presents the results of the classification algorithms along with the respective discussions. Finally, we present our conclusions in Section V.

II. DASAR MODEL
In a common communication system, a carrier signal with amplitude $A$ and carrier frequency $f_c$, $c(t) = A\cos(2\pi f_c t)$, is modified according to a source signal $x(t)$ in order to produce a signal $s(t)$ adapted to the conditions of the communication channel (baseband signal). In essence, three major parameters of the carrier signal are manipulated: frequency, phase, and amplitude. We can summarize the diverse single-carrier modulation techniques through a single equation (Equation 1), where $\kappa_{AM}$, $\kappa_{FM}$, $\kappa_{PM}$, $\kappa_{PAM}$ are modulation scheme parameters and $g(t)$ determines the shape of the pulse (in digital communications), typically a cosine frequency shaping filter controlled by the roll-off factor $\alpha \in [0, 1]$. Now, let us consider the parameter vector $\kappa_\theta = (\kappa_{AM}, \kappa_{FM}, \kappa_{PM}, \kappa_{PAM})$. Combinations of the entries of this vector generate the majority of the existing single-carrier modulation techniques characterized in [2]. For instance, double-sideband amplitude modulation (AM-DSB) and amplitude shift keying (ASK) are the analog and digital cases when $\kappa_\theta = (1, 0, 0, 0)$. Frequency modulation (FM) and frequency shift keying (FSK) are particular cases of Equation 1 with $\kappa_\theta = (0, 1, 0, 0)$. Phase modulation (PM) and phase shift keying (PSK), the phase schemes in analog and digital communications, are special cases of Equation 1 when $\kappa_\theta = (0, 0, 1, 0)$. Furthermore, pulse amplitude modulation (PAM) is obtained when $\kappa_\theta = (0, 0, 0, 1)$, and quadrature amplitude modulation (QAM) when $\kappa_\theta = (1, 0, 1, 0)$. We should recognize that the modulated signal $s(t)$ will suffer alterations during its transmission over a wired or wireless medium, so the received signal will be different. Therefore, we can model the measured signal $r(t)$ at the receiver through a channel model in which $\alpha$ is the channel gain, $p(\cdot)$ is the pulse shape of the channel, $h(\cdot)$ is the channel response, $\varepsilon_T$ is the symbol timing error, and $f_0$ is the carrier frequency, which can be delayed by $\theta_0$ radians.
Moreover, the channel is often also affected by thermal noise interference $\omega(t)$ (modeled as additive Gaussian noise [14]), which is inherently related to the maximum signal-to-noise ratio (SNR) supported by the channel. We refer to [2] and [15] for further details about this model. If the pulse shape and channel response are known, the receiver is able to mitigate the distortion introduced by the channel, as happens in communications where transmitter and receiver share sufficient knowledge. In those cases, the model can be simplified as in [2]. For complex modulation schemes, the encoded and transmitted symbols (Equations 1 and 5) can be recovered by applying the discrete inverse Fourier transform [16]. This process implicitly assumes that time-located frequency variations within each modulated symbol can be ignored. Nevertheless, the Fourier transform is still suitable for demodulating the signals under some constraints. In this article, we propose a parametric method to model the signal spectrum; therefore, the complete spectrum can be represented by a reduced set of parameters while spurious components are rejected. The crucial element in DASAR is the spectral representation as an aggregated autoregressive (AAR) model. For efficient parameter estimation and a simple representation, in this study we use an aggregated second-order autoregressive model, AAR(2). However, this choice does not restrict the application of DASAR to higher-order AAR models. A second-order autoregressive model, AR(2), describes a real and finite signal $x(t)$ (sampled at a fixed interval $1/f_s$) as the stochastic process generated by a linear combination of the two previous points, $x(t-1)$ and $x(t-2)$, and an additive Gaussian error term $\varepsilon(t)$: $x(t) = \phi_1 x(t-1) + \phi_2 x(t-2) + \varepsilon(t)$, where $\phi_1$ and $\phi_2$ are known as the autoregressive coefficients.
In this case, without loss of generality, we assume that [...]. For stationary AR(2) processes, the power spectral density $S_x(\omega)$ is a function of the normalized frequency $\omega$, defined with respect to the sampling frequency $f_s$: $\omega = f / f_s$. We should note that $S_x(\omega)$ has a single maximum at the dominating frequency $\omega^*$ when $\phi_1^2 + 4\phi_2 < 0$. Given our interest in the spectral information contained in an AR process, we can reformulate the autoregressive coefficients based on $\omega^*$ and a parameter $\tau$. AR(2) processes can effectively model stochastic oscillations with a frequency around $\omega^*$ [17]. In these models, $\tau$ controls the randomness of the central oscillation frequency: small values can model signals with frequency components widely spread around $\omega^*$ (Figure 1). However, the received signal $r(t)$ can have more than one main resonating frequency (this is more noticeable in AM/ASK), and therefore a single AR(2) would be insufficient to represent the spectrum of the signal accurately. Nevertheless, we can take advantage of the concise stochastic frequency representation offered by AR(2) by establishing an aggregated model as the superposition of several AR(2) components with distinct parameters. Some theoretical properties of these types of models were introduced by Chong et al. [18] and generalized by Dacunha-Castelle et al. for AR($p$) processes [19].
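As an illustration of the single-peak behavior described above, the following sketch evaluates the standard AR(2) power spectral density, $\sigma^2 / |1 - \phi_1 e^{-i2\pi\omega} - \phi_2 e^{-i4\pi\omega}|^2$, and locates its maximum on a grid. The coefficient parameterization through a pole radius (`radius`) is an assumption made here for illustration and is not necessarily the $\tau$-based reformulation used in the paper; the function names are likewise hypothetical.

```python
import cmath
import math


def ar2_coefficients(omega_star, radius):
    """AR(2) coefficients with complex poles radius * exp(+-i*2*pi*omega_star).

    Assumption: this pole-radius parameterization stands in for the paper's
    (omega*, tau) reformulation, whose exact form is not reproduced here.
    """
    phi1 = 2.0 * radius * math.cos(2.0 * math.pi * omega_star)
    phi2 = -radius ** 2
    return phi1, phi2


def ar2_psd(omega, phi1, phi2, sigma2=1.0):
    """Standard AR(2) PSD at normalized frequency omega (cycles per sample)."""
    z = cmath.exp(-2j * math.pi * omega)
    return sigma2 / abs(1.0 - phi1 * z - phi2 * z * z) ** 2


def peak_frequency(phi1, phi2, n_grid=1000):
    """Grid search for the frequency in [0, 0.5] that maximizes the PSD."""
    grid = [0.5 * i / n_grid for i in range(n_grid + 1)]
    return max(grid, key=lambda w: ar2_psd(w, phi1, phi2))


phi1, phi2 = ar2_coefficients(omega_star=0.2, radius=0.95)
print("complex-root condition holds:", phi1 ** 2 + 4.0 * phi2 < 0)
print("estimated peak frequency:", peak_frequency(phi1, phi2))
```

A radius closer to 1 concentrates the spectrum around the central frequency, while smaller radii spread it, which mirrors the role the text assigns to the randomness parameter.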
An aggregated model AAR$(p, K)$ is defined as the sum of $K$ uncorrelated components, each characterized through an AR$(p)$ process: $x(t) = \sum_{k=1}^{K} z_k(t)$, where each $z_k(t)$ is a latent, unobserved AR$(p)$ time series driven by additive white noise $\varepsilon_k \sim \mathrm{WN}(0, \sigma_{\varepsilon_k}^2)$. In our target case, $z_k(t)$ is an AR(2) process associated with a central frequency $\omega_k^*$ and a frequency randomness $\tau_k$ (Equations 9, 10, and 7). Given that $z_i(t)$ and $z_j(t)$ are uncorrelated for $i \neq j$, the spectrum of an AAR$(2, K)$ is determined by the sum of the spectra of its components, $S_x(\omega) = \sum_{k=1}^{K} S_{z_k}(\omega)$. [...] It should be emphasized that in the latter, a priori knowledge of the target signal and reasonable fixed resonating frequencies $\omega_k^*$ are required. To the best of our knowledge, there is no general estimation method that does not rely on strong assumptions about the number of representative components of the signal.
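The aggregation can be sketched as follows: each latent component is simulated as an independent AR(2) series, the observed signal is their sum, and the aggregated spectrum is the sum of the component spectra. The helper names, the pole-radius parameterization, and the burn-in length are assumptions chosen for illustration only.

```python
import cmath
import math
import random


def pole_pair(omega_star, radius, sigma=1.0):
    """(phi1, phi2, sigma) for an AR(2) component; hypothetical helper."""
    phi1 = 2.0 * radius * math.cos(2.0 * math.pi * omega_star)
    return (phi1, -radius ** 2, sigma)


def simulate_ar2(phi1, phi2, sigma, n, rng, burn_in=200):
    """Simulate n points of an AR(2) process after discarding a burn-in."""
    prev2, prev1 = 0.0, 0.0
    out = []
    for t in range(n + burn_in):
        z = phi1 * prev1 + phi2 * prev2 + rng.gauss(0.0, sigma)
        prev2, prev1 = prev1, z
        if t >= burn_in:
            out.append(z)
    return out


def simulate_aar2(components, n, seed=0):
    """Sum of independently driven (hence uncorrelated) AR(2) components."""
    rng = random.Random(seed)
    series = [simulate_ar2(p1, p2, s, n, rng) for p1, p2, s in components]
    return [sum(values) for values in zip(*series)]


def aar2_psd(omega, components):
    """Aggregated spectrum: sum of the individual AR(2) spectra."""
    z = cmath.exp(-2j * math.pi * omega)
    return sum(s ** 2 / abs(1.0 - p1 * z - p2 * z * z) ** 2
               for p1, p2, s in components)


components = [pole_pair(0.1, 0.95), pole_pair(0.3, 0.95)]
x = simulate_aar2(components, n=128, seed=1)
print(len(x), aar2_psd(0.1, components) > aar2_psd(0.2, components))
```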
We introduce an estimation method that automatically infers a pertinent number of components (and their parameters) that can explain a set of observations. Given a signal that has a fixed, and unknown, number of dominating frequencies separated by at least $\Delta f$, we propose an estimation algorithm composed of four steps:
1) Estimate an interpolated fast Fourier transform (FFT) of the signal $x(t)$. The autocovariance function of a zero-mean stationary process $x[t]$ is given by $\gamma(h) = \mathrm{E}\left[x[t]\,x[t+h]\right]$, where $\mathrm{E}$ is the expectation operator, and the spectrum is given by its Fourier transform, $S_X(\omega) = \sum_{h=-\infty}^{\infty} \gamma(h)\, e^{-i 2\pi \omega h}$. For a realization (or observation) of this process, the squared magnitude of the FFT of the recorded signal can be used as an estimate of $S_X(\omega)$. Moreover, to refine the frequency grid of the discrete FFT, we can interpolate the estimates in the frequency domain with a Dirichlet kernel or, equivalently, zero-pad the signal in the time domain. The initially estimated spectrum, $S^{(0)}(\omega)$, is thus the squared magnitude of the FFT of the zero-padded signal.
2) Fit an AR(2) model that describes the dominating frequency $\omega^*$. Let $\omega^*$ be the frequency where $S^{(k)}(\omega)$ attains its maximum value (initially, $k = 0$). Then, restrict the spectrum to the neighborhood of $\omega^*$ with a Gaussian window $B(\omega; \omega^*)$ centered on it, whose standard deviation equals the frequency separation $\Delta f$. Given an $L_2$ distance function $L$ [...], define the estimate $\hat{\tau}$ as the nonlinear least-squares fit to $B$, i.e., the value of $\tau$ that minimizes $L$.
3) Calculate the residuals. Obtain the unexplained spectrum $S^{(k+1)}(\omega)$ for the next iteration $k + 1$ by removing the fitted component from $S^{(k)}(\omega)$.
4) Repeat the previous steps until convergence ($\sum_{\omega} S^{(k+1)}(\omega) < \epsilon$) or until the maximum number of desired components is reached.
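The iterative procedure above can be sketched in a few lines. This is a minimal illustration under several stated assumptions rather than the authors' implementation: a direct DFT replaces the FFT to keep the example dependency-free, the nonlinear least-squares step is approximated by a grid search over a pole radius (standing in for $\tau$) with a closed-form scale, the residual is clipped at zero, and convergence is measured relative to the initial spectral mass. All function and parameter names are hypothetical.

```python
import cmath
import math


def periodogram(x, n_fft):
    """Squared DFT magnitude of x zero-padded to n_fft, on k/n_fft in [0, 0.5]."""
    freqs = [k / n_fft for k in range(n_fft // 2 + 1)]
    spec = []
    for w in freqs:
        acc = sum(v * cmath.exp(-2j * math.pi * w * t) for t, v in enumerate(x))
        spec.append(abs(acc) ** 2)
    return freqs, spec


def ar2_shape(w, w_star, radius):
    """Unit-variance AR(2) spectrum with poles radius * exp(+-i*2*pi*w_star)."""
    phi1 = 2.0 * radius * math.cos(2.0 * math.pi * w_star)
    phi2 = -radius ** 2
    z = cmath.exp(-2j * math.pi * w)
    return 1.0 / abs(1.0 - phi1 * z - phi2 * z * z) ** 2


def fit_component(freqs, spec, delta_f, radii):
    """Step 2: fit one AR(2) spectrum around the current spectral maximum."""
    peak = max(range(len(spec)), key=spec.__getitem__)
    w_star = freqs[peak]
    # Gaussian window centered on the peak, standard deviation delta_f.
    weights = [math.exp(-0.5 * ((w - w_star) / delta_f) ** 2) for w in freqs]
    best = None
    for radius in radii:  # assumption: grid search instead of a NLS solver
        shape = [ar2_shape(w, w_star, radius) for w in freqs]
        num = sum(b * s * g for b, s, g in zip(weights, spec, shape))
        den = sum(b * g * g for b, g in zip(weights, shape))
        scale = num / den  # weighted least-squares scale for this radius
        loss = sum(b * (s - scale * g) ** 2
                   for b, s, g in zip(weights, spec, shape))
        if best is None or loss < best[0]:
            best = (loss, radius, scale, shape)
    _, radius, scale, shape = best
    return w_star, radius, scale, [scale * g for g in shape]


def estimate_aar2(x, max_components=4, delta_f=0.02, n_fft=512, eps=1e-3):
    """Steps 1-4: returns a list of (frequency, radius, scale) tuples."""
    freqs, residual = periodogram(x, n_fft)           # step 1
    total = sum(residual)
    radii = [0.80 + 0.005 * i for i in range(40)]     # 0.80 ... 0.995
    components = []
    for _ in range(max_components):
        w_star, radius, scale, fitted = fit_component(freqs, residual,
                                                      delta_f, radii)
        components.append((w_star, radius, scale))
        # Step 3: unexplained spectrum, clipped at zero (assumption).
        residual = [max(s - f, 0.0) for s, f in zip(residual, fitted)]
        if sum(residual) < eps * total:               # step 4
            break
    return components


signal = [math.cos(2 * math.pi * 0.125 * t)
          + math.cos(2 * math.pi * 0.3125 * t + 1.0) for t in range(128)]
for w_star, radius, scale in estimate_aar2(signal, max_components=2):
    print(f"frequency={w_star:.4f} radius={radius:.3f}")
```

For a 128-point signal, the quadratic cost of the direct DFT is negligible; for longer signals, an FFT routine would be the natural replacement.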

C. DYADIC TIME PARTITION
AAR$(2, K)$ offers a model that explains the most significant frequency components in the signal but ignores their location in time. This time-location information is highly relevant because of the dynamic nature of modulated signals. Therefore, inspired by the decomposition in the discrete wavelet transform, the dyadic aggregated autoregressive model (DASAR) addresses this time-location issue through a dyadic (or binary) decomposition of the time series, where a signal segment on the $n$-th level is split into two segments on the $(n+1)$-th level. For each segment, we fit an AAR$(2, K)$ model with a maximum of $K$ components. This decomposition structure provides a representation in which the frequency components are allowed to vary over time during the complete recorded transmission. Thus, stable components such as the carrier frequencies are likely to be captured in the first level (level 0 in Figure 2) due to their stability and magnitude compared with the modulated components and the noise. Subsequent levels can capture components that are significant only in specific intervals, or minor variations in the carrier frequencies, as are common in frequency modulation. Figure 2 shows an example of a decomposition of signals with 11 different modulation schemes into 4 levels. Note that, in contrast to the decomposition applied in the wavelet transform, DASAR also keeps the dominating high-frequency components at each level.
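The dyadic partition itself is simple to sketch. In the example below, level 0 is the complete signal and each subsequent level halves every segment of the previous one; `fit_segment` is a hypothetical callback standing in for the per-segment AAR fit, so the feature layout shown is an assumption rather than the paper's exact feature vector.

```python
def dyadic_segments(signal, n_levels):
    """Return one list of segments per level; level 0 is the whole signal."""
    levels = [[list(signal)]]
    for _ in range(1, n_levels):
        next_level = []
        for segment in levels[-1]:
            mid = len(segment) // 2
            next_level.extend([segment[:mid], segment[mid:]])
        levels.append(next_level)
    return levels


def segment_features(signal, n_levels, fit_segment):
    """Apply a per-segment model (hypothetical callback) to every segment."""
    return [fit_segment(segment)
            for level in dyadic_segments(signal, n_levels)
            for segment in level]


levels = dyadic_segments(range(128), n_levels=4)
print([(len(level), len(level[0])) for level in levels])
```

With a 128-point sample and 4 levels, this produces 1 + 2 + 4 + 8 = 15 segments, the shortest of which contain 16 points.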

III. DATASET
In order to ensure the repeatability of the results presented in this study, we used a publicly available dataset to assess the performance of our model: the RadioML2016.10a (RML) dataset. This is a synthetic database created with the GNU Radio software to simulate several realistic transmission conditions. It was originally introduced by O'Shea [14]. In RML, the communication channel has an average sampling rate of 200 kHz with a standard deviation of 0.01 Hz. The effect of the channel on the carrier (carrier frequency drift) is simulated through a zero-mean Gaussian random variable with a standard deviation of 0.01 Hz. Moreover, Doppler effects (common in wireless transmissions) are simulated with stochastic frequency shifts of up to 1 Hz. Finally, channel delays are simulated through a fractional delay model with three fractional tap delays: 0, 0.8, and 1.7 taps.
RML comprises 220000 in-phase/quadrature samples, each with a length of 128 points. These segments simulate several noisy conditions, with signal-to-noise ratios (SNRs) ranging from −20 dB to 18 dB. Note that the overall RML simulation parameters (SNR levels, maximum Doppler shift, and channel tap delays) can simulate mobile communications in indoor and outdoor environments with communication devices at varying speeds and under different interference levels. Nevertheless, no burst errors are included in the simulated communication channels. Moreover, RML comprises 11 modulation schemes, including three analog methods: double-sideband amplitude modulation (AM-DSB), single-sideband amplitude modulation (AM-SSB), and wideband frequency modulation (WBFM); as well as eight digital modulation techniques: 2-, 4-, and 8-level phase shift keying (BPSK, QPSK, 8PSK), pulse-amplitude modulation (PAM), 16- and 64-level quadrature amplitude modulation (QAM16, QAM64), and Gaussian and continuous-phase frequency-shift keying (GFSK, CPFSK). For a detailed explanation of the simulated modulation techniques and their respective parameters, we refer to [14] and [9]. [...] [9], long short-term memory neural networks (LSTM) [13], and convolutional long short-term deep neural networks (CLDNN) [12], respectively.
We extracted features using a 4-level DASAR model with a maximum of 4 components per segment (Figure 3) such that, at least, 8 data points are processed in the lowest levels given the size of the sample segments. The spectral information is extracted from the modulus and phase of the samples in RML, following the parameter estimation procedure described in Section II.
We adopt a training set composed of a randomly chosen 1% of the dataset (2200 samples) with an SNR higher than 0 dB, and use the rest of the dataset as the testing set (217800 samples). Due to computational constraints, this process was repeated only 30 times. We should emphasize that this 1%-99% proportion allowed us to replicate a condition of limited knowledge of the available types of modulation, using about 200 sample segments per modulation scheme.
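A minimal sketch of such a split is shown below, assuming each sample carries an SNR label. The function name, the fixed seed, and the toy SNR grid (20 levels from −20 dB to 18 dB in 2 dB steps, 110 samples each) are illustrative assumptions and do not reproduce the actual dataset.

```python
import random


def limited_split(snrs, train_fraction=0.01, min_snr=0, seed=0):
    """Draw the training indices among samples with SNR > min_snr.

    The training size is a fraction of the whole dataset; every sample
    that is not drawn (at any SNR) goes to the test set.
    """
    rng = random.Random(seed)
    eligible = [i for i, snr in enumerate(snrs) if snr > min_snr]
    n_train = round(train_fraction * len(snrs))
    train = set(rng.sample(eligible, n_train))
    test = [i for i in range(len(snrs)) if i not in train]
    return sorted(train), test


# Toy SNR labels (assumption): 20 levels x 110 samples = 2200 samples.
toy_snrs = [snr for snr in range(-20, 20, 2) for _ in range(110)]
train_idx, test_idx = limited_split(toy_snrs)
print(len(train_idx), len(test_idx))
```

Repeating the call with different seeds gives the repeated random splits described in the text.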
We also contrast two conditions, our processed features and raw data, using four machine learning classifiers: the CNN architecture introduced by [14] as a reference algorithm, a random forest of 500 trees (RF500), extreme gradient boosting with 500 trees (XGB500), and a decision tree with a maximum depth of 20 levels (DT20).

IV. RESULTS AND DISCUSSION
AMC studies usually report the classifier's accuracy as a function of the signal-to-noise ratio of the modulated signal [9], [12], [13], in order to observe the ability to detect the correct modulation method in environments with high, medium, and low interference. The reported accuracy is also frequently normalized with respect to the number of samples at each SNR level [9], [12], [13]. The accuracy-per-SNR curves of the eight simulated configurations (raw data and DASAR with four classification algorithms) are displayed in Figure 5. First, we can observe that the features extracted with DASAR, combined with RF500, showed the best performance, with a detection accuracy higher than 70% for SNRs higher than 10 dB.
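The per-SNR normalization mentioned above amounts to computing the accuracy separately within each SNR level. A minimal sketch follows, with hypothetical argument names and toy labels:

```python
from collections import defaultdict


def accuracy_per_snr(y_true, y_pred, snrs):
    """Fraction of correct predictions within each SNR level."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for truth, prediction, snr in zip(y_true, y_pred, snrs):
        counts[snr] += 1
        hits[snr] += int(truth == prediction)
    return {snr: hits[snr] / counts[snr] for snr in sorted(counts)}


# Toy example (labels are illustrative only).
truth = ["BPSK", "QPSK", "BPSK", "QPSK"]
predicted = ["BPSK", "BPSK", "BPSK", "QPSK"]
levels = [-10, -10, 10, 10]
print(accuracy_per_snr(truth, predicted, levels))
```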
We acknowledge that O'Shea's CNN model [9] provides low performance here (lower than DT20 in our simulations) due to the limited size of the training data. As also noted in [13], a deep learning architecture requires more than 50% of the dataset as input to reach an accuracy level of 90%.
We also present the normalized confusion matrix obtained by the RF500 classifier in Figure 5. This matrix allows us to identify the main factors that explain the performance of the method: • The modulation schemes that involve one or more distinct central frequencies were accurately recognized. Thus, AM-SSB has a prediction accuracy of 94.6%, while CPFSK and GFSK have accuracies higher than 96.6%. The latter techniques modulate discrete input signals by shifting the carrier frequency, and the RF500 classifier accurately identifies this change.
• However, gradual variations in the carrier frequency were not detected by the classification algorithm (WBFM was correctly classified in only 43.6% of the cases). We hypothesize that DASAR with a maximum of 4 components per level was not sufficient to represent this continuous change in frequency accurately; the variations were improperly interpreted as wide fluctuations over the carrier wave and were therefore labeled as amplitude modulation, AM-DSB (which produces the same precise behavior in frequency). Note that 22.7% of the AM-DSB segments are recognized as WBFM, and 44.5% of the WBFM samples are classified as AM-DSB.
• Phase transformations cannot be fully described by the PSD, and therefore, by a limited-level DASAR.
We should note that this type of modulation has higher bit error rates, given its high sensitivity to changes in the medium [2]. In our simulations, we observe that the RF500 classifier confuses 8-PSK with QPSK, and QAM16 with QAM64. The first pair (QPSK and 8-PSK) corresponds to the same modulation technique with 4 and 8 levels, respectively. In both, the digital input information is encoded by changing only the phase of the signal. The second pair (QAM16 and QAM64) are also two variants of quadrature amplitude modulation, which encodes digital signals through combinations of amplitudes and phases. Despite the limitations of an AAR(2) spectral representation, the dyadic structure of DASAR can represent the transitions caused by the variations of phase and allows RF500 to identify the modulation schemes. However, a minor difference in the type of phase transitions (e.g., from QAM16 to QAM64) limits the effectiveness of a 4-level DASAR for differentiating the correct scheme.
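The percentages discussed in the list above are entries of a row-normalized confusion matrix, where each row corresponds to a true modulation scheme and sums to one. A minimal sketch of that normalization is given below; the function name and the toy labels are illustrative assumptions.

```python
def normalized_confusion(y_true, y_pred, labels):
    """Row-normalized confusion matrix: rows are true labels, columns predictions."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        counts[index[truth]][index[prediction]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows without samples are left as zeros to avoid dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


labels = ["AM-DSB", "WBFM"]
truth = ["WBFM", "WBFM", "AM-DSB", "AM-DSB"]
predicted = ["AM-DSB", "WBFM", "AM-DSB", "AM-DSB"]
for label, row in zip(labels, normalized_confusion(truth, predicted, labels)):
    print(label, row)
```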

V. CONCLUSION
We presented a spectrum-estimation model to serve as a feature-based method for automatic modulation classification, with an emphasis on environments with limited access to samples of the modulation schemes. Our method, the dyadic aggregated autoregressive model (DASAR), starts by performing a dyadic decomposition, i.e., it divides a signal based on successive binary partitions. Each segment is modeled as a sum of second-order autoregressive processes, from which a robust estimator of the spectrum can be obtained. The inference of the parameters of the latter processes is performed using an efficient and flexible numerical method. As a feature extraction method for AMC, DASAR was tested on the RadioML2016.10a dataset, which contains a large sample set of modulation techniques under realistic transmission conditions. As the state of the art proposed for AMC, we explored a convolutional neural network (CNN) architecture with raw data as a classification method, as well as decision trees and high-performance tree ensembles (random forests and extreme gradient boosting trees) trained with DASAR features. We then used a train/test split technique, repeated 30 times, to evaluate the performance of those techniques. To assess the generalization power of our method, we simulated a restrictive condition where only a minimal portion of the complete set is accessible: 2200 samples (1% of the dataset) with an SNR higher than 0 dB were randomly selected for training the algorithms, and 217800 samples (99% of the dataset) were used for testing the performance.
Random forest classifiers with DASAR-based features showed accuracies higher than 70% for identifying 11 different modulation techniques when the signal-to-noise ratio was higher than 10 dB. In comparison, the state-of-the-art methods showed an accuracy lower than 35.5%. Surprisingly, DASAR showed an accuracy higher than the state of the art (59%) even when the noise and signal have the same power level (0 dB). We should emphasize that alternative state-of-the-art deep-learning methods have reported higher accuracies when larger datasets are available. Nevertheless, the observed results reveal promising further applications of DASAR in environments where novel modulation schemes are introduced or where extensive datasets cannot be provided or constructed.