Deep Learning-Based Automatic Modulation Classification Over MIMO Keyhole Channels

Automatic modulation classification (AMC) is a significant part of cognitive communication systems. In early research, likelihood-based (LB) and feature-based (FB) solutions were proposed for the AMC problem. With the development of data-driven approaches, a third method based on deep learning (DL) has recently gained prominence among AMC researchers. Convolutional neural network (CNN)-based classifiers have been shown to be very efficient for AMC in both single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems. However, most works on MIMO-AMC consider a full-rank channel. This work addresses the problem of AMC over rank-deficient channels, such as the keyhole channel, using a DL-based classifier. The classifier utilizes a CNN that does not employ pooling layers or dropout in the convolutional layers. To further improve the classification accuracy, decision cooperation and feature fusion are employed. In addition to the keyhole effect, this work investigates the effect of antenna correlation on DL-based AMC. A comparative study of the proposed method and an existing FB AMC method for the MIMO keyhole channel is also presented.


I. INTRODUCTION
Automatic modulation classification (AMC) is an integral part of cognitive communication receivers. Traditionally, AMC has been a key technology in military communication [1], with applications such as electronic warfare, intruder signal detection, and surveillance. With the introduction of 5G and beyond, communication systems need to handle numerous devices that process various information sources. Integrating such a large number of devices into communication networks demands increased cognition in the system. Hence, an intelligent communication system that automatically adjusts its parameters to the requirements of the situation is much needed. In such cognitive radio applications, a known pool of modulations can be used at the transmitter, and a particular modulation is dynamically selected according to the data rate and channel conditions. Informing the receiver of the selected modulation is typically achieved by adding overhead pilot symbols to each frame of the signal. However, as the number of digital devices in the network increases exponentially, these overhead symbols reduce the efficiency of the system. AMC-based intelligent receivers can be employed in such cases to alleviate the problem.
The associate editor coordinating the review of this manuscript and approving it for publication was Cunhua Pan.
Research in AMC began with likelihood-based (LB) methods. The LB approach is optimal in the sense that it maximizes the likelihood of the received data with respect to the modulation type. Important LB methods include the average likelihood ratio test (ALRT), the generalized likelihood ratio test (GLRT), and the hybrid likelihood ratio test (HLRT) [2]-[4]. The poor performance of LB methods under model mismatch, together with their high computational complexity, paved the way for feature-based (FB) methods, in which higher-order moments, higher-order cumulants (HOC), cyclic cumulants, etc. [5]-[8] are used as features to classify the unknown received signal. A third approach, which is gaining popularity among AMC researchers, is based on deep learning (DL) classifiers [9]. DL networks such as convolutional neural networks (CNNs) ...
The enormous success of DL algorithms in image processing, computer vision, speech processing, etc. prompted researchers to apply them to wireless communication. A data-driven DL-based approach for designing a communication system is proposed in [10], [11], and a new way of designing communication systems based on autoencoders is presented in [12]. These works demonstrated that an entire transmitter-receiver chain for a given channel can be learned by DL algorithms and is competitive with state-of-the-art systems.
A comprehensive survey of DL-based AMC research is presented in [9]. The initial works mostly addressed AMC for single-input single-output (SISO) systems [12]-[15], and their results established an improvement in classification accuracy over traditional methods. A few works [16]-[18] are based on the publicly available RadioML 2016.10B [13] and RadioML 2018.01A [14] datasets, while many others generated their own datasets according to the selected channel conditions [15], [19]. ... [22], which utilized the peaks in the cross-correlation function to differentiate the modulation formats. Zhu and Nandi [23] employed the expectation-maximization algorithm to estimate the MIMO channel matrix and then applied an LB method for modulation classification. Classification of MIMO signals over a Rayleigh fading channel using a CNN was proposed in [24]; the authors employed zero-forcing equalization based on the channel state information (CSI) and used the equalized signals to train the CNN. The performance of the classifier under imperfect CSI was also studied in the same work. A CNN-based cooperative AMC method for MIMO signals over a Rayleigh fading channel is proposed in [25]. In both of these works, the CNN used one-dimensional convolutional layers.
Most existing works assume MIMO channel models with a rich scattering environment. However, it has been shown that in some MIMO environments the channel capacity is low even though the signals are uncorrelated [26]-[28]. AMC for MIMO signals in such poor-scattering environments (rank-deficient channels) is addressed in [29] using an FB approach; however, that method can only discriminate lower-order PSK constellations under unknown CSI. To the best of our knowledge, DL-based AMC over rank-deficient channels has not been investigated in the literature. In this work, we evaluate the efficacy of DL-based classifiers over a correlated MIMO keyhole channel.
The contributions of this paper can be summarized as follows:
1) A CNN-based classifier for the AMC of MIMO signals over correlated keyhole channels is proposed. A decision-cooperation mechanism and a feature-fusion method are employed to improve the classification accuracy. The classification performance of the proposed CNN classifier is compared with that of the traditional HOC-based method.
2) Two CNN models for SISO AMC and one CNN model for MIMO AMC were selected from the DL-AMC literature and adapted to the current problem. These networks were trained and tested on MIMO signals over keyhole channels, and their performance was compared with that of the proposed classifier.
The rest of the paper is organized as follows. Section II discusses the MIMO system model. The HOC-based AMC for MIMO signals over keyhole channels, which we have selected as our baseline work, is outlined in Section III. A general overview of DL-based AMC is presented in Section IV. Section V explains the proposed CNN architecture, together with the cooperative decision-fusion mechanism and the feature-fusion method used to improve the classification accuracy. Simulation results are given in Section VI. Finally, Section VII concludes the paper.
VOLUME 10, 2022

II. SYSTEM MODEL
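We consider a time-invariant, block-fading MIMO channel with $N_T$ transmit and $N_R$ receive antennas. At any time instant $k$, the received signal vector $\mathbf{Y}(k)$ can be written as
$$\mathbf{Y}(k) = \mathbf{H}\,\mathbf{X}(k) + \mathbf{W}(k), \tag{1}$$
where $\mathbf{X}(k)$ is the $N_T \times 1$ transmitted symbol vector, $\mathbf{H}$ is the $N_R \times N_T$ complex MIMO channel matrix, and $\mathbf{W}(k)$ is the $N_R \times 1$ zero-mean circularly symmetric complex Gaussian noise vector.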

A. MIMO KEYHOLE CHANNEL
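A spatially correlated MIMO channel [26], [30] can be modelled by the matrix
$$\mathbf{H} = \boldsymbol{\theta}_R^{1/2}\,\mathbf{H}_R\,\boldsymbol{\theta}_T^{1/2}, \tag{2}$$
where $\boldsymbol{\theta}_R$ and $\boldsymbol{\theta}_T$ are the receive and transmit correlation matrices, respectively, and $\mathbf{H}_R$ is the uncorrelated MIMO channel. This spatially correlated channel follows the Kronecker model [31]. The elements of a correlation matrix $\boldsymbol{\theta}$ are given by
$$\theta_{i,j} = \begin{cases} \rho^{\,j-i}, & i \le j,\\ \left(\rho^{*}\right)^{i-j}, & i > j, \end{cases} \tag{3}$$
where $\rho$ is the complex correlation coefficient between neighboring antennas. The two correlation matrices $\boldsymbol{\theta}_R$ and $\boldsymbol{\theta}_T$ are generated according to (3) using the receive- and transmit-antenna correlation coefficients $\rho_r$ and $\rho_t$, respectively.
A keyhole channel can occur when the rich scattering environments around the transmitter and the receiver are separated by a large distance or are connected by a rank-1 propagation path. Rich scattering environments connected by diffraction over edges also create a keyhole channel [26], [28]. This is illustrated in Fig. 1, where the transmit and receive antennas are separated by a screen with a small keyhole punched through it. Because all the energy must pass through the keyhole, the channel matrix has rank one even when the fading is uncorrelated, so the model in (2) cannot represent the keyhole channel. The exact model of a correlated MIMO keyhole channel can be expressed as
$$\mathbf{H} = \boldsymbol{\theta}_R^{1/2}\,\boldsymbol{\beta}\,\boldsymbol{\alpha}^{H}\,\boldsymbol{\theta}_T^{1/2}, \tag{4}$$
where $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ are independent Rayleigh (zero-mean complex Gaussian) vectors of size $N_R \times 1$ and $N_T \times 1$, respectively.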
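The channel models (2)-(4) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes the exponential correlation model of (3), unit-variance CN(0, 1) entries for $\mathbf{H}_R$, $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$, and Hermitian matrix square roots; all function names are hypothetical. The final check shows the rank deficiency that distinguishes the keyhole channel from the Kronecker model.

```python
import numpy as np

def exp_correlation(n, rho):
    """Correlation matrix of (3): theta[i, j] = rho**(j - i) for i <= j and
    conj(rho)**(i - j) for i > j (Hermitian, unit diagonal)."""
    d = np.arange(n)[None, :] - np.arange(n)[:, None]        # d[i, j] = j - i
    rho = complex(rho)
    return np.where(d >= 0, rho ** np.abs(d), rho.conjugate() ** np.abs(d))

def psd_sqrt(theta):
    """Hermitian square root theta^(1/2) from an eigendecomposition."""
    w, v = np.linalg.eigh(theta)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def crandn(rng, *shape):
    """i.i.d. CN(0, 1) entries (Rayleigh-distributed magnitudes)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def kronecker_channel(rng, n_r, n_t, rho_r, rho_t):
    """Correlated channel (2): theta_R^(1/2) @ H_R @ theta_T^(1/2)."""
    h_r = crandn(rng, n_r, n_t)
    return psd_sqrt(exp_correlation(n_r, rho_r)) @ h_r @ psd_sqrt(exp_correlation(n_t, rho_t))

def keyhole_channel(rng, n_r, n_t, rho_r, rho_t):
    """Correlated keyhole channel (4): theta_R^(1/2) @ beta @ alpha^H @ theta_T^(1/2)."""
    beta, alpha = crandn(rng, n_r, 1), crandn(rng, n_t, 1)
    return (psd_sqrt(exp_correlation(n_r, rho_r)) @ beta @ alpha.conj().T
            @ psd_sqrt(exp_correlation(n_t, rho_t)))

def numerical_rank(h, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(h, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

rng = np.random.default_rng(0)
H_full = kronecker_channel(rng, 4, 2, 0.5, 0.5)   # N_R = 4, N_T = 2
H_key = keyhole_channel(rng, 4, 2, 0.5, 0.5)
print(numerical_rank(H_full), numerical_rank(H_key))   # 2 1
```

The correlated Kronecker channel stays full rank (2 for a 4 × 2 configuration), whereas the keyhole channel collapses to rank one regardless of the correlation coefficients, because $\boldsymbol{\beta}\boldsymbol{\alpha}^{H}$ is an outer product.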

III. THE HOC-BASED AMC OF MIMO SIGNALS OVER KEYHOLE CHANNEL (BASELINE WORK)
In [29], an HOC-based algorithm is proposed for classifying lower-order PSK signals over the MIMO keyhole channel. The modulation pool considered was {BPSK, QPSK, OQPSK}. The direct modulation recognition (DMR) algorithm proposed in [29] does not require CSI at the receiver. The authors employed ratios of 4th- and 6th-order cumulants to cancel out the channel effects. The discriminative features are ratios of HOCs, given in (5) and (6), where $\hat{y}(k) = y(k) - y(k-1)$ is the backward difference of the received signal and $\hat{C}_{m,n}$ is the estimated cumulant of order $(m,n)$ of the signal. For an in-depth treatment of these features, the reader is referred to [29].
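The idea of cancelling the channel with a cumulant ratio can be illustrated for the simplest case of a single flat-fading gain $h$. The sketch below is an assumption-laden illustration, not the exact features (5) and (6) of [29] (which also use the backward difference $\hat{y}(k)$): it estimates $C_{42}$ and $C_{63}$ of a zero-mean complex signal from sample moments $M_{pq} = E[y^{p-q}(y^*)^q]$ and forms $|C_{63}|^2/|C_{42}|^3$. Since $C_{42}$ scales as $|h|^4$ and $C_{63}$ as $|h|^6$, the gain cancels. Function names are hypothetical.

```python
import numpy as np

def moment(y, p, q):
    """Sample mixed moment M_pq = E[y^(p-q) * conj(y)^q]."""
    return np.mean(y ** (p - q) * np.conj(y) ** q)

def c42(y):
    """Fourth-order cumulant C_42 of a zero-mean complex signal."""
    return moment(y, 4, 2) - abs(moment(y, 2, 0)) ** 2 - 2 * moment(y, 2, 1) ** 2

def c63(y):
    """Sixth-order cumulant C_63 of a zero-mean complex signal, from the
    moment-to-cumulant expansion over the 4+2, 3+3 and 2+2+2 partitions."""
    def m(p, q):
        return moment(y, p, q)
    return (m(6, 3)
            - 9 * m(2, 1) * m(4, 2) - 3 * m(2, 0) * m(4, 3) - 3 * m(2, 2) * m(4, 1)
            - m(3, 0) * m(3, 3) - 9 * m(3, 1) * m(3, 2)
            + 18 * m(2, 0) * m(2, 1) * m(2, 2) + 12 * m(2, 1) ** 3)

def gain_invariant_ratio(y):
    """|C63|^2 / |C42|^3: numerator and denominator both scale as |h|^12."""
    return abs(c63(y)) ** 2 / abs(c42(y)) ** 3

# Balanced symbol sequences so that the sample moments equal their ideal values.
bpsk = np.tile([1.0, -1.0], 500).astype(complex)
qpsk = np.tile(np.exp(1j * np.pi / 2 * np.arange(4)), 250)
h = 0.3 + 0.8j   # arbitrary flat-fading gain
for name, x in [("BPSK", bpsk), ("QPSK", qpsk)]:
    print(name, gain_invariant_ratio(x), gain_invariant_ratio(h * x))
# both ratios are about 32 for BPSK and about 16 for QPSK
```

In a MIMO keyhole channel each receive antenna observes a weighted sum of the $N_T$ transmitted streams rather than a single scaled stream, which is why [29] needs more elaborate features than this single-gain example.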

IV. DL-BASED AMC CLASSIFIERS
A schematic representation of DL-based AMC is depicted in Fig. 2. Existing works on DL-based AMC employ networks such as the CNN, the recurrent neural network (RNN), ResNet, and the convolutional long short-term memory deep neural network (CLDNN) as the core of the classifier. Different data representations have been proposed for training and testing these networks; the three most important are the in-phase and quadrature (IQ), the magnitude and phase (polar), and the constellation image representations. At the receiver, the complex baseband signal is converted into one of these representations and stored with the corresponding labels in the dataset. In supervised DL-based AMC, these labels include both the modulation type and the SNR associated with the signal.
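The IQ and polar representations described above amount to stacking two real-valued rows derived from the complex baseband samples. A minimal sketch follows; the function names and the example labels are hypothetical.

```python
import numpy as np

def to_iq(y):
    """IQ representation: complex (n,) -> real (2, n) with rows [I; Q]."""
    y = np.asarray(y)
    return np.stack([y.real, y.imag])

def to_polar(y):
    """Polar representation: complex (n,) -> real (2, n) with rows [|y|; angle(y)]."""
    y = np.asarray(y)
    return np.stack([np.abs(y), np.angle(y)])

y = np.array([1 + 1j, -2j, -0.5 + 0j])
# A labelled dataset entry: (representation, modulation label, SNR label in dB).
sample = (to_iq(y), "QPSK", 10)   # hypothetical labels for illustration
print(to_iq(y))
print(to_polar(y))
```

Both representations keep all the information in the complex samples; they differ only in which quantities the network sees directly (Cartesian components or amplitude and phase).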

A. IMPORTANT CNN ARCHITECTURES FOR DL-AMC
In this subsection, some of the CNN architectures used in DL-based AMC are discussed. The first two (Maxpooling CNN and Dropout CNN) have been widely studied by the DL-AMC community in the SISO scenario, and the third is the pioneering work on DL-based AMC for MIMO signals. Their performance under the keyhole scenario is compared in Section VI. The architectures are outlined below.

1) MAXPOOLING CNN
The first CNN architecture chosen for comparison is from the work of O'Shea et al. [12], which uses two convolutional layers followed by a max-pooling layer and four dense layers. The rectified linear unit (ReLU) activation is used in each of the first five layers, and a softmax activation is used in the output dense layer.

2) DROPOUT CNN (DrCNN)
The second architecture is taken from [15], which uses a dropout layer instead of max pooling after each convolutional layer. This network uses the parametric rectified linear unit (PReLU) as the activation function in all layers except the output layer, where a softmax activation is applied. PReLU generalizes ReLU by applying a learnable slope to negative inputs.
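The difference between ReLU and PReLU is easiest to see element-wise. In the sketch below the slope `a` is passed in as a fixed number for illustration; in a trained network it is a learned parameter. Function names are illustrative.

```python
import numpy as np

def relu(x):
    """ReLU: max(x, 0)."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)

def prelu(x, a):
    """PReLU: x for x >= 0 and a * x for x < 0; a = 0 recovers ReLU."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, a * x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), prelu(x, 0.25))   # ReLU gives 0, 0, 3; PReLU with a = 0.25 gives -0.5, 0, 3
```

Unlike ReLU, PReLU passes a scaled version of negative inputs, so units with negative pre-activations still receive gradient.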

3) 1DCNN
A one-dimensional CNN architecture was employed in [24]. Batch normalization (BN) and dropout are applied after each convolutional and dense layer for regularization. The activation function is softmax for the dense output layer and ReLU for all other layers. We refer to this CNN architecture as 1DCNN in the subsequent sections.

V. PROPOSED DL-AMC FOR MIMO SIGNALS OVER KEYHOLE CHANNEL
A CNN is utilized as the DL network in the proposed classifier. Its classification accuracy is further improved by adding a cooperative mechanism at the output of the CNN. Details of the CNN and the cooperative mechanism are explained in the following subsections.

A. DENSE LAYER DROPOUT CNN (DDrCNN) ARCHITECTURE
Generally, the CNNs employed for AMC include two or three convolutional layers and a few dense layers [12], [15], [25]. Since AMC classifiers are often deployed in environments with limited computational capabilities, the DL classifier should be small. In this work, a modified version of the CNN architecture from the authors' previous work [32] is used as the DL network.
We observed in [32] that removing the pooling layers and the dropout from the convolutional layers enables the CNN to achieve better classification performance for AMC. In most DL-AMC methods, the network attains significantly better results than FB methods with fewer received symbols. Because the received frames are short, the CNN remains of manageable size even without pooling layers. Removing pooling and dropout allows the CNN to preserve the richness of the extracted features, which helps achieve better classification results. Batch normalization is employed to standardize the inputs to each layer.
The proposed DDrCNN architecture for AMC over the MIMO keyhole channel is presented in Fig. 3. Three convolutional layers form the feature-extraction unit. The first layer contains 128 filters of kernel size (2,8) with ReLU as the activation function. The second and third convolutional layers consist of 64 and 32 filters, respectively, with kernel size (1,8). These are followed by two hidden dense layers with 256 and 128 neurons and ReLU activations. BN and dropout are added after each of the hidden dense layers. The output layer is a softmax layer.
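To make the size argument concrete, the layer shapes and weight counts of a DDrCNN-like stack can be tallied by hand. The sketch below is a hypothetical walk-through, not the authors' model definition: the text does not state the padding or stride, so it assumes stride 1, "valid" padding, biases in every layer, and it ignores BN parameters.

```python
def ddrcnn_summary(in_h, in_w, n_classes):
    """Layer-by-layer output shapes and weight counts of a DDrCNN-like stack.
    Assumptions (not stated in the text): stride 1, 'valid' padding, biases in
    every layer; BN parameters are not counted."""
    rows, (h, w, c) = [], (in_h, in_w, 1)
    for filters, (kh, kw) in [(128, (2, 8)), (64, (1, 8)), (32, (1, 8))]:
        params = kh * kw * c * filters + filters      # kernel weights + biases
        h, w, c = h - kh + 1, w - kw + 1, filters     # 'valid' convolution
        rows.append((f"conv {kh}x{kw}, {filters} filters", (h, w, c), params))
    n = h * w * c                                     # flattened feature length
    for units in (256, 128, n_classes):               # two hidden layers + softmax
        rows.append((f"dense {units}", (units,), n * units + units))
        n = units
    return rows

for label, shape, params in ddrcnn_summary(2, 128, 5):   # cooperative 2 x 128 input
    print(f"{label:<26}{str(shape):<16}{params}")
```

Under these assumptions the three convolutional layers hold 2,176 + 65,600 + 16,416 weights, while the first dense layer holds 876,800 for the 2 × 128 cooperative input and 6,136,064 for the 8 × 128 non-cooperative input. Without pooling, the cost is therefore concentrated in the first dense layer, and it stays bounded only because the frames are short.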
The network is trained using the corresponding dataset for the non-cooperative and cooperative cases. The loss function is the categorical cross-entropy between the one-hot encoded ground-truth class vector and its prediction,
$$\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i} y_{n,i}\,\log \hat{y}_{n,i}, \tag{7}$$
where $y_{n,i}$ and $\hat{y}_{n,i}$ are the $i$th elements of the one-hot ground truth and of the predicted class probabilities for the $n$th sample, and $N$ is the number of samples in one training batch. The Adam (adaptive moment estimation) optimizer is used to minimize the loss function, with a learning rate of 0.001 for training DDrCNN.
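Equation (7) can be checked numerically on a toy batch. The sketch below computes the batch-averaged categorical cross-entropy in NumPy; the clipping constant `eps` is an added safeguard against log(0) and is not part of (7).

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Batch-averaged categorical cross-entropy of (7).
    y_true: (N, C) one-hot labels; y_pred: (N, C) softmax probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
print(categorical_cross_entropy(y_true, y_pred))   # -(ln 0.7 + ln 0.8) / 2, about 0.2899
```

Because the labels are one-hot, only the predicted probability of the true class contributes for each sample.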

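Algorithm 1 Cooperative AMC for a Pool of Modulations, $\mathcal{M}$
Input: Outputs $\mathbf{Y}_j$, $j = 1, \ldots, N_R$, of each of the $N_R$ receive antennas
Output: Selected modulation type $m_{\hat{n}} \in \mathcal{M}$
I. Input each $\mathbf{Y}_j$ (in IQ format) to the trained CNN.
II. Obtain the softmax probability of each class for the signal at each antenna, forming the probability vector at the $j$th antenna, $\mathbf{P}_j = [P(m_1 \mid \mathbf{Y}_j), P(m_2 \mid \mathbf{Y}_j), \ldots, P(m_{|\mathcal{M}|} \mid \mathbf{Y}_j)]$.
III. Find the average probability vector at the receiver, $\bar{\mathbf{P}} = \frac{1}{N_R}\sum_{j=1}^{N_R}\mathbf{P}_j$.
IV. Select the modulation with the largest average probability, $\hat{n} = \arg\max_{n \in [1, |\mathcal{M}|]} \bar{P}(n)$.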

B. COOPERATIVE RULES EMPLOYED FOR THE PERFORMANCE IMPROVEMENT
The literature shows that when there are multiple antennas at the receiver, the predictions at each of them can be combined cooperatively to improve the overall classification performance of the AMC system [25], [30], [33]. The schematic representation of cooperative AMC using decision fusion is shown in Fig. 4. The CNN is trained with the output from each of the receive antennas together with the corresponding labels. During evaluation, the decisions from the receive antennas are combined to predict the modulation format. In [25], the authors showed that direct averaging-based cooperation outperforms direct voting in terms of classification accuracy. Hence, in this work, we employ averaging-based cooperation to combine the results from the antennas. During testing, the output from each of the receive antennas is given to the trained CNN, and the softmax output probabilities are recorded. These probability vectors are then averaged over all available receive antennas, and the modulation with the maximum averaged probability is chosen. The decision-fusion process is summarized in Algorithm 1.
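The averaging rule of Algorithm 1 can be sketched in a few lines, together with a majority-vote rule for contrast. The voting function is our reading of "direct voting" in [25], and the probability values are made up for illustration. The example shows how averaging uses the confidence of each antenna, whereas voting uses only its hard decision.

```python
import numpy as np

def fuse_by_averaging(probs):
    """Algorithm 1: average the (N_R, |M|) per-antenna softmax vectors and
    return the index of the class with the largest average probability."""
    return int(np.argmax(np.mean(probs, axis=0)))

def fuse_by_voting(probs):
    """Majority vote over per-antenna hard decisions (ties -> lowest index)."""
    probs = np.asarray(probs)
    votes = np.bincount(np.argmax(probs, axis=1), minlength=probs.shape[1])
    return int(np.argmax(votes))

# Four receive antennas, three candidate modulations: three antennas weakly
# favour class 0, one antenna strongly favours class 1 (illustrative values).
probs = np.array([[0.40, 0.35, 0.25],
                  [0.40, 0.35, 0.25],
                  [0.40, 0.35, 0.25],
                  [0.02, 0.96, 0.02]])
print(fuse_by_averaging(probs), fuse_by_voting(probs))   # 1 0
```

The averaged vector is [0.305, 0.5025, 0.1925], so averaging selects class 1, while three of the four hard decisions point to class 0.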

C. AMC USING FEATURE FUSION
As an alternative to decision cooperation at the output of the CNN, a feature-fusion method is proposed in this subsection. The features extracted at each receive antenna by the trained DDrCNN of the previous subsection are fused to form a new input for a neural network classifier. At low SNRs, this classifier is found to perform better than the cooperative decision-fusion mechanism of the previous subsection. A schematic diagram of the process is shown in Fig. 5.
The artificial neural network (ANN) classifier consists of two fully connected hidden layers followed by an output layer. Since the final dense layer of DDrCNN has 128 units, the input to the ANN classifier has shape $N_R \times 128$. The two hidden layers have 128 and 64 neurons, respectively, and BN and dropout are added after each hidden layer. A softmax output layer selects the output class. We use categorical cross-entropy as the loss function and Adam as the optimizer, with a learning rate of 0.01.
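The data flow of feature fusion can be sketched as an inference pass with random weights. This is a hypothetical illustration, not the trained classifier: it assumes that the $N_R \times 128$ features are flattened into one vector, that the hidden layers use ReLU (the activation is not stated in the text), and it omits BN; dropout is inactive at inference anyway. Random features stand in for DDrCNN outputs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_features(feats):
    """(batch, N_R, 128) per-antenna DDrCNN features -> (batch, N_R * 128)."""
    feats = np.asarray(feats)
    return feats.reshape(feats.shape[0], -1)

def init_fusion_ann(rng, n_r, n_classes, sizes=(128, 64)):
    """Random weights for a hypothetical fusion ANN: N_R*128 -> 128 -> 64 -> classes."""
    dims = [n_r * 128, *sizes, n_classes]
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def fusion_ann_forward(fused, layers):
    """Inference pass: ReLU hidden layers (assumed), softmax output; BN omitted."""
    h = fused
    for w, b in layers[:-1]:
        h = relu(h @ w + b)
    w, b = layers[-1]
    return softmax(h @ w + b)

rng = np.random.default_rng(0)
feats = rng.standard_normal((10, 4, 128))   # stand-in features, batch of 10, N_R = 4
probs = fusion_ann_forward(fuse_features(feats), init_fusion_ann(rng, 4, 5))
print(probs.shape, probs.argmax(axis=1))
```

Unlike Algorithm 1, which averages per-antenna class probabilities, this classifier sees all antennas' features jointly and can learn how to weight them.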

VI. RESULTS AND DISCUSSIONS
In this work, along with the proposed CNN-based classifier, the three other CNN-based DL classifiers described in Section IV were also tested for their AMC performance on MIMO signals over a keyhole channel. The data generation process and implementation details are presented first, followed by the classification results.

A. THE DATA GENERATION AND IMPLEMENTATION DETAILS
The datasets used for training and testing the classifiers are generated using MATLAB. For this work, we created two datasets according to the model described in Section II. The first dataset consists of three lower-order modulation types, namely BPSK, QPSK, and OQPSK. The second is a larger dataset containing five modulation schemes: BPSK, QPSK, OQPSK, 8-PSK, and 16-QAM. The data generation process is shown in Fig. 6. First, a random sequence of length $N$ is generated and modulated to produce the complex modulated vector $\mathbf{X}$ of the same length. The vector $\mathbf{X}$ is normalized to unit power and reshaped into an $N_T \times N/N_T$ matrix, where $N_T$ is the number of transmit antennas of the MIMO system. The signal is then passed through the Rayleigh-fading MIMO keyhole channel and received at each of the receive antennas. The complex baseband signal received at antenna $j$ can be expressed as $\mathbf{y}_j = [y_1(j), y_2(j), \ldots, y_{N/N_T}(j)]^T$, where $j \in [1, N_R]$. For cooperative AMC, the real and imaginary parts of $\mathbf{y}_j$ are extracted and stored as one sample (of size $2 \times N/N_T$) in the dataset. To test non-cooperative AMC, the real and imaginary parts of the signals at all the antennas are combined to form one sample (of size $2N_R \times N/N_T$).
Both the CNN and the ANN were implemented using Keras and trained on Google Colab Pro using a Tesla P100 GPU. To compare the classification performance of the different CNN architectures, we employed a dataset containing 10000 samples per SNR per modulation type for training, and 2000 samples per SNR per modulation type each for validation and testing. The percentage of correct classification (PCC) is used as the metric to evaluate the classifiers' effectiveness. The networks were trained for up to 200 epochs with an early-stopping callback of patience 50, and the batch size was set to 500.
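The data generation steps above can be sketched in NumPy for one received block. This is a minimal illustration under stated assumptions, not the authors' MATLAB pipeline: it generates QPSK only, uses an uncorrelated keyhole channel, fills the $N_T \times N/N_T$ matrix row by row, defines the SNR as the inverse noise variance per receive antenna, and orders the non-cooperative rows as I1, Q1, I2, Q2, and so on. Function names are hypothetical.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def generate_frame(rng, n_sym=256, n_t=2, n_r=4, snr_db=10.0):
    """One received block Y of shape (N_R, N/N_T) over an uncorrelated keyhole channel."""
    # Random QPSK symbols (one modulation only, for brevity), unit average power.
    x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n_sym)))
    x /= np.sqrt(np.mean(np.abs(x) ** 2))
    # Split the stream into N_T parallel streams: (N_T, N/N_T).
    X = x.reshape(n_t, n_sym // n_t)
    # Keyhole channel H = beta alpha^H, constant over the block (block fading).
    H = crandn(rng, n_r, 1) @ crandn(rng, n_t, 1).conj().T
    # AWGN; SNR taken as 1 / noise variance per receive antenna (an assumption).
    W = np.sqrt(10 ** (-snr_db / 10)) * crandn(rng, n_r, n_sym // n_t)
    return H @ X + W

def cooperative_samples(Y):
    """One (2, N/N_T) IQ sample per receive antenna: (N_R, 2, N/N_T)."""
    return np.stack([Y.real, Y.imag], axis=1)

def non_cooperative_sample(Y):
    """All antennas in one (2 N_R, N/N_T) frame, rows ordered I1, Q1, I2, Q2, ..."""
    return cooperative_samples(Y).reshape(-1, Y.shape[1])

Y = generate_frame(np.random.default_rng(1))
print(cooperative_samples(Y).shape, non_cooperative_sample(Y).shape)   # (4, 2, 128) (8, 128)
```

With $N = 256$ and a 2 × 4 configuration, each cooperative sample is 2 × 128 and each non-cooperative sample is 8 × 128, matching the frame sizes used in the experiments.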
After training, the test dataset is fed to each network, and the PCC is evaluated at each SNR. The experiments performed are reported in the following subsections.
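Per-SNR PCC evaluation reduces to grouping the test predictions by their SNR label. A minimal sketch follows; the function name and the toy labels are illustrative.

```python
import numpy as np

def pcc_per_snr(y_true, y_pred, snr):
    """Percentage of correct classification (PCC) at each SNR in the test set."""
    y_true, y_pred, snr = (np.asarray(a) for a in (y_true, y_pred, snr))
    return {float(s): 100.0 * float(np.mean(y_pred[snr == s] == y_true[snr == s]))
            for s in np.unique(snr)}

print(pcc_per_snr([0, 1, 2, 0, 1, 2], [0, 1, 1, 0, 1, 2], [0, 0, 0, 10, 10, 10]))
# about 66.67 % at 0 dB and 100 % at 10 dB
```

Storing the SNR as a label alongside the modulation type is what makes this per-SNR breakdown possible.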

B. CLASSIFICATION PERFORMANCE OF DDrCNN AND COMPETING NETWORKS ON THE KEYHOLE DATASET
The DL-AMC methods are evaluated for both the non-cooperative and cooperative cases, and the results are discussed in this subsection. In the cooperative case, the classifiers are also tested over a correlated keyhole channel.

1) NON-COOPERATIVE AMC UNDER UNCORRELATED KEYHOLE CHANNEL
In the non-cooperative scenario, the signals from all receive antennas are fed to the CNN as a single frame of data (IQ format). In this case, each sample in the dataset is of size $2N_R \times N/N_T$, where $N$ is the number of modulated symbols. We chose a 2 × 4 MIMO configuration ($N_T = 2$, $N_R = 4$) and $N = 256$ symbols. Hence, for the non-cooperative case, the input to the CNN is an 8 × 128 data frame.
The classifiers were tested on two different modulation pools. For the 3-modulation pool, the classification results are plotted in Fig. 7; the classification accuracy reaches a maximum of around 97% at 5 dB SNR. For the 5-modulation pool, the performance deteriorates considerably and never exceeds 92% PCC, even at high SNRs, as shown in Fig. 8. For the 5-modulation pool, the DDrCNN-based classifier performed best among the four classifiers, achieving more than 90% accuracy at high SNRs, while the Maxpooling CNN remained below 80%.

2) COOPERATIVE AMC UNDER UNCORRELATED KEYHOLE CHANNEL
The recognition performance of the DL classifiers under cooperation for the 3-modulation and 5-modulation pools is presented in Fig. 9 and Fig. 10, respectively. The figures show a considerable improvement in performance for the 5-modulation pool, whereas for the 3-modulation pool the improvement is on the order of 1% at higher SNRs. While all the classifiers perform similarly for the 3-modulation pool, DDrCNN outperforms the others for the 5-modulation pool. However, in terms of the number of network parameters, the Maxpooling CNN is the lightest of all (Table 1). The PCC for each constituent modulation scheme is plotted in Fig. 11.

3) COOPERATIVE AMC UNDER CORRELATED KEYHOLE CHANNEL
The performance of DL-based AMC classifiers over correlated keyhole channels is elaborated in this section.
Here, the classification results for the 5-modulation pool under cooperative AMC are presented. To test the PCC of the classifiers under various correlation factors, datasets with different antenna correlations (ρ = 0.2, 0.5, 0.7, 0.9) at the transmitter and receiver are generated. These datasets were tested using each of the classifiers trained with uncorrelated data, and the results are plotted in Fig. 12. Even over correlated keyhole channels, the proposed classifier achieves decent classification performance.
Fig. 13 shows that, for MIMO signals over an uncorrelated keyhole channel, AMC based on feature fusion performs better at lower SNRs than the decision-fusion cooperation employed in the previous subsection. This improvement at low SNR comes at the cost of an additional ANN at the receiver.

D. COMPARISON WITH TRADITIONAL FB-AMC METHOD
Since the HOC-based AMC of MIMO signals over a keyhole channel was proposed to classify lower-order PSKs (BPSK, QPSK, and OQPSK), the classification results of DDrCNN for the same modulation pool are used for the comparison. To train an ANN on the features in (5) and (6), we created a dataset of 10K feature vectors per SNR per modulation type. The baseline work considered 4000 symbols at the receiver for classification; for a fair comparison, however, we considered only 128 symbols while generating the dataset. The minimum-distance classifier used in the baseline work to discriminate the extracted features is replaced by an ANN classifier in this comparison.
From the classification performance plotted in Fig. 14, it is evident that the CNN-based classifier outperforms the HOC-based ANN classifier employing the DMR technique over the MIMO keyhole channel. The proposed CNN-based method achieved 99% PCC above 5 dB SNR, while the baseline method reached a maximum of only about 90% PCC at around 10 dB SNR.

VII. CONCLUSION
This work investigated the performance of DL-based AMC over a rank-deficient MIMO channel. A cooperative decision-fusion mechanism and a feature-fusion method were employed at the receiver to enhance the classification performance. For the 5-modulation pool, the DDrCNN-based classifier achieved better accuracy than the classifiers based on the competing CNN architectures. Compared with the baseline HOC-based method, the DL-based method achieved better classification performance and was able to classify higher-order constellations such as 8-PSK and 16-QAM. The performance of the DL classifier over correlated channels was also evaluated, and the classification performance was found to be satisfactory.