End-to-End PSK Signals Demodulation Using Convolutional Neural Network

Demodulation techniques are of central importance for achieving intelligent reception. Improving demodulation performance correspondingly enhances the overall performance of a communication system. However, conventional demodulators require dedicated hardware platforms, leading to high implementation costs and time-consuming development. This work proposes a unified architecture for end-to-end automatic demodulation of modulated signals. The proposed demodulator uses residual units and a fully convolutional network (R-FCN) to extract time-domain features of the modulated signal and determine the transmitted symbols, thereby demodulating the received signal. Simulations show that the proposed method achieves better demodulation performance than existing methods. It is further demonstrated that when the signal-to-noise ratio (SNR) exceeds 2 dB, the proposed demodulator achieves on symbol-unsynchronized data a demodulation performance similar to that of conventional demodulators on symbol-synchronized data.


I. INTRODUCTION
Phase shift keying (PSK) is a digital modulation scheme that transmits information by shifting the carrier wave between different phases. In pure PSK, both the amplitude and the frequency of the transmitted carrier are typically kept constant [1]. Because PSK is more efficient and less prone to errors than frequency shift keying (FSK) and other modulation forms, it is used extensively in digital microwave communication, mobile communication, satellite communication, broadband access, and cable television systems [2]. Current research on wireless communication systems focuses mainly on accuracy and on high energy efficiency or spectrum utilization, and signal demodulation directly affects wireless transmission performance.
With the rapid development of computer science and hardware, machine learning has advanced accordingly. Deep learning (DL) [3] is a subclass of machine learning (ML) that uses deep neural networks, such as convolutional neural networks (CNNs), to automatically extract useful features from raw data. Built on artificial neural networks (ANNs), deep learning learns abstract features directly from input data in a hierarchical manner, with higher-level features formed by composing lower-level ones. Researchers have been exploring its use in communications since the introduction of DL [4]. DL has so far been applied to channel estimation [5], [6], modulation recognition [7]-[9], channel code recognition [10], [11], communication system simulation [12], [13], and decoding [14]-[16].
Signal demodulation based on learning methods has been widely studied in recent years, and neural networks are applied more commonly than other algorithms such as support vector machines (SVMs). Depending on how the data are processed, signal demodulation methods can be divided into those based on grouping sampling points and those based on detecting phase shifts. According to the neural network model used, they can be divided into four types: multilayer perceptron (MLP) [18]-[20], deep belief network (DBN) [21], long short-term memory (LSTM) [22], and CNN [25]-[28]. In [17], an ANN demodulator was proposed for FSK signals; it can efficiently demodulate FSK signals transmitted in complex noise and significantly enhances anti-interference capability. Chan et al. [23] presented a PSK demodulation algorithm based on pattern recognition theory that recognizes the PSK order and simultaneously eliminates phase shift interference. The authors of [19] designed an electrocommunication system with deep learning-based demodulation. In [21], two data-driven signal demodulation frameworks, a DBN-SVM-based approach and an adaptive boosting (AdaBoost)-based approach, are summarized. In [22], an intelligent deep neural network (DNN)-based demodulator aided by LSTM units was proposed to demodulate received signals and achieved better performance than the benchmark systems. Elbaz et al. [24] proposed using prior information about transmitted speech messages to train a DNN and an LSTM for frequency modulation (FM) demodulation. Onder et al. [20] noted that wireless channels are often subject to substantial interference and therefore designed a feedforward neural network demodulator for signals transmitted over unknown channels. The network was pre-trained with pilot signals and then further trained in a multipath transmission scenario; it achieved better bit error rate (BER) performance in the Rayleigh channel than the traditional correlation demodulator.
Recently, many researchers have applied CNNs to the demodulation of digital modulation signals (see, e.g., [25]-[28]). The authors of [25] proposed three ML demodulators based on a CNN, a DBN, and AdaBoost, respectively. In particular, the CNN-based demodulator transforms the modulated signal into an image and identifies the transmitted symbols by image classification. In [27], mixed-signal demodulation was investigated with a deep convolutional network; this method attempts to demodulate each symbol sequence from the mixed signal. However, all of the aforementioned methods take the samples within each symbol period as the input to the neural network, which then maps them to bits. Grouping the baseband data strictly by symbol period is difficult, particularly when frequency offset or sampling error occurs. To address this issue, Zhang et al. [26] proposed a 1-D CNN-based binary phase-shift keying (BPSK) demodulator, which uses a 1-D CNN to detect the location and form of the phase jumps of the BPSK signal and thereby obtains the demodulation results. This method requires neither complicated preprocessing nor grouping of the sample sequence, but as the number of possible phase jumps increases, the number of CNNs that must be built increases correspondingly. This approach therefore becomes impractical at higher modulation orders.
As the above review shows, the CNN is a widely used neural network model for signal demodulation. In most CNN-based demodulation approaches, the received sampling sequence must be grouped before it is fed to the CNN, and a CNN model then demodulates the samples within one symbol period into one symbol. However, the number of classes grows exponentially with the number of bits per symbol as the modulation order increases. With the aid of the residual network, we first feed the entire modulated signal, instead of a single symbol period, into the CNN demodulator to avoid the sampling-point grouping problem. We then use N multi-class classifiers in the final classification layer, instead of a single binary classifier, to demodulate an information stream of N symbols.
This paper presents an end-to-end demodulator based on residual units and fully convolutional networks (R-FCN), which can be applied to different modulation schemes with minor modifications. This approach learns and extracts features directly from the received modulated signal without any a priori knowledge of the channel model. For comparison, MLP, LSTM, CNN, and conventional demodulators are also studied. We then investigate the BER of BPSK, QPSK, and 8PSK signals at different SNRs. To train the model, single-carrier BPSK, QPSK, and 8PSK signal samples with additive white Gaussian noise (AWGN) are generated under various SNR conditions. Furthermore, we investigate how symbol-unsynchronized data affect the demodulation performance for QPSK signals. Results show that the proposed demodulator is relatively insensitive to symbol-unsynchronized data compared with conventional algorithms.
The rest of this paper is organized as follows. Section II briefly introduces the signal model and a typical CNN model. Section III describes the proposed CNN demodulator in detail. Experimental results and performance analysis are given in Section IV, and conclusions are drawn in Section V.

A. SYSTEM MODEL
As shown in Fig. 1, the PSK signal transmission system consists of three main parts: a transmitter, a transmission channel, and a receiver. The transmitter comprises a baseband modulator, a baseband-shaping filter, and a carrier wave modulator [29]. The receiver is composed of a preprocessor and a neural network demodulator. At the transmitting end, the information bits pass through baseband modulation, baseband-shaping filtering, and carrier wave modulation; the resulting intermediate frequency (IF) signal is then up-converted into a high frequency (HF) signal, which is more suitable for channel transmission. In the transmission channel, AWGN is added to the transmitted signal. At the receiving end, the received signal is first down-converted to an IF signal, then preprocessed and sent to the neural network to obtain the demodulated information bits.
This paper focuses on the demodulation of PSK signals corrupted by AWGN. We assume that the transmitted information bitstream is independent and identically distributed (i.i.d.) and that the transmitter uses the PSK signal waveform to send digital information. When the PSK digital modulation scheme is used, the transmitted signal $x(t)$ during the $n$-th symbol interval is expressed as
$$x(t) = A\cos\left(2\pi f_c t + \varphi_n\right), \qquad \varphi_n \in \left\{\frac{2\pi m}{M} : m = 0, 1, \ldots, M-1\right\}$$
where $A$ denotes the amplitude, $f_c$ is the carrier frequency of the modulated signal, $\varphi_n$ is the absolute phase of the $n$-th symbol, and $M$ is the modulation order. Each M-PSK symbol carries $k = \log_2 M$ bits, so there are $M = 2^k$ symbols in total. At the receiver, the received signal $y(t)$ is given by
$$y(t) = x(t) * c(t) + n(t)$$
where $x(t)$ represents the transmitted pulse-shaped signal, $c(t)$ is the channel impulse response, $*$ denotes convolution, and $n(t)$ denotes AWGN with power $\sigma_n^2$.
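As an illustration of the PSK signal model, the following Python sketch maps an i.i.d. bitstream to M-PSK phases $\varphi_n = 2\pi m/M$ and samples the waveform $A\cos(2\pi f_c t + \varphi_n)$. It is a minimal sketch, not the authors' MATLAB generator: it assumes a natural-binary bit-to-phase mapping, rectangular pulses in place of the unspecified baseband-shaping filter, and a signal observed directly at the carrier frequency (no IF/HF conversion). The default $f_c$, $f_s$, and 4 samples per symbol are taken from the dataset description in Section III; the function names are hypothetical.

```python
import math
import random


def bits_to_symbols(bits, M):
    """Group bits into k = log2(M)-bit words and return symbol indices m in {0, ..., M-1}.

    Assumption: natural binary mapping (the paper does not specify the mapping).
    """
    k = int(math.log2(M))
    if 2 ** k != M:
        raise ValueError("M must be a power of two")
    if len(bits) % k:
        raise ValueError("number of bits must be a multiple of log2(M)")
    symbols = []
    for i in range(0, len(bits), k):
        m = 0
        for b in bits[i:i + k]:
            m = (m << 1) | b  # most significant bit first
        symbols.append(m)
    return symbols


def mpsk_passband(symbols, M, A=1.0, fc=23.325e3, fs=93.3e3, sps=4):
    """Sample x(t) = A*cos(2*pi*fc*t + phi_n), phi_n = 2*pi*m/M, holding each phase for sps samples.

    Assumption: rectangular pulses instead of the paper's baseband-shaping filter.
    """
    x = []
    for n, m in enumerate(symbols):
        phi_n = 2 * math.pi * m / M
        for s in range(sps):
            t = (n * sps + s) / fs  # time of the current sample
            x.append(A * math.cos(2 * math.pi * fc * t + phi_n))
    return x


rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(16)]  # i.i.d. bits
syms = bits_to_symbols(bits, M=4)               # QPSK: 2 bits per symbol
x = mpsk_passband(syms, M=4)
print(len(syms), len(x))  # 8 symbols -> 32 samples
```

Because $f_c$ equals the symbol rate in this setting (23.325 kHz), each symbol spans exactly one carrier cycle, sampled at four points.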

B. CONVOLUTIONAL NEURAL NETWORK
CNNs are a class of artificial neural networks most commonly used to analyze visual images [30]. Their design was inspired by early findings in the study of biological vision. A typical CNN consists of an input layer, convolutional layers, pooling layers, and an output layer, as shown in Fig. 2.
The convolutional layer is the most critical module of a CNN. Its parameters consist of a set of learnable convolution kernels, each with a small receptive field, which are slid across the entire input. During forward propagation, each kernel computes the dot product between its weights and the corresponding patch of the previous layer's output at every position along the width and height of the input, producing a two-dimensional activation map for that kernel. The convolution operation can be expressed as
$$x_j^{l} = f\left(x^{l-1} * w_j^{l} + b_j^{l}\right)$$
where $x^{l-1}$ is the input to layer $l$ (the output of layer $l-1$), $*$ denotes convolution, $x_j^{l}$ is the $j$-th output feature map of layer $l$, $w_j^{l}$ and $b_j^{l}$ represent the weights and bias of the $j$-th convolution kernel in layer $l$, and $f(\cdot)$ represents the activation function.
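The convolution operation described above, in which each output map is an activation function applied to the convolution of the layer input with a kernel plus a bias, can be made concrete with a short pure-Python sketch of one 1-D convolutional layer with stride 1 and no padding. As in most deep-learning frameworks, the "convolution" is implemented as cross-correlation (the kernel is not flipped). The list-of-lists data layout and the function name `conv1d` are illustrative assumptions, not the layout used in the paper.

```python
def conv1d(x, kernels, biases, f=lambda v: v):
    """Compute x_j^l = f(x^{l-1} * w_j^l + b_j^l) for every kernel j.

    x       : C_in input channels, each a list of L samples (the output of layer l-1)
    kernels : C_out kernels w_j^l, each a list of C_in lists of K weights
    biases  : C_out scalars b_j^l
    f       : activation function (identity by default)
    Returns C_out feature maps of length L - K + 1 ('valid' padding, stride 1).
    """
    L = len(x[0])
    feature_maps = []
    for w_j, b_j in zip(kernels, biases):
        K = len(w_j[0])
        fmap = []
        for t in range(L - K + 1):          # slide the receptive field along the input
            acc = b_j
            for c, w_jc in enumerate(w_j):  # sum over input channels
                for k in range(K):
                    acc += w_jc[k] * x[c][t + k]
            fmap.append(f(acc))
        feature_maps.append(fmap)
    return feature_maps


def relu(v):
    return max(0.0, v)


x = [[1.0, 2.0, 3.0, 4.0]]  # one input channel
print(conv1d(x, kernels=[[[1.0, -1.0]]], biases=[0.0]))          # [[-1.0, -1.0, -1.0]]
print(conv1d(x, kernels=[[[1.0, -1.0]]], biases=[0.0], f=relu))  # [[0.0, 0.0, 0.0]]
```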
Another important concept in CNNs is pooling. A pooling layer is a nonlinear down-sampling layer that is often added after convolutional layers to reduce the dimensionality of the feature maps. Pooling enlarges the receptive field of subsequent layers and helps make the representation approximately invariant to small local translations. Two common pooling functions are average pooling and max pooling, which tend to preserve background and texture information, respectively; max pooling is the more common of the two (see Fig. 3). When the background is noisy or complicated, max pooling is usually used to summarize the most strongly activated presence of a feature.
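The two pooling functions mentioned above can be sketched as non-overlapping 1-D pooling. A window of 2 with stride 2 is assumed here purely for illustration; it is not taken from the paper.

```python
def pool1d(x, size=2, mode="max"):
    """Non-overlapping 1-D pooling with window `size` and stride `size`.

    Trailing samples that do not fill a whole window are dropped.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out = []
    for i in range(0, len(x) - size + 1, size):
        window = x[i:i + size]
        out.append(max(window) if mode == "max" else sum(window) / size)
    return out


feature_map = [1.0, 3.0, 2.0, 8.0, 5.0, 4.0]
print(pool1d(feature_map))              # [3.0, 8.0, 5.0]  keeps the strongest activation
print(pool1d(feature_map, mode="avg"))  # [2.0, 5.0, 4.5]  smooths each window
```

Both variants halve the length of the feature map; they differ only in how each window is summarized.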

FIGURE 4. The residual unit.

C. RESIDUAL NETWORK
As the depth of a convolutional neural network grows, the gradients of the loss function with respect to the early layers tend to vanish, which leads to higher training error. To address this problem, He et al. [31] presented a residual learning framework, the residual neural network (ResNet). ResNet is an artificial neural network (ANN) that builds on the known structure of cortical pyramidal cells: it uses skip connections, or shortcuts, to bypass some layers. Typical ResNet models are implemented with skip connections that bypass two or three layers, with rectified linear units (ReLU) and batch normalization (BN) in between.
A residual unit is shown in Fig. 4. Denote the input of the residual unit by $x$ and its desired output by $H(x)$. Whereas a plain CNN tries to learn $H(x)$ directly, a residual network learns the residual $F(x) = H(x) - x$. Formally, a residual unit of a ResNet is defined as
$$x_{k+1} = \sigma\left(x_k + F(x_k)\right)$$
where $x_k$ is the input of the $k$-th residual unit, $x_{k+1}$ is its output, and $\sigma(\cdot)$ is the Leaky ReLU activation function, i.e., $\sigma(x) = \max(0, x) + \beta \cdot \min(0, x)$, where $\beta$ is a small constant, such as 0.1 [32].
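The residual unit and Leaky ReLU described above, $x_{k+1} = \sigma(x_k + F(x_k))$ with $\sigma(v) = \max(0, v) + \beta\min(0, v)$, can be sketched in a few lines of Python. The residual mapping $F$ is passed in as a function because the internal layers of the unit in Fig. 4 are not specified in detail here; the two toy mappings below are hypothetical and only demonstrate the identity shortcut.

```python
def leaky_relu(v, beta=0.1):
    """sigma(v) = max(0, v) + beta * min(0, v)."""
    return max(0.0, v) + beta * min(0.0, v)


def residual_unit(x_k, F, beta=0.1):
    """x_{k+1} = sigma(x_k + F(x_k)), with F the learned residual mapping.

    The identity shortcut requires F(x_k) to have the same length as x_k.
    """
    f_x = F(x_k)
    if len(f_x) != len(x_k):
        raise ValueError("F(x_k) must match the shape of x_k for the identity shortcut")
    return [leaky_relu(a + b, beta) for a, b in zip(x_k, f_x)]


def zero_residual(x):
    # Hypothetical F that outputs zeros: the unit then reduces to sigma(x_k).
    return [0.0] * len(x)


def minus_two_x(x):
    # Hypothetical F(x) = -2x, so that x_k + F(x_k) = -x_k.
    return [-2.0 * v for v in x]


print(residual_unit([1.0, -1.0], zero_residual))  # [1.0, -0.1]
print(residual_unit([1.0, -1.0], minus_two_x))    # [-0.1, 1.0]
```

If $F$ learns to output zeros, the unit passes its input through up to the activation, so adding residual units does not make an identity-like mapping harder to represent.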

III. THE PROPOSED APPROACH
In this section, we first briefly introduce the architecture of the proposed demodulator network. We then describe the end-to-end R-FCN demodulator architecture in detail and give an optimized training scheme. Finally, we describe the modulation schemes and the parameters of the experimental signals.

B. PROPOSED R-FCN STRUCTURE
The proposed R-FCN demodulator consists of an input layer, an encoder, a decoder, and an output layer; Table 1 gives the detailed architecture. The input layer has 4096 nodes, equal to the length of the input vector. The encoder comprises four max-pooling residual stacks. Fig. 6a shows the architecture of the max-pooling residual stack, which consists of two residual units and a max-pooling layer. The encoder performs convolution and max-pooling operations to extract and summarize features, and the resulting feature maps are sent to the decoder. As shown in Fig. 6, the decoder comprises two up-sampling residual stacks and two residual stacks. In this work, the number of samples per symbol is fixed at 4. Thus, the demodulator contains two more max-pooling layers than up-sampling layers, and these two extra max-pooling layers down-sample the incoming signal to the symbol rate. When the number of samples per symbol changes, the architecture of the demodulator changes correspondingly. The output layer contains only a single convolutional layer. Its output gives the demodulation results for the incoming signal: its channel dimension is determined by the modulation order (M), and its length equals the input length divided by the number of samples per symbol.
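The length bookkeeping of the encoder and decoder can be checked with a short sketch. It assumes, hypothetically, that every max-pooling layer halves the sequence length, every up-sampling layer doubles it, and the convolutions preserve length. Under these assumptions, four pooling and two up-sampling layers give a net down-sampling factor of 4, matching the 4 samples per symbol, so a 4096-sample input yields one M-class output per symbol.

```python
import math


def rfcn_output_shape(input_len=4096, n_pool=4, n_upsample=2, factor=2, M=4):
    """Return (output length, output channels) for the encoder/decoder length bookkeeping.

    Assumptions (hypothetical, for illustration): each max-pooling layer divides the
    length by `factor`, each up-sampling layer multiplies it by `factor`, and
    convolutions use 'same' padding so that they preserve length.
    """
    length = input_len
    for _ in range(n_pool):
        if length % factor:
            raise ValueError("length is not divisible by the pooling factor")
        length //= factor
    for _ in range(n_upsample):
        length *= factor
    return length, M


length, channels = rfcn_output_shape(M=4)
print(length, channels)                   # 1024 4 -> 1024 QPSK symbols, 4 classes each
print(length * int(math.log2(channels)))  # 2048 demodulated bits
```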
Several optimizations are made so that the CNN better fits the communication signal demodulation task [34]. First, we use Leaky ReLU [32] as the nonlinear activation function after each convolutional layer. It allows faster convergence than traditional nonlinear activation functions and, unlike ReLU, does not discard negative values. The activation function of the demodulation (output) layer, however, is the softmax function, so that each output value lies in the interval [0, 1]. Because the demodulation layer uses a different activation function from the other layers, the Xavier initialization method [33] is used to initialize the weights so that the input and output variances remain consistent throughout the network. Second, batch normalization [35] is applied after each convolutional layer to normalize its output (the input of the next layer) to zero mean and unit variance, improving generalization performance.
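A minimal sketch of the output-layer decision rule: for each of the N symbol positions, a softmax over the M output values gives class probabilities in [0, 1] that sum to one, and the most probable class is taken as the demodulated symbol. The symbol-to-bit conversion assumes a natural binary mapping, which the paper does not specify, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(output_rows, M):
    """Map N rows of M output-layer values to N symbol indices and N*log2(M) bits."""
    k = int(math.log2(M))
    symbols, bits = [], []
    for row in output_rows:
        probs = softmax(row)
        m = max(range(M), key=lambda i: probs[i])  # most probable symbol
        symbols.append(m)
        # Assumed natural binary mapping, most significant bit first.
        bits.extend((m >> (k - 1 - j)) & 1 for j in range(k))
    return symbols, bits


rows = [[0.1, 2.0, -1.0, 0.3],  # symbol position 1
        [3.0, 0.0, 0.0, 0.0]]   # symbol position 2
print(decide(rows, M=4))  # ([1, 0], [0, 1, 0, 0])
```

Since softmax is monotonic, the hard decision equals the argmax of the raw outputs; the softmax matters for the probabilistic outputs and for training with categorical cross-entropy.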

C. DATASETS
The size and variety of the dataset significantly affect the effectiveness of deep learning techniques. This paper evaluates the proposed demodulation algorithm on three widely used digital modulation schemes: BPSK, QPSK, and 8PSK. We use MATLAB to generate random binary bitstreams and then modulate them with these three schemes. In our experiments, the carrier frequency is set to $f_c$ = 23.325 kHz and the sampling rate to $f_s$ = 93.3 kHz. We select a symbol rate of $R_s$ = 23.325 kBd, which yields $f_s / R_s$ = 4 samples per symbol. In addition, AWGN is added to the training signals, which improves the noise immunity of the demodulator. The channel SNR is computed from the ratio of energy per symbol to noise power spectral density ($E_s/N_0$) and the oversampling factor. For a real-valued signal, the relationship between $E_s/N_0$ and SNR, both expressed in dB, is
$$\left(\frac{E_s}{N_0}\right)_{\mathrm{dB}} = \mathrm{SNR}_{\mathrm{dB}} + 10\log_{10}\left(0.5 \cdot \mathrm{sps}\right)$$
where sps is the number of samples per symbol (the oversampling factor). For a real-valued input signal oversampled by a factor of 4, $E_s/N_0$ therefore exceeds the corresponding SNR by $10\log_{10}(0.5 \times 4) \approx 3.01$ dB. The training and validation data are generated by simulation over channel SNRs from -4 dB to 8 dB. A group of data (10000 samples, 90% for training and 10% for validation) is generated for every 2 dB step in SNR. Eight groups of data are generated in total. Because the three modulation schemes differ in bit error rate (BER), the SNR ranges of the BPSK, QPSK, and 8PSK test sets are set to -5 dB to 8 dB, -5 dB to 10 dB, and -5 dB to 15 dB, respectively. For each modulation scheme, one thousand test samples are generated for every 1 dB step in SNR.
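The SNR bookkeeping can be sketched as follows. The conversion implements $E_s/N_0$ (dB) = SNR (dB) + $10\log_{10}(0.5 \cdot \mathrm{sps})$ for a real-valued signal, and sps = 4 follows from the stated sampling and symbol rates. The `add_awgn` helper is a hypothetical stand-in for the MATLAB noise generator used in the paper: it measures the mean signal power and adds real Gaussian noise whose variance gives the requested SNR.

```python
import math
import random

FS = 93.3e3           # sampling rate f_s in Hz (dataset description)
RS = 23.325e3         # symbol rate R_s in symbols/s (dataset description)
SPS = round(FS / RS)  # samples per symbol: 93.3 / 23.325 = 4


def esn0_db_from_snr_db(snr_db, sps=SPS):
    """Es/N0 (dB) = SNR (dB) + 10*log10(0.5*sps) for a real-valued signal."""
    return snr_db + 10 * math.log10(0.5 * sps)


def add_awgn(x, snr_db, rng=random):
    """Add real white Gaussian noise so that (mean signal power) / (noise power) = SNR."""
    p_signal = sum(v * v for v in x) / len(x)
    noise_var = p_signal / 10 ** (snr_db / 10)
    sigma = math.sqrt(noise_var)
    return [v + rng.gauss(0.0, sigma) for v in x]


print(SPS)                                 # 4
print(round(esn0_db_from_snr_db(0.0), 2))  # 3.01
```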

IV. SIMULATION RESULTS
In this section, we conduct a series of simulation experiments to demonstrate the effectiveness and robustness of the proposed R-FCN demodulator.

A. TRAINING OF THE NETWORKS
The R-FCN demodulator is used to demodulate the three modulation data sets (BPSK, QPSK, and 8PSK). We train the network on the training sets and validate it on the validation sets partitioned in Section III. Categorical cross-entropy and Adam are used as the loss function and optimization algorithm, respectively, and the initial learning rate is set to 0.001. If the validation loss does not decrease for eight epochs, the learning rate is reduced to one-tenth of its previous value. In addition, the mini-batch method is used to partition the training data into batches to improve the stability of convergence; the mini-batch size is set to 64. Fig. 7 shows the accuracy and loss curves for QPSK demodulation training. The horizontal axis represents the number of training epochs, the left vertical axis the loss value, and the right vertical axis the demodulation accuracy. The demodulation accuracy increases and the loss value decreases as the number of epochs grows, and both reach a stable state after about 100 epochs. The training curves for BPSK and 8PSK signals are similar and are omitted for brevity.
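The learning-rate rule and mini-batching described above can be sketched independently of any framework. The paper does not say which framework was used; if the model were built in Keras, the same rule corresponds to `tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=8)`. The class and function names below are hypothetical, per-epoch shuffling is an assumption, and the sketch ignores refinements such as a minimum improvement threshold or a cooldown period.

```python
import math
import random


class PlateauLRScheduler:
    """Multiply the learning rate by `factor` when the validation loss has not improved for `patience` epochs."""

    def __init__(self, lr=1e-3, factor=0.1, patience=8):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = math.inf
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Call once per epoch with that epoch's validation loss; returns the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
            if self.epochs_without_improvement >= self.patience:
                self.lr *= self.factor
                self.epochs_without_improvement = 0
        return self.lr


def minibatches(n_samples, batch_size=64, rng=random):
    """Yield shuffled index lists of at most batch_size samples (the last batch may be smaller)."""
    order = list(range(n_samples))
    rng.shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


sched = PlateauLRScheduler()
lrs = [sched.step(1.0) for _ in range(9)]  # the loss stops improving after the first epoch
print(lrs[7], lrs[8])  # the rate drops from 1e-3 to about 1e-4 after 8 epochs without improvement
print([len(b) for b in minibatches(9000)][-1])  # 9000 = 140 * 64 + 40 -> last batch has 40 samples
```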

B. IMPACT OF DATA SETS
In previous signal recognition tasks (e.g., modulation recognition and channel coding recognition), signals at all SNR levels are needed to analyze the impact of the SNR level on recognition accuracy. Because the proposed CNN-based demodulation task differs from a traditional classification task, we use a different training method to analyze the influence of SNR on BER performance. First, we train on each of the eight data groups separately to obtain eight corresponding demodulators, and then we train on all samples to obtain a ninth demodulator. Each of these nine demodulators is used to demodulate the test data sets, and the BER of each demodulator at different SNRs is obtained. Experimental results show that the model trained on low-SNR samples has a higher BER when demodulating high-SNR samples, whereas the model trained on high-SNR samples has a higher BER when demodulating low-SNR samples. Therefore, the training samples of the two best-performing demodulators are combined into one training set, which is used to train a tenth demodulator. Fig. 8, Fig. 9, and Fig. 10 show the BER performance of the R-FCN demodulators trained on training sets with different SNRs for the three modulation schemes. To improve readability, some curves are omitted because the corresponding demodulators have similar BER performance.
We draw three observations from Fig. 8, Fig. 9, and Fig. 10. First, differences in the SNR of the training sets lead to different BER performance of the CNN demodulator. When the training set contains data at two SNR levels, the trained CNN demodulator has the best BER performance; we call it the optimal demodulator. Second, each modulation scheme has its own optimal demodulator. The optimal demodulators for BPSK, QPSK, and 8PSK were obtained with training SNRs of 2 dB to 4 dB, 4 dB to 6 dB, and 6 dB to 8 dB, respectively. Thus, as the modulation order rises, training data with higher SNR levels are needed to obtain the corresponding optimal demodulator. Last, the CNN demodulator trained on all training sets has a higher BER than the optimal demodulators. This result indicates that higher-SNR and lower-SNR data interfere with each other during training; therefore, a suitable training set is required to obtain the desired demodulator.
Fig. 11, Fig. 12, and Fig. 13 show the BER performance curves of the conventional demodulator, coherent demodulator, MLP demodulator [17], LSTM demodulator [22], CNN demodulator [28], and R-FCN demodulator for BPSK, QPSK, and 8PSK signals, respectively. All demodulators use the same training and test sets to ensure a fair comparison. As shown in Fig. 11, the BER performance of the proposed R-FCN demodulator is similar to that of the other methods when the SNR is below -1 dB and is significantly better when the SNR is above 0 dB. Similarly, in Fig. 12, the proposed R-FCN demodulator performs slightly worse than the conventional demodulator at low SNR, but outperforms the other methods by about 1 dB at high SNR. In addition, Fig. 13 shows that the LSTM demodulator performs worse than the other five demodulators, and that over the SNR range from -5 dB to 15 dB, the BER curve of the optimal R-FCN demodulator is always lower than those of the other three demodulators. Overall, the proposed R-FCN demodulator achieves better reliability than the state-of-the-art demodulation methods considered.

D. EFFECT OF SYMBOL SYNCHRONIZATION
In digital communication systems, symbol synchronization provides information about each discrete symbol's start time and end time. In traditional demodulation methods, the speed and accuracy of symbol synchronization significantly impact the reception response speed and BER of digital wireless communication systems.
VOLUME XX, 2022
To analyze how symbol synchronization affects demodulation performance, we compare the BER performance of the proposed algorithm and the conventional method at different symbol starting points, as shown in Fig. 14. P=1 indicates that the input data start at the first sample of the signal (i.e., symbol-synchronized). P=2 means that demodulation starts from the second sample of the input data (i.e., symbol-unsynchronized). P=Random means that the symbol starting position of the input data is selected at random; because each symbol has 4 sample points in the simulation, P ∈ {1, 2, 3, 4}. Symbol-unsynchronized input data degrade the BER performance of both demodulation algorithms, but the impact on the R-FCN demodulator is much smaller. Moreover, when the SNR is greater than or equal to 2 dB, the BER performance of the R-FCN demodulator on symbol-unsynchronized data is close to that of the traditional algorithm on symbol-synchronized signals. However, when the SNR is below 0 dB, the BER performance of the CNN demodulator on symbol-unsynchronized signals is worse than that of the conventional algorithm for all three modulation schemes. Furthermore, the P=2 and P=Random curves show that the BER performance of the traditional demodulator is affected by the starting point, whereas the R-FCN demodulator is insensitive to it. Overall, the CNN-based R-FCN demodulator is more robust than the traditional demodulator.
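The following sketch illustrates why grouping by symbol period breaks down without symbol synchronization. Each sample is labeled with the index of the symbol it belongs to (4 samples per symbol, as in the simulation), the input is started at sample P, and the samples are then grouped into consecutive blocks of 4, as a grouping-based demodulator would do. For every P other than 1, each block mixes samples from two adjacent symbols. The helper names are hypothetical, and the sketch does not model how the R-FCN demodulator itself handles the offset.

```python
import random


def crop_from(samples, P):
    """Start the input at the P-th sample (1-based, matching P = 1, 2, 3, 4)."""
    return samples[P - 1:]


def group_by_symbol(samples, sps=4):
    """Split samples into consecutive blocks of sps samples, dropping an incomplete tail."""
    return [samples[i:i + sps] for i in range(0, len(samples) - sps + 1, sps)]


def mixed_groups(groups):
    """Count blocks containing samples from more than one symbol."""
    return sum(1 for g in groups if len(set(g)) > 1)


sps = 4
labels = [n for n in range(3) for _ in range(sps)]  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
# P = 1 yields no mixed blocks; P = 2 mixes both of its blocks; the last P is random.
for P in (1, 2, random.randint(1, sps)):
    groups = group_by_symbol(crop_from(labels, P), sps)
    print(P, groups, mixed_groups(groups))
```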

V. CONCLUSION
In this paper, we have proposed a CNN-based architecture for demodulating modulated signals by integrating CNNs into the communication system. The proposed end-to-end R-FCN demodulator combines the strength of CNNs in extracting high-level features with the ability of ResNets to mitigate vanishing gradients, which together improve demodulation performance. Experiments show that the proposed R-FCN demodulator achieves better BER performance than conventional demodulators. In addition, we investigated how symbol-unsynchronized data affect the demodulation performance for QPSK signals. Results show that the proposed demodulator is relatively insensitive to symbol-unsynchronized data compared with traditional methods.