The Performance Analysis of Complex-Valued Neural Network in Radio Signal Recognition

Many techniques have been developed for wireless signal recognition in fifth generation (5G) enabled scenarios. Harsh constraints, such as the large number of model parameters and complex signal characteristics, drive the need for intelligent recognition methods that work in real-world settings. In this paper, we propose a generalizable, practical method for raw I/Q signal recognition. Specifically, deep complex-valued convolutional neural network models, including a complex-valued Visual Geometry Group (VGG) model (CvVGG) and a complex-valued ResNet model (CvRN), are proposed for handling raw I/Q signal data. We examine the merit of complex-valued neural networks (CvNNs) and validate their performance with experiments using two public datasets. At an SNR of 10 dB, the proposed algorithm achieves a recognition accuracy of 96% on the RadioML2018.10a dataset. For drone recognition, the CvNNs achieve a recognition accuracy of 99%. Our experimental results verify that deep complex-valued neural network models can achieve considerably improved accuracy with lower computational complexity and fewer model parameters than their real-valued counterparts.


I. INTRODUCTION
Over the years, various intelligent terminals and wearable sensing devices have brought us more service modes and connection approaches, such as autonomous vehicles [1], privacy preservation [2,3], and blockchain [4,5]. Identification of received signals, which is usually a challenging task executed at the receiver side before demodulation [6], has become a hot research topic in fifth generation (5G) communication. Generally, feature-based (FB) methods are prominent in practical implementations because of their low complexity [7][8][9][10][11][12]. Domain expertise is usually critical for such solutions. Recently, deep learning (DL), which employs a hierarchical feature extraction approach, has attracted great interest in the signal recognition field. In other words, DL can reduce the pre-processing effort and efficiently distill useful information. Moreover, Graphics Processing Unit (GPU) based parallel computing allows fast inference with DL. In addition, DL has the unique advantage of handling the high-dimensional feature spaces encountered in signal recognition problems, while the availability of public datasets facilitates the research and wide deployment of DL [13][14][15]. Several recent works study the application of DL to wireless signal recognition. Specifically, DL models have been proposed to handle raw I/Q data of received signals [7], and computer vision DL models have been transferred to classify statistical images constructed from the received signal [16,17]. Y. Tu et al. [18] proposed a large-scale real-world radio signal dataset based on Automatic Dependent Surveillance-Broadcast (ADS-B) and utilized several DL methods to report results on common channel effects in communication systems. For DL-aided approaches with raw I/Q data as input, T. O'Shea et al. [19] first introduced an open-access automatic modulation classification (AMC) dataset termed RadioML2016.10a, which comprises eight digitally modulated signals and three analog-modulated signals. In [20], O'Shea et al. introduce several novel deep learning applications in the physical layer. They demonstrate a proof-of-concept study, where a convolutional neural network (CNN) is utilized for modulation classification and achieves satisfactory accuracy. Later, in [7], the authors provide a more extensive dataset of additional radio signal types, a more realistic simulation of the wireless propagation environment, and new methods for signal classification that greatly outperform those they initially introduced. M. Wang et al. [21] illustrated how to improve work efficiency while saving costs in future DL-based scenarios. Y. Tu et al. [22] proposed an activation-maximization-based neural network pruning method, which slims the neural network and makes it much easier to deploy on edge devices. Some researchers [23,24] also considered the security problems that exist in DL-based communication systems, and used adversarial examples to find the most effective way to deceive DL-based signal recognition. Moreover, to overcome the limited number of signal labels, Y. Dong et al. [25] proposed SR2CNN to conduct zero-shot learning for signal type classification.
With regard to DL-aided statistical signal image recognition, deep convolutional neural networks (DCNNs) were first applied in 2017 to process images constructed from channel state information phase difference data for indoor fingerprinting [16,17]. S. Peng et al. [26] exploit colored constellation diagrams to represent digital signals and utilize AlexNet to recognize them precisely. Y. Lin et al. [27] proposed the contour stellar image (CSI) to bridge the gap between signal waveforms and DL data formats. CSI extracts the signal I/Q amplitude distribution and maps it into different colors. This boosts signal feature extraction performance and facilitates the application of DL frameworks in AMC, data augmentation, and transfer learning. S. Zhang et al. [28] utilized binary neural networks to reduce computational complexity; their experimental results showed that the neural network needs only 5% and 50% of the run time to achieve the same accuracy on CSI. Wang et al. [29] combine two CNNs trained with different datasets to achieve high AMC accuracy. In a recent work [30], the authors utilize generative adversarial networks (GANs) to conduct semi-supervised learning for AMC with incompletely labeled datasets. Wang et al. [31] propose a fundamental privacy-preserving framework with differential privacy. They survey the adaptations and variants of differential privacy in emerging applications as well as the challenges to differential privacy. Zheng et al. [32] propose adaptive hybrid communication protocols, including a novel position-prediction-based directional MAC protocol (PPMAC) and a self-learning routing protocol based on reinforcement learning (RLSRP). Tang et al. [33] propose a programmatic data augmentation approach that uses auxiliary classifier generative adversarial networks (ACGANs) to improve the classification accuracy of communication signal modulation. Wang et al. [34] also propose a novel AMC method. They first utilize Rényi entropy and singular entropy to obtain the modulation features, then present a novel basic probability assignment function (BPAF) based on the normal test theory, and finally utilize the Dempster-Shafer (DS) evidence theory to develop a classifier. Despite the relatively rich literature on the application of DL in communications, only a few works consider a complex-valued representation of signal attributes in model training and inference.
Deep learning in the complex domain is challenging. Hirose et al. demonstrate that complex-valued networks are better than real-valued networks for denoising incoherent, or noise-corrupted, waveforms [35][36][37]. Nuaimi et al. propose an improvement to the MP3 codec using complex-valued networks [38]. Zhang et al. [39] propose a complex-valued CNN (CvCNN) specifically for synthetic aperture radar (SAR) image interpretation, which utilizes both the amplitude and phase information of complex SAR imagery. However, the tools and algorithms for handling complex-valued signals are still lacking or are simply too scattered in the literature.
Complex-valued signals are encountered in a wide variety of applications, such as wireless communications, sensor array signal processing, biomedical sciences, and physics. Consequently, there is a compelling need in science and engineering for a statistical and mathematical theory for processing complex-valued random signals. For example, most practical modulation schemes in communications are complex-valued. Many applications, such as radar and magnetic resonance imaging (MRI), generate data that are inherently complex-valued. In some scenarios, a complex vector is presented as a two-dimensional real-valued data matrix, and the analysis is then conducted in the complex domain (instead of the real domain). The complex-valued representation is also compact, simpler in terms of notation and algebraic manipulation, and convenient for computation. It is evident that the need for expertise and theory in the processing, statistical modelling, and estimation of complex-valued multivariate signals and phenomena is rapidly increasing.
It is well known that CNNs learn discriminating features in computer vision applications. CNNs are expected to detect small-scale features at shallow layers and complex features at deeper layers. In the wireless communications domain, however, CNNs are not trained to identify images but I/Q samples. Nowadays, the prevalent CNN framework [7] considers the particular transition signature in wireless signals [...] different constellation. The transition patterns can constitute a unique signature of the modulated signal, which can eventually be learned by the CNN's filters. Although the above CNN models have proved to be effective and useful [7], we still believe that they do not fully consider the inherent nature of the physical layer. Our motivation is twofold. First, the real and imaginary parts of wireless signals are statistically dependent on each other. For example, a circular rotation of a time-domain signal corresponds to a linear phase shift in the frequency domain, and the real and imaginary parts of a complex number change together under any change in phase. Unfortunately, a real-valued network model usually ignores the statistical correlation between the real and imaginary parts. Second, a complex-valued model provides a more constrained system than a real-valued model. If we know in advance that both phase and amplitude are important to the learning objective, then it is sensible to employ a complex-valued model.
In this paper, we propose complex-valued neural network (CvNN) models for communication signal recognition, including a complex-valued Visual Geometry Group (VGG) model, termed CvVGG, and a deep complex-valued convolutional neural network model, termed Complex-valued ResNet (CvRN). We present their design, study their merit, and validate their performance with two public datasets.
The main contributions made in this paper are as follows.
• We examine how to incorporate CvNNs into smart communication systems. Furthermore, we derive useful insights and highlight the inherent merit of CvNNs in handling raw I/Q signal waveforms.
• We propose new deep neural network architectures, i.e., the complex-valued VGG (CvVGG) and the complex-valued ResNet (CvRN) models, for signal recognition problems. We present their designs and configurations, and explore their suitability for signal modulation classification and device fingerprint identification problems.
• We provide a thorough experimental study of the proposed CvNN models with respect to their classification accuracy, learning speed, computational complexity, and model parameters using two public datasets. We demonstrate the effectiveness of the proposed CvNN models under challenging and realistic scenarios, using raw, real-world I/Q waveform data.
The remainder of this paper is organized as follows. Section II presents the merit of CvNNs for signal recognition. In Section III, the building blocks and the architectures of the two proposed CvNN models are introduced. Subsequently, the performance of the proposed CvRN and CvVGG models is investigated using two public datasets in Section IV. Section V concludes this paper.

A. COMPLEX-VALUED CONVOLUTIONAL KERNEL REGULARIZATION
A real convolution output can be interpreted as a heat map of similarity to the convolution kernel. That is, every output value of a convolutional layer is a dot product between a kernel and an input patch. Indeed, for a fixed L1-norm, the dot product between a real patch and a kernel is maximized when the two are equal up to a scalar factor. The dot product is given by:
$$\langle W_r, X \rangle = \sum_{m}\sum_{n} W_{r,mn} X_{mn} \tag{1}$$
where $W_r$ denotes the convolutional kernel weight matrix and $X$ denotes the input data. When it comes to the complex-valued convolutional operation, we need to maximize the magnitude of the dot product between the complex-valued convolutional kernel weight matrix $W_c$ and the input data $X$, given by:
$$\left| \langle W_c, X \rangle \right| = \left| \sum_{m}\sum_{n} W_{c,mn} X_{mn} \right| \tag{2}$$
Considering the amplitude and phase representation of the complex-valued kernel, $W_{c,mn} = A_{mn} e^{j\phi_{mn}}$, and writing the input entries as $Z_{mn}$, we can rewrite (2) as:
$$\left| \sum_{m}\sum_{n} A_{mn} e^{j\phi_{mn}} Z_{mn} \right| \tag{3}$$
In (3), $Z_{mn}$ is multiplied by $A_{mn}$ and rotated by $\phi_{mn}$. The sum of the multiplied and rotated vectors achieves its maximal magnitude if they all have the same phase, so that their magnitudes accumulate. Otherwise, the summed terms may cancel each other, resulting in a smaller magnitude. In other words, we claim that the complex-valued convolutional layer operates as a regularization method.
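The phase-alignment argument can be illustrated with a small numerical sketch. It is only an illustration under assumed values: the 4-tap patch, its phases, and the helper name `complex_response` are hypothetical and not taken from the paper. The sketch compares a kernel whose phases rotate every term onto the same direction with a kernel of identical magnitudes but unrelated phases.

```python
import cmath
import math


def complex_response(kernel, patch):
    """Magnitude of the complex dot product sum_k W_k * Z_k."""
    return abs(sum(w * z for w, z in zip(kernel, patch)))


# Hypothetical 4-tap input patch: unit magnitudes, assorted phases (radians).
patch = [cmath.rect(1.0, p) for p in (0.3, 1.2, -0.7, 2.5)]

# Kernel phases cancel the patch phases, so every product W_k * Z_k
# points along the positive real axis and the magnitudes accumulate.
aligned = [cmath.rect(1.0, -cmath.phase(z)) for z in patch]

# Same kernel magnitudes, but phases alternating between 0 and pi:
# the rotated terms point in different directions and partly cancel.
misaligned = [cmath.rect(1.0, math.pi * (k % 2)) for k in range(len(patch))]

r_aligned = complex_response(aligned, patch)
r_misaligned = complex_response(misaligned, patch)
print(r_aligned, r_misaligned)
```

The aligned kernel reaches the upper bound given by the sum of the term magnitudes, while any other phase assignment with the same magnitudes yields a smaller response.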

B. SIGNAL COHERENCE
When we deal with raw I/Q waveforms, the real and imaginary axes are essentially less meaningful than amplitude and phase (or phase difference). This is because the real and imaginary axes are determined only relative to an arbitrarily chosen phase reference. The receiver determines the real and imaginary parts, which never exist beforehand [44]. Instead, the difference between two phase values is meaningful in itself, as it corresponds to the time course and/or position difference. In this sense, the phase difference represents certain useful information. The amplitude, orthogonal to phase, is also meaningful since it signifies the energy or power of the waveform.
As an example, consider 16-quadrature-amplitude modulation (16-QAM). Fig. 2(a) shows an ideal signal constellation in the complex plane. When a receiver detects the signal, the constellation is affected by random noise, phase rotation/Doppler effect, and possible harmonic waves, as shown in Fig. 2. The outcome of the CvNN operation contains values that carry features of both I/Q parts, as a result of the complex operation. Let the complex kernel be $A + jB$ and the complex input signal be $X + jY$. We can store the outcome as:
$$(A + jB)(X + jY) = (AX - BY) + j(BX + AY)$$
Mathematically, the outcome of this operation is still real-valued in one channel and imaginary in the other channel. This way, the mixed channels allow the CvNN to learn signal coherence information. Theoretically, a CvNN outperforms a real-valued neural network (RvNN) in high signal coherence regions (e.g., the high SNR region).
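As a minimal sketch of how the outcome can be stored in two real channels, the following code computes the product of a kernel A + jB and an input X + jY from real arithmetic only and compares it with Python's built-in complex multiplication. The function name `complex_mul_two_channel` and the numeric values are illustrative assumptions.

```python
def complex_mul_two_channel(a, b, x, y):
    """Return (A + jB)(X + jY) as a (real channel, imaginary channel) pair."""
    real = a * x - b * y   # both I and Q of the input contribute
    imag = b * x + a * y   # both I and Q of the input contribute
    return real, imag


# Hypothetical kernel 0.5 - 1.5j and input sample 2 + 3j.
real, imag = complex_mul_two_channel(0.5, -1.5, 2.0, 3.0)
reference = complex(0.5, -1.5) * complex(2.0, 3.0)
print((real, imag), reference)
```

Each output channel mixes the in-phase and quadrature components of the input, which is the channel mixing the text refers to.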

III. NETWORK ARCHITECTURE
In this section, we first discuss the complex-valued building blocks of CvNNs, including the complex-valued convolutional layer, the complex-valued batch normalization layer, the complex-valued activation, and the complex-valued dense layer. We then present the architectures of the proposed complex-valued VGG (CvVGG) and complex-valued ResNet (CvRN) models.

A. COMPLEX-VALUED CONVOLUTIONAL LAYER
Based on the definition of a complex number as an ordered pair of real numbers, we represent a complex number $z$ as:
$$z = m + jn$$
where $m$ and $n$ are real numbers representing the real and imaginary parts of $z$. The addition and multiplication operations of two complex numbers can be respectively defined as:
$$(a + jb) + (c + jd) = (a + c) + j(b + d) \tag{5}$$
$$(a + jb)(c + jd) = (ac - bd) + j(ad + bc) \tag{6}$$
In addition, for 2D real values, the addition and multiplication operations can be respectively represented as:
$$(a, b) + (c, d) = (a + c,\; b + d) \tag{7}$$
$$(a, b) \odot (c, d) = (ac,\; bd) \tag{8}$$
The fact that a complex number is defined by two real numbers may lead neural-network researchers to regard a complex-valued neural network as nothing more than a real-valued network with doubled dimensions. However, as can be seen from (5) to (8), although the addition processes are identical, the multiplication of two complex numbers is unique, involving both angle rotation and amplitude amplification. This feature is the result of the mixture of the real and imaginary components, and it makes designing a CvNN more challenging.
In the complex generalization, both the kernel and the input are complex-valued. The only difference stems from the multiplication of complex numbers. When convolving a complex matrix with a complex kernel $A + jB$, the output corresponding to the input patch $X + jY$ is given by:
$$(A + jB) * (X + jY) = (A * X - B * Y) + j(B * X + A * Y) \tag{9}$$
To implement the same functionality with a real-valued convolution, the input and output should be equivalent. Each complex matrix is represented by two real matrices, stacked together in a three-dimensional array. Again, we use the array $[A, B]$ to represent the convolutional kernel and let the input data be $[X, Y]$. For a traditional real-valued kernel with two channels, the dot product between the kernel and the input data is:
$$[A, B] \cdot [X, Y] = A * X + B * Y \tag{10}$$
which is not the desired complex-valued output given in (9). To obtain the desired outcome, we convolve with multiple kernels through multiple channels. That is, we use an equivalent real convolution scheme that has two kernels of the forms $[A, -B]$ and $[B, A]$. Such a two-kernel approach, as illustrated in Fig. 3, produces the desired output given in (9).
In summary, a convolution layer in a complex-valued network can be implemented as a restricted form of a real-valued convolution layer with twice as many kernels.
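The two-kernel scheme can be checked numerically. The sketch below is an illustration in plain Python, not the authors' implementation: it uses a 1D "valid" cross-correlation (the operation deep learning frameworks call convolution), and all helper names and test sequences are assumptions made for the example. It applies the real kernels [A, -B] and [B, A] to the two-channel input [X, Y] and compares the result with a direct complex computation of (9).

```python
def corr1d(signal, kernel):
    """Valid-mode 1D cross-correlation of two real sequences."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + t] * kernel[t] for t in range(k))
            for i in range(n - k + 1)]


def add(u, v):
    """Element-wise sum of two equally long sequences."""
    return [p + q for p, q in zip(u, v)]


def complex_conv_via_real(x, y, a, b):
    """Complex convolution built from four real correlations."""
    neg_b = [-v for v in b]
    real = add(corr1d(x, a), corr1d(y, neg_b))  # kernel [A, -B] on [X, Y]
    imag = add(corr1d(x, b), corr1d(y, a))      # kernel [B, A] on [X, Y]
    return real, imag


def complex_conv_direct(z, w):
    """Reference: the same correlation carried out with complex numbers."""
    n, k = len(z), len(w)
    return [sum(z[i + t] * w[t] for t in range(k)) for i in range(n - k + 1)]


# Hypothetical I/Q input (X + jY) and 3-tap complex kernel (A + jB).
x = [1.0, -2.0, 0.5, 3.0, -1.0]
y = [0.0, 1.5, -0.5, 2.0, 1.0]
a = [0.2, -0.4, 1.0]
b = [0.7, 0.1, -0.3]

real, imag = complex_conv_via_real(x, y, a, b)
direct = complex_conv_direct([complex(p, q) for p, q in zip(x, y)],
                             [complex(p, q) for p, q in zip(a, b)])
print(real, imag)
print(direct)
```

The restriction mentioned in the summary is visible in the code: the two real kernels share the same weights A and B, so the layer has twice as many kernels but no additional free parameters.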

B. COMPLEX-VALUED BATCH NORMALIZATION LAYER
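FIGURE 3. (a) Complex-valued convolution. (b) Equivalent real-valued convolution.
Batch normalization (BN) is used to accelerate training in most types of deep networks. However, when the underlying problem lies in the complex domain, the data requires special handling, as outlined in [35]. Briefly, one cannot normalize the real and imaginary parts of a complex number independently, as there is also information in the relation between the real and imaginary axes. To properly handle the normalization process, we treat the complex numbers as 2D vectors and the process as 2D whitening. We scale the zero-centered data $x - \mathbb{E}[x]$ by the inverse square root of its covariance matrix $V$, as:
$$\tilde{x} = V^{-\frac{1}{2}} \left( x - \mathbb{E}[x] \right) \tag{11}$$
where $V$ is a $2 \times 2$ covariance matrix given by:
$$V = \begin{pmatrix} V_{rr} & V_{ri} \\ V_{ir} & V_{ii} \end{pmatrix} = \begin{pmatrix} \mathrm{Cov}(\mathrm{Re}(x), \mathrm{Re}(x)) & \mathrm{Cov}(\mathrm{Re}(x), \mathrm{Im}(x)) \\ \mathrm{Cov}(\mathrm{Im}(x), \mathrm{Re}(x)) & \mathrm{Cov}(\mathrm{Im}(x), \mathrm{Im}(x)) \end{pmatrix} \tag{12}$$
As in batch normalization for real values, we also set a learnable shift parameter $\beta$ and a scaling parameter $\gamma$ for complex batch normalization. The scaling parameter is given by:
$$\gamma = \begin{pmatrix} \gamma_{rr} & \gamma_{ri} \\ \gamma_{ri} & \gamma_{ii} \end{pmatrix} \tag{13}$$
The real and imaginary parts of the shift parameter $\beta$, as well as $\gamma_{ri}$ in the scaling parameter $\gamma$, are initialized to zero, while $\gamma_{rr}$ and $\gamma_{ii}$ are initialized to $1/\sqrt{2}$. This initialization yields a modulus of 1 for the variance of the normalized value [45]. Finally, we obtain the complex-valued batch normalization layer as:
$$\mathrm{BN}(\tilde{x}) = \gamma \tilde{x} + \beta \tag{14}$$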
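The 2D whitening step can be sketched in plain Python. This is a simplified illustration under stated assumptions: it operates on a single feature over one batch, uses the closed-form inverse square root of a symmetric positive-definite 2x2 matrix, adds a small `eps` for numerical stability, and omits the running statistics a real layer would keep. The function names and the synthetic correlated batch are hypothetical.

```python
import math
import random


def complex_whiten(batch, eps=1e-5):
    """Whiten complex values, treating each one as a 2D real vector."""
    n = len(batch)
    mean = sum(batch) / n
    centered = [z - mean for z in batch]

    # Entries of the 2x2 covariance matrix V (eps keeps V well conditioned).
    v_rr = sum(c.real * c.real for c in centered) / n + eps
    v_ii = sum(c.imag * c.imag for c in centered) / n + eps
    v_ri = sum(c.real * c.imag for c in centered) / n

    # Closed-form V^(-1/2) for a symmetric positive-definite 2x2 matrix.
    s = math.sqrt(v_rr * v_ii - v_ri * v_ri)   # sqrt(det V)
    t = math.sqrt(v_rr + v_ii + 2.0 * s)       # sqrt(trace V + 2 sqrt(det V))
    scale = 1.0 / (s * t)
    w_rr, w_ii, w_ri = (v_ii + s) * scale, (v_rr + s) * scale, -v_ri * scale

    return [complex(w_rr * c.real + w_ri * c.imag,
                    w_ri * c.real + w_ii * c.imag) for c in centered]


def complex_affine(batch, g_rr=1 / math.sqrt(2), g_ii=1 / math.sqrt(2),
                   g_ri=0.0, beta=0j):
    """Apply the 2x2 scaling gamma and the complex shift beta."""
    return [complex(g_rr * z.real + g_ri * z.imag,
                    g_ri * z.real + g_ii * z.imag) + beta for z in batch]


def covariance(batch):
    """Return (V_rr, V_ii, V_ri) of a batch of complex values."""
    n = len(batch)
    mean = sum(batch) / n
    c = [z - mean for z in batch]
    return (sum(v.real ** 2 for v in c) / n,
            sum(v.imag ** 2 for v in c) / n,
            sum(v.real * v.imag for v in c) / n)


# Synthetic batch with strongly correlated I and Q parts and a nonzero mean.
random.seed(0)
batch = []
for _ in range(2000):
    i = random.gauss(0.0, 1.0)
    q = 0.8 * i + 0.3 * random.gauss(0.0, 1.0)
    batch.append(complex(i + 2.0, q - 1.0))

whitened = complex_whiten(batch)
normalized = complex_affine(whitened)
print(covariance(batch))
print(covariance(whitened))
```

After whitening, the real and imaginary parts have approximately unit variance and zero correlation. With the default initialization of the scaling parameter, the mean squared modulus of the output is approximately 1.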

C. COMPLEX-VALUED ACTIVATION
The rectified linear unit (ReLU) has become very popular for DNNs, since it avoids the vanishing gradients usually associated with sigmoidal activations. In this paper, we propose the Complex ReLU (or CReLU), a complex-valued activation that applies separate ReLUs to the real and imaginary parts of a neuron, which is given by:
$$\mathrm{CReLU}(z) = \mathrm{ReLU}(\mathrm{Re}(z)) + j\,\mathrm{ReLU}(\mathrm{Im}(z)) \tag{15}$$
The surface plots for the real and imaginary parts of CReLU are presented in Fig. 4.
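A minimal sketch of (15) on scalar inputs follows; the function name `crelu` and the sample values are chosen for illustration only.

```python
def crelu(z):
    """Apply ReLU separately to the real and imaginary parts of z."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


# One hypothetical sample from each quadrant of the complex plane.
samples = [complex(1.0, 2.0), complex(-1.0, 2.0),
           complex(-1.0, -2.0), complex(3.0, -4.0)]
outputs = [crelu(z) for z in samples]
print(outputs)
```

Only inputs in the first quadrant pass through unchanged; in the other quadrants the negative component is zeroed, so the activation changes both amplitude and phase.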

D. COMPLEX-VALUED DENSE LAYER
The dense layer often serves as a classifier in DNNs. To make full use of the complex-valued statistical information, we also present a mechanism, named the Complex Dense Layer, to learn complex-valued features while computing complex-valued classification results. Denote a complex-valued dense weight as $w = a + jb$ and a complex-valued input as $z = x + jy$. Similar to the complex-valued convolutional operation presented in Section III-A, we have:
$$wz = (a + jb)(x + jy) = (ax - by) + j(bx + ay) \tag{16}$$
This process is illustrated in Fig. 5.

E. COMPLEX-VALUED VGG ARCHITECTURE
The VGG model [40] is based on AlexNet [41] and has several unique features. Instead of using large receptive fields like AlexNet (11x11 with a stride of 4), VGG uses very small receptive fields (3x3 with a stride of 1). Because there are now three ReLU units instead of just one, the decision function is more discriminative. There are also fewer parameters (27 times the number of channels, while AlexNet has 49 times the number of channels). VGG incorporates 1x1 convolutional layers to make the decision function more non-linear without changing the receptive fields. The small convolution filters allow VGG to have a large number of weight layers, and in most cases more layers lead to improved performance. This is not an uncommon feature, though. GoogLeNet from Google Research, another model that uses deep CNNs and small convolution filters, also performed well in the 2014 ImageNet competition. The main challenge is how to make VGG accept the complex-valued signal data format. In this paper, we refer to [7] and design a VGG architecture for the raw I/Q waveforms of modulated signals, replacing the 2D convolutional layer with a 1D convolutional layer. We do not perform any expert feature extraction or other pre-processing on the raw radio signal.
Instead, we allow the network to learn raw time series features directly from the high dimension data. The architecture of the proposed CvVGG is given in Fig. 6. The network layout parameters are given in Table 1.

F. COMPLEX-VALUED RESNET ARCHITECTURE
Deeper neural networks are usually more difficult to train. The Deep Residual Network (ResNet) [42] is arguably the most groundbreaking work in the computer vision/DL community in the last few years. ResNet makes it possible to train up to hundreds or even thousands of layers while achieving compelling performance. Taking advantage of its powerful representational ability, the performance of many applications other than image classification has been boosted, such as object detection, face recognition, and Wi-Fi fingerprinting [43].
The main challenges in creating a complex-valued ResNet model are twofold. The first is how to make CvRN accept complex-valued signal data. The second is how to design a residual stack that can extract complex-valued signal features while guaranteeing that the projection shortcut matches the dimension. In this paper, we design a CvRN architecture for handling the raw I/Q waveforms of complex-valued wireless signals. We replace the 2D convolutional layer in the residual block with a 1D convolutional layer, and replace the real-valued convolutional layer and the BN layer with their complex-valued counterparts, respectively. We also utilize a 1x1 convolutional layer to match the dimension. A sketch of the proposed CvRN architecture is given in Fig. 7. The detailed CvRN network layout is given in Table 2, and the residual stack architecture is given in Fig. 8.

IV. EXPERIMENTAL
In this section, we investigate the performance of the proposed CvVGG and CvRN models for wireless signal recognition by comparing their classification accuracy, learning speed, computational complexity, and number of parameters with those of their real-valued counterparts. We choose two public datasets in our study: the RadioML2018.10a public dataset [19] and a real-world, over-the-air drone radio fingerprint dataset [46]. The experimental results validate that CvNNs are more suitable for raw I/Q waveform recognition than their real-valued counterparts.

1) RADIOML2018.10A
The RadioML2018.10a dataset contains 24 types of modulations, including several high-order modulations (QAM256 and APSK256) [19]. Generally speaking, these are often used in low-fading channels and high SNR environments, such as impulsive satellite links (e.g., DVB-S2X) [47]. The dataset was generated with many transmission impairments, such as carrier frequency offset (CFO), symbol rate offset (SRO), delay spread, and thermal noise. This dataset only takes into account observations over relatively short time windows. The number of samples is 1,024. When the decision-making process does not have enough time to wait for more data to improve certainty, such short-time classification is painful but inevitable. This is particularly common in real-world systems, such as those operating in environments where short bursts of signals occur or where observations are processed over time. One would not expect a classification rate close to 100% with this dataset when the signal-to-noise ratio (SNR) is low (Es/N0 ranges from -20 dB to +30 dB), which makes it a good benchmark for studying the proposed CvNNs. In this experiment, 70% of the dataset is used for training and the remaining 30% is used for validation and testing.

2) REAL-WORLD DRONE FINGERPRINT DATASET
Based on the system model, the drone dataset is composed of complex-valued radio frequency (RF) data collected from real drones [46]. These real drone signals are provided by a large drone database, which has collected valuable real drone RF data through the following three modules: the drones under analysis module, the flight control module, and the RF sensing module. We used three types of drone signals, as well as noise, from this database. More importantly, three types of drone activities were carried out by three different brands of drones, where different brands have different prices, protocols, and technologies. After I/Q sampling of the RF data of each drone activity, we obtain the drone signal dataset. [...] drone activity. These samples are randomly split with a proportion of 7:3 for the training and testing of the CvNNs. The length of each sample is 1,024. In order to better extract the features in the dataset, we split each sample into in-phase and quadrature components, in the form of a real-valued matrix of dimension 1,024 x 2. This dataset is used as input to the CvNN drone identification system for the identification of different drone signals. Fig. 9 shows how to construct an input tensor from I/Q waveforms.
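The construction of the 1,024 x 2 input matrix can be sketched as follows. The waveform here is a synthetic complex tone standing in for a recorded burst, and the names `iq_to_tensor` and `NUM_SAMPLES` are assumptions made for this example, not part of the dataset tooling.

```python
import cmath
import math

NUM_SAMPLES = 1024  # sample length stated in the text


def iq_to_tensor(samples):
    """Split complex samples into rows of [in-phase, quadrature]."""
    return [[z.real, z.imag] for z in samples]


# Hypothetical stand-in for one recorded burst: a unit-amplitude complex tone
# at a normalized frequency of 0.05 cycles per sample.
waveform = [cmath.exp(2j * math.pi * 0.05 * k) for k in range(NUM_SAMPLES)]

tensor = iq_to_tensor(waveform)
print(len(tensor), len(tensor[0]))
```

Column 0 holds the I component and column 1 the Q component, so a complex-valued layer can treat each row as one complex sample.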

B. HARDWARE AND MODEL CONFIGURATIONS
Experiments are performed on a SUGON server equipped with two Intel Xeon Bronze 3104 CPUs (6 cores, each at 1.7 GHz) with 256 GB of DDR4 RAM and two Nvidia Titan Xp GPUs, each with 12 GB of VRAM. Each model we tested occupies a maximum of 8 GB of GPU memory. The software we use is Keras 2.2.4 with TensorFlow 1.12.0 as the backend, and Python 3.7.5.
For both the RadioML2018.10a and drone RF datasets, an epoch is the number of iterations in which the total number of samples chosen is equal to the size of the training set. The networks are trained over 500 epochs with a batch size of 512. We find that adaptive learning rate annealing is helpful, and thus reduce the learning rate by a factor of 0.1 whenever the model fails to improve its validation accuracy for 10 epochs. We also allow early stopping once the validation accuracy does not improve over 30 epochs. The results we report in this section represent the best generalization accuracy over 500 epochs or up to the point where the early stopping criterion is met. This is a standard practice for obtaining unbiased error estimates when facing significant computational requirements for training such networks and exploring different architectures, configurations, and datasets. We evaluate both the SGD and Adam optimizers, and find that Adam is usually superior to SGD. Real-valued layers are initialized using the uniform distribution [48], while complex-valued layers leverage complex weight initialization.
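The annealing and early-stopping rules can be sketched as a small framework-independent controller. This is an illustration of the schedule described above, not the authors' training script: the class name `PlateauController`, the initial learning rate of 1e-3, and the simulated validation curve are all assumptions.

```python
class PlateauController:
    """Track validation accuracy, anneal the learning rate, and signal early stopping."""

    def __init__(self, lr, factor=0.1, lr_patience=10, stop_patience=30):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("-inf")
        self.lr_wait = 0     # epochs without improvement since last LR change
        self.stop_wait = 0   # epochs without improvement since the best epoch

    def update(self, val_acc):
        """Record one epoch; return True if training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.lr_wait = 0
            self.stop_wait = 0
            return False
        self.lr_wait += 1
        self.stop_wait += 1
        if self.lr_wait >= self.lr_patience:
            self.lr *= self.factor   # anneal by a factor of 0.1
            self.lr_wait = 0
        return self.stop_wait >= self.stop_patience


# Simulated validation accuracy: five improving epochs, then a plateau.
val_curve = [0.5, 0.6, 0.7, 0.8, 0.9] + [0.9] * 495   # at most 500 epochs

ctrl = PlateauController(lr=1e-3)   # the initial learning rate is an assumption
epochs_run = 0
for acc in val_curve:
    epochs_run += 1
    if ctrl.update(acc):
        break

print(epochs_run, ctrl.best, ctrl.lr)
```

In Keras, the same behavior is usually obtained with the `ReduceLROnPlateau` and `EarlyStopping` callbacks monitoring validation accuracy.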

C. CLASSIFICATION ACCURACY
After training the models, we obtain the classification accuracy results using the RadioML2018.10a and drone RF datasets. The results are presented in Figs. 10, 11, 12, and 13.

1) THE RADIOML 2018.10A DATASET
We test CvRN, ResNet, CvVGG, and VGG at every SNR level, and the validation accuracy results are plotted in Fig. 10. For better visualization of the results, we also present the confusion matrices of the four models at an SNR of 10 dB in Fig. 11. From Figs. 10 and 11, we have the following observations.
• When the SNR is less than 0 dB (i.e., when the SNR is medium or low), the CvRN and CvVGG curves are close to the corresponding real-valued curves. This is because the I/Q coherence information in the signal is severely polluted when the SNR is not high.
• When the SNR is higher than 0 dB (i.e., in the highly coherent region), it is evident that both CvRN and CvVGG outperform their real-valued counterparts. This is because CvNNs can better learn the I/Q coherence information in the signal, which is less polluted.

2) THE DRONE RF DATASET
We also investigate the CvNN performance with the drone RF dataset. The average classification accuracy results obtained by the four models are presented in Fig. 12.
To provide more insight into the identification results, the confusion matrices of the four schemes are presented in Fig. 13. From the comparisons in Figs. 12 and 13, we can conclude that the two CvNN models perform well in identifying RF I/Q signals and outperform their real-valued counterparts.
FIGURE 12. Comparison of the validation accuracy of the four models.
VOLUME XX, 2017

D. LEARNING SPEED
In this experiment, we use learning curves to measure the learning speed of the four schemes. The results obtained with the RadioML2018.10a dataset and the drone RF dataset are presented in Fig. 14. We can see that both CvNN models learn faster than their real-valued counterparts. We conjecture that the reasons are as follows.
• Signal coherence: Due to the signal coherence information in the raw I/Q data, a CvNN can learn faster since it can rely on more data sources than real-valued models.
• Degree of freedom: As discussed earlier, a CvNN has fewer degrees of freedom since its weights can be represented by amplitude and phase. When the degree of freedom is reduced, the arbitrariness of the solution is also reduced. Thus, the CvNNs can learn faster than their real-valued counterparts.

PARAMETERS
The number of floating point operations (FLOPs) is used as an approximation of the number of operations required by the model [49]. Also, the number of parameters is a metric of capacity, i.e., the ability to approximate functions. When there are too many parameters, the neural network tends to overfit the data. On the other hand, with too few parameters, the neural network tends to underfit the data. We conduct an experiment for CvRN, ResNet, CvVGG, and VGG to compare these models' FLOPs and number of parameters. The experimental results, obtained from the TensorFlow API, are presented in Fig. 15. Comparing the computational complexity and parameters, we can conclude that CvRN is a better choice than ResNet and, similarly, that CvVGG performs better than VGG. This is because they have fewer parameters and therefore require fewer FLOPs and less storage space than their real-valued counterparts.
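One way to see where a parameter saving can come from is to count the weights of a single 1D convolutional layer. The sketch below rests on an assumption that is not taken from Tables 1 and 2: the real-valued and complex-valued layers are matched in real-valued width, so that every pair of real channels forms one complex channel. Biases are omitted, and the function names and layer sizes are illustrative.

```python
def real_conv1d_params(c_in, c_out, k):
    """Weights of a real 1D convolution: one real number per (tap, in, out)."""
    return k * c_in * c_out


def complex_conv1d_params(c_in, c_out, k):
    """Weights of a complex 1D convolution on the same real-valued width.

    c_in and c_out are counted in real channels; pairs of them form
    complex channels, and each complex weight stores two real numbers.
    """
    return 2 * k * (c_in // 2) * (c_out // 2)


# Hypothetical layer: 64 real channels in and out, kernel size 3.
real_count = real_conv1d_params(64, 64, 3)
complex_count = complex_conv1d_params(64, 64, 3)
print(real_count, complex_count)
```

Under this assumption, the complex-valued layer stores half as many real numbers as the real-valued layer. The actual figures reported in Fig. 15 depend on the full layer configurations.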

V. CONCLUSION
In this paper, we demonstrated the effectiveness of the proposed CvNNs, which are powerful models for wireless signal recognition. Specifically, we presented a system that can identify the signal coherence information inherent in each modulated signal's raw I/Q waveforms. One of our main motivations was to explore the feasibility of building a DL-aided system that is able to recognize signal modulation from raw I/Q waveforms rapidly, robustly, and in real time. The proposed models are amenable to lifelong learning and to defending against adversarial attacks, due to their fast learning speed and high confidence in the classification results. Our experimental study using two public datasets showed that such a system could be readily deployed and operated within realistic environments, without requiring any prior knowledge of the underlying protocol or software implementations.
Our experimental study also quantified the effect of different factors across multiple dimensions of the testing and training processes. While our approach is robust and effective under realistic conditions, we believe that our work is an important exploratory step within a vast and challenging new space, with several interesting future directions identified, such as transfer learning and continual lifelong learning. In addition, combining traditional feature engineering, knowledge graphs, and deep learning for multimodal fusion learning is also a promising idea.