Deep Transfer Learning for IoT Attack Detection

The digital revolution has substantially changed our lives, and the Internet-of-Things (IoT) plays a prominent role in it. The rapid spread of IoT to most corners of life, however, leads to various emerging cybersecurity threats. Therefore, detecting and preventing potential attacks in IoT networks has recently attracted paramount interest from both academia and industry. Among various attack detection approaches, machine learning-based methods, especially deep learning, have demonstrated great potential thanks to their early detection capability. However, these machine learning techniques only work well when a huge volume of labeled data from IoT devices can be collected. The labeling process is usually time consuming and expensive, and thus may not be able to keep pace with quickly evolving IoT attacks in reality. In this paper, we propose a novel deep transfer learning (DTL) method that can learn from data collected from multiple IoT devices, not all of which are labeled. Specifically, we develop a DTL model based on two AutoEncoders (AEs). The first AE (AE1) is trained on the source datasets (source domains) in a supervised mode using the label information, and the second AE (AE2) is trained on the target datasets (target domains) in an unsupervised manner without label information. The transfer learning process attempts to force the latent representation (the bottleneck layer) of AE2 to be similar to the latent representation of AE1. After that, the latent representation of AE2 is used to detect attacks in the incoming samples in the target domain. We carry out intensive experiments on nine recent IoT datasets to evaluate the performance of the proposed model. The experimental results demonstrate that the proposed DTL model significantly improves the accuracy in detecting IoT attacks compared to the baseline deep learning technique and two recent DTL approaches.


I. INTRODUCTION
The Internet-of-Things (IoT) refers to connected devices, sensors, and actuators used in vehicles, electronic appliances, buildings, and structures. As sensors, data storage, and the Internet become cheaper, faster, and more tightly integrated, IoT devices will find more and more applications [1] (e.g., in smart buildings, smart cities, intelligent transportation systems, and healthcare). The rapid spread of IoT to most corners of life, however, leads to various emerging cybersecurity threats. This is because IoT devices are often limited in computing capability and energy, making them particularly vulnerable to adversaries. IoT devices are more exposed to cyber attacks than computers and, unfortunately, more difficult to protect from them [2], [3]. Consequently, detecting attacks to protect IoT devices from malicious behaviors is critical to broadening the applications of IoT [4]-[7].
The associate editor coordinating the review of this manuscript and approving it for publication was Omid Kavehei.
IoT attack detection methods can be categorized into signature-based and machine learning-based methods [8]-[10]. The signature-based methods [11]-[14] seek to find the signatures of IoT attacks in the incoming traffic. These methods require substantial prior knowledge of known IoT attacks to define the signatures. The machine learning-based methods, on the other hand, attempt to learn the features of normal and malicious data in the training/offline phase. In the predicting/online phase, these models are used to detect attacks in the incoming traffic. Thanks to their capability to automatically and progressively learn useful information/features from collected data, machine learning-based methods can detect various IoT attacks early [3], [9], [15]-[17]. However, the machine learning-based methods only perform well under an important assumption, i.e., the distributions of the training data and the predicting data are similar [18]. In many practical applications, this assumption does not always hold [19], [20]. In network security especially, new types of attacks (e.g., zero-day attacks) can be found on a daily basis [16]. As such, the practical IoT data seen by machine learning models (in the predicting/online phase) is usually very different from the data used during the training/offline phase. To alleviate this problem, a large volume of labeled training data from multiple IoT devices is often required. However, manually labeling a huge volume of data is very time consuming and expensive [21], [22]. This limits the practical deployment of machine learning-based methods in detecting IoT attacks for various scenarios.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Given the above, this work proposes a novel deep transfer learning (DTL) approach based on the AutoEncoder (AE) to enable further applications of machine learning in IoT attack detection. The proposed model is referred to as Multi-Maximum Mean Discrepancy AE (MMD-AE). MMD-AE can be trained on a dataset including both labeled samples (in the source domain) and unlabeled samples (in the target domain). After training, MMD-AE is used to predict IoT attacks in the incoming traffic in the target domain. Specifically, MMD-AE consists of two AEs: AE1 and AE2. AE1 is trained with labeled data while AE2 is trained on the unlabeled data. The whole model, i.e., MMD-AE, is trained to drive the latent representation of AE2 close to the latent representation of AE1. As a result, the latent representation of AE2 can be used to classify the unlabeled IoT data in the target domain. The major contributions of this paper are as follows:
• We propose a novel DTL model based on AEs, i.e., MMD-AE, that transfers knowledge, i.e., label information, from the source domain to the target domain. This model helps to lessen the problem of ''lack of label information'' in traffic datasets collected from IoT devices.
• We introduce the Maximum Mean Discrepancy (MMD) metric to minimize the distance between multiple hidden layers of AE1 and multiple hidden layers of AE2. This metric helps to improve the effectiveness of the knowledge transferred from the source to the target domain in IoT attack detection systems.
• We evaluate our proposed method using nine IoT attack datasets and compare its performance with the canonical deep learning model and the state-of-the-art TL models [18], [31]. The experimental results demonstrate the advantage of our proposed model over the other tested methods.
The rest of the paper is organized as follows. Section II highlights recent works on IoT attack detection. In Section III, we define a DTL model and briefly describe the AE architecture. The proposed model is then presented in Section IV. Section V discusses the experiment settings and Section VI provides detailed analysis and discussion of the experimental results. Finally, Section VII concludes with future work.

II. RELATED WORK
There are two main directions for cyberattack detection, i.e., signature-based and machine learning-based approaches, e.g., [8]-[10], [21]. The signature-based methods maintain a database of predefined signatures (i.e., patterns) that correspond to known IoT attacks and perform the detection task by comparing these to the incoming data stream [11]-[13], [24]. Zhang and Green II [11] proposed a lightweight and low-complexity algorithm to prevent Distributed Denial of Service (DDoS) attacks in which each IoT working node performs deep packet inspection to find attack signatures. If a sender repeatedly sends requests with the same content, its requests are flagged as malicious. Dietz et al. [12] proposed a solution to proactively block the spreading of IoT attacks and isolate vulnerable IoT devices. Each IoT device is verified in two steps, i.e., scanning for open ports and services, and using a predefined list of commonly known credentials to check authentication. After that, a list of predefined rules is used to isolate the vulnerable IoT devices. Nobakht et al. [13] proposed a solution for IoT attack detection using a Software Defined Network with the OpenFlow protocol to address malicious behaviours and block intruders from accessing the IoT devices. This method incorporates a database of all known in-home IoT devices along with the corresponding patterns of potential security risks. The detection method then simply matches the IoT traffic against the signatures of security risks stored in the database. The advantage of the signature-based methods is that they provide an attack detection system with a low false positive rate [24]. However, they require prior human knowledge about the behaviours of known IoT attacks to design the database of attack signatures. Thus, the accuracy of these methods depends on the quality of the signature databases. Moreover, as the size of the databases increases, the processing time (i.e., search time) can become excessive [24].
The machine learning-based methods first train the detection models on data samples collected in IoT networks. Then, the trained models are used to classify new incoming IoT data samples as normal or attack data. The popular traditional machine learning algorithms for IoT attack detection are Decision Tree (C4.5), Support Vector Machine (SVM), K-Nearest Neighbour, Bayes Classifier, and Neural Networks [8], [24]. Recently, the deep learning approach has been widely used and has achieved high performance in detecting cyberattacks [3], [9], [15]-[17]. Among deep learning approaches, AE-based models project the original data to a new latent representation space to improve the accuracy in detection tasks [3], [15], [16]. Nevertheless, to train a good machine learning model for detecting IoT attacks, it is usually necessary to label a huge volume of training data as normal or attack [24]. Moreover, general machine learning models often need to assume that the data distribution of the training datasets is similar to the data distribution of the predicting datasets. This assumption, however, is usually not practical [19], [20], [25].
Recently, DTL techniques have been used to handle the above issues of machine learning methods where training data from a source domain and test data from a target domain are drawn from different distributions. A DTL model attempts to reduce the distribution divergence between the source domain and the target domain [25]. As a result, the trained knowledge of a learning task (e.g., classification) on the source domain can be used to support the learning task on the similar target domain [19], [25]- [27]. Gou et al. [28] applied an instance-based DTL approach in network intrusion detection that requires label information from the target domain. Zhao et al. [29] proposed the feature-based DTL technique to project the source and the target domain into the latent subspace via linear transformations, i.e., Principal Component Analysis (PCA) for network attack detection. However, PCA is a linear mapping technique that only works well with a simple data feature set [30].
Our proposed DTL model in this paper, i.e., MMD-AE, leverages a non-linear mapping, i.e., AE, to improve the performance of IoT attack detection on the target domain.
The key idea of our proposed DTL model (compared with previous AE-based DTL methods [18], [31]) is that the knowledge of the features in every encoding layer (instead of only the bottleneck layer, as in previous works) is transferred to the target domain. This helps to force the latent representation of the target domain to be similar to the latent representation of the source domain. The experimental results illustrate the effectiveness of our proposed DTL model on the IoT attack detection task in the target domain.

III. FUNDAMENTAL BACKGROUND
This section presents the fundamental background of our proposed model.

A. TRANSFER LEARNING
Transfer learning (TL) refers to the situation where what has been learned in one learning task is exploited to improve generalization in another learning task [33]. Fig. 1 compares traditional machine learning methods including deep learning and TL models. In traditional machine learning, the datasets and training processes are separated for different learning tasks. Thus, no knowledge is retained/accumulated nor transferred from one model to another. In TL, the knowledge (i.e., features, weights, etc.) from previously trained models in a source domain is used for training newer models in a target domain. Moreover, TL can even handle the problems of having less data or no label information in the target domain.
TL is often used to transfer knowledge learnt from a source domain to a target domain where the target domain is different from the source domain but the two have related data distributions. We consider a TL method with an input space $X$ and its label space $Y$; the two domain distributions are the source domain $D_S$ and the target domain $D_T$. Two corresponding samples are given, i.e., the source sample $\{(x_S^i, y_S^i)\}_{i=1}^{n_S}$ and the target sample $\{x_T^i\}_{i=1}^{n_T}$, where $n_S$ and $n_T$ are the numbers of samples in the source domain and the target domain, respectively. In this paper, the TL model based on a deep neural network, i.e., deep transfer learning (DTL), is trained on the labeled data in the source domain and the unlabeled data in the target domain. After that, the trained model is used for IoT attack detection in the target domain.

B. AUTOENCODERS
This subsection describes the structure and the training process of an AutoEncoder (AE), which is fundamental to our DTL model. We develop the TL models based on AE because these models have proved to be the most effective deep neural networks for IoT attack detection [2], [3], [15], [16]. Additionally, to demonstrate the effectiveness of the proposed model, we compare it with previous DTL techniques that are also based on AE.
An AE is a neural network trained to reconstruct the network's input at its output [34]. This network has two parts, i.e., an encoder and a decoder, as shown in Fig. 2. Let $W$, $W'$, $b$, and $b'$ denote the weight matrices and the bias vectors of the encoder and the decoder, respectively, and let $X = \{x_1, x_2, \ldots, x_n\}$ be a training dataset. $\phi = (W, b)$ and $\theta = (W', b')$ are the parameter sets for training the encoder and the decoder, respectively. Let $q_\phi$ denote the encoder and $z_i$ denote the representation of the input data $x_i$. The encoder maps the input $x_i$ to the latent representation $z_i$ (as in (1)). The decoder $p_\theta$ attempts to map the latent representation $z_i$ back into the input space. Therefore, the output of the decoder, i.e., $\hat{x}_i$, has the same form as the input (as in (2)).

$$z_i = q_\phi(x_i) = a_f(W x_i + b) \tag{1}$$

$$\hat{x}_i = p_\theta(z_i) = a_g(W' z_i + b') \tag{2}$$

where $a_f$ and $a_g$ are the activation functions of the encoder and the decoder, respectively. Fig. 2 shows an example of an AE with input dimension $n$, 5 layers, and a bottleneck layer of size 2.

FIGURE 2. Architecture of an AutoEncoder (AE).
The AE model is trained by minimizing a loss function called the Reconstruction Error (RE). RE is the difference between the input $x_i$ and the output $\hat{x}_i$ as in (3). This term encourages the decoder to learn to reconstruct the original data. If the decoder's output does not reconstruct the data well, it incurs a large cost in this loss term.

$$\ell_{RE}(\phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} l(x_i, \hat{x}_i) \tag{3}$$

where $l(x_i, \hat{x}_i)$ measures the difference between the input $x_i$ and the output $\hat{x}_i$. In the AE model, the mean squared error (MSE) is commonly used [16].
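The encoder and decoder mappings and the reconstruction error described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses a single encoding layer and a single decoding layer (the AE in this paper has 5 layers), Sigmoid for both $a_f$ and $a_g$, random untrained weights, and hypothetical helper names (`affine`, `encode`, `decode`, `reconstruction_error`) that do not come from the paper.

```python
import math
import random

def sigmoid(v):
    # Element-wise logistic activation, standing in for a_f and a_g (an assumption).
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, b, x):
    # Computes W x + b, where W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode(x, W, b):
    # Encoder: maps an input x_i to its latent representation z_i.
    return sigmoid(affine(W, b, x))

def decode(z, W_dec, b_dec):
    # Decoder: maps z_i back into the input space, giving x_hat_i.
    return sigmoid(affine(W_dec, b_dec, z))

def mse(x, x_hat):
    # Per-sample difference l(x_i, x_hat_i), here the mean squared error.
    return sum((a - c) ** 2 for a, c in zip(x, x_hat)) / len(x)

def reconstruction_error(X, W, b, W_dec, b_dec):
    # Average of l(x_i, x_hat_i) over the dataset.
    return sum(mse(x, decode(encode(x, W, b), W_dec, b_dec)) for x in X) / len(X)

# Toy data and random (untrained) parameters, for illustration only.
rng = random.Random(0)
n_in, n_latent = 4, 2
W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_latent)]
b = [0.0] * n_latent
W_dec = [[rng.uniform(-1, 1) for _ in range(n_latent)] for _ in range(n_in)]
b_dec = [0.0] * n_in
X = [[rng.random() for _ in range(n_in)] for _ in range(8)]
re_value = reconstruction_error(X, W, b, W_dec, b_dec)
```

Training would adjust the encoder and decoder parameters to reduce `re_value`; the optimization step is omitted here.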

C. MAXIMUM MEAN DISCREPANCY (MMD)
Maximum mean discrepancy (MMD) is a metric used to estimate the discrepancy between two distributions. MMD is more flexible than the Kullback-Leibler (KL) divergence [31] thanks to its ability to estimate a nonparametric distance [35]. Moreover, MMD does not require computing the intermediate density of the distributions, thus avoiding the need for sophisticated optimization [36]. The MMD of two datasets can be formulated as in (4) [37].

$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{n_S} \sum_{i=1}^{n_S} \xi_S(x_S^i) - \frac{1}{n_T} \sum_{i=1}^{n_T} \xi_T(x_T^i) \right\|_{\mathcal{H}} \tag{4}$$

where $n_S$ and $n_T$ are the numbers of samples of the source and target domain, respectively. $\xi_S$ and $\xi_T$ denote the representations of the source data, i.e., $x_S^i$, and the target data, i.e., $x_T^i$, respectively. $\|\cdot\|_{\mathcal{H}}$ represents the 2-norm operation in a Reproducing Kernel Hilbert Space (RKHS) [37].
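As a concrete illustration of the MMD definition above, the following minimal sketch computes the distance between the mean representations of two sample sets. It assumes the identity feature map (equivalent to a linear kernel), so the RKHS norm reduces to the ordinary Euclidean norm; the kernel actually used in the experiments is not stated in this section, and the function names are hypothetical.

```python
import math

def mean_vector(samples):
    # Column-wise mean of a list of equal-length vectors.
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def mmd_linear(source, target):
    # Norm of the difference between the two mean representations.
    # Assumption: identity feature map, so this is a Euclidean norm.
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(mu_s, mu_t)))

source = [[0.0, 0.0], [2.0, 2.0]]   # mean is (1, 1)
target = [[4.0, 5.0], [4.0, 5.0]]   # mean is (4, 5)
distance = mmd_linear(source, target)  # difference (3, 4) has norm 5
```

The two sets may contain different numbers of samples, matching the separate $n_S$ and $n_T$ in the definition.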

IV. PROPOSED TRANSFER LEARNING APPROACH FOR IoT CYBERATTACK DETECTION
This section presents our proposed DTL model for IoT attack detection. We first describe the overview of the system structure. After that, the DTL model is discussed in detail.

A. SYSTEM STRUCTURE
Fig. 3 presents the system structure that uses DTL for IoT attack detection. First, the data collection module gathers data from all IoT devices. The training data consists of both labeled and unlabeled data. The labeled data is collected from some IoT devices which are dedicated to labeling data. The labeling process is usually executed in two steps [22]: each data sample is extracted from captured packets using the Tcptrace tool [38], then the data sample is labeled as a normal sample or an attack sample by manually analyzing the flow using the Wireshark software [39]. Usually, the number of IoT devices with labeled data is much smaller than the number of IoT devices with unlabeled data. Second, the collected data is passed to the DTL model for training. The training process attempts to transfer the knowledge learnt from the data with label information to the data without label information. This is achieved by minimizing the difference between the latent representations of the source data and the target data. After training, the trained DTL model is used in the detection module, which classifies incoming traffic from all IoT devices as normal or attack data. The detailed description of the DTL model is presented in the next subsection.

B. TRANSFER LEARNING MODEL
The proposed DTL model (i.e., MMD-AE) includes two AEs (i.e., AE1 and AE2) that have the same architecture, as shown in Fig. 4. The input of AE1 is the data samples from the source domain ($x_S^i$) while the input of AE2 is the data samples from the target domain ($x_T^i$). The training process attempts to minimize the MMD-AE loss function. This loss function includes three terms: the reconstruction error ($\ell_{RE}$) term, the supervised ($\ell_{SE}$) term, and the Multi-Maximum Mean Discrepancy ($\ell_{MMD}$) term.
We assume that $\phi_S$, $\theta_S$, $\phi_T$, $\theta_T$ are the parameter sets of the encoder and decoder of AE1 and AE2, respectively. The first term, $\ell_{RE}$, including $RE_S$ and $RE_T$ in Fig. 4, attempts to reconstruct the input layers at the output layers of both AEs. In other words, $RE_S$ and $RE_T$ try to reconstruct the input data $x_S$ and $x_T$ at their outputs from the latent representations $z_S$ and $z_T$, respectively. Thus, this term encourages the two AEs to retain the useful information of the original data in the latent representation. Consequently, we can use the latent representations for classification tasks after training. Formally, the RE term is calculated as follows:

$$\ell_{RE}(x_S^i, \phi_S, \theta_S, x_T^i, \phi_T, \theta_T) = l(x_S^i, \hat{x}_S^i) + l(x_T^i, \hat{x}_T^i) \tag{5}$$

where $l$ is the MSE function [16], and $x_S^i$, $\hat{x}_S^i$, $x_T^i$, $\hat{x}_T^i$ are the data samples at the input layers and the output layers of the source domain and the target domain, respectively.
The second term, $\ell_{SE}$, aims to train a classifier at the latent representation of AE1 using the label information in the source domain. In other words, this term attempts to map the values of the two neurons at the bottleneck layer of AE1, i.e., $z_S$, to their label information $y_S$. This is achieved by using the softmax function [33] to minimize the difference between $z_S$ and $y_S$. It should be noted that the number of neurons in the bottleneck layer must be the same as the number of classes in the source domain. This loss encourages the latent representation space to separate samples with different class labels. Formally, this loss is defined as follows:

$$\ell_{SE}(x_S^i, y_S^i, \phi_S) = -\sum_{j=1}^{C} y_S^{i,j} \log\left(z_S^{i,j}\right) \tag{6}$$

where $z_S^i$ and $y_S^i$ are the latent representation and the label of the source data sample $x_S^i$, $y_S^{i,j}$ and $z_S^{i,j}$ represent the $j$-th elements of the vectors $y_S^i$ and $z_S^i$, respectively, and $C$ is the number of classes.

The third term, $\ell_{MMD}$, transfers the knowledge of the source domain to the target domain. The transferring process is executed by minimizing the MMD distances between every encoding layer of AE1 and the corresponding encoding layer of AE2. This term aims to bring the representations of the source data and the target data close together. The MMD loss term is described as follows:

$$\ell_{MMD}(x_S^i, \phi_S, x_T^i, \phi_T) = \sum_{k=1}^{K} \mathrm{MMD}\left(\xi_S^k(x_S^i), \xi_T^k(x_T^i)\right) \tag{7}$$

where $K$ is the number of encoding layers in the AE-based model, $\xi_S^k(x_S^i)$ and $\xi_T^k(x_T^i)$ are the $k$-th encoding layers of AE1 and AE2, respectively, and $\mathrm{MMD}(\cdot, \cdot)$ is the MMD distance presented in (4).
Algorithm 1 presents the pseudo-code for training our proposed DTL model. The training samples with labels in the source domain are input to AE1 while the training samples without labels in the target domain are input to AE2. The training process attempts to minimize the loss function in (8). After training, AE2 is used to classify the testing samples in the target domain as in Algorithm 2.
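To make the role of the three loss terms concrete, the sketch below evaluates them for one batch, given activations that have already been computed by AE1 and AE2. Several points are assumptions made for illustration and are not taken from the paper: the three terms are added without weights, the softmax is applied to the bottleneck values before the cross-entropy, the MMD uses the identity feature map, and all function and argument names are hypothetical.

```python
import math

def mse(x, x_hat):
    # Reconstruction difference for one sample.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, y_onehot):
    # Supervised term for one source sample (softmax placement is an assumption).
    return -sum(y * math.log(p) for y, p in zip(y_onehot, softmax(z)) if y > 0)

def mmd_linear(a, b):
    # Distance between mean activations (identity feature map, an assumption).
    mean_a = [sum(c) / len(a) for c in zip(*a)]
    mean_b = [sum(c) / len(b) for c in zip(*b)]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(mean_a, mean_b)))

def mmd_ae_loss(x_s, x_s_hat, x_t, x_t_hat, z_s, y_s, enc_s, enc_t):
    # enc_s / enc_t hold one list of sample activations per encoding layer.
    re = (sum(map(mse, x_s, x_s_hat)) / len(x_s)
          + sum(map(mse, x_t, x_t_hat)) / len(x_t))
    se = sum(map(cross_entropy, z_s, y_s)) / len(z_s)
    mmd = sum(mmd_linear(s, t) for s, t in zip(enc_s, enc_t))
    return re + se + mmd  # unweighted sum: an assumption of this sketch
```

Because the MMD term is summed over every encoding layer, adding layers to `enc_s` and `enc_t` is what distinguishes this loss from one that aligns only the bottleneck.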

Algorithm 1 Training the Proposed DTL Model
INPUT: $x_S$, $y_S$: Training data samples and corresponding labels in the source domain

Our key idea in the proposed model, i.e., MMD-AE, compared with the previous DTL models [18], [31], is to transfer the knowledge not only in the bottleneck layer but also in every encoding layer from the source domain, i.e., AE1, to the target domain, i.e., AE2. In other words, MMD-AE transfers more knowledge from the source domain to the target domain. One possible limitation of MMD-AE is that it may incur additional training time, since the distance between multiple layers of the encoders in AE1 and AE2 is evaluated. However, in the predicting phase, only AE2 is used to classify incoming samples in the target domain. Therefore, this model does not increase the predicting time compared to other AE-based models.
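The predicting phase, in which only the target-domain encoder is used, might look like the following sketch. The details are assumptions rather than the paper's Algorithm 2: the encoder is passed in as a callable named `encode_target`, the predicted class is the index of the largest bottleneck value, and neuron index 1 is taken to correspond to the attack class.

```python
import math

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(samples, encode_target, attack_index=1):
    # Returns (predicted_label, attack_score) for each incoming sample.
    # encode_target: hypothetical callable mapping a sample to its bottleneck vector.
    results = []
    for x in samples:
        probs = softmax(encode_target(x))
        label = max(range(len(probs)), key=probs.__getitem__)
        results.append((label, probs[attack_index]))
    return results

# Toy stand-in for a trained encoder with a 2-neuron bottleneck.
toy_encoder = lambda x: [1.0 - x[0], x[0]]
predictions = predict([[0.9], [0.1]], toy_encoder)
```

The attack score returned for each sample is a continuous value, so it can be ranked across samples when a threshold-free metric is needed.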

V. EXPERIMENTAL SETTING
This section presents the datasets, the performance metrics, the hyper-parameter settings and the sets of the experiments in our paper.

A. DATASETS
To evaluate the performance of MMD-AE, we used nine IoT attack detection datasets from Meidan et al. [3]. These datasets were collected from nine commercial IoT devices in their lab. Each IoT dataset includes five or ten DDoS attacks depending on the type of IoT device, such as scanning the network for vulnerable devices (scan), sending spam data (junk), UDP flooding (udp), TCP flooding (tcp), and sending spam data and opening a connection to a specified IP address and port (combo). Each dataset is divided into a training set (70% of the benign data samples and two random types of attacks) and a testing set (30% of the benign data samples and the rest of the attacks). Thus, many attack types are not included in the training data. Each data sample has 115 attributes extracted from the packet stream. The sizes of the training and testing sets are presented in Table 1.
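The train/test split described above can be sketched as follows. The function name, the argument layout (a list of benign samples plus a dictionary of attack samples keyed by attack type), the label encoding (0 for benign, 1 for attack), and the fixed seed are all assumptions made for this illustration.

```python
import random

def split_dataset(benign, attacks_by_type, train_fraction=0.7,
                  n_train_attacks=2, seed=0):
    # 70% of benign samples plus two randomly chosen attack types go to training;
    # the remaining benign samples and all other attack types go to testing.
    rng = random.Random(seed)
    benign = list(benign)
    rng.shuffle(benign)
    cut = round(len(benign) * train_fraction)
    train_types = set(rng.sample(sorted(attacks_by_type), n_train_attacks))
    train = [(x, 0) for x in benign[:cut]]
    test = [(x, 0) for x in benign[cut:]]
    for name, samples in attacks_by_type.items():
        bucket = train if name in train_types else test
        bucket.extend((x, 1) for x in samples)
    return train, test, train_types
```

With this split, every attack type in the testing set is unseen during training, which is what makes the evaluation a test of generalization to new attacks.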

B. EVALUATION METRIC
To evaluate the effectiveness of the proposed model, we use a popular performance metric, i.e., the Area Under the Curve (AUC) score. The advantage of AUC includes two aspects. First, it is scale-invariant. In other words, the AUC score measures how well predictions are ranked, rather than their absolute values. Second, AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen [40].
The Receiver Operating Characteristic (ROC) curve is created by plotting the True Positive Rate (TPR), or sensitivity,¹ against the False Positive Rate (FPR)² at various threshold settings. The area under the ROC curve is the AUC score [40]. It measures the average quality of the classification model at different thresholds.
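Because the AUC depends only on how predictions are ranked, it can be computed directly as the fraction of (attack, benign) pairs in which the attack sample receives the higher score, with ties counted as one half. The sketch below uses this pairwise formulation; the function name and the label encoding (1 for attack, 0 for benign) are assumptions, and the quadratic loop is meant for illustration rather than for large test sets.

```python
def auc_score(labels, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

Rescaling all scores by a positive constant leaves the result unchanged, which is the scale invariance mentioned above.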

C. HYPER-PARAMETERS SETTING
The same configuration is used for all AE-based models in our experiments. This configuration is based on the AE-based models for detecting network attacks in the literature [2], [3], [15], [16]. As we integrate the $\ell_{SE}$ loss term into MMD-AE, the number of neurons in the bottleneck layer is equal to the number of classes in the IoT dataset, i.e., 2 neurons in this paper. The number of layers, including both the encoding layers and the decoding layers, is 5. The ADAM algorithm [41] is used for optimizing the models in the training process. The ReLU function is used as the activation function of the AE layers except for the last layers of the encoder and decoder, where the Sigmoid function is used. For all datasets, we select 10% of the training data as the validation set for early stopping. This technique stops the training process automatically. The performance of each model is evaluated on the validation set at the end of every 10 epochs. If the AUC score decreases, the training procedure is stopped.
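The early stopping rule described above can be sketched as a small training driver. This is a sketch under assumptions: the two callables (`train_one_epoch`, `validation_auc`) and the `max_epochs` cap are hypothetical, and a decrease is judged against the AUC measured at the previous check, which is one possible reading of the rule.

```python
def train_with_early_stopping(train_one_epoch, validation_auc,
                              max_epochs=1000, check_every=10):
    # Runs training epochs and checks the validation AUC every `check_every` epochs.
    # Stops as soon as the AUC is lower than at the previous check.
    previous_auc = None
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if epoch % check_every == 0:
            current_auc = validation_auc()
            if previous_auc is not None and current_auc < previous_auc:
                return epoch
            previous_auc = current_auc
    return max_epochs

# Toy usage with simulated validation scores instead of a real model.
simulated_aucs = iter([0.60, 0.70, 0.65, 0.90])
epochs_run = []
stopped_at = train_with_early_stopping(lambda: epochs_run.append(1),
                                       lambda: next(simulated_aucs))
```

In the toy run the third check (epoch 30) sees the AUC fall from 0.70 to 0.65, so training stops there and the later, higher score is never reached.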

D. EXPERIMENTAL SETS
We carried out three sets of experiments in this paper. The first set investigates how effective our proposed model is at transferring knowledge from the source domain to the target domain. We compare the MMD distances between the bottleneck layers of the source domain and the target domain after training when the transferring process is executed in one, two, and three encoding layers. The smaller the MMD distance, the more effective the transferring process from the source to the target domain [42].
The second set is the main result of the paper, in which we compare the AUC scores of MMD-AE with AE and two recent DTL models [18], [31]. All methods are trained using the training set including the source dataset with label information and the target dataset without label information. After training, the trained models are evaluated using the target dataset. The methods compared in this experiment include the original AE (i.e., AE), the DTL model using the KL metric at the bottleneck layer (i.e., SKL-AE), the DTL method using the MMD metric at the bottleneck layer (i.e., SMD-AE), and our model (MMD-AE).
¹ TPR measures the proportion of actual positive samples that are correctly identified.
² FPR measures the ratio between the number of negative samples wrongly categorized as positive samples (false positives) and the total number of actual negative samples.
The third set measures the processing time of the training and the predicting processes of the above evaluated methods. The detailed results of the three experimental sets are presented in the next section.

VI. RESULTS
This section presents the results of the three sets of experiments in our paper.

A. EFFECTIVENESS OF TRANSFERRING INFORMATION IN MMD-AE
MMD-AE implements multiple transfers between the encoding layers of AE1 and AE2 to force the latent representation of AE2 closer to the latent representation of AE1. To evaluate whether MMD-AE achieves this objective, we conducted an experiment in which IoT-1 is selected as the source domain and IoT-2 is the target domain. We measured the MMD distance between the latent representations, i.e., the bottleneck layers, of AE1 and AE2 when the transfer is implemented in one, two, and three layers of the encoders. The smaller the distance, the more information is transferred from the source domain (AE1) to the target domain (AE2). The result is presented in Fig. 5. The figure shows that implementing the transferring task on more layers results in a smaller MMD distance.
In other words, more information can be transferred from the source to the target domain when the transferring task is implemented on more encoding layers. This result provides evidence that our proposed solution, MMD-AE, is more effective than the previous DTL models that perform the transferring task only at the bottleneck layer of the AE.

B. PERFORMANCE COMPARISON
Table 2 presents the AUC scores of AE, SKL-AE, SMD-AE, and MMD-AE when they are trained on the labeled dataset given in the columns and the unlabeled dataset given in the rows, and tested on the dataset given in the rows. In this table, the result of MMD-AE is printed in boldface. We can observe that AE is the worst among the tested methods. When an AE is trained on one IoT dataset (the source) and evaluated on other IoT datasets (the target), it does not perform well. The reason is that the predicting data in the target domain is very different from the training data in the source domain.
Conversely, the results of the three DTL models are much better than that of AE. For example, if the source dataset is IoT-1 and the target dataset is IoT-3, the AUC score is improved from 0.600 to 0.745 and 0.764 with SKL-AE and SMD-AE, respectively. These results show that using DTL helps to improve the accuracy of AEs in detecting IoT attacks in the target domain.
More importantly, our proposed method, i.e., MMD-AE, achieves the highest AUC score on almost all IoT datasets.³ For example, the AUC score is 0.937 compared to 0.600, 0.745, and 0.764 for AE, SKL-AE, and SMD-AE, respectively, when the source dataset is IoT-1 and the target dataset is IoT-3. The results on the other datasets are similar to the results on IoT-3. These results demonstrate that implementing the transferring task in multiple layers of MMD-AE helps the model to transfer the label information from the source to the target domain more effectively. Consequently, MMD-AE often achieves better results than AE, SKL-AE, and SMD-AE in detecting IoT attacks in the target domain.

C. PROCESSING TIME ANALYSIS
Fig. 6 shows the training and the predicting time of the tested models when the source domain is IoT-2 and the target domain is IoT-1.⁴ In this figure, the training time is measured in hours and the predicting time is measured in seconds. It can be seen that the training process of the DTL methods (i.e., SKL-AE, SMD-AE, and MMD-AE) is more time consuming than that of AE. One of the reasons is that the DTL models need to evaluate the MMD distance between AE1 and AE2 at every iteration, while this calculation is not required in AE. Moreover, the training time of MMD-AE is much higher than those of SKL-AE and SMD-AE, since MMD-AE needs to calculate the MMD distance between every pair of corresponding encoding layers whereas SKL-AE and SMD-AE only calculate the distance metric at the bottleneck layer. However, it is important to note that the predicting time of all DTL methods is nearly equal to that of AE. The reason is that the testing samples are passed through only one AE in all tested models. For example, the total predicting times of AE, SKL-AE, SMD-AE, and MMD-AE are 1.001, 1.112, 1.110, and 1.108 seconds, respectively, on the 778,810 testing samples of the IoT-1 dataset.
³ The AUC scores of the proposed model in each scenario are presented in bold text.
⁴ The results on the other datasets are similar to this result.

VII. CONCLUSION
In this paper, we have introduced a novel DTL-based approach for IoT network attack detection, namely MMD-AE. The proposed approach aims to address the problem of ''lack of labeled information'' for training detection models in ubiquitous IoT devices. Specifically, the labeled data and the unlabeled data are fed into two AE models with the same network structure. Moreover, the MMD metric is used to transfer knowledge from the first AE to the second AE. Compared with the previous DTL models, MMD-AE operates at all the encoding layers instead of only the bottleneck layer.
We have carried out extensive experiments to evaluate the strength of our proposed model in many scenarios. The experimental results demonstrate that DTL approaches can enhance the AUC score for IoT attack detection. Furthermore, our proposed DTL model, i.e., MMD-AE, which performs the transfer at every encoding layer of the AEs, improves the effectiveness of the transferring process. Thus, the proposed model is useful when label information is available in the source domain but not in the target domain.
One limitation of the proposed model is that it requires more time to train. However, the predicting time of MMD-AE is nearly the same as that of the other AE-based models. In the future, our current work can be extended in several directions. First, we will distribute the training process across multiple IoT nodes by using the federated learning technique to speed up this process. Second, the current DTL model is developed based on the AutoEncoder. We will attempt to extend this model to other neural networks such as the Deep Adaptation Network (DAN), Adversarial Discriminative Domain Adaptation (ADDA), Maximum Classifier Discrepancy (MCD), and the Conditional Domain Adversarial Network (CDAN) [43].
LY VU received the M.S. degree from Inha University, South Korea, in 2014. She is currently pursuing the Ph.D. degree in the major of mathematics theory for information technology with Le Quy Don Technical University, Vietnam. Her research interests include data mining, machine learning, deep learning, and network security.