Traffic Prediction of Wireless Cellular Networks Based on Deep Transfer Learning and Cross-Domain Data

Wireless cellular traffic prediction is a critical issue for researchers and practitioners in the 5G/B5G field. However, it is very challenging because wireless cellular traffic usually shows strong nonlinearity and complex patterns. Most existing wireless cellular traffic prediction methods lack the ability to model the dynamic spatial-temporal correlations of wireless cellular traffic data and therefore cannot yield satisfactory prediction results. To improve the accuracy of 5G/B5G cellular network traffic prediction, more cross-domain data are considered, and a cross-service and regional fusion transfer learning strategy (Fusion-transfer) based on the spatial-temporal cross-domain neural network model (STC-N) is proposed. Multiple cross-domain datasets are integrated, and the training accuracy on the target service domain is improved by exploiting the data characteristics of its source service domain according to the similarity between services and the similarity between regions, which enhances the predictive performance of the model. The experimental results show that the prediction accuracy of the traffic prediction model improves significantly after multiple cross-domain datasets are integrated: the RMSE of the SMS, Call and Internet services improves by about 8.39%, 13.76% and 5.7%, respectively. In addition, compared with existing transfer strategies, the RMSE of the three services improves by about 2.48% to 13.19%.


I. INTRODUCTION
As the 5G/B5G era arrives, the number of mobile devices and Internet of Things devices is growing exponentially worldwide, and demand for wireless mobile data is rising rapidly. According to Cisco's ''VNI Forecast Highlights Tool'', global mobile data traffic will increase six-fold from 2017 to 2022, an annual growth rate of 42% [1]. How to allocate and optimize existing cellular network resources scientifically and rationally, improve resource utilization, and reduce the energy consumption of cellular base stations are problems that the communications industry needs to consider further and solve. The mobile Internet is closely related to people's lives, and mobile data traffic usage reflects people's lives to a certain extent and therefore contains a great deal of information [2]. Accurate prediction of wireless cellular traffic is helpful for base station site selection, urban area planning and regional traffic prediction. However, accurate prediction of wireless service traffic is a very challenging problem, mainly for the following three reasons. First, wireless communication network traffic originates from mobile users, and user mobility makes the traffic of different areas spatially dependent. In particular, new types of transportation make it possible for people to travel from one end of the city to the other in a short time, so the spatial dependency of wireless service traffic is not only local but also a large-scale global dependency. In addition, wireless traffic is dependent in the time dimension.
The traffic value at a certain moment is highly correlated with the values at adjacent moments (short-term dependence) and with the value at the same time on other days (periodicity). Second, wireless service traffic is spatially constrained by multi-source cross-domain data. The factors that affect wireless service traffic in a given area are diverse. When predicting wireless cellular traffic, we should not only mine the hidden regular patterns of traffic from historical data but also consider the spatial constraints that other cross-domain and cross-source data place on traffic. For example, the base station data, point-of-interest information, and level of social activity in an area all affect changes in traffic. Therefore, how to efficiently integrate these multi-source, cross-domain data, which do not seem directly related to wireless service traffic, is a difficult problem. Third, it is also difficult to achieve higher prediction accuracy for wireless cellular traffic while accounting for both temporal and spatial factors and combining cross-domain data.
The associate editor coordinating the review of this manuscript and approving it for publication was Arun Prakash.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The accuracy of wireless cellular traffic prediction is restricted by these problems, so new methods are urgently needed. In recent years, the era of big data has spawned in-depth applications of artificial intelligence technology. Artificial intelligence can learn changes, classify problems, and predict the future through interaction with the environment [3]. Big data-driven network intelligence plays an important role in reducing wireless network overhead and improving the quality of user experience [4]. Deep learning is a machine learning approach that learns features from data samples [5] and is a significant way to achieve artificial intelligence [6]. With the accumulation of massive wireless cellular traffic data and the maturity of machine learning (deep learning) technologies, deep learning-based methods for wireless cellular traffic prediction have become feasible, and how to use deep learning to accurately predict regional traffic changes has become a hot topic in the communication field. Reference [7] proposed a wireless traffic prediction framework based on deep belief networks. Experiments show that the framework can capture the long-term dependence of wireless traffic, but the solution considers only the temporal correlation of wireless traffic. Long short-term memory (LSTM) networks [8] have been combined with deep learning models to capture temporal correlation, but such models consider only temporal characteristics and not the spatial correlation between regions. Reference [9] showed that the LSTM model predicts cellular data traffic more accurately than the traditional statistical ARIMA model; this work provides the basic structure for later wireless cellular traffic prediction models. Reference [10] developed a deep traffic prediction model that combines an autoencoder with an LSTM network.
It pioneered the use of deep learning for spatiotemporal modeling in traffic prediction. The experimental results demonstrate that spatiotemporal correlation modeling is important for improving traffic prediction accuracy. However, each component of the model must be trained separately. Reference [11] designed a spatiotemporal modeling module composed of two neural network architectures, ConvLSTM and a convolutional network (ConvNet), to capture the dependence of traffic on the time and space dimensions simultaneously. This module can be trained end to end and supports large-scale prediction tasks. Nonetheless, [10] and [11] do not predict wireless cellular traffic in actual scenarios and consider only the traffic of a single region. Reference [12] further applies temporal and spatial dependence to a real city-scale scenario, making the proposed neural network architecture more robust and effective, but the model obtains good prediction results only for a certain time or region. Reference [13] uses LSTM to capture temporal correlation and adopts multi-task learning to integrate service traffic from different regions, but it does not consider the impact of other cross-domain data. In the field of crowd flow prediction, to consider the impact of cross-domain data more comprehensively in practical scenarios, [14] used LSTM to model spatial and temporal dependencies at different scales and merged multiple cross-domain data (weather, air quality, holiday information, etc.) to further improve the accuracy of crowd flow prediction. Reference [15] combined cross-domain data (point-of-interest distribution data) with wireless service traffic data and selected auxiliary traffic data based on distance as inputs to the traffic prediction model.
In that model, the wireless service traffic data serve as the main feature sequence, the point-of-interest data are passed through a discrete embedding learning network to generate external auxiliary features, and the two are fused and predicted step by step by a sequence-to-sequence (S2S) variant of the RNN [16]. This research demonstrates the effectiveness of multi-source data in wireless service traffic prediction. However, the cross-domain data considered in [14]-[16] are relatively limited, whereas [17] considers the impact of multiple cross-domain data on wireless cellular traffic prediction, analyzes their correlation with three different traffic services, and adds the base station (BS) data, point-of-interest (POI) distribution data and social information (Social) data of the area to the prediction model as auxiliary inputs. The ConvLSTM and CNN modules in that model capture the impact of space, time and various external factors well, capturing regional differences and similarities in both the spatial and temporal domains. However, [17] only analyzes individual auxiliary factors related to cellular traffic prediction; there is no systematic, comprehensive study of how models perform when incorporating different numbers of cross-domain datasets. In the field of airport delay prediction, [18] systematically added different numbers of cross-domain datasets (weather factors, flight operation data, etc.) to an airport delay prediction model. The results show that fusing the airport's weather information with related flight information yields higher delay prediction accuracy than adding only one cross-domain dataset. This work inspired us to investigate the impact of different cross-domain datasets on wireless cellular traffic prediction.
Transfer learning is a model optimization method: a model trained on a first task is applied to a second, related task, so that the model learns the second task faster and achieves better prediction performance. In [17], transfer learning is introduced between multiple services; the temporal and spatial similarity of different types of cellular services is fully exploited, and prediction performance is further improved. Thus, transfer learning between services can improve prediction accuracy, but the learned model is not further transferred across regions to achieve fusion transfer and improve accuracy beyond that baseline. Building on this work, we further study transfer learning for wireless cellular traffic prediction.
This paper proposes a deep learning method for regional traffic prediction based on multiple cross-domain big data, and analyzes in detail the gains to the model after adding different numbers of cross-domain big datasets. The results fully illustrate the necessity of adding different numbers of cross-domain datasets to the prediction of cellular traffic. Using the characteristics of transfer learning, a new fusion transfer learning strategy is proposed to further improve the prediction performance on the original model [17].
The main contributions of this article are as follows:
1) Considering the various factors that influence wireless cellular traffic is not only helpful for prediction results but also better meets practical needs. Unfortunately, there is little work on analyzing and processing cross-domain big datasets for this task. To fully analyze how the types and number of cross-domain big datasets affect traffic prediction, this paper uses the spatio-temporal cross-domain neural network (STC-N) as a benchmark model, takes different types of cross-domain big datasets as research objects, and separately discusses the impact of different amounts of cross-domain big data on traffic prediction accuracy. The reasons for the different influences are also analyzed. The experimental results verify that each additional cross-domain dataset improves prediction performance to varying degrees.
2) Since traffic changes across services and across regions are highly similar, transfer learning can be introduced into regional wireless traffic prediction. We fuse transfer learning between different services with successive transfer learning between clusters to obtain a new strategy, the cross-service and regional fusion transfer strategy (Fusion-transfer for short), and apply it to training the STC-N model. The trained model is then applied to the wireless service traffic dataset. Learning on the new datasets and tasks does not start from scratch but builds on prior knowledge: the source domain and the target domain share model parameters, that is, the model trained on a large amount of source-domain data is applied to the target domain for prediction. The experimental results show that Fusion-transfer outperforms the No-transfer and Part-transfer models, which indicates that the new fusion transfer strategy is better than the other transfer strategies. In this way, a model trained for one service in one region is applied to the other regions of that service and to the regions of other services. This transfer learning method greatly reduces the training data and computing power required to build a deep learning model with good generalization ability.
The rest of this article is organized as follows. Section II introduces the datasets and their preprocessing. Section III introduces the architecture of the neural network model and analyzes it layer by layer. Section IV presents the experimental results and evaluates them in detail. Section V concludes the paper.

II. DATASET
A. DATASET PREPROCESSING
The dataset used in this article is derived from the detailed wireless cellular traffic data of Milan area [19]. As shown in Fig.1, the data preprocessing in this paper goes through the following three steps.
Step 1: Data cleaning. The Milan wireless cellular traffic data span from 0:00 on November 1, 2013 to 23:00 on January 1, 2014. The Sms, Call and Internet traffic data are extracted for the experiments. Missing traffic data for an area in a given period are filled with the average traffic value of the surrounding areas or periods.
Step 2: Data screening. The original data are recorded at 10-minute intervals, and most recorded values are 0, which makes the data sparse. The data are therefore aggregated by hour and min-max normalized to speed up training.
Step 3: Data alignment. To facilitate the data formulation below, the city of Milan is divided into a 100 × 100 grid, and the cleaned wireless cellular traffic data and cross-domain data are mapped to the grid cells in one-to-one correspondence.
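Steps 2 and 3 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: each hourly value is assumed to be the sum of six 10-minute records, min-max scaling is assumed to use the global minimum and maximum of one service's data, and the rectangular longitude/latitude bounding box used for grid mapping is hypothetical (the actual Milan grid geometry is defined by the dataset provider [19]).

```python
import numpy as np

GRID_SIZE = 100  # Milan is divided into a 100 x 100 grid

def aggregate_hourly(records_10min):
    """Sum six consecutive 10-minute records into one hourly value.

    records_10min: array of shape (T10, X, Y); T10 must be a multiple of 6.
    """
    t10, x, y = records_10min.shape
    if t10 % 6 != 0:
        raise ValueError("number of 10-minute steps must be a multiple of 6")
    return records_10min.reshape(t10 // 6, 6, x, y).sum(axis=1)

def min_max_normalize(data):
    """Scale data to [0, 1]; also return (min, max) so predictions can be inverted."""
    lo, hi = float(data.min()), float(data.max())
    span = hi - lo if hi > lo else 1.0  # guard against constant input
    return (data - lo) / span, (lo, hi)

def denormalize(scaled, lo_hi):
    """Map [0, 1] values back to the original traffic units."""
    lo, hi = lo_hi
    span = hi - lo if hi > lo else 1.0
    return scaled * span + lo

def count_per_cell(lons, lats, bbox, n=GRID_SIZE):
    """Count static items (e.g. base stations or POIs) in each grid cell.

    bbox = (lon_min, lat_min, lon_max, lat_max) is a hypothetical bounding box.
    Points outside it are clipped into the border cells; filter them first
    if that is not wanted.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    cols = np.floor((np.asarray(lons) - lon_min) / (lon_max - lon_min) * n).astype(int)
    rows = np.floor((np.asarray(lats) - lat_min) / (lat_max - lat_min) * n).astype(int)
    rows, cols = np.clip(rows, 0, n - 1), np.clip(cols, 0, n - 1)
    grid = np.zeros((n, n), dtype=int)
    np.add.at(grid, (rows, cols), 1)  # unbuffered add counts repeated cells correctly
    return grid
```

Keeping the (min, max) pair allows predictions to be mapped back to traffic units before computing error metrics such as RMSE or MAE.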

B. WIRELESS CELLULAR TRAFFIC DATASETS
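The wireless traffic of service $k$ over the study region at time $t$ can be expressed as $D_{k,t}$, a spatiotemporal series composed of a large number of data points:
$$
D_{k,t}=\begin{bmatrix}
d_{k,t}^{(1,1)} & \cdots & d_{k,t}^{(1,Y)}\\
\vdots & \ddots & \vdots\\
d_{k,t}^{(X,1)} & \cdots & d_{k,t}^{(X,Y)}
\end{bmatrix} \tag{1}
$$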
Eq. (1) applies equally to the Sms, Call and Internet services: $d_{k,t}^{(x,y)}$ represents the amount of wireless cellular traffic of service $k$ at time $t$ in the grid cell with coordinates $(x, y)$, where $k \in$ {Sms, Call, Internet}. The traffic data exhibit the following characteristics.
1) Similar periodicity. Wireless cellular traffic in different regions also has a similar periodicity. For example, in Fig.2(a), the SMS traffic trends of three different areas are similar.
2) Differences in regional data. The volume of wireless cellular traffic differs considerably between areas. For example, in Fig.2(a), the Bocconi University area is in the suburbs of Milan, so its traffic volume is the lowest of the three areas. Navigli is Milan's nightlife area; its traffic volume varies little over a week and is also not high. The Duomo area is the center of Milan; it has the highest traffic volume, and its traffic on weekdays differs greatly from that on weekends.
3) Differences in service data. The traffic volume also differs between services. For instance, Internet traffic peaks are shorter than those of the other two services.
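The shared daily periodicity noted in item 1) can be checked numerically. The sketch below is not from the paper: it computes the sample autocorrelation of an hourly series at a lag of 24 hours (one day), where a value close to 1 indicates strong daily periodicity; comparing this value across regions such as Bocconi, Navigli and Duomo is one possible way to quantify the similarity seen in Fig.2(a). The synthetic sine series is purely illustrative.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D hourly series at a positive lag (in hours)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    if denom == 0.0:
        return 0.0  # constant series: autocorrelation undefined, report 0
    return float(np.dot(x[:-lag], x[lag:]) / denom)

# Illustrative only: two weeks of a synthetic series with a 24-hour cycle.
hours = np.arange(14 * 24)
synthetic = np.sin(2 * np.pi * hours / 24)
daily = autocorrelation(synthetic, 24)     # close to 1 (about 0.93 here)
half_day = autocorrelation(synthetic, 12)  # strongly negative
```

A weekly pattern, such as the weekday/weekend contrast in the Duomo area, would show up at a lag of 168 hours instead.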

C. TIME STAMP
To make full use of the timestamp features ($D_{meta}$) for auxiliary prediction, four features are extracted from the timestamp and encoded as a vector $m$, which a fully connected layer then maps to a tensor whose size matches that of the wireless cellular traffic dataset and the cross-domain dataset. The four extracted features are shown in Table 1.
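Because Table 1 is not reproduced here, the four features below (hour of day, day of week, weekend flag, holiday flag) are hypothetical placeholders. The sketch only illustrates the shape of the pipeline: a timestamp is encoded as a four-element vector m, and a two-layer fully connected network, as used for timestamp modeling in STC-N, maps m to a tensor with the same spatial size as the traffic grid. ReLU after the first layer and the random weights are assumptions.

```python
import datetime as dt
import numpy as np

def timestamp_features(ts, holidays=frozenset()):
    """Hypothetical 4-feature encoding m of one hourly timestamp."""
    return np.array([
        ts.hour / 23.0,                          # hour of day scaled to [0, 1]
        ts.weekday() / 6.0,                      # day of week (Monday = 0) scaled
        1.0 if ts.weekday() >= 5 else 0.0,       # weekend flag
        1.0 if ts.date() in holidays else 0.0,   # holiday flag
    ])

def embed_meta(m, w1, b1, w2, b2, shape):
    """Two fully connected layers, reshaped to `shape` = (H, X, Y)."""
    hidden = np.maximum(w1 @ m + b1, 0.0)  # layer 1 with ReLU (assumed activation)
    out = w2 @ hidden + b2                 # layer 2, linear
    return out.reshape(shape)

# Usage with small, illustrative sizes (H = 2 feature maps on a 3 x 3 grid).
rng = np.random.default_rng(0)
m = timestamp_features(dt.datetime(2013, 12, 25, 18), holidays={dt.date(2013, 12, 25)})
w1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
w2, b2 = rng.normal(size=(2 * 3 * 3, 8)), np.zeros(2 * 3 * 3)
o_meta = embed_meta(m, w1, b1, w2, b2, (2, 3, 3))
```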

D. CROSS-DOMAIN DATASETS
The cross-domain dataset ($D_{cross}$) mainly contains three types of data: social information (Social), base stations (BS), and points of interest (POI). The Social dataset is obtained through the Dandelion API [20], the BS dataset is obtained from OpenCellID [21], and the POI dataset can be crawled using the Google Places API [22]. Since these three data types change little over time, we treat them as static datasets and map them to specific grid cells based on their coordinates. Analogously to Eq. (1), Eq. (2) is obtained as follows:
$$
D_{cross}^{c}=\begin{bmatrix}
d_{c}^{(1,1)} & \cdots & d_{c}^{(1,Y)}\\
\vdots & \ddots & \vdots\\
d_{c}^{(X,1)} & \cdots & d_{c}^{(X,Y)}
\end{bmatrix} \tag{2}
$$
where $d_{c}^{(x,y)}$ is the amount of cross-domain data of type $c$ in the grid cell with coordinates $(x, y)$, $c \in$ {BS, POI, Social}. To analyze the correlation between different service traffic and the cross-domain datasets, the Pearson correlation coefficient $\rho$ is calculated as follows:
$$
\rho = \frac{\operatorname{cov}\left(d_{k}^{(x,y)},\, d_{c}^{(x,y)}\right)}{\sigma_{d_{k}}\,\sigma_{d_{c}}}
$$
where $\operatorname{cov}(\cdot)$ denotes the covariance operator and $\sigma$ the standard deviation, both taken over all grid cells, $d_{k}^{(x,y)}$ is the wireless cellular traffic of service $k$ in cell $(x, y)$, and $d_{c}^{(x,y)}$ is the cross-domain data of type $c$ in the same cell.
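A minimal sketch of the spatial Pearson coefficient follows, treating each grid cell as one sample. It assumes the traffic map is a per-cell summary (for example, the time-averaged traffic of one service), which the text does not specify; `np.corrcoef` is used in the check only as a cross-reference.

```python
import numpy as np

def spatial_pearson(traffic_map, cross_map):
    """Pearson correlation between two X-by-Y maps, treating cells as samples."""
    a = np.asarray(traffic_map, dtype=float).ravel()
    b = np.asarray(cross_map, dtype=float).ravel()
    a_c, b_c = a - a.mean(), b - b.mean()          # center both variables
    denom = np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())
    return float((a_c * b_c).sum() / denom)        # cov / (sigma_a * sigma_b)
```

The same function applies to pairs of services (e.g. Sms versus Call maps), which is how the inter-service correlations in Fig.3 could be reproduced under the same assumption.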
To further quantify the spatial correlations between the cross-domain datasets and different cellular traffic, the Pearson correlation coefficients are calculated and shown in Fig.3. We conclude that:
(1) Relevance of data. The correlation between Sms, Call and Internet is high, which shows that a transfer learning strategy can be used to train the model across different service traffic data.
(2) Similarity of data. The similarity between the cross-domain data and wireless service traffic is also relatively high. Therefore, the cross-domain data can be regarded as a constraint on the spatial characteristics of wireless service traffic, enabling more accurate traffic prediction.
(3) Degree of correlation. The correlation of POI and BS with wireless cellular traffic is greater than that of Social, which shows that POI and BS have a relatively larger impact than Social on accurate service traffic prediction.
Finally, we obtain a multi-dimensional tensor T composed of the matrices $D_t$, $D_{meta}$ and $D_{cross}$. The data form is shown in Fig.3. As shown by the black square in Fig.4, taking Sms as an example, each entry represents the cross-domain data, timestamp information and wireless cellular traffic value of the Sms service at coordinates $(x, y)$.
The STC-N model consists of the following modules:
(1) Spatio-temporal correlation modeling. Input: $(D_{t-3}, D_{t-2}, D_{t-1})$, the three historical moments directly preceding the prediction period. A two-layer ConvLSTM network models spatiotemporal dependencies and data sequence information.
(2) Timestamp modeling. Input: $D_{meta}$. A two-layer fully connected neural network performs embedding learning.
(3) Cross-domain data modeling. Input: $D_{cross}$, a collection of one to three cross-domain datasets. A two-layer convolutional neural network processes these data.
(4) Feature fusion layer. Input: a new tensor composed of the outputs of the above modules, concatenated along the specified dimension.
It applies a densely connected convolutional network (DenseNet) with a total of L layers, each of which implements a composite function transformation. This composite function is the same as the operation used in cross-domain data feature learning: batch normalization (BN), a rectified linear unit activation (ReLU), and a convolution (Conv).
The ConvLSTM layer protects and controls information through three self-parameterized controlling ''gates'', i.e., the input gate $i_g$, forget gate $f_g$ and output gate $o_g$. $i_g$ selectively stores the required information, $f_g$ selectively ''forgets'' redundant information, and $o_g$ controls the final hidden state and determines the important information passed to the output. The key operations of ConvLSTM are as follows:
$$
\begin{aligned}
i_g^{\tau} &= \sigma\left(W_{di} * D^{\tau} + W_{hi} * H^{\tau-1} + b_i\right)\\
f_g^{\tau} &= \sigma\left(W_{df} * D^{\tau} + W_{hf} * H^{\tau-1} + b_f\right)\\
c^{\tau} &= f_g^{\tau} \circ c^{\tau-1} + i_g^{\tau} \circ \tanh\left(W_{dc} * D^{\tau} + W_{hc} * H^{\tau-1} + b_c\right)\\
o_g^{\tau} &= \sigma\left(W_{do} * D^{\tau} + W_{ho} * H^{\tau-1} + b_o\right)\\
H^{\tau} &= o_g^{\tau} \circ \tanh\left(c^{\tau}\right)
\end{aligned}
$$
where $D^{\tau}$ is the input at step $\tau$, $\sigma(\cdot)$ denotes the sigmoid activation function, $*$ denotes the convolution operation, $\circ$ is the Hadamard product, $W_{(\cdot)}$ and $b_{(\cdot)}$ are the weights and biases to be trained, respectively, and $\tanh(\cdot)$ is the hyperbolic tangent function. Note that $i_g^{\tau}$, $f_g^{\tau}$, $c^{\tau}$, $o_g^{\tau}$ and $H^{\tau}$ in the ConvLSTM unit are all three-dimensional tensors. The output of the ConvLSTM network is denoted as $o_t \in \mathbb{R}^{H \times X \times Y}$, where H is the number of feature maps.
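The ConvLSTM gate operations can be illustrated with a single NumPy step. This sketch assumes the variant without peephole connections and stacks the four gate pre-activations into one convolution for brevity; it is not the authors' implementation, and in practice a framework layer would be used. The 16 feature maps and 3 × 3 kernels in the usage lines follow the settings reported for the convolutional layers; the 5 × 5 grid is a toy size.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' 2-D convolution (cross-correlation, as in DL frameworks).

    x: (C_in, X, Y); w: (C_out, C_in, k, k) with odd k. Returns (C_out, X, Y).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    _, nx, ny = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, nx, ny))
    for a in range(k):
        for b in range(k):
            out += np.einsum("oc,cxy->oxy", w[:, :, a, b], xp[:, a:a + nx, b:b + ny])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(d, h_prev, c_prev, params):
    """One ConvLSTM step: d is (C_in, X, Y); h_prev and c_prev are (H, X, Y)."""
    n = h_prev.shape[0]
    z = conv2d_same(d, params["W_d"]) + conv2d_same(h_prev, params["W_h"])
    z = z + params["b"][:, None, None]
    i_g = sigmoid(z[:n])               # input gate
    f_g = sigmoid(z[n:2 * n])          # forget gate
    cand = np.tanh(z[2 * n:3 * n])     # candidate cell state
    o_g = sigmoid(z[3 * n:])           # output gate
    c = f_g * c_prev + i_g * cand      # element-wise (Hadamard) products
    h = o_g * np.tanh(c)
    return h, c

# Usage: feed three input frames (D_{t-3}, D_{t-2}, D_{t-1}) on a toy 5 x 5 grid.
rng = np.random.default_rng(0)
H, C, X, Y, k = 16, 1, 5, 5, 3
params = {
    "W_d": rng.normal(scale=0.1, size=(4 * H, C, k, k)),
    "W_h": rng.normal(scale=0.1, size=(4 * H, H, k, k)),
    "b": np.zeros(4 * H),
}
h, c = np.zeros((H, X, Y)), np.zeros((H, X, Y))
for frame in rng.random((3, C, X, Y)):
    h, c = convlstm_step(frame, h, c, params)
```

Because every operation keeps the spatial size, the final hidden state has the shape H × X × Y of the output $o_t$ described above.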
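The preliminary features of $D_{meta}$ are denoted as $O_{meta}$ and are computed by the two-layer fully connected network as follows:
$$
o_{meta}^{l} = g\left(w_{meta}^{l}\, o_{meta}^{l-1} + b_{meta}^{l}\right),\quad l = 1, 2,\quad o_{meta}^{0} = m
$$
where $w_{meta}^{l}$ and $b_{meta}^{l}$ are learnable parameters, $g(\cdot)$ is the activation function, and the output is reshaped so that $o_{meta} \in \mathbb{R}^{H \times X \times Y}$.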
The preliminary features of $D_{cross}$ are denoted as $O_{cross}$ and are computed as follows:
$$
O_{cross} = f\left(W_{cross},\ \bigoplus_{c \in \mathcal{C}} D_{cross}^{c}\right),\quad \mathcal{C} \subseteq \{BS, POI, Social\}
$$
where $\oplus$ is the concatenation operation, $W_{cross}$ denotes the weights to be trained, and $f(\cdot)$ is a composite function consisting of batch normalization (BN), a rectified linear unit (ReLU) and a convolution (Conv). The objective of STC-N is to minimize the Frobenius norm of the error matrix between prediction and ground truth over all areas:
$$
\min_{\theta}\ \left\| \hat{D}_t - D_t \right\|_F
$$
where $\theta$ denotes the parameters of STC-N, obtained by training with optimization techniques, $\hat{D}_t$ is the predicted wireless cellular traffic, and $D_t$ is the true wireless cellular traffic. STC-N is optimized with a stochastic gradient-based technique and trained for 300 epochs with a batch size of 32. The learning rate is adjusted during training to speed it up: the initial learning rate is 0.01 and is divided by 10 at epochs 150 and 225. In the convolutional layers, the number of feature maps is 16, the convolution kernel size is 3 × 3, and ReLU is used as the activation function. The output layer has one feature map with a 1 × 1 convolution kernel. The first seven weeks of the dataset are used as the training set and the last week as the test set. Both sets are constructed with a sliding window of size P = 3.
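The data split, sliding window and learning-rate schedule described above can be sketched as follows. Assumptions: the hourly series is a NumPy array of shape (T, X, Y); windows are built after the chronological split, so the first P hours of the test week serve only as inputs; epochs are counted from 0. With hourly data, seven weeks correspond to 7 × 7 × 24 = 1176 frames.

```python
import numpy as np

TRAIN_HOURS = 7 * 7 * 24  # seven weeks of hourly frames = 1176

def chronological_split(series, train_hours):
    """Split without shuffling: earlier hours for training, later hours for testing."""
    return series[:train_hours], series[train_hours:]

def sliding_windows(series, p=3):
    """Pair each frame t >= p with its p preceding frames.

    series: (T, X, Y). Returns inputs (T - p, p, X, Y) and targets (T - p, X, Y).
    """
    t = series.shape[0]
    inputs = np.stack([series[i:i + p] for i in range(t - p)])
    targets = series[p:]
    return inputs, targets

def step_lr(epoch, base_lr=0.01, milestones=(150, 225), factor=0.1):
    """Learning rate divided by 10 at each milestone epoch that has been reached."""
    passed = sum(epoch >= m for m in milestones)
    return base_lr * factor ** passed
```

Typical usage would be `train, test = chronological_split(hourly, TRAIN_HOURS)` followed by `sliding_windows(train, p=3)` and `sliding_windows(test, p=3)`; splitting before windowing keeps any test frame from appearing in a training input.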

B. FUSION TRANSFER STRATEGY
Traditional machine learning trains on different datasets separately and learns different model parameters, so multiple learning systems are relatively independent. Transfer learning, by contrast, learns knowledge from one dataset and transfers it to learning on other datasets and tasks, so that learning on the new datasets and tasks does not start from scratch but builds on prior knowledge. As shown in Fig.6, assuming that the source domain (dataset 1) and the target domain (dataset 2) share some common features, and that the source and target data in this feature space have the same distribution and high similarity, the source and target domains can share model parameters; that is, the model trained on a large amount of source-domain data is applied to the target domain for prediction. Therefore, we first train a base network model on the base dataset and task, and then fine-tune or transfer the learned features to a second, target network model to train on the target dataset and task. Fig. 2 and Fig. 3 demonstrate the similarity between different services across multiple datasets. Wireless traffic in different regions also shows certain similarities, such as periodicity. However, the wireless service traffic data of different regions have both differences and similarities. To capture the differences between regions, the K-means clustering method is used to group similar regions together; according to the clustering results, all regions are divided into three clusters [18]. Based on the clustering results, the traffic pattern knowledge learned from one cluster can be transferred to other clusters.
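A minimal NumPy K-means with k = 3 illustrates the regional clustering step. What each region's feature vector contains is not specified in the text; one plausible choice (an assumption) is a region's normalized average 24-hour traffic profile. Farthest-point initialization is used here to keep the sketch deterministic; it is a simplification and not necessarily what the authors used.

```python
import numpy as np

def _assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(features, k=3, n_iter=100, seed=0):
    """Lloyd's K-means. features: (n_regions, n_features). Returns (labels, centers)."""
    x = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random row, then repeatedly
    # add the row farthest from all centers chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = ((x[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _assign(x, centers)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(x, centers), centers
```

The resulting labels define the clusters across which knowledge is transferred successively.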
According to the similarity between services and the similarity of clustered regions, a cross-service and regional fusion transfer (Fusion-transfer) learning strategy is proposed, which fuses the transfer learning strategy across different kinds of cellular traffic with the successive inter-cluster transfer learning strategy. The Fusion-transfer strategy is illustrated in Fig. 7.
Taking the Sms and Call services as examples, a model $M_{S1}$ is obtained by training STC-N on Call service data from the source domain S. $M_{S1}$ is then transferred to cluster 1 of the Sms service target domain by parameter initialization. The parameters of STC-N continue to be trained on target-domain data, yielding the model $M_{S2}$. These steps are repeated until the parameters have been transferred to cluster 3 of the Sms target domain, yielding the model $M_T$, which is used for prediction on the target-domain test data to obtain the final results. This fusion-transfer strategy exploits not only the similarity of wireless service traffic across services but also its similarity across regions, so the prediction performance of the model improves. Because of the similarity between services, this strategy is likewise applicable to the Call and Internet services.
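The Call → Sms cluster 1 → cluster 2 → cluster 3 chain can be sketched with a toy model. Here `train` is a hypothetical stand-in for STC-N training (a linear least-squares model fitted by gradient descent), chosen only so the example runs; the point is the warm-start pattern, in which each stage is initialized with the parameters of the previous one.

```python
import numpy as np

def train(init_params, x, y, epochs=200, lr=0.1):
    """Hypothetical stand-in for STC-N training: linear least squares by gradient descent.

    Starts from `init_params` (parameter initialization) and returns new weights.
    """
    w = np.array(init_params, dtype=float)
    for _ in range(epochs):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def fusion_transfer(init_params, source, target_clusters, **train_kw):
    """Train on the source service, then fine-tune successively on each target cluster.

    source: (x, y) of the source service (e.g. Call).
    target_clusters: [(x, y), ...] for clusters 1..K of the target service (e.g. Sms).
    Returns the final model M_T and the list of all intermediate models.
    """
    models = [train(init_params, *source, **train_kw)]  # M_S1
    for x, y in target_clusters:
        # initialize from the previous stage, then continue training
        models.append(train(models[-1], x, y, **train_kw))
    return models[-1], models
```

With three target clusters, the returned list holds four models: $M_{S1}$, the model after cluster 1 ($M_{S2}$), the model after cluster 2, and $M_T$ after cluster 3.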

IV. EXPERIMENT
Four metrics are adopted in this work for a comprehensive evaluation of the different prediction algorithms: root mean square error (RMSE), mean absolute error (MAE), R-squared ($R^2$) and the explained variance score (hereinafter Variance).
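$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{TXY}\sum_{t=1}^{T}\sum_{x=1}^{X}\sum_{y=1}^{Y}\left(\hat{d}_t^{(x,y)} - d_t^{(x,y)}\right)^2}\\
\mathrm{MAE} &= \frac{1}{TXY}\sum_{t=1}^{T}\sum_{x=1}^{X}\sum_{y=1}^{Y}\left|\hat{d}_t^{(x,y)} - d_t^{(x,y)}\right|\\
R^2 &= 1 - \frac{\sum_{t,x,y}\left(d_t^{(x,y)} - \hat{d}_t^{(x,y)}\right)^2}{\sum_{t,x,y}\left(d_t^{(x,y)} - \bar{d}\right)^2}\\
\mathrm{Variance} &= 1 - \frac{\operatorname{Var}\left(d - \hat{d}\right)}{\operatorname{Var}\left(d\right)}
\end{aligned}
$$
where T is the number of time points, X and Y are the grid dimensions, $\hat{d}_t^{(x,y)}$ and $d_t^{(x,y)}$ are the predicted and true traffic in cell $(x, y)$ at time $t$, and $\bar{d}$ is the mean of the true values. RMSE measures the deviation between the model's predictions and the true values, and MAE better reflects the actual magnitude of the prediction error. For RMSE and MAE, smaller values indicate better performance. For $R^2$ and Variance, larger values imply a better fit to the data and thus better performance; both are at most 1, and the closer they are to 1, the better the model.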
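The four metrics can be computed as below. This sketch pools all time steps and grid cells into one vector before computing each metric, which is an assumption about how the paper aggregates; the $R^2$ and explained-variance formulas follow the usual definitions (the same ones used by scikit-learn's `r2_score` and `explained_variance_score` for a single output).

```python
import numpy as np

def evaluate(pred, true):
    """RMSE, MAE, R^2 and explained variance over all time steps and grid cells."""
    p = np.asarray(pred, dtype=float).ravel()
    t = np.asarray(true, dtype=float).ravel()
    err = t - p
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_tot = np.sum((t - t.mean()) ** 2)           # total sum of squares
    r2 = float(1.0 - np.sum(err ** 2) / ss_tot)
    variance = float(1.0 - np.var(err) / np.var(t))
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "Variance": variance}
```

Unlike $R^2$, the explained variance ignores a constant bias in the errors, which is why the two scores can differ for the same predictions.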

A. PERFORMANCE OF ADDING MULTIPLE CROSS-DOMAIN DATASETS
To demonstrate more directly how multiple cross-domain datasets improve wireless service traffic prediction, we conducted a comparative experiment based on the STC-N model with No-cross, One-cross, Two-cross and Three-cross as independent variables. The results of the evaluation metrics on three different kinds of cellular traffic are plotted in Fig.7. No-cross on the x-axis of Fig.6 means that no cross-domain dataset is used when training the model. One-cross means that the model uses only one cross-domain dataset ($D_{social}$). Two-cross means that two cross-domain datasets are used when training the prediction model ($D_{social} \oplus D_{BS}$). Three-cross means that the model uses all three cross-domain datasets ($D_{social} \oplus D_{BS} \oplus D_{POI}$). The y-axis represents the performance metric. Fig.8, Fig.9 and Fig.10 show the benefits of introducing different numbers of cross-domain datasets into the prediction of the three kinds of cellular traffic. From these figures we can draw the following two conclusions:

B. COMPARISON OF DIFFERENT TRANSFER LEARNING STRATEGIES
To illustrate the performance of the fusion-transfer learning strategy, we consider the three services (Sms, Call, Internet) separately and compare the model using the fusion transfer strategy (Fusion-transfer) with the model without a transfer strategy (No-transfer). The validation loss obtained during training is shown in Fig. 11. In Fig.11, the red dotted line shows the validation loss of the model without a transfer strategy (No-transfer), the black dotted line shows the validation loss of the model with the fusion-transfer strategy (Fusion-transfer), the x-axis represents training epochs, and the y-axis represents validation loss. Overall, both the red and black curves in Fig. 11 first drop rapidly and then stabilize: the red curves stabilize after about 180 epochs, and the black curves after about 150 epochs. In all three subfigures, the black curves drop faster than the red ones and stabilize at a lower validation loss. This shows that on the Sms, Call and Internet services, the model based on the fusion-transfer strategy trains faster and achieves better prediction performance than the model without transfer learning. This is mainly because, with transfer learning, the model does not need to be trained from scratch and can obtain shared parameters from other models. To comprehensively demonstrate the gain of the Fusion-transfer strategy on prediction performance, we conducted comparative experiments on models using the Fusion-transfer, No-transfer and Part-transfer strategies, evaluating the gain of Fusion-transfer for the training model (STC-N) on four performance metrics: RMSE, MAE, $R^2$ and Variance. The measured performance metrics are shown in Table 2.
To demonstrate the superiority of the Fusion-transfer strategy more directly, the experimental results are shown in Fig.12. Compared with the model without a transfer strategy (No-transfer), the Fusion-transfer strategy reduces the RMSE of the Sms service by about 7.85 and the MAE by about 4.42 and increases $R^2$ by about 0.04, i.e., improvements of 12.95%, 13.49% and 5.14% in RMSE, MAE and $R^2$, respectively. For the Call service, RMSE decreases by about 5.14, MAE decreases by about 3.29, and $R^2$ increases by about 0.03, i.e., improvements of 13.19%, 17.11% and 2.62%, respectively. For the Internet service, RMSE and MAE decrease by approximately 4.38 and 4.29, respectively, and $R^2$ increases by approximately 0.003, i.e., improvements of 2.48%, 4.12% and 0.39%, respectively. Comparing the three strategies shows that, for the same training model (STC-N), Fusion-transfer performs best, Part-transfer second, and No-transfer worst. These results further show that the Fusion-transfer strategy improves all aspects of the STC-N model's performance and is applicable to all three services (Sms, Call and Internet).
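The percentages above appear to be relative changes with respect to the No-transfer result; the helper below computes them that way, which is an assumption rather than a definition given in the text. Because the reported absolute changes are rounded, back-calculating a baseline from them is only approximate; for example, a 7.85 RMSE reduction reported as 12.95% implies a No-transfer RMSE of roughly 60.6 for Sms.

```python
def relative_improvement(baseline, new, lower_is_better=True):
    """Percentage improvement of `new` over `baseline`.

    For RMSE/MAE (lower is better) the improvement is the reduction;
    for R^2/Variance (higher is better) it is the increase.
    """
    change = baseline - new if lower_is_better else new - baseline
    return 100.0 * change / abs(baseline)

def implied_baseline(absolute_change, percent):
    """Baseline value implied by a reported absolute change and percentage."""
    return absolute_change / (percent / 100.0)

sms_rmse_baseline = implied_baseline(7.85, 12.95)  # approximately 60.6
```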

V. CONCLUSION AND FUTURE WORK
In this paper, the influence of the types and quantities of cross-domain big datasets on traffic prediction accuracy is analyzed in detail. Taking the spatio-temporal cross-domain neural network (STC-N) as the benchmark model and different types of cross-domain big datasets as research objects, we discuss how different numbers of cross-domain datasets affect traffic prediction accuracy. The experimental results verify that each additional cross-domain dataset enhances the prediction performance of the neural network to a different degree.
In addition, a Fusion-transfer strategy is proposed that fuses the cross-service transfer learning strategy with the inter-cluster transfer learning strategy, fully exploiting the high similarity of traffic changes between different services and between different regions. This strategy learns knowledge from the source-domain dataset and transfers it to learning on target-domain datasets and tasks, so that learning on the new datasets and tasks does not start from scratch but builds on prior knowledge. The source domain and the target domain share model parameters; that is, the model trained on a large amount of source-domain data is applied to the target domain for prediction. We first train a base network model on the base dataset and task, and then fine-tune or transfer the learned features to a second, target network model to train on the target dataset and task. This transfer learning method greatly reduces the training data and computing power required to build a deep learning model with good generalization ability. Experimental results show that the Fusion-transfer strategy outperforms both the model without a transfer strategy (No-transfer) and the model using the Part-transfer strategy, indicating that the new Fusion-transfer strategy is better than the other transfer strategies.
However, the model structure is complex and its training time is long. In future work, we will explore simpler and more efficient model architectures that improve accuracy while reducing training time.
CHAO LI received the B.S. and M.S. degrees from the Shandong University of Science and Technology and the Ph.D. degree from the Chinese Academy of Sciences, in 2014. From 2014 to 2015, he was a Visiting Scholar with The Hong Kong University of Science and Technology. He is currently a Lecturer with the Shandong University of Science and Technology. His research interests include social media, natural language processing, data mining, and network embedding learning. He is a member of CCF.
GE SONG received the master's degree in computer science and technology from the Shandong University of Science and Technology, Shandong, China, in 2004. He is currently teaching with the College of Electronic and Information Engineering. His current research interests include the Internet of Things, wireless sensor networks, robot path planning, and deep learning.