A Novel SDN-Based Application-Awareness Mechanism by Using Deep Learning

With the rapid development of the Internet of Things (IoT) and smart cities, more and more types of applications have been emerging. In fact, different applications have different features and different requirements on services. In order to satisfy users’ Quality of Service (QoS) requirements, the application-awareness technique should be leveraged to distinguish different applications for providing the differentiated services. However, the traditional Internet only can obtain the local network view, which belongs to the offline awareness mode and cannot adapt to the dynamical network environment. At the right time, Software-Defined Networking (SDN) has been accepted as a new networking paradigm thanks to its network awareness on the global status information, which can greatly facilitate the online application-awareness. At present, three ways, i.e., port number, depth packet inspection and deep learning can be used for the application-awareness. To the best of our knowledge, the deep learning based application-awareness method is the most cutting-edge technique. In spite of this, the previous related schemes fail to effectively guarantee the correctness and stability. To this end, this paper proposes a Convolutional Neural Network (CNN) based deep learning mechanism to do the application-awareness, including three phases, i.e., traffic collection, data pre-processing and application-awareness. The SDN environment is implemented based on the MiniNet and the simulation experiments are made based on the TensorFlow. The experimental results show that the proposed application-awareness mechanism outperforms three benchmarks on recall ratio, precision ratio, $F$ value and stability.


I. INTRODUCTION
In the traditional Internet, the network only supports the best effort services and fails to provide the differentiated services, that is, all applications have a fair competition on the limited network resources [1]. Furthermore, with the rapid development of the Internet of Things (IoT) [2] and smart cities [3], there are more and more new types of applications emerging, such as online live and online payment. In particular, different applications have different features and different requirements on services [4]. For example, the online video belongs to the real-time application, which has high demand on delay and delay jitter; the email belongs to the non-real-time application, which has high demand on packet loss ratio. Therefore, in order to satisfy users's In terms of the SDN-based application-awareness, to the best of our knowledge, there are three ways for addressing it, i.e., port number, Depth Packet Inspection (DPI) and deep learning [8]. To be specific, the port number based technique usually uses the common port number (0-1023) and the universal port number (1025-49151) to recognize the applications from the transmission layer. However, it is a credible technique no longer. For example, the application program can uses another port number to grasp the control privilege of network operating systems [9]. For DPI, each application has an unique signature for its recognition [10], [11]. However, it requires to resolve all fields of packet in case of using DPI, which consumes lots of CPU resources and has a serious influence on the scalability. With respect to this, some new techniques such as Deterministic Finite Automata (DFA), Field Programmed Gate Array (FPGA) and Graphics Processing Units (GPU) are usually employed to optimize DPI [12]. In spite of this, these techniques always have some limitations such as low efficiency, correctness and stability. Under such context, more and more researchers pay attention to the deep learning based application-awareness method which is the most cutting-edge technique.
The deep learning based application-awareness technique usually depends on the feature statistics of traffic which can be obtained by the header of packet rather than the whole packet, thus its consumed time is smaller than that consumed by DPI [13]. For the deep learning, it involves the supervised learning strategy [14], unsupervised learning strategy [15] and semi-supervised learning strategy [16]. Among them, the unsupervised learning strategy can adapt to these data without the special mark which exits in the most common datasets. In spite of this, the previous deep learning based application-awareness schemes fail to effectively guarantee the correctness and stability. To this end, this paper proposes a Convolutional Neural Network (CNN) [17], [18] based deep learning mechanism which belongs to the unsupervised learning strategy to do the application-awareness. CNN has two distinguished features, i.e., local connection and parameters sharing, and it is a network model with the multiple hidden layers, including an input layer, several convolution layers, several pooling layers, two fully connected layers and an output layer, where each hidden layer consists of many feature patterns and each feature pattern involves many neurons. Based on such architecture mode, CNN can effectively decrease the complexity of network structure and the overhead of updating network parameters.
Given the above consideration, this paper uses the CNN-based Deep learning method to address the SDN-based Application-awareness (CDSA), and the major contributions are summarized as follows.
• The system framework of CDSA is proposed, which is composed of three phases, i.e., traffic collection, data pre-processing and application-awareness. Among them, the first two are the preproduction phases while the last one is the practical-operation phase.
• The traffic information is collected under the SDN environment based on the OpenFlow and the data normalization is realized based on the min-max method.
• The application-awareness is done by the CNN model, including four parts, i.e., activation function, pooling function, classification function and loss function which are realized by Rectified Linear Unit (ReLU), t-distributed Stochastic Neighbor Embedding (t-SNE), Softmax and gradient descent algorithm respectively.
• The proposed CDSA is implemented, and the experimental results show it has more efficient recall ratio, precision ratio, F value (including F1 value) and stability than three benchmarks. The following Section 2 fully reviews and compares the related work. Section 3 presents and analyzes the proposed CDSA in details. The performance evaluation is made in Section 4. Finally, Section 5 concludes this paper.

II. RELATED WORK
In recent years, three ways, i.e., port number, DPI and deep learning have been widely used for the SDN-based application-awareness. To the best of our knowledge, the port number based technique has been laid aside gradually [19]. Given this, we in this section only reviews the related work on the other two techniques for the SDN-based application-awareness (also called application classification or traffic classification).
In [12], DPI was introduced into the control plane of SDN seamlessly by extending the structure of flow table, in which both traffic behaviors and network status were exploited by SDN cooperatively. In particular, to improve the efficiency of application-awareness module, the packet would be sent to SDN controller for identification only when it arrived at the first time. In [20], a DPI offloading mechanism for traffic classification based on a stateful SDN data plane in network switches was proposed, which could reduce the amount of traffic volume inspected by the DPI without reducing classification accuracy.
In [21], a software-defined traffic classification framework was introduced, which dynamically selected the best suitable flow features and most effective machine learning classifiers with the help of a controller and a group of virtual functions. In [22], a simple architecture deployed in an enterprise network that gathered traffic data using the OpenFlow protocol was devised to obtain the high accuracy classification by using the supervised learning method, including random forests and two variations of gradient boosting classifiers i.e., stochastic gradient boosting and extreme gradient boosting. In [23], an efficient sampling and classification approach with the two-phase elephant flow detection was proposed.
In the first phase, the sampling efficiency was improved by estimating the arrival interval of elephant flows and filtering out the redundant samples. In the second phase, the samples were classified with a new supervised classification algorithm based on correlation among data flows. In [24], a support vector machine based internet traffic identification and classification was proposed to identify application traffic, which not only classified traffic type but also could classify the streaming in length and quality. In [25], the deep learning based method was proposed to realize the application-aware network resource allocation for SDN. The controller worked as the brain of the network which acquired all the network status information. The virtualized network functions carried in the deep neural networks model worked at the data plane and was responsible for classifying and reporting the traffic class information. It could greatly improve the network service quality and resource utilization rate. In [26], an application-aware SDN architecture was devised by using semi-supervised traffic classification, where three different control points were designated, i.e., core controller, access controller and data collecting controller. In [27], the application-awareness in SDN was expanded to enable any application to have a customized performance-based metric. It automatically analyzed and translated an arbitrary performance metric into a comprehensible application metric used for making decisions in the network by the deep learning.
There have been a few proposals on the combination of DPI and deep learning. For example, in [28], an application layer classifier combining both DPI and deep learning was designed to do classification in SDN. It tried to take advantage of the two classifiers to achieve a high speed while maintaining the acceptable accuracy rate. In [29], a QoS-aware traffic classification framework for SDN was proposed by jointly exploiting DPI and semi-supervised deep learning so that realizing the accurate traffic classification, while requiring minimal communications between the network controller and the SDN switches. It classified the network traffic into different classes according to the QoS requirements, which provided the crucial information to enable the fine-grained and QoS-aware traffic engineering. Similar to [29], [30] also proposed an SDN flow classification framework using DPI and semi-supervised deep learning. However, different from [29], [30] used the multiple classifiers for training.
With respect to the related researches, they are summarized in Table 1, including the simple descriptions on approach and result.
Although the above reviewed proposals could address the SDN-based application-awareness to some extent, the DPI-based methods (i.e., [12] and [20]) consumed too many CPU resources and had a serious influence on the scalability. In addition, DPI devices were introduced into the network, which increases the complexity of network and there were some privacy concerns. In particular, the current many applications adopted the encryption technique for protecting users' privacy, which caused that it is scarcely possible to exploit DPI to do the SDN-based application-awareness. Furthermore, the deep learning based methods (i.e., [21]- [30]) failed to effectively guarantee the correctness and stability. Different from them, this paper proposes a CNN-based deep learning mechanism to do the application-awareness, which not only effectively decreases the complexity of network structure and the overhead of updating network parameters from the perspective of theory, but also can guarantee the correctness and stability (i.e., recall ratio, precision ratio and F value) from the perspective of experiment.

III. THE PROPOSED CDSA
In this section, we first present the system framework of CDSA. Then, we give the detailed introduction for each major module mentioned in the system framework. VOLUME 8, 2020 Finally, we introduce the K-fold Cross Validation (K-CV) to partition the dataset.

A. SYSTEM FRAMEWORK
As depicted in Figure 1, the proposed CDSA system framework consists of four major modules, i.e., traffic collection, data pre-processing, features modelling and applicationawareness. Among them, the traffic collection module is responsible for collecting traffic information under the SDN environment based on the OpenFlow. The second module is used to do data normalization for these collected traffic features. After the data pre-processing, a large proportion of data is regarded as the training sample used for features modelling based on TensorFlow, which is completed by the third module. For the remaining data, it is regarded as the test sample used for application-awareness based on CNN. In particular, the third module supports the last module; to be specific, the output of application-awareness module depends on the features modelling module. In the next sections, we will introduce traffic collection module, data pre-processing module and application-awareness module in details irrespective of the introduction of features modelling module, this is because the TensorFlow [31] is a classical training way and there is no need to over-introduce it.

B. TRAFFIC COLLECTION
The traditional traffic collection usually adopts the offline pattern because the network fails to automatically obtain the traffic information. Under such context, it requires an online pattern to support the efficient application-awareness without taking much labor and time to gather the statistical information. To our best of knowledge, SDN has the awareness function on the global network status information, that is to say, it can collect the traffic information of all applications without the additional overhead. In fact, for the traffic collection in SDN, it requires to use some flow tables which are installed by the OpenFlow protocol. Given this, under SDN environment, we deploy the OpenFlow switch to help complete the traffic collection. On this basis, the network deployment topology of traffic collection is shown in Figure 2. The upper half of Figure 2 is the SDN network environment while the bottom half of Figure 2 is the traditional Internet, and their communication is based on two switches, i.e., general switch and OpenFlow switch. The corresponding process is simply described as follows. The general switch collects the traffic of all hosts. According to the techniques of port mirroring and redirection, the general switch submits its collected traffic information to the OpenFlow switch (i.e., traffic replication). Finally, the OpenFlow switch transmits the related traffic to the SDN controller to which it belongs. Unlike the above way, the SDN controller is connected with the general switch directly. However, such way without deploying the OpenFlow is considerably difficult to do engineering implementation due to the fact that many application program interfaces need to be written.

C. DATA PRE-PROCEEDING
Each application is composed of many features, such as source port number, destination port number, duration, etc. In particular, these features are not at the same order of magnitude and have no the mutual comparability among them. In order to conveniently facilitate the features modelling and application-awareness, these features should be at the same order of magnitude via the operation of normalization. Suppose that there are n applications and the arbitrarily application is denoted as app i . Suppose that each application has m features, and the arbitrarily feature of app i is denoted as af i,j . Let Fe denote the set with respect to n applications and m features, and we have In this paper, we use the min-max method to do normalization for af i,j . Mathematically, we have equation (2).
Let Fe denote the normalized set regarding n applications and m features, and the values of n × m features are between 0 and 1, i.e., af i,j ∈ (0, 1). 160924 VOLUME 8, 2020

D. CNN-BASED APPLICATION-AWARENESS
CNN consists of many hidden layers with two distinguished advantages, i.e., local connection and parameters sharing. In this section, we use CNN to do the SDN-based applicationawareness. For the used CNN model, it includes four major parts, i.e., activation function, pooling function, classification function and loss function.

1) ACTIVATION FUNCTION
In order to improve the computation efficiency, the ReLU is exploited as the activation function. In particular, ReLU-based activation function can effectively solve the vanishing gradient problem. Let ln i,j denote the arbitrarily neuron which belongs to the i-th layer (denoted as layer i ) and the j-th neuron, and we have where s(ln i,j ) is the status of ln i,j . When the stimulus intensity reaches some level, ln i,j is activated. Furthermore, the output result of ln i,j after using the activation function is 0, which means that ln i,j is not active.

2) POOLING FUNCTION
The traffic features extracted from the convolution layers are the high-dimensional data, which increases the complexity of computation. Thus, it requires to find a method as the pooling function for the dimensionality reduction. In order to avoid the more complex computation, this paper exploits the t-SNE [32] rather than the Maxout function (involving too many parameters) [33] for it. Let af i,j (1) and af i,j (2) denote arbitrary two points in terms of the high-dimensional space, and the correspondingly mapped points to the low-dimensional space are denoted as af i,j (1) and af i,j (2), and we have the conditional probability with respect to af i,j (1) and af i,j (2) in the high-dimensional space, as follows.
Furthermore, we can obtain the joint probability distribution function on af i,j (1) and af i,j (2) in the high-dimensional space, as follows.
By using the law of t-distribution, and we can obtain the joint probability distribution function on af i,j (1) and af i,j (2) in the low-dimensional space, as follows.
where ld 1,2 is the Euclidean distance between af i,j (1) and af i,j (2). Let Cost denote the difference between hjpd 1,2 and ljpd 1,2 , and we have where the involved gradient training is defined as follows.
Let RFe denote the reduced result of dimensionality from Fe , and we have the iteration equation (10). Among them, I , η and α are the number of iterations, learning rate and momentum factor respectively. In particular, the three parameters can guarantee the convergence speed and avoid to fall into the local optimum.

3) CLASSIFICATION FUNCTION
The classifier is used for the result output and it is very significant in terms of the design of CNN. In this paper, we use the Softmax [34] as the function of output layer, because it can effectively handle the multi-classification problem by the modelling for the multinomial distribution. In addition, it has a distinguished advantage, that is, the correct classification has higher probability while the error classification has lower probability. The whole process on using the Softmax function is depicted in Figure 3. Among them, w i,j denote the weight of af i,j , and p i denotes the prediction probability for app i , and it is defined as follows.
Furthermore, we have p i ∈ [0, 1] and the following equation is satisfied.

4) LOSS FUNCTION
What the loss function exits means that it has the absolute difference between the real value and the prediction value via the Softmax function. Meanwhile, the smaller loss function value means the more accurate application-awareness. Given this consideration, it requires to use the gradient descent algorithm [35] to update and optimize these involved parameters in order to minimize the loss function value. Let loss denote the general loss function, and we have where K is the total number of samples, and λ k ∈ {0, 1} is used to mark the status of the k-th sample.

E. K-FOLD CROSS VALIDATION
In order to take maximum advantage of CNN, it is necessary to explore more useful information from the limited dateset (i.e., the number of samples). In this paper, we use the K-CV method [36] for it, which can effectively decrease the generalization error and prevent the over-fitting. To be specific, the dataset is divided into K sets randomly, and each set is regarded as a sample. For the K sets, K − 1 ones are regarded as the training samples and the remaining one is regarded as the test sample. The cross validation is performed for K times until each set has been regarded as the test sample. On this basis, we can obtain the K accuracy degrees on CNN-based application-awareness, and the corresponding average value is considered as the accuracy degree of CNN.

IV. EVALUATION PERFORMANCE A. ENVIRONMENT INSTALL
The proposed CDSA consists of two parts: one is the SDN-based traffic collection and the other one is CNN-based application-awareness. For the first one, the SDN environment is implemented based on the MiniNet; for the second one, the simulation experiments are made based on the Ten-sorFlow. The hardware environment is Intel(R), Core(TM)i7, CPU870 2.93GHz, 4.00RAM. This paper uses the GPU provided by Google to do the training for features modelling.
To be specific, the dataset is submitted to the cloud storage of Google firstly; then, the successfully debugged application programs are inserted into the development tool, i.e., Colaboratory; finally, the TensorFlow is started for training. In addition, the proposed CDSA is compared with three advanced schemes, i.e., [22], [23] and [27] which have been published in International Conference on Network Protocols (ICNP), on Advanced information networking and Applications (ICAA), and on Communication software and Networks (ICCN) respectively. Meanwhile, three classical factors, i.e., recall ratio, precision ratio and F value as well as a novel factor, i.e., stability are considered the metrics of performance evaluation. Furthermore, for the involved parameters, many groups of simulations under different settings are made and the most suitable combination is determined as follows. K = 10, α = 0.5, I = 80 and η = 0.42.

B. DATASET INTRODUCTION
This paper adopts an open Moore dataset [37]- [38] which has 24 hours duration of traffic collection from the University of Cambridge to verify the proposed CDSA. The dataset includes 12 different applications and 370000 items, and each application type owns 248 features. For these applications, their details are shown in Table 2. As can be seen from iTable 2, the number of items regarding GAMES or INTERACTIVE application type (i.e., the last two applications) is very small, thus this paper only considers to use the first ten application types as these samples for verification. In addition, in order to conveniently facilitate the comparability between the real value and the prediction value, this paper uses the one-hot coding method to quantify the application, that is to say, for some application, its location is marked as 1 while the others are marked as 0. On this basis, the marked results for all applications are shown in Table 3.

C. CNN STRUCTURE DETERMINATION
This section determines the structure of CNN. In order to decrease the number of parameters and the training time, the convolution kernel is set as 3 × 3, and the length of input matrix is 16, where the moving step lengths of convolution layer and pooling layer are set as 1 and 2 respectively. Given this, we provide 10 different CNN structures, and the detailed structure information is shown in Table 4.
For the 10 different CNN structures, we compute the corresponding whole accuracies, and the computation results are shown in Table 5, where the whole accuracy is defined as the ratio between the number of applications that have been perceived correctly and the total number of applications.
We can observe that the fifth CNN structure (i.e., S5) can contribute to the best whole accuracy. By combining with Table 4, we can future observe that, with the increasing of convolution kernels and weight parameters, the corresponding whole accuracy becomes to ascend. However, when reaching a certain level, the corresponding whole accuracy begins to descend, which is caused by the increased computation complexity. Furthermore, it indicates that the modest increasing on the number of convolution kernels for CNN model can be conducive to the whole accuracy in terms of the SDN-based application-awareness. Here, we emphasize that the following experimental results on the proposed CDSA are reported according to the S5-based CNN structure.

D. COMPARISON ANALYSIS 1) RECALL RATIO
Four different methods (i.e., CDSA, ICNP, ICAA and ICCN) under different application types for doing the SDN-based application-awareness, the corresponding results on recall ratio are reported in Figure 4.
We find that the proposed CDSA has the highest recall ratio, this is because CDSA uses the CNN model. To be specific, CNN has the multiple hidden layers, which can explore the implying information. Especially, the explored information can fully express the application feature which contributes to the whole accuracy and at the same time guarantees the recall ratio. Besides, we also find that ATTACK and MULTIMEDIA have lower recall ratio than the other application types, because their inclusive number of applications is smaller than that of the others. For these samples with the insufficient data items, it is very difficult to train the unique and correct application feature. In terms of the other three methods, ICNP has the lowest recall ratio because it is only deployed at an enterprise network and it adopts the stochastic gradient boosting and extreme gradient boosting, which cannot guarantee the universal recall ratio. Compared to ICAA, ICCN has higher recall ratio because it can automatically analyze and translate an arbitrary performance metric into a comprehensible application feature.

2) PRECISION RATIO
This section reports the experimental results on precision ratio with respect to CDSA, ICNP, ICAA and ICCN under different application types for doing the SDN-based application-awareness, as shown in Figure 5.
It is obvious that the proposed CDSA can obtain the highest precision ratio, and two related reasons are analyzed as follows. On one hand, during the process of CNN computation, the t-SNE is considered as the pooling function for the dimensionality reduction, greatly avoiding the complex and VOLUME 8, 2020 excessive computation on the redundant application features. On the other hand, the classifier is increased a deviator to adjust the difference between the real value and the prediction value during the process of engineering implementation; under such condition, the adopted gradient descent algorithm can update and optimize these involved parameters easily. In addition, similar to the recall ratio, the two special application types, i.e., ATTACK and MULTIMEDIA also have lower precision ratio than the others. Furthermore, for the other three application-awareness methods, ICCN has the highest precision ratio because it expands the SDN to enable each application to have a customized performance-based metric, which can greatly improve the feature recognition and obtain the relatively high precision ratio. Moreover, different from ICNP, ICAA establishes the arrival interval of elephant flows and filters out the redundant samples, which can guarantee the precision ratio. Thus, ICAA has higher precision ratio than ICNP.

3) F VALUE
In this section, we test F value which is computed by where ReR i and PreR i are the recall ratio and the precision ratio with respect to app i respectively. In addition, β is a parameter which is set as 1, 2, 3 and 4 in this section. The experimental results on F1, F2, F3 and F4 are reported in Figure 6, Figure 7, Figure 8 and Figure 9 respectively. We can see that the corresponding F value becomes smaller and smaller with the increasing of β. In general, the testing on F1 value is the most valuable reference, and the larger F1 value means that the involved method is more efficient. On this basis, we can find that the proposed CDSA has the largest F1 value, which indicates that CDSA is the best mechanism for the SDN-based application-awareness. The related reasons can be found in the above two sections.

4) STABILITY
The stability is a metric which reflects the effectiveness and robustness of an application-awareness method. In this   paper, it is quantified as the fluctuation coefficient [39] among CDSA, ICNP, ICAA and ICCN with respect to recall ratio, precision ratio and F1 value. Furthermore, the small fluctuation coefficient means that the application-awareness mechanism has good performance. As illustrated in Table 6, the stability with respect to CDSA, ICNP, ICAA and ICCN TABLE 6. The stability (the small fluctuation coefficient means that the application-awareness mechanism has good performance). is reported. We can observe that the proposed CDSA has the smallest fluctuation coefficient, which further suggests that it can be accepted as the best mechanism to do the SDN-based application-awareness.

V. CONCLUSION
This paper proposes a CNN-based mechanism to do the SDN-based application-awareness because CNN has local connection and parameters sharing advantages, which includes three major modules, i.e., traffic collection, data pre-processing and application-awareness. Among them, for the first part, the general switch submits its collected traffic information to the OpenFlow switch according to the techniques of port mirroring and redirection. For the second part, the min-max method is exploited to do normalization for facilitating the features modelling. For the last part, it consists of ReLU-based activation function, t-SNE-based pooling function, Softmax-based classification function and loss function. Additionally, in order to take maximum advantage of CNN, the K-CV method is used to explore more useful information from the limited dateset. Based on the real Moore dataset, the experimental results demonstrate that the proposed mechanism has more efficient recall ratio, precision ratio, F value and stability compared to three baselines.
However, as a novel CCN-based application-awareness for SDN, the proposed CDSA also has two limitations that cannot be ignored. On one hand, the t-SNE is regarded as the pooling function for dimensionality reduction, which has the relatively high computation complexity and decreases the application-awareness speed to some extent. On the other hand, the adopted CCN structure model contains too many hidden layers, which needs to consume too much computation time and puts off the convergence speed. In the future, we plan to enhance this paper from the following three aspects. At first, we improve the t-SNE and try to decrease its computation complexity. Secondly, we study the CCN structure model and find a relatively optimal one for supporting the SDN-based application-awareness. At last, we will test more applications (except the current 10 kinds of applications in this paper) and more datasets to further prove the application value of the proposed CDSA. CHENGDONG WU received the B.S. degree in electrical automation from the Shenyang Architectural and Civil Engineering Institute, Shenyang, China, in 1983, the M.S. degree in control theory and its applications from Tsinghua University, Beijing, China, in 1988, and the Ph.D. degree in industrial automation from Northeastern University, Shenyang, in 1994. He is currently a Professor with the Faculty of Robot Science and Engineering, Northeastern University. His research interests include wireless sensor networks, robot intelligent control, and artificial intelligence.