Regression Based Dynamic Elephant Flow Detection in Airborne Network

As an important part of the spatial information network, the airborne network (AN), which connects air platforms with upper satellites and ground devices, is becoming increasingly important. Because network traffic follows a heavy-tailed distribution, elephant flow detection is commonly used to capture and control the key part of network traffic at low cost, which is a practical strategy to strengthen network management and improve network performance. In this paper, we consider the problem of dynamic threshold elephant flow detection in AN, and propose an intelligent method based on regression with pre-classification to adapt to the limited and dynamically changing bandwidth. A filtering mechanism with a waiting-window is first used to filter out part of the small flows and decrease the detection cost. Then, pre-classification is used to divide the range to be predicted, so that the flow size regression can be carried out in a compressed range, which makes the results more accurate. Finally, the predicted size is compared with the detection threshold in force at that moment, and the elephant flow is identified. Numerical experiments demonstrate that the proposed method adapts better to a dynamic threshold and achieves better precision, recall and f-score than the binary classifier, multi-class classifier and global regression methods it is compared with.


I. INTRODUCTION
With the rapid development of information technology, the modes and forms of communication have evolved further, and an intelligent and interconnected spatial information network is emerging. As an important part of the spatial information network, the airborne network (AN), which connects air platforms with upper satellites and ground devices, is becoming increasingly important. A growing variety of services are being, or will be, transmitted over AN. In the civil field, AN can provide convenient air access to the Internet, which can effectively cover the blind areas of the ground wired network and further expand the range of communications [1], [2]. In the military field, AN can be used to link air and ground combat platforms, which enables fast information sharing among all combat platforms and efficient cooperation between combat platforms across different regions [3], [4]. Due to the heavy-tailed distribution of network traffic [5]-[8], a small number of elephant flows, such as video streaming in the civil field and surveillance and sensing messages in the military field, contribute a significant share of the traffic volume and occupy a large part of the limited available bandwidth. If those elephant flows are not detected and all flows are treated equally, some elephant flows may converge on the same link, resulting in link congestion and message loss, while other links may be idle or carry few flows, resulting in wasted bandwidth. Both link congestion and idle links greatly reduce the efficiency of information exchange, resulting in poor network performance and user experience. Timely and accurate elephant flow detection has therefore become an efficient and practical strategy to optimize network performance [9], [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
Different from the traditional wired network, the electromagnetic environment in AN is more complex. Noise, interference and attenuation caused by meteorological or artificial factors are ubiquitous. Coupled with the movement of the platforms and the directivity of antennas, communication connections are more vulnerable, and the network topology and available bandwidth change dynamically. Moreover, owing to the burstiness and dynamics of network traffic, the numbers, types and transmission data volumes of the carried services and their QoS also change dynamically in AN. In this context, it is highly meaningful to detect elephant flows quickly, flexibly and dynamically, so that differentiated services can be provided under limited resources.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In traditional wired networks, elephant flow detection is mainly realized by counting [11], [12], sampling [13], [14] or LRU (Least Recently Used) queue management [15], [16]. The main idea is either to compare the volume of data that has passed with the corresponding detection threshold, or to filter out small flows and retain elephant flows through the entry update and eviction mechanism of the queue. In [11], the strategies of counting and LRU are combined, and an asymptotically optimal algorithm is proposed. In the algorithm, the storage is divided into active and inactive areas. Entries of flows are updated in the active area and small flows are removed from the inactive area. Before the occupancy of the active area reaches a specific level, the active and inactive areas are exchanged. In this way, accurate detection is achieved with less storage. This method does well in some traditional wired networks, but it may not be suitable for real-time and dynamic scenarios such as AN, because a large amount of data needs to be processed and the detection delay is relatively large.
With the introduction of artificial intelligence technology [16]-[19], the traditional strategy of using posterior statistics to detect elephant flows has changed. Historical traffic data are used to train machine learning algorithms, and a classifier can be built to learn the mapping between the traffic class and the early features of a traffic flow [20], [21]. With the help of the classifier, elephant flow detection can be achieved in a short time with fewer data. Since an undetected elephant flow is more serious than a mis-detected small flow, [17] sets different costs for different kinds of misclassification, and builds a classification model based on a cost-sensitive decision tree to obtain a more accurate detection result. In [18], data mining technologies are applied to elephant flow detection under the framework of SDN, and a two-phase detection model is suggested. In the first phase, a classifier is built on the switch with features that can be easily obtained, and only suspected elephant flows are submitted to the controller for further confirmation. In the second phase, another classifier is constructed on the controller with more features extracted from the first few packets of the traffic flow, and the suspected elephant flows are examined once again. After these two phases, more accurate detection can be achieved at lower detection cost. The methods in [17] and [18] are both based on supervised learning with binary classification. Specifically, the training data are labeled with two classes according to a fixed threshold in advance. Then, the labeled dataset is used to train a binary classifier, which is applied to classify newly arriving flows.
During the process above, the threshold for elephant flows is fixed, which is only suitable for scenarios where the properties of traffic flows remain basically unchanged and the communication bandwidth stays stable. Unlike such stable scenarios, the carried traffic and available bandwidth in AN change dynamically with the missions, phases and communication environments. A fixed threshold is therefore unsuitable, and a dynamic threshold is needed to adapt to changes in bandwidth or other QoS constraints.
To solve the problem of dynamic threshold elephant flow detection in AN, we propose a regression method that adapts to the dynamically changing threshold. Firstly, a regression model is built to describe the relationship between the early features and the total sizes of traffic flows. Then, the size of a newly arriving flow is predicted by the regression model from its early features. By comparing the predicted flow size with the current threshold, the elephant flow can be identified. In the proposed method, a filtering mechanism with a waiting-window is used to eliminate part of the small flows and alleviate the problem of data imbalance in regression. In addition, a pre-classification strategy is adopted to compress the range of flow sizes to be predicted, so that accurate results can be obtained more easily.
The rest of this paper is structured as follows. Section II presents the models and assumptions related to our work. Section III describes the details of the proposed method. Extensive numerical experiments are presented in Section IV, and Section V concludes this article.

II. ELEPHANT FLOW DETECTION MODEL WITH DYNAMIC THRESHOLD
In most of the existing literature [15], [22]-[25], an elephant flow is defined as a flow whose number of packets or bytes is greater than a certain value or a certain ratio of the total traffic passing through the link. That is:
$$F_{ele} = \{ f \mid Fs_{total}(f) > Tr_c \} \tag{1}$$
where $f$ is the flow to be detected, $F_{ele}$ is the set of elephant flows, $Fs_{total}(f)$ is the total size of flow $f$, and $Tr_c$ is the fixed threshold to determine elephant flows.
The definition with a fixed threshold [15], [22]-[24] is appropriate for stable wired networks, where the bandwidth and the distribution of the carried traffic are almost unchanged. However, AN is not such a stable scenario, and the fixed threshold is no longer suitable. The threshold should be a dynamically adjustable value that adapts to changes in the available bandwidth and carried traffic. Besides, the definition above is given from the perspective of the total size of the flow, including the data that has passed and the data that is yet to come. But in most cases, the volume of data yet to come is unknown at the detection moment, and so is the total size of the flow, which can only be obtained when the flow ends. Therefore, this definition is better suited to post-event network traffic analysis. Sometimes, the volume of data passed before the detection is used to approximate the total size of the flow, and the result is used for real-time traffic scheduling. But this approximation is unlikely to work well, because the data yet to come, which are exactly the data that need to be scheduled, remain unknown. By introducing artificial intelligence, it is possible to learn from historical data and predict the flow size before the flow ends. In this case, the passed data are used to extract specific features, which are fed to a model learned from the historical data to classify traffic flows or predict the flow size. Then, the volume of the coming data can be obtained from the predicted flow size and the volume of passed data. The coming data has great influence on the future status of the network, and should receive more attention in real-time traffic scheduling of AN.
Based on the above analysis, we modify the definition as follows:
$$F_{ele} = \{ f \mid Fs_{total}(f) - Fs_v(f) > Tr_v \} \tag{2}$$
where $Fs_v(f)$ is the volume of data used for prediction, and $Tr_v$ is the detection threshold that can be dynamically changed.
It is worth noting that, in this paper, small flows serve as a complement to elephant flows, and then small flows can be identified with the same threshold.
Different from (1), the volume of data used for prediction $Fs_v(f)$ and the dynamic threshold $Tr_v$ are taken into consideration in (2). If both $Fs_v(f)$ and $Tr_v$ are constants, that is, the volume of data used for prediction and the detection threshold of elephant flows are fixed, then (2) reduces to (1), with $Tr_c$ replaced by the sum of $Fs_v(f)$ and $Tr_v$. In this case, elephant flows and small flows can be identified solely from the relationship between the flow size $Fs_{total}(f)$ and the constant threshold $Fs_v(f) + Tr_v$. Thus, the historical data flows, whose sizes are known, can be labeled according to the fixed threshold, and a binary labeled training dataset can be obtained. After establishing the connections between the features and the classes of data flows, a binary classifier can be trained on the labeled training dataset to detect elephant flows easily.
According to (2), the classes of data flows are still related to the flow size, the volume of data used for prediction and the dynamic threshold. However, for AN, the detection threshold $Tr_v$ is a variable, and the volume of data used for prediction $Fs_v(f)$ may also change. Thus, the relationships between the features and the classes of historical flows are inconsistent and changing, and a binary labeled training dataset built with a uniform threshold may no longer be applicable. In this case, a static binary classifier is inadequate, and a regression model combined with a dynamic threshold is needed. Different from binary classification, the regression model does not need the historical flows to be labeled. It is constructed only from the features and the sizes of the historical data flows, and is therefore independent of the dynamic detection threshold. For a newly arriving data flow, the predicted flow size is obtained by feeding the features extracted from the passed data into the regression model. Dynamic threshold elephant flow detection is then achieved by comparing the predicted flow size with the dynamic threshold.
Suppose $D_n$ is the training dataset, which contains $n$ samples $(x_i, y_i)$, $i = 1, \cdots, n$, where $x_i = (x_{i1}, \cdots, x_{im})$ is the feature vector of the $i$th sample in the dataset, $x_{im}$ is the $m$th dimension feature of $x_i$, and $y_i$ is the corresponding flow size. For the fixed detection threshold, as the detection threshold is fixed and known, the sample $(x_i, y_i)$ can be labeled with the formula:
$$l_i = \begin{cases} l_{ele}, & y_i > Tr_c \\ l_{mice}, & \text{otherwise} \end{cases} \tag{3}$$
where $y_i$ is the size of the flow and $Tr_c$ is the detection threshold.
After the labeling, the binary labeled training dataset $(x_i, l_i)$, $i = 1, \cdots, n$ can be obtained, where $l_i$ is either $l_{ele}$ or $l_{mice}$. Based on the binary labeled training dataset, a mapping or binary classifier $M_C : x \rightarrow \{l_{ele}, l_{mice}\}$ can be obtained with a machine learning strategy. When a new flow $f^*$ arrives, the corresponding features $x^*$ are sent to the model $M_C$, and the class of the flow can be directly obtained by
$$l^* = M_C(x^*) \tag{4}$$
For dynamic threshold detection, in contrast, the detection threshold changes dynamically, so the samples cannot be labeled with a single threshold. In this case, the sizes of flows are regarded as labels. Regression learning is carried out directly on this consistently labeled dataset, and a regression model or predictor $M_R : x \rightarrow y$ is used to predict the flow size. When a new flow $f^*$ arrives, the corresponding features $x^*$ are sent to the model $M_R$, and the predicted flow size is obtained by $y^* = M_R(x^*)$. By substituting the detection threshold $Tr_v$ and the predicted flow size $y^*$ into (2), the class of the flow can be determined. The processes of detection with fixed threshold and dynamic threshold are shown in Fig. 1.
As can be seen in Fig. 1, the difference between the fixed threshold detection and the dynamic threshold detection lies in the labeling. In the fixed threshold detection, the classification is adopted, and the labeling is placed before training. While, in the dynamic threshold detection, the regression is adopted, and the labeling is placed after the prediction of flow size. It is obvious that the results of elephant flow detection with dynamic threshold are seriously influenced by the results of regression prediction. Therefore, the key of the dynamic elephant flow detection proposed in this paper is the regression for the flow size.
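The contrast between the two detection paths can be illustrated with a minimal Python sketch. It assumes the dynamic rule compares the predicted total size minus the volume already used for prediction against the current threshold, as described for (2). The function names and the toy predictor are hypothetical stand-ins for $M_R$, not part of the original method.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def label_fixed(flow_size: float, tr_c: float) -> str:
    """Fixed-threshold path: label a historical flow before training."""
    return "elephant" if flow_size > tr_c else "mice"


def detect_dynamic(predict_size: Callable[[Features], float],
                   features: Features, fs_v: float, tr_v: float) -> str:
    """Dynamic-threshold path: predict the size first, label afterwards."""
    predicted_total = predict_size(features)   # y* = M_R(x*)
    remaining = predicted_total - fs_v         # volume still to come
    return "elephant" if remaining > tr_v else "mice"


def toy_predictor(x: Features) -> float:
    """Hypothetical regression model used only for illustration."""
    return 100.0 * x[0]
```

Because the threshold is only applied after prediction, the same trained predictor can serve any value of `tr_v` without relabeling or retraining.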

III. FLOW SIZE REGRESSION WITH PRE-CLASSIFICATION
Studies [26]-[28] show that the flow size of network traffic is distributed over a wide range, and the distribution is usually imbalanced. It is difficult to carry out regression learning on the original training dataset. In order to reduce the difficulty and improve the accuracy, we introduce a pre-classification strategy for the flow size regression. Before the regression learning, a filtering mechanism with a waiting-window is first used to filter out part of the small flows, which compresses the prediction range and alleviates the imbalance. Since fewer samples need to be further processed, the detection cost decreases. Then, pre-classification is adopted to divide the range of flow sizes to be predicted. Classifiers are trained on the dataset labeled with the dividing borders, and regression predictors are trained on the divided dataset. Thus, the regression of flow size can be carried out in a compressed range and implemented much more easily. After the regression, the predicted flow size is compared with the detection threshold determined by the current communication condition to detect the elephant flows. The entire process of flow size regression with pre-classification is shown in Fig. 2.
As can be seen in Fig. 2, the entire process of flow size regression with pre-classification consists of the offline training part and the online detecting part. In the training stage, waiting-window filtering mechanism is used to screen out available training samples, and pre-classification is used to pre-train the standby classifiers and regression predictors. While, in the testing stage, waiting-window filtering mechanism is used to filter out and detect parts of small flows, and pre-classification is used to select specific classifiers or regression predictors for elephant flow detection.

A. WAITING-WINDOW FILTERING
Due to the heavy-tailed distribution of network traffic, there are many small flows in both the training dataset and the testing data. Although the volume of data they carry is small, the number of small flows is huge. In the training stage, the huge number of small flows leads to sample imbalance in the training dataset, which seriously biases the prediction model to be generated. In the prediction stage, since the number of packets available for feature extraction is very limited, it is almost impossible to predict the flow size of such small flows. These unpredictable small flows lead to a lot of unnecessary prediction overhead. Even if an accurate prediction could be made at great cost, controlling the little remaining data would not be cost-effective.
In order to reduce the negative impact of such small flows, a filtering mechanism with a waiting-window is adopted. Because small flows have fewer packets and relatively lower packet rates, part of them can be eliminated with a time stack, together with the feature extraction. By setting a waiting-window, small flows that do not reach the packet number required for feature extraction within the specified time are eliminated, and potential elephant flows represented by the features are retained. In the training stage, this filtering mechanism alleviates the imbalance of the training dataset used for classification or regression, so that a relatively balanced modified training dataset can be obtained. In the detecting stage, it reduces the number of flows that need to be further processed by classification or regression, thus improving detection efficiency and reducing detection cost. The process of waiting-window filtering is shown in Fig. 3.
As can be seen in Fig. 3, in the waiting-window, packets of a flow are collected until a sufficient number is reached. If the number of packets is sufficient within the waiting-window, the flow is retained, and the collected packets are sent for feature extraction. Otherwise, the flow is discarded.
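The filtering rule can be sketched in a few lines of Python. This is a minimal illustration that assumes the packets of one flow arrive sorted by timestamp and that the window starts at the first packet. The function name and the parameters `window` and `required` are hypothetical.

```python
from typing import List, Optional, Tuple

Packet = Tuple[float, int]  # (arrival timestamp in seconds, size in bytes)


def waiting_window_filter(packets: List[Packet], window: float,
                          required: int) -> Optional[List[Packet]]:
    """Return the first `required` packets if they all arrive within
    `window` seconds of the first packet; otherwise return None,
    meaning the flow is discarded as a small flow."""
    if not packets:
        return None
    start = packets[0][0]
    collected: List[Packet] = []
    for ts, size in packets:
        if ts - start > window:        # waiting-window has expired
            break
        collected.append((ts, size))
        if len(collected) == required:
            return collected           # enough packets: keep the flow
    return None
```

A retained flow yields exactly the packets needed for feature extraction, so the filter and the collection step share one pass over the packets.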
In the feature extraction module, the desired features to represent the flow are extracted from the collected packets. Usually, the header information and statistical parameters of packets are adopted as features, so that the method remains applicable to encrypted traffic. These features can be obtained with network data analysis tools and numerical calculation tools.
After the process of waiting-window filtering, features and the size of the flow are saved as a sample in the training dataset.

B. PRE-CLASSIFICATION
After the preprocessing of the waiting-window filtering, the number of small flows in the modified training dataset is greatly reduced, which alleviates the phenomenon of imbalance. However, as the flow size is distributed in a relatively large range, it is still difficult to make a regression prediction in the entire range. In order to further reduce the regression difficulty and improve the accuracy of prediction, we divide the range of flow size into several small ranges by means of pre-classification, and further compress the range that needs to be predicted.

1) DIVISION OF PREDICTION RANGE
Before dividing the range of flow sizes to be predicted, two ranges need to be distinguished. One is the range of flow sizes of the training dataset, covering the minimum and maximum flow sizes of its samples; the other is the range of the dynamic thresholds used to detect elephant flows, covering the minimum and maximum flow sizes to be further processed. Intuitively, the range of the dynamic threshold is more desirable than the flow size range of the training dataset. But it is closely related to changes in bandwidth and carried traffic, and cannot be known in advance. Therefore, we settle for the second best, i.e., the range of flow sizes of the training dataset.
For the range division, two important parameters need to be determined. The first is the number of sub-ranges, which corresponds to the number of classes in the pre-classification. Since the number of classes increases with the number of sub-ranges, the finer the classification granularity, the smaller each sub-range. When the classification granularity is fine enough, the results of the multi-class classification can even be regarded as the prediction value. But it is worth noting that finer granularity leads to higher cost. The second is the set of division thresholds. Given the number of sub-ranges, these thresholds are determined from the distribution of the training dataset. The simplest and most convenient way to determine the division thresholds is to divide either the range of flow sizes or the number of samples of the training dataset into equal parts.
In this paper, the entire range of flow sizes in the training dataset is not the target to be predicted, and we only select a subset of the dataset for the prediction. We select the 90th percentile and the 99th percentile of the flow sizes in the training dataset as the lower and upper limits of the range to be further processed; that is, only the flows ranked between the top 10% and the top 1% of the training dataset are considered in the prediction model. Usually, flows over the 99th percentile are treated as elephant flows, and flows under the 90th percentile are treated as small flows. From the property of the heavy-tailed distribution, it can be seen that even if only one-tenth or even one-hundredth of the flows at the top of the distribution are predicted and further processed, the actual volume of traffic involved is still considerable. After determining the range of flow sizes to be predicted, equal quantity division is adopted to determine the division thresholds, which avoids class imbalance between different ranges. In order to avoid unnecessary division caused by too small an interval between percentiles, a minimum division interval is set in advance, which reduces the number of classes and the complexity of pre-classification.
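A minimal sketch of the equal quantity division follows. It assumes nearest-rank percentiles and places the borders at equally spaced percentiles between the 90th and the 99th, so that each sub-range holds roughly the same number of training flows. The function names, the nearest-rank rule and the way a too-close border is skipped are illustrative assumptions rather than the authors' exact procedure.

```python
import math
from typing import List, Sequence


def nearest_rank_percentile(ordered: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of an ascending sequence, 0 < q <= 100."""
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]


def division_thresholds(sizes: Sequence[float], n_subranges: int,
                        min_interval: float, lower_q: float = 90.0,
                        upper_q: float = 99.0) -> List[float]:
    """Borders at equally spaced percentiles in [lower_q, upper_q].
    A border closer than `min_interval` to the previous kept border
    is skipped to avoid needlessly narrow sub-ranges."""
    if not sizes:
        raise ValueError("training dataset is empty")
    ordered = sorted(sizes)
    step = (upper_q - lower_q) / n_subranges
    borders: List[float] = []
    for k in range(n_subranges + 1):
        t = nearest_rank_percentile(ordered, lower_q + k * step)
        if not borders or t - borders[-1] >= min_interval:
            borders.append(t)
    return borders
```

With 1000 flows of sizes 1 to 1000 and three sub-ranges, the borders fall at the 90th, 93rd, 96th and 99th percentiles; a larger `min_interval` merges adjacent sub-ranges.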

2) MULTI-CLASS CLASSIFIER AND CLASSIFICATION
After dividing the training dataset into sub-ranges or classes, a multi-class classifier can be obtained by training. Usually, the multi-class classifier is either trained directly as a multi-class model, or obtained by combining multiple binary classifiers. Owing to the mature techniques for feature selection and data preprocessing in binary classification, a good binary classifier is relatively easy to obtain. Therefore, we combine multiple binary classifiers to form the multi-class classifier and compress the prediction range. In order to reduce the complexity of pre-classification, we choose the C4.5 decision tree, which is simple and fast, as the basic classifier. The performance of this algorithm has been verified in many network traffic classification studies [29]-[31]. In accordance with the aforesaid method of prediction range division, a number of classification training datasets, each labeled by one division threshold, can be obtained, and a decision tree can be trained on each of them. By combining the decision trees of adjacent division thresholds, a multi-class classifier and prediction range compression can be achieved.
For example, suppose that the dynamic detection threshold of a newly arriving flow is $Tr_v$. If a classifier for the detection threshold $Tr_v$ is among the standby classifiers trained in advance, the flow can be classified and detected directly with the corresponding binary classifier. Otherwise, we combine the classifiers of division thresholds $Tr_i$ and $Tr_{i+1}$, where $Tr_i$ and $Tr_{i+1}$ are nearest to $Tr_v$ and satisfy $Tr_i < Tr_v < Tr_{i+1}$.
If the new arrival flow is classified as a small flow by the classifier of $Tr_i$, that means:
$$Fs_{total}(f) - Fs_v(f) \le Tr_i \tag{5}$$
then
$$Fs_{total}(f) - Fs_v(f) \le Tr_i < Tr_v \tag{6}$$
For the detection threshold $Tr_v$, the new arrival flow is still a small flow. If the new arrival flow is classified as an elephant flow by the classifier of $Tr_{i+1}$, that means:
$$Fs_{total}(f) - Fs_v(f) > Tr_{i+1} \tag{7}$$
then
$$Fs_{total}(f) - Fs_v(f) > Tr_{i+1} > Tr_v \tag{8}$$
For the detection threshold $Tr_v$, the new arrival flow is still an elephant flow. Besides, if the new arrival flow is classified as an elephant flow by the classifier of $Tr_i$ and as a small flow by the classifier of $Tr_{i+1}$, that means:
$$Tr_i < Fs_{total}(f) - Fs_v(f) \le Tr_{i+1} \tag{9}$$
Although further regression processing is still needed to detect the elephant flow, the range of flow size to be predicted has been compressed; in other words, the sub-range has been obtained by the pre-classification. The entire process of pre-classification is shown in Fig. 4.
As can be seen in Fig. 4, in the training stage, the range of flow size is divided into several small ranges. Based on the dividing borders, binary classification training datasets are labeled, and binary classifiers are trained on these datasets to discriminate different sub-ranges. At the same time, regression predictors are trained on the divided datasets within the sub-ranges. In the detecting stage, the dynamic detection threshold of elephant flow is given, and the two classifiers whose dividing borders are closest to the detection threshold are selected. The results of the classifiers are combined to determine whether a regression predictor is further needed to detect the elephant flow.
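The combination of the two adjacent binary classifiers can be sketched as follows. This is a minimal illustration in which each pre-trained classifier is modeled as a callable returning True for "elephant" with respect to its own division threshold. The function name, the dictionary of classifiers and the error raised for a threshold outside the pre-trained borders are assumptions made for the example.

```python
from bisect import bisect_left
from typing import Callable, Dict, Sequence

Features = Sequence[float]
BinaryClassifier = Callable[[Features], bool]  # True means "elephant"


def pre_classify(features: Features, tr_v: float,
                 classifiers: Dict[float, BinaryClassifier]) -> str:
    """Return "small", "elephant", or "regress" when the flow falls
    between the two division thresholds surrounding tr_v."""
    if tr_v in classifiers:                    # a standby classifier exists
        return "elephant" if classifiers[tr_v](features) else "small"
    borders = sorted(classifiers)
    pos = bisect_left(borders, tr_v)
    if pos == 0 or pos == len(borders):
        raise ValueError("threshold outside the pre-trained borders")
    tr_lo, tr_hi = borders[pos - 1], borders[pos]
    if not classifiers[tr_lo](features):       # small even for Tr_i
        return "small"
    if classifiers[tr_hi](features):           # elephant even for Tr_i+1
        return "elephant"
    return "regress"                           # size lies in (Tr_i, Tr_i+1]
```

Only the "regress" outcome requires the regression predictor of the corresponding sub-range, which is how pre-classification saves detection time for the other two outcomes.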

C. REGRESSION PREDICTION
After waiting-window filtering and pre-classification, the classes of some flows have already been identified. For the flows that are not identified yet, the range of flow sizes to be predicted has been compressed by the pre-classification. Thus, regression prediction can be carried out in the compressed range for further identification. Compared with regression on the entire range of flow size, the complexity of regression on the compressed range is greatly decreased, and a more accurate prediction can be achieved. Many algorithms are currently available for regression prediction, and any algorithm with good performance can be adopted here.
In this paper, Gaussian process regression [32], [33] is selected for the flow size prediction. This algorithm can be easily implemented and has strong generalization ability. In Gaussian process regression, the prediction of flow size is treated as part of a Gaussian process, in which any finite set of outputs is assumed to follow a joint Gaussian distribution. Suppose $D_n$ is the training dataset, which contains $n$ samples $(x_i, y_i)$, $i = 1, \cdots, n$, where $x_i$ is the feature vector of the $i$th sample in the dataset, and $y_i$ is the corresponding flow size. Let $X$ be the matrix composed of all $x_i$, and $y$ be the vector composed of all $y_i$; then the training dataset $D_n$ can be expressed as $(X, y)$. For a new arrival flow $f^*$, $x^*$ represents the input features, and the output flow size $y^*$ satisfies:
$$\begin{bmatrix} y \\ y^* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, x^*) \\ K(x^*, X) & K(x^*, x^*) \end{bmatrix} \right) \tag{10}$$
where $N(\cdot)$ represents the joint Gaussian distribution, $K(\cdot, \cdot)$ represents the covariance matrix between the input vectors, and $\sigma_n^2$ is the noise variance. From (10), the posterior probability density function of the flow size of the new arrival flow can be obtained as follows:
$$p(y^* \mid X, y, x^*) = N(\mu, \sigma_*^2) \tag{11}$$
where
$$\mu = K(x^*, X) \cdot (K(X, X) + \sigma_n^2 I)^{-1} \cdot y \tag{12}$$
and $\sigma_*^2 = K(x^*, x^*) - K(x^*, X) \cdot (K(X, X) + \sigma_n^2 I)^{-1} \cdot K(X, x^*)$ is the posterior variance. Since the probability density function of the Gaussian distribution is symmetric about the mean $\mu$ and has the greatest probability at the mean $\mu$, the mean in (12) is generally regarded as the estimate of the flow size $y^*$.
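The posterior mean in (12) can be computed with a short, dependency-free Python sketch. The squared-exponential kernel, its length scale, the default noise variance and the function names are assumptions chosen for illustration; the paper does not state which kernel is used. The linear system is solved by Gaussian elimination instead of forming the inverse explicitly.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], length: float = 1.0) -> float:
    """Squared-exponential covariance between two feature vectors."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * length ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_mean(X: Sequence[Sequence[float]], y: Sequence[float],
            x_star: Sequence[float], noise_var: float = 1e-6,
            length: float = 1.0) -> float:
    """Posterior mean K(x*, X) (K(X, X) + noise_var * I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], length) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, list(y))                  # (K + sigma_n^2 I)^-1 y
    k_star = [rbf(x_star, xi, length) for xi in X]
    return sum(k * a for k, a in zip(k_star, alpha))
```

With a small noise variance the mean nearly interpolates the training targets, and far from all training inputs it falls back to the zero prior mean, which is one reason to restrict regression to a compressed sub-range.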

IV. NUMERICAL EXPERIMENTS
A. DATASET AND SETTINGS
In order to verify the performance of the proposed method for dynamic threshold elephant flow detection in the airborne network, an evaluation dataset from an airborne network is needed. However, there is currently no publicly available airborne network traffic dataset, and traffic generation and emulation for airborne networks have not been fully studied. Fortunately, the focus of this paper is on the problem of dynamic elephant flow detection, and the heavy-tailed distribution is common to both airborne networks and ground wired computer networks. In this paper, a dataset modified from the UNIBS-2009 dataset [34] is adopted for the numerical experiments. The original traces of UNIBS-2009 are collected on the edge …
In this paper, 75% of the samples in the dataset are randomly selected as the training dataset, and the remaining 25% form the testing dataset. To achieve early detection of elephant flows, the first ten packets of each data flow are used for feature extraction. The extracted features include the packet size and inter-arrival time (IAT) of the first ten packets and statistics of the packet size and IAT, together with the source port, destination port, protocol type and the duration of the first ten packets. Among these features, the source port, destination port and protocol type are extracted from the header of the first packet. The size and IAT of the first ten packets are extracted from the header and timestamp of each packet, and their statistics are computed from these values. The duration of the first ten packets is also obtained from the timestamps. Both the headers and the timestamps of packets are extracted with Wireshark, and statistics and numerical calculations are conducted with MATLAB.
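The feature extraction described above can be sketched in Python as follows. The paper performs this step with Wireshark and MATLAB; the function below is only an illustrative equivalent. The choice of mean, minimum, maximum and standard deviation as the statistics, and all feature names, are assumptions, since the exact statistics are not listed.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Packet = Tuple[float, int]  # (timestamp in seconds, size in bytes)


def extract_features(packets: List[Packet], src_port: int, dst_port: int,
                     protocol: int, n: int = 10) -> Dict[str, float]:
    """Build early features from the first n packets of one flow."""
    first = packets[:n]
    if len(first) < n:
        raise ValueError("flow has fewer packets than required")
    times = [t for t, _ in first]
    sizes = [s for _, s in first]
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    feats: Dict[str, float] = {
        "src_port": src_port, "dst_port": dst_port, "protocol": protocol,
        "duration": times[-1] - times[0],
        "size_mean": mean(sizes), "size_min": min(sizes),
        "size_max": max(sizes), "size_std": pstdev(sizes),
        "iat_mean": mean(iats), "iat_min": min(iats),
        "iat_max": max(iats), "iat_std": pstdev(iats),
    }
    # Per-packet sizes and inter-arrival times are kept as features too.
    feats.update({f"size_{i + 1}": s for i, s in enumerate(sizes)})
    feats.update({f"iat_{i + 1}": d for i, d in enumerate(iats)})
    return feats
```

All of these values come from packet headers and timestamps only, so the payload is never inspected.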
The numerical experiments are run on a DELL XPS8930 with an Intel i7-8700 3.2 GHz CPU and 16 GB RAM. Weka 3.8.4 and MATLAB 2018b, running on a Windows 10 64-bit OS, are used as software frameworks. Weka is used for feature selection for pre-classification, and MATLAB is used for regression prediction with its built-in Regression Learner tool. The results of the feature selection in adjacent binary classifiers are used for regression. It is assumed that the dynamic threshold follows a normal distribution and changes every 20 samples. The simulation is repeated 20 times and the results are averaged.
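The assumed threshold dynamics can be sketched as follows: a new threshold is drawn from a normal distribution every 20 samples and held constant in between. The function name, the seed and the mean and standard deviation arguments are hypothetical, since the paper does not report the distribution parameters here.

```python
import random
from typing import List


def dynamic_thresholds(n_samples: int, mean: float, std: float,
                       hold: int = 20, seed: int = 0) -> List[float]:
    """One detection threshold per sample, redrawn every `hold` samples."""
    rng = random.Random(seed)          # seeded for repeatable experiments
    thresholds: List[float] = []
    current = mean
    for i in range(n_samples):
        if i % hold == 0:
            current = rng.gauss(mean, std)
        thresholds.append(current)
    return thresholds
```

Increasing `std` makes the threshold change more strongly between blocks, which corresponds to varying the changing intensity of the detection threshold.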

B. PERFORMANCE EVALUATION
In this paper, precision, recall and f-score are used to evaluate the performance of dynamic elephant flow detection. They are defined as follows:
$$\text{precision} = \frac{TP}{TP + FP}, \quad \text{recall} = \frac{TP}{TP + FN}, \quad \text{f-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
where $TP$ is the number of elephant flows correctly detected, $FP$ is the number of small flows mis-detected as elephant flows, and $FN$ is the number of elephant flows that are not detected.
To evaluate the performance of the proposed method, we compare the existing binary classifier (C1), the multi-class classifier (CM), the global regression predictor (R1) and the proposed regression predictor with pre-classification (RC). The precision, recall, f-score and detecting time of the four methods are compared. Fig. 6 shows the results of the comparison. Influenced by the dynamic threshold, the precision, recall and f-score of the binary classifier, which is trained on a fixed threshold, are all poor. In contrast, in the multi-class classifier and the regression predictor, the negative influence of the dynamic threshold is mitigated to some extent and better results are achieved. Compared with the multi-class classifier, the regression predictor with pre-classification has better performance. This is because the proposed regression predictor performs a further regression on the multi-classification results instead of directly selecting the nearest classification result, so the misclassification caused by the difference between the actual threshold and the training threshold in the multi-class classifier is reduced. However, the additional regression step also increases the testing time of the proposed method. Comparing the global regression predictor with the regression predictor with pre-classification, the performance of global regression is much poorer. This is because it is hard to construct a global model over the larger range. If the model is not reasonable or the parameters are not well adjusted, the performance of the global regression is greatly reduced. Different from the global regression, the proposed regression with pre-classification compresses the range of flow size to be predicted in advance, which results in better performance.
In addition, early detection of elephant flows in the pre-classification stage also helps to reduce the detection time. It is worth noting that, in this paper, the model and parameters of the global regression are almost the same as those of the pre-classification regression, and under the same parameter settings the pre-classification regression performs much better. Global regression prediction is a complicated problem. With enough training data and the freedom to adjust the model and parameters regardless of cost, global regression may achieve a better result.
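The evaluation metrics used in this comparison can be illustrated with a minimal Python sketch for the case where the detection threshold differs from flow to flow. The function name, the use of a greater-or-equal comparison to decide whether a flow is an elephant, and the sample values are assumptions made for illustration and are not taken from the experiments.

```python
def detection_metrics(true_sizes, predicted_sizes, thresholds):
    """Precision, recall and f-score for elephant flow detection.

    Each flow is judged against its own threshold, so the threshold may
    change from flow to flow. Treating size >= threshold as "elephant"
    is an assumption of this sketch.
    """
    tp = fp = fn = 0
    for true, pred, thr in zip(true_sizes, predicted_sizes, thresholds):
        is_elephant = true >= thr   # ground truth at this moment
        flagged = pred >= thr       # decision based on the predicted size
        if flagged and is_elephant:
            tp += 1
        elif flagged:
            fp += 1
        elif is_elephant:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical flow sizes and thresholds, in arbitrary units.
    true_sizes = [120, 40, 300, 90]
    predicted_sizes = [110, 60, 280, 70]
    thresholds = [100, 50, 250, 80]
    print(detection_metrics(true_sizes, predicted_sizes, thresholds))
```

In this toy input there are two correct detections, one false alarm and one missed elephant flow, so all three metrics equal 2/3.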

2) EFFECTS OF THE NUMBER OF CLASSES IN PRE-CLASSIFICATION
In order to investigate the effects of the number of classes in pre-classification, we compare the multi-class classifier and the proposed regression predictor under different numbers of classes. The results are shown in Fig. 7.
As shown in Fig. 7, as the number of classes increases, the performance of both the multi-class classifier and the regression predictor with pre-classification continues to improve and finally reaches almost the same high level. The simulation results indicate that increasing the number of classes improves the performance of elephant flow detection under dynamic thresholds. Besides, in terms of precision, recall and f-score, the regression predictor with pre-classification is always better than the multi-class classifier, although the gap between them narrows as the number of classes increases. This is because the classification granularity is continuously refined as the number of classes increases, which weakens the role of regression. Under these circumstances, good performance can be achieved with the multi-class classifier alone.
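The narrowing gap can be illustrated with a small Python sketch of how class granularity bounds the error left for regression to correct. It assumes equal-width classes over a hypothetical size range and a classifier that reports the class midpoint; neither the class boundaries nor the range are taken from the experiments.

```python
def class_edges(min_size, max_size, num_classes):
    """Split [min_size, max_size] into equal-width classes (an assumption)."""
    width = (max_size - min_size) / num_classes
    return [min_size + i * width for i in range(num_classes + 1)]


def worst_case_midpoint_error(min_size, max_size, num_classes):
    """Largest error made by reporting a class midpoint as the flow size."""
    edges = class_edges(min_size, max_size, num_classes)
    return max((hi - lo) / 2 for lo, hi in zip(edges, edges[1:]))


if __name__ == "__main__":
    # Hypothetical size range of 0 to 1000 units.
    for k in (2, 4, 10, 20):
        print(k, worst_case_midpoint_error(0.0, 1000.0, k))
```

As the number of classes grows, the worst-case error of the classifier alone shrinks in proportion, which leaves less room for the regression step to improve on it.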

3) INFLUENCE OF DYNAMIC THRESHOLD
In order to verify the robustness of the proposed method, we evaluate its performance under different changing intensities of the detection threshold. As mentioned earlier, the change of the dynamic threshold is assumed to be normally distributed. Therefore, we can vary the changing intensity of the detection threshold by varying the standard deviation of the normal distribution. Fig. 8 compares the performance of the four elephant flow detection methods under different standard deviations of the dynamic threshold. As the standard deviation of the dynamic threshold increases, the detection performance of the binary classifier (C1) degrades greatly. Unlike the continuous degradation of the binary classifier, the performance of the regression predictors (R1 and RM) and the multi-class classifier (CM) decreases only slightly. Between the regression predictor with pre-classification and the multi-class classifier, the degradation of the former is even smaller. In contrast, limited by the accuracy of flow size prediction, the performance of the global regression method remains at a low level.
Through the comparisons above, it can be seen that the proposed method of pre-classification regression can be well applied to the problem of dynamic threshold flow detection, and the performance is relatively good.
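One way to see why a classifier trained on a fixed threshold degrades as the threshold varies more strongly is to count how many flows change their elephant label when the threshold is drawn from a normal distribution. The following Python sketch does this under stated assumptions: the dynamic threshold is centered on the training threshold, flow sizes are drawn from a Pareto distribution as a stand-in for heavy-tailed traffic, and all names and numeric values are hypothetical.

```python
import random


def heavy_tailed_sizes(count, scale=1000.0, shape=1.5, seed=0):
    """Synthetic flow sizes from a Pareto distribution (an assumption)."""
    rng = random.Random(seed)
    return [scale * rng.paretovariate(shape) for _ in range(count)]


def label_mismatch_rate(flow_sizes, train_threshold, threshold_std, seed=0):
    """Fraction of flows whose elephant label differs between the fixed
    training threshold and a normally distributed dynamic threshold."""
    if not flow_sizes:
        return 0.0
    rng = random.Random(seed)
    mismatches = 0
    for size in flow_sizes:
        # Assumed: the dynamic threshold has the training threshold as mean.
        dynamic_threshold = rng.gauss(train_threshold, threshold_std)
        if (size >= train_threshold) != (size >= dynamic_threshold):
            mismatches += 1
    return mismatches / len(flow_sizes)


if __name__ == "__main__":
    sizes = heavy_tailed_sizes(10_000)
    for std in (0.0, 500.0, 1000.0, 2000.0):
        print(std, label_mismatch_rate(sizes, 5000.0, std))
```

With a standard deviation of zero the two labelings coincide. Flows whose size lies close to the training threshold are the ones most likely to be relabeled as the standard deviation grows, and a binary classifier cannot follow these changes because it never outputs a size.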

V. CONCLUSION
In this paper, we propose a regression method for dynamic elephant flow detection in AN. Flow size regression serves as an intermediate step that adapts to the dynamic change of detection thresholds: elephant flows are identified by comparing the regression result with the specific detection threshold. In order to reduce the detection cost and improve the accuracy of flow size regression, a waiting-window filtering mechanism and a pre-classification strategy are used to filter out most small flows and to compress the range of flow sizes to be predicted. The simulation results verify the proposed method and show that it performs relatively well under dynamic thresholds.
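The overall flow of the method summarized above can be sketched in Python as follows. This is only a structural illustration: the filtering rule (dropping flows that sent fewer than a minimum number of bytes within the waiting window), the clamping of the regression output to the class range, the feature names, and the toy classifier and regressors are all assumptions introduced here, not the models used in this paper.

```python
def detect_elephant(features, threshold, min_window_bytes, classify, regressors):
    """Return True when the flow is predicted to reach the current threshold.

    classify and regressors are hypothetical stand-ins for trained models.
    """
    # Step 1: waiting-window filter (assumed rule) drops small flows early.
    if features["window_bytes"] < min_window_bytes:
        return False
    # Step 2: pre-classification selects a compressed size range.
    size_class = classify(features)
    low, high, regress = regressors[size_class]
    # Step 3: regression inside that range; clamping is an assumption.
    predicted_size = min(max(regress(features), low), high)
    # Step 4: compare with the threshold that applies at this moment.
    return predicted_size >= threshold


def toy_classify(features):
    """Toy two-class rule based on mean packet size (illustrative only)."""
    return 1 if features["mean_packet_bytes"] > 800 else 0


# class id -> (range low, range high, toy per-class regressor)
toy_regressors = {
    0: (0.0, 10_000.0, lambda f: 20.0 * f["window_bytes"]),
    1: (10_000.0, 1_000_000.0, lambda f: 100.0 * f["window_bytes"]),
}

if __name__ == "__main__":
    flow = {"window_bytes": 3000, "mean_packet_bytes": 1200}
    # The same flow is judged differently as the threshold changes.
    for thr in (200_000, 400_000):
        print(thr, detect_elephant(flow, thr, 500, toy_classify, toy_regressors))
```

Because the decision is made on a predicted size and not on a fixed class label, the same trained models can be reused when the threshold changes.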
For future work, it is necessary to further study traffic generation and emulation for AN, as actual AN data is hard to collect. In addition, more fine-grained traffic classification, such as the combination of regular traffic classification and elephant flow detection, and more general frameworks or models of traffic classification also need to be studied to improve network performance.