Real-Time Network Intrusion Prevention System Based on Hybrid Machine Learning

Recent advancements in network technology and associated services have led to a rapid increase in the amount of data traffic. However, the detrimental effects caused by cyber-attacks have also significantly increased. Network attacks are evolving in various forms. Two primary approaches exist for addressing such threats: signature-based detection and anomaly detection. Although the aforementioned approaches can be effective, they also have certain drawbacks. Signature-based detection is vulnerable to variant attacks, while anomaly detection cannot be used for real-time data traffic. For resolving such issues, this paper proposes a two-level classifier that can simultaneously achieve high performance and real-time classification. It employs level 1 and 2 classifiers internally. The level 1 classifier initially performs real-time detection with moderate accuracy for incoming data traffic. If the data cannot be classified with high probability by the classifier, the classification is delayed until the traffic flow terminates. The level 2 classifier then collects the statistical features of the traffic flow for performing precise classification. Compared to existing techniques, the proposed two-level classification method can achieve superior performance in terms of accuracy and detection time.


I. INTRODUCTION
Recently, network technology and related equipment have been evolving at a fast rate, and accordingly, the performance and total traffic volume of networks are rapidly increasing. The damage caused by cyber-attacks, however, is also increasing, not merely because the number of cyber-attacks is increasing, but because attacks have become more sophisticated and variants of them have been created. Nowadays, it is almost impossible to defend a network completely from malicious hackers [1]. A zero-day attack, which exploits a newly discovered vulnerability of network systems before a solution has been developed, is one of the most serious cyber-crimes. The network inevitably remains defenseless until a solution is developed and applied to the system [2]. According to previous work, a zero-day attack lasts for 312 days on average and can last up to 30 months. Moreover, the total number of zero-day attacks can increase up to five times after a new vulnerability has been disclosed [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Yassine Maleh.
Today, network security systems supporting real-time attack prevention mainly use signature-based methods to detect certain patterns among incoming packets in a way similar to virus scanners [4]-[6]. Real-time detection requires in-line processing algorithms that can detect attacks at the line rate. Although distributed systems can increase the detection speed, they can suffer from expensive synchronization overhead, so they are not a fundamental solution. Thus, a fast detection algorithm is essential for implementing real-time detection systems.
The signature-based approach has a high accuracy and speed for detecting prior known attacks. However, it is almost impossible to detect unknown attacks such as zero-day attacks, and it is vulnerable to variant attacks that bypass signature-based detection using code obfuscation or encryption. It also has a high burden of keeping the signature database up to date.
In contrast to the signature-based approach, the anomaly detection approach observes the statistical characteristics of each flow of network traffic and flags a flow as an attack if its statistics fall outside the normal range. Because it does not need a signature database, it has no overhead for maintaining one, and it can be very robust against zero-day or variant attacks. Various machine learning techniques for separating attacks from normal flows are widely adopted in modern network intrusion detection systems (NIDSs) based on anomaly detection.
However, it is very difficult to find machine learning algorithms that are both fast and accurate enough to process incoming network traffic in real time. In addition, anomaly detection algorithms are designed to classify traffic based on the characteristics of each complete flow, for example, the number of packets transmitted and received during the flow, the number of packets lost, etc., rather than classifying every packet. Thus, such an algorithm only starts determining whether a flow is malicious once the flow has terminated. As a result, per-flow classification cannot detect attacks until the flow finishes and cannot defend the network and users while they are under attack.
In this paper, we propose a novel approach to resolve the problems of existing network intrusion detection and defense technologies. It is based on machine learning techniques, but it can be differentiated from other machine learning-based approaches in the following aspects.

A. REAL-TIME ATTACK DETECTION
There are a few previous works that use non-machine-learning-based methods to detect network attacks in real time. However, they provide a throughput of only several Gbps, much lower than 100 Gbps. The proposed approach can support real-time attack detection at traffic rates of up to 100 Gbps even though it uses a machine learning technique. As mentioned earlier, machine learning became popular in NIDS due to its ability to detect unknown attacks. However, machine learning-based algorithms are too slow to handle many Gbps of traffic; therefore, they cannot be deployed in high-throughput networks. To solve this problem, we suggest two levels of classifiers: one for per-packet detection and the other for per-flow detection.

B. HIGH ATTACK DETECTION ACCURACY
Because network attacks vary widely in type and behavior, a single approach cannot effectively detect all kinds of network attacks. Therefore, the proposed approach uses two-level attack detection. We can detect some attacks by inspecting just a few packets, but for others, such as distributed denial of service (DDoS), we need to observe the network-wide behavior of flows to detect the attack. Hence, our approach not only analyzes individual packets but also uses in-flow and inter-flow statistics to increase the accuracy of detecting attacks. We use two different classifiers simultaneously, achieving very high detection accuracy compared to existing work.
This paper is composed of five parts. In Section II, we briefly introduce related work and results. In Section III, we propose a new intrusion prevention system (IPS) with real-time attack detection and high accuracy. In Section IV, we analyze the performance of the proposed approach and compare it with selected recent works. Finally, we conclude our paper in Section V.

II. EXISTING WORK
Machine learning-based network intrusion detection systems (NIDSs) have been continuously developed. The early NIDS employed a simple structure with a single machine learning algorithm. However, there was a limit to detecting various network attacks accurately with a single machine learning algorithm, and therefore, NIDS research using a combination of various machine learning algorithms has been actively underway.
Based on the type of data used to train the model, machine-learning-based IDSs are divided into a packet-based method that uses packet data and a session-based method that uses session data. The packet-based method converts raw packet data into machine learning features and then performs learning and classification using conventional machine learning algorithms such as convolutional neural networks (CNN). It does not need to extract features before training, so it simplifies training procedures without manual intervention.
The session-based method creates data to generate features of the session inside the IDS when processing packets of a new session, and it updates these session data whenever it processes the packets of the session. After processing the last packet of the session, it generates features for the session by processing the final updated data, and then, these features are used in machine learning. The biggest advantage of this method is that instead of using a large number of packets of a session, it uses a small number of statistical values for the session; thus, the number of features used for training and classifying is very small. As this is very effective for fast training and fast classification, it can handle large network traffics.
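The per-session bookkeeping described above can be sketched as follows. This is a minimal illustration, not the feature set of any cited system: the class name `FlowStats`, its fields, and the four derived features are assumptions chosen only to show that the feature vector has a fixed length regardless of how many packets the session carries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowStats:
    """Running per-session counters; field names are illustrative only."""
    packets: int = 0
    total_bytes: int = 0
    first_ts: float = 0.0
    last_ts: float = 0.0

    def update(self, size: int, timestamp: float) -> None:
        # Called once for every packet that belongs to the session.
        if self.packets == 0:
            self.first_ts = timestamp
        self.packets += 1
        self.total_bytes += size
        self.last_ts = timestamp

    def features(self) -> List[float]:
        # Called after the last packet: a fixed-length vector that does
        # not grow with the number of packets in the session.
        duration = self.last_ts - self.first_ts
        mean_size = self.total_bytes / self.packets if self.packets else 0.0
        return [float(self.packets), float(self.total_bytes), duration, mean_size]


stats = FlowStats()
for size, ts in [(60, 0.0), (1500, 0.5), (40, 2.0)]:  # (bytes, seconds)
    stats.update(size, ts)
print(stats.features())
```

Only the counters are kept in memory while the session is open, which is why the memory footprint stays small even for long sessions.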
Because machine learning-based NIDS can be classified into single and multiple classifier-based NIDS according to the number of algorithms used, machine learning-based NIDS can be classified into four types, as seen in Fig. 1. We examine the main research on each method.

A. PACKET-BASED SINGLE-MACHINE LEARNING ALGORITHM METHOD
This method learns and classifies packet data through a single machine learning algorithm. It has the advantage of detecting malicious code in packet payload data. Because individual packets are analyzed independently to determine whether an attack has occurred, the conventional signature-based NIDS also belongs to this category [4]. Therefore, this method has the advantage of being able to detect an attack in real time, as it occurs. However, it is vulnerable to zero-day attacks, variant attacks, and evasion through packet fragmentation. Recently, an attack detection method that collects multiple packets of a session rather than a single packet has been proposed to overcome these weaknesses.

B. PACKET-BASED MULTIPLE-MACHINE LEARNING ALGORITHM METHOD
This method detects attacks using multiple machine learning algorithms rather than a single algorithm for packet-based data. Hence, it can perform training and classification more effectively than a single algorithm. However, as with the packet-based single machine learning algorithm method, a large number of features (more than thousands) must be generated from packets to train the machine learning algorithms. Therefore, it has the disadvantage of being difficult to use in large networks because of the very slow training and classification speed [5].

C. SESSION-BASED SINGLE-MACHINE LEARNING ALGORITHM METHOD
Instead of using packets, this method extracts features for a session and applies them to a single algorithm for training and classification [24]-[29]. It is one of the most common approaches in the early machine learning-based literature. As it does not use packet data and generates a fixed number of features regardless of the number or size of the packets belonging to a session, the memory usage is very low. In particular, as it processes a small number of features (e.g., fewer than a hundred) using a single algorithm, the training and classification speed can be very fast. Thus, it is applicable to large networks with heavy traffic. However, it is difficult to provide a high detection rate for various attack types using a single algorithm. Further, as features are generated after the session ends, an attack has most likely already been completed when it is detected. This is a critical limitation of this category.

D. SESSION-BASED MULTIPLE-MACHINE LEARNING ALGORITHM METHOD
This method performs training and classification by using features for a session while simultaneously using various classification algorithms. Within this category, the well-known types are ensemble and multi-layered methods [30], [31]. The ensemble method simultaneously applies several algorithms and integrates the results. It can improve the detection performance by using several algorithms for various classes. The multi-layered method executes a specific algorithm and then executes the next algorithm based on its result. In most cases, it makes use of unsupervised learning and supervised learning together. For example, it can perform partitioning using k-nearest neighbor (kNN) and then apply a decision tree (DT) to each partition. The session-based multiple machine learning algorithm method has a very high classification performance. However, in practice, it is difficult to support real-time attack detection because the very high computational cost of running multiple machine learning algorithms prevents network traffic from being processed in real time. Further, because the overall implementation cost is high, it is difficult to apply to a real network security system.
Various approaches have been adopted to increase the detection accuracy and speed. However, as of now, there is very limited research on real-time intrusion prevention systems that can detect and defend attacks in real-time; thus, there is an urgent need for research on this topic.

III. MACHINE-LEARNING-BASED REAL-TIME IDS
As discussed in Section II, existing NIDSs are struggling to increase both detection speed and accuracy simultaneously, but no practical solutions are available. As a solution to this problem, we need a different strategy than the signature-based or anomaly-based approaches. For achieving accurate attack detection, using anomaly-based machine learning is inevitable, due to the limitations of using pre-configured signature databases, e.g., the low detection accuracy of zero-day attacks. Hence, we should aggressively adopt machine learning techniques in our NIDS. However, machine learning-based approaches are considerably slower than signature-based ones. As a result, real-time detection becomes almost impossible with machine-learning-based NIDS, making it difficult to defend against various attacks.

A. MOTIVATION
In general, classification accuracy and classification time tend to increase together [32]. For example, DT classifies quickly because it uses a single tree, but its classification performance may be degraded by instability and dependency on a particular set of features. On the other hand, random forest (RF) classifies more accurately than DT in most cases, but its classification speed is much lower than that of DT. Thus, if we combine the fast but less accurate classifier with the slow but more accurate classifier, we can develop a fast and accurate classifier. Let us consider the case of first trying to classify the incoming traffic using the fast classifier. In this case, we need to check and evaluate the reliability of the result for each classification. If the result has very high reliability, i.e., a high score, we can trust it and process the traffic accordingly. If the reliability is low, it is necessary to ignore the result and run the slow but more accurate classifier. In this case, the most important design factor is that the fast classifier should handle as much traffic as possible, enabling the slow classifier to handle the remaining traffic without queueing. If this condition cannot be met, the speed of the slow classifier becomes a serious bottleneck for fast classification.
Satisfying this condition requires a favorable distribution of traffic with respect to classification difficulty: most traffic must be easy to classify. Fortunately, even a simple DT can classify normal and attack traffic with a fairly high accuracy. Therefore, it is reasonable to assume that the amount of traffic with high classification difficulty is relatively small.
For high-speed classification, the speed of the classification algorithm itself is the most critical factor. However, feature extraction also consumes much time. Hence, we need to design the first classifier so that the required features can be extracted easily without any special processing overhead. In addition, the number of features is important because a larger number of features lowers the classification speed; therefore, feature selection is necessary.
On the other hand, the second classifier needs to obtain important features for accurate classification even though doing so takes much time. It will be essential to use session features that completely describe the entire session, which are acquired by waiting until the session finishes. Based on these observations, we now explain how the proposed approach was designed to provide real-time detection and accurate detection simultaneously.

B. TWO-LEVEL CLASSIFIER
The proposed approach is a two-level scheme using two different classifiers simultaneously. To provide both high speed and accuracy in attack detection, the proposed scheme operates as follows: the level 1 classifier handles incoming packets at line speed, but only those it can classify precisely as normal or attack. The level 2 classifier handles the remaining packets, which were not classified as attacks at level 1, using a slow but accurate classification algorithm, because the level 2 classifier is not constrained by real-time processing.
The level 1 classifier adopts a classification algorithm optimized for classification speed, even though it sacrifices accuracy. This classifier plays a vital role in supporting line speed packet processing and in reducing the burden of the level 2 classifier. Thus, the level 1 classifier should process as many incoming packets as it can, but it also should not process them when it cannot do so with high accuracy. It sensibly determines if the packet is classified in level 1 or postponed to the level 2 classifier.
The level 2 classifier will handle packets unprocessed in the level 1 classifier because they could not be accurately classified. By handling only a small portion of the entire traffic, the burden of real-time processing is considerably less, which allows sophisticated and time-consuming classification for accurate detection. To achieve this goal, the level 2 classifier uses the statistical features of each flow for the whole lifespan instead of packet data. Although detection is delayed until the flow finishes, it is possible to effectively detect attacks that cannot be detected by analyzing only some of the packet data.
In addition to the unique two-level structure, the proposed approach trains classifiers, as shown in Fig. 2, to improve the performance of the level 2 classifier. It trains the level 1 classifier using the entire training dataset, as in other existing machine-learning-based classifiers. To train the level 2 classifier, we have two options: First, it can use the entire training dataset in the same way as the level 1 classifier. Second, it is possible to train the level 2 classifier using only data that the level 1 classifier cannot classify with high accuracy, because the level 2 classifier does not need to process the traffic that could be determined accurately by the level 1 classifier. This training can improve the detection accuracy of level 2 classifiers by reducing unnecessary training data.
Based on the second option, the proposed approach creates a new dataset to train the level 2 classifier in the following way. First, it trains the level 1 classifier using the entire dataset, and then the classifier classifies each data entry of the training dataset. If the score of the classification result exceeds a predefined threshold, called the minimum level 1 classification score, the data entry is excluded from the training dataset of the level 2 classifier. Therefore, the level 2 classifier is trained using a dataset consisting only of the remaining data entries, whose scores are less than this threshold.
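The construction of the level 2 training set can be sketched as a simple filter. The function name `build_level2_training_set`, the `level1_score` callable, and the parameter name `min_l1_score` (standing for the minimum level 1 classification score) are hypothetical; the toy scorer below merely reads a stored confidence and is not a trained classifier.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def build_level2_training_set(
    samples: List[Sample],
    level1_score: Callable[[Sequence[float]], float],
    min_l1_score: float,
) -> List[Sample]:
    """Drop every entry whose level 1 score exceeds the threshold.

    Entries scoring exactly at the threshold are kept here; how ties are
    handled is an assumption of this sketch.
    """
    return [s for s in samples if level1_score(s[0]) <= min_l1_score]


# Toy data: the first feature stands in for the level 1 confidence score.
toy = [([0.95, 1.0], "normal"), ([0.40, 0.0], "dos"), ([0.70, 3.0], "exploit")]
subset = build_level2_training_set(toy, lambda x: x[0], min_l1_score=0.8)
print([label for _, label in subset])
```

In a real pipeline the scorer would be the already trained level 1 classifier applied to its own training data, so level 2 only ever sees the entries that level 1 finds hard.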
After training of both the level 1 and 2 classifiers is complete, the system is ready to classify the actual traffic. Fig. 1 shows how the proposed approach classifies traffic using two-level classifiers. As described above, the level 1 classifier must process traffic at high speed. To enable this, the level 1 classifier should be designed to use features that can be generated simply and quickly. The proposed approach builds features from only the first data packet of each flow to determine whether the flow is attack or normal. If the packet is classified as an attack by the primary classifier, with a score higher than the minimum level 1 classification score, the packet is discarded, and the flow is blocked. Otherwise, the packet is forwarded, and the flow is allowed but monitored. For monitoring the flow, statistics of the flow are generated on receiving the first packet and updated whenever a packet belonging to the flow is received, until the flow is terminated. When the flow ends, the level 2 classifier uses the monitoring data as features for machine learning, and the flow is evaluated to detect an attack. Features for the second classifier will be described later. The overall procedure of the proposed algorithm is shown in Fig. 3.
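The classification procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `TwoLevelClassifier`, its method names, the parameter `min_l1_score`, the two-counter flow statistics, and both toy classifiers are hypothetical placeholders.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Level1 = Callable[[bytes], Tuple[str, float]]  # first data packet -> (label, score)
Level2 = Callable[[List[float]], str]          # flow statistics -> label


class TwoLevelClassifier:
    """Hypothetical wrapper around a fast level 1 and a slow level 2 model."""

    def __init__(self, level1: Level1, level2: Level2, min_l1_score: float) -> None:
        self.level1 = level1
        self.level2 = level2
        self.min_l1_score = min_l1_score
        self.blocked: Set[Hashable] = set()
        self.monitored: Dict[Hashable, List[float]] = {}

    def on_packet(self, flow: Hashable, packet: bytes) -> str:
        if flow in self.blocked:
            return "drop"
        if flow not in self.monitored:
            # Level 1 sees only the first data packet of the flow.
            label, score = self.level1(packet)
            if label == "attack" and score > self.min_l1_score:
                self.blocked.add(flow)
                return "drop"
            self.monitored[flow] = [0.0, 0.0]  # [packet count, byte count]
        stats = self.monitored[flow]
        stats[0] += 1
        stats[1] += len(packet)
        return "forward"

    def on_flow_end(self, flow: Hashable) -> str:
        # Only monitored flows reach level 2; a KeyError signals misuse.
        return self.level2(self.monitored.pop(flow))


def toy_level1(packet: bytes) -> Tuple[str, float]:
    # Placeholder rule standing in for a trained packet classifier.
    return ("attack", 0.99) if packet.startswith(b"\xde\xad") else ("normal", 0.60)


def toy_level2(stats: List[float]) -> str:
    # Placeholder rule standing in for a trained flow classifier.
    return "attack" if stats[0] > 2 else "normal"


ips = TwoLevelClassifier(toy_level1, toy_level2, min_l1_score=0.9)
print(ips.on_packet("flow-a", b"\xde\xad\x00"))  # confident attack verdict
print(ips.on_packet("flow-b", b"hello"))         # uncertain: forward and monitor
print(ips.on_packet("flow-b", b"world"))
print(ips.on_flow_end("flow-b"))                 # level 2 verdict at flow end
```

The level 1 model runs once per flow, on the first data packet only; every later packet of a monitored flow costs just a dictionary lookup and two counter updates.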

C. LEVEL 1 PACKET-BASED CLASSIFIER
As described above, the proposed approach uses only the first data packet of each flow to determine whether the flow is an attack or a normal flow. A flow is basically composed of multiple packets; therefore, we can significantly increase the number of flows processed per second by using only the first data packet of each flow, compared to existing work that processes all packets of the flow. However, the detection accuracy of our approach is lower due to the lack of information.
Several works analyzing packet data instead of flow statistics already exist. For example, HAST-IDS (hierarchical spatial-temporal IDS) uses all packet data and converts each byte of data through one-hot encoding [6]. It is known to achieve a very high classification accuracy, but it can suffer from the very large number of features generated by one-hot encoding. For example, it will have 25,600 features for a 100-byte packet because one-hot encoding generates 256 features for each byte. Such a considerable feature size is a serious burden on achieving high classification speed.
For fast classification, our proposed algorithm uses each byte value as a feature without one-hot encoding. This approach inevitably reduces classification accuracy, but such feature generation significantly reduces the total number of features used for training. The proposed approach compensates for the decreased performance with the level 2 classifier. However, this reduction alone can be insufficient for handling 100-Gbps line speeds. Therefore, we perform feature selection to choose only some features and reduce the size of the feature set further. Feature selection helps both classification speed and accuracy because it eliminates unnecessary or less important features. In addition, source and destination IP addresses in the IP header are excluded from the feature set. If an IP address is included in the set, the classifier can determine an attack using specific server and host IP addresses, so the classifier becomes highly dependent on specific flows. Excluding IP addresses from the features helps the classifier to be trained better, without severe skewness. Fig. 4 shows the difference in feature generation between one-hot encoding based on existing work and the proposed approach.
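The difference in feature count between the two encodings can be checked with a short sketch. The function names and the zero-padding of short packets are assumptions made for illustration; feature selection and the exclusion of IP addresses are left out here.

```python
from typing import List


def byte_value_features(packet: bytes, length: int = 100) -> List[int]:
    """One feature per byte: the raw value 0..255, zero-padded to `length`."""
    window = packet[:length].ljust(length, b"\x00")
    return list(window)


def one_hot_features(packet: bytes, length: int = 100) -> List[int]:
    """256 indicator features per byte, as produced by one-hot encoding."""
    window = packet[:length].ljust(length, b"\x00")
    features = [0] * (256 * length)
    for position, value in enumerate(window):
        features[256 * position + value] = 1
    return features


packet = bytes(range(60))  # an arbitrary 60-byte packet
print(len(byte_value_features(packet)), len(one_hot_features(packet)))
```

For a 100-byte window the raw encoding yields 100 features, while one-hot encoding yields 100 x 256 = 25,600, of which only 100 are non-zero for any given packet.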

D. LEVEL 2 FLOW-BASED CLASSIFIER
The level 1 classifier tries to detect attacks based on the first data packet of each flow. However, for some network attacks such as denial-of-service, it is better to examine intra-/inter-flow statistical information rather than a single packet to detect attacks exactly [10]. To this end, the proposed level 2 classifier generates and uses flow-specific features in addition to packet-based features for detecting malicious flows. The list of all flow features is shown in Table 14 in the Appendix. Some features can be generated from packets, but others, which are noted in grey, can be generated only when the flow ends. The list contains 46 features in total, and, in contrast to our level 1 classifier, one-hot encoding is performed for some features, i.e., protocol, session state, and service, resulting in 231 features used for machine learning. Compared to the level 1 classifier, the number of features used in the level 2 classifier is greater, and feature generation takes a long time because it must wait until the monitored flow finishes. This makes the classification speed of the level 2 classifier too low to support 100 Gbps. Therefore, we need to design our IDS such that the level 1 classifier handles most of the traffic and leaves only a very small, indeterminable portion of the traffic to the level 2 classifier.
For training the level 2 classifier, we can use the same training dataset used for the level 1 classifier. In this case, the dataset includes data that the level 1 classifier classified with a score larger than the minimum level 1 classification score. Because such data never reach the level 2 classifier, they are not useful for training it. Therefore, we train the level 2 classifier using a sub-dataset that only includes the data whose scores from the level 1 classifier are less than this threshold.

E. PACKET ORIENTED FAST CLASSIFIER
As classifiers based on machine learning are general-purpose, they can be used in various environments, achieving a high classification accuracy with moderate classification speed. However, their speed is not sufficient to process incoming network traffic in real time. On the other hand, packet classifiers have long been developed to handle firewall policies and access control lists at high speed. Such classifiers can deliver more than 3 M packet classifications per second, which can support network traffic in the tens of Gbps. However, this kind of classifier cannot be used for our purpose as it is, because a packet classifier does not support any learning process; it passively constructs search tables from a pre-defined policy set. Thus, we cannot directly use a packet classifier to boost classification speed.
To solve this problem, we propose a novel approach that first builds a DT and then extracts a static policy set from the training result. For example, Fig. 5 shows a trained DT in which each leaf node has a condition for reaching the node, a matching class ID, and a matching probability.
The policy of the packet classifier consists of policy priority, matching condition, and action. We can get such information from all leaf nodes in the DT. Matching class ID and probability are saved as actions. We can ignore policy priority because the matching conditions of all leaf nodes of the DT are disjoint. Thus, the policy priority has no effect on the packet classifier. For this reason, we simply set the leaf node ID as the policy priority. By doing so, we can create a policy set for packet classifiers from the DT, as shown in Table 1. Although we use the packet classifier based on DT instead of DT, the searching results are the same as for DT. As a result, we can increase classification speed significantly without any loss of detection accuracy.
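The extraction of a policy set from DT leaves can be sketched on a hand-built tree. Everything here is illustrative: the `Node` structure, the interval representation of matching conditions, the use of leaf visiting order as the leaf node ID, and the linear `lookup` (which stands in for a real packet classifier's search structure) are assumptions, and the toy tree is not taken from Fig. 5.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    # Internal node: go left when x[feature] <= threshold, else right.
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: predicted class ID and its probability.
    class_id: Optional[str] = None
    probability: float = 0.0


Interval = Tuple[float, float]  # matches values v with low < v <= high


def extract_policies(root: Node) -> List[dict]:
    """Turn every leaf into one policy: priority, matching condition, action."""
    policies: List[dict] = []

    def walk(node: Node, match: Dict[int, Interval]) -> None:
        if node.class_id is not None:
            policies.append({
                "priority": len(policies),  # leaf visiting order as leaf ID
                "match": dict(match),
                "action": (node.class_id, node.probability),
            })
            return
        low, high = match.get(node.feature, (float("-inf"), float("inf")))
        walk(node.left, {**match, node.feature: (low, min(high, node.threshold))})
        walk(node.right, {**match, node.feature: (max(low, node.threshold), high)})

    walk(root, {})
    return policies


def lookup(policies: List[dict], x: Sequence[float]) -> Tuple[str, float]:
    # Conditions of different leaves are disjoint, so at most one matches.
    for policy in policies:
        if all(lo < x[f] <= hi for f, (lo, hi) in policy["match"].items()):
            return policy["action"]
    raise ValueError("no policy matched")


def tree_predict(node: Node, x: Sequence[float]) -> Tuple[str, float]:
    while node.class_id is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.class_id, node.probability


toy_tree = Node(
    feature=0, threshold=80.0,
    left=Node(class_id="normal", probability=0.97),
    right=Node(
        feature=1, threshold=6.0,
        left=Node(class_id="dos", probability=0.91),
        right=Node(class_id="normal", probability=0.62),
    ),
)
policy_set = extract_policies(toy_tree)
for policy in policy_set:
    print(policy)
```

Because each leaf's condition is the conjunction of the tests on its root-to-leaf path, the policies partition the feature space, and a lookup in the policy set returns the same class ID and probability as walking the tree.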

IV. PERFORMANCE EVALUATION
For analyzing the performance of the two-level real-time classification algorithm proposed in this research, we evaluated and compared the performance using various existing competitive algorithms.

A. EVALUATION ENVIRONMENT
We compared our algorithm with existing ones, such as DT, RF, and HAST-IDS [11], [12], [5]. The environment of the performance evaluation was as follows. A Jupyter server running on Intel Xeon E5-2640 v4 with two GeForce TITAN Xp graphics cards was used for evaluation. Scikit-learn and Weka libraries were used to measure classification performance and time except for HAST-IDS, which was implemented with CNN using the TensorFlow library. Only HAST-IDS uses GPU and CPU simultaneously, and other algorithms run on only CPU.
For comparison, we used a sub-dataset created on January 22, 2015 from the UNSW-NB15 dataset [19]. To avoid the minority class problem, we increased the size of small classes by adding data of the same class, collected on a different date, to the evaluation dataset. We also removed some classes such as worms, backdoors, and analysis from the dataset because their total number of samples was too small even after balancing the dataset. Therefore, we evaluated the performance using seven classes, one for normal and the others for attack classes.
We also used the CICIDS2017 dataset, which includes various DoS and DDoS attacks [30]. We removed Heartbleed and Infiltration from the twelve classes because their data size is too small.
For the level 1 classifier, we generated the features using only the first 100 bytes of the first data packet of each flow except for the source and the destination IPs. If a packet size was smaller than 108 bytes, we added some null values as padding. To build the features of the level 2 classifier, we used 46 features, which expanded to 231 features after one-hot encoding. We normalized all features to the range 0 to 1. We used the Pearson Correlation as a feature selection algorithm.
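The level 1 feature preparation described above might look like the following sketch. Several points are assumptions rather than statements from the paper: that the byte window starts at an IPv4 header, that the 108-byte window minus the 8 address bytes (header offsets 12 to 19) is what yields 100 features, that padding uses zero bytes, and that Pearson correlation is applied by ranking each feature by its absolute correlation with a numeric label. All function names are hypothetical.

```python
import math
from typing import List, Sequence

IP_ADDR_OFFSETS = range(12, 20)  # source and destination fields of an IPv4 header


def level1_features(ip_packet: bytes, window: int = 108) -> List[float]:
    """Pad to `window` bytes, drop the address bytes, scale to [0, 1]."""
    padded = ip_packet[:window].ljust(window, b"\x00")
    kept = [b for i, b in enumerate(padded) if i not in IP_ADDR_OFFSETS]
    return [b / 255.0 for b in kept]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def top_k_features(rows: List[List[float]], labels: List[float], k: int) -> List[int]:
    """Indices of the k features most correlated (in magnitude) with the label."""
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(len(rows[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


vector = level1_features(bytes(range(40)))  # a 40-byte packet, padded
print(len(vector), min(vector), max(vector))

toy_rows = [[0.0, 0.5], [1.0, 0.5], [0.0, 0.5], [1.0, 0.5]]
toy_labels = [0.0, 1.0, 0.0, 1.0]
print(top_k_features(toy_rows, toy_labels, k=1))
```

Dividing by 255 is a fixed min-max scaling for byte values; the level 2 features would need per-feature minima and maxima taken from the training data instead.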
The two datasets include 24,225 and 1,009,809 samples, respectively. Each was divided into training and test datasets in the ratio of 6:4, so the total numbers of flows in the test datasets were 9,692 and 302,966. Tables 2 and 3 show the number of data samples used for training and testing for each class of each dataset.

B. PARAMETER CONFIGURATION
Before the comparative experiment, we needed to find, through pre-experiments, the best parameters for the proposed algorithm to achieve the highest accuracy and the fastest classification. To find the optimal parameters, we measured the classification performance for each combination of the feature size and classification algorithm for the level 1 classifier and the classification algorithm for the level 2 classifier. We also performed the measurement as the minimum level 1 classification score increased. For the level 1 and 2 classifiers, the total number of features was increased in steps of 10 from 10 to 100 and from 10 to 230, respectively.
There are many kinds of classification algorithms, but we considered only DT and RF for the classification algorithm of the proposed approach in our performance evaluation. DT is simple, but it has the advantage of fast classification speed. However, it can have low accuracy and over-fitting in classification. On the other hand, RF can achieve very high accuracy without a serious over-fitting issue. Because it internally uses multiple DTs, its classification speed is lower than DT's, but it is still fast compared to other classification algorithms, such as DNN and kNN. For these reasons, we selected the two algorithms as candidate algorithms for level 1 and 2 classifiers in our proposed algorithm. To maximize the strength of each algorithm, we apply the same algorithm to the level 1 and 2 classifiers.
It is very important to select a minimum level 1 classification score that will ensure high classification performance. To determine the optimal value, we measured the classification results as the score was increased from 0 to 1 in steps of 0.1, and then we chose the value that achieved the best result. In addition to multi-class classification, we also conducted binary-class classification, where all six attacks are regarded as one attack during training and classification, to evaluate the classification performance under various situations. Tables 4 and 5 show the chosen optimal parameter values for each combination of algorithms for each dataset. We can see that the proposed algorithm achieves a higher classification accuracy when RF is used than when DT is used, regardless of multi/binary class classification, for UNSW-NB15. We also see that RF still shows the higher performance for CICIDS2017, even though the differences are marginal.
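The threshold search described above is a one-dimensional grid search, which can be sketched as follows. The function name `sweep_min_l1_score` and the `evaluate` callable are hypothetical; in practice `evaluate` would train and test the two-level classifier for a given threshold and return the chosen metric, whereas the toy function below is only a stand-in with a known peak.

```python
from typing import Callable, Tuple


def sweep_min_l1_score(
    evaluate: Callable[[float], float], steps: int = 10
) -> Tuple[float, float]:
    """Try thresholds 0.0, 0.1, ..., 1.0 and return (best threshold, best metric)."""
    candidates = [i / steps for i in range(steps + 1)]
    scored = [(evaluate(threshold), threshold) for threshold in candidates]
    best_metric, best_threshold = max(scored)  # ties go to the larger threshold
    return best_threshold, best_metric


def toy_metric(threshold: float) -> float:
    # Stand-in for a measured F1-score; peaks at a threshold of 0.7.
    return 1.0 - (threshold - 0.7) ** 2


best_threshold, best_metric = sweep_min_l1_score(toy_metric)
print(best_threshold, best_metric)
```

Generating the candidates as `i / steps` avoids the drift that repeated addition of 0.1 would accumulate in floating point.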

C. EVALUATION RESULTS AND ANALYSIS
For a fair performance comparison, we found the optimal configuration for each algorithm. For DT and RF, we used the number of per-flow features that yielded the best F1-score. For HAST-I, we used 100- and 300-byte packet data, which showed the best classification speed and the highest F1-score, respectively. We use the notation HAST-I (N) to mean that N-byte packet data is used for HAST-I. Table 6 shows the result of comparing average classification performance in multi-class classification for UNSW-NB15.

1) COMPARISON OF THE PERFORMANCE FOR MULTI-CLASS CLASSIFICATION
Our proposed algorithm using RF shows the highest performance in accuracy, precision, recall, and F1-score. Even the proposed algorithm using DT shows the second-highest performance on all metrics. HAST-I shows the worst, although the performance gap is less than one percentage point. This indicates that a packet-based approach relying on fixed-length packet data is not an efficient approach for detecting network intrusions.
Table 7 shows the result of comparing average classification performance for CICIDS2017. The CICIDS2017 dataset is well known for yielding high classification results, so classification performance is very high regardless of the algorithm. However, our proposed algorithm using RF shows the highest performance for precision, recall, and F1-score. HAST-I shows lower performance than ours but better than RF and DT, which confirms that the proposed approach is efficient for detecting network intrusions.
Fig. 6 shows the total classification time for processing all 9,692 flows of UNSW-NB15. The proposed algorithm using DT is by far the fastest of the compared approaches. It classifies 60 and 90 times faster than the CNN-based HAST-I (100) and HAST-I (300), respectively. It also classifies 4.7 times faster than DT, which is the fastest competing approach. These experiments indicate that, among the compared approaches, only ours can keep up with the very large volume of incoming traffic in modern networks. The proposed algorithm using RF is about 8 times slower than the one using DT, but it is still faster than every existing approach except DT.
Fig. 7 shows the total classification time for processing all flows of CICIDS2017. The results are almost the same as in Fig. 6. Our approach with DT shows the highest speed, and HAST-I still shows the lowest, even though HAST-I leverages a GPU to boost the classification speed.
Table 8 lists the number of data items classified by the level 1 and level 2 classifiers of the proposed algorithm with the optimal detection-rate parameters in multi-class classification for UNSW-NB15. 97.3% of incoming packets are classified by the level 1 classifier, and only 2.7% are classified by the level 2 classifier. Therefore, the total classification performance of the proposed algorithm is mostly determined by the level 1 classifier. Table 9 shows the result for CICIDS2017. For UNSW-NB15, the level 2 classifier classifies more sessions than the level 1 classifier for some classes, such as shellcode. For CICIDS2017, however, the level 1 classifier classifies far more sessions than the level 2 classifier regardless of class type.
TABLE 9. Number and rate of total flows processed by the level 1 and level 2 classifiers for each class when the proposed algorithm using RF is applied to multi-class classification for CICIDS2017.
By considering the total traffic size of the flows in the entire test set, we can calculate the traffic processing rate in Gbps for each dataset; the results are shown in Figs. 8 and 9. In Fig. 8, the proposed algorithm using PRFC-DT denotes the results for UNSW-NB15 obtained by replacing the level 1 DT classifier with PRFC [23], one of the fastest existing packet classification algorithms. While the existing approaches show throughputs of 157 Mbps to 5.8 Gbps, the proposed algorithm shows 1.3 Gbps when using RF and 9.6 Gbps when using DT. Moreover, when the level 1 classifier is replaced with PRFC, the traffic processing performance improves significantly to 149 Gbps without decreasing the attack detection accuracy. Fig. 9 shows results for CICIDS2017 that are very similar to those for UNSW-NB15. From these results, we can confirm that our approach is very effective in supporting real-time intrusion detection and high classification accuracy simultaneously.
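The throughput calculation described above can be sketched as follows: the total traffic size of the test set is converted to bits and divided by the total classification time. This is a minimal sketch under stated assumptions; the helper names `throughput_gbps` and `level_shares` and all input values are hypothetical and are not measurements from the experiments.

```python
def throughput_gbps(total_bytes: int, seconds: float) -> float:
    """Traffic processing rate in Gbps (1 Gbps = 1e9 bits per second)."""
    if seconds <= 0:
        raise ValueError("classification time must be positive")
    return total_bytes * 8 / seconds / 1e9


def level_shares(level1_count: int, level2_count: int):
    """Fractions of items handled by the level 1 and level 2 classifiers."""
    total = level1_count + level2_count
    if total == 0:
        raise ValueError("no classified items")
    return level1_count / total, level2_count / total


# Hypothetical test set: 1.25 GB of traffic classified in one second.
example_rate = throughput_gbps(1_250_000_000, 1.0)   # 10.0 Gbps

# Hypothetical counts chosen only to reproduce a 97.3% / 2.7% split.
example_l1, example_l2 = level_shares(973, 27)
```

Because the rate depends only on the traffic volume and the total classification time, any reduction in level 1 classification time translates directly into a proportionally higher rate when most traffic is handled at level 1.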

2) COMPARISON OF THE PERFORMANCE FOR BINARY-CLASS CLASSIFICATION
Tables 10 and 11 compare the binary class classification performance of each algorithm on UNSW-NB15 and CICIDS2017. As with multi-class classification for UNSW-NB15, the binary class classification results show that the proposed approach yields the highest detection accuracy on all performance metrics regardless of the dataset. In particular, the proposed approach shows a significant increase in binary class classification performance compared to multi-class classification. However, competing algorithms such as HAST-I show decreased detection accuracy regardless of packet size for UNSW-NB15. Interestingly, HAST-I is inferior to DT and RF for binary class classification of CICIDS2017, whereas it outperforms DT and RF for multi-class classification of CICIDS2017.
Figs. 10 and 11 show comparisons of binary class classification speeds for UNSW-NB15 and CICIDS2017. As in the multi-class classification speed comparisons, the proposed algorithm using DT is the fastest and HAST-I is the slowest in all cases. Considering that the proposed approach relies only on the CPU whereas HAST-I requires both a GPU and a CPU, the results show that our approach is very promising for real-time detection. Tables 12 and 13 show that, as in multi-class classification, most flows are classified by the level 1 classifier in binary class classification for UNSW-NB15 and CICIDS2017. Interestingly, the ratio of attacks classified by the level 2 classifier is higher than in the multi-class cases. This difference suggests that the minority-class problem still exists in binary class classification. Such flows tend to receive low scores from the level 1 classifier and are therefore classified by the level 2 classifier relatively more often than normal traffic.
TABLE 12. Number and rate of total flows processed by the level 1 and level 2 classifiers for each class when the proposed algorithm using RF is applied to binary class classification for UNSW-NB15.
TABLE 13. Number and rate of total flows processed by the level 1 and level 2 classifiers for each class when the proposed algorithm using RF is applied to binary class classification for CICIDS2017.
Fig. 12 shows the packet processing performance of each algorithm for UNSW-NB15. As with multi-class classification, the proposed algorithm using DT shows a throughput of 9.59 Gbps, which is very high compared to the competing algorithms, which show at most 2.2 Gbps. Moreover, the throughput can be further improved to 149 Gbps by replacing the level 1 DT classifier with PRFC. Fig. 13 shows the packet processing throughput for CICIDS2017, which is quite similar to that for UNSW-NB15. From these results, we can conclude that our approach is very helpful in providing fast classification and high accuracy.
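The way low-scoring flows are deferred from the level 1 classifier to the level 2 classifier can be illustrated with the following minimal sketch. It is not the implementation evaluated in this paper: the class `TwoLevelClassifier`, the score threshold, the buffering of every packet of a deferred flow, and the toy classifiers are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, float]
Level1 = Callable[[Packet], Tuple[str, float]]   # returns (label, score)
Level2 = Callable[[List[Packet]], str]           # returns a label per flow


class TwoLevelClassifier:
    """Hypothetical dispatcher: confident packets are labeled at once,
    uncertain flows are buffered and labeled when the flow terminates."""

    def __init__(self, level1: Level1, level2: Level2, threshold: float):
        self.level1 = level1
        self.level2 = level2
        self.threshold = threshold            # assumed score cut-off
        self.pending: Dict[str, List[Packet]] = {}

    def on_packet(self, flow_id: str, packet: Packet) -> Optional[str]:
        if flow_id in self.pending:           # flow already deferred
            self.pending[flow_id].append(packet)
            return None
        label, score = self.level1(packet)
        if score >= self.threshold:           # confident: decide in real time
            return label
        self.pending[flow_id] = [packet]      # low score: defer to level 2
        return None

    def on_flow_end(self, flow_id: str) -> Optional[str]:
        packets = self.pending.pop(flow_id, None)
        if packets is None:                   # flow was settled at level 1
            return None
        return self.level2(packets)


def toy_level1(packet: Packet) -> Tuple[str, float]:
    # Toy rule (assumption): large packets are confidently "normal".
    if packet["size"] >= 1000:
        return "normal", 0.99
    return "attack", 0.60


def toy_level2(packets: List[Packet]) -> str:
    # Toy flow statistic (assumption): mean packet size over the flow.
    mean_size = sum(p["size"] for p in packets) / len(packets)
    return "attack" if mean_size < 200 else "normal"
```

In this sketch, raising the threshold sends more flows to level 2, which trades real-time decisions for decisions based on flow statistics; lowering it has the opposite effect.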
The proposed approach reaches the highest detection accuracy in both multi-class and binary class classification while providing sufficient processing speed to handle intrusion detection in real time for enterprise-level and even backbone networks. The F1-score of the proposed algorithm using DT is 0.37% and 0.18% lower than that of the proposed algorithm using RF, but it shows better processing performance than existing approaches for UNSW-NB15. Although the proposed algorithm using RF achieves the highest detection accuracy, the algorithm using DT surpasses all competing algorithms in terms of detection speed. Therefore, based on the experimental results, the proposed algorithm using DT, or using PRFC and DT, is a good choice if the target is a real-time IPS, and the proposed algorithm using RF is a good choice if detection accuracy is more important.

V. CONCLUSION
As the total traffic volume of networks is rapidly increasing, cyber-attacks are also becoming more sophisticated and evolving into variants. Therefore, real-time IDSs are essential for protecting networks from such attacks. However, real-time detection cannot adopt elaborate modern techniques because of their processing overhead, which leaves it vulnerable to zero-day attacks. We proposed a two-level intrusion detection approach that supports real-time processing with high detection accuracy. It combines packet- and flow-based classifiers so that each compensates for the other's weakness in speed or accuracy. The level 1 classifier first extracts selected features from the packet to enable fast classification, achieving real-time attack detection. The level 2 classifier handles only the flows that were not classified by the level 1 classifier; therefore, the remaining traffic is small enough to be processed by a time-intensive machine-learning-based classifier. This two-level structure provides classification speed and accuracy simultaneously. We confirmed the effectiveness of this approach through extensive performance evaluation. We expect that it can be an effective solution for building real-time IPSs that overcome the weaknesses of modern network security.