Internet traffic classification using machine learning (ML) has been an emerging research field since the 1990s and is now widely used in numerous network activities. The technique models the attributes and features of data flows in order to identify the applications that generate them. In this paper, we design and implement a classification model based on statistical flow features derived from packet headers. Unlike traditional methods, the model is entirely independent of port numbers and application-level payload contents. It therefore avoids the operational difficulties caused by unreliable port numbers and by the complexity of payload interpretation. Rather than relatively complex ML algorithms or combinations of them, a supervised k-Nearest Neighbor (k-NN) estimator is adopted for computational efficiency, together with effective, easy-to-calculate statistical features selected according to the operational background. Our results indicate that a per-flow classification accuracy of about 90% can be achieved, a substantial improvement over traditional techniques, which achieve 50-70%.
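To make the approach concrete, the following is a minimal sketch of a supervised k-NN flow classifier that uses only header-derived statistics, with no ports and no payload. It is not the authors' implementation. The abstract does not specify the exact feature set, the value of k, the distance metric, or any scaling. The five features (packet count, mean and standard deviation of packet size, mean inter-arrival time, flow duration), z-score scaling, Euclidean distance, majority voting, k=3, and the helper names `flow_features`, `KNNFlowClassifier`, and `toy_flow` are all hypothetical choices made for illustration. The training flows are synthetic toy data, not measurements.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# A flow is modeled here (assumption) as a non-empty list of
# (timestamp_seconds, packet_size_bytes) tuples taken from packet headers only.

def flow_features(packets):
    """Hypothetical header-only feature vector for one flow."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [
        float(len(packets)),             # packet count
        mean(sizes),                     # mean packet size
        pstdev(sizes),                   # packet-size spread
        mean(gaps) if gaps else 0.0,     # mean inter-arrival time
        times[-1] - times[0],            # flow duration
    ]

class KNNFlowClassifier:
    """Plain k-NN: z-score scaling, Euclidean distance, majority vote."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, flows, labels):
        X = [flow_features(f) for f in flows]
        cols = list(zip(*X))
        # Per-feature scaling so large-valued features (bytes) do not
        # dominate small-valued ones (seconds) in the distance.
        self.mu = [mean(c) for c in cols]
        self.sigma = [pstdev(c) or 1.0 for c in cols]
        self.X = [self._scale(x) for x in X]
        self.y = list(labels)
        return self

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.mu, self.sigma)]

    def predict(self, flow):
        q = self._scale(flow_features(flow))
        # Sort training flows by distance to the query, keep the k nearest.
        nearest = sorted((math.dist(q, x), y) for x, y in zip(self.X, self.y))
        votes = Counter(y for _, y in nearest[: self.k])
        return votes.most_common(1)[0][0]

def toy_flow(n, size, gap, jitter=0):
    """Hypothetical synthetic flow: n packets, alternating sizes, fixed gap."""
    return [(i * gap, size + (i % 2) * jitter) for i in range(n)]

# Toy training set: "bulk" (large packets, tight spacing) vs
# "interactive" (small packets, sparse spacing).
train = [
    toy_flow(50, 1460, 0.001, 40), toy_flow(60, 1400, 0.002, 60),
    toy_flow(55, 1500, 0.0015), toy_flow(10, 80, 0.5, 20),
    toy_flow(12, 64, 0.8, 10), toy_flow(8, 100, 0.4, 30),
]
labels = ["bulk"] * 3 + ["interactive"] * 3
clf = KNNFlowClassifier(k=3).fit(train, labels)
print(clf.predict(toy_flow(45, 1450, 0.001, 20)))  # bulk
```

The sketch reflects the efficiency argument in the abstract: "training" k-NN only stores the scaled feature vectors. Each prediction is a single distance pass over the training set, and every feature is a simple running statistic over header fields.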