Skip to Main Content
Network traffic classification and application identification provide important benefits for IP network engineering, management and control and other key domains. Current popular methods, such as port-based and payload-based, have shown some disadvantages, and the machine learning based method is a potential one. The traffic is classified according to the payload-independent statistical characters. This paper introduces the different levels in network traffic-analysis and the relevant knowledge in machine learning domain, analysis the problems of port-based and payload-based methods in traffic classification. Considering the priority of the machine learning-based method, we experiment with unsupervised K-means to evaluate the efficiency and performance. We adopt feature selection to find an optimal feature set and log transformation to improve the accuracy. The experimental results on different datasets convey that the method can obtain up to 80% overall accuracy, and, after a log transformation, the accuracy is improved to 90% or more.