This work evaluates three methods for encrypted traffic analysis that do not use IP addresses, port numbers, or payload information. To this end, binary identification of SSH vs. non-SSH traffic is used as a case study, since the plain-text initiation of the SSH protocol allows us to obtain data sets with a reliable ground truth. The methods are subjected to several tests using different export options, feature sets, and training and test traffic traces, for a total of 128 different configurations. Of particular interest are the test cases that use a test set from a different network than the one on which the model was trained, i.e., those that measure the robustness of the trained models. Results show that the model trained with the multi-objective genetic algorithm (MOGA) achieves the best performance among the three methods when each approach is tested on traffic traces captured on the same network as the training trace. On the other hand, C4.5 achieves the best results among the three methods when tested on traffic traces captured on networks entirely different from that of the training trace. Furthermore, it is shown that continuous sampling of the training data is no better than random sampling, but that the choice of training data strongly affects how well the classifiers perform on traffic traces captured from different networks. Moreover, the C4.5-based approach provides the fastest and most human-readable model, whereas the MOGA tremendously reduces the complexity of the k-means clustering algorithm.
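The two training-data sampling strategies compared above can be illustrated with a minimal sketch. It assumes that "continuous sampling" means taking one contiguous block of flows from a trace and that "random sampling" means drawing flows uniformly without replacement; the function names, the `start` and `seed` parameters, and the representation of flows as a plain sequence are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def continuous_sample(flows: Sequence[T], n: int, start: int = 0) -> List[T]:
    """Take n consecutive flows beginning at index `start` (assumed meaning
    of continuous sampling: one contiguous block of the trace)."""
    if n < 0 or start < 0 or start + n > len(flows):
        raise ValueError("requested block does not fit inside the trace")
    return list(flows[start:start + n])


def random_sample(flows: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Draw n flows uniformly at random, without replacement, from the
    whole trace. A fixed seed keeps the draw reproducible."""
    if n < 0 or n > len(flows):
        raise ValueError("cannot draw more flows than the trace contains")
    rng = random.Random(seed)
    return rng.sample(list(flows), n)


if __name__ == "__main__":
    # Stand-in trace: integers in place of real flow records.
    trace = list(range(100))
    print(continuous_sample(trace, 5, start=20))
    print(random_sample(trace, 5, seed=1))
```

Either function would produce a training subset that could then be passed to any of the three classifiers; the sketch covers only the selection step, not feature extraction or training.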
Date of Conference: 11-15 April 2011