We apply streaming data mining techniques, in particular the concept-adapting very fast decision tree (CVFDT), to identify peer-to-peer (P2P) applications in Internet traffic. Internet traffic is streaming data: it flows dynamically and in large volumes. In P2P applications, new communities of peers often join and old communities often leave, so an identification method must cope with concept drift and update its model incrementally. We captured Internet traffic at a main gateway router, pre-processed the captured data, selected the most significant attributes, and prepared a training data stream to which the CVFDT model was applied. We tested our approach on a data stream of 3.5 million P2P and NonP2P traffic records. The results show that our approach deals effectively with the dynamic nature of streaming data and detects changes in communities of peers. The classification accuracy exceeds 95%, and the method scales well in both time and space, making it suitable for large-scale dynamic data. We extracted attributes only from the IP layer, which avoids the privacy concerns associated with techniques that use deep packet inspection.
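As background on how the VFDT family of trees, to which CVFDT belongs, decides when to split a node on streaming data, the following is a minimal sketch of the Hoeffding-bound test. It is not the configuration used in this work: the function names, the confidence parameter `delta`, and the example gain values are illustrative assumptions. CVFDT additionally maintains statistics over a sliding window of recent examples and grows alternate subtrees when a previously chosen split no longer passes the test, which is not shown here.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean
    of a variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon.

    Hypothetical helper: the gains would come from a split heuristic such as
    information gain computed on the examples seen so far at a leaf.
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # For a two-class problem (e.g. P2P vs. NonP2P), information gain
    # measured in bits has range 1.0. delta = 1e-7 is an assumed value.
    for n in (100, 1_000, 10_000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(f"n={n:>6}  epsilon={eps:.4f}")
```

Because epsilon shrinks as more records arrive at a leaf, the tree can commit to a split after seeing only as many examples as the gap between the two best attributes requires, which is what makes this style of learner practical on high-volume streams.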
Published in: Tools with Artificial Intelligence, 2008. ICTAI '08. 20th IEEE International Conference on (Volume: 1)
Date of Conference: 3-5 Nov. 2008