Skip to Main Content
In a LAN, Internet access should be managed well for a better user experience. Those using a larger share of the bandwidth may be restricted during peak hours to enable others to use the Internet. This can be viewed as a problem of classifying the users based on their Internet usage into normal and high categories, following which control policies may be applied. For this purpose, a proxy-based mechanism has been proposed for classification of users according to the share of their Internet access. The advantage of this approach is that users sharing the same computer can be distinguished by the proxy server and appropriate control policies can be exercised. To understand user behaviour, data is collected at the proxy server in a campus LAN. Machine learning algorithms are then used to learn and characterise user behaviour. In particular, Naive Bayes' and Gaussian Mixture Model based classifiers are used. It is observed that the algorithms are able to scale in that users are clustered into two different groups. Performance evaluation on a held out data set indicates that users can be accurately distinguished 94.96% of the time. The algorithm is also practical since the time consuming task of model building need be done only once a month offline, while the daily task of classification may be accomplished in a period of 20 mins for GMMs. It has also been shown how the user behavior of the two groups of users may be characterized. This would be a useful aid in the design of policies and algorithms for Internet access control.