Skip to Main Content
When we are dealing with community structure detecting in the blogosphere, we have come to face some obstacles. The data in a blog may be updated frequently by its owner, making the whole blogosphere become very large during a short period of time. It can be very expensive to deal with such huge amount of data using those traditional methods. Meanwhile, few blogs in the blogosphere can be identified as members of a specify community clearly from their own characters, while we have to judge most blogs depending on the relationship with other neighboring blogs using centrality metrics. Recently, a new method that combines active learning and semi-supervised learning gives quite a good performance on improving the speed and accuracy of machine learning on large scale of data. In this paper, we employ this method to solve the community clustering problem with a vast and complex data set. We try to show that this method really does a better job on labeling and clustering large scale of data by comparing the result with the one achieved in the traditional way. Afterward, we may make some improvements and use it to deal with community detecting in the blogosphere.