Active constraints selection based on density peak | IEEE Conference Publication | IEEE Xplore

Abstract:

Semi-supervised clustering, which integrates side information to improve clustering performance, has received a lot of attention in the research community. Generally, there are two kinds of side information: seeds (labelled data) and constraints (must-link, cannot-link). By integrating information provided by a user or domain expert, semi-supervised clustering can produce the results users expect. Clustering results depend on the side information provided, so different side information produces different results. In some cases, clustering performance may even decrease if the side information is not carefully chosen. This paper addresses the problem of selecting good constraints for semi-supervised clustering algorithms. For this purpose, we propose an active learning algorithm for the constraint collection task, which relies on the min-max algorithm and on peak estimation based on a density score. Experiments conducted on real data sets from UCI show the effectiveness of our approach.
Date of Conference: 13-16 February 2022
Date Added to IEEE Xplore: 11 March 2022
Conference Location: PyeongChang Kwangwoon_Do, Korea, Republic of

I. Introduction

Recently, semi-supervised clustering has received a lot of attention in the research community [1, 2]. Its advantage lies in the possibility of using a small set of side information to improve clustering results. There are two kinds of side information: constraints and seeds. Constraints are pairwise must-link and cannot-link dependencies: a must-link constraint between two objects x and y means that x and y should be grouped in the same cluster, and a cannot-link constraint means that x and y should not be grouped in the same cluster. In real applications, we hypothesize that the side information is available or can be collected from users/experts.
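
To make the two constraint types concrete, the following minimal sketch checks a given cluster assignment against a set of must-link and cannot-link pairs. It is an illustration only, not the algorithm proposed in the paper; the function name `count_violations`, the dictionary-based label representation, and the toy data are assumptions introduced here.

```python
from typing import Hashable, Iterable, Mapping, Tuple

# A pairwise constraint is simply a pair of object identifiers (x, y).
Pair = Tuple[Hashable, Hashable]


def count_violations(
    labels: Mapping[Hashable, int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> Tuple[int, int]:
    """Return (violated must-link count, violated cannot-link count).

    A must-link (x, y) is violated when x and y are in different clusters.
    A cannot-link (x, y) is violated when x and y are in the same cluster.
    """
    ml_violations = sum(1 for x, y in must_link if labels[x] != labels[y])
    cl_violations = sum(1 for x, y in cannot_link if labels[x] == labels[y])
    return ml_violations, cl_violations


# Hypothetical toy assignment: objects a, b in cluster 0; c, d in cluster 1.
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
must_link = [("a", "b"), ("a", "c")]    # (a, c) is split across clusters
cannot_link = [("b", "c"), ("c", "d")]  # (c, d) share a cluster

print(count_violations(labels, must_link, cannot_link))  # (1, 1)
```

A constrained clustering algorithm would typically try to drive both counts to zero, or penalize them in its objective; which of these the paper's method does is not stated in the text above.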
