Abstract:
In many real-life applications, the data are too large to fit in a computer’s random access memory, so learning algorithms must process them as streams. In some cases, limited information about the data set is available and can be used to improve learning outcomes. In semisupervised clustering (SSC), such information takes the form of pairwise “must-link” and “cannot-link” constraints. This paper develops an SSC algorithm for data streams. We treat the data as a sequence of blocks and formulate the SSC problem as a nonsmooth, nonconvex optimization model. The algorithm processes one block of data at each iteration. Special procedures identify cluster-determining points, which are carried into subsequent iterations. The proposed algorithm is evaluated on real-world data sets.
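The block-by-block scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the nonsmooth optimization solver is replaced here by a greedy constraint-aware assignment (in the spirit of COP-KMeans) with mean updates, and the "cluster-determining points" are approximated, purely as an assumption, by the few points nearest each center. All function names, the `keep` parameter, and the first-`k`-points initialization are hypothetical.

```python
import numpy as np

def constrained_assign(points, centers, must_link, cannot_link):
    """Greedy assignment: nearest center that breaks no constraint with
    already-labeled points; falls back to the nearest center otherwise."""
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = np.full(len(points), -1)
    for i in range(len(points)):
        for c in np.argsort(dist[i]):
            ok = all(labels[b if a == i else a] in (-1, c)
                     for a, b in must_link if i in (a, b))
            ok = ok and all(labels[b if a == i else a] != c
                            for a, b in cannot_link if i in (a, b))
            if ok:
                labels[i] = c
                break
        else:  # every center violates some constraint: ignore constraints
            labels[i] = dist[i].argmin()
    return labels, dist

def process_block(block, centers, retained, must_link, cannot_link,
                  keep=5, iters=10):
    """One streaming step: refit centers on retained points plus the new
    block, then keep the `keep` points nearest each center (assumed proxy
    for cluster-determining points) for the next step."""
    centers = centers.copy()
    data = np.vstack([retained, block]) if len(retained) else block
    offset = len(data) - len(block)  # constraint indices are block-local
    ml = [(a + offset, b + offset) for a, b in must_link]
    cl = [(a + offset, b + offset) for a, b in cannot_link]
    for _ in range(iters):
        labels, dist = constrained_assign(data, centers, ml, cl)
        for c in range(len(centers)):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    labels, dist = constrained_assign(data, centers, ml, cl)
    kept = [data[labels == c][np.argsort(dist[labels == c, c])[:keep]]
            for c in range(len(centers))]
    return centers, np.vstack(kept), labels[offset:]

def cluster_stream(blocks, k, keep=5):
    """Consume (block, must_link, cannot_link) triples one at a time,
    holding only the current block and the retained points in memory."""
    centers, retained = None, None
    for block, ml, cl in blocks:
        if centers is None:
            centers = block[:k].astype(float)  # naive initialization (assumption)
            retained = np.empty((0, block.shape[1]))
        centers, retained, labels = process_block(
            block, centers, retained, ml, cl, keep)
        yield centers, labels
```

Only the current block and at most `keep * k` retained points are held in memory at any step, which is the property that makes this kind of scheme suitable for data that do not fit in RAM.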
Date of Conference: 22-23 November 2024
Date Added to IEEE Xplore: 28 January 2025