Abstract:
Frequent itemset-based text clustering has emerged as a promising approach to the automatic organization of text documents because it combines high clustering accuracy with understandable cluster descriptors. However, the clustering results may not be satisfactory because they do not reflect the user's point of view. In this context, active learning is an interesting approach for incorporating the user's knowledge into the text clustering task by querying the user about the data. We introduce an active learning approach to frequent itemset-based text clustering called AL2FIC. In our approach, users provide feedback directly on the cluster descriptors without needing to know the document labels. An experimental evaluation on real text collections demonstrated that AL2FIC significantly increases text clustering performance even when users select only a few descriptors.
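The general idea of descriptor-level feedback can be illustrated with a minimal Python sketch. This is not the AL2FIC algorithm, whose details the abstract does not give; it is a toy illustration under stated assumptions. The function names `frequent_itemsets` and `assign`, the toy documents, the support threshold, and the simulated user selection are all hypothetical. The sketch mines frequent term sets as candidate cluster descriptors, lets a (simulated) user pick a few of them, and groups documents by the selected descriptor they contain.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Return {itemset: count} for term sets found in at least min_support docs."""
    counts = Counter()
    for terms in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(terms), size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def assign(docs, descriptors):
    """Label each doc with the largest selected descriptor it fully contains.

    Documents that contain none of the selected descriptors get None.
    """
    labels = []
    for terms in docs:
        matches = [d for d in descriptors if d <= terms]
        labels.append(max(matches, key=len) if matches else None)
    return labels


# Toy collection: each document is a set of terms (hypothetical data).
docs = [
    {"stock", "market", "price"},
    {"stock", "market", "trade"},
    {"gene", "protein", "cell"},
    {"gene", "cell", "dna"},
]

# Candidate descriptors are the frequent term sets.
candidates = frequent_itemsets(docs, min_support=2)

# Simulated user feedback: the user picks descriptors, not document labels.
selected = [frozenset({"stock", "market"}), frozenset({"gene", "cell"})]

labels = assign(docs, selected)
```

In this sketch the user never sees or labels individual documents; the feedback consists only of choosing among the mined descriptors, which mirrors the kind of interaction the abstract describes.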
Date of Conference: 11-15 November 2012
Date Added to IEEE Xplore: 14 February 2013
Conference Location: Tsukuba, Japan