Text clustering is a major approach to topic detection. The main shortcoming of current algorithms is their high computational complexity and long running time when the number of instances is large. We introduce a new algorithm that clusters a text corpus in two steps: in the C-process, we divide the corpus into overlapping subsets using Canopy clustering; in the K-process, we apply the X-means algorithm to generate rough clusters from the canopies that share common instances. Experiments show that this text clustering technique reveals the true number of clusters in the corpus and runs faster than the Single-pass and K-means clustering algorithms.
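
As a rough illustration of the C-process only, the following is a minimal sketch of generic Canopy clustering, not the authors' implementation. The cosine distance on bag-of-words counts, the threshold names `t1` (loose) and `t2` (tight), their values, and the toy documents are all assumptions chosen for the example. The K-process (X-means within canopies) is not shown.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def canopy_clustering(vectors, t1, t2):
    """Group vectors into possibly overlapping canopies.

    t1 is the loose threshold and t2 the tight one (assumed names, t1 > t2).
    Returns a list of canopies, each a list of indices into `vectors`.
    """
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(range(len(vectors)))
    canopies = []
    while remaining:
        center = remaining[0]  # pick the next unassigned point as a center
        canopy = [center]
        still_remaining = []
        for i in remaining[1:]:
            d = cosine_distance(vectors[center], vectors[i])
            if d < t1:
                canopy.append(i)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # not tightly close: may join others
        canopies.append(canopy)
        remaining = still_remaining
    return canopies


# Toy corpus (hypothetical data for illustration only).
docs = [
    "stock market falls",
    "stock market rises",
    "football match tonight",
    "football match result",
]
vectors = [Counter(doc.split()) for doc in docs]
canopies = canopy_clustering(vectors, t1=0.8, t2=0.4)
print(canopies)
```

A point whose distance to the center falls between `t2` and `t1` joins the canopy but also stays in the candidate pool, which is how canopies come to overlap. A second-stage clusterer would then only need to compare instances that share a canopy.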