Abstract:
We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of m points in n dimensions, n,m → ∞ and α = m/n stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of α and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determine the accuracy achievable by the Bayes-optimal estimation algorithm. In particular, we find that when the number of clusters is sufficiently large, r > 4+2√α, there is a gap between the threshold for information-theoretically optimal performance and the threshold at which known algorithms succeed.
Published in: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Date of Conference: 27-30 September 2016
Date Added to IEEE Xplore: 13 February 2017