We present an optimal clustering algorithm for grouping multivariate normal distributions into clusters using the divergence, a symmetric, information-theoretic distortion measure based on the Kullback-Leibler distance. Optimal solutions for normal distributions are shown to be obtained by solving a set of Riccati matrix equations, and the optimal centroids are found by alternating between the mean and covariance-matrix intermediate solutions. The clustering performance of the new algorithm compared favorably against the conventional, non-optimal clustering solution of sample mean and sample covariance, both in overall rate-distortion and in the evenness of the distribution of samples across clusters. The resultant clusters were further tested on unsupervised adaptation of HMM parameters in a framework of structured maximum a posteriori linear regression (SMAPLR). The Wall Street Journal database was used for the adaptation experiment. The recognition performance, measured by word error rate, was significantly improved from 32.6% with the non-optimal centroids (sample mean and covariance) to 27.6% and 27.5% for the diagonal and full covariance matrix cases, respectively.
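As a concrete illustration of the distortion measure named in the abstract, the following is a minimal sketch of the symmetrized Kullback-Leibler divergence (J-divergence) between two multivariate normals, using the standard closed-form KL expression for Gaussians. The function names and the choice of NumPy are illustrative assumptions, not the paper's implementation; the paper's centroid computation via Riccati equations is not reproduced here.

```python
import numpy as np

def kl_gaussians(mu0, S0, mu1, S1):
    """KL(N(mu0, S0) || N(mu1, S1)) via the closed form for Gaussians."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    # 0.5 * [ tr(S1^-1 S0) + (mu1-mu0)^T S1^-1 (mu1-mu0) - k + ln(det S1 / det S0) ]
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def symmetric_divergence(mu0, S0, mu1, S1):
    """Symmetric divergence (J-divergence): KL in both directions summed.

    This is the kind of symmetric, KL-based distortion the abstract
    describes as the clustering criterion between normal distributions.
    """
    return (kl_gaussians(mu0, S0, mu1, S1)
            + kl_gaussians(mu1, S1, mu0, S0))
```

For example, the divergence is zero between identical Gaussians and, unlike plain KL, is symmetric in its two arguments, which is what makes it usable as a clustering distortion.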