This paper proposes a new covariance modeling technique for Gaussian mixture models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $\Sigma_j^{-1} = P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, with $\lambda_k^j \in \mathbb{R}$ and $a_k \in \mathbb{R}^d$. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set $\{a_k a_k^T\}_{k=1}^{D}$ and the expansion coefficients $\{\lambda_k^j\}$. This model, called the extended maximum likelihood linear transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from $D = d$ to $D = d(d+1)/2$, one gradually moves from a maximum likelihood linear transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains in word error rate of up to 35% over a standard diagonal covariance model and up to 30% over a standard MLLT model.
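The rank-1 expansion of the precision matrix can be illustrated with a minimal sketch in plain Python. The function names `emllt_precision` and `basis_size_range` and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement the generalized EM estimation procedure.

```python
def emllt_precision(lambdas, basis):
    """Return P = sum_k lambdas[k] * a_k a_k^T as a d x d nested list.

    `basis` holds the vectors a_k (shared across Gaussians);
    `lambdas` holds the coefficients of one Gaussian j.
    """
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for lam, a in zip(lambdas, basis):
        for r in range(d):
            for c in range(d):
                P[r][c] += lam * a[r] * a[c]
    return P


def basis_size_range(d):
    """Smallest and largest basis sizes D mentioned in the abstract."""
    return d, d * (d + 1) // 2


# Hypothetical toy values for d = 2 (assumptions, not from the paper).
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # a_1, a_2, a_3
lambdas_j = [2.0, 3.0, 0.5]                    # coefficients for one Gaussian j

# D = d: the precision is diagonal in the basis {a_1, a_2} (MLLT-like structure).
P_diag = emllt_precision(lambdas_j[:2], basis[:2])

# D = d(d+1)/2: enough rank-1 terms to represent a full symmetric matrix.
P_full = emllt_precision(lambdas_j, basis)
```

With $D = d$ the sum equals $A^T \operatorname{diag}(\lambda^j) A$, where the rows of $A$ are the vectors $a_k$, which is the MLLT structure. Because the coefficients may be any real numbers, a valid model also requires each resulting $P_j$ to be positive definite; the sketch does not check this.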
Published in: IEEE Transactions on Speech and Audio Processing (Volume: 12, Issue: 1)
Date of Publication: Jan. 2004