This paper presents a new feature compensation approach to noisy speech recognition that uses a high-order vector Taylor series (HOVTS) approximation of an explicit model of environmental distortions. Formulations are derived for maximum-likelihood (ML) estimation of both additive noises and convolutional distortions, and for minimum mean squared error (MMSE) estimation of clean speech. Experimental results on the Aurora2 and Aurora4 benchmark databases, where the modeling assumption of the distortion model is more accurate, demonstrate that the standard HOVTS-based feature compensation approaches achieve consistently significant improvements in recognition accuracy over the traditional, standard first-order VTS-based approach. For a real-world in-vehicle connected digits recognition task on the Aurora3 benchmark database, where the modeling assumption of the distortion model is less accurate, modifications are necessary to make VTS-based feature compensation approaches work. In this case, the second-order VTS-based approach performs only slightly better than the first-order VTS-based approach.
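To illustrate why a higher-order expansion can help, the following is a minimal sketch rather than the paper's formulation. It assumes the widely used scalar log-spectral mismatch function y = log(exp(x) + exp(n)) for clean speech x and additive noise n, ignores the convolutional term, and compares first- and second-order Taylor approximations around an expansion point (x0, n0). The function names `mismatch` and `taylor_mismatch` are hypothetical and chosen only for this example.

```python
import math


def mismatch(x, n):
    """Assumed log-domain distortion model: y = log(exp(x) + exp(n)).

    Written in a numerically stable form.
    """
    return max(x, n) + math.log1p(math.exp(-abs(x - n)))


def taylor_mismatch(x, n, x0, n0, order=1):
    """Taylor approximation of mismatch() around (x0, n0), order 1 or 2."""
    # s = dy/dx at the expansion point; dy/dn = 1 - s.
    s = 1.0 / (1.0 + math.exp(n0 - x0))
    dx, dn = x - x0, n - n0
    approx = mismatch(x0, n0) + s * dx + (1.0 - s) * dn
    if order >= 2:
        # Hessian entries are +/- s(1 - s), so the quadratic term
        # collapses to 0.5 * s(1 - s) * (dx - dn)^2.
        approx += 0.5 * s * (1.0 - s) * (dx - dn) ** 2
    return approx


if __name__ == "__main__":
    x0, n0 = 0.0, 0.0   # expansion point (illustrative values)
    x, n = 1.0, 0.0     # evaluation point away from the expansion point
    exact = mismatch(x, n)
    for order in (1, 2):
        approx = taylor_mismatch(x, n, x0, n0, order)
        print(f"order {order}: approx={approx:.4f}, abs error={abs(approx - exact):.4f}")
```

In this toy setting the second-order term reduces the approximation error at the chosen evaluation point. Whether that carries over to recognition accuracy depends on how well the assumed distortion model matches the data, which is the contrast the abstract draws between Aurora2/Aurora4 and Aurora3.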