The performance of speech recognition systems relies on the consistency and adaptability of the speech features under complex conditions during the training and testing stages. Traditional systems usually perform poorly in adverse noisy conditions and are therefore not applicable to most real-world problems. In this paper, we investigate speech feature extraction in noisy environments and propose a novel approach based on Gabor filtering and tensor factorization. Recent physiological and psychoacoustic experimental results suggest that localized spectro-temporal features are essential for auditory perception. To exploit this property, we represent the speech signal as a general higher-order tensor and employ two-dimensional Gabor functions with different scales and directions to analyze localized patches of the power spectrogram. We then propose Nonnegative Tensor PCA with sparse constraints to learn the projection matrices from multiple interrelated feature subspaces. The sparse constraints aim to preserve the statistical characteristics of clean speech data by finding projection matrices of the speech subspaces, and to reduce noise components whose distributions differ from those of clean speech. A multifactor analysis method is proposed to extract robust sparse features by processing the data samples in tensor structure. Simulation results indicate that the proposed method improves speech recognition performance, especially in noisy environments, compared with traditional speech feature extraction methods.
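The Gabor analysis step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gabor_kernel`, `filter_same`, `gabor_tensor`), the kernel size, the choice of sigma as half the wavelength, and the use of the absolute filter response to keep the features nonnegative are all assumptions made for illustration. The sketch filters a power spectrogram with real 2D Gabor kernels at several scales and directions and stacks the responses into a fourth-order tensor indexed by frequency, time, scale, and direction.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2D Gabor function on a size x size grid (size odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoidal carrier oscillates along theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    kernel = envelope * carrier
    # Subtract the mean so the kernel ignores constant offsets (assumption).
    return kernel - kernel.mean()


def filter_same(image, kernel):
    """2D linear convolution via zero-padded FFT, cropped to the image size."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    spectrum = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(spectrum, shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]


def gabor_tensor(power_spec, wavelengths, thetas, size=9):
    """Stack Gabor responses into a (freq, time, scale, direction) tensor."""
    feats = np.empty(power_spec.shape + (len(wavelengths), len(thetas)))
    for i, lam in enumerate(wavelengths):
        for j, th in enumerate(thetas):
            # sigma = 0.5 * wavelength is an illustrative choice.
            k = gabor_kernel(size, lam, th, sigma=0.5 * lam)
            # Absolute response keeps the tensor nonnegative (assumption).
            feats[:, :, i, j] = np.abs(filter_same(power_spec, k))
    return feats


# Usage on a synthetic "power spectrogram" (random stand-in for real data).
rng = np.random.default_rng(0)
spec = rng.random((32, 50))            # 32 frequency bins, 50 frames
thetas = np.arange(4) * np.pi / 4      # 4 directions
features = gabor_tensor(spec, wavelengths=[4.0, 8.0], thetas=thetas)
print(features.shape)                  # (32, 50, 2, 4)
```

A tensor of this kind is the sort of nonnegative, multi-way input that a sparse nonnegative tensor factorization could then project onto lower-dimensional subspaces; that factorization step is not sketched here because the abstract does not specify it.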