Robust Multifactor Speech Feature Extraction Based on Gabor Analysis

3 Author(s)
Qiang Wu; Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China; Liqing Zhang; Guangchuan Shi

The performance of speech recognition systems relies on the consistency and adaptation of speech features under complex conditions during the training and testing stages. Traditional systems usually perform poorly under adverse noisy conditions and are not applicable to most real-world problems. In this paper, we investigate the speech feature extraction problem in noisy environments and propose a novel approach based on Gabor filtering and tensor factorization. Recent physiological and psychoacoustic experimental results suggest that localized spectro-temporal features are essential for auditory perception. To explore this property, we represent the speech signal as a general higher-order tensor and employ two-dimensional Gabor functions with different scales and directions to analyze localized patches of the power spectrogram. We then propose Nonnegative Tensor PCA with sparse constraints to learn the projection matrices from multiple interrelated feature subspaces. The objective of the sparse constraints is to preserve the statistical characteristics of clean speech data by finding projection matrices of speech subspaces, and to reduce the noise components whose distributions differ from those of clean speech. A multifactor analysis method is proposed to extract robust sparse features by processing the data samples in tensor structure. The simulation results indicate that the proposed method improves speech recognition performance, especially in noisy environments, compared with traditional speech feature extraction methods.
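The Gabor analysis step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gabor_kernel`, `filter2d_same`, `gabor_feature_tensor`), the kernel size, the choice of wavelengths and directions, the rule `sigma = 0.5 * wavelength`, and the random stand-in spectrogram are all assumptions made for illustration. The sketch filters a (frequency x time) power spectrogram with real-valued 2D Gabor kernels at several scales and directions and stacks the responses into a higher-order array. The tensor factorization stage is not shown.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2D Gabor function on a size x size grid (size must be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the sinusoidal carrier runs along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + y_rot ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    kernel = envelope * carrier
    # Subtract the mean so the filter does not respond to constant offsets.
    return kernel - kernel.mean()


def filter2d_same(image, kernel):
    """Zero-padded 2D cross-correlation with output the same shape as the input.

    For the point-symmetric kernels built above this equals 2D convolution.
    """
    half = kernel.shape[0] // 2
    padded = np.pad(image, half, mode="constant")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def gabor_feature_tensor(power_spec, wavelengths, thetas, size=9):
    """Stack filter responses into an array of shape (scales, directions, freq, time)."""
    maps = [
        [
            filter2d_same(power_spec, gabor_kernel(size, w, t, sigma=0.5 * w))
            for t in thetas
        ]
        for w in wavelengths
    ]
    return np.array(maps)


# Stand-in data: a random nonnegative array in place of a real power spectrogram.
rng = np.random.default_rng(0)
power_spec = rng.random((40, 100))  # (frequency bins, time frames)

tensor = gabor_feature_tensor(
    power_spec,
    wavelengths=[4.0, 8.0],  # two scales (assumed values)
    thetas=np.linspace(0.0, np.pi, 4, endpoint=False),  # four directions
)
print(tensor.shape)  # (2, 4, 40, 100)
```

Each slice `tensor[s, d]` is one spectro-temporal response map, so scale and direction become separate modes of the array. This is one plausible way to obtain the multi-mode structure that a tensor factorization could then operate on.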

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 19, Issue: 4)