This letter proposes a novel scheme that applies feature statistics normalization techniques for robust speech recognition. In the proposed approach, the processed temporal-domain feature sequence is first decomposed into nonuniform subbands using the discrete wavelet transform (DWT), and then each subband stream is individually processed by well-known normalization methods, such as mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, we reconstruct the feature stream with all of the modified subband streams using the inverse DWT. With this process, the components that correspond to more important modulation spectral bands in the feature sequence can be processed separately.
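The decompose, normalize, and reconstruct pipeline described above can be illustrated with a minimal sketch. The details here are assumptions made for illustration and are not taken from the letter: the wavelet is an orthonormal Haar wavelet, the number of decomposition levels is 2, only MVN is shown (HEQ is omitted), and the function names `haar_dwt`, `haar_idwt`, `mvn`, and `subband_normalize` are hypothetical. The input is the temporal sequence of a single feature dimension, and its length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def mvn(x, eps=1e-12):
    """Mean and variance normalization: zero mean, unit variance."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std < eps:
        # Constant stream: only the mean can be removed.
        return [0.0 for _ in x]
    return [(v - mean) / std for v in x]


def subband_normalize(seq, levels=2, normalize=mvn):
    """Normalize each DWT subband of a feature sequence, then reconstruct.

    Assumption: len(seq) must be divisible by 2 ** levels.
    """
    if levels < 1 or len(seq) % (2 ** levels) != 0:
        raise ValueError("sequence length must be divisible by 2 ** levels")

    # Analysis: repeatedly split the approximation band, which yields
    # nonuniform subbands (narrower at low modulation frequencies).
    details = []
    approx = list(seq)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)

    # Normalize each subband stream on its own, then synthesize with the
    # inverse DWT, starting from the coarsest level.
    approx = normalize(approx)
    for detail in reversed(details):
        approx = haar_idwt(approx, normalize(detail))
    return approx


if __name__ == "__main__":
    feature_track = [1.0, 3.0, 2.0, 5.0, 4.0, 4.5, 0.5, 2.5]
    print(subband_normalize(feature_track, levels=2))
```

Passing a different callable as `normalize` (for example, a histogram equalization routine) would swap the per-subband method without changing the decomposition or reconstruction steps. In practice a library such as PyWavelets would likely replace the hand-written Haar transform.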