Cart (Loading....) | Create Account
Close category search window

MVA Processing of Speech Features

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Chia-Ping Chen ; Univ. of Washington, Seattle, WA ; Bilmes, J.A.

In this paper, we investigate a technique consisting of mean subtraction, variance normalization and time sequence filtering. Unlike other techniques, it applies auto-regression moving-average (ARMA) filtering directly in the cepstral domain. We call this technique mean subtraction, variance normalization, and ARMA filtering (MVA) post-processing, and speech features with MVA post-processing are called MVA features. Overall, compared to raw features without post-processing, MVA features achieve an error rate reduction of 45% on matched tasks and 65% on mismatched tasks on the Aurora 2.0 noisy speech database, and an average 57% error reduction on the Aurora 3.0 database. These improvements are comparable to the results of much more complicated techniques even though MVA is relatively simple and requires practically no additional computational cost. In this paper, in addition to describing MVA processing, we also present a novel analysis of the distortion of mel-frequency cepstral coefficients and the log energy in the presence of different types of noise. The effectiveness of MVA is extensively investigated with respect to several variations: the configurations used to extract and the type of raw features, the domains where MVA is applied, the filters that are used, the ARMA filter orders, and the causality of the normalization process. Specifically, it is argued and demonstrated that MVA works better when applied to the zeroth-order cepstral coefficient than to log energy, that MVA works better in the cepstral domain, that an ARMA filter is better than either a designed finite impulse response filter or a data-driven filter, and that a five-tap ARMA filter is sufficient to achieve good performance in a variety of settings. We also investigate and evaluate a multi-domain MVA generalization

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:15 ,  Issue: 1 )

Date of Publication:

Jan. 2007

Need Help?

IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2014 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.