Frequency-domain linear prediction for temporal features

2 Author(s)
Athineos, M. ; Dept. of Electr. Eng., Columbia Univ., New York, NY, USA ; Ellis, D.P.W.

Current speech recognition systems uniformly employ short-time spectral analysis, usually over windows of 10-30 ms, as the basis for their acoustic representations. Any detail below this timescale is lost, and even temporal structures above this level are usually only weakly represented in the form of deltas etc. We address this limitation by proposing a novel representation of the temporal envelope in different frequency bands by exploring the dual of conventional linear prediction (LPC) when applied in the transform domain. With this technique of frequency-domain linear prediction (FDLP), the 'poles' of the model describe temporal, rather than spectral, peaks. By using analysis windows on the order of hundreds of milliseconds, the procedure automatically decides how to distribute poles to model the temporal structure best within the window. While this approach offers many possibilities for novel speech features, we experiment with one particular form, an index describing the 'sharpness' of individual poles within a window, and show a relatively large word error rate improvement from 4.97% to 3.81% in a recognizer trained on general conversational telephone speech and tested on a small-vocabulary spontaneous numbers task. We analyze this improvement in terms of the confusion matrices and suggest how the newly-modeled fine temporal structure may be helping.
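The abstract does not spell out the computation, so the following is only a minimal sketch of the FDLP idea under stated assumptions: the "transform domain" is taken to be a type-II DCT of the whole analysis window, and the all-pole model is fitted with the ordinary autocorrelation method (Levinson-Durbin). The function names `dct_ii`, `levinson`, and `fdlp_envelope` are hypothetical, and the pole 'sharpness' index used in the paper's experiments is not reproduced here. Because linear prediction is applied to DCT coefficients rather than time samples, the all-pole "spectrum" is read out along the time axis and approximates the squared temporal envelope.

```python
import math

def dct_ii(x):
    """Unnormalized type-II DCT, written as an O(N^2) loop for clarity."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation r[0..order].

    Returns (a, err) where a[0] == 1 and err is the final prediction error.
    Assumes r[0] > 0 (the input is not all zeros).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order):
    """All-pole estimate of the squared temporal envelope of x.

    Steps (assumed, not taken from the abstract):
      1. DCT the whole window.
      2. Autocorrelate the DCT coefficients along the frequency index.
      3. Fit an all-pole model of the given order.
      4. Evaluate the model on a grid of angles that map back to time.
    """
    n = len(x)
    c = dct_ii(x)
    r = [sum(c[i] * c[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a, err = levinson(r, order)
    env = []
    for m in range(n):
        w = math.pi * (m + 0.5) / n         # angle corresponding to time index m
        re = sum(a[k] * math.cos(w * k) for k in range(order + 1))
        im = sum(a[k] * math.sin(w * k) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env

if __name__ == "__main__":
    # Toy input: a single click at sample 40 in an otherwise silent window.
    signal = [0.0] * 128
    signal[40] = 1.0
    envelope = fdlp_envelope(signal, order=2)
    peak = max(range(len(envelope)), key=envelope.__getitem__)
    print("envelope peak near sample", peak)
```

In this toy case the DCT of the click is a pure cosine along the frequency index, so a second-order model places a pole pair at the angle that corresponds to the click's position in time. That is the sense in which the poles describe temporal rather than spectral peaks. A real front end would presumably apply this per frequency band and with a higher model order over windows of hundreds of milliseconds; those details are not given in the abstract.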

Published in:
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on

Date of Conference: 30 Nov.-3 Dec. 2003
