Frequency-time-shift-invariant time-delay neural networks for robust continuous speech recognition

1 Author(s)
Sawai, H. ; ATR Interpreting Telephony Res. Lab., Kyoto, Japan

The authors propose neural network (NN) architectures for robust, speaker-independent continuous speech recognition. One architecture is the frequency-time-shift-invariant time-delay neural network (FTDNN). Another architecture is based on windowing each layer of the NN with local time-frequency windows. This architecture makes it possible for the NN to capture global features in the upper layers as well as precise local features in the lower layers. Recognition experiments on easily confused phonemes were performed using /b/, /d/, /g/, /m/, /n/, and /N/ (syllabic nasal) phoneme tokens to verify robustness to variations in speech. Performance results for the different architectures are presented.
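
The idea of shift invariance through local time-frequency windows can be illustrated with a minimal sketch. This is not the architecture from the paper, whose layer sizes, window shapes, and training procedure are not given in the abstract. It assumes a single layer in which one shared-weight window slides over a (frequency, time) array, followed by a global maximum as a stand-in for the integration performed by upper layers. All names (`shared_window_layer`, `place`, the 3x3 window, the 16x20 input) are hypothetical.

```python
import numpy as np

def shared_window_layer(spec, kernel):
    """Apply one shared-weight window at every valid (frequency, time) offset."""
    n_freq, n_time = spec.shape
    k_freq, k_time = kernel.shape
    out = np.empty((n_freq - k_freq + 1, n_time - k_time + 1))
    for f in range(out.shape[0]):
        for t in range(out.shape[1]):
            # The same weights are reused at every offset (weight sharing).
            out[f, t] = np.sum(spec[f:f + k_freq, t:t + k_time] * kernel)
    return np.tanh(out)

def place(pattern, f0, t0, shape=(16, 20)):
    """Put a small pattern on a silent (all-zero) background at offset (f0, t0)."""
    spec = np.zeros(shape)
    spec[f0:f0 + pattern.shape[0], t0:t0 + pattern.shape[1]] = pattern
    return spec

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))   # hypothetical local time-frequency window
pattern = rng.standard_normal((3, 3))  # hypothetical acoustic event

spec_a = place(pattern, 2, 3)
spec_b = place(pattern, 6, 9)          # same event, 4 bins higher and 6 frames later

out_a = shared_window_layer(spec_a, kernel)
out_b = shared_window_layer(spec_b, kernel)

# Shifting the input shifts the layer output by the same amount ...
shift_equivariant = np.allclose(out_b[4:, 6:], out_a[:-4, :-6])
# ... so a global pooling step gives the same score for both inputs.
shift_invariant = np.isclose(out_a.max(), out_b.max())
print(shift_equivariant, shift_invariant)
```

Because the weights are shared across offsets, the layer's response moves with the input, and any position-independent summary of that response is unchanged by the shift. The global maximum is only one possible summary, chosen here for brevity.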

Published in:

1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91)

Date of Conference:

14-17 Apr 1991