A Bottom-Up Modular Search Approach to Large Vocabulary Continuous Speech Recognition

3 Author(s)
Siniscalchi, S.M. ; Fac. of Archit. & Eng., Univ. of Enna “Kore”, Enna, Italy ; Svendsen, T. ; Chin-Hui Lee

A novel bottom-up decoding framework for large vocabulary continuous speech recognition (LVCSR) with a modular search strategy is presented. Weighted finite state machines (WFSMs) are utilized to accomplish stage-by-stage acoustic-to-linguistic mappings from low-level speech attributes to high-level linguistic units in a bottom-up manner. Probabilistic attribute and phone lattices are used as intermediate vehicles to facilitate knowledge integration at different levels of the speech knowledge hierarchy. The final decoded sentence is obtained by performing lexical access and applying syntactical constraints. Two factors are critical to achieving high recognition accuracy: (i) generation of high-precision sets of competing hypotheses at every intermediate stage; and (ii) low-error pruning of unlikely theories to reduce input lattice sizes while maintaining high-quality hypotheses for the next layers of knowledge integration. The decoupled nature of the proposed techniques allows us to obtain recognition results at all stages, including attribute, phone and word levels, and enables an integration of various knowledge sources that is not easily achieved in state-of-the-art hidden Markov model (HMM) systems based on top-down knowledge integration. Evaluation on the Nov92 test set of the 5000-word Wall Street Journal task demonstrates that high-accuracy attribute and phone classification can be attained. As for word recognition, the proposed WFSM-based framework achieves encouraging word error rates. Finally, by combining attribute scores with the conventional HMM likelihood scores and re-ordering the N-best lists obtained from the word lattices generated with the proposed WFSM system, the word error rate (WER) can be further reduced.
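
The final re-ordering step can be illustrated with a minimal sketch. The abstract does not state how the attribute and HMM scores are combined, so the linear interpolation of log-domain scores below is an assumption, and the `Hypothesis` type, the `rerank_nbest` function, the `weight` parameter, and the toy scores are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: str
    hmm_loglik: float       # conventional HMM log-likelihood score
    attribute_score: float  # attribute-based score, assumed to be in the log domain


def rerank_nbest(nbest: List[Hypothesis], weight: float = 0.5) -> List[Hypothesis]:
    """Re-order an N-best list by an interpolated score.

    Assumption: the two scores are combined linearly, with `weight`
    controlling how much the attribute score contributes.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.hmm_loglik + weight * h.attribute_score

    # Highest combined score first.
    return sorted(nbest, key=combined, reverse=True)


# Toy scores, invented for illustration; not results from the paper.
nbest = [
    Hypothesis("the stock fell", hmm_loglik=-120.0, attribute_score=-40.0),
    Hypothesis("the stock fill", hmm_loglik=-119.0, attribute_score=-55.0),
]
best = rerank_nbest(nbest, weight=0.5)[0]
```

With `weight=0.0` the ordering falls back to the HMM scores alone, so the same function can be used to compare the baseline ranking with the combined one. In practice the weight would be tuned on held-out data.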

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 21, Issue: 4)

Date of Publication:

April 2013
