WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding

5 Author(s)

Over the last decade, weighted finite-state transducers (WFSTs) have become popular in speech recognition. While their main field of application remains hidden Markov model (HMM) decoding, the WFST framework is now also seen as a building block in solutions to many other central problems in automatic speech recognition (ASR). These solutions are less well known, and this work aims to give an overview of the applications of WFSTs in large-vocabulary continuous speech recognition (LVCSR) beyond HMM decoding: discriminative acoustic model training, Bayes risk decoding, and system combination. Applying the WFST framework has a substantial practical impact: we show how the framework helps to structure problems, to develop generic solutions, and to delegate complex computations to WFST toolkits. In this paper, we review the literature, discuss existing approaches, and provide new insights into WFST-enabled solutions. We also present a novel, purely WFST-based algorithm for computing the exact Bayes risk hypothesis from a lattice with the Levenshtein distance as the loss function. We present the problems and their solutions in a unified framework and discuss the advantages and limits of using WFSTs. We do not provide new experimental results, but refer to the existing literature. Our work helps to identify where and how the transducer framework can contribute to a compact and generic solution to LVCSR problems.
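
To make the Bayes risk decoding criterion mentioned in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. The paper computes the exact Bayes risk hypothesis over a full lattice with WFST operations; this sketch assumes a much simpler setting in which both the candidate space and the posterior are restricted to a small N-best list. The function names (`levenshtein`, `bayes_risk`, `mbr_decode`) and the toy posteriors are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple

Hyp = Tuple[str, ...]  # a word sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (substitutions, insertions, deletions, unit cost)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def bayes_risk(candidate: Hyp, posteriors: Dict[Hyp, float]) -> float:
    """Expected Levenshtein loss of `candidate` under the posterior."""
    return sum(p * levenshtein(candidate, w) for w, p in posteriors.items())


def mbr_decode(posteriors: Dict[Hyp, float]) -> Hyp:
    """Pick the N-best entry with minimum expected loss (assumed N-best setting)."""
    return min(posteriors, key=lambda c: bayes_risk(c, posteriors))


# Toy posterior over three hypotheses (made-up numbers, sums to 1).
posteriors = {
    ("a", "b", "c"): 0.4,
    ("a", "b", "d"): 0.3,
    ("a", "e", "d"): 0.3,
}

map_hyp = max(posteriors, key=posteriors.get)  # maximum a posteriori choice
mbr_hyp = mbr_decode(posteriors)               # minimum Bayes risk choice
print(map_hyp, mbr_hyp)
```

In this toy case the most probable hypothesis and the minimum Bayes risk hypothesis differ: the second entry is closer on average to the other two, so its expected edit distance is lower even though its posterior is not the largest. A lattice-based method such as the one the abstract describes searches a far larger hypothesis space than an N-best list can represent.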

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 2)