Skip to Main Content
This paper proposes a new feature representation, Bag Of Arcs (BOA) for speech segments. A speech segment in BOA is simply represented as a set of counts for unique arcs in a finite state machine. Similar to the Bag Of Words model (BOW), BOA disregards the order of arcs, and thus, efficiently models speech segments. A strong motivation to use BOA is provided by a fact that the BOA representation is tightly connected to the output of a Weighted Finite State Transducer (WFST) based ASR decoder. Thus, BOA directly represents elements in the search network of a WFST-based ASR decoder, and can include information about context-dependent HMM topologies, lexicons, and back-off smoothed n-gram networks. In addition, the counts of BOA are accumulated by using the WFST decoder output directly, and we do not require an additional overhead and a change of decoding algorithms to extract the features. Consequently, we can combine the ASR decoder and post-processing without a process to extract word features from the decoder outputs or re-compiling WFST networks. We show the effectiveness of the proposed approach for some ASR post-processing applications in utterance classification experiments, and in speaker adaptation experiments by achieving absolute 1% improvement in WER from baseline results. We also show examples of latent semantic analysis for BOA by using latent Dirichlet allocation.
Date of Conference: 25-30 March 2012