In recent years, Conditional Random Fields (CRFs) have been examined as a statistical model for speech recognition. In this paper, we explore the use of features derived via CRFs as inputs to a Tandem-style HMM ASR system (that is, a Crandem system). We present a model for deriving frame-level posterior features via CRFs for use in Crandem modeling, and provide experimental results showing that the Crandem system can slightly but significantly outperform both a comparable Tandem system and a comparable CRF system on the task of phone recognition.
Published in:
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Date of Conference: March 31 2008-April 4 2008