We describe a novel approach for modeling segmental information in speech recognition through the use of thumbnail features. By taking into account dependencies at the segmental level, thumbnail features are more resistant to changes in speaking rate and other factors. Whereas traditional acoustic features are fixed for each utterance, a separate set of thumbnail features is computed for each hypothesis, which may violate the traditional scoring paradigm. To address this, we introduce a conditional exponential modeling framework that allows better integration of various knowledge sources in a discriminative fashion. We present preliminary experiments on the Switchboard task.
Published in:
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
(Volume: 1)
Date of Conference: 17-21 May 2004