Multiple time resolution analysis of speech signal using MCE training with application to speech recognition
Dimopoulos, S.
Potamianos, A.
Lussier, E.-F.
Chin-Hui Lee
Dept. of Electron. & Comput. Eng., Tech. Univ. of Crete, Chania;
Abstract
In this paper, we propose two methods of multiple time-resolution analysis of speech and their application to automatic speech recognition (ASR). Constant frame-rate multi-scale analysis is proposed based on a box of multi-scale features. Then a variable rate analysis is proposed based on the selection of the optimal temporal resolution on the fly by a properly trained non-linear classifier unit. The classifier's parameters are trained using the discriminative method of minimum classification error (MCE) training. We use the recently proposed conditional random fields (CRF) phonetic recognition system that effectively combines highly correlated features. Results are reported on a frame-wise classification task and also on TIMIT phone recognition task. Results show that (i) CRFs can effectively combine multi-scale features and (ii) MCE trained variable rate CRFs are competitive with the ldquoboxrdquo combination method.
Index
Terms
Available to subscribers and IEEE members.
References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.