Abstract:
In this paper, we propose two methods of multiple time-resolution analysis of speech and their application to automatic speech recognition (ASR). Constant frame-rate mult...Show MoreMetadata
Abstract:
In this paper, we propose two methods of multiple time-resolution analysis of speech and their application to automatic speech recognition (ASR). Constant frame-rate multi-scale analysis is proposed based on a box of multi-scale features. Then a variable rate analysis is proposed based on the selection of the optimal temporal resolution on the fly by a properly trained non-linear classifier unit. The classifier's parameters are trained using the discriminative method of minimum classification error (MCE) training. We use the recently proposed conditional random fields (CRF) phonetic recognition system that effectively combines highly correlated features. Results are reported on a frame-wise classification task and also on TIMIT phone recognition task. Results show that (i) CRFs can effectively combine multi-scale features and (ii) MCE trained variable rate CRFs are competitive with the ldquoboxrdquo combination method.
Date of Conference: 19-24 April 2009
Date Added to IEEE Xplore: 26 May 2009
ISBN Information: