This paper presents methods and results for joint optimization of the feature extraction and the model parameters of a detector. We further define a discriminative training criterion called Minimum Detection Error (MDE). The criterion can optimize the F-score or any other detection performance metric. The methods are used to design detectors of subwords in continuous speech, i.e., to spot phones and articulatory features. For each subword detector, the MFCC filterbank matrix and the Gaussian means of the HMMs are jointly optimized. In experiments on TIMIT, the optimized detectors clearly outperform both the baseline detectors and our previous MCE-based detectors. The results indicate that the same performance metric should be used for training and testing, and that accuracy yields a larger relative improvement than F-score. Further, the optimized filterbanks usually reflect typical acoustic properties of the corresponding detection classes.
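To make the distinction between the two metrics concrete, the following is a minimal sketch, not the paper's MDE formulation. It computes F-score and accuracy from detection counts and, as one common way to make such a metric usable as a training criterion, replaces hard counts with sigmoid-smoothed ones. The function names, the `slope` parameter, and the sigmoid smoothing itself are assumptions introduced for illustration only.

```python
import math


def sigmoid(x, slope=1.0):
    """Logistic function; `slope` (hypothetical parameter) controls sharpness."""
    return 1.0 / (1.0 + math.exp(-slope * x))


def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from (possibly fractional) detection counts."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of correct decisions; unlike F-score, it counts true negatives."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total > 0 else 0.0


def soft_counts(scores, labels, slope=1.0):
    """Smoothed TP/FP/FN/TN counts (assumed smoothing, not taken from the paper).

    `scores` are detector outputs where a positive value means "detected";
    `labels` are 1 for target segments and 0 for non-target segments.
    """
    tp = fp = fn = tn = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s, slope)  # soft decision in (0, 1)
        if y:
            tp += p
            fn += 1.0 - p
        else:
            fp += p
            tn += 1.0 - p
    return tp, fp, fn, tn
```

With a rare target class the two metrics can diverge sharply: for 8 hits, 2 false alarms, 2 misses, and 88 correct rejections, `f_score(8, 2, 2)` is 0.8 while `accuracy(8, 2, 2, 88)` is 0.96, because accuracy also rewards the many true negatives. A criterion tuned to one metric is therefore not guaranteed to be optimal for the other.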