Structured discriminative models are a flexible sequence classification approach that enables a wide variety of features to be used. This paper describes a particular model in this framework, the structured support vector machine (SSVM), and how it can be applied to medium-to-large vocabulary speech recognition tasks. An important aspect of SSVMs is the form of the joint feature space. Here, context-dependent generative models, hidden Markov models, are used to obtain the features. To apply this form of combined generative and discriminative model to medium and larger vocabulary tasks, a number of issues need to be addressed. First, the extracted features are a function of the segmentation of the utterance. A Viterbi-like scheme for obtaining the "optimal" segmentation is described. Second, SSVM training can be viewed as large-margin training of a log-linear model with a zero-mean Gaussian prior over the discriminative parameters. However, this form of prior is not appropriate for all features. A modified training algorithm is proposed that allows general Gaussian priors to be incorporated into the large-margin criterion. Finally, to speed up the training process, a 1-slack algorithm, caching of competing hypotheses, and parallelization strategies are also described. The performance of SSVMs is evaluated on small and medium-to-large vocabulary speech recognition tasks: AURORA 2 and AURORA 4.
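As a rough sketch of the training criterion being modified, a standard 1-slack-style large-margin objective with a general Gaussian prior $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ over the discriminative parameters $\boldsymbol{\alpha}$ might take the form (notation here is illustrative, not taken from the paper):

\[
\min_{\boldsymbol{\alpha}} \;
\frac{1}{2}(\boldsymbol{\alpha} - \boldsymbol{\mu})^{\mathsf T}
\boldsymbol{\Sigma}^{-1}
(\boldsymbol{\alpha} - \boldsymbol{\mu})
\; + \;
C \sum_{i} \max\!\Big(0,\;
\max_{\mathbf{y} \neq \mathbf{y}_i}
\big[ \mathcal{L}(\mathbf{y}, \mathbf{y}_i)
+ \boldsymbol{\alpha}^{\mathsf T} \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}) \big]
- \boldsymbol{\alpha}^{\mathsf T} \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i)
\Big)
\]

where $\boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y})$ is the joint feature vector and $\mathcal{L}$ a loss (e.g. word error). Setting $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\Sigma} = \mathbf{I}$ recovers the usual $\tfrac{1}{2}\|\boldsymbol{\alpha}\|^2$ regulariser, i.e. the zero-mean Gaussian prior mentioned above.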
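To illustrate the kind of Viterbi-like segmentation scheme the abstract refers to, the following is a minimal dynamic-programming sketch. It is not the paper's algorithm; it assumes a hypothetical segment scoring function `seg_score(start, end, label)` (in the SSVM setting this would be the discriminative score of the joint features extracted for that segment under the current parameters) and a maximum segment duration `max_dur`.

```python
import math

def viterbi_segmentation(frames, labels, seg_score, max_dur=10):
    """Find the highest-scoring segmentation of `frames` into labelled segments.

    `seg_score(start, end, label)` is an assumed scoring function for the
    segment frames[start:end] with the given label (hypothetical stand-in
    for a joint-feature score under the current model parameters).
    Returns (segments, score) where segments is a list of (start, end, label).
    """
    T = len(frames)
    best = [-math.inf] * (T + 1)   # best[t]: score of best segmentation of frames[:t]
    best[0] = 0.0
    back = [None] * (T + 1)        # back-pointers: (segment start, label)
    for t in range(1, T + 1):
        for dur in range(1, min(max_dur, t) + 1):
            s = t - dur
            if best[s] == -math.inf:
                continue
            for lab in labels:
                score = best[s] + seg_score(s, t, lab)
                if score > best[t]:
                    best[t] = score
                    back[t] = (s, lab)
    # Trace the best segmentation back from the final frame.
    segs, t = [], T
    while t > 0:
        s, lab = back[t]
        segs.append((s, t, lab))
        t = s
    return list(reversed(segs)), best[T]
```

The recursion is the usual segmental Viterbi one: the best score up to frame t is the best score up to some earlier boundary s plus the score of a single segment spanning (s, t), maximised over boundaries and labels.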