This paper proposes a new class loss function as an alternative to the standard sigmoid class loss function for optimizing the parameters of decoding graphs by discriminative training based on the minimum classification error (MCE) criterion. The standard sigmoid-based approach tends to ignore a significant number of training samples whose reference and corresponding competing hypotheses differ widely in score, which degrades parameter optimization. The proposed function overcomes this limitation by considering almost all of the training samples, and thus improves parameter optimization when tested on large decoding graphs. The decoding graph used in this research is an integrated network of weighted finite-state transducers. The primary task examined is a 64K-word continuous speech recognition task. The experimental results show that the proposed method outperformed baseline systems based on both maximum likelihood estimation (MLE) and sigmoid-based MCE, achieving a 28.9% reduction in word error rate (WER) when tested on the TIMIT speech database.
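The limitation described above can be seen in the derivative of the sigmoid class loss. Below is a minimal illustrative sketch (not the paper's implementation; the slope parameter `gamma` and the form of the misclassification measure `d` are assumptions) showing that the sample weight contributed by the sigmoid loss vanishes when the score difference between the reference and its competing hypotheses is large:

```python
import math

def sigmoid_loss(d, gamma=1.0):
    """Standard sigmoid class loss used in MCE training.

    d: misclassification measure (roughly, competing-hypothesis score
       minus reference-hypothesis score); gamma: assumed slope parameter.
    """
    return 1.0 / (1.0 + math.exp(-gamma * d))

def sigmoid_loss_grad(d, gamma=1.0):
    """Derivative of the sigmoid loss w.r.t. d.

    This value weights a sample's contribution to the parameter update:
    gamma * l * (1 - l), which peaks at d = 0 and decays to zero as
    |d| grows, so well-separated samples are effectively ignored.
    """
    l = sigmoid_loss(d, gamma)
    return gamma * l * (1.0 - l)

# Samples with a large score difference receive near-zero weight:
for d in (0.0, 2.0, 10.0, 50.0):
    print(f"d = {d:5.1f}  update weight = {sigmoid_loss_grad(d):.3e}")
```

Running the loop shows the update weight falling from its maximum at `d = 0` toward zero as `|d|` increases, which is the behavior the proposed alternative loss is designed to avoid.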