In this paper, we extend our recent data-sampling-based ensemble acoustic modeling technique to the speaker-independent TIMIT task and propose new methods to further improve the effectiveness of the ensemble acoustic models. We propose applying overlapped speaker clustering in data sampling to construct an ensemble of acoustic models for speaker-independent speech recognition. In addition, we evaluate data sampling in recurrent neural network (RNN) training for constructing an RNN-based frame classifier. We also investigate using cross-validation EM (CVEM) in place of EM in our ensemble acoustic model training. By applying these methods to the speaker-independent TIMIT phone recognition task, we obtained a 2.5% absolute gain in phone accuracy over a standard HMM baseline system.
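The abstract does not say how speakers are represented, how they are clustered, or how much the clusters overlap. The following is only a minimal sketch of what overlapped speaker clustering for data sampling could look like, under assumed choices. Each speaker is summarized by a fixed-length vector (hypothetical). Speakers are grouped with plain k-means. Each speaker is then assigned to its `overlap` nearest centroids, so neighboring clusters share speakers. Each resulting cluster would supply the training data for one member of the acoustic model ensemble. The function names `kmeans` and `overlapped_speaker_clusters` are hypothetical and do not come from the paper.

```python
import random
from typing import Dict, List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two speaker vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int,
           iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain k-means (an assumed clustering choice); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster goes empty
                centroids[j] = [sum(dim) / len(g) for dim in zip(*g)]
    return centroids


def overlapped_speaker_clusters(speakers: Dict[str, Sequence[float]],
                                k: int, overlap: int,
                                seed: int = 0) -> List[List[str]]:
    """Assign every speaker to its `overlap` nearest centroids (hypothetical scheme).

    With overlap == 1 this is an ordinary partition; with overlap > 1 the
    clusters share speakers, so each ensemble member sees more data.
    """
    ids = sorted(speakers)
    centroids = kmeans([speakers[s] for s in ids], k, seed=seed)
    clusters: List[List[str]] = [[] for _ in range(k)]
    for s in ids:
        ranked = sorted(range(k), key=lambda c: _dist2(speakers[s], centroids[c]))
        for c in ranked[:overlap]:
            clusters[c].append(s)
    return clusters


if __name__ == "__main__":
    # Toy 2-D speaker vectors, purely illustrative.
    toy = {f"spk{i}": (float(i % 4), float(i // 4)) for i in range(12)}
    for idx, members in enumerate(overlapped_speaker_clusters(toy, k=4, overlap=2)):
        # In an ensemble system, one acoustic model would be trained per cluster.
        print(idx, members)
```

In this sketch, raising `overlap` trades diversity between ensemble members for more training data per member. How the members' outputs are combined at recognition time is outside its scope.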