Says Who? Deep Learning Models for Joint Speech Recognition, Segmentation and Diarization


Abstract:

The field of speech recognition has seen tremendous advances in recent years owing to the development of powerful deep learning architectures. However, the closely related fields of speech segmentation and diarization are still primarily dominated by sophisticated variants of hierarchical clustering algorithms. We propose a powerful adaptation of state-of-the-art speech recognition models for these tasks and demonstrate the effectiveness of our techniques on standard datasets. Our architectures combine Bidirectional Long Short-Term Memory (LSTM) networks, convolutional networks, and fully connected networks, trained by gradient descent to minimize the Cross Entropy and Connectionist Temporal Classification (CTC) losses. We adapt the LibriSpeech corpus for the tasks of segmentation and diarization. We obtain results comparable to the state of the art on both tasks.
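As a rough illustration only (not the authors' implementation, which the abstract does not detail), the sketch below shows one way a convolutional + bidirectional LSTM + fully connected stack can be trained with a CTC loss in PyTorch. All layer sizes, the feature dimension, and the class count are hypothetical placeholders.

import torch
import torch.nn as nn

class ConvBiLSTMCTC(nn.Module):
    """Hypothetical Conv + BiLSTM + FC model in the spirit of the abstract."""
    def __init__(self, n_features=40, n_classes=29, hidden=256):
        super().__init__()
        # 1-D convolution over the time axis of the input features
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Bidirectional LSTM over the convolutional features
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # Fully connected layer producing per-frame class logits
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                       # x: (batch, time, n_features)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x, _ = self.lstm(x)
        return self.fc(x)                       # (batch, time, n_classes)

model = ConvBiLSTMCTC()
ctc = nn.CTCLoss(blank=0)
feats = torch.randn(8, 200, 40)                 # dummy feature batch
logits = model(feats).log_softmax(-1)           # (8, 200, 29)
targets = torch.randint(1, 29, (8, 30))         # dummy label sequences
in_lens = torch.full((8,), 200, dtype=torch.long)
tgt_lens = torch.full((8,), 30, dtype=torch.long)
# CTCLoss expects (time, batch, classes) log-probabilities
loss = ctc(logits.transpose(0, 1), targets, in_lens, tgt_lens)
loss.backward()                                  # a gradient-descent step would follow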
Date of Conference: 15-20 April 2018
Date Added to IEEE Xplore: 13 September 2018
Electronic ISSN: 2379-190X
Conference Location: Calgary, AB, Canada
