Abstract:
The field of speech recognition has seen tremendous advances in the recent past owing to the development of powerful deep learning architectures. However, the closely related fields of speech segmentation and diarization are still primarily dominated by sophisticated variants of hierarchical clustering algorithms. We propose a powerful adaptation of state-of-the-art speech recognition models for these tasks and demonstrate the effectiveness of our techniques on standard datasets. Our architectures are a combination of Bidirectional Long Short-Term Memory (LSTM) networks, convolutional networks, and fully connected networks, trained by gradient descent to minimize the cross-entropy and Connectionist Temporal Classification (CTC) losses. We adapt the LibriSpeech corpus for the tasks of segmentation and diarization. We obtain results comparable to the state of the art in both tasks.
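As general background on the CTC objective mentioned above, the following minimal sketch shows the standard CTC many-to-one mapping: a per-frame label path is turned into an output sequence by merging consecutive repeats and then dropping blanks. The abstract does not describe the paper's decoding procedure or label set, so the function names, the blank index, and the greedy per-frame choice are all illustrative assumptions rather than the authors' method.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def greedy_path(frame_scores):
    """Pick the highest-scoring label index at each frame (best-path choice)."""
    return [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC mapping: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    # Hypothetical per-frame scores over three labels: blank, label 1, label 2.
    scores = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.8, 0.1],
        [0.6, 0.3, 0.1],
        [0.2, 0.1, 0.7],
    ]
    print(ctc_collapse(greedy_path(scores)))
```

A blank between two identical labels keeps them as separate outputs, which is what lets a CTC-trained model emit the same label twice in a row.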
Published in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 15-20 April 2018
Date Added to IEEE Xplore: 13 September 2018
ISBN Information:
Electronic ISSN: 2379-190X