Nasal Speech Sounds Detection Using Connectionist Temporal Classification | IEEE Conference Publication | IEEE Xplore

Abstract:

Phone attributes, also known as distinctive or phonological features, are an important classification of speech sounds used in automatic speech processing. Training conventional phone attribute detectors (classifiers), whether based on acoustic measurements or on deep learning, requires reasonably accurate phone boundary segmentation. This paper proposes a way to train a phone attribute detector without phone alignment, using end-to-end phone attribute modeling based on connectionist temporal classification. Experiments on the nasal phone attribute, performed on the LibriSpeech database, confirm that the proposed system outperforms a conventional deep neural network detector, even when both are trained on the same data. Further improvements are observed with more training data. The conventional complex system, which consists of feature extraction, forced phone alignment and deep neural network training, is replaced by a simpler Python package based on PyTorch, released as open source.
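
To illustrate why connectionist temporal classification (CTC) removes the need for phone alignment, the following is a minimal pure-Python sketch of the CTC objective. It is not the authors' PyTorch package. The three-symbol label inventory (blank, nasal, non-nasal) and the function names are assumptions made for this illustration. The loss sums the probability of every frame-level path that collapses to the target attribute sequence, so only the sequence of labels is needed, not their time boundaries.

```python
import math

# Hypothetical label inventory for a nasal attribute detector (an assumption).
BLANK, NASAL, NON_NASAL = 0, 1, 2


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=BLANK):
    """CTC loss via the forward algorithm.

    log_probs: list of T frames, each a list of per-class log-probabilities.
    target:    attribute label sequence without blanks, e.g. [NASAL, NON_NASAL].
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])          # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])          # skip the blank in between
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end in the last label or the trailing blank.
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


# Two frames with uniform posteriors over the three symbols.
uniform = [[math.log(1 / 3)] * 3 for _ in range(2)]
loss = ctc_neg_log_likelihood(uniform, [NASAL])
print(loss)  # log(3): 3 of the 9 equally likely paths collapse to [NASAL]
```

In a PyTorch setting the same quantity would typically be computed with a built-in CTC loss over the network's frame-level outputs; the hand-written version here only makes the marginalisation over alignments explicit.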
Date of Conference: 15-20 April 2018
Date Added to IEEE Xplore: 13 September 2018
ISBN Information:
Electronic ISSN: 2379-190X
Conference Location: Calgary, AB, Canada
