Abstract:
Automatic Speech Recognition (ASR) technologies can be life-changing for individuals who suffer from dysarthria, a speech impairment that affects the articulatory muscles and results in incomprehensible speech. Nevertheless, the performance of current dysarthric ASR systems is unsatisfactory, especially for speakers with severe dysarthria, who would benefit most from this technology. While transformer and neural attention-based sequence-to-sequence ASR systems have achieved state-of-the-art results in converting healthy speech to text, their application to dysarthric ASR remains unexplored due to the complexities of dysarthric speech and the lack of extensive training data. In this study, we address this gap and propose our Dysarthric Speech Transformer, which uses a customized deep transformer architecture. To deal with the data scarcity problem, we designed a two-phase transfer learning pipeline to leverage healthy speech, investigated neural freezing configurations, and utilized audio data augmentation. Overall, we trained 45 speaker-adaptive dysarthric ASR systems in our investigations. The results indicate the effectiveness of the transfer learning pipeline and data augmentation, and emphasize the significance of deeper transformer architectures. The proposed ASR outperformed the state of the art and delivered better accuracy for 73% of the dysarthric subjects whose speech samples were used in this study, with improvements of up to 23%.
Published in: IEEE Transactions on Neural Systems and Rehabilitation Engineering ( Volume: 31)
- IEEE Keywords
- Index Terms
- Speech Recognition
- Speech Recognition Systems
- Training Data
- Data Augmentation
- Transfer Learning
- Deep Architecture
- Speech Disorders
- Audio Data
- Speech Samples
- Transformer Architecture
- Automatic Speech Recognition System
- Set Of Results
- Long Short-term Memory
- Recurrent Neural Network
- Feed-forward Network
- Language Model
- Speech Intelligibility
- Level Of Intelligence
- Synaptic Weights
- Encoder Module
- Word Error Rate
- Higher Intelligence
- Common Words
- Normal Speech
- Encoder Architecture
- Speech Data
- Vocabulary Size
- Hidden Representation
- Speech Utterances