Abstract:
Amyotrophic lateral sclerosis (ALS) patients experience progressive speech deterioration due to muscle paralysis, leading to eventual loss of verbal communication capability. Text-to-speech synthesis (TTS) is an important technology for speech-generating devices, enabling users to communicate using generic electronic voices, but often without the users' own vocal identity. Our work aims to personalize TTS voices for people with ALS-induced dysarthria by integrating the machine learning and speech processing techniques of voice conversion (VC) and TTS. This is challenging because only small quantities of dysarthric speech are available from individual patients. Our system includes both timbre and prosody conversion for VC, a neural TTS model to generate the TTS speech, and a neural feature converter to interface VC and TTS. We collected speech data from four ALS target speakers with mild to severe dysarthria. Subjective listening tests showed that, on average, our approach improved speech intelligibility by about 72% over the target speakers’ speech, the converted voice was 2 to 3 times more similar to the ALS targets than to the TTS sources, and the converted speech quality fell in the fair-to-good range of the mean opinion score (MOS) scale.
Date of Conference: 27-30 July 2021
Date Added to IEEE Xplore: 10 August 2021