This work describes experiments on using noisy adaptation data to create personalized voices with HMM-based speech synthesis. We investigate how environmental noise affects feature extraction and CSMAPLR and EMLLR adaptation. We also examine the effects of regression trees and data quantity, and we test noise-robust feature streams for alignment and NMF-based source separation as preprocessing. Adaptation performance is evaluated using a listening test developed for noisy synthesized speech. The evaluation shows that a speaker-adaptive HMM-TTS system is robust to moderate environmental noise.