Abstract:
This paper presents a voice conversion framework that uses phonetic information in an exemplar-based voice conversion approach. The proposed idea is motivated by the fact...Show MoreMetadata
Abstract:
This paper presents a voice conversion framework that uses phonetic information in an exemplar-based voice conversion approach. The proposed idea is motivated by the fact that phone-dependent exemplars lead to better estimation of activation matrix, therefore, possibly better conversion. We propose to use the phone segmentation results from automatic speech recognition (ASR) to construct a sub-dictionary for each phone. The proposed framework can work with or without parallel training data. With parallel training data, we found that phonetic sub-dictionary outperforms the state-of-the-art baseline in objective and subjective evaluations. Without parallel training data, we use Phonetic PosteriorGrams (PPGs) as the speaker-independent exemplars in the phonetic sub-dictionary to serve as a bridge between speakers. We report that such technique achieves a competitive performance without the need of parallel training data.
Date of Conference: 16-20 December 2017
Date Added to IEEE Xplore: 25 January 2018
ISBN Information: