ASGAN-VC: One-Shot Voice Conversion with Additional Style Embedding and Generative Adversarial Networks


Abstract:

In this paper, we present a voice conversion (VC) system that significantly improves both the quality of the generated voice and its similarity to the target voice style. Many VC systems use feature-disentanglement-based learning to separate a speaker's voice characteristics from the linguistic content so that an utterance can be rendered in another style, and we take the same approach. To prevent speaker-style information from leaking into the content embedding, some previous works quantize the embedding or reduce its dimension. However, imperfect disentanglement degrades the quality and similarity of the converted speech. To further improve quality and similarity, we propose a novel style transfer method within an autoencoder-based VC system trained with generative adversarial training. In an objective evaluation using a fair third-party speaker verification system, ASGAN-VC outperforms VQVC+ and AGAINVC in terms of speaker similarity. In a subjective evaluation, our proposal also outperformed VQVC+ and AGAINVC in terms of naturalness and speaker similarity.
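
The abstract does not say how the third-party speaker verification system scores a conversion. As a rough illustration only, the sketch below assumes a common setup: each utterance is mapped to a fixed-length speaker embedding, the converted utterance is compared with the target speaker by cosine similarity, and a conversion counts as accepted when the score reaches a threshold. The function names, the threshold value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def acceptance_rate(pairs, threshold):
    """Fraction of (converted, target) embedding pairs scoring at or above threshold."""
    if not pairs:
        raise ValueError("no pairs to score")
    accepted = sum(
        1 for converted, target in pairs
        if cosine_similarity(converted, target) >= threshold
    )
    return accepted / len(pairs)

# Toy 3-dimensional embeddings, made up for illustration (assumption:
# a real verification model would produce much higher-dimensional vectors).
pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # same direction as the target
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal to the target
]
rate = acceptance_rate(pairs, threshold=0.5)  # threshold is a hypothetical value
```

Under these assumptions, a higher acceptance rate for one VC system than for another would indicate that its converted speech is more often judged to match the target speaker.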
Date of Conference: 07-10 November 2022
Date Added to IEEE Xplore: 21 December 2022
Conference Location: Chiang Mai, Thailand
