Abstract:
Voice conversion (VC) is the process of transforming a source speaker’s vocal style or vocal features into those of a target speaker while keeping the linguistic information of the source speech unchanged. Deep learning algorithms have become widely used in VC in recent years. Advances in deep learning, especially in deep generative models, have enabled VC applications such as automatic movie dubbing, singing voice conversion, and military applications. Despite the immense improvements in deep learning based VC models, objective analysis of the synthesized speech samples remains a major challenge. Previous studies in this domain have primarily focused on the development of VC models, leaving scope for an analysis of the objective evaluation metrics used to assess their performance. This work therefore explores in detail the VC process, the speech features used in VC, and the objective evaluation metrics of VC models.
Date of Conference: 24-26 November 2022
Date Added to IEEE Xplore: 16 February 2023