Non-parallel voice conversion using joint optimization of alignment by temporal context and spectral distortion | IEEE Conference Publication | IEEE Xplore



Abstract:

Many voice conversion systems require parallel training sets from the source and target speakers. Non-parallel training is more complicated, because the source-target correspondence must be estimated along with the conversion function itself. INCA is a recently proposed method for non-parallel training, based on iterative estimation of the alignment and the conversion function. The alignment is evaluated using a simple nearest-neighbor search, which often leads to phonetically mismatched source-target pairs. We propose here a generalized approach, denoted Temporal-Context INCA (TC-INCA), based on matching temporal context vectors. We formulate the training stage as the minimization of a joint cost that considers both the context-based alignment and the conversion function. We show that TC-INCA reduces the joint cost and prove its convergence. Experimental results indicate that TC-INCA significantly improves alignment accuracy compared to INCA. Moreover, subjective evaluations show that TC-INCA improves the quality of the synthesized output signals when small training sets are used.
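
The iterative scheme described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the conversion function, the context width, or the exact joint cost, so the code below assumes a plain affine conversion fitted by least squares, edge padding at utterance boundaries, a one-directional nearest-neighbor search, and hypothetical names (`stack_context`, `align_and_convert`, `half_width`). It only shows the general idea of alternating between context-based alignment and re-estimation of the conversion.

```python
import numpy as np


def stack_context(frames, half_width):
    """Concatenate each frame with its temporal neighbors.

    (T, D) -> (T, (2 * half_width + 1) * D). Edge frames are repeated at
    the boundaries (an assumption made for this sketch).
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((half_width, half_width), (0, 0)), mode="edge")
    return np.hstack([padded[k:k + n_frames] for k in range(2 * half_width + 1)])


def nearest_neighbors(queries, candidates):
    """For every query row, index of the closest candidate row (squared Euclidean)."""
    d2 = (
        np.sum(queries ** 2, axis=1)[:, None]
        - 2.0 * queries @ candidates.T
        + np.sum(candidates ** 2, axis=1)[None, :]
    )
    return np.argmin(d2, axis=1)


def fit_affine(x, y):
    """Least-squares W such that [x, 1] @ W approximates y."""
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    w, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return w


def apply_affine(x, w):
    """Apply the affine map fitted by fit_affine."""
    return np.hstack([x, np.ones((x.shape[0], 1))]) @ w


def align_and_convert(source, target, half_width=1, n_iter=10):
    """Alternate between context-based alignment and conversion re-estimation.

    source: (Ts, D) source-speaker feature frames
    target: (Tt, D) target-speaker feature frames
    Returns the affine map, the final alignment indices, and the mean
    squared conversion error recorded at each iteration.
    """
    converted = source.copy()
    target_ctx = stack_context(target, half_width)
    errors = []
    for _ in range(n_iter):
        # Alignment step: match context vectors of the currently converted
        # source frames against target context vectors.
        idx = nearest_neighbors(stack_context(converted, half_width), target_ctx)
        # Conversion step: refit the map from the original source frames
        # to their aligned target frames.
        w = fit_affine(source, target[idx])
        converted = apply_affine(source, w)
        errors.append(float(np.mean(np.sum((converted - target[idx]) ** 2, axis=1))))
    return w, idx, errors
```

With `half_width=0` the context vector is the single frame itself, so the alignment step reduces to a frame-wise nearest-neighbor search of the kind the abstract attributes to INCA. The recorded error is only the conversion term of this simplified sketch, not the joint cost analyzed in the paper.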
Date of Conference: 04-09 May 2014
Date Added to IEEE Xplore: 14 July 2014
Electronic ISBN: 978-1-4799-2893-4

Conference Location: Florence, Italy
