By Topic

IBM voice conversion systems for 2007 TC-STAR evaluation

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

The purchase and pricing options are temporarily unavailable. Please try again later.
3 Author(s)
Shuang, Zhiwei ; IBM China Research Lab, Beijing 100084, China ; Bakis, Raimo ; Qin, Yong

This paper proposes a novel voice conversion method by frequency warping. The frequency warping function is generated based on mapping formants of the source speaker and the target speaker. In addition to frequency warping, fundamental frequency adjustment, spectral envelope equalization, breathiness addition, and duration modification are also used to improve the similarity to the target speaker. The proposed voice conversion method needs only a very small amount of training data for generating the warping function, thereby greatly facilitating its application. Systems based on the proposed method were used for the 2007 TC-STAR intra-lingual voice conversion evaluation for English and Spanish and a cross-lingual voice conversion evaluation for Spanish. The evaluation results show that the proposed method can achieve a much better quality of converted speech than other methods as well as a good balance between quality and similarity. The IBM1 system was ranked No.1 for English evaluation and No.2 for Spanish evaluation. Evaluation results also show that the proposed method is a convenient and competitive method for crosslingual voice conversion tasks.

Published in:

Tsinghua Science and Technology  (Volume:13 ,  Issue: 4 )