Dialect generation is one of the most important aspects of Chinese speech synthesis. By converting prosodic features, high-quality dialect speech can be synthesized. First, a Lanzhou dialect corpus was built from the "word-list in dialectal survey" for generating Lanzhou dialect speech; the corpus consists of contrastive recordings (Lanzhou dialect vs. Mandarin). A pitch target model is introduced and optimized to describe the feature parameters of the Mandarin and Lanzhou dialect speech in the training set of the corpus. Second, because a Gaussian mixture model (GMM) can map the subtle prosody distributions between Mandarin and Lanzhou dialect speech, we train the GMM conversion parameters on the training set and use them to obtain converted F0 contours for Lanzhou dialect speech. From the converted Lanzhou dialect F0 contours, we generate high-quality Lanzhou dialect speech with the STRAIGHT algorithm. Subjective experiments show that the generated speech achieves an average mean opinion score (MOS) of 4.06.
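As an illustration of the GMM mapping step, the following is a minimal sketch of the widely used joint-density GMM regression form, applied to a single scalar prosodic parameter. The abstract does not state the exact conversion function, so the form itself, the function names (`gaussian_pdf`, `gmm_convert`), the dictionary layout, and all numeric values are assumptions made for illustration only; they are not parameters or results from the paper. Here `x` stands for a source (for example, Mandarin) parameter and the return value for the corresponding converted (for example, Lanzhou dialect) parameter.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map a source parameter x to a target parameter using a joint-density GMM.

    Each component is a dict with hypothetical keys:
      weight  - mixture weight
      mean_x  - mean of the source parameter
      mean_y  - mean of the target parameter
      var_x   - variance of the source parameter
      cov_yx  - covariance between target and source parameters
    The result is the posterior-weighted sum of per-component linear regressions.
    """
    # Unnormalized posterior of each component given x.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total
        # Conditional mean of the target given x under this component.
        local = c["mean_y"] + c["cov_yx"] / c["var_x"] * (x - c["mean_x"])
        y += posterior * local
    return y


# Hypothetical two-component model over a log-F0-like parameter (made-up values).
example_components = [
    {"weight": 0.6, "mean_x": 5.0, "mean_y": 5.3, "var_x": 0.04, "cov_yx": 0.03},
    {"weight": 0.4, "mean_x": 5.5, "mean_y": 5.4, "var_x": 0.09, "cov_yx": 0.02},
]

if __name__ == "__main__":
    for source_value in (4.9, 5.2, 5.6):
        print(source_value, gmm_convert(source_value, example_components))
```

In practice the component parameters would be estimated on paired source and target parameters from the training set, and the converted parameters would then be used to rebuild the F0 contour before resynthesis.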