Abstract:
When people hear their recorded voice for the first time, they are often surprised, and perhaps disappointed, by the difference in voice quality between the recording and the voice they hear while speaking. Conversion from a speaker's recorded voice to his or her own perceived voice has been investigated in previous studies, and in the current study, we propose a novel framework for this conversion. Four new ideas are introduced, some of which are tested experimentally: a) multiple pathways of in-body voice transmission from the oral cavity to the inner ear are taken into account during recording, b) body-conducted speech, as distinct from bone-conducted speech, is defined and simulated, c) a special device is prepared to avoid habituation effects in listening tests, and d) a network-based voice conversion technique is applied to generate one's own perceived voice from a recording, using a parallel corpus developed with the above three ideas. Experiments show that the proposed framework generates one's own voice with higher quality than a conventional method, even in cross-language settings. Interestingly, body-conducted speech plays an unexpectedly large role in simulating one's own voice, larger than that of air-conducted speech.
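The abstract does not detail how body-conducted speech is simulated. As an illustration only, a common simplification is that in-body transmission attenuates high frequencies, so the perceived own voice can be roughly approximated by mixing the air-conducted recording with a low-pass-filtered copy of itself. The one-pole filter, the 1 kHz cutoff, and the mixing weight `alpha` below are all assumptions for the sketch, not the paper's method.

```python
import math

def lowpass(signal, cutoff_hz, fs):
    """First-order IIR low-pass filter: a crude stand-in for the body's
    attenuation of high frequencies during in-body transmission."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)
    out, y = [], 0.0
    for x in signal:
        y = y + a * (x - y)   # y[n] = y[n-1] + a * (x[n] - y[n-1])
        out.append(y)
    return out

def simulate_own_voice(air, fs=16000, cutoff_hz=1000.0, alpha=0.5):
    """Mix the air-conducted signal with its low-pass-filtered
    ("body-conducted") version; alpha weights the body-conduction path.
    Both the structure and the parameter values are illustrative."""
    body = lowpass(air, cutoff_hz, fs)
    return [(1.0 - alpha) * x + alpha * b for x, b in zip(air, body)]
```

With this sketch, raising `alpha` emphasizes the body-conducted component, which the paper's experiments suggest matters more than one might expect.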
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Date of Conference: 14-17 December 2021
Date Added to IEEE Xplore: 03 February 2022
Conference Location: Tokyo, Japan