1. INTRODUCTION
The Future Cast System (FCS) [1] is the world's first entertainment system that enables anyone to easily participate in a prerecorded movie as an instant CG movie star. FCS automatically performs every process, from capturing a participant's facial characteristics with a 3D range scanner to rendering them into the movie. The system also assigns each participant a suitable role in the story, and each character speaks and performs in a fully CG-based movie as vividly as a real actor. However, the prerecorded voice of an actor or actress is used as a substitute for each participant's own voice. This substitute voice is selected using only the participant's gender, which is estimated from the scanned face shape, without considering other information such as age or voice quality. As a result, some participants, and viewers who know them, perceive a mismatch between the character's voice and the participant's actual voice. Recording every participant's speech in advance is impossible, and present voice conversion technology cannot convert voice quality sufficiently. We therefore focus on reducing this mismatch by selecting a similar speaker from a speech database, and we propose a method for measuring the perceptual similarity of speech for this purpose.