Perceptual similarity measurement of speech by combination of acoustic features

Abstract:

The Future cast system is a new entertainment system in which a participant's face is captured and rendered into a movie as an instant Computer Graphics (CG) movie star. It was first exhibited at the 2005 World Exposition in Aichi, Japan. We are working to add new functionality that maps not only participants' faces but also their speech individualities onto the cast. Our approach is to find the speaker with the closest speech individuality and then apply voice conversion. This paper investigates acoustic features for estimating the perceptual similarity of speech individuality. We propose a method that linearly combines eight acoustic features related to the perception of speech individuality, with the feature weights optimized against perceptual similarities. We evaluated the method using Spearman's rank correlation coefficient between its estimates and the perceptual similarities. The experiments show that the proposed method achieves a correlation coefficient of 0.66.
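The combination-and-evaluation pipeline in the abstract can be illustrated with the sketch below: a weighted linear combination of per-pair feature scores, with weights fitted to perceptual similarity, scored by Spearman's rank correlation. This is a minimal sketch under stated assumptions, not the authors' implementation. The eight acoustic features are not named here, so each row is just a list of per-pair feature scores. The weights are fitted by ordinary least squares with an intercept, a stand-in for the paper's unspecified optimization. The names `combine`, `fit_weights`, `average_ranks`, and `spearman` are hypothetical.

```python
from typing import List, Sequence


def combine(features: Sequence[float], weights: Sequence[float]) -> float:
    """Predicted similarity: intercept weights[0] plus a weighted sum of feature scores."""
    if len(weights) != len(features) + 1:
        raise ValueError("need one weight per feature plus an intercept")
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def fit_weights(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Least-squares weights (intercept first) fitted to perceptual similarity scores.

    Assumption: ordinary least squares via the normal equations, standing in
    for the paper's weight optimization, whose criterion is not given here.
    """
    if not rows or len(rows) != len(targets):
        raise ValueError("need one perceptual score per (non-empty) feature row")
    # Design matrix with a leading column of ones for the intercept.
    x = [[1.0, *map(float, r)] for r in rows]
    n = len(x[0])
    # Augmented normal-equation system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in x) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(x, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few pairs or collinear features")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

Given per-pair feature rows and listener similarity scores, `w = fit_weights(rows, scores)` followed by `spearman([combine(r, w) for r in rows], scores)` yields the kind of rank correlation the abstract reports. Least squares is chosen here only because it has a closed form. Scoring on the same pairs used for fitting gives an optimistic figure, so held-out pairs are the safer check.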

Published in: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing

Date of Conference: 31 March 2008 - 04 April 2008

Date Added to IEEE Xplore: 12 May 2008

DOI: 10.1109/ICASSP.2008.4518746

Conference Location: Las Vegas, NV, USA

1. INTRODUCTION

The Future cast system (FCS) [1] is the world's first entertainment system that enables anyone to easily appear in a prerecorded movie as an instant CG movie star. FCS automatically performs every step, from capturing a participant's facial characteristics with a 3D range scanner to rendering them into the movie. The system also assigns each participant a suitable role in the story, and each actor then speaks and performs in a fully CG movie as vividly as a real actor. However, the prerecorded voice of an actor or actress is substituted for each participant's voice. The substitute voice is selected based only on the participant's gender, which is estimated from the scanned face shape, without considering other information such as age or voice quality. This causes a mismatch for viewers who perceive the character's voice as different from their own voice or from the voices of people they know. We therefore focus on selecting the most similar speaker from a speech database to reduce this mismatch. We propose a method to measure the perceptual similarity of speech because it is impossible to record all of each participant's speech in advance, and present voice conversion technology cannot convert voice quality sufficiently.
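The speaker-selection step described above, which picks the database speaker whose voice is predicted to sound most like the participant's, could look like the minimal sketch below. It assumes that per-feature scores between the participant and every database speaker have already been computed. It also assumes that the combined score is an intercept plus a weighted sum, with weights obtained from some earlier fitting step. The speaker IDs and the functions `predicted_similarity` and `select_speaker` are hypothetical and do not come from FCS.

```python
from typing import Dict, Sequence, Tuple


def predicted_similarity(features: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical scorer: intercept weights[0] plus a weighted sum of feature scores."""
    if len(weights) != len(features) + 1:
        raise ValueError("need one weight per feature plus an intercept")
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


def select_speaker(
    candidates: Dict[str, Sequence[float]], weights: Sequence[float]
) -> Tuple[str, float]:
    """Return (speaker_id, score) for the candidate predicted most similar.

    `candidates` maps each database speaker to the acoustic feature scores
    computed between that speaker and the participant.
    """
    if not candidates:
        raise ValueError("speech database is empty")
    best = max(candidates, key=lambda s: predicted_similarity(candidates[s], weights))
    return best, predicted_similarity(candidates[best], weights)
```

For example, `select_speaker({"A": [0.2, 0.9], "B": [0.8, 0.1]}, [0.0, 1.0, 0.5])` scores A at 0.65 and B at 0.85, so it picks B. Following the abstract, the chosen speaker's prerecorded voice would then be the starting point for voice conversion.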
