Summary form only given. Speaker-adaptation techniques such as the MLLR method yield sufficient improvement only when an adequate number of samples of the user's voice is available, which makes them difficult to apply in practical environments. To ensure sufficiently high speech recognition performance from the moment a user starts using the system, a library of highly precise acoustic models must be developed in advance. Designing an efficient acoustic model library requires analyzing the target acoustic space, but analyzing a multidimensional acoustic space is generally difficult. To support this analysis through human visual perception, we proposed the COSMOS (COmprehensive Space Map of Objective Signal, previously aCOustic Space Map Of Sound) method. It visualizes an aggregate of acoustic models based on stochastic models, such as HMMs and GMMs, as a two-dimensional map (called a COSMOS map) by means of a nonlinear projection technique from statistical multidimensional scaling. The paper first formulates the COSMOS method and then describes a quantitative analysis of a speaking-style COSMOS map. It also investigates the error introduced by mapping from the multidimensional space to the two-dimensional space of the COSMOS map. Finally, the results suggest that the COSMOS map contains multiple radial axes of acoustic feature continuity.
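The kind of projection described above can be illustrated with a minimal sketch. The abstract does not name the inter-model distance or the exact projection algorithm, so the choices here are assumptions made purely for illustration: each acoustic model is reduced to a single diagonal-covariance Gaussian, the Bhattacharyya distance is used between models, and a plain gradient-descent Sammon mapping stands in for the nonlinear multidimensional scaling step. All function names and the random toy data are hypothetical. The Sammon stress gives one possible measure of the error introduced by the mapping to two dimensions.

```python
import numpy as np


def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    var = 0.5 * (var1 + var2)
    term_mean = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    term_cov = 0.5 * np.sum(np.log(var / np.sqrt(var1 * var2)))
    return term_mean + term_cov


def distance_matrix(means, variances):
    """Symmetric matrix of pairwise distances between all models."""
    n = len(means)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = bhattacharyya(
                means[i], variances[i], means[j], variances[j]
            )
    return d


def sammon_stress(d_high, points):
    """Sammon stress: weighted mismatch between original and map distances."""
    iu = np.triu_indices(len(points), k=1)
    d_low = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    dh, dl = d_high[iu], d_low[iu]
    return np.sum((dh - dl) ** 2 / dh) / np.sum(dh)


def sammon_map(d_high, n_iter=1000, lr=0.1, seed=0):
    """Project onto 2-D by gradient descent on the (unnormalized) stress.

    Assumes all off-diagonal entries of d_high are strictly positive,
    i.e. no two models are identical.
    """
    n = d_high.shape[0]
    eye = np.eye(n)
    rng = np.random.default_rng(seed)
    y = rng.normal(scale=1e-2, size=(n, 2))  # small random initial layout
    safe_dh = d_high + eye                   # 1 on the diagonal avoids 0/0
    # Step size scaled by a rough curvature bound so it adapts to the
    # overall scale of the input distances.
    step = lr / np.max(np.sum((1.0 / safe_dh) * (1.0 - eye), axis=1))
    for _ in range(n_iter):
        diff = y[:, None, :] - y[None, :, :]                 # shape (n, n, 2)
        d_low = np.linalg.norm(diff, axis=-1) + eye + 1e-12  # guard zeros
        w = (safe_dh - d_low) / (safe_dh * d_low)            # ~0 on diagonal
        grad = -2.0 * np.sum(w[:, :, None] * diff, axis=1)
        y -= step * grad
    return y


# Toy data: six hypothetical models in a four-dimensional feature space.
rng = np.random.default_rng(1)
means = rng.normal(scale=2.0, size=(6, 4))
variances = rng.uniform(0.5, 1.5, size=(6, 4))

d_high = distance_matrix(means, variances)
coords = sammon_map(d_high)
print("2-D coordinates:\n", coords)
print("Sammon stress:", sammon_stress(d_high, coords))
```

Each row of `coords` is one model's position on the two-dimensional map. In a real setting the distance would have to be defined over full HMMs or GMMs rather than single Gaussians, and the step size and iteration count would likely need tuning.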
Published in:
Nonlinear Signal and Image Processing, 2005. NSIP 2005. Abstracts. IEEE-Eurasip
Date of Conference: 18-20 May 2005