Spectral Dynamics as a Source of Discontinuity in Concatenative Speech Synthesis
Kirkpatrick, B.; O'Brien, D.; Scaife, R.; Errity, A.
Digital Signal Processing, 2007 15th International Conference on
1-4 July 2007, Page(s): 615-618
Digital Object Identifier 10.1109/ICDSP.2007.4288657
Summary: The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity have proven difficult to define, and standard measures do not accurately reflect human perception of spectral discontinuity in concatenated speech. Previous studies on spectral join costs have focused predominantly on static spectral measures extracted at the unit boundary. In this paper, spectral dynamic behaviour is investigated as a source of discontinuity in concatenated speech. A number of measures representing spectral dynamics are tested on the task of detecting discontinuities. The spectral dynamic measures tested contain information that correlates with human perception of discontinuities, suggesting that spectral dynamics are a source of discontinuity in concatenated speech. A strategy for effectively combining dynamic and static measures using principal component analysis (PCA) is proposed.
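To make the static/dynamic distinction concrete, the sketch below computes one static and one dynamic distance at a join and then combines a set of such measures with PCA. It is an illustration under stated assumptions, not the authors' method: the frame representation (any per-frame spectral vector, e.g. cepstral coefficients), the use of first-order frame differences as "spectral dynamics", Euclidean distance, and projection onto the first principal component are all choices made here for illustration. The function names `join_measures` and `pca_combine` are hypothetical.

```python
import numpy as np

def delta(frames: np.ndarray) -> np.ndarray:
    """First-order frame difference along time; frames has shape (T, D)."""
    return np.diff(frames, axis=0)

def join_measures(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Return [static, dynamic] distances for one join (assumed measures).

    left:  spectral frames of the unit before the join, shape (T1, D), T1 >= 2
    right: spectral frames of the unit after the join,  shape (T2, D), T2 >= 2
    """
    # Static measure: distance between the two boundary frames.
    static = np.linalg.norm(left[-1] - right[0])
    # Dynamic measure (assumption): mismatch in spectral trajectory, i.e. the
    # last frame difference of the left unit vs the first of the right unit.
    dynamic = np.linalg.norm(delta(left)[-1] - delta(right)[0])
    return np.array([static, dynamic])

def pca_combine(measures: np.ndarray) -> np.ndarray:
    """Combine measures (N joins x M measures) into one score per join.

    Standardizes each measure (each column must have nonzero variance) and
    projects onto the first principal component of the standardized data.
    """
    z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
    cov = np.cov(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    w = eigvecs[:, -1]                # component with the largest variance
    if w.sum() < 0:                   # sign heuristic: larger distances -> larger score
        w = -w
    return z @ w
```

For two boundary-continuous units on a steady linear spectral ramp, the static distance is nonzero while the dynamic distance is zero, which shows how the two measures can disagree. Standardizing before PCA keeps measures with large numeric ranges from dominating the combined score.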