Spectral Dynamics as a Source of Discontinuity in Concatenative Speech Synthesis
Kirkpatrick, B.
O'Brien, D.
Scaife, R.
Errity, A.
Dublin City Univ., Dublin
This paper appears in: Digital Signal Processing, 2007 15th International Conference on
Publication Date: 1-4 July 2007
On page(s): 615-618
Location: Cardiff,
ISBN: 1-4244-0882-2
INSPEC Accession Number: 9855707
Digital Object Identifier: 10.1109/ICDSP.2007.4288657
Current Version Published: 2007-08-13
Abstract
The quality of concatenative speech synthesis depends on the cost function employed for unit selection. Effective cost functions for spectral continuity have proven difficult to define, and standard measures do not accurately reflect human perception of spectral discontinuity in concatenated speech. Previous studies on spectral join costs have focused predominantly on static spectral measures extracted from the unit boundary. In this paper, spectral dynamic behaviour is investigated as a source of discontinuity in concatenated speech. A number of measures representing spectral dynamics are tested for the task of detecting discontinuities. The spectral dynamic measures tested contain information that correlates with human perception of discontinuities, suggesting that spectral dynamics are a source of discontinuity in concatenated speech. A strategy for effectively combining dynamic and static measures using principal component analysis (PCA) is proposed.
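As a rough illustration of the idea in the abstract, the sketch below computes one static and one dynamic measure at a join and combines them with PCA. It is not the paper's method: the abstract does not specify the measures used, so the function names (static_distance, dynamic_distance, pca_scores), the slope-based dynamic measure, the window length k, and the random 12-dimensional feature frames are all assumptions made for illustration only.

```python
import numpy as np

def static_distance(left, right):
    """Euclidean distance between the two frames adjacent to the join.
    (Assumed static measure; rows are frames, columns are spectral features.)"""
    return float(np.linalg.norm(left[-1] - right[0]))

def dynamic_distance(left, right, k=3):
    """Distance between the mean frame-to-frame slopes over the k frames
    on each side of the join. (Assumed dynamic measure, not from the paper.)"""
    slope_left = np.diff(left[-k:], axis=0).mean(axis=0)
    slope_right = np.diff(right[:k], axis=0).mean(axis=0)
    return float(np.linalg.norm(slope_left - slope_right))

def pca_scores(measures, n_components=1):
    """Z-score each measure (column) and project every join (row) onto the
    leading principal components, obtained from the SVD of the scaled data."""
    z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

# Hypothetical data: 20 joins, each a pair of units with 10 frames
# of 12 spectral coefficients. Real input would come from a speech corpus.
rng = np.random.default_rng(0)
joins = [(rng.normal(size=(10, 12)), rng.normal(size=(10, 12)))
         for _ in range(20)]
measures = np.array([[static_distance(l, r), dynamic_distance(l, r)]
                     for l, r in joins])
scores = pca_scores(measures)  # one combined score per join
```

Under these assumptions, each join receives a single combined score that could then be compared against listener judgements of discontinuity. Whether the first component alone is sufficient would depend on the data.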