This paper presents a parameter generation method for hidden Markov model (HMM)-based statistical parametric speech synthesis that uses a similarity measure for probability distributions. In contrast to conventional maximum output probability parameter generation (MOPPG), the proposed method derives its parameter generation criterion from the distribution characteristics of the generated acoustic features. The Kullback-Leibler (KL) divergence between the sentence HMM used for parameter generation and the HMM estimated from the generated features is calculated using an upper-bound approximation. During parameter generation, this KL divergence is minimized either by optimizing the generated acoustic parameters directly or by applying a linear transform to the MOPPG outputs. Our experiments show that both approaches are effective in alleviating over-smoothing in the generated spectral features and in improving the naturalness of synthetic speech. The feature transform approach performs better than the direct optimization approach, which is susceptible to over-fitting. To reduce the computational complexity of transform estimation, an offline training method is further developed that estimates a global transform under the minimum KL divergence criterion on the training set. Experimental results show that this global transform is as effective as a transform estimated for each sentence at the synthesis stage.
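As a rough illustration of the quantity involved, the sketch below computes the closed-form KL divergence between two diagonal-covariance Gaussians and shows how a linear scaling of over-smoothed features can reduce it. This is a toy example rather than the paper's algorithm: the function names, the single-Gaussian setting, the direction of the divergence, and the fixed scaling factor are all assumptions made here for illustration, and the upper-bound approximation over HMM states is not reproduced.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for Gaussians with diagonal covariances."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def fit_diag_gaussian(frames):
    """Maximum-likelihood mean and variance of each feature dimension."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def scale_about_mean(frames, mean, factor):
    """Hypothetical linear transform: stretch each frame away from the mean."""
    return [[mean[d] + factor * (f[d] - mean[d]) for d in range(len(mean))]
            for f in frames]


# Assumed toy setup: the model expects unit variance, but the generated
# (over-smoothed) features only have variance 0.25.
model_mean, model_var = [0.0], [1.0]
smoothed = [[-0.5], [0.5], [-0.5], [0.5]]

gen_mean, gen_var = fit_diag_gaussian(smoothed)
kl_before = kl_diag_gaussians(model_mean, model_var, gen_mean, gen_var)

# Scaling by 2 multiplies the variance by 4, matching the model variance.
restored = scale_about_mean(smoothed, gen_mean, 2.0)
res_mean, res_var = fit_diag_gaussian(restored)
kl_after = kl_diag_gaussians(model_mean, model_var, res_mean, res_var)

print(kl_before, kl_after)
```

In this toy case the scaling factor is chosen by hand; in the method described above, the transform would instead be estimated by minimizing the KL divergence, either per sentence or globally over the training set.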
IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 5)
Date of Publication: July 2012