In our knowledge-based synthesis system [G. L. Jayavardhana Rama et al., 2002], we use single instances of basic-units, which are polyphones such as CV, VC, VCV, VCCV and VCCCV, where C stands for consonant and V for vowel. These basic-units are recorded in isolation from a speaker, not extracted from continuous speech or carrier-words. The system must modify the pitch, amplitude and duration of the basic-units so that the overall characteristics of the concatenated units match the true characteristics of the target word or sentence. Duration modification is carried out only on the vowel parts of a basic-unit, leaving the consonant portion intact. Thus, we need to segment these polyphones into consonant and vowel parts. When the consonant in a basic-unit is a plosive or fricative, an energy-based method is good enough to separate the vowel and consonant parts. However, this method fails when there is co-articulation between the vowel and the consonant. We propose the use of oriented principal component analysis (OPCA) to segment the co-articulated units. The test feature vectors (LPC-cepstrum and mel-cepstrum) are projected onto the consonant and vowel subspaces. Each of these subspaces is represented by generalized eigenvectors obtained by applying OPCA to the training feature vectors. Our approach successfully segments co-articulated basic-units.
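The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the decision rule, the subspace dimension, or the boundary search, so the function names (`opca_basis`, `label_frames`, `cv_boundary`), the parameter `k`, the small `ridge` regularizer, and the rule "assign a frame to the class whose subspace captures more projected energy" are all assumptions made for illustration. The sketch treats one class as "signal" and the other as "noise", and takes the leading generalized eigenvectors of the pair of second-moment matrices as the oriented subspace.

```python
import numpy as np

def opca_basis(signal, noise, k=1, ridge=1e-6):
    """Top-k generalized eigenvectors w of R_s w = lambda * R_n w.

    signal, noise: (n_frames, n_features) training feature vectors.
    k and ridge are illustrative assumptions, not values from the paper.
    """
    d = signal.shape[1]
    r_s = signal.T @ signal / len(signal)                  # signal second moments
    r_n = noise.T @ noise / len(noise) + ridge * np.eye(d) # noise second moments
    # Whiten with the Cholesky factor R_n = L L^T, then solve the ordinary
    # symmetric problem (L^-1 R_s L^-T) y = lambda * y and map back w = L^-T y.
    l_inv = np.linalg.inv(np.linalg.cholesky(r_n))
    vals, vecs = np.linalg.eigh(l_inv @ r_s @ l_inv.T)
    order = np.argsort(vals)[::-1][:k]
    return l_inv.T @ vecs[:, order], vals[order]

def label_frames(frames, w_cons, w_vow):
    """True where a frame has more projected energy in the consonant subspace.

    This comparison is an assumed decision rule.
    """
    e_cons = np.sum((frames @ w_cons) ** 2, axis=1)
    e_vow = np.sum((frames @ w_vow) ** 2, axis=1)
    return e_cons > e_vow

def cv_boundary(is_consonant):
    """Split index b for a CV unit: frames [0, b) consonant, [b, T) vowel.

    Chooses the b that agrees with the largest number of frame labels.
    """
    c = np.asarray(is_consonant, dtype=int)
    before = np.concatenate(([0], np.cumsum(c)))                     # consonant labels before b
    after = np.concatenate((np.cumsum((1 - c)[::-1])[::-1], [0]))    # vowel labels from b on
    return int(np.argmax(before + after))
```

In use, `opca_basis(train_consonant, train_vowel)` would give the consonant subspace and `opca_basis(train_vowel, train_consonant)` the vowel subspace, where the training arrays are hypothetical matrices of cepstral feature vectors. The frames of a co-articulated CV unit would then be passed through `label_frames` and `cv_boundary` to obtain a candidate segmentation point. Units such as VC or VCV would need the boundary search adapted accordingly.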
Statistical Signal Processing, 2003 IEEE Workshop on
Date of Conference: 28 Sept.-1 Oct. 2003