In this paper, we examine the 1/f nature of the residual signal of voiced speech. Speech signals are generally classified as voiced or unvoiced. Voiced speech is considered to be generated by the vibration of the vocal cords exciting the vocal tract. In our model, the vocal tract is treated as a linear system, and its excitation signal, which we call the speech residual signal, is modeled as 1/f noise. Since vowels form the largest and most evident group of voiced phonemes, we study the vowels /IY/, /IH/, /EI/, /EH/, /AE/, /ER/, /AH/, /AW/, /OA/, /OO/, /UW/ and /UH/, produced by several male and female speakers. To extract the speech residual, we first whiten the power spectrum of the speech signal with a pre-emphasis filter and then perform linear predictive (LP) analysis on the whitened speech to obtain the vocal tract parameters. The residual signal is then obtained by inverse filtering with the resulting LP filter. A wavelet decomposition is applied to the residual signal to obtain its wavelet coefficients, and the variances of these coefficients are observed to follow a power law across scales. The self-similarity parameters, i.e., the slopes of this progression on a logarithmic scale, are then estimated. We investigate and compare the behavior of the self-similarity parameters for the speech of 40 men and women.
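The processing chain described above (pre-emphasis, LP analysis, inverse filtering, wavelet decomposition, slope estimation) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pre-emphasis coefficient `alpha=0.95`, the LP order of 12, the use of an orthonormal Haar wavelet, an ordinary least-squares fit of log2 variance against scale index, the absence of framing or windowing, and the choice to inverse-filter the original (un-emphasized) speech are all assumptions. The function names (`pre_emphasis`, `lpc`, `inverse_filter`, `haar_detail_variances`, `scaling_slope`, `residual_slope`) are hypothetical. For a process whose power spectrum is proportional to 1/f^γ, the detail variances grow roughly as 2^(jγ) with scale index j, so the fitted slope serves as an estimate of γ.

```python
import numpy as np


def pre_emphasis(x, alpha=0.95):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha = 0.95 is an assumed value)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def lpc(x, order):
    """Prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p (autocorrelation + Levinson-Durbin)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for order i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a[1..i-1] using the previous-order coefficients in reverse order.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Residual e[n] = sum_j a[j] * x[n - j], i.e. x passed through the FIR filter A(z)."""
    return np.convolve(np.asarray(x, dtype=float), a)[: len(x)]


def haar_detail_variances(x, levels):
    """Variance of the orthonormal Haar detail coefficients at scales j = 1..levels."""
    approx = np.asarray(x, dtype=float)
    variances = []
    for _ in range(levels):
        m = (len(approx) // 2) * 2  # drop a trailing odd sample
        even, odd = approx[0:m:2], approx[1:m:2]
        detail = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
        variances.append(np.var(detail))
    return np.array(variances)


def scaling_slope(variances):
    """Least-squares slope of log2(variance) against the scale index j."""
    j = np.arange(1, len(variances) + 1)
    slope, _intercept = np.polyfit(j, np.log2(variances), 1)
    return slope


def residual_slope(speech, order=12, levels=6):
    """Hypothetical end-to-end pipeline: pre-emphasis -> LP analysis -> inverse filter -> slope."""
    a = lpc(pre_emphasis(speech), order)  # vocal-tract model estimated from whitened speech
    residual = inverse_filter(speech, a)  # assumption: the original speech is inverse-filtered
    return scaling_slope(haar_detail_variances(residual, levels))
```

As a sanity check of the slope estimator, white noise (flat spectrum, γ = 0) should give a slope near 0, while a random walk (spectrum roughly 1/f^2) should give a slope near 2. The Haar wavelet was chosen only because it can be written in a few lines without extra dependencies; a library such as PyWavelets offers longer, smoother wavelets that may be preferable for real speech.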