In this paper, we present an iterative method for the accurate estimation of amplitude and frequency modulations (AM-FM) in time-varying multi-component quasi-periodic signals such as voiced speech. Starting from a deterministic plus noise representation of speech initially suggested by Laroche et al. (“HNM: A simple, efficient harmonic plus noise model for speech,” Proc. WASPAA, Oct. 1993, pp. 169-172), and focusing on the deterministic part, we analyze the properties of the model and show that it is equivalent to a time-varying quasi-harmonic representation of voiced speech. Next, we show how this representation can be used to estimate amplitude and frequency modulations, and we provide the conditions under which such an estimation is valid. Finally, we suggest an adaptive algorithm for nonparametric estimation of AM-FM components in voiced speech. From the estimated amplitude and frequency components, a high-resolution time-frequency representation is obtained. The suggested approach was evaluated on synthetic AM-FM signals. In addition, speech signals were reconstructed from the estimated AM-FM information, resulting in a high signal-to-reconstruction error ratio (around 30 dB).
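To make the evaluation setup concrete, the following is a minimal sketch of a synthetic single-component AM-FM test signal and of a signal-to-reconstruction error ratio in dB. It is not the estimation algorithm of the paper. The function names `synth_am_fm` and `srer_db`, all parameter values, and the specific definition used here (the ratio of the standard deviation of the signal to that of the reconstruction error, expressed as 20 log10) are assumptions made for illustration.

```python
import numpy as np


def synth_am_fm(fs=16000, dur=0.05, f0=200.0, chirp=1000.0, am_depth=0.3, am_rate=20.0):
    """Return (t, x) for one AM-FM component (all parameters are illustrative)."""
    t = np.arange(int(fs * dur)) / fs
    # Slowly varying amplitude envelope (AM part).
    amp = 1.0 + am_depth * np.cos(2.0 * np.pi * am_rate * t)
    # Phase whose derivative gives a linearly increasing frequency f0 + chirp * t (FM part).
    phase = 2.0 * np.pi * (f0 * t + 0.5 * chirp * t ** 2)
    return t, amp * np.cos(phase)


def srer_db(x, x_hat):
    """Assumed SRER definition: 20 * log10(std(x) / std(x - x_hat))."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return 20.0 * np.log10(np.std(x) / np.std(x - x_hat))


if __name__ == "__main__":
    t, x = synth_am_fm()
    # Stand-in "reconstruction": the true signal plus a small random error.
    rng = np.random.default_rng(0)
    x_hat = x + 0.02 * rng.standard_normal(x.shape)
    print(f"SRER: {srer_db(x, x_hat):.1f} dB")
```

Under this definition, a reconstruction whose error has one tenth of the signal's standard deviation scores 20 dB, so a value around 30 dB corresponds to an error standard deviation of roughly 3% of the signal's.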
Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume: 19, Issue: 2)
Date of Publication: Feb. 2011