We present a novel method for decomposing speech into voiced and unvoiced components. After demodulating the variations in the spectral envelope, energy, and pitch, the method applies a bank of Kalman filters to separate the harmonic and non-harmonic components of the signal. This approach relies on a state-space representation of the composite signal and provides a way to accurately estimate the harmonic component without the large delay required by a linear-phase comb filter. However, it also requires prior knowledge of the variance of the unvoiced component and of the state transition parameters. We therefore also present a method for determining these parameters accurately, based on a variant of the expectation-maximization algorithm. Modifications for dealing with unvoiced segments and voicing onset are also described.
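As a rough illustration of the state-space idea, the sketch below runs a single Kalman filter that tracks one sinusoid of known frequency in additive white noise and treats the remainder as the non-harmonic part. It is a minimal sketch and not the method of the paper: the function name `kalman_harmonic`, the rotation-matrix state model, the process-noise value `q`, and the synthetic test signal are all assumptions made for illustration. The demodulation step, the full filter bank, and the expectation-maximization parameter estimation are not shown, and the noise variance `r` is simply taken as known.

```python
import numpy as np

def kalman_harmonic(y, omega, r, q=1e-6):
    """Causally estimate one sinusoid at known frequency omega (rad/sample).

    Assumed model (illustrative, not taken from the paper):
      state x_n = F x_{n-1} + w_n, with F a rotation by omega and w_n ~ N(0, q I)
      observation y_n = H x_n + v_n, with H = [1, 0] and v_n ~ N(0, r)
    """
    c, s = np.cos(omega), np.sin(omega)
    F = np.array([[c, -s], [s, c]])   # rotates the (in-phase, quadrature) pair
    H = np.array([1.0, 0.0])          # only the in-phase component is observed
    Q = q * np.eye(2)
    x = np.zeros(2)                   # initial state estimate
    P = np.eye(2)                     # initial state covariance
    est = np.empty(len(y))
    for n, yn in enumerate(y):
        # Prediction step
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step (scalar observation, so the innovation variance S is a scalar)
        S = H @ P @ H + r
        K = P @ H / S
        x = x + K * (yn - H @ x)
        P = P - np.outer(K, H @ P)
        est[n] = x[0]
    return est

# Synthetic example: a unit sinusoid plus white noise of variance 0.25
rng = np.random.default_rng(0)
n = np.arange(2000)
omega = 2 * np.pi * 0.01
clean = np.sin(omega * n)
noisy = clean + rng.normal(scale=0.5, size=n.size)

harmonic = kalman_harmonic(noisy, omega, r=0.25)  # estimated harmonic part
residual = noisy - harmonic                       # estimated non-harmonic part
```

Each output sample depends only on current and past input samples, which is the sense in which a filter of this kind avoids the look-ahead delay of a linear-phase comb filter. In this sketch, a speech-like signal with several harmonics would need one such two-dimensional state block per harmonic.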