Skip to Main Content
This paper develops a general detection theory for speech analysis based on time-varying autoregressive models, which themselves generalize the classical linear predictive speech analysis framework. This theory leads to a computationally efficient decision-theoretic procedure that may be applied to detect the presence of vocal tract variation in speech waveform data. A corresponding generalized likelihood ratio test is derived and studied both empirically for short data records, using formant-like synthetic examples, and asymptotically, leading to constant false alarm rate hypothesis tests for changes in vocal tract configuration. Two in-depth case studies then serve to illustrate the practical efficacy of this procedure across different time scales of speech dynamics: first, the detection of formant changes on the scale of tens of milliseconds of data, and second, the identification of glottal opening and closing instants on time scales below ten milliseconds.