We address the robustness of features for fully automatic recognition of vibrato, usually defined as a periodic oscillation of the pitch (F0) of the singing voice, in recorded polyphonic music. Using an evaluation database covering jazz, pop and opera music, we show that pitch extraction is challenging in the presence of instrumental accompaniment, leading to unsatisfactory classification accuracy (61.1%) if only the F0 frequency spectrum is used as the feature set. To alleviate this, we investigate alternative functionals of F0, alternative low-level features besides F0, and extraction of vocals by monaural source separation. Finally, we propose using inter-quartile ranges of F0 delta regression coefficients as features, which are highly robust against pitch extraction errors and reach up to 86.9% accuracy in real-life conditions without any signal enhancement.
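As an illustration of the proposed feature, the following is a minimal sketch of how an inter-quartile range of F0 delta regression coefficients might be computed from a frame-wise pitch contour. The abstract does not give the exact formula, window length or frame rate, so the sketch assumes the regression formula commonly used for delta features, a half-window of 2 frames, and edge padding; the function names `delta_regression` and `f0_delta_iqr` are hypothetical.

```python
import numpy as np


def delta_regression(x, window=2):
    """Delta regression coefficients of a 1-D contour.

    Assumed formula (the common one for delta features):
        d[t] = sum_{n=1..W} n * (x[t+n] - x[t-n]) / (2 * sum_{n=1..W} n^2)
    The contour is padded by repeating its first and last values.
    """
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, window, mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    delta = np.zeros_like(x)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + len(x)]    # x[t + n]
        behind = padded[window - n: window - n + len(x)]   # x[t - n]
        delta += n * (ahead - behind)
    return delta / denom


def f0_delta_iqr(f0, window=2):
    """Inter-quartile range (75th minus 25th percentile) of the F0 deltas."""
    d = delta_regression(f0, window)
    q1, q3 = np.percentile(d, [25, 75])
    return q3 - q1


# Synthetic contours, assuming 100 frames per second (an illustrative choice).
t = np.arange(100) / 100.0
vibrato_f0 = 220.0 + 5.0 * np.sin(2.0 * np.pi * 6.0 * t)  # 6 Hz oscillation
steady_f0 = np.full_like(t, 220.0)                         # no oscillation

vibrato_iqr = f0_delta_iqr(vibrato_f0)
steady_iqr = f0_delta_iqr(steady_f0)
```

On these synthetic contours the oscillating pitch yields a clearly larger inter-quartile range than the steady one. Because quartiles ignore the tails of the distribution, isolated pitch extraction errors such as octave jumps would be expected to affect this value less than they affect extreme-value statistics, which is consistent with the robustness claimed above.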