This paper introduces a new technique to discriminate between music and speech. The strategy is based on the concept of multiple fundamental frequency estimation, which provides the elements for the extraction of three features from the signal. The discrimination between speech and music is obtained by suitably combining these features. The small number of features, together with the fact that no training phase is necessary, makes the strategy very robust over a wide range of practical conditions. The performance of the technique is analyzed in terms of the accuracy of the speech/music separation, its robustness in the face of extreme conditions, and the computational effort. A comparison with previous works reveals excellent performance in all of these respects.
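The abstract does not reveal which three features are extracted or how the multiple fundamental frequencies are estimated, so the sketch below is only a generic illustration of the underlying idea, not the paper's method: a greedy harmonic-summation estimator that repeatedly finds the strongest pitch candidate and cancels its harmonics. All function names, parameter values, and the cancellation scheme are assumptions made for this example.

```python
import numpy as np

def estimate_f0s(frame, sr, max_voices=3, fmin=80.0, fmax=1000.0, thresh=0.3):
    """Greedy multiple-F0 estimation (illustrative, not the paper's method):
    pick the F0 whose harmonic comb collects the most spectral magnitude,
    cancel its harmonics from the spectrum, and repeat for further voices."""
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(n)))
    f0s, ref = [], None
    for _ in range(max_voices):
        best_f0, best_score = 0.0, 0.0
        for f0 in np.arange(fmin, fmax, 1.0):
            # Sum spectral magnitude at the first 8 harmonics of the candidate.
            idx = np.round(np.arange(1, 9) * f0 * n / sr).astype(int)
            idx = idx[idx < len(spec)]
            score = spec[idx].sum()
            if score > best_score:
                best_f0, best_score = f0, score
        if ref is None:
            ref = best_score            # score of the strongest voice
        elif best_score < thresh * ref:
            break                       # residual too weak to be a real voice
        f0s.append(best_f0)
        # Cancel the detected voice by zeroing bins around its harmonics.
        idx = np.round(np.arange(1, 9) * best_f0 * n / sr).astype(int)
        idx = idx[idx < len(spec)]
        for i in idx:
            spec[max(0, i - 2):i + 3] = 0.0
    return f0s

# Demo: a frame containing two simultaneous harmonic tones, a crude stand-in
# for polyphonic music (speech would typically yield a single voiced pitch).
sr, n = 16000, 4096
t = np.arange(n) / sr
frame = sum((1.0 / h) * np.sin(2 * np.pi * h * f0 * t)
            for f0 in (220.0, 277.0) for h in range(1, 9))
f0s = estimate_f0s(frame, sr)
```

In a complete discriminator of this kind, statistics of the estimated fundamental frequencies over time (for instance, how many steady simultaneous pitches are present per frame) would feed the speech/music decision.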