In this paper we present a novel hierarchical and scalable three-stage algorithm for semantic segmentation of musical audio. In the first stage, the energy spectrum of the entire audio track is analyzed to find significant energy textures that may characterize different semantic segments; in the second and third stages, tonal and timbral features are used to refine the segmentation by moving or deleting segment boundaries. Experimental results on a set of 58 songs show that our algorithm attains good semantic segmentation after the first stage alone, with a precision of 64% and a recall of 96%. After the second stage the precision increases to 79%; the best precision, 85%, is reached after the third stage, where a minimum average recall of 92% is obtained.
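As a rough illustration of the kind of energy analysis performed in the first stage, the sketch below marks a candidate segment boundary wherever the frame-level RMS energy changes sharply between consecutive frames. The frame length and the ratio threshold are our own illustrative assumptions, not parameters from the paper, and the paper's actual method analyzes the energy spectrum rather than plain RMS.

```python
import math

def frame_signal(signal, frame_len):
    """Split a sample sequence into non-overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def rms_energy(frames):
    """Per-frame RMS energy."""
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]

def energy_boundaries(signal, frame_len=100, ratio=2.0):
    """Candidate boundaries (in samples) where the energy of consecutive
    frames jumps or drops by at least the given ratio (an assumed threshold)."""
    e = rms_energy(frame_signal(signal, frame_len))
    bounds = []
    for i in range(1, len(e)):
        prev, cur = max(e[i - 1], 1e-12), max(e[i], 1e-12)
        if cur / prev >= ratio or prev / cur >= ratio:
            bounds.append(i * frame_len)
    return bounds

# A synthetic track: a quiet passage followed by a loud one.
quiet = [0.1 * math.sin(0.1 * n) for n in range(1000)]
loud = [0.9 * math.sin(0.1 * n) for n in range(1000)]
print(energy_boundaries(quiet + loud))  # one boundary near sample 1000
```

In a full implementation, the second and third stages would then shift or delete these candidate boundaries using tonal (e.g. chroma) and timbral (e.g. spectral) features, as the abstract describes.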