Skip to Main Content
In this paper, we explore nonlinear methods, inspired by the fractal theory for the analysis of the structure of music signals at multiple time scales, which is of importance both for their modeling and for their automatic computer-based recognition. We propose the multiscale fractal dimension (MFD) profile as a short-time descriptor, useful to quantify the multiscale complexity and fragmentation of the different states of the music waveform. We have experimentally found that this descriptor can discriminate several aspects among different music instruments, which is verified by further analysis on synthesized sinusoidal signals. We compare the descriptiveness of our features against that of Mel frequency cepstral coefficients (MFCCs), using both static and dynamic classifiers such as Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The method and features proposed in this paper appear to be promising for music signal analysis, due to their capability for multiscale analysis of the signals and their applicability in recognition, as they accomplish an error reduction of up to 32%. These results are quite interesting and render the descriptor of direct applicability in large-scale music classification tasks.
Audio, Speech, and Language Processing, IEEE Transactions on (Volume:21 , Issue: 4 )
Date of Publication: April 2013