In this paper, we propose a new general-purpose low-level feature representation for audio signals. Our approach, called the Dominant Audio Descriptor, is inspired by the MPEG-7 Dominant Color Descriptor. It is based on clustering time-local features and identifying the dominant components among the resulting clusters. The features used to illustrate this approach are the well-known Mel Frequency Cepstral Coefficients (MFCCs). The performance of the proposed framework is evaluated on audio classification and retrieval tasks, with experiments performed on a benchmark music data set. The results are compared to those previously obtained on the same data set. We show that our approach improves classification and retrieval results by more than 3% and, in the case of retrieval, reaches an almost perfect retrieval rate of 99.36%. In addition, the paper presents comparative results against several state-of-the-art classifiers, such as Hidden Markov Models, Support Vector Machines, and k-Nearest Neighbors.
Published in:
Machine Learning and Applications, 2009. ICMLA '09. International Conference on
Date of Conference: 13-15 Dec. 2009
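
The idea summarized in the abstract (cluster time-local features, then keep the dominant components) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm, number of clusters, or dominance criterion the paper uses, so everything below is an assumption made for illustration: plain k-means, a cluster count `k`, and a hypothetical `min_weight` threshold that, by analogy with the representative colors and percentages of the MPEG-7 Dominant Color Descriptor, keeps only clusters covering a sufficient share of the frames. The input `frames` stands in for precomputed per-frame MFCC vectors; synthetic data is used here.

```python
import numpy as np


def kmeans(frames, k, n_iter=50, seed=0):
    """Plain k-means (an assumed choice; the abstract names no algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the final centroids.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)


def dominant_descriptor(frames, k=8, min_weight=0.05, seed=0):
    """Return (centroids, weights) of the dominant clusters, heaviest first.

    min_weight is a hypothetical threshold on the fraction of frames a
    cluster must cover to count as dominant; keep it at or below 1/k so
    that at least one cluster always survives.
    """
    centroids, labels = kmeans(frames, k, seed=seed)
    weights = np.bincount(labels, minlength=k) / len(frames)
    keep = weights >= min_weight
    centroids, weights = centroids[keep], weights[keep]
    weights = weights / weights.sum()  # renormalise over kept clusters
    order = np.argsort(-weights)
    return centroids[order], weights[order]


# Synthetic stand-in for MFCC frames: 13 coefficients per frame,
# drawn from two well-separated groups.
rng = np.random.default_rng(1)
frames = np.vstack([
    rng.normal(0.0, 0.1, size=(300, 13)),
    rng.normal(5.0, 0.1, size=(100, 13)),
])
centroids, weights = dominant_descriptor(frames, k=4, min_weight=0.05)
print(centroids.shape, weights)
```

Under these assumptions, a clip is summarized by a small set of representative feature vectors and the fraction of frames each one accounts for. How such descriptors are compared for classification or retrieval is not specified in the abstract and is left out of the sketch.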