Automatic music tagging by low-rank representation

2 Author(s)
Panagakis, Y. and Kotropoulos, C., Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece

A novel multi-label annotation method is proposed and applied to music tagging. Each music recording is represented by its auditory temporal modulations (ATMs). For a set of training recordings, two matrices are formed: a tag-music recording matrix, whose columns are the zero-one (indicator) vectors of the tags associated with each recording, and a matrix whose columns are the corresponding ATM representations. A low-rank weight matrix is then sought such that the tag-music recording matrix is expressed as the product of the weight matrix and the ATM matrix plus an error matrix. Such a weight matrix captures the relationships between the labels (i.e., tags) and the audio features. If the tag-music recording matrix and the ATM matrix are assumed to be jointly low-rank, the weight matrix can be derived by solving a convex nuclear norm minimization problem. Once the weight matrix has been found, the annotation vector for any test music recording is obtained by multiplying the weight matrix with the recording's ATM representation. The method just outlined is referred to as low-rank representation based multi-label annotation (LRRMA). LRRMA outperforms state-of-the-art auto-tagging systems on the CAL500 dataset under a 5-fold cross-validation experimental protocol.
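To make the pipeline concrete, the following is a minimal sketch of the kind of computation the abstract describes: learn a low-rank weight matrix W mapping ATM feature vectors to tag indicator vectors by minimizing a squared-error data term plus a nuclear-norm penalty, then score tags for a test recording as W times its ATM vector. The squared-error loss, the proximal-gradient (singular value thresholding) solver, the regularization weight lam, and all function and variable names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def fit_low_rank_weights(T, X, lam=1.0, n_iter=500):
    """Minimize 0.5*||T - W X||_F^2 + lam*||W||_* by proximal gradient.

    T : (n_tags, n_train) zero-one tag indicator matrix (one column per recording).
    X : (n_feat, n_train) matrix of ATM feature vectors (one column per recording).
    Returns a low-rank weight matrix W of shape (n_tags, n_feat).
    """
    W = np.zeros((T.shape[0], X.shape[0]))
    # Step size from the Lipschitz constant of the smooth term
    # (largest eigenvalue of X X^T, i.e., its spectral norm).
    step = 1.0 / np.linalg.norm(X @ X.T, 2)
    for _ in range(n_iter):
        grad = (W @ X - T) @ X.T              # gradient of the squared-error term
        W = svt(W - step * grad, step * lam)  # proximal step on the nuclear norm
    return W

def annotate(W, x, top_k=10):
    # Score all tags for a test recording's ATM vector x and return the top-k tag indices.
    scores = W @ x
    return np.argsort(scores)[::-1][:top_k]
```

Under these assumptions, tagging a new recording reduces to a single matrix-vector product, so the cost at test time is negligible compared with training.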

Published in:

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference:

25-30 March 2012