A novel multi-label annotation method is proposed and applied to music tagging. Each music recording is represented by its auditory temporal modulations (ATMs). The training set is described by two matrices: the tag-music recording matrix, whose columns are the zero-one (indicator) vectors of the tags associated with each recording, and the matrix whose columns are the ATM representations of the recordings. A low-rank weight matrix is sought such that the tag-music recording matrix is expressed as the product of the weight matrix and the matrix of ATM representations plus an error matrix. Such a weight matrix captures the relationships between the labels (i.e., tags) and the audio features. If the tag-music recording matrix and the matrix of ATM representations are assumed to be jointly low-rank, the weight matrix can be derived by solving a convex nuclear norm minimization problem. Once the weight matrix has been found, the annotation vector for labeling any test music recording is obtained by multiplying the weight matrix by that recording's ATM representation. The method is referred to as low-rank representation based multi-label annotation (LRRMA). LRRMA outperforms state-of-the-art auto-tagging systems on the CAL500 dataset under a 5-fold cross-validation experimental protocol.
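The pipeline described above (fit a low-rank weight matrix, then multiply it by a test recording's features) can be illustrated with a minimal sketch. The abstract does not state the exact optimization problem, so the code assumes a simpler stand-in: nuclear-norm-regularized least squares, minimize 0.5*||Y - W X||_F^2 + lam*||W||_*, solved by proximal gradient descent with singular value thresholding. The function names, the parameter lam, and the random toy data are hypothetical and are not taken from the paper; real ATM features and CAL500 tags are not used here.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)  # shrink singular values toward zero
    return (U * s) @ Vt


def fit_low_rank_weights(Y, X, lam=1.0, n_iter=500):
    """Proximal gradient for min_W 0.5*||Y - W X||_F^2 + lam*||W||_*.

    Y: (n_tags, n_recordings) zero-one tag matrix (assumed layout).
    X: (n_features, n_recordings) feature matrix, one recording per column.
    This objective is an assumed stand-in, not the paper's exact problem.
    """
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    W = np.zeros((Y.shape[0], X.shape[0]))
    for _ in range(n_iter):
        grad = (W @ X - Y) @ X.T               # gradient of the least-squares term
        W = svt(W - step * grad, step * lam)   # proximal step promotes low rank
    return W


def annotate(W, x):
    """Annotation vector for one recording: weight matrix times its features."""
    return W @ x


# Toy data (synthetic, for illustration only): 6 tags, 20 features, 50 recordings.
rng = np.random.default_rng(0)
W_true = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 20))  # rank 2
X = rng.standard_normal((20, 50))
Y = (W_true @ X > 0).astype(float)  # zero-one tag indicators

W = fit_low_rank_weights(Y, X, lam=5.0)
scores = annotate(W, X[:, 0])
top_tags = np.argsort(scores)[::-1][:3]  # indices of the three highest-scoring tags
print("top-3 tag indices:", top_tags)
```

In this sketch the entries of the annotation vector act as tag scores, and the highest-scoring tags would be assigned to the recording. Larger values of lam shrink more singular values of W to zero and therefore yield a lower-rank weight matrix.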