
Generalizability and Simplicity as Criteria in Feature Selection: Application to Mood Classification in Music

3 Author(s): Saari, P. (Dept. of Music, Univ. of Jyväskylä, Jyväskylä, Finland); Eerola, T.; Lartillot, O.

Classification of musical audio signals according to expressed mood or emotion has evident applications in content-based music retrieval from large databases. Wrapper selection is a dimension reduction method that has been proposed for improving classification performance. However, the technique is prone to overfitting the training data, which decreases the generalizability of the obtained results. We claim that previous attempts to apply wrapper selection in the field of music information retrieval (MIR) have led to disputable conclusions about the methods used, owing to inadequate analysis frameworks that are indicative of overfitting and biased results. This paper presents a framework based on cross-indexing for obtaining realistic performance estimates of wrapper selection by taking into account the simplicity and generalizability of the classification models. The framework is applied to sets of film soundtrack excerpts that are consensually associated with particular basic emotions, comparing Naive Bayes, k-NN, and SVM classifiers using both forward selection (FS) and backward elimination (BE). k-NN with BE yields the most promising results: 56.5% accuracy with only four features. The most useful feature subset for k-NN contains mode majorness and key clarity, combined with dynamical, rhythmical, and structural features.
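As an illustration of the wrapper-selection setup described in the abstract, the sketch below pairs a k-NN classifier with backward elimination using scikit-learn's SequentialFeatureSelector. The feature matrix, labels, and parameter values are placeholders, not the soundtrack features, mood annotations, or cross-indexing evaluation used in the paper; it is a minimal sketch of the general technique, assuming a scikit-learn environment.

# Hypothetical sketch: wrapper feature selection (backward elimination)
# wrapped around a k-NN classifier, in the spirit of the setup above.
# X, y, and all sizes below are placeholder data, not the study's features.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_excerpts, n_features = 120, 12
X = rng.normal(size=(n_excerpts, n_features))   # placeholder audio features
y = rng.integers(0, 5, size=n_excerpts)         # placeholder basic-emotion labels

knn = KNeighborsClassifier(n_neighbors=5)

# Backward elimination: start from the full feature set and drop features
# one at a time, keeping the subset that maximizes the cross-validated
# accuracy of the wrapped k-NN classifier.
selector = SequentialFeatureSelector(
    knn,
    n_features_to_select=4,   # the paper's best k-NN model used four features
    direction="backward",
    cv=5,
)

model = make_pipeline(StandardScaler(), selector, knn)
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

Nesting the selector inside an outer cross-validation loop keeps the feature selection from being evaluated on the same data it was tuned on, which is the overfitting concern that the paper's cross-indexing framework addresses in a more principled way.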

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume 19, Issue 6)