Typically, a machine learning model for automatic music emotion recognition is trained to learn the relationship between music features and perceived emotion values. However, assigning a single emotion value to each clip in the training phase does not work well, because the perceived emotion of a clip varies from person to person. To resolve this problem, we propose a novel approach that represents the perceived emotion of a clip as a probability distribution in the emotion plane. In addition, we develop a methodology that predicts the emotion distribution of a clip by estimating the emotion mass at discrete samples of the emotion plane. We also develop model fusion algorithms to integrate different perceptual dimensions of music listening and to enhance the modeling of emotion perception. The effectiveness of the proposed approach is validated through an extensive performance study, in which an average R² statistic of 0.5439 is achieved for emotion prediction. We also show how this approach can be applied to enhance our understanding of music emotion.
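As a rough illustration of representing a clip's perceived emotion as mass at discrete samples of the emotion plane, the following sketch bins per-listener ratings into a normalized grid and scores a prediction with the standard R² formula. It is not the method described above: the valence-arousal axes, the [-1, 1] range, the grid size, the simple histogram, and the names `emotion_mass` and `r_squared` are all assumptions made for the example, and the ratings are made-up values.

```python
import numpy as np


def emotion_mass(ratings, grid_size=8, low=-1.0, high=1.0):
    """Bin per-listener (valence, arousal) ratings into a grid that sums to 1.

    Assumption: the emotion plane is the square [low, high] x [low, high].
    """
    ratings = np.asarray(ratings, dtype=float)
    edges = np.linspace(low, high, grid_size + 1)
    counts, _, _ = np.histogram2d(
        ratings[:, 0], ratings[:, 1], bins=[edges, edges]
    )
    total = counts.sum()
    if total == 0:
        raise ValueError("no ratings fall inside the emotion plane")
    # Normalize counts so the grid is a discrete probability distribution.
    return counts / total


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings of one clip by four listeners: (valence, arousal).
ratings = [(0.6, 0.5), (0.4, 0.7), (-0.2, 0.1), (0.5, 0.6)]
mass = emotion_mass(ratings, grid_size=4)
```

Here `mass[i, j]` is the fraction of listeners whose rating falls in valence bin `i` and arousal bin `j`. A regressor trained on music features could then predict each grid cell, with `r_squared` comparing the predicted and annotated mass.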