The volume of multimedia content on websites grows every day, so searching it effectively and finding what we need has become a critical issue. In this work, four affective modes (exciting/happy, angry, sad and calm) are investigated in songs and speeches. A song clip is partitioned into a main part and a refrain part, each of which is analyzed by its tempo, normalized intensity mean and rhythm regularity. For a speech clip, the standard deviation of the fundamental frequencies, the standard deviation of the pauses and the mean of the zero crossing rates are computed to characterize the speaker's emotion. A Gaussian mixture model is then built and used for classification. In our experiments, the average accuracies for the main parts of songs, the refrain parts of songs, and speeches reach 55%, 60% and 80%, respectively. The proposed method can therefore be employed to analyze songs and speeches downloaded from websites and to provide emotion information to a user.
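As an illustration of the speech pipeline summarized above, the following is a minimal sketch, not the authors' implementation. It computes a mean zero crossing rate over frames, a standard deviation over a list of already-extracted values (for example fundamental-frequency estimates), and assigns a feature vector to the emotion whose Gaussian mixture model gives the highest log-likelihood. The function names, the non-overlapping framing, the diagonal-covariance form and the maximum-likelihood decision rule are all assumptions made for the example; the abstract does not specify them.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def mean_zcr(signal, frame_len):
    """Mean zero crossing rate over non-overlapping frames (assumed framing)."""
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    return sum(zero_crossing_rate(f) for f in frames) / len(frames)


def std_dev(values):
    """Population standard deviation of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def gmm_log_likelihood(x, components):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    components is a list of (weight, means, variances) tuples; this
    parameter layout is hypothetical and chosen for readability.
    """
    log_terms = []
    for weight, means, variances in components:
        log_pdf = sum(
            -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
            for xi, mu, var in zip(x, means, variances)
        )
        log_terms.append(math.log(weight) + log_pdf)
    # Log-sum-exp keeps the mixture sum numerically stable.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify(x, models):
    """Return the label whose GMM scores x highest (assumed decision rule)."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))
```

In use, one mixture per affective mode would be fitted on training features, and `classify` would be called with a clip's three-element feature vector and a dictionary mapping each mode to its fitted components. Fitting the mixtures (for example by expectation-maximization) is outside this sketch.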