In this paper we design a system that adopts a novel approach to emotion classification from human dialogue based on both textual and speech context. Our main objective is to boost the accuracy of speech-based emotion classification by also accounting for features extracted from the spoken text. The proposed system concatenates text and speech features and feeds them as a single input to the classifier. The work builds on past research on music mood classification based on the combination of lyrics and audio features. The innovation in our approach lies in the specific application of text-speech fusion to emotion classification and in the choice of features. Furthermore, in the absence of benchmark data, a dataset of movie quotes was developed for testing emotion classification and for future benchmarking. A comparison of the results obtained in each case shows that the hybrid text-speech approach achieves better accuracy than speech or text mining alone.
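The feature-level fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the synthetic data, and the choice of logistic regression as the classifier are all assumptions for demonstration purposes.

```python
# Minimal sketch of feature-level fusion for emotion classification.
# Dimensions and data are illustrative placeholders, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_samples = 40
text_features = rng.normal(size=(n_samples, 10))    # e.g. lexical/sentiment scores
speech_features = rng.normal(size=(n_samples, 13))  # e.g. MFCC or prosodic statistics
labels = rng.integers(0, 2, size=n_samples)         # hypothetical binary emotion labels

# Concatenate the two modalities into one input vector per utterance,
# then train a single classifier on the fused representation.
fused = np.concatenate([text_features, speech_features], axis=1)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)

print(fused.shape)  # fused vector combines both modalities: (40, 23)
```

The key design point is that fusion happens before classification: one model sees both modalities jointly, rather than combining the outputs of separate text and speech classifiers.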