Emotion recognition from speech plays an important role in developing affective and intelligent systems. This study investigates sentence-level emotion recognition. We propose a two-step approach that leverages information from sub-sentence segments to make a sentence-level decision. First, a segment-level emotion classifier generates predictions for the segments within a sentence. Second, a combination component merges these segment predictions into a sentence-level decision. We evaluate different segment units (words, phrases, and time-based segments) and different decision combination methods (majority vote, average of probabilities, and a Gaussian Mixture Model (GMM)). Experimental results on two different data sets show that the proposed method significantly outperforms the standard sentence-based classification approach. In addition, we find that time-based segments achieve the best performance; the method therefore requires no speech recognition or alignment, which is important for developing language-independent emotion recognition systems.
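To make the two-step idea concrete, the following is a minimal sketch of time-based segmentation and of two of the combination rules named above (majority vote and average of probabilities). It is an illustration under stated assumptions, not the implementation used in the study: the function names, the window and hop sizes, the emotion labels, and the example probabilities are all hypothetical, the segment-level classifier is assumed to already exist and to output one probability vector per segment, and the GMM-based combiner is not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def time_based_segments(n_frames: int, seg_len: int, hop: int) -> List[Tuple[int, int]]:
    """Split an utterance of n_frames into fixed-length (start, end) windows.

    seg_len and hop are assumed values; no words or alignments are needed.
    """
    if n_frames <= 0:
        return []
    if n_frames <= seg_len:
        return [(0, n_frames)]
    segments = [(s, s + seg_len) for s in range(0, n_frames - seg_len + 1, hop)]
    # Add one final window so the end of the utterance is always covered.
    if segments[-1][1] < n_frames:
        segments.append((n_frames - seg_len, n_frames))
    return segments


def majority_vote(seg_probs: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Pick each segment's most probable label, then the most frequent label."""
    votes = [labels[max(range(len(p)), key=p.__getitem__)] for p in seg_probs]
    return Counter(votes).most_common(1)[0][0]


def average_probabilities(seg_probs: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Average the class probabilities over segments, then take the best class."""
    n = len(seg_probs)
    means = [sum(p[k] for p in seg_probs) / n for k in range(len(labels))]
    return labels[max(range(len(labels)), key=means.__getitem__)]


# Hypothetical segment-level outputs for one sentence with three segments.
labels = ["angry", "happy", "neutral"]
seg_probs = [
    [0.40, 0.35, 0.25],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
]

print(time_based_segments(n_frames=10, seg_len=4, hop=3))
print(majority_vote(seg_probs, labels))
print(average_probabilities(seg_probs, labels))
```

In this made-up example the two rules disagree: majority vote counts two weak "angry" segments against one confident "happy" segment, whereas averaging the probabilities lets the confident segment dominate. This is one reason it can be worth comparing several combination methods.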