This paper proposes an efficient feature vector classification for Speech Emotion Recognition (SER) in service robots. Because service robots interact with diverse users in various emotional states, two issues must be addressed: the acoustic similarity between certain emotions, and the variability of speaker characteristics caused by different user speaking styles. Each of these issues can cause substantial overlap between emotion models in feature vector space, which decreases SER accuracy. The proposed classification is designed to reduce the effects of such overlaps. The conventional feature vector classification used in speaker identification categorizes feature vectors as either overlapped or non-overlapped. Because that method discards all overlapped vectors during model reconstruction, it struggles to build robust models when the number of overlapped vectors increases significantly, as it does in emotion recognition. The method proposed here classifies overlapped vectors more finely: it selects the discriminative vectors among the overlapped ones and adds them back during model reconstruction. In SER experiments on an emotional speech corpus, the proposed classification outperformed conventional methods and approached human-level performance. In particular, it achieved commercially applicable performance for two-class (negative vs. non-negative) emotion recognition.
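The abstract does not define how "overlapped" or "discriminative" vectors are identified, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes one diagonal Gaussian per emotion as a stand-in for the emotion models, measures overlap with a hypothetical log-likelihood margin (own model score minus the best competing model's score), uses a hypothetical threshold `delta`, and treats an overlapped vector as "discriminative" if its own model still scores it highest. All function and parameter names (`fit_gaussians`, `margin`, `select_vectors`, `delta`, `keep_discriminative`) are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical sketch: one diagonal Gaussian per emotion stands in for the
# emotion models; the abstract does not specify the model type.


def fit_gaussians(vectors, labels, var_floor=1e-6):
    """Fit a diagonal Gaussian (per-dimension mean and variance) for each emotion label."""
    groups = defaultdict(list)
    for x, y in zip(vectors, labels):
        groups[y].append(x)
    models = {}
    for y, xs in groups.items():
        dim = len(xs[0])
        mean = [sum(x[d] for x in xs) / len(xs) for d in range(dim)]
        var = [max(sum((x[d] - mean[d]) ** 2 for x in xs) / len(xs), var_floor)
               for d in range(dim)]
        models[y] = (mean, var)
    return models


def log_likelihood(x, model):
    """Log-density of vector x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def margin(x, label, models):
    """Assumed overlap measure: own-model score minus the best rival model's score."""
    own = log_likelihood(x, models[label])
    rival = max(log_likelihood(x, m) for c, m in models.items() if c != label)
    return own - rival


def select_vectors(vectors, labels, models, delta, keep_discriminative=True):
    """Return the (vectors, labels) kept for model reconstruction.

    margin >= delta     -> non-overlapped: always kept
    0 < margin < delta  -> overlapped but discriminative (assumed criterion):
                           kept only when keep_discriminative is True
    margin <= 0         -> overlapped and confusable: discarded
    keep_discriminative=False mimics the conventional scheme that drops
    every overlapped vector.
    """
    kept_x, kept_y = [], []
    for x, y in zip(vectors, labels):
        m = margin(x, y, models)
        if m >= delta or (keep_discriminative and m > 0):
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y
```

To use it, fit initial models with `fit_gaussians`, call `select_vectors` with `keep_discriminative=False` for the conventional baseline or `True` for the proposed variant, then refit the emotion models on the kept vectors. The only difference between the two settings is whether overlapped vectors that are still scored correctly survive reconstruction, which matters most when overlap is widespread.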