This paper presents a system that automatically labels Tones and Break Indices (ToBI) events. The detection (binary classification) of prosodic events has received significantly more attention from researchers than their classification, because classification is intrinsically more difficult. We focus on the classification problem, identifying eight types of pitch accent tones, nine types of boundary tones and five types of break indices. The complex multi-class classification problem is divided into several simpler problems by means of pairwise coupling. We propose combining two-class classifiers to perform the multi-class classification because two-class problems can be solved with high accuracy. Furthermore, the complementarity between artificial neural network and decision tree classifiers is exploited to improve the final system by combining their outputs with a fusion method. This proposal, together with appropriate feature extraction that includes features such as the Tilt and Bézier parameters, allows us to achieve total classification accuracies of 70.8% for pitch accents, 84.2% for boundary tones and 74.6% for break indices on the Boston University Radio News Corpus. The analysis of the misclassified samples shows that the types of mistakes the system makes do not differ significantly from the common confusions observed in manual ToBI inter-transcriber tests.
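The abstract does not specify which coupling rule or which fusion method is used. As a minimal sketch, assuming the Hastie-Tibshirani pairwise-coupling iteration with all pair weights equal to 1, and using a plain weighted average as a stand-in for the fusion step, the one-vs-one probabilities produced by the two-class classifiers could be combined into a single posterior over the k classes as shown here. The function names, the fusion weight `w` and the clipping constant `eps` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_pairwise_matrix(pair_probs, k, eps=1e-6):
    """Turn one-vs-one outputs into a k x k matrix r with r[j][i] = 1 - r[i][j].

    pair_probs maps (i, j) with i < j to a two-class estimate of
    P(class i | class i or class j). Values are clipped to [eps, 1 - eps]
    (hypothetical safeguard) so no class probability collapses to zero.
    """
    r = [[0.5] * k for _ in range(k)]  # diagonal entries are never used
    for i, j in combinations(range(k), 2):
        rij = min(max(pair_probs[(i, j)], eps), 1.0 - eps)
        r[i][j] = rij
        r[j][i] = 1.0 - rij
    return r


def pairwise_coupling(r, max_iter=1000, tol=1e-10):
    """Hastie-Tibshirani coupling, assuming all pair weights n_ij = 1.

    Iteratively rescales p so that the implied pairwise ratios
    p_i / (p_i + p_j) move toward the classifier outputs r[i][j].
    """
    k = len(r)
    p = [1.0 / k] * k  # uniform starting point
    for _ in range(max_iter):
        p_old = list(p)
        for i in range(k):
            num = sum(r[i][j] for j in range(k) if j != i)
            den = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= num / den
            total = sum(p)
            p = [x / total for x in p]  # keep p a probability vector
        if max(abs(a - b) for a, b in zip(p, p_old)) < tol:
            break
    return p


def fuse(p_ann, p_tree, w=0.5):
    """Hypothetical fusion rule: weighted average of two posterior vectors."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_ann, p_tree)]


def predict(p):
    """Index of the most probable class."""
    return max(range(len(p)), key=lambda i: p[i])


if __name__ == "__main__":
    # Pairwise outputs that are exactly consistent with p = [0.5, 0.3, 0.2].
    pair_probs = {(0, 1): 0.5 / 0.8, (0, 2): 0.5 / 0.7, (1, 2): 0.3 / 0.5}
    p = pairwise_coupling(build_pairwise_matrix(pair_probs, 3))
    print([round(x, 4) for x in p])
```

When the pairwise estimates are mutually consistent, as in the toy example, the iteration should recover the underlying class probabilities (here approximately 0.5, 0.3 and 0.2). In a setup like the one described, one such posterior could be computed from the neural-network pairwise classifiers and another from the decision-tree ones, and `fuse` would then merge them before `predict` picks the final pitch accent, boundary tone or break index label.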
IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 7
Date of Publication: Sept. 2012