Accent is the pattern of pronunciation and acoustic features in speech that can identify a person's linguistic, social or cultural background. It is an important source of inter-speaker variability and a particular problem for automatic speech recognition. Current approaches to identifying a speaker's accent may require specialised linguistic knowledge or analysis of particular speech contrasts, and often extensive pre-processing of large amounts of data. An accent classification system that uses time-based segments of Mel-Frequency Cepstral Coefficients (MFCCs) as features and Support Vector Machines (SVMs) as the classifier is studied on a small corpus of two accents of English. On one- to four-second audio samples drawn from three topics, accuracy in the binary classification task ranges from 75% to 97.5%, with very high recall and precision. When the content is mismatched, accuracy is at best 85%, and the classifier tends towards the majority class if the accent groups are significantly imbalanced.
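The pipeline described above (MFCC features computed over fixed-length time segments, then a binary SVM) can be sketched as follows. This is a minimal, NumPy-only illustration under stated assumptions, not the authors' implementation: the 16 kHz sample rate, 1-second segments, 13 coefficients, 26 mel filters, 25 ms / 10 ms framing, and summarising each segment by the per-coefficient mean and standard deviation are all assumed. The SVM here is a swapped-in linear SVM trained with Pegasos sub-gradient descent; the abstract does not state the kernel, and in practice a library SVM (for example scikit-learn's `SVC`) would normally be used. The "speakers" are hypothetical synthetic noisy tones, not accent data, and the helper names (`segment_features`, `train_linear_svm`, `fake_speaker`, etc.) are illustrative only.

```python
import numpy as np

SR = 16000  # assumed sample rate (Hz); the abstract does not state one


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale."""
    mels = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)  # rising edge
        fb[i - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)  # falling edge
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis (rows = cepstral coefficients)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    m = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    m[0] /= np.sqrt(2.0)
    return m


def mfcc(x, sr, n_mfcc=13, n_filters=26, n_fft=512, win=0.025, hop=0.010):
    """Frame the signal, take log mel energies, then a DCT -> (frames, n_mfcc)."""
    flen, fhop = int(round(win * sr)), int(round(hop * sr))
    n_frames = 1 + (len(x) - flen) // fhop
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return log_mel @ dct_matrix(n_mfcc, n_filters).T


def segment_features(x, sr, seg_seconds=1.0):
    """One vector per fixed-length segment: per-coefficient MFCC mean and std."""
    seg = int(seg_seconds * sr)
    rows = []
    for start in range(0, len(x) - seg + 1, seg):
        c = mfcc(x[start:start + seg], sr)
        rows.append(np.concatenate([c.mean(axis=0), c.std(axis=0)]))
    return np.array(rows)


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos sub-gradient descent on the hinge loss; labels y are -1/+1.
    A constant column is appended so the bias is learned as a weight."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    rng = np.random.default_rng(seed)
    w, t = np.zeros(Xb.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            hinge_active = y[i] * (Xb[i] @ w) < 1  # margin under the current w
            w *= 1.0 - eta * lam
            if hinge_active:
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    scores = np.hstack([X, np.ones((len(X), 1))]) @ w
    return np.where(scores >= 0, 1, -1)


# Hypothetical stand-in for two "accents": synthetic speakers whose recordings
# are noisy tones near 300 Hz (class -1) or 1200 Hz (class +1), 4 s each.
def fake_speaker(base_hz, rng, seconds=4.0, sr=SR):
    t = np.arange(int(seconds * sr)) / sr
    f = base_hz + rng.uniform(-20, 20)
    return np.sin(2 * np.pi * f * t) + 0.1 * rng.standard_normal(t.size)


rng = np.random.default_rng(42)
X, y, spk = [], [], []
for s in range(20):
    label = -1 if s % 2 == 0 else 1
    feats = segment_features(fake_speaker(300 if label < 0 else 1200, rng), SR)
    X.append(feats)
    y += [label] * len(feats)
    spk += [s] * len(feats)
X, y, spk = np.vstack(X), np.array(y), np.array(spk)

train = spk < 14  # hold out whole speakers for testing
mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
Xs = (X - mu) / sd  # standardise with training statistics only
w = train_linear_svm(Xs[train], y[train])
accuracy = np.mean(predict(w, Xs[~train]) == y[~train])
```

Holding out whole speakers, rather than individual segments, keeps segments from the same recording out of both the training and test sets, which would otherwise inflate accuracy. Replacing `train_linear_svm` with a kernel SVM from a library is a drop-in change at the last few lines.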