WSJCAMO: a British English speech corpus for large vocabulary continuous speech recognition (IEEE Conference Publication)
Abstract:

A significant new speech corpus of British English has been recorded at Cambridge University. Derived from the Wall Street Journal text corpus, WSJCAMO is one of the largest corpora of spoken British English currently in existence. It has been designed specifically for the construction and evaluation of speaker-independent speech recognition systems. The database contains recordings of 140 speakers, each speaking about 110 utterances. This paper describes the motivation for the corpus, the processes undertaken in its construction, and the utilities developed as support tools. All utterance transcriptions have been verified, and a phonetic dictionary has been developed to cover the training data and evaluation tasks. Two evaluation tasks have been defined, using the standard 5000-word bigram and 20000-word trigram language models. The paper concludes with comparative results on these tasks for British and American English.
Date of Conference: 09-12 May 1995
Date Added to IEEE Xplore: 06 August 2002
Print ISBN: 0-7803-2431-5
Print ISSN: 1520-6149
Conference Location: Detroit, MI, USA