We motivate and explain the DiSCoH project, which uses a publicly deployed spoken dialogue system for conference services to collect a richly annotated corpus of mixed-initiative human-machine spoken dialogues. System users can call a phone number and learn about a conference, including paper submission, the program, the venue, accommodation options and costs, etc. The collected corpus is (1) usable for training, evaluating, and comparing statistical models, (2) naturally spoken and task-oriented, (3) extensible and generalizable, (4) collected using state-of-the-art research and commercial technology, and (5) freely available to researchers. We explain the principles behind the dialogue context representations and reward signals collected by the system, as well as the overall system design, call types, and call flow. We also present results for the initial ASR models and spoken language understanding models. We expect the resulting corpora to be used in advanced dialogue research over the coming years.