Abstract:
High-quality corpora are essential for training dialogue generation models. We have developed a Chinese open-domain dialogue corpus, named Chinese-Script-Dialogue, collected from movie and television scripts. The corpus contains 888,967 dialogues, each consisting of 4.58 turns on average. We asked human annotators to evaluate the quality of the corpus, and the annotation results show that most of the dialogues are qualified. Based on this corpus, we designed a series of multi-turn dialogue systems, named Context-Related Dialogue Systems based on Transformer (CDST). Experimental results show that the CDSTs tend to generate more semantically related replies than the simple Transformer, i.e., they achieve higher BLEU scores.
Published in: 2019 Chinese Control Conference (CCC)
Date of Conference: 27-30 July 2019
Date Added to IEEE Xplore: 17 October 2019