Abstract:
Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such ...Show MoreMetadata
Abstract:
Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live communication and collaboration among developers all over the world. With numerous topics discussed in parallel, mining and analyzing the chat data of these platforms would offer researchers and tool makers opportunities to develop software tools and services such as automated virtual assistants, chat bots, chat summarization techniques, Q&A thesaurus, and more. In this paper, we propose a dataset called DISCO consisting of the one-year public DIScord chat COnversations of four software development communities. We have collected the chat data of the channels containing general programming Q&A discussions from the four Discord servers, applied a disentanglement technique [13] to extract conversations from the chat transcripts, and performed a manual validation of conversations on a random sample (500 conversations). Our dataset consists of 28, 712 conversations, 1,508,093 messages posted by 323, 562 users. As a case study on the dataset, we applied a topic modelling technique for extracting the top five general topics that are most discussed in each Discord channel.
Date of Conference: 23-24 May 2022
Date Added to IEEE Xplore: 21 June 2022
ISBN Information: