Curated Email-Based Code Reviews Datasets


Abstract:

Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research has focused on tool-based code reviews (e.g. Gerrit, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through email. However, because email data is unstructured, mining email-based code reviews can be challenging, which hinders researchers from studying the code review practices of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping each submitted patch with its revised versions and by grouping email aliases. Our datasets include a total of 4.2M patches in 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly study the email-based code review practices of large OSS projects without additional effort in data collection and curation.
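The two curation steps named in the abstract (grouping a patch with its revised versions, and grouping email aliases) can be illustrated with a minimal Python sketch. This is not the paper's tool suite: the function names `normalize_subject`, `group_patches`, and `group_aliases` are hypothetical, and the heuristics (matching on the subject line after stripping bracketed tags such as `[PATCH v2 1/3]`, and merging identities that share a display name or an address) are assumptions chosen only to make the idea concrete.

```python
import re
from collections import defaultdict

# Leading bracketed tags such as "[PATCH v2 1/3]" or "[RFC]".
_TAGS = re.compile(r"^\s*(\[[^\]]*\]\s*)+")


def normalize_subject(subject):
    """Drop leading bracketed tags, collapse whitespace, and lowercase."""
    stripped = _TAGS.sub("", subject)
    return " ".join(stripped.split()).lower()


def group_patches(patches):
    """Group (patch_id, subject) pairs whose normalized subjects match.

    Assumption: revisions of a patch keep the same subject apart from
    their bracketed tags. Real data would need more signals than this.
    """
    groups = defaultdict(list)
    for patch_id, subject in patches:
        groups[normalize_subject(subject)].append(patch_id)
    return dict(groups)


def group_aliases(identities):
    """Cluster email addresses that share a display name (union-find).

    Assumption: the same name or the same address implies one person.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for name, email in identities:
        name_node = ("name", name.strip().lower())
        email_node = ("email", email.strip().lower())
        parent[find(name_node)] = find(email_node)

    clusters = defaultdict(set)
    for node in list(parent):
        if node[0] == "email":
            clusters[find(node)].add(node[1])
    return sorted(sorted(cluster) for cluster in clusters.values())


example_patches = [
    (1, "[PATCH] net: fix leak"),
    (2, "[PATCH v2] net: fix leak"),
    (3, "[PATCH v3 1/2] net:  Fix leak"),
    (4, "[RFC] mm: add counter"),
]
example_identities = [
    ("Jane Doe", "jane@example.org"),
    ("Jane Doe", "jdoe@example.com"),
    ("J. Smith", "smith@example.net"),
]
patch_groups = group_patches(example_patches)
alias_groups = group_aliases(example_identities)
```

In this sketch, patches 1 to 3 fall into one group and the two "Jane Doe" addresses into one individual. Name-only matching would over-merge people who share a common name, so it should be read as an illustration of the curation idea rather than a usable identity-resolution method.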
Date of Conference: 15-16 April 2024
Date Added to IEEE Xplore: 18 June 2024
Conference Location: Lisbon, Portugal
