Traceability Transformed: Generating More Accurate Links with Pre-Trained BERT Models

Abstract:

Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed the use of deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been restricted by the availability of labeled data and by efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy that enables trace models to transfer knowledge from a closely related software engineering challenge, which has a rich dataset, and to produce trace links with much higher accuracy than has previously been achieved. We then apply the T-BERT framework to recover links between issues and commits in open-source projects. We comparatively evaluated the accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical information retrieval (IR) trace models. On the three evaluated real-world OSS projects, the best T-BERT model consistently outperformed the vector space model (VSM), with an average improvement of 60.31% measured using Mean Average Precision (MAP). An RNN model severely underperformed on these projects due to insufficient training data, while T-BERT overcame this problem by using pretrained language models and transfer learning.
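The results above are reported as Mean Average Precision (MAP) over ranked candidate trace links. As a rough illustration only, the sketch below assumes the standard information retrieval definition of MAP, which the abstract does not spell out: each query artifact (e.g. an issue) has a ranked list of candidate artifacts (e.g. commits), average precision (AP) is the mean of precision@k over the ranks k that hold a true link, and MAP is the mean AP over queries. The function names, the issue/commit identifiers, and the choice to skip queries with no true link are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP for one query: mean of precision@k over every rank k holding a true link."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    # Divide by all true links, so true links missing from the ranking count as misses.
    return precision_sum / len(relevant)


def mean_average_precision(rankings: Dict[str, List[str]],
                           truth: Dict[str, Set[str]]) -> float:
    """MAP over queries; queries with no true link are skipped (an assumption)."""
    queries = [q for q in rankings if truth.get(q)]
    if not queries:
        return 0.0
    return sum(average_precision(rankings[q], truth[q]) for q in queries) / len(queries)


# Hypothetical usage: candidate commits ranked by a trace model for two issues.
rankings = {"ISSUE-1": ["c3", "c1", "c7"], "ISSUE-2": ["c2", "c5"]}
truth = {"ISSUE-1": {"c1", "c7"}, "ISSUE-2": {"c2"}}
print(mean_average_precision(rankings, truth))
```

For example, if the true commits for an issue appear at ranks 2 and 3, its AP is (1/2 + 2/3) / 2 ≈ 0.583; an issue whose only true commit is ranked first has AP = 1.0.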

Published in: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Date of Conference: 22-30 May 2021

Date Added to IEEE Xplore: 07 May 2021

Print ISBN: 978-1-6654-0296-5

Print ISSN: 1558-1225

DOI: 10.1109/ICSE43902.2021.00040

Conference Location: Madrid, ES

I. Introduction

Software and systems traceability is the ability to create and maintain relations between software artifacts and to leverage the resulting network of links to support queries about the product and its development process. Traceability is deemed essential in safety-critical systems, where it is prescribed by certifying bodies such as the U.S. Federal Aviation Administration (FAA) and the U.S. Food and Drug Administration (FDA) [1]. When present, trace links support diverse software engineering activities such as impact analysis, compliance validation, and safety assurance. Unfortunately, in practice, the cost and effort of manually creating and maintaining trace links can be prohibitive; as a result, trace links are typically incomplete and inaccurate [2]. Consequently, developers often do not trust traceability data, and it remains greatly underutilized.
