Traceability Transformed: Generating More Accurate Links with Pre-Trained BERT Models | IEEE Conference Publication | IEEE Xplore

Traceability Transformed: Generating More Accurate Links with Pre-Trained BERT Models


Abstract:

Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been restricted by the limited availability of labeled data and by their efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy that enables trace models to transfer knowledge from a closely related software engineering challenge with a rich dataset, producing trace links with much higher accuracy than has previously been achieved. We then apply the T-BERT framework to recover links between issues and commits in open-source projects. We comparatively evaluated the accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical IR trace models. On the three evaluated real-world OSS projects, the best T-BERT model consistently outperformed the VSM model, with an average improvement of 60.31% in Mean Average Precision (MAP). The RNN trace model severely underperformed on these projects due to insufficient training data, whereas T-BERT overcame this problem by using pre-trained language models and transfer learning.
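The abstract reports T-BERT's gains over the VSM baseline in terms of MAP. As a rough illustration of what that comparison measures, the sketch below ranks commits for each issue with a plain TF-IDF vector space model and then scores the rankings with MAP. It is an assumption-laden sketch, not the paper's evaluation code: the tokenizer, the smoothed IDF formula, the toy issue and commit texts, and every function name are hypothetical, and the paper's VSM preprocessing and exact MAP variant may differ.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters. No stemming,
    # stop-word removal, or camelCase splitting is applied (a simplification).
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1 keeps every weight positive.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vsm_rank(issues: dict[str, str], commits: dict[str, str]) -> dict[str, list[str]]:
    # Fit IDF over issues and commits together, then rank every commit for each issue.
    issue_ids, commit_ids = list(issues), list(commits)
    vecs = tfidf_vectors([tokenize(issues[i]) for i in issue_ids]
                         + [tokenize(commits[c]) for c in commit_ids])
    issue_vecs = dict(zip(issue_ids, vecs[:len(issue_ids)]))
    commit_vecs = dict(zip(commit_ids, vecs[len(issue_ids):]))
    rankings = {}
    for iid in issue_ids:
        # Sort commits by descending similarity; ties keep their original order.
        rankings[iid] = sorted(
            commit_ids,
            key=lambda cid: cosine(issue_vecs[iid], commit_vecs[cid]),
            reverse=True,
        )
    return rankings


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    # Mean of precision@k taken at the rank k of each true link.
    hits, total = 0, 0.0
    for k, cid in enumerate(ranked, start=1):
        if cid in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: dict[str, list[str]], truth: dict[str, set[str]]) -> float:
    # Average AP over queries (issues) that have at least one true link.
    queries = [q for q in rankings if truth.get(q)]
    if not queries:
        return 0.0
    return sum(average_precision(rankings[q], truth[q]) for q in queries) / len(queries)


if __name__ == "__main__":
    # Hypothetical toy data; not taken from the paper's datasets.
    issues = {"I1": "Login button crashes", "I2": "Export report to PDF"}
    commits = {"C1": "Fix crash when login button is clicked", "C2": "Add PDF export for report"}
    truth = {"I1": {"C1"}, "I2": {"C2"}}
    print(f"MAP = {mean_average_precision(vsm_rank(issues, commits), truth):.2f}")  # MAP = 1.00
```

Because this VSM scores only exact shared terms, "crashes" and "crash" do not match in the toy data; here the issues still rank their true commits first through other overlapping words such as "login" and "button".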
Date of Conference: 22-30 May 2021
Date Added to IEEE Xplore: 07 May 2021
Print ISBN: 978-1-6654-0296-5
Print ISSN: 1558-1225
Conference Location: Madrid, ES


I. Introduction

Software and systems traceability is the ability to create and maintain relations between software artifacts and to leverage the resulting network of links to support queries about the product and its development process. Traceability is deemed essential in safety-critical systems, where it is prescribed by certifying bodies such as the US Federal Aviation Administration (FAA) and the US Food and Drug Administration (FDA) [1]. When present, trace links support diverse software engineering activities such as impact analysis, compliance validation, and safety assurance. Unfortunately, in practice, the cost and effort of manually creating and maintaining trace links can be prohibitive, and therefore trace links are typically incomplete and inaccurate [2]. As a result, developers often distrust traceability data, and it is greatly underutilized.

