NMT vs MLM: Which is the Best Paradigm for APR? | IEEE Conference Publication | IEEE Xplore


Abstract:

Automated Program Repair (APR) has garnered significant attention in recent years, especially when combined with the latest advances in deep learning, which further enhance its automation and repair efficiency. Traditional learning-based APR typically employs Neural Machine Translation (NMT), treating repair as a translation from defective code to repaired code, known as the NMT paradigm. With the emergence of large pre-trained models, a newer approach treats repair as a fill-in-the-blank task: the defective code is masked, and a Masked Language Model (MLM) predicts the masked positions to produce the repaired code, referred to as the MLM paradigm. However, the applicability of this newer learning paradigm to APR has not been widely explored, and how its repair effectiveness differs from that of the NMT paradigm remains unclear. This paper empirically investigates the performance differences between the MLM and NMT paradigms in APR. Although both paradigms have drawn attention in the APR domain, their methodologies differ, necessitating a comprehensive evaluation and comparison in the same experimental environment, covering both Natural Language Models (NLM) and Code Language Models (CLM). The results reveal that with NLMs, the NMT paradigm excels, achieving higher repair accuracy: it leverages large parallel corpora for supervised training and focuses on the machine-translation task, which facilitates learning correspondences and semantic representations between languages. However, with CLMs, which must handle more complex real-world programming languages, the repair effectiveness of the NMT paradigm lags behind that of the MLM paradigm. NMT struggles to learn the syntax and semantic relationships of real-world programming languages, while the MLM paradigm comprehends code syntax and structure more effectively. It accurately identifies variable scopes, function call relationships, and dependencies between code blocks, allowing th...
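The contrast between the two paradigms can be illustrated with a toy sketch. This is not the paper's method: the lookup tables below are hypothetical stand-ins for trained networks, and the off-by-one bug is an invented example. The point is only the shape of each task: NMT generates the whole repaired sequence from the defective one, while MLM masks the suspicious token and predicts only the fill.

```python
# Toy contrast of the two learning-based APR paradigms.
# Both "models" are stand-in lookup tables, not trained networks.

def nmt_repair(buggy_line: str) -> str:
    """NMT paradigm: translate the entire defective line into a fixed line."""
    # A seq2seq model would generate the full repaired sequence token by token.
    translation_model = {
        "for i in range(len(xs) + 1):": "for i in range(len(xs)):",
    }
    return translation_model.get(buggy_line, buggy_line)

def mlm_repair(buggy_line: str, suspicious_token: str) -> str:
    """MLM paradigm: mask the suspicious token, then predict the masked slot."""
    masked = buggy_line.replace(suspicious_token, "<mask>")
    # A masked language model would rank candidate tokens for <mask>;
    # here the "prediction" is hard-coded for the example.
    fill_model = {"<mask>": "len(xs)"}
    return masked.replace("<mask>", fill_model["<mask>"])

buggy = "for i in range(len(xs) + 1):"  # hypothetical off-by-one defect
print(nmt_repair(buggy))                     # for i in range(len(xs)):
print(mlm_repair(buggy, "len(xs) + 1"))      # for i in range(len(xs)):
```

Note the difference in what each model must produce: the NMT sketch regenerates the whole line (and so must also reproduce the unchanged context correctly), while the MLM sketch leaves the context intact and predicts only the masked span, which mirrors the abstract's claim that mask-and-fill focuses the model on the defective position.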
Date of Conference: 30 June 2024 - 05 July 2024
Date Added to IEEE Xplore: 08 August 2024
Conference Location: Yokohama, Japan

