Improving Bug Localization With Effective Contrastive Learning Representation

Automated localization of buggy files can improve developers' software maintenance efficiency and thus the quality of software products. State-of-the-art approaches for bug localization are based on neural networks, e.g., RNNs or CNNs, and can learn semantic features from a given bug report. However, these simple neural architectures struggle to learn deep contextual features from bug reports, which hurts the semantic mapping between bug reports and their corresponding buggy files. To resolve this problem, in this paper we propose a bug localization approach that combines pre-trained language models and contrastive learning, namely CoLoc. Specifically, CoLoc is first pre-trained on a large-scale bug report corpus in an unsupervised way, learning the deep contextual feature of each token in a bug report according to its context. Afterward, CoLoc is further pre-trained with a contrastive learning objective to learn contrastive representations of both bug reports and buggy files. Contrastive learning helps CoLoc learn the semantic differences between different bug reports and buggy files. To evaluate the effectiveness of CoLoc, we choose six baseline approaches and compare their performance on a public dataset. The experimental results show that CoLoc outperforms all baseline approaches by up to 76.00% in terms of MRR, achieving new results for bug localization.


I. INTRODUCTION
Bug localization is an important process in the software maintenance cycle [1]. Before fixing a newly reported bug, developers first need to locate where the bug appears. They thus must comprehensively understand the bug report and retrieve many source code files. According to prior research [2], more and more bug reports are being submitted to bug tracking systems, e.g., more than 10K bug reports were submitted to the Mozilla project (https://bugzilla.mozilla.org/home) within two months, which brings a huge time cost to developers when locating newly produced bugs. This reduces the efficiency of software maintenance and negatively affects the quality of software products.
To help developers improve software maintenance efficiency, researchers have proposed automated tools that replace the manual bug localization process. By using these tools, developers can quickly locate the buggy file and reduce the time overhead of retrieving source code files. Currently, deep learning-based bug localization approaches have shown great success due to the strong feature learning ability of neural networks. For example, Lam et al. proposed DNNLOC, a neural model that combines a deep neural network, rVSM, and the historical fixing information of the software project, and it outperforms the state-of-the-art information retrieval approach. To further improve the semantic learning ability, Huo et al. [3], [4] proposed a convolutional neural network (CNN) architecture to learn unified semantic features from both natural language and source code. Xiao et al. [5] proposed an enhanced CNN-based approach, namely DeepLocator, which can leverage both the textual and structural information of source code. Additionally, Xiao et al. [6] further improved DeepLocator by introducing extra embedding techniques (i.e., Sent2Vec [7]) and proposed DeepLoc. Liang et al. proposed CAST, which introduces customized abstract syntax trees of source code as an extra input, allowing it to learn more semantic information from source code. The aforementioned approaches perform well for bug localization; however, there is still room for performance improvement. The reason is that existing neural approaches for bug localization are built on simple network architectures (e.g., one or two layers of CNN), which makes it hard to learn the deep and global contextual information of bug reports [8]. Moreover, as shown in Fig.
1, we found that most bug reports are composed of both natural language and source code. A simple and shallow architecture can hardly learn the deep semantic interaction between natural language and source code, which may limit the semantic mapping between bug reports and buggy files. To resolve the above problems, we propose CoLoc for more effective buggy file localization. CoLoc integrates pre-trained language models [9], [10] and contrastive learning [11]. Specifically, we build the neural architecture of CoLoc by stacking deep self-attention networks [12], [13]. Then, we pre-train CoLoc on a large-scale, unlabeled bug report corpus with the masked language model objective [9], an unsupervised pre-training method that enables CoLoc to learn the deep and global contextual information of bug reports, as well as the semantic interaction between source code and natural language. After this pre-training, we further pre-train CoLoc with a carefully designed contrastive learning objective so that it learns the semantic difference between unpaired bug reports and buggy files. Specifically, our contrastive learning objective makes CoLoc find the semantically matched buggy file for a given bug report from a batch of buggy files. Through such a learning process, CoLoc can effectively build the semantic mapping between bug reports and buggy files.
To evaluate the effectiveness of CoLoc, we select six baseline approaches and conduct a series of comparison experiments on a public dataset released by Xiao et al. [6], which contains about 18,500 bug reports from the AspectJ, Eclipse, JDT, SWT, and Tomcat projects. Following the prior work [8], we split it into training, validation, and test sets. Specifically, we first pre-train CoLoc on a large-scale bug report corpus released by Fang et al. [2]. Afterward, we further pre-train CoLoc on the training set with our designed contrastive learning objective. Finally, we fine-tune CoLoc on the training set for bug localization and evaluate it on the validation and test sets. The experimental results show that CoLoc outperforms all baseline approaches by up to 15.34% (8 points absolute improvement) in terms of MAP (mean average precision). Additionally, our experimental results show that CoLoc is more effective than existing pre-trained language models, like BERT [9] and CodeBERT [14].
To sum up, we make the following contributions: • We propose a novel approach for automated bug localization, namely CoLoc. Our approach is built by combining pre-trained models and contrastive learning.
• We design a new contrastive learning objective to pretrain CoLoc, making it learn the semantic difference between unpaired bug reports and buggy files.
• We conduct extensive experiments to evaluate the effectiveness of CoLoc, and experimental results show that CoLoc outperforms all baseline approaches.
The remainder of this paper is organized as follows. Section II introduces the background knowledge and Section III presents the pipeline of our proposed approach. Section IV and Section V describe the experimental setups and results. Section VI introduces the threats to validity and Section VII discusses the related work. Finally, we conclude this paper and outline future work in Section VIII.

II. BACKGROUND

A. BUG REPORTS
Bug reports describe newly reported bugs, are submitted by developers or users, and are stored in bug tracking systems (e.g., Bugzilla, LogRocket). By carefully reading a bug report, developers can quickly understand where the bug is and what is wrong. Fig. 2 gives a bug report from the Eclipse project collected from the Bugzilla platform. From the figure, we can observe that a bug report is composed of multiple elements, such as Description, Comment, Summary, Status, Product, Component, Version, and Assignee. Each element has its own meaning. For example, the Description element describes the reported bug in detail, such as where the bug was generated. The Summary element laconically describes what the submitted bug is. Note that we only use the textual information (i.e., the Description and Summary elements) to represent the whole bug report. Although the Comment element also contains rich textual information, it is written by other developers or users and is not always related to the submitted bug. For example, the Comment element in Fig. 2 is simply log information.
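As a small illustration (the dictionary keys and content below are hypothetical examples, not the actual Bugzilla schema), representing a bug report by only its Summary and Description elements could look like:

```python
# Hypothetical sketch: a bug report is represented by its Summary and
# Description elements only; Comments are ignored as often unrelated.
def bug_report_text(report: dict) -> str:
    """Concatenate the Summary and Description elements into one string."""
    summary = report.get("Summary", "")
    description = report.get("Description", "")
    return (summary + " " + description).strip()

example = {
    "Summary": "NPE in editor on save",
    "Description": "Saving a file throws a NullPointerException.",
    "Comment": "Attached the stack trace log.",  # intentionally ignored
}
print(bug_report_text(example))
```

Re-using only these two fields keeps the input focused on text written by the original reporter.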

B. PRE-TRAINED LANGUAGE MODEL
Pre-trained language models [9], [15] were first proposed in the natural language processing (NLP) community; they learn general contextual representations of words through unsupervised pre-training on a large-scale corpus such as Wikipedia. The pre-trained language model can then be applied to different NLP tasks, such as text classification [16], machine translation [17], [18], and text summarization [19], through supervised fine-tuning [9], [15]. Massive experimental results in the NLP community show that pre-trained language models have achieved state-of-the-art results in all kinds of tasks [9], leading the research trend.
As pre-trained language models have become more popular, many domains have begun to specialize them [20], [21], [22]. In detail, researchers use a domain-specific corpus to further pre-train the language model, making it serve a specific domain. The main reason is that a language model pre-trained on a domain-specific corpus can learn more precise contextual representations of domain-specific data, which further improves its performance on domain-specific tasks. In the biomedical domain, for example, Lee et al. [21] proposed BioBERT, which is pre-trained on a large-scale biomedical corpus and achieves state-of-the-art results in various biomedical text mining tasks such as biomedical question answering [23] and biomedical relation extraction [24]. In the scientific domain, Beltagy et al. released SCIBERT, which is pre-trained on a large multi-domain corpus of scientific publications and brings new results to a series of scientific tasks like sequence tagging [25] and dependency parsing [26]. COVID-Twitter-BERT [22] is similar to SCIBERT, but it is pre-trained on a large corpus of Twitter messages on the topic of COVID-19.

C. CONTRASTIVE LEARNING
The core concept of contrastive learning [27], [28] is to pull semantically similar samples close together and push apart samples that are not semantically similar. Following the prior work [11], the training objective of contrastive learning is a cross-entropy objective with in-batch negatives [29]. Specifically, given a set of semantically similar paired examples $\{(x_i, x_i^+)\}$, we assume that $r_i$ and $r_i^+$ are the representations of $x_i$ and $x_i^+$; then the training objective of contrastive learning for $(x_i, x_i^+)$ with batch size $N$ can be calculated as follows:

$$\ell_i = -\log \frac{e^{\mathrm{sim}(r_i, r_i^+)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(r_i, r_j^+)/\tau}} \quad (1)$$

where $\mathrm{sim}(r_i, r_j^+) = \frac{r_i^{\top} r_j^+}{\|r_i\| \cdot \|r_j^+\|}$ and $\tau$ is a temperature hyperparameter. From the above equation, we can see that one critical factor in using contrastive learning is how to collect the $(r_i, r_i^+)$ pairs.
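As a concrete illustration, the in-batch contrastive objective can be sketched in NumPy. This is a minimal sketch with toy data; the function name, sizes, and temperature value are our own choices, not from the paper:

```python
import numpy as np

def info_nce_loss(r, r_pos, tau=0.05):
    """In-batch contrastive loss. r, r_pos: (N, d) paired representations.
    Each r[i] treats r_pos[i] as its positive and the other N-1 rows
    of r_pos as in-batch negatives."""
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    r_pos = r_pos / np.linalg.norm(r_pos, axis=1, keepdims=True)
    sim = r @ r_pos.T / tau  # (N, N) temperature-scaled cosine similarities
    # cross-entropy with the diagonal (matched pairs) as the correct "class"
    log_softmax = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
r = rng.normal(size=(4, 8))
# identical pairs: every positive is its own best match, so the loss is low
print(info_nce_loss(r, r))
```

Minimizing this loss pulls each matched pair together while pushing the representation away from every other item in the batch.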

III. APPROACH
In this section, we introduce the pipeline of using CoLoc for bug localization, which is illustrated in Fig. 3: the model architecture, pre-training CoLoc with the masked language model objective and the contrastive learning objective, fine-tuning CoLoc for bug localization, and evaluating CoLoc.

A. MODEL ARCHITECTURE
Following prior pre-trained language models [9], [14], we construct CoLoc by stacking Transformer encoder layers [12], each of which contains a multi-head self-attention network and a fully connected feed-forward network. For a bug report sequence $br = \{t_1, t_2, \ldots, t_n\}$, where $n$ is the length of the bug report, we concatenate two special tokens to its beginning and end, namely $br = \{[CLS], t_1, t_2, \ldots, t_n, [EOS]\}$. Before feeding $br$ to the Transformer encoder, we first apply word embedding [7] and position embedding [30] to it:

$$E_{br} = E(br) + P(br) \quad (2)$$

where $E \in \mathbb{R}^{d_w \times |V|}$ and $P \in \mathbb{R}^{d_w \times |L|}$ are two lookup matrices, and $d_w$, $|V|$, and $|L|$ are the embedding dimension, the vocabulary size, and the max length of bug reports, respectively. Afterward, we feed $E_{br}$ into the stacked Transformer encoder and get the corresponding output:

$$H_{br} = \mathrm{TransformerEncoder}(E_{br}) \quad (3)$$

Note that we do not describe the Transformer encoder in detail; readers can refer to the annotated Transformer tutorial (https://nlp.seas.harvard.edu/2018/04/03/attention.html) to learn more about it.
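The word-plus-position embedding step can be sketched as a pair of table lookups followed by a sum. This is a toy NumPy sketch; all sizes and token ids below are illustrative, not CoLoc's actual configuration:

```python
import numpy as np

# Illustrative sizes: embedding dim d_w, vocabulary size |V|, max length |L|
d_w, vocab_size, max_len = 16, 100, 32
rng = np.random.default_rng(1)
E = rng.normal(size=(vocab_size, d_w))  # word-embedding lookup table
P = rng.normal(size=(max_len, d_w))     # position-embedding lookup table

def embed(token_ids):
    """Sum word and position embeddings for one token-id sequence
    (the [CLS]/[EOS] tokens are assumed to already be in token_ids)."""
    positions = np.arange(len(token_ids))
    return E[token_ids] + P[positions]  # shape (n, d_w)

seq = [2, 17, 5, 3]  # e.g. [CLS], t1, t2, [EOS]; ids are made up
print(embed(seq).shape)  # (4, 16)
```

The resulting matrix is what would be fed into the stacked encoder layers.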

B. PRE-TRAINING CoLoc WITH MASKED LANGUAGE MODEL
Pre-training stage 1 in Fig. 3 gives a simple pipeline of how to train CoLoc with the masked language model objective.
We first need to build masked bug report sequences from the original bug reports. Specifically, we randomly select some tokens in each bug report sequence and replace them with a special token [MASK]. Following the prior work [9], we select 15% of the tokens in each bug report sequence and mask them in the following three ways:
• With 80% probability, replacing the token with the [MASK] token;
• With 10% probability, replacing it with a random token;
• With the remaining 10% probability, leaving it unchanged.
To let CoLoc fully learn the contextual representation of bug reports, we perform a dynamic masking operation [10] when masking each bug report. Dynamic masking masks different tokens of the same bug report in different pre-training iterations, by which we can manually increase the scale of the pre-training corpus. As a result, CoLoc can deeply learn the contextual representation of every bug report by predicting different masked tokens according to their context. While CoLoc learns the contextual representation of masked bug report sequences, we let it predict the masked tokens and train it by maximizing the following log-likelihood, which is also called the masked language model objective:

$$\mathcal{L}_{MLM}(\theta) = \sum_{t_i \in M} \log p(t_i \mid \hat{S}; \theta) \quad (4)$$

where $\theta$ denotes the learned parameters of CoLoc, $M$ represents the masked tokens, the probability $p(\cdot)$ is modeled by CoLoc, $t_i$ is a masked token, and $\hat{S}$ is the remaining tokens. Table 1 gives the details of pre-training CoLoc. Following the prior work [9], [10], the hyperparameter settings of CoLoc are as follows. We optimize CoLoc with an AdamW optimizer [31] with a learning rate of 5e-5, $\beta_1 = 0.9$, $\beta_2 = 0.999$, L2 weight decay of 0.01, and a linear decay of the learning rate. We set the batch size and the max length of the bug report sequence to 16 and 512, respectively.
We pre-train CoLoc for 40 epochs and initialize it with the weights of CodeBERT, by which CoLoc can effectively build the semantic interaction between natural language and source code in bug reports.
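The 80/10/10 masking rule above can be sketched as follows. This is a minimal illustration with a toy vocabulary; re-running it with different random seeds mimics dynamic masking, where the same sequence gets different masks in different iterations:

```python
import random

MASK = "[MASK]"

def mask_sequence(tokens, vocab, rate=0.15, rng=random.Random()):
    """Select ~`rate` of the tokens as prediction targets, then apply the
    80% [MASK] / 10% random token / 10% unchanged replacement rule."""
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() >= rate:
            continue
        targets[i] = tok                   # model must predict the original
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # else: 10% of the time, leave the token unchanged
    return masked, targets

tokens = "the editor throws a null pointer exception on save".split()
masked, targets = mask_sequence(tokens, vocab=tokens, rng=random.Random(42))
print(masked, targets)
```

Calling `mask_sequence` afresh per epoch (rather than masking once up front) is what the dynamic masking of [10] amounts to.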

C. PRE-TRAINING CoLoc WITH CONTRASTIVE LEARNING OBJECTIVE
Pre-training stage 2 in Fig. 3 illustrates how to train CoLoc with the contrastive learning objective. As introduced in Section II-C, the key step in contrastive learning is how to build positive and negative samples for a given bug report. For the bug localization task, our goal is to build the semantic alignment between bug reports and their corresponding buggy files. In other words, each buggy file is a positive sample for its corresponding bug report. Additionally, since we train our model in mini-batches, bug reports in a batch are negative samples of each other. Therefore, we pair every bug report with its corresponding buggy file and denote the pair as $(br_i, bf_i)$. For each pair $(br_i, bf_i)$ in a batch of size $N$, we can pair $br_i$ with the other $N-1$ buggy files and regard these $N-1$ pairs as negative instances of $(br_i, bf_i)$. According to Eq. 1, we can calculate the contrastive learning objective by minimizing the following loss function:

$$\mathcal{L}_{CL} = -\sum_{i=1}^{N} \log \frac{e^{\mathrm{sim}(r_i, f_i)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(r_i, f_j)/\tau}} \quad (5)$$

where $\mathrm{sim}(\cdot)$ represents the cosine similarity, $r_i$ and $f_i$ are the contextual representations of bug reports and buggy files, respectively, which are generated by the pre-trained CoLoc, and $\tau$ is a temperature factor used to avoid vanishing gradients. Through contrastive pre-training, CoLoc can fully learn the semantic differences between different bug reports as well as between buggy files.

1) DROPOUT-BASED DATA AUGMENTATION
To further boost CoLoc's ability to learn semantic differences, we propose using dropout noise [32] to double the negative instances for a batch of pairs. As dropout works by randomly dropping out some neurons, we obtain two similar but different representations if we feed a sample to CoLoc twice. Using this method, we can feed the buggy files in a batch to CoLoc twice and obtain two batches of buggy file representations, producing double the negative instances for each bug report. In this condition, our contrastive learning objective becomes:

$$\mathcal{L}_{CL}' = -\sum_{i=1}^{N} \log \frac{e^{\mathrm{sim}(r_i, f_i)/\tau}}{\sum_{j=1}^{N} \left( e^{\mathrm{sim}(r_i, f_j)/\tau} + e^{\mathrm{sim}(r_i, f_j^+)/\tau} \right)} \quad (6)$$

where $f_j^+$ are the contextual representations of buggy files produced by the dropout data augmentation. In addition to boosting CoLoc's ability to learn semantic differences, more negative instances help CoLoc further build the semantic mapping between bug reports and their corresponding buggy files, improving their semantic alignment.
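A minimal NumPy sketch of this idea, with a plain dropout function standing in for CoLoc's internal dropout layers (shapes, rates, and seeds are illustrative):

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: randomly zero neurons, rescale the survivors."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(7)
h = rng.normal(size=(4, 16))           # a batch of buggy-file representations
f1 = dropout(h, rate=0.1, rng=rng)     # first pass through the "encoder"
f2 = dropout(h, rate=0.1, rng=rng)     # second pass: same input, new mask
negatives = np.concatenate([f1, f2])   # 2N candidate files per bug report
print(negatives.shape)  # (8, 16)
```

Because the two dropout masks differ, `f1` and `f2` are similar but not identical, which is exactly what makes them useful as extra negatives.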

2) PRE-TRAINING DETAILS
Most hyperparameter settings used to pre-train CoLoc with the contrastive learning objective are kept consistent with those used for the masked language model pre-training. We mainly change the batch size, learning rate, and number of training epochs to 32, 3e-5, and 5, respectively. Before pre-training CoLoc with the contrastive learning objective, we initialize it with the weights of the CoLoc model pre-trained with the masked language model objective.

D. FINE-TUNING CoLoc FOR BUG LOCALIZATION
After completing all pre-training, we fine-tune the pre-trained CoLoc for bug localization. As shown in Fig. 3, we build a siamese CoLoc with shared parameters to perform automated bug localization. Specifically, for the pairs of bug reports and buggy files in a batch, we use the two CoLoc encoders to learn the contextual representations of bug reports and buggy files, respectively, which can be defined as follows:

$$R = \mathrm{CoLoc}(BR), \quad F = \mathrm{CoLoc}(BF) \quad (7)$$

where $R$ and $F$ are the contextual representations of bug reports and buggy files, $BR$ and $BF$ are the bug reports and buggy files in a batch, respectively, and the two CoLoc encoders form a siamese network with shared parameters. Then, we train the siamese CoLoc by minimizing the following loss function:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{R_i^{\top} F_i}}{\sum_{j=1}^{N} e^{R_i^{\top} F_j}} \quad (8)$$

where $N$ is the batch size. Through this loss function, we maximize the inner product of a bug report and its corresponding buggy file while minimizing the inner products of this bug report with the other buggy files in the batch.
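The siamese encoding and the inner-product loss can be sketched as follows. This is a NumPy toy in which a single shared weight matrix stands in for CoLoc's shared parameters; the data and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(12, 8))  # one weight matrix shared by both branches

def encode(X):
    """Stand-in for CoLoc: both siamese branches use the same parameters W."""
    return np.tanh(X @ W)

def matching_loss(R, F):
    """Cross-entropy over inner products; the diagonal holds the true pairs."""
    scores = R @ F.T  # (N, N) inner products between reports and files
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

BR = rng.normal(size=(4, 12))  # a batch of bug-report features
BF = rng.normal(size=(4, 12))  # the matching buggy-file features
R, F = encode(BR), encode(BF)  # siamese: the same encode() for both inputs
loss = matching_loss(R, F)
print(loss)
```

Minimizing `matching_loss` raises the diagonal scores (matched pairs) relative to each row's other entries, which is the behavior the loss function above describes.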

1) TRAINING DETAILS
In the training phase, we mainly change the batch size, learning rate, and number of training epochs to 64, 5e-6, and 10, respectively. We keep the other parameters unchanged.

E. EVALUATING CoLoc
After training CoLoc for bug localization, we evaluate its performance on the test set. Specifically, as illustrated in Fig. 3, we first feed all buggy files (in the whole dataset) to CoLoc and obtain their semantic vectors, building a buggy file base (BFB). For a new bug report in the test set, we feed it to CoLoc and generate its contextual representation. Next, we calculate the cosine similarity between the bug report and all buggy files in the BFB. Finally, we return the top-k most semantically similar buggy files.
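The retrieval step can be sketched as follows: pre-compute a vector per buggy file, then rank files for a query by cosine similarity. The vectors below are illustrative toy data, not real CoLoc outputs:

```python
import numpy as np

def top_k(report_vec, bfb, k=2):
    """Return indices of the k buggy files in `bfb` (one row per file)
    most cosine-similar to the bug-report vector."""
    bfb_n = bfb / np.linalg.norm(bfb, axis=1, keepdims=True)
    q = report_vec / np.linalg.norm(report_vec)
    sims = bfb_n @ q              # cosine similarity to every file
    return np.argsort(-sims)[:k]  # indices sorted by descending similarity

# Toy "buggy file base": three 2-D file vectors
bfb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])      # a new bug report's representation
print(top_k(query, bfb))  # [0 2] — file 0 is the closest match
```

In practice the BFB vectors would be computed once and cached, so each new bug report only costs one forward pass plus a similarity scan.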

IV. EXPERIMENTAL SETUPS
In this section, we introduce experimental setups, including research questions, dataset and baselines, evaluation metrics, and experimental environment.

A. RESEARCH QUESTIONS
Our work focuses on the following three research questions (RQs):
• RQ1: How effective is CoLoc when compared with state-of-the-art approaches?
• RQ2: How effective is CoLoc when compared with pre-trained language models?
• RQ3: How does the dropout rate affect the performance of CoLoc?
In RQ1, we mainly explore the effectiveness of CoLoc. By verifying its effectiveness, we can determine whether CoLoc learns deep contextual representations of bug reports and buggy files, as well as builds the semantic interaction between them. In RQ2, since CoLoc is designed on top of pre-trained language models, we explore whether CoLoc is more effective than existing pre-trained language models, such as BERT [9] and CodeBERT [14]. In RQ3, since we use dropout as a data augmentation method and build a new contrastive learning objective to pre-train CoLoc, we mainly explore the effect of different dropout rates on CoLoc.

B. DATASET AND BASELINES 1) DATASET
To pre-train CoLoc with the masked language model objective, we utilize the public dataset released by Fang et al. [2], who collected more than 270,000 bug reports from four projects on Bugzilla: Mozilla, Eclipse, Netbeans, and the GNU compiler collection (GCC). Table 2 gives the statistics of Fang et al.'s dataset. We conduct the remaining experiments on another public dataset released by Xiao et al. [6], which was collected for bug localization. Specifically, Xiao et al. collected bug reports and buggy files from five projects: AspectJ, Eclipse UI, JDT, SWT, and Tomcat. Table 3 gives the statistics of this dataset.

2) BASELINES IN RQ1
To verify the effectiveness of CoLoc, we select six baseline approaches: BugLocator [33], DNNLOC [34], DeepLocator [5], NP-CNN [3], CAST [35], and DeepLoc [6]. BugLocator is a traditional machine learning-based approach built on a revised vector space model. Based on BugLocator, DNNLOC introduces a deep neural network and considers historical bug-fixing information. DeepLocator utilizes a CNN to learn semantic information from bug reports and the ASTs of source code. NP-CNN considers both the lexical and structural information of programs to learn unified features from natural language and source code in bug reports. CAST proposes a customized AST to help the CNN learn more useful semantic information. DeepLoc proposes an enhanced CNN to improve the feature learning ability of the original CNN.

3) BASELINES IN RQ2
To further verify the effectiveness of CoLoc, we choose three famous pre-trained language models, i.e., BERT [9], RoBERTa [10], and CodeBERT [14]. All these pre-trained language models are designed by stacking the Transformer encoder layer.

C. EVALUATION METRICS
Following the prior studies, we utilize Accuracy@k, MRR, and MAP to measure the performance of all approaches.
• Accuracy@k measures the proportion of bug reports for which at least one buggy file is correctly located within the top-k results.
• MRR, namely the mean reciprocal rank, is the mean of the reciprocal ranks of the first matched buggy file for each bug report, which can be computed as follows:

$$MRR = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{crank_j} \quad (9)$$

where $N$ is the number of bug reports and $crank_j$ is the position of the first buggy file matched with the $j$th bug report. The higher the value of MRR, the more effective the approach.
• MAP, namely the mean average precision, is a commonly used evaluation metric in information retrieval. We can calculate it with the following equation:

$$MAP = \frac{1}{N} \sum_{i=1}^{N} AvgP(i) \quad (10)$$

where $N$ is the number of bug reports and $AvgP(i)$ can be computed as follows:

$$AvgP(i) = \frac{\sum_{j=1}^{M} \frac{C(j)}{j} \times bool(j)}{BF(i)} \quad (11)$$

where $M$ is the maximum position of a buggy file corresponding to the $i$th bug report, $C(j)$ denotes the number of buggy files correctly located in the top-$j$ results, $bool(j)$ indicates whether the file at rank $j$ is a correct file for the $i$th bug report, and $BF(i)$ is the number of buggy files corresponding to the $i$th bug report. Similar to MRR, the higher the value of MAP, the more effective the approach.
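Both metrics can be computed directly from ranked retrieval results, as in this sketch (the file ids and relevance sets below are illustrative toy data):

```python
# Each entry in `rankings` pairs the ranked list of file ids returned for
# one bug report with the set of its true buggy files.
def mrr(rankings):
    """Mean reciprocal rank of the first correctly located file."""
    total = 0.0
    for ranked, relevant in rankings:
        for pos, f in enumerate(ranked, start=1):
            if f in relevant:
                total += 1.0 / pos  # reciprocal rank of the first hit
                break
    return total / len(rankings)

def average_precision(ranked, relevant):
    """Precision accumulated at each relevant position, over |relevant|."""
    hits, score = 0, 0.0
    for pos, f in enumerate(ranked, start=1):
        if f in relevant:
            hits += 1
            score += hits / pos
    return score / len(relevant)

def map_score(rankings):
    """Mean of the per-report average precisions."""
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)

rankings = [
    (["a", "b", "c"], {"a"}),       # first hit at rank 1
    (["x", "y", "z"], {"y", "z"}),  # hits at ranks 2 and 3
]
print(mrr(rankings), map_score(rankings))  # 0.75, ~0.7917
```

For the toy data, MRR is (1 + 1/2)/2 = 0.75 and MAP is (1 + (1/2 + 2/3)/2)/2 ≈ 0.7917.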

D. EXPERIMENTAL ENVIRONMENT
We conduct all experiments on a deep learning server that contains two Intel Xeon 2.20GHz CPUs, 256GB memory, and two NVIDIA Tesla V100 GPUs with 32GB memory.

V. EVALUATION
A. ANSWER TO RQ1: EFFECTIVENESS COMPARISON BETWEEN CoLoc AND BASELINES
Table 4 gives the performance comparison of all approaches. We calculate Accuracy@k (k = 1, 5, 10), MAP, and MRR for each approach. From Table 4, we first find that CoLoc outperforms all baseline approaches in terms of all metrics on every project. BugLocator is the worst-performing model since it is built with a traditional machine learning approach, which cannot learn semantic information from bug reports and buggy files. The other baseline approaches all perform much better than BugLocator, which supports the effectiveness of neural networks. We can also find that CAST is the best-performing model among all baseline approaches, supporting the effectiveness of the structural information of source code in bug reports. The effectiveness of CAST also shows that CNNs can effectively model the semantic information of bug reports and buggy files. In comparison with the baseline approaches, since we perform effective pre-training for CoLoc before fine-tuning it on the bug localization dataset, CoLoc can learn the contextual information of bug reports and build a deep semantic interaction between bug reports and buggy files. Additionally, our contrastive learning objective enables CoLoc to further learn the semantic difference between bug reports and buggy files. As a result, CoLoc achieves new results and outperforms all baseline approaches. One advantage of CoLoc over CNNs is that the self-attention network can model the global contextual information of each token in the bug report sequence regardless of distance, whereas a CNN can only model local semantic information due to the limited size of its convolutional kernels. Table 5 contrasts the performance of CoLoc against three existing pre-trained language models, i.e., BERT, RoBERTa, and CodeBERT. Similar to RQ1, we calculate Accuracy@k (k = 1, 5, 10), MAP, and MRR for each approach.

B. ANSWER TO RQ2: EFFECTIVENESS COMPARISON BETWEEN CoLoc AND PRE-TRAINED LANGUAGE MODEL
From Table 5, we can note that on the AspectJ, Eclipse, and Tomcat projects, CoLoc outperforms all pre-trained language models in terms of all metrics. On the JDT and SWT projects, CoLoc outperforms all pre-trained language models in terms of almost all metrics. This supports the effectiveness of CoLoc and shows that pre-training CoLoc on a bug report corpus can effectively force it to learn domain-specific knowledge, which helps CoLoc perform precise matching between bug reports and their corresponding buggy files. We can also observe that all pre-trained language models outperform or perform close to the baseline approaches in RQ1, which shows that pre-trained language models can learn deep contextual representations of bug reports and buggy files. In addition, our contrastive learning objective helps CoLoc learn the semantic difference between different pairs of bug reports and buggy files, enabling it to recommend accurate buggy files for a given bug report.
C. ANSWER TO RQ3: THE EFFECT OF DROPOUT RATE ON CoLoc
Fig. 4 and Fig. 5 show the performance of CoLoc with different dropout settings on the AspectJ and Tomcat projects, respectively. Specifically, we set seven different dropout rates and evaluate CoLoc's performance. From the figures, we can observe that when the dropout rate is set to 0.1, CoLoc achieves the highest performance in terms of all metrics. As we continuously increase the dropout rate, CoLoc's performance keeps dropping. We also note that when we increase the dropout rate from 0.4 to 0.5, CoLoc's performance decreases sharply, which means a large dropout rate may seriously hurt the learning ability of CoLoc.

VI. THREATS TO THE VALIDITY
This paper mainly faces two threats to validity. One is internal validity: how to effectively set the hyperparameters of CoLoc. We mitigate this threat by following the prior studies [9], [10] to set most parameters, which have been verified to be optimal. For the other hyperparameter settings (i.e., the dropout rate), we perform comparison experiments to search for the optimal settings, as reported in RQ3.
Another threat is external validity: we only evaluate CoLoc on five projects. There are many projects in different bug tracking systems that also require automated bug localization, and we cannot confirm whether CoLoc can serve those projects. A mitigating factor is that CoLoc can serve other projects in bug tracking systems through transfer learning [39]. Specifically, we can directly fine-tune CoLoc on new projects for bug localization.

VII. RELATED WORK

A. BUG LOCALIZATION
Early automated bug localization approaches were implemented based on information retrieval technology, which works by matching keywords between bug reports and source files. For example, Lukins et al. utilized LDA, a generative statistical model, to locate buggy files. Afterward, Gay et al. introduced an explicit relevance feedback mechanism and designed a vector space model-based approach to automatically locate buggy files. Zhou et al. further proposed a revised vector space model-based approach, namely BugLocator, which uses information about fixed bugs to improve ranking performance. Even though the above-mentioned approaches achieve some success, they cannot learn the semantic information of bug reports and buggy files, and thus cannot correctly match bug reports and source files that are semantically similar.
Currently, more and more researchers have started to utilize deep learning techniques to construct automated bug localization approaches. The reason is that deep learning requires no manual feature engineering and can learn the semantic information of bug reports and buggy files. Lam et al. [34] proposed HyLoc, a model that combines a deep neural network and a revised vector space model. Specifically, HyLoc contains six neural network layers: two layers to capture features, two layers for projection, one layer for relevancy estimation, and the remaining layer for feature fusion. Then, Huo et al. [3] proposed NP-CNN, a CNN-based approach that can learn unified features from bug reports and source files. Compared with a plain deep neural network, a CNN has a stronger semantic feature extraction ability. Following Huo et al. [3], Xiao et al. [5] utilized an enhanced CNN to build DeepLocator, which uses bug-fixing history information to perform data augmentation. Afterward, Xiao et al. [6] further proposed DeepLoc, which utilizes multiple embedding layers and learns more effective semantic features. CAST, proposed by Liang et al. [35], boosts the input representation by introducing a customized AST, which effectively improves its feature extraction ability. Different from the above approaches, CoLoc combines pre-trained language models and contrastive learning, which enables it to learn deep contextual representations of bug reports and source files, as well as their contextual interaction. Additionally, contrastive learning helps CoLoc learn the semantic difference between different pairs of bug reports and source files. All these features help CoLoc perform more accurate bug localization.

VIII. CONCLUSION
In this work, we propose CoLoc, a novel automated bug localization approach that combines pre-trained language models and contrastive learning. Compared with existing bug localization approaches, CoLoc can better learn contextual information from bug reports and source files, building a deep contextual interaction between them. Additionally, our contrastive learning objective enables CoLoc to learn the semantic difference between different pairs of bug reports and source files. To evaluate the effectiveness of CoLoc, we perform extensive comparison experiments on a widely used public dataset. The experimental results show that CoLoc outperforms all baseline approaches. In the future, we plan to further improve CoLoc by introducing the structural information of source code, and we also plan to apply it to different bug localization datasets to verify its performance.
ZHENGMAO LUO received the master's degree in electronic and communication engineering from Tongji University, Shanghai, China. He was a Visiting Scholar at the Graphics and Image Processing Experimental Center, Wenzhou University, and an Associate Professor at the Zhejiang College of Security Technology. He has published more than 20 papers on image processing and AI recognition technology, holds 18 patents and copyrights, and has published three monographs on enterprise informatization. He is responsible for two provincial, ministerial, and national projects.

CAICHUN CEN received the master's degree in interactive media art from the Faculty of Humanities and Arts, Macau University of Science and Technology. She is an Assistant Engineer. She has participated in scientific research projects funded by the National Science Foundation, the Guangxi Natural Science Foundation, and the Wuzhou Science and Technology Development program. Her research interests include game design, VR, and 3-D visualization. She is a member of CSIG.