CLeBPI: Contrastive Learning for Bug Priority Inference

Inf. Softw. Technol. Pub Date: 2022-12-02 DOI: 10.48550/arXiv.2212.01011

Wen-Yen Wang, Chenhao Wu, Jie He


Citations: 2

Abstract


Automated bug priority inference can reduce the time bug triagers spend on priority assignment, improving the efficiency of software maintenance. Currently, there are two orthogonal lines of work on this task: traditional machine learning based (TML-based) and neural network based (NN-based) approaches. Although these approaches achieve competitive performance, we observe that they face two issues: 1) TML-based approaches require substantial manual feature engineering and cannot learn the semantic information of bug reports; 2) both TML-based and NN-based approaches cannot effectively address the label imbalance problem, because they struggle to distinguish the semantic differences between bug reports with different priorities. In this paper, we propose CLeBPI (Contrastive Learning for Bug Priority Inference), which leverages a pre-trained language model and contrastive learning to tackle these two issues. Specifically, CLeBPI is first pre-trained on a large-scale bug report corpus in a self-supervised way, so it can automatically learn contextual representations of bug reports without manual feature engineering. Afterward, it is further pre-trained with a contrastive learning objective, which enables it to distinguish semantic differences between bug reports and to learn more precise contextual representations for each bug report. After pre-training, we connect a classification layer to CLeBPI and fine-tune it for bug priority inference in a supervised way. To verify the effectiveness of CLeBPI, we choose four baseline approaches and conduct comparison experiments on a public dataset. The experimental results show that CLeBPI outperforms all baseline approaches by 23.86%-77.80% in terms of weighted average F1-score.
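The weighted average F1-score reported above can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes the usual definition, in which each class's F1 is weighted by that class's share of the true labels. The function name `weighted_f1`, the priority labels `P1` and `P3`, and the toy predictions are all hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1  # weight by class support
    return score


# Hypothetical imbalanced toy labels: 8 reports of priority P3, 2 of P1.
y_true = ["P3"] * 8 + ["P1"] * 2
# A degenerate classifier that always predicts the majority class.
majority_only = ["P3"] * 10

accuracy = sum(t == p for t, p in zip(y_true, majority_only)) / len(y_true)
print(f"accuracy    = {accuracy:.3f}")
print(f"weighted F1 = {weighted_f1(y_true, majority_only):.3f}")
```

In this toy case the majority-only classifier scores lower on weighted F1 than on accuracy, because the minority class contributes an F1 of zero. That is one reason a class-aware metric is informative when priority labels are imbalanced.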

