A Light Bug Triage Framework for Applying Large Pre-trained Language Model

Jaehyung Lee, Kisun Han, Hwanjo Yu
{"title":"应用大型预训练语言模型的轻Bug分类框架","authors":"Jaehyung Lee, Kisun Han, Hwanjo Yu","doi":"10.1145/3551349.3556898","DOIUrl":null,"url":null,"abstract":"Assigning appropriate developers to the bugs is one of the main challenges in bug triage. Demands for automatic bug triage are increasing in the industry, as manual bug triage is labor-intensive and time-consuming in large projects. The key to the bug triage task is extracting semantic information from a bug report. In recent years, large Pre-trained Language Models (PLMs) including BERT [4] have achieved dramatic progress in the natural language processing (NLP) domain. However, applying large PLMs to the bug triage task for extracting semantic information has several challenges. In this paper, we address the challenges and propose a novel framework for bug triage named LBT-P, standing for Light Bug Triage framework with a Pre-trained language model. It compresses a large PLM into small and fast models using knowledge distillation techniques and also prevents catastrophic forgetting of PLM by introducing knowledge preservation fine-tuning. We also develop a new loss function exploiting representations of earlier layers as well as deeper layers in order to handle the overthinking problem. We demonstrate our proposed framework on the real-world private dataset and three public real-world datasets [11]: Google Chromium, Mozilla Core, and Mozilla Firefox. The result of the experiments shows the superiority of LBT-P.","PeriodicalId":197939,"journal":{"name":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","volume":"92 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"A Light Bug Triage Framework for Applying Large Pre-trained Language Model\",\"authors\":\"Jaehyung Lee, Kisun Han, Hwanjo Yu\",\"doi\":\"10.1145/3551349.3556898\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Assigning appropriate developers to the bugs is one of the main challenges in bug triage. Demands for automatic bug triage are increasing in the industry, as manual bug triage is labor-intensive and time-consuming in large projects. The key to the bug triage task is extracting semantic information from a bug report. In recent years, large Pre-trained Language Models (PLMs) including BERT [4] have achieved dramatic progress in the natural language processing (NLP) domain. However, applying large PLMs to the bug triage task for extracting semantic information has several challenges. In this paper, we address the challenges and propose a novel framework for bug triage named LBT-P, standing for Light Bug Triage framework with a Pre-trained language model. It compresses a large PLM into small and fast models using knowledge distillation techniques and also prevents catastrophic forgetting of PLM by introducing knowledge preservation fine-tuning. We also develop a new loss function exploiting representations of earlier layers as well as deeper layers in order to handle the overthinking problem. We demonstrate our proposed framework on the real-world private dataset and three public real-world datasets [11]: Google Chromium, Mozilla Core, and Mozilla Firefox. 
The result of the experiments shows the superiority of LBT-P.\",\"PeriodicalId\":197939,\"journal\":{\"name\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"volume\":\"92 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3551349.3556898\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3551349.3556898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Assigning appropriate developers to bugs is one of the main challenges in bug triage. Demand for automatic bug triage is increasing in industry, as manual bug triage is labor-intensive and time-consuming in large projects. The key to the bug triage task is extracting semantic information from a bug report. In recent years, large Pre-trained Language Models (PLMs), including BERT [4], have achieved dramatic progress in the natural language processing (NLP) domain. However, applying large PLMs to the bug triage task to extract semantic information poses several challenges. In this paper, we address these challenges and propose a novel framework for bug triage named LBT-P, standing for Light Bug Triage framework with a Pre-trained language model. It compresses a large PLM into small, fast models using knowledge distillation techniques and prevents catastrophic forgetting of the PLM by introducing knowledge preservation fine-tuning. We also develop a new loss function that exploits representations of earlier layers as well as deeper layers in order to handle the overthinking problem. We demonstrate the proposed framework on a real-world private dataset and three public real-world datasets [11]: Google Chromium, Mozilla Core, and Mozilla Firefox. The results of the experiments show the superiority of LBT-P.
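The core idea described in the abstract, compressing a teacher PLM into a small student model while supervising the student with both early-layer and deep-layer representations, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example assuming matching hidden sizes between student and teacher; the layer indices, weighting coefficients, and loss terms are illustrative placeholders and not the actual formulation used by LBT-P.

```python
# Minimal sketch of a distillation-style objective that combines a task loss
# with representation-matching terms on an early layer and a deep layer.
# All hyperparameters (layer indices, alpha, beta) are illustrative assumptions.
import torch
import torch.nn.functional as F

def lbt_style_loss(student_hidden, teacher_hidden, logits, labels,
                   early_layer=2, deep_layer=-1, alpha=0.5, beta=0.5):
    """
    student_hidden / teacher_hidden: lists of [batch, seq_len, dim] hidden states
    (one per transformer layer); assumes student and teacher share `dim`,
    otherwise a linear projection on the student side would be needed.
    logits: [batch, num_developers] predictions for developer assignment.
    """
    # Match an early layer as well as a deep layer, so the student is not tied
    # only to the teacher's final states (a guard against "overthinking").
    early = F.mse_loss(student_hidden[early_layer],
                       teacher_hidden[early_layer].detach())
    deep = F.mse_loss(student_hidden[deep_layer],
                      teacher_hidden[deep_layer].detach())
    # Standard classification loss on the developer-assignment labels.
    task = F.cross_entropy(logits, labels)
    return task + alpha * early + beta * deep
```

In practice the teacher would be a frozen BERT-style encoder and the student a shallower transformer; the knowledge preservation fine-tuning mentioned in the abstract would additionally constrain how far the fine-tuned PLM drifts from its pre-trained weights, which this sketch does not capture.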