Keys4BR: Key sentences-based model fine-tuning for better semantic representation of bug reports

IF 4.3 | Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology Pub Date : 2025-10-23 DOI:10.1016/j.infsof.2025.107943

Mengjiao Wang, Biyu Cai, Weiqin Zou, Jingxuan Zhang

Information and Software Technology, volume 189, Article 107943. Journal Article. https://www.sciencedirect.com/science/article/pii/S0950584925002824

Citations: 0

Abstract

Context:

Large language models are increasingly applied to the semantic representation of bug reports because of their strong natural-language understanding. Fine-tuning them on bug report text is a common way to teach the models domain-specific knowledge. However, bug reports vary in quality, and low-quality reports introduce noise that degrades performance on downstream tasks.

Objective:

To improve the quality of semantic representation for bug reports, we propose Keys4BR, a key-sentence-based model fine-tuning approach.

Method:

Specifically, we use keywords that help accurately localize bugs as anchors. We design and apply a key sentence selection strategy that picks the portions of the text containing these keywords as the key information. We then fine-tune the large language model on this key information using a lightweight fine-tuning approach.
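The keyword-anchored selection described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not give the keyword list, the sentence splitter, or the matching rule, so `ANCHOR_KEYWORDS`, `split_sentences`, and `select_key_sentences` are hypothetical names, and the rule used (keep any sentence with at least one keyword token) is only one plausible reading of the strategy.

```python
import re

# Hypothetical anchor keywords; the paper's actual keyword list is not given in the abstract.
ANCHOR_KEYWORDS = {"exception", "error", "crash", "fail", "null", "stack"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace, or at newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercased word tokens; CamelCase identifiers are split into their parts."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", sentence)
    return {w.lower() for w in words}


def select_key_sentences(report, keywords=ANCHOR_KEYWORDS):
    """Keep only the sentences that contain at least one anchor keyword."""
    return [s for s in split_sentences(report) if tokenize(s) & keywords]


report = (
    "Thanks for the great editor. "
    "Opening a large file throws a NullPointerException in the parser. "
    "I am using version 4.2 on Linux. "
    "The crash happens every time I reload the project."
)

# The filtered text is what would be passed on to fine-tuning in this sketch.
key_text = " ".join(select_key_sentences(report))
print(key_text)
```

In this sketch the greeting and the environment note are dropped, while the two sentences mentioning `NullPointerException` and `crash` are kept. Splitting CamelCase tokens is a design choice of the sketch so that identifiers such as exception class names can match plain keywords.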

Results:

Experiments on bug reports from five open-source projects demonstrate that Keys4BR significantly improves the performance of four downstream tasks. The results indicate that Keys4BR achieves superior semantic representation of bug reports compared to the VSM model, the model pre-trained on the general corpus, and the model fine-tuned on original bug reports, with an average F1 score improvement of 9%, 9%, and 6%, respectively. Additionally, we further validate the effectiveness of the key sentences selection and fine-tuning strategies.
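For readers unfamiliar with the VSM baseline mentioned above, the following is a minimal sketch of a vector space model: each report becomes a TF-IDF weighted term vector, and reports are compared by cosine similarity. The tokenizer, the smoothed IDF formula, and the sample reports are assumptions chosen for illustration; the abstract does not state how the paper's VSM baseline is configured.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a sparse {term: tf * idf} dictionary."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch) so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up sample reports, not taken from the paper's datasets.
reports = [
    "Parser crash with null pointer exception on large file",
    "Null pointer exception when parser opens a large file",
    "Button color is wrong in dark theme settings",
]
vectors = tfidf_vectors(reports)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

Because such a representation relies purely on shared terms, two reports that describe the same bug in different words score low, which is the kind of gap that learned semantic representations aim to close.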

Conclusion:

Keys4BR can effectively extract key semantic information from bug reports, thereby enhancing the representation capability and generalization performance of large language models in bug report management tasks.


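Source journal

Information and Software Technology (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks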

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:

• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.