Keys4BR: Key sentences-based model fine-tuning for better semantic representation of bug reports
Mengjiao Wang, Biyu Cai, Weiqin Zou, Jingxuan Zhang
Information and Software Technology, Volume 189, Article 107943. Published 2025-10-23.
DOI: 10.1016/j.infsof.2025.107943
https://www.sciencedirect.com/science/article/pii/S0950584925002824
Abstract
Context:
Large language models have been increasingly applied to semantic representation of bug reports due to their deep understanding of natural language. Fine-tuning large language models using bug report text is a common practice to enable models to learn domain-specific knowledge. However, the varying quality of the bug reports can introduce noise, leading to poor performance in downstream tasks.
Objective:
To improve the quality of semantic representation for bug reports, we propose Keys4BR, a model fine-tuning approach based on key sentences.
Method:
Specifically, we use keywords that help accurately localize bugs as anchors. We design and apply a key sentence selection strategy that picks the portions of the text containing these keywords as the key information. We then fine-tune the large language model using a lightweight fine-tuning approach.
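The keyword-anchored selection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword list `ANCHOR_KEYWORDS`, the naive sentence splitter, and the substring-matching rule are all assumptions, since the abstract does not specify them.

```python
import re

# Hypothetical anchor keywords; the paper's actual keyword list is not
# given in the abstract.
ANCHOR_KEYWORDS = ("exception", "error", "crash", "fail", "null")


def split_sentences(text: str) -> list[str]:
    """Naively split on whitespace that follows '.', '!' or '?', and on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def select_key_sentences(report: str, keywords=ANCHOR_KEYWORDS) -> list[str]:
    """Keep sentences that contain at least one anchor keyword.

    Matching is a case-insensitive substring test (an assumption), so that
    identifiers such as 'NullPointerException' still match 'exception'.
    """
    selected = []
    for sentence in split_sentences(report):
        lowered = sentence.lower()
        if any(keyword in lowered for keyword in keywords):
            selected.append(sentence)
    return selected


# Illustrative bug report text (made up for this sketch).
report = (
    "Thanks for the great tool. "
    "The app throws a NullPointerException when I open settings. "
    "I am using version 2.1 on Linux. "
    "Saving the file fails with an error dialog."
)
key_sentences = select_key_sentences(report)
for sentence in key_sentences:
    print(sentence)
```

In this sketch only the two sentences that mention a failure are kept; the greeting and the environment note are dropped. Under this reading, the kept sentences would form the fine-tuning text.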
Results:
Experiments on bug reports from five open-source projects show that Keys4BR significantly improves performance on four downstream tasks. Keys4BR yields better semantic representations of bug reports than the vector space model (VSM), the model pre-trained on a general corpus, and the model fine-tuned on the original bug reports, with average F1 score improvements of 9%, 9%, and 6%, respectively. We also validate the effectiveness of the key sentence selection and fine-tuning strategies.
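For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The short sketch below computes it from raw counts; the function name `f1_score` and the counts in the example are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only.
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(round(example_f1, 3))
```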
Conclusion:
Keys4BR can effectively extract key semantic information from bug reports, thereby enhancing the representation capability and generalization performance of large language models in bug report management tasks.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.