BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning
Partha Chakraborty; Mahmoud Alfadel; Meiyappan Nagappan
IEEE Transactions on Software Engineering, vol. 51, no. 8, pp. 2254-2267
Published: 2025-06-12
DOI: 10.1109/TSE.2025.3579574
Citations: 0
Abstract
Software bugs require developers to expend significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is crucial in reducing this effort. Existing bug localization tools, typically reliant on deep learning techniques, face limitations in both cross-project applicability and multi-language environments. Recent advancements with Large Language Models (LLMs) offer detailed representations for bug localization that may help to overcome such limitations. However, these models are known to encounter challenges with 1) limited context windows and 2) mapping accuracy. To address these challenges, we propose BLAZE, an approach that employs dynamic chunking and hard example learning. First, BLAZE dynamically segments source code to minimize continuity loss. Then, BLAZE fine-tunes a GPT-based model on complex bug reports to enhance cross-project and cross-language bug localization. To support the capability of BLAZE, we create the BeetleBox dataset, which comprises 23,782 bugs from 29 large and thriving open-source projects across five programming languages (Java, C++, Python, Go, and JavaScript). Our evaluation of BLAZE on three benchmark datasets—BeetleBox, SWE-Bench, and Ye et al.—demonstrates substantial improvements over six state-of-the-art baselines. Specifically, BLAZE achieves increases of up to 120% in Top-1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR). Furthermore, an extensive ablation study confirms the contributions of our pipeline components to the overall performance enhancement.
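The abstract does not spell out the dynamic chunking algorithm, but its stated goal — segmenting source code so that chunks fit a limited context window while minimizing continuity loss — can be illustrated with a minimal sketch. The sketch below is a hypothetical simplification, not the paper's method: it assumes blank lines approximate logical boundaries (function or class ends) and approximates token counts by whitespace splitting, packing whole blocks greedily so no logical unit is cut mid-body.

```python
def dynamic_chunk(source: str, max_tokens: int = 512) -> list[str]:
    """Split source code into chunks of at most max_tokens (approximated as
    whitespace-separated tokens), breaking only at blank lines so that
    logical units such as functions are kept intact within a chunk."""
    # Step 1: group lines into blocks, ending a block at each blank line.
    blocks, current = [], []
    for line in source.splitlines():
        current.append(line)
        if line.strip() == "":  # candidate boundary between logical units
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))

    # Step 2: greedily pack whole blocks into chunks under the token budget.
    chunks, buf, buf_len = [], [], 0
    for block in blocks:
        n = len(block.split())
        if buf and buf_len + n > max_tokens:
            chunks.append("\n".join(buf))
            buf, buf_len = [], 0
        buf.append(block)
        buf_len += n
    if buf:
        chunks.append("\n".join(buf))
    return chunks
```

In contrast to fixed-size sliding windows, boundary-aware packing like this keeps each function's body contiguous, which is one plausible reading of "minimizing continuity loss"; the paper's actual segmentation criteria may differ.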
Journal Description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.