Chuyang Xu;Zhongxin Liu;Xiaoxue Ren;Gehao Zhang;Ming Liang;David Lo
{"title":"FlexFL:基于开源大型语言模型的灵活有效的故障定位","authors":"Chuyang Xu;Zhongxin Liu;Xiaoxue Ren;Gehao Zhang;Ming Liang;David Lo","doi":"10.1109/TSE.2025.3553363","DOIUrl":null,"url":null,"abstract":"Fault localization (FL) targets identifying bug locations within a software system, which can enhance debugging efficiency and improve software quality. Due to the impressive code comprehension ability of Large Language Models (LLMs), a few studies have proposed to leverage LLMs to locate bugs, i.e., LLM-based FL, and demonstrated promising performance. However, first, these methods are limited in flexibility. They rely on bug-triggering test cases to perform FL and cannot make use of other available bug-related information, e.g., bug reports. Second, they are built upon proprietary LLMs, which are, although powerful, confronted with risks in data privacy. To address these limitations, we propose a novel LLM-based FL framework named FlexFL, which can flexibly leverage different types of bug-related information and effectively work with open-source LLMs. FlexFL is composed of two stages. In the first stage, FlexFL reduces the search space of buggy code using state-of-the-art FL techniques of different families and provides a candidate list of bug-related methods. In the second stage, FlexFL leverages LLMs to delve deeper to double-check the code snippets of methods suggested by the first stage and refine fault localization results. In each stage, FlexFL constructs agents based on open-source LLMs, which share the same pipeline that does not postulate any type of bug-related information and can interact with function calls without the out-of-the-box capability. Extensive experimental results on Defects4J demonstrate that FlexFL outperforms the baselines and can work with different open-source LLMs. Specifically, FlexFL with a lightweight open-source LLM Llama3-8B can locate 42 and 63 more bugs than two state-of-the-art LLM-based FL approaches AutoFL and AgentFL that both use GPT-3.5. In addition, FlexFL can localize 93 bugs that cannot be localized by non-LLM-based FL techniques at the top 1. Furthermore, to mitigate potential data contamination, we conduct experiments on a dataset which Llama3-8B has not seen before, and the evaluation results show that FlexFL can also achieve good performance.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 5","pages":"1455-1471"},"PeriodicalIF":6.5000,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FlexFL: Flexible and Effective Fault Localization With Open-Source Large Language Models\",\"authors\":\"Chuyang Xu;Zhongxin Liu;Xiaoxue Ren;Gehao Zhang;Ming Liang;David Lo\",\"doi\":\"10.1109/TSE.2025.3553363\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Fault localization (FL) targets identifying bug locations within a software system, which can enhance debugging efficiency and improve software quality. Due to the impressive code comprehension ability of Large Language Models (LLMs), a few studies have proposed to leverage LLMs to locate bugs, i.e., LLM-based FL, and demonstrated promising performance. However, first, these methods are limited in flexibility. They rely on bug-triggering test cases to perform FL and cannot make use of other available bug-related information, e.g., bug reports. Second, they are built upon proprietary LLMs, which are, although powerful, confronted with risks in data privacy. To address these limitations, we propose a novel LLM-based FL framework named FlexFL, which can flexibly leverage different types of bug-related information and effectively work with open-source LLMs. FlexFL is composed of two stages. In the first stage, FlexFL reduces the search space of buggy code using state-of-the-art FL techniques of different families and provides a candidate list of bug-related methods. In the second stage, FlexFL leverages LLMs to delve deeper to double-check the code snippets of methods suggested by the first stage and refine fault localization results. In each stage, FlexFL constructs agents based on open-source LLMs, which share the same pipeline that does not postulate any type of bug-related information and can interact with function calls without the out-of-the-box capability. Extensive experimental results on Defects4J demonstrate that FlexFL outperforms the baselines and can work with different open-source LLMs. Specifically, FlexFL with a lightweight open-source LLM Llama3-8B can locate 42 and 63 more bugs than two state-of-the-art LLM-based FL approaches AutoFL and AgentFL that both use GPT-3.5. In addition, FlexFL can localize 93 bugs that cannot be localized by non-LLM-based FL techniques at the top 1. Furthermore, to mitigate potential data contamination, we conduct experiments on a dataset which Llama3-8B has not seen before, and the evaluation results show that FlexFL can also achieve good performance.\",\"PeriodicalId\":13324,\"journal\":{\"name\":\"IEEE Transactions on Software Engineering\",\"volume\":\"51 5\",\"pages\":\"1455-1471\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-03-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Software Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10934742/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10934742/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
FlexFL: Flexible and Effective Fault Localization With Open-Source Large Language Models
Fault localization (FL) targets identifying bug locations within a software system, which can enhance debugging efficiency and improve software quality. Due to the impressive code comprehension ability of Large Language Models (LLMs), a few studies have proposed to leverage LLMs to locate bugs, i.e., LLM-based FL, and demonstrated promising performance. However, first, these methods are limited in flexibility. They rely on bug-triggering test cases to perform FL and cannot make use of other available bug-related information, e.g., bug reports. Second, they are built upon proprietary LLMs, which are, although powerful, confronted with risks in data privacy. To address these limitations, we propose a novel LLM-based FL framework named FlexFL, which can flexibly leverage different types of bug-related information and effectively work with open-source LLMs. FlexFL is composed of two stages. In the first stage, FlexFL reduces the search space of buggy code using state-of-the-art FL techniques of different families and provides a candidate list of bug-related methods. In the second stage, FlexFL leverages LLMs to delve deeper to double-check the code snippets of methods suggested by the first stage and refine fault localization results. In each stage, FlexFL constructs agents based on open-source LLMs, which share the same pipeline that does not postulate any type of bug-related information and can interact with function calls without the out-of-the-box capability. Extensive experimental results on Defects4J demonstrate that FlexFL outperforms the baselines and can work with different open-source LLMs. Specifically, FlexFL with a lightweight open-source LLM Llama3-8B can locate 42 and 63 more bugs than two state-of-the-art LLM-based FL approaches AutoFL and AgentFL that both use GPT-3.5. In addition, FlexFL can localize 93 bugs that cannot be localized by non-LLM-based FL techniques at the top 1. Furthermore, to mitigate potential data contamination, we conduct experiments on a dataset which Llama3-8B has not seen before, and the evaluation results show that FlexFL can also achieve good performance.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.