When Deep Learning Meets Information Retrieval-based Bug Localization: A Survey

IF 23.8 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, THEORY & METHODS

ACM Computing Surveys Pub Date : 2025-05-05 DOI:10.1145/3734217

Feifei Niu, Chuanyi Li, Kui Liu, Xin Xia, David Lo


Citations: 0

Abstract

Bug localization is a crucial aspect of software maintenance, running through the entire software lifecycle. Information retrieval-based bug localization (IRBL) identifies buggy code based on bug reports, expediting the bug resolution process for developers. Recent years have witnessed significant achievements in IRBL, propelled by the widespread adoption of deep learning (DL). To provide a comprehensive overview of the current state of the art and delve into key issues, we conduct a survey encompassing 61 IRBL studies leveraging DL. We summarize best practices in each phase of the IRBL workflow, undertake a meta-analysis of prior studies, and suggest future research directions. This exploration aims to guide further advancements in the field, fostering a deeper understanding and refining practices for effective bug localization. Our study suggests that the integration of DL in IRBL enhances the model’s capacity to extract semantic and syntactic information from both bug reports and source code, addressing issues such as lexical gaps, neglect of code structure information, and cold-start problems. Future research avenues for IRBL encompass exploring diversity in programming languages, adopting fine-grained granularity, and focusing on real-world applications. Most importantly, although some studies have started using large language models for IRBL, there is still a need for more in-depth exploration and thorough investigation in this area.
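To make the retrieval framing concrete, the sketch below ranks source files against a bug report using TF-IDF cosine similarity. It is a classical, non-DL baseline written for illustration, not a method taken from the survey; the function names (`tokenize`, `rank_files`) and the two-file toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters, also breaking camelCase identifiers."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", text.lower()) if t]


def rank_files(bug_report, files):
    """Return (path, score) pairs sorted by TF-IDF cosine similarity to the report."""
    docs = {path: Counter(tokenize(src)) for path, src in files.items()}
    query = Counter(tokenize(bug_report))
    n = len(docs)
    # Document frequency: in how many source files each term appears.
    df = Counter(term for counts in docs.values() for term in counts)

    def weights(counts):
        # Term frequency times a smoothed inverse document frequency.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    q = weights(query)
    scored = [(path, cosine(q, weights(counts))) for path, counts in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus, invented for illustration only.
FILES = {
    "LoginController.java": "class LoginController { void checkPassword(String password) { } }",
    "ReportExporter.java": "class ReportExporter { void exportPdf(Report report) { } }",
}
ranking = rank_files("Login fails when password contains unicode", FILES)
```

Here the report shares the terms "login" and "password" with `LoginController.java`, so that file ranks first. A report that said "sign-in" instead of "login" would share no terms with it, which is one simple instance of the lexical gap that the abstract says DL-based approaches aim to address.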


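Source journal

ACM Computing Surveys (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 33.20

Self-citation rate: 0.60%

Articles published: 372

Review time: 12 months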

About the journal: ACM Computing Surveys (CSUR) is an academic journal that publishes surveys and tutorials on areas of computing research and practice. It aims to provide comprehensive, accessible articles that guide readers through the literature and help them understand topics outside their specialties. CSUR has a 2022 Impact Factor of 16.6 and is ranked 3rd out of 111 journals in the Computer Science, Theory & Methods category. It is indexed and abstracted in services including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.