An empirical study of fault localisation techniques for deep neural networks.

IF 3.6 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-06-10 DOI:10.1007/s10664-025-10657-7

Nargiz Humbatova, Jinhan Kim, Gunel Jahangirova, Shin Yoo, Paolo Tonella

Empirical Software Engineering, volume 30, issue 5, article 124. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12152046/pdf/

Citations: 0

Abstract


As Deep Neural Networks (DNNs) grow in popularity, so does the need for tools that assist developers in implementing, testing and debugging them. Several approaches have been proposed that automatically analyse and localise potential faults in DNNs under test. In this work, we evaluate and compare existing state-of-the-art fault localisation techniques, which operate based on both dynamic and static analysis of the DNN. The evaluation is performed on a benchmark consisting of both real faults obtained from bug reporting platforms and faulty models produced by a mutation tool. Our findings indicate that evaluating DNN fault localisation tools against a single, specific ground truth (e.g. the human-defined one) yields low performance (maximum average recall of 0.33 and precision of 0.21). However, these figures increase when alternative, equivalent patches that exist for a given faulty DNN are also considered. The results indicate that DeepFD is the most effective tool, achieving an average recall of 0.55 and a precision of 0.37 on our benchmark.
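To make the contrast between a single ground truth and alternative, equivalent patches concrete, the following is a minimal sketch of how such scores could be computed. It assumes set-based precision and recall over reported fault labels and takes the best score across the acceptable ground truths; the abstract does not state the exact metric definitions used in the study, and the function names and fault labels here are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall(reported: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of reported faults against one ground truth."""
    hits = len(reported & truth)
    precision = hits / len(reported) if reported else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def best_over_alternatives(
    reported: Set[str], alternatives: List[Set[str]]
) -> Tuple[float, float]:
    """Score against every acceptable ground truth; keep the best (recall first).

    Assumption: a tool is credited with its best-matching equivalent patch.
    """
    scores = [precision_recall(reported, truth) for truth in alternatives]
    return max(scores, key=lambda pr: (pr[1], pr[0]))


# Hypothetical fault labels, for illustration only.
reported = {"learning_rate", "activation"}
human_fix = {"optimizer", "loss"}      # the single, human-defined ground truth
alternative_fix = {"learning_rate"}    # an equivalent patch that also repairs the model

single = precision_recall(reported, human_fix)
multi = best_over_alternatives(reported, [human_fix, alternative_fix])
print(single)  # (0.0, 0.0)
print(multi)   # (0.5, 1.0)
```

In this toy case the tool scores zero against the human-defined fix alone, yet its output matches an equivalent patch, which illustrates why the reported figures rise once alternative ground truths are considered.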


Source journal

Empirical Software Engineering (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 8.50

Self-citation rate: 12.20%

Articles published: 169

Review time: >12 weeks

About the journal: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.