Neuron importance-aware coverage analysis for deep neural network testing

IF 3.5 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Empirical Software Engineering Pub Date : 2024-07-25 DOI:10.1007/s10664-024-10524-x

Hongjing Guo, Chuanqi Tao, Zhiqiu Huang

{"title":"Neuron importance-aware coverage analysis for deep neural network testing","authors":"Hongjing Guo, Chuanqi Tao, Zhiqiu Huang","doi":"10.1007/s10664-024-10524-x","DOIUrl":null,"url":null,"abstract":"<p>Deep Neural Network (DNN) models are widely used in many cutting-edge domains, such as medical diagnostics and autonomous driving. However, an urgent need to test DNN models thoroughly has increasingly risen. Recent research proposes various structural and non-structural coverage criteria to measure test adequacy. Structural coverage criteria quantify the degree to which the internal elements of DNN models are covered by a test suite. However, they convey little information about individual inputs and exhibit limited correlation with defect detection. Additionally, existing non-structural coverage criteria are unaware of neurons’ importance to decision-making. This paper addresses these limitations by proposing novel non-structural coverage criteria. By tracing neurons’ cumulative contribution to the final decision on the training set, this paper identifies important neurons of DNN models. A novel metric is proposed to quantify the difference in important neuron behavior between a test input and the training set, which provides a measured way at individual test input granularity. Additionally, two non-structural coverage criteria are introduced that allow for the quantification of test adequacy by examining differences in important neuron behavior between the testing and the training set. The empirical evaluation of image datasets demonstrates that the proposed metric outperforms the existing non-structural adequacy metrics by up to 14.7% accuracy improvement in capturing error-revealing test inputs. Compared with state-of-the-art coverage criteria, the proposed coverage criteria are more sensitive to errors, including natural errors and adversarial examples.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"90 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-024-10524-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Deep Neural Network (DNN) models are widely used in many cutting-edge domains, such as medical diagnostics and autonomous driving. However, an urgent need to test DNN models thoroughly has increasingly risen. Recent research proposes various structural and non-structural coverage criteria to measure test adequacy. Structural coverage criteria quantify the degree to which the internal elements of DNN models are covered by a test suite. However, they convey little information about individual inputs and exhibit limited correlation with defect detection. Additionally, existing non-structural coverage criteria are unaware of neurons’ importance to decision-making. This paper addresses these limitations by proposing novel non-structural coverage criteria. By tracing neurons’ cumulative contribution to the final decision on the training set, this paper identifies important neurons of DNN models. A novel metric is proposed to quantify the difference in important neuron behavior between a test input and the training set, which provides a measured way at individual test input granularity. Additionally, two non-structural coverage criteria are introduced that allow for the quantification of test adequacy by examining differences in important neuron behavior between the testing and the training set. The empirical evaluation of image datasets demonstrates that the proposed metric outperforms the existing non-structural adequacy metrics by up to 14.7% accuracy improvement in capturing error-revealing test inputs. Compared with state-of-the-art coverage criteria, the proposed coverage criteria are more sensitive to errors, including natural errors and adversarial examples.

Abstract Image

查看原文本刊更多论文

用于深度神经网络测试的神经元重要性感知覆盖率分析

深度神经网络（DNN）模型被广泛应用于许多前沿领域，如医疗诊断和自动驾驶。然而，对 DNN 模型进行全面测试的迫切需求日益高涨。最近的研究提出了各种结构性和非结构性覆盖标准来衡量测试的充分性。结构覆盖标准量化 DNN 模型内部元素被测试套件覆盖的程度。然而，这些标准传达的单个输入信息很少，与缺陷检测的相关性有限。此外，现有的非结构覆盖标准没有意识到神经元对决策的重要性。本文通过提出新型非结构覆盖标准来解决这些局限性。通过追踪神经元对训练集最终决策的累积贡献，本文确定了 DNN 模型的重要神经元。本文提出了一种新的度量标准，用于量化测试输入与训练集之间重要神经元行为的差异，从而提供了单个测试输入粒度的测量方法。此外，还引入了两个非结构覆盖标准，通过检查测试集和训练集之间重要神经元行为的差异，量化测试的充分性。对图像数据集的实证评估表明，在捕捉揭示错误的测试输入方面，所提出的指标优于现有的非结构充分性指标，准确率提高了 14.7%。与最先进的覆盖率标准相比，所提出的覆盖率标准对错误（包括自然错误和对抗性示例）更加敏感。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Empirical Software Engineering 工程技术-计算机：软件工程

CiteScore

8.50

自引率

12.20%

发文量

169

审稿时长

>12 weeks

期刊介绍： Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.