DeepBoundary: A Coverage Testing Method of Deep Learning Software based on Decision Boundary Representation

2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C). Pub Date: 2022-12-01. DOI: 10.1109/QRS-C57518.2022.00032

Yue Liu, Lichao Feng, Xingya Wang, Shiyu Zhang

Citations: 0

Abstract

Deep Learning (DL) software is increasingly applied in safety-critical fields such as autonomous driving, so adequate testing is needed to ensure software quality. Observing the decision-making behavior of a Deep Neural Network (DNN) is an essential step in DL software testing. The approach from Guiding Deep Learning System Testing Using Surprise Adequacy (SADL), for example, represents decision-making behavior with independent neuron activation values in the DNN. However, the behavior of a DNN is determined jointly by the continuous outputs of all its neurons. As a result, the coverage value of SADL fluctuates constantly and lacks stability. To mitigate this problem, we propose DeepBoundary, a coverage testing method for the decision-making behavior of DL software based on the decision boundary representation. Unlike SADL, DeepBoundary generates decision boundary data to represent the decision behavior of the DNN, which makes the testing results more stable. On this basis, we calculate the kernel density between the testing data and the decision boundary data. This density measures the position of the testing data in the decision space and its distance from the decision boundary. Finally, as an adequacy indicator, we calculate the decision boundary density coverage (DBC) of the entire testing set. The experiment on the MNIST dataset and two DL software systems shows that DeepBoundary can generate actual decision boundary data. The average confidence error at the DNN's output layer is only 4.20E-05. Compared with SADL, DeepBoundary has a stronger correlation with the defect detection ratio and therefore represents testing adequacy more accurately.
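The abstract does not give the formulas for the kernel density or for DBC, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a Gaussian kernel with a fixed bandwidth, and it assumes that coverage is the fraction of equal-width density buckets hit by at least one test input, in the style of the bucketed coverage used for surprise adequacy. The function names, the `bandwidth` and `upper` parameters, and the bucket count are all hypothetical.

```python
import math


def gaussian_kernel_density(x, boundary_points, bandwidth=1.0):
    """Average Gaussian kernel value between x and every boundary point.

    Assumption: a Gaussian kernel with one shared bandwidth. The paper's
    actual kernel and bandwidth selection are not stated in the abstract.
    """
    dim = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-dim / 2.0)
    total = 0.0
    for b in boundary_points:
        sq_dist = sum((xi - bi) ** 2 for xi, bi in zip(x, b))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total / len(boundary_points)


def bucket_coverage(densities, upper, n_buckets=10):
    """Fraction of equal-width buckets on [0, upper] hit by any density.

    Assumption: a bucketed coverage definition. Values outside the
    range [0, upper] are ignored.
    """
    hit = set()
    for value in densities:
        if 0.0 <= value <= upper:
            index = min(int(value / upper * n_buckets), n_buckets - 1)
            hit.add(index)
    return len(hit) / n_buckets


if __name__ == "__main__":
    # Hypothetical 2-D representations of decision boundary data.
    boundary = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
    # Hypothetical 2-D representations of test inputs.
    tests = [(0.1, 0.0), (1.5, 0.8), (5.0, 5.0)]
    densities = [gaussian_kernel_density(t, boundary) for t in tests]
    upper = 1.0 / (2.0 * math.pi)  # peak of a 2-D unit-bandwidth kernel
    print(densities)
    print(bucket_coverage(densities, upper))
```

Under these assumptions, a test input near the boundary data gets a high density and a distant one gets a density near zero, so a test set that spreads across many density buckets exercises both regions close to and far from the decision boundary.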
