How Do Injected Bugs Affect Deep Learning?

Li Jia, Hao Zhong, Xiaoyin Wang, Linpeng Huang, Zexuan Li
{"title":"注入漏洞如何影响深度学习?","authors":"Li Jia, Hao Zhong, Xiaoyin Wang, Linpeng Huang, Zexuan Li","doi":"10.1109/saner53432.2022.00097","DOIUrl":null,"url":null,"abstract":"In recent years, deep learning obtains amazing achievements in various fields, and has been used in safety-critical scenarios. In such scenarios, bugs in deep learning software can introduce disastrous consequences. To deepen the understanding on bugs in deep learning software, researchers have conducted several empirical studies on their bug characteristics. In the prior studies, researchers analyzed the source code, bug reports, pull requests, and fixes of deep learning bugs. Although these studies provide meaningful findings, to the best of our knowledge, no prior studies have explored the runtime behaviors of deep learning bugs, because it is rather expensive to collect runtime impacts of deep learning bugs. As a result, some fundamental questions along with deep learning bugs are still open. For example, do most such bugs introduce significant impacts on prediction accuracy? The answers to these open questions are useful to a wide range of audience. In this paper, we conducted the first empirical study to analyze the runtime impacts of deep learning bugs. Our basic idea is to inject deliberately designed bugs into a typical deep learning application and its libraries with a mutation tool, and to compare the runtime differences between clean and buggy versions. In this way, we constructed 1,832 buggy versions, and compared their execution results with corresponding clean versions. Based on our comparison, we summarize 9 findings, and present our answers to 3 research questions. For example, we find that more than half of buggy versions do not lead to any observable errors, and most of them introduce only insignificant differences on the accuracy of their trained models. We interpret the significance of our findings from the perspectives of application programmers, API developers, and researchers. For example, based on our findings, better results alone are insufficient to prove better parameters nor better treatments, and researchers shall build strong theories to explain their improvements.","PeriodicalId":437520,"journal":{"name":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"How Do Injected Bugs Affect Deep Learning?\",\"authors\":\"Li Jia, Hao Zhong, Xiaoyin Wang, Linpeng Huang, Zexuan Li\",\"doi\":\"10.1109/saner53432.2022.00097\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, deep learning obtains amazing achievements in various fields, and has been used in safety-critical scenarios. In such scenarios, bugs in deep learning software can introduce disastrous consequences. To deepen the understanding on bugs in deep learning software, researchers have conducted several empirical studies on their bug characteristics. In the prior studies, researchers analyzed the source code, bug reports, pull requests, and fixes of deep learning bugs. Although these studies provide meaningful findings, to the best of our knowledge, no prior studies have explored the runtime behaviors of deep learning bugs, because it is rather expensive to collect runtime impacts of deep learning bugs. 
As a result, some fundamental questions along with deep learning bugs are still open. For example, do most such bugs introduce significant impacts on prediction accuracy? The answers to these open questions are useful to a wide range of audience. In this paper, we conducted the first empirical study to analyze the runtime impacts of deep learning bugs. Our basic idea is to inject deliberately designed bugs into a typical deep learning application and its libraries with a mutation tool, and to compare the runtime differences between clean and buggy versions. In this way, we constructed 1,832 buggy versions, and compared their execution results with corresponding clean versions. Based on our comparison, we summarize 9 findings, and present our answers to 3 research questions. For example, we find that more than half of buggy versions do not lead to any observable errors, and most of them introduce only insignificant differences on the accuracy of their trained models. We interpret the significance of our findings from the perspectives of application programmers, API developers, and researchers. For example, based on our findings, better results alone are insufficient to prove better parameters nor better treatments, and researchers shall build strong theories to explain their improvements.\",\"PeriodicalId\":437520,\"journal\":{\"name\":\"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/saner53432.2022.00097\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/saner53432.2022.00097","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

In recent years, deep learning has achieved remarkable results in various fields and has been used in safety-critical scenarios. In such scenarios, bugs in deep learning software can have disastrous consequences. To deepen the understanding of bugs in deep learning software, researchers have conducted several empirical studies on their characteristics. In prior studies, researchers analyzed the source code, bug reports, pull requests, and fixes of deep learning bugs. Although these studies provide meaningful findings, to the best of our knowledge, no prior study has explored the runtime behaviors of deep learning bugs, because collecting the runtime impacts of such bugs is rather expensive. As a result, some fundamental questions about deep learning bugs remain open. For example, do most such bugs significantly affect prediction accuracy? The answers to these open questions are useful to a wide audience. In this paper, we conduct the first empirical study to analyze the runtime impacts of deep learning bugs. Our basic idea is to inject deliberately designed bugs into a typical deep learning application and its libraries with a mutation tool, and to compare the runtime differences between the clean and buggy versions. In this way, we constructed 1,832 buggy versions and compared their execution results with those of the corresponding clean versions. Based on this comparison, we summarize 9 findings and present our answers to 3 research questions. For example, we find that more than half of the buggy versions do not lead to any observable errors, and most of them introduce only insignificant differences in the accuracy of their trained models. We interpret the significance of our findings from the perspectives of application programmers, API developers, and researchers. For example, based on our findings, better results alone are insufficient to prove better parameters or better treatments, and researchers should build strong theories to explain their improvements.
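To make the clean-vs-buggy comparison concrete, the sketch below trains the same toy model twice, once clean and once with an injected "bug," and reports the accuracy difference. This is a minimal illustration of the experimental idea only, not the authors' mutation tool: the model, data, and the specific mutation (a wrongly scaled gradient) are hypothetical stand-ins chosen for brevity.

```python
# Minimal sketch (not the paper's tool): run the same training twice,
# clean vs. with an injected bug, and compare the resulting accuracies,
# mirroring the study's clean-vs-buggy runtime comparison.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data (illustrative stand-in for a real app).
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = (X @ true_w > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(mutate=False, lr=0.1, epochs=200):
    """Logistic regression via gradient descent.

    When mutate=True we inject a hypothetical bug: the gradient is scaled
    by a wrong constant, the kind of silent arithmetic slip a mutation
    operator might emulate.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        if mutate:
            grad = grad * 10.0  # injected bug: wrongly scaled gradient
        w -= lr * grad
    return w

def accuracy(w):
    return float(((sigmoid(X @ w) > 0.5) == y).mean())

acc_clean = accuracy(train(mutate=False))
acc_buggy = accuracy(train(mutate=True))
print(f"clean accuracy: {acc_clean:.3f}")
print(f"buggy accuracy: {acc_buggy:.3f}")
print(f"difference:     {abs(acc_clean - acc_buggy):.3f}")
```

Depending on the mutation, the buggy run can raise an error, silently degrade accuracy, or match the clean run almost exactly; the last outcome illustrates the paper's finding that many injected bugs produce no observable error and only insignificant accuracy differences.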