{"title":"The Unit Test Quality of Deep Learning Libraries: A Mutation Analysis","authors":"Li Jia, Hao Zhong, Linpeng Huang","doi":"10.26226/morressier.613b5418842293c031b5b5cb","DOIUrl":null,"url":null,"abstract":"In recent years, with the flourish of deep learning techniques, deep learning libraries have been used by many smart applications. As smart applications are used in critical scenarios, their bugs become a concern, and bugs in deep learning libraries have far-reaching impacts on their built-on applications. Although programmers write many test cases for deep learning libraries, to the best of our knowledge, no prior study has ever explored to what degree such test cases are sufficient. As a result, some fundamental questions about these test cases are still open. For example, to what degree can existing test cases detect bugs in deep libraries? How to improve such test cases? To help programmers improve their test cases and to shed light on the detection techniques of deep learning bugs, there is a strong need for a study on the test quality of deep learning libraries. To meet the strong need, in this paper, we conduct the first empirical study on this issue. Our basic idea is to inject bugs into deep learning libraries, and to check to what degree existing test cases can detect our injected bugs. With a mutation tool, we constructed 1,545 buggy versions (i.e., mutants). By comparing the testing results between clean and buggy versions, our study leads to 11 findings, and we summarize them into the answers to three research questions. For example, we find that although existing test cases detected 60% of our injected bugs, only 30% of such bugs were detected by the assertions of these test cases. As another example, we find that some exceptions were thrown only in specific learning phases. Furthermore, we interpret our results from the perspectives of researchers, library developers, and application programmers.","PeriodicalId":205629,"journal":{"name":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"171 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26226/morressier.613b5418842293c031b5b5cb","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 8
Abstract
In recent years, with the flourishing of deep learning techniques, deep learning libraries have been used by many smart applications. As smart applications are used in critical scenarios, their bugs become a concern, and bugs in deep learning libraries have far-reaching impacts on the applications built on top of them. Although programmers write many test cases for deep learning libraries, to the best of our knowledge, no prior study has explored to what degree such test cases are sufficient. As a result, some fundamental questions about these test cases are still open. For example, to what degree can existing test cases detect bugs in deep learning libraries? How can such test cases be improved? To help programmers improve their test cases and to shed light on techniques for detecting deep learning bugs, there is a strong need for a study on the test quality of deep learning libraries. To meet this need, in this paper we conduct the first empirical study on this issue. Our basic idea is to inject bugs into deep learning libraries and to check to what degree existing test cases can detect the injected bugs. With a mutation tool, we constructed 1,545 buggy versions (i.e., mutants). By comparing the testing results of the clean and buggy versions, our study leads to 11 findings, which we summarize into answers to three research questions. For example, we find that although existing test cases detected 60% of our injected bugs, only 30% of such bugs were detected by the assertions of these test cases. As another example, we find that some exceptions were thrown only in specific learning phases. Furthermore, we interpret our results from the perspectives of researchers, library developers, and application programmers.
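To make the basic idea concrete, below is a minimal, self-contained Python sketch of mutation analysis. The function dense_forward, the three mutants, and both test cases are hypothetical toy stand-ins invented for illustration; they are not the paper's mutation tool, its actual mutants, or any deep learning library's API. The sketch only shows the classification the abstract alludes to: a mutant can be killed by an assertion, killed by a crash or exception, or survive the test suite.

```python
"""Toy sketch of mutation analysis (illustration only, under assumed names)."""


def dense_forward(x, w, b):
    """Reference ("clean") implementation: y = x * w + b."""
    return x * w + b


# Injected bugs (mutants): each variant changes one operator or drops one
# term, mimicking the buggy versions constructed in the study.
MUTANTS = {
    "mul_to_add": lambda x, w, b: x + w + b,   # wrong operator
    "drop_bias": lambda x, w, b: x * w,        # missing term
    "div_by_bias": lambda x, w, b: x * w / b,  # crashes when b == 0
}


def assertion_test(forward):
    """Test case that checks the output with an explicit assertion."""
    assert forward(2.0, 3.0, 0.0) == 6.0


def smoke_test(forward):
    """Test case without assertions: it only exercises a short
    forward/backward/update loop and relies on crashes to signal bugs."""
    w, b = 0.5, 0.1
    for _ in range(3):
        y = forward(1.0, w, b)                  # forward phase
        grad = 2.0 * (y - 1.0)                  # backward phase (toy gradient)
        w, b = w - 0.1 * grad, b - 0.1 * grad   # update phase


def run_mutation_analysis():
    """Run every test against every mutant and classify the outcome."""
    results = {}
    for name, mutant in MUTANTS.items():
        try:
            assertion_test(mutant)
            smoke_test(mutant)
            results[name] = "survived"
        except AssertionError:
            results[name] = "killed by an assertion"
        except Exception:
            results[name] = "killed by a crash/exception"
    return results


if __name__ == "__main__":
    for name, outcome in run_mutation_analysis().items():
        print(f"{name}: {outcome}")
    # Expected classification of this toy example:
    #   mul_to_add  -> killed by an assertion
    #   drop_bias   -> survived (the assertion's inputs cannot tell it apart)
    #   div_by_bias -> killed by a crash/exception (ZeroDivisionError)
```

Separating kills by assertion from kills by crash mirrors the paper's observation that many injected bugs are exposed only by exceptions rather than by explicit assertions, and surviving mutants point to inputs or behaviors the test suite never checks.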