Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?

Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, Nachiappan Nagappan
{"title":"Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?","authors":"Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, Nachiappan Nagappan","doi":"10.1109/ICSE43902.2021.00138","DOIUrl":null,"url":null,"abstract":"Automatic unit test generation that explores the input space and produces effective test cases for given programs have been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question about its practical effectiveness and usefulness for machine learning libraries, which are statistically orientated and have fundamentally different nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit testcase generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (on average is 34.1%) and mutation score (on average is 21.3%), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, however, the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

Abstract

Automatic unit test generation, which explores the input space and produces effective test cases for a given program, has been studied for decades. Many unit test generation tools that can produce unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question their practical effectiveness and usefulness for machine learning libraries, which are statistically oriented and fundamentally different in nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To do so, we conducted an empirical study on five widely used machine learning libraries with two popular unit test case generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite with respect to commonly applied quality metrics such as code coverage (34.1% on average) and mutation score (21.3% on average), (2) the unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, although the improvement is limited, and (3) there are common patterns in the uncovered code across the five machine learning libraries that can be exploited to improve unit test case generation.
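The study's two headline metrics are structural code coverage and mutation score. Since mutation score is the less familiar of the two, the minimal JUnit sketch below illustrates what it measures: the fraction of small, deliberately seeded faults (mutants) that a test suite's assertions detect. The example is purely illustrative, with hypothetical names (Scaler, minMaxScale); it is not drawn from the paper, the studied libraries, or EVOSUITE/Randoop output.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical class under test; not taken from any of the five studied libraries.
class Scaler {
    // Rescales x from the range [min, max] to [0, 1].
    static double minMaxScale(double x, double min, double max) {
        return (x - min) / (max - min);
    }
    // A mutation testing tool would seed small faults here, for example
    // flipping the first '-' to '+':  return (x + min) / (max - min);
    // The suite "kills" that mutant only if some test's assertion fails on it.
}

public class ScalerTest {
    @Test
    public void scalesMidpointToOneHalf() {
        // Original code returns (7 - 2) / 10 = 0.5; the '+' mutant returns
        // (7 + 2) / 10 = 0.9, so this assertion kills it.
        assertEquals(0.5, Scaler.minMaxScale(7.0, 2.0, 12.0), 1e-9);
    }
}
```

Mutation score is the number of killed mutants divided by the number of seeded mutants; for Java, tools such as PIT automate the seeding and scoring. A suite can reach high line coverage yet a low mutation score if it executes code without asserting on the results, which is one way the reported coverage (34.1%) and mutation score (21.3%) averages can diverge.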