Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges (T)

2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) Pub Date : 2015-11-09 DOI:10.1109/ASE.2015.86

S. Shamshiri, René Just, J. Rojas, G. Fraser, Phil McMinn, Andrea Arcuri


Citations: 192

Abstract

Rather than tediously writing unit tests manually, tools can be used to generate them automatically - sometimes even resulting in higher code coverage than manual testing. But how good are these tests at actually finding faults? To answer this question, we applied three state-of-the-art unit test generation tools for Java (Randoop, EvoSuite, and Agitar) to the 357 real faults in the Defects4J dataset and investigated how well the generated test suites perform at detecting these faults. Although the automatically generated test suites detected 55.7% of the faults overall, only 19.9% of all the individual test suites detected a fault. By studying the effectiveness and problems of the individual tools and the tests they generate, we derive insights to support the development of automated unit test generators that achieve a higher fault detection rate. These insights include 1) improving the obtained code coverage so that faulty statements are executed in the first instance, 2) improving the propagation of faulty program states to an observable output, coupled with the generation of more sensitive assertions, and 3) improving the simulation of the execution environment to detect faults that are dependent on external factors such as date and time.
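The gap between the two headline figures (55.7% of faults versus 19.9% of test suites) is easier to see with a small worked example. The sketch below assumes that a fault counts as detected when at least one generated suite for it detects it, while the suite-level rate counts every generated suite individually. The fault names and outcomes are hypothetical and are not data from the study.

```python
from typing import Dict, List

# Hypothetical outcomes: for each fault, whether each generated test suite
# (across tools and repeated runs) detected it. Illustrative values only,
# NOT results from the study.
suite_outcomes: Dict[str, List[bool]] = {
    "fault-A": [True, False, False, False],
    "fault-B": [False, False, False, False],
    "fault-C": [True, True, False, False],
    "fault-D": [False, False, False, False],
}


def fault_detection_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Share of faults detected by at least one generated suite."""
    detected = sum(1 for runs in outcomes.values() if any(runs))
    return detected / len(outcomes)


def suite_detection_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Share of individual generated suites that detected their fault."""
    all_runs = [hit for runs in outcomes.values() for hit in runs]
    return sum(all_runs) / len(all_runs)


if __name__ == "__main__":
    print(f"faults detected: {fault_detection_rate(suite_outcomes):.1%}")
    print(f"suites detecting a fault: {suite_detection_rate(suite_outcomes):.1%}")
```

In this toy data, half of the faults are detected by some suite, yet fewer than one in five suites detects anything. The same pattern would explain why a high overall detection rate can coexist with a low per-suite rate.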



