Measuring the cost of regression testing in practice: a study of Java projects using continuous integration

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering Pub Date : 2017-08-21 DOI:10.1145/3106237.3106288

Adriaan Labuschagne, Laura Inozemtseva, Reid Holmes


Citations: 78

Abstract

Software defects cost time and money to diagnose and fix. Consequently, developers use a variety of techniques to avoid introducing defects into their systems. However, these techniques have costs of their own; the benefit of using a technique must outweigh the cost of applying it. In this paper we investigate the costs and benefits of automated regression testing in practice. Specifically, we studied 61 projects that use Travis CI, a cloud-based continuous integration tool, in order to examine real test failures that were encountered by the developers of those projects. We determined how the developers resolved the failures they encountered and used this information to classify the failures as being caused by a flaky test, by a bug in the system under test, or by a broken or obsolete test. We consider that test failures caused by bugs represent a benefit of the test suite, while failures caused by broken or obsolete tests represent a test suite maintenance cost. We found that 18% of test suite executions fail and that 13% of these failures are flaky. Of the non-flaky failures, only 74% were caused by a bug in the system under test; the remaining 26% were due to incorrect or obsolete tests. In addition, we found that, in the failed builds, only 0.38% of the test case executions failed and 64% of failed builds contained more than one failed test. Our findings contribute to a wider understanding of the unforeseen costs that can impact the overall cost effectiveness of regression testing in practice. They can also inform research into test case selection techniques, as we have provided an approximate empirical bound on the practical value that could be extracted from such techniques. This value appears to be large, as the 61 systems under study contained nearly 3 million lines of test code and yet over 99% of test case executions could have been eliminated with a perfect oracle.
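The percentages in the abstract are nested: the flaky share is a fraction of failures, and the bug/test split is a fraction of the non-flaky failures. The sketch below chains them to estimate absolute counts. The figure of 10,000 test suite executions is a hypothetical chosen for illustration and is not taken from the paper. The function name `failure_breakdown` is also an assumption, and the reported percentages are rounded, so the results are approximate.

```python
from typing import Dict

# Percentages as reported in the abstract.
FAIL_RATE = 0.18    # share of test suite executions that fail
FLAKY_SHARE = 0.13  # share of those failures that are flaky
BUG_SHARE = 0.74    # share of non-flaky failures caused by a bug in the system under test
TEST_SHARE = 0.26   # share of non-flaky failures caused by broken or obsolete tests


def failure_breakdown(executions: int) -> Dict[str, float]:
    """Chain the nested percentages to get approximate absolute counts."""
    failures = executions * FAIL_RATE
    flaky = failures * FLAKY_SHARE
    non_flaky = failures - flaky
    return {
        "failures": failures,
        "flaky": flaky,
        "non_flaky": non_flaky,
        "bug_revealing": non_flaky * BUG_SHARE,     # benefit of the test suite
        "test_maintenance": non_flaky * TEST_SHARE, # maintenance cost of the test suite
    }


if __name__ == "__main__":
    # Hypothetical workload: 10,000 test suite executions (not a figure from the paper).
    for name, count in failure_breakdown(10_000).items():
        print(f"{name}: {count:.2f}")
```

Under these assumptions, roughly 11.6% of all test suite executions (0.18 × 0.87 × 0.74) would end in a failure that points to a real bug, while about 4.1% would end in a failure that requires fixing the test itself.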



