在大规模工业环境中导致测试掉片的根本原因

Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis Pub Date : 2019-07-10 DOI:10.1145/3293882.3330570

Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta

{"title":"在大规模工业环境中导致测试掉片的根本原因","authors":"Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta","doi":"10.1145/3293882.3330570","DOIUrl":null,"url":null,"abstract":"In today’s agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers’ productivity is flaky tests—tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since they are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"80 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"92","resultStr":"{\"title\":\"Root causing flaky tests in a large-scale industrial setting\",\"authors\":\"Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta\",\"doi\":\"10.1145/3293882.3330570\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In today’s agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers’ productivity is flaky tests—tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since they are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.\",\"PeriodicalId\":20624,\"journal\":{\"name\":\"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"volume\":\"80 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"92\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3293882.3330570\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3293882.3330570","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 92

摘要

在当今的敏捷世界中，开发人员通常依赖于持续集成管道，通过以有效的方式执行测试来帮助构建和验证他们的更改。阻碍开发人员生产力的一个重要因素是不稳定的测试——对于同一版本的代码，测试可能通过，也可能失败。由于零星的测试失败不能确定地重现，开发人员经常不得不花费数小时才发现偶尔的失败与他们的更改无关。然而，忽略零散测试的失败可能是危险的，因为这些失败可能代表生产代码中的真正错误。此外，确定脆弱的根本原因是乏味和麻烦的，因为它们通常是由于各种因素(如并发性和外部依赖)导致的意外和不确定性行为的结果。作为大型工业环境中的开发人员，我们首先通过对它们进行研究来描述我们使用片状测试的经验。我们的结果表明，尽管不同的片状测试的数量可能很低，但是由于片状测试而导致的构建失败的百分比可能很大。为了减轻不稳定测试对开发人员的负担，我们描述了我们的端到端框架，该框架有助于识别不稳定的测试并了解其根本原因。我们的框架使用片状测试和所有相关代码来记录各种运行时属性，然后使用一个称为RootFinder的初步工具来查找通过和失败运行的日志中的差异。使用我们的框架，我们收集并发布真实世界的数据集，匿名的片状测试执行日志。通过分享我们的研究结果，我们的框架和工具，以及日志数据集，我们希望鼓励对这个重要问题进行更多的研究。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Root causing flaky tests in a large-scale industrial setting

In today’s agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers’ productivity is flaky tests—tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since they are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis

自引率

0.00%

发文量