Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA

Alexander Berndt, Thomas Bach, Sebastian Baltes
arXiv - CS - Software Engineering, published 2024-09-16, DOI: arxiv-2409.10062

Abstract

Background: Test flakiness is a major problem in the software industry. Flaky tests fail seemingly at random without changes to the code and thus impede continuous integration (CI). Some researchers argue that all tests can be considered flaky and that tests only differ in their frequency of flaky failures.

Aims: With the goal of developing mitigation strategies to reduce the negative impact of test flakiness, we study characteristics of tests and the test environment that potentially impact test flakiness.

Method: We construct two datasets based on SAP HANA's test results over a 12-week period: one based on production data, the other based on targeted test executions from a dedicated flakiness experiment. We conduct correlation analysis for test and test environment characteristics with respect to their influence on the frequency of flaky test failures.

Results: In our study, the average test execution time had the strongest positive correlation with the test flakiness rate (r = 0.79), which confirms previous studies. Potential reasons for higher flakiness include the larger test scope of long-running tests or test executions on a slower test infrastructure. Interestingly, the load on the testing infrastructure was not correlated with test flakiness. The relationship between test flakiness and required resources for test execution is inconclusive.

Conclusions: Based on our findings, we conclude that splitting long-running tests can be an important measure for practitioners to cope with test flakiness, as it enables parallelization of test executions and also reduces the cost of re-executions. This effectively decreases the negative effects of test flakiness in complex testing environments. However, when splitting long-running tests, practitioners need to consider the potential test setup overhead of test splits.
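
The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient or tooling the authors used; the code below assumes a plain Pearson coefficient, defines the flakiness rate as flaky failures divided by executions, and uses invented per-test numbers. The function names `flakiness_rate` and `pearson_r` and the sample data are hypothetical and are not taken from the study.

```python
from math import sqrt


def flakiness_rate(flaky_failures: int, executions: int) -> float:
    """Fraction of executions that ended in a flaky failure (assumed definition)."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return flaky_failures / executions


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical records, NOT data from the study:
# (test name, average execution time in seconds, flaky failures, executions)
records = [
    ("test_short", 12.0, 1, 2000),
    ("test_medium", 95.0, 4, 2000),
    ("test_long", 640.0, 15, 2000),
    ("test_very_long", 1800.0, 31, 2000),
]

avg_times = [avg_time for _, avg_time, _, _ in records]
rates = [flakiness_rate(flaky, runs) for _, _, flaky, runs in records]

print(f"r = {pearson_r(avg_times, rates):.2f}")
```

A rank-based coefficient such as Spearman's may be a better fit when execution times are heavily skewed; the choice here is only for brevity.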