Empirically evaluating readily available information for regression test optimization in continuous integration

Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis. Pub Date: 2021-07-11. DOI: 10.1145/3460319.3464834

Daniel Elsner, Florian Hauer, A. Pretschner, Silke Reimer


Citation count: 25

Abstract

Regression test selection (RTS) and prioritization (RTP) techniques aim to reduce testing efforts and developer feedback time after a change to the code base. Using various information sources, including test traces, build dependencies, version control data, and test histories, they have been shown to be effective. However, not all of these sources are guaranteed to be available and accessible for arbitrary continuous integration (CI) environments. In contrast, metadata from version control systems (VCSs) and CI systems are readily available and inexpensive. Yet, corresponding RTP and RTS techniques are scattered across research and often only evaluated on synthetic faults or in a specific industrial context. It is cumbersome for practitioners to identify insights that apply to their context, let alone to calibrate associated parameters for maximum cost-effectiveness. This paper consolidates existing work on RTP and unsafe RTS into an actionable methodology to build and evaluate such approaches that exclusively rely on CI and VCS metadata. To investigate how these approaches from prior research compare in heterogeneous settings, we apply the methodology in a large-scale empirical study on a set of 23 projects covering 37,000 CI logs and 76,000 VCS commits. We find that these approaches significantly outperform established RTP baselines and, while still triggering 90% of the failures, we show that practitioners can expect to save on average 84% of test execution time for unsafe RTS. We also find that it can be beneficial to limit training data, features from test history work better than change-based features, and, somewhat surprisingly, simple and well-known heuristics often outperform complex machine-learned models.
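To make the terms in the abstract concrete, the following is a minimal sketch of a history-based prioritization heuristic combined with an unsafe selection step. It is not the authors' methodology or tooling: the `HistoryEntry` fields, the failure-count ranking rule, and the runtime budget are all assumptions chosen for illustration, and the numbers in the usage example are made up rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """Hypothetical per-test summary derived from CI logs (illustrative only)."""
    name: str
    duration: float       # seconds of execution time
    recent_failures: int  # failures within some recent window of CI runs


def prioritize(tests):
    """RTP sketch: most recent failures first, cheaper tests break ties."""
    return sorted(tests, key=lambda t: (-t.recent_failures, t.duration))


def select(ranked, budget_fraction):
    """Unsafe RTS sketch: keep the ranked prefix that fits a share of total runtime."""
    budget = budget_fraction * sum(t.duration for t in ranked)
    selected, used = [], 0.0
    for t in ranked:
        if used + t.duration > budget:
            break
        selected.append(t)
        used += t.duration
    return selected


def evaluate(selected, all_tests, failing_names):
    """Return (share of failures still triggered, share of execution time saved)."""
    chosen = {t.name for t in selected}
    if failing_names:
        detected = len(chosen & failing_names) / len(failing_names)
    else:
        detected = 1.0
    total = sum(t.duration for t in all_tests)
    saved = 1.0 - sum(t.duration for t in selected) / total
    return detected, saved


if __name__ == "__main__":
    # Made-up data for illustration, not results from the paper.
    suite = [
        HistoryEntry("test_login", 30.0, 3),
        HistoryEntry("test_report", 120.0, 0),
        HistoryEntry("test_api", 20.0, 1),
        HistoryEntry("test_ui", 30.0, 0),
    ]
    chosen = select(prioritize(suite), budget_fraction=0.25)
    print([t.name for t in chosen])
    print(evaluate(chosen, suite, failing_names={"test_login"}))
```

The selection is "unsafe" in the sense used in the abstract: a failing test that falls outside the chosen prefix is simply missed, which is why the share of failures triggered and the share of time saved are reported together.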

