Towards language-independent Brown Build Detection

2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). Pub Date: 2022-05-01. DOI: 10.1145/3510003.3510122

Doriane Olewicki, Mathieu Nayrolles, Bram Adams


Citations: 3

Abstract

In principle, continuous integration (CI) practices allow modern software organizations to build and test their products after each code change to detect quality issues as soon as possible. In reality, issues with the build scripts (e.g., missing dependencies) and/or the presence of “flaky tests” lead to build failures that are essentially false positives, not indicative of actual quality problems in the source code. For our industrial partner, which is active in the video game industry, such “brown builds” not only require multidisciplinary teams to spend more effort interpreting or even re-running the build, leading to substantial redundant build activity, but also slow down the integration pipeline. Hence, this paper aims to prototype and evaluate approaches for early detection of brown build results based on textual similarity to build logs of prior brown builds. The approach is tested on 7 projects (6 closed-source from our industrial collaborators and 1 open-source, Graphviz). We find that our model manages to detect brown builds with a mean F1-score of 53% on the studied projects, which is three times more than the best baseline considered, and at least as good as human experts (but with less effort). Furthermore, we found that cross-project prediction can be used for a project's onboarding phase, that a training set of 30 weeks works best, and that our retraining heuristics keep the F1-score higher than the baseline while retraining only every 4–5 weeks.
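To make the idea of "textual similarity to build logs of prior brown builds" concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the vectorization, the similarity measure, or the classifier, so the bag-of-words vectors, cosine similarity, the `THRESHOLD` value, and the toy log lines are all assumptions made for illustration. The sketch also shows how an F1-score, the metric reported in the abstract, is computed from predictions.

```python
import math
import re
from collections import Counter


def tokenize(log_text):
    """Lowercase a build log and keep alphabetic/underscore tokens only."""
    return re.findall(r"[a-z_]+", log_text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brown_score(new_log, prior_brown_logs):
    """Highest similarity between a failing log and any prior brown log."""
    new_vec = Counter(tokenize(new_log))
    return max(
        (cosine(new_vec, Counter(tokenize(old))) for old in prior_brown_logs),
        default=0.0,
    )


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall for the 'brown' class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical logs of builds already known to be brown.
prior_brown = [
    "error: connection timed out while fetching dependency",
    "test flaky_network_test failed: timeout waiting for server",
]

# Hypothetical new failing builds, paired with their true label (brown or not).
failing_builds = [
    ("error: connection timed out while fetching package", True),
    ("error: undefined reference to symbol render_frame", False),
    ("test flaky_network_test failed: timeout waiting for response", True),
]

THRESHOLD = 0.5  # assumed cut-off; a real system would tune this on past data

predictions = [brown_score(log, prior_brown) >= THRESHOLD
               for log, _ in failing_builds]
labels = [is_brown for _, is_brown in failing_builds]
f1 = f1_score(predictions, labels)
```

A nearest-neighbour threshold is used here only because it is the simplest way to turn similarity into a decision; a trained classifier over the same text features would fill the same role, and periodic retraining would amount to rebuilding `prior_brown` from a recent window of builds.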


