Divya M. Kamath, Eduardo Fernandes, Bram Adams, Ahmed E. Hassan
Title: On combining commit grouping and build skip prediction to reduce redundant continuous integration activity
Journal: Empirical Software Engineering
Published: 2024-08-30
DOI: 10.1007/s10664-024-10477-1 (https://doi.org/10.1007/s10664-024-10477-1)
Abstract
Context
Continuous Integration (CI) is a resource-intensive, widely used industry practice. The two most commonly used heuristics for reducing the number of builds are grouping multiple builds together and skipping builds predicted to be safe. Yet both techniques have disadvantages: missed build failures and higher build turn-around time (delays), respectively.
Objective
We aim to bring together these two lines of research, empirically comparing their advantages and disadvantages over time, and proposing and evaluating two ways in which these build avoidance heuristics can be combined more effectively, i.e., the ML-CI model based on machine learning and the Timeout Rule.
Method
We empirically study the trade-off between reduction in the number of builds required and the speed of recognition of failing builds on a dataset of 79,482 builds from 20 open-source projects.
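To make the trade-off concrete, the following is a minimal sketch of how one might score a build-avoidance schedule on a commit history. It is not the paper's evaluation procedure: the function name `evaluate_schedule`, the input encoding, and the simplifying assumption that a failure stays visible until the next executed build are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def evaluate_schedule(
    fails: List[bool], built: List[bool]
) -> Tuple[float, List[Optional[int]]]:
    """Score a hypothetical build-avoidance schedule.

    fails[i] -- True if building commit i would fail.
    built[i] -- True if the heuristic actually runs a build at commit i.

    Returns the fraction of builds saved and, for each failing commit,
    the delay (in commits) until the next executed build. A delay of
    None means no later build ran, so the failure was missed.

    Assumption (not from the paper): a failure remains observable until
    the next build that is actually executed.
    """
    if len(fails) != len(built):
        raise ValueError("fails and built must have the same length")
    if not fails:
        return 0.0, []

    saved = 1 - sum(built) / len(built)

    delays: List[Optional[int]] = []
    for i, failing in enumerate(fails):
        if not failing:
            continue
        # Index of the first executed build at or after commit i, if any.
        nxt = next((j for j in range(i, len(built)) if built[j]), None)
        delays.append(None if nxt is None else nxt - i)
    return saved, delays


# Six commits, two of which fail; the schedule builds only commits 0 and 3.
saved, delays = evaluate_schedule(
    [False, True, False, False, True, False],
    [True, False, False, True, False, False],
)
print(saved, delays)
```

In this toy history, two thirds of the builds are avoided, the first failure is detected two commits late, and the second is never detected. These are the two costs (delay and missed failures) weighed against the number of builds saved.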
Results
We find that both of our hybrid heuristics can provide a significant improvement over the baseline heuristics, with fewer missed build failures and lower delays. They reduce the turn-around time of commits by 96% in comparison to skipping heuristics, and the Timeout Rule also enables a median of 26.10% fewer builds to be scheduled than grouping heuristics.
Conclusions
Our hybrid approaches offer build engineers greater flexibility in scheduling builds during CI without compromising the quality of the resulting software.
About the journal:
Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories.
The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings.
Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.