Merge Conflict Prediction Using Feature Selection and Stacking Heterogeneous Ensembles: An Empirical Investigation

IF 1.8 | Zone 4, Computer Science | JCR Q3 | COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Software-Evolution and Process Pub Date : 2025-09-23 DOI:10.1002/smr.70047

Reem Alfayez, Amal Alazba

Volume 37, Issue 9. Full text: https://onlinelibrary.wiley.com/doi/10.1002/smr.70047

Citations: 0

Abstract

Merge conflicts arise when multiple developers simultaneously modify the same part of a codebase and attempt to merge their changes. These conflicts occur because the version control system (VCS) cannot automatically determine which changes should take precedence. Resolving such conflicts involves manually reviewing the conflicting changes and deciding how to integrate them to maintain a functional and coherent codebase. This process is often time-consuming, complex, and prone to errors. Consequently, the software engineering community has focused on predicting merge conflicts to warn developers early and allow them to address conflicts before they escalate.

Despite several efforts to predict merge conflicts, no perfect solution has been identified. Fortunately, many machine learning techniques have demonstrated potential in improving prediction performance across various contexts. This study aims to empirically investigate the effectiveness of stacking heterogeneous ensembles in enhancing merge conflict prediction performance.

We empirically compared the prediction performance of the following individual models: decision trees (DT); support vector machine (SVM) with a linear kernel; naive Bayes (NB) with Bernoulli, Gaussian, and Multinomial variants; logistic regression (LR); multilayer perceptron (MLP); stochastic gradient descent (SGD); and k-nearest neighbors (KNN). Additionally, we evaluated three heterogeneous stacking ensembles: Stack-DT, Stack-SVM, and Stack-LR, which were constructed using the aforementioned individual models as base models. We utilized gain ratio (GR) to identify the most important technical and social features for predicting merge conflicts and assessed the impact of using only these important features on the performance of both individual and stacking models.

The study revealed variability in the performance of individual models, with DT demonstrating the best predictive performance among them. Heterogeneous stacking ensembles demonstrated potential to enhance merge conflict prediction, with Stack-SVM emerging as the top-performing model. GR analysis highlighted the importance of both social and technical features in predicting merge conflicts. However, using only the most important features identified by GR led to a decline in the performance of most models compared to using all features.

Heterogeneous stacking ensembles significantly improve prediction performance over individual models. Both social and technical features are important in predicting merge conflicts, and utilizing the full set of features instead of only the most important ones generally yields better results.
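The gain ratio (GR) criterion named in the abstract can be illustrated with a small, standard-library-only sketch. It uses the usual definition (information gain divided by the split information of the feature) for discrete features; continuous features would need to be discretized first. The feature names and the toy data are hypothetical and are not taken from the study, whose actual feature set and selection threshold are not given in the abstract.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Gain ratio of a discrete feature with respect to the class labels."""
    total = len(labels)
    # Group the labels by feature value to get H(labels | feature).
    partitions = {}
    for x, y in zip(feature, labels):
        partitions.setdefault(x, []).append(y)
    conditional = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # intrinsic information of the split
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 = merge ended in a conflict, 0 = clean merge.
conflict = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "many_files_changed": [1, 1, 1, 1, 0, 0, 0, 0],  # aligned with the label
    "many_developers": [1, 1, 1, 0, 1, 0, 0, 0],     # partially informative
    "weekend_merge": [1, 0, 1, 0, 1, 0, 1, 0],       # independent of the label
}

# Rank features from most to least informative according to gain ratio.
ranking = sorted(
    features,
    key=lambda name: gain_ratio(features[name], conflict),
    reverse=True,
)
for name in ranking:
    print(f"{name}: {gain_ratio(features[name], conflict):.4f}")
```

Keeping only the top-ranked features from such a list corresponds to the "most important features only" setting that the abstract compares against using all features.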
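The stacking setup described in the abstract can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data is synthetic, and the scaling steps, hyperparameters, 5-fold internal cross-validation, and the F1 metric are all choices made here for the example. Only the list of base models and the three meta-learners (DT, SVM, LR) follow the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a merge-scenario dataset (assumption).
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def scaled(model):
    """Standardize features before a scale-sensitive model (assumption)."""
    return make_pipeline(StandardScaler(), model)


# Heterogeneous base models named in the abstract.
base_models = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", scaled(LinearSVC())),
    ("bnb", BernoulliNB()),
    ("gnb", GaussianNB()),
    # MultinomialNB needs non-negative inputs, hence the clipped min-max scaling.
    ("mnb", make_pipeline(MinMaxScaler(clip=True), MultinomialNB())),
    ("lr", scaled(LogisticRegression(max_iter=1000))),
    ("mlp", scaled(MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                                 random_state=0))),
    ("sgd", scaled(SGDClassifier(random_state=0))),
    ("knn", scaled(KNeighborsClassifier())),
]

# One stacking ensemble per meta-learner, mirroring Stack-DT/SVM/LR.
meta_learners = {
    "Stack-DT": DecisionTreeClassifier(random_state=0),
    "Stack-SVM": LinearSVC(),
    "Stack-LR": LogisticRegression(max_iter=1000),
}

scores = {}
for name, meta in meta_learners.items():
    # The meta-learner is trained on out-of-fold predictions of the base models.
    stack = StackingClassifier(estimators=base_models, final_estimator=meta, cv=5)
    stack.fit(X_train, y_train)
    scores[name] = f1_score(y_test, stack.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

Because the data here is synthetic, the printed scores say nothing about the study's results; the sketch only shows how base-model predictions feed a second-level model.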


Source journal

Journal of Software-Evolution and Process (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Self-citation rate: 10.00%

Articles published: 109