An assessment of heterogenous ensemble classifiers for analyzing change-proneness in open-source software systems

IF 1.8 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Software-Evolution and Process | Pub Date: 2024-02-24 | DOI: 10.1002/smr.2660

Megha Khanna, Ankita Bansal


Citations: 0

Abstract

Software managers constantly look for methods that ensure cost-effective development of good-quality software products. One important way to achieve this is to allocate more resources to the weak classes of a software product, that is, the classes that are prone to change. Correct prediction of these change-prone classes is therefore critical. Although researchers have investigated the performance of several algorithms for identifying them, the search for an optimum classifier continues. To this end, this study critically investigates the use of six Heterogenous Ensemble Classifiers (HEC) for Software Change Prediction (SCP), empirically validating them on datasets obtained from 12 open-source software systems. The results are statistically assessed using three robust performance indicators (AUC, F-measure and Matthews Correlation Coefficient) in two validation scenarios (within-project and cross-project). They indicate the superiority of the Average Probability Voting Ensemble, a heterogenous classifier, for determining change-proneness in the investigated systems. The average AUC values of software change prediction models developed using this ensemble classifier improved by 3%-9% compared with its base learners and by 3%-11% compared with its homogeneous counterparts. Similar observations were made using the other investigated performance measures. Furthermore, the evidence suggests that changing the number of base learners or the type of meta-learner does not significantly change the performance of the corresponding heterogenous ensemble classifiers.
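To make the idea of average probability voting concrete, the following is a minimal sketch and not the authors' implementation, which the abstract does not describe. It assumes each base learner outputs a probability that a class is change-prone, averages those probabilities across learners, and thresholds the mean at 0.5. It also computes the Matthews Correlation Coefficient, one of the three indicators named above. The function names, the threshold, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_probability_vote(prob_lists, threshold=0.5):
    """Average P(change-prone) across base learners, then threshold.

    prob_lists: one list per base learner, each holding one probability
    per software class. The 0.5 threshold is an assumption.
    Returns (averaged probabilities, binary labels).
    """
    n_learners = len(prob_lists)
    averaged = [sum(column) / n_learners for column in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, made-up probabilities from three hypothetical base learners
# for four software classes (not data from the study).
learner_a = [0.9, 0.2, 0.6, 0.1]
learner_b = [0.8, 0.4, 0.3, 0.2]
learner_c = [0.7, 0.3, 0.9, 0.3]

averaged, predicted = average_probability_vote([learner_a, learner_b, learner_c])
actual = [1, 0, 0, 0]  # hypothetical ground truth: 1 = change-prone
mcc = matthews_corrcoef(actual, predicted)
```

Averaging probabilities rather than counting hard votes lets a confident learner outweigh two lukewarm ones, which is the usual motivation for this style of ensemble.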


Source journal: Journal of Software-Evolution and Process (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

Self-citation rate: 10.00%

Articles published: 109