{"title":"Project Features That Make Machine-Learning Based Fault Proneness Analysis Successful","authors":"Marios Grigoriou, K. Kontogiannis","doi":"10.1109/STC55697.2022.00018","DOIUrl":null,"url":null,"abstract":"Over the past years, we have witnessed the extensive use of various software fault proneness prediction techniques utilizing machine learning. These techniques use data from multiple sources representing various facets of the software systems being investigated. In spite of the complexity and performance of all such techniques and approaches proposed by the research community, we cannot yet expertly reason on the features which may render a software system a good or bad candidate for their application. In this paper, we build on the corpus of established machine learning approaches, and we perform an evaluation of system-wide process metrics versus the results acquired by the indiscriminate application of a published best set of classifiers. More specifically, we analyze the fault proneness prediction results obtained by applying a combination of the best classifiers and file features to 207 open source projects in order to identify which project features make a system suitable for Machine Learning based fault proneness analysis or not. 
Based on this analysis, we propose a meta-evaluator of the overall nature of a system that can be used to gauge in advance the performance that can be expected when applying the selected technique in terms of the key performance measures namely: Accuracy, Fl-measure, Precision, Recall and ROC-AUC.","PeriodicalId":170123,"journal":{"name":"2022 IEEE 29th Annual Software Technology Conference (STC)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 29th Annual Software Technology Conference (STC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/STC55697.2022.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Over the past years, we have witnessed the extensive use of software fault proneness prediction techniques based on machine learning. These techniques use data from multiple sources, representing various facets of the software systems under investigation. Despite the complexity and performance of the techniques and approaches proposed by the research community, we cannot yet reason systematically about the features that make a software system a good or a poor candidate for their application. In this paper, we build on the corpus of established machine learning approaches, and we evaluate system-wide process metrics against the results obtained by the indiscriminate application of a published best set of classifiers. More specifically, we analyze the fault proneness prediction results obtained by applying a combination of the best classifiers and file features to 207 open source projects, in order to identify which project features make a system suitable (or unsuitable) for machine-learning-based fault proneness analysis. Based on this analysis, we propose a meta-evaluator of the overall nature of a system that can be used to gauge in advance the performance to be expected when applying the selected technique, in terms of the key performance measures: Accuracy, F1-measure, Precision, Recall, and ROC-AUC.
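The five performance measures named in the abstract can be sketched in plain Python. This is an illustrative, from-scratch computation for binary fault-proneness labels (1 = fault-prone, 0 = clean), not the paper's implementation; the function names and the rank-based ROC-AUC formulation are assumptions made here for clarity.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1-measure from binary labels
    and binary predictions (hypothetical helper, for illustration)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, y_score):
    """ROC-AUC via the rank-sum (Mann-Whitney) formulation: the
    probability that a randomly chosen fault-prone file receives a
    higher classifier score than a randomly chosen clean file."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, a classifier that marks two of four files fault-prone, getting one right and one wrong (`classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])`), scores 0.5 on all four threshold-based measures, while ROC-AUC depends only on how the classifier ranks files, not on the chosen threshold.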