{"title":"Project Features That Make Machine-Learning Based Fault Proneness Analysis Successful","authors":"Marios Grigoriou, K. Kontogiannis","doi":"10.1109/STC55697.2022.00018","DOIUrl":null,"url":null,"abstract":"Over the past years, we have witnessed the extensive use of various software fault proneness prediction techniques utilizing machine learning. These techniques use data from multiple sources representing various facets of the software systems being investigated. In spite of the complexity and performance of all such techniques and approaches proposed by the research community, we cannot yet expertly reason on the features which may render a software system a good or bad candidate for their application. In this paper, we build on the corpus of established machine learning approaches, and we perform an evaluation of system-wide process metrics versus the results acquired by the indiscriminate application of a published best set of classifiers. More specifically, we analyze the fault proneness prediction results obtained by applying a combination of the best classifiers and file features to 207 open source projects in order to identify which project features make a system suitable for Machine Learning based fault proneness analysis or not. 
Based on this analysis, we propose a meta-evaluator of the overall nature of a system that can be used to gauge in advance the performance that can be expected when applying the selected technique in terms of the key performance measures namely: Accuracy, Fl-measure, Precision, Recall and ROC-AUC.","PeriodicalId":170123,"journal":{"name":"2022 IEEE 29th Annual Software Technology Conference (STC)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 29th Annual Software Technology Conference (STC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/STC55697.2022.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Over the past years, we have witnessed the extensive use of software fault proneness prediction techniques based on machine learning. These techniques use data from multiple sources, representing various facets of the software systems under investigation. Despite the complexity and performance of the techniques and approaches proposed by the research community, we cannot yet reason systematically about the features that make a software system a good or a poor candidate for their application. In this paper, we build on the corpus of established machine learning approaches, and we evaluate system-wide process metrics against the results obtained by the indiscriminate application of a published best set of classifiers. More specifically, we analyze the fault proneness prediction results obtained by applying a combination of the best classifiers and file features to 207 open source projects, in order to identify which project features make a system suitable (or unsuitable) for machine-learning-based fault proneness analysis. Based on this analysis, we propose a meta-evaluator of the overall nature of a system that can be used to gauge in advance the performance to be expected when applying the selected technique, in terms of the key performance measures: Accuracy, F1-measure, Precision, Recall, and ROC-AUC.
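The five performance measures named in the abstract can be sketched in plain Python. This is an illustrative, from-scratch computation for binary fault-proneness labels (1 = fault-prone, 0 = clean), not the paper's implementation; the function names and the rank-based ROC-AUC formulation are assumptions made here for clarity.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1-measure from binary labels
    and binary predictions (hypothetical helper, for illustration)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, y_score):
    """ROC-AUC via the rank-sum (Mann-Whitney) formulation: the
    probability that a randomly chosen fault-prone file receives a
    higher classifier score than a randomly chosen clean file."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, a classifier that marks two of four files fault-prone, getting one right and one wrong (`classification_metrics([1, 1, 0, 0], [1, 0, 0, 1])`), scores 0.5 on all four threshold-based measures, while ROC-AUC depends only on how the classifier ranks files, not on the chosen threshold.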