Failure of One, Fall of Many: An Exploratory Study of Software Features for Defect Prediction

2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM) | Pub Date: 2020-09-01 | DOI: 10.1109/SCAM51674.2020.00016

G. E. D. Santos, Eduardo Figueiredo


Citations: 4

Abstract



Software defect prediction is an area of interest in both academia and the software industry. Software defects are prevalent in software development and can cause numerous difficulties for users and developers alike. The current literature offers multiple alternative approaches to predict the likelihood of defects in source code. Most of these studies concentrate on predicting defects from a broad set of software features. As a result, the individual discriminating power of software features is still unknown, as some perform well only with specific projects or metrics. In this study, we applied machine learning techniques to a popular dataset. This dataset has information about software defects in five Java projects, containing 5,371 classes and 37 software features. To this end, we conducted an exploratory investigation that produced hundreds of thousands of machine learning models from a diverse collection of software features. These models are random in the sense that they randomly select their features from the entire pool of features. Even though the vast majority of the models are ineffective, we could produce several models that yield accurate predictions, thus classifying the classes of the Java projects as defective or clean. Among these accurate models, our results indicate that change metric features appear more often than entropy or class-level metrics. We focused our analysis on models that rank a randomly chosen defective class above a randomly chosen clean class more than 80% of the time. We also report and discuss some features that contribute to explaining model decisions. Our study therefore promotes reasoning about which features support defect prediction in these projects. Finally, we present the implications of our work for practitioners.
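
The search procedure the abstract describes (draw random feature subsets, build one model per subset, keep the models that pass the ranking criterion) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy data, and the `toy_score` stand-in for a trained model are all assumptions made for the example. The ranking criterion is computed as the usual probabilistic reading of AUC, with ties counted as half.

```python
import random
from itertools import product


def auc(scores, labels):
    """Probability that a random defective item (label 1) outscores
    a random clean item (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def sample_feature_subsets(feature_names, n_models, rng):
    """Yield n_models random, non-empty subsets of the feature pool."""
    for _ in range(n_models):
        k = rng.randint(1, len(feature_names))
        yield tuple(sorted(rng.sample(feature_names, k)))


def toy_score(row, subset):
    """Hypothetical stand-in for a trained model:
    the mean of the selected feature values."""
    return sum(row[f] for f in subset) / len(subset)


def search(rows, labels, feature_names, n_models=200, threshold=0.8, seed=0):
    """Return (subset, auc) pairs for random models whose AUC exceeds threshold."""
    rng = random.Random(seed)
    kept = []
    for subset in sample_feature_subsets(feature_names, n_models, rng):
        scores = [toy_score(row, subset) for row in rows]
        value = auc(scores, labels)
        if value > threshold:
            kept.append((subset, value))
    return kept


# Toy data (invented for illustration): "changes" separates the labels, "noise" does not.
rows = [
    {"changes": 5, "noise": 1},
    {"changes": 4, "noise": 0},
    {"changes": 1, "noise": 1},
    {"changes": 0, "noise": 0},
]
labels = [1, 1, 0, 0]
accurate_models = search(rows, labels, ["changes", "noise"])
```

In a real replication, `toy_score` would be replaced by a classifier trained and evaluated on held-out data, and counting how often each feature occurs in `accurate_models` would give the kind of feature-frequency comparison the abstract reports.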
