Explainable AI Framework for Software Defect Prediction

IF 1.7 · CAS Tier 4 (Computer Science) · JCR Q3, Computer Science, Software Engineering
Bahar Gezici Geçer, Ayça Kolukısa Tarhan
Journal of Software-Evolution and Process, Vol. 37, No. 4
Published: 2025-04-13 · DOI: 10.1002/smr.70018
Citations: 0

Abstract



Software engineering plays a critical role in improving the quality of software systems, because identifying and correcting defects is one of the most expensive tasks in the software development life cycle. For instance, determining whether a software product still contains defects before releasing it is crucial: customers' confidence in the product declines when defects are discovered only after deployment. Machine learning-based techniques for predicting software defects have recently begun to yield encouraging results, and the accuracy of a software defect prediction system depends on the machine learning models behind it. More accurate models, however, tend to be more complex, which makes them harder to interpret; because the rationale behind their decisions is obscure, it is challenging to employ them in actual production. In this study, we employ five different machine learning models, namely random forest (RF), gradient boosting (GB), naive Bayes (NB), multilayer perceptron (MLP), and neural network (NN), to predict software defects, and we also provide an explainable artificial intelligence (XAI) framework that increases transparency throughout the machine learning pipeline both locally and globally. While global explanations identify general trends and feature importance, local explanations provide insights into individual instances, and their combination allows for a holistic understanding of the model. This is accomplished through XAI algorithms, which aim to reduce the "black-boxiness" of ML models by explaining the reasoning behind a prediction. The explanations provide quantifiable information about the characteristics that affect defect prediction. They are produced using six XAI methods: SHAP, Anchor, ELI5, LIME, partial dependence plots (PDP), and ProtoDash. We apply these methods to a software defect prediction (SDP) system on the KC2 dataset, and present and discuss the results.
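The pipeline the abstract describes, training a classifier on code metrics and then deriving a global explanation, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names mimic static code metrics of the kind found in the KC2 dataset, but the data here is synthetic, and only one of the five models (random forest) and one global explanation (impurity-based feature importance) is shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical KC2-style static code metrics (names assumed for illustration)
feature_names = ["loc", "cyclomatic_complexity", "halstead_volume", "branch_count"]
n = 500
X = rng.normal(size=(n, len(feature_names)))
# Synthetic labels: defects correlated with complexity and size, plus noise
y = ((0.8 * X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n)) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Global explanation: impurity-based feature importances, ranked
for name, imp in sorted(zip(feature_names, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")

print("test accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3))
```

A local explanation of the kind the paper obtains with SHAP or LIME would instead attribute a single instance's prediction to each feature; libraries such as `shap` and `lime` provide those attributions on top of a fitted model like the one above.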

Source journal: Journal of Software-Evolution and Process (Computer Science, Software Engineering)
Self-citation rate: 10.00% · Articles per year: 109