Alfredo Daza, Gonzalo Apaza-Perez, Katherine Samanez-Torres, Juan Benites-Noriega, Orlando Llanos Gonzales, Pablo Cesar Condori-Cutipa
Journal: Computers & Electrical Engineering, vol. 124, Article 110411
Published: 2025-05-01
DOI: 10.1016/j.compeleceng.2025.110411
URL: https://www.sciencedirect.com/science/article/pii/S0045790625003544
Impact factor: 4.0 (JCR Q1, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Industrial applications of artificial intelligence in software defects prediction: Systematic review, challenges, and future works
Software defect prediction is a persistent challenge in industrial software engineering and a significant problem for quality and cost in software development worldwide. The purpose of this study is to gain a deeper understanding of the journal quartiles, countries, keywords, techniques, metrics, tools, platforms or languages, variables, data sources, and datasets used in software defect prediction. A comprehensive search covering 45 articles from 2019 to 2023 was conducted in 5 databases (Scopus, ProQuest, ScienceDirect, EBSCOhost, and Web of Science), following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. Results show that 60.00% of the studies were carried out in 2023, and 68.89% of journals were in the Q1 and Q2 quartiles. The most common techniques were Support Vector Machine (42.22%) and Random Forest (35.56%). The most commonly used evaluation metrics were Accuracy and F1-Score (68.89%). Python was the main programming language (35.56%), with thousands of lines of code (KLOC, 31.11%) and cyclomatic complexity (26.67%) as key variables. Finally, NASA's Metrics Data Program Data Repository was the most used data source (31.11%), with datasets ranging from a minimum of 759 instances and 37 attributes to a maximum of 3579 instances and 38 attributes across 5 projects: CM1, MW1, PC1, PC3, and PC4. This systematic review provides scientific evidence on how machine learning algorithms aid in predicting software defects and improving development processes. In addition, it offers a detailed discussion that identifies trends, limitations, successful approaches, and areas for improvement, and provides recommendations for future research.
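As a brief illustration of the two evaluation metrics the review reports as most common, the following minimal sketch computes Accuracy and F1-Score for a binary defect classifier. The labels, predictions, and function names are hypothetical and are not taken from the reviewed studies or from the NASA datasets; the example only shows how the two metrics can diverge when defective modules are a minority.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    """Counts for a binary classifier where 1 = defective, 0 = clean."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally true/false positives and negatives from paired labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return Confusion(tp, fp, tn, fn)


def accuracy(c: Confusion) -> float:
    """Fraction of all modules classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall for the defective class."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Hypothetical example: 10 modules, only 2 of which are defective.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

c = confusion(y_true, y_pred)
print(f"accuracy = {accuracy(c):.2f}")
print(f"f1       = {f1_score(c):.2f}")
```

In this made-up case the classifier finds only one of the two defective modules, so accuracy stays high while F1 is noticeably lower, which is one plausible reason studies report both.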
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.